[Tarantool-patches] [RFC] Quorum-based synchronous replication

Sergey Ostanevich sergos at tarantool.org
Tue May 12 19:40:48 MSK 2020


On 08 May 02:01, Konstantin Osipov wrote:
> 
> 
> > ### Enabling synchronous replication.
> > 
> > Synchronous operation can be required for a set of spaces in the data
> > schema. That means only transactions that contain data modifications for
> > these spaces should require a quorum. Such transactions are called
> > synchronous. As soon as the last operation of a synchronous transaction
> > appears in the leader's WAL, all following transactions, whether they are
> > synchronous or not, will wait for the quorum. If the quorum is not
> > achieved, the 'rollback' operation will roll back all transactions after
> > the synchronous one. This ensures a consistent state of the data both on
> > the leader and on the replicas. If the user doesn't require synchronous
> > operation for any space, no changes to WAL generation or replication
> > will appear.
> 
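To make the rule in the quoted paragraph concrete, here is a minimal Python sketch of a leader-side queue. It is a toy model, not Tarantool code: `Txn`, `SyncQueue`, their methods and the string node names are hypothetical. It assumes that an async transaction queued behind a sync one commits as soon as every sync transaction ahead of it is confirmed, and that a replica acknowledgement for LSN N covers every transaction up to N.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    lsn: int
    spaces: frozenset  # spaces the transaction modifies
    acks: set = field(default_factory=set)  # nodes that have it in their WAL
    state: str = "pending"  # pending -> committed | rolled_back


class SyncQueue:
    """Hypothetical leader-side model (not Tarantool's API) of the rule:
    once a sync txn is in the leader's WAL, every later txn waits behind it."""

    def __init__(self, sync_spaces, quorum):
        self.sync_spaces = frozenset(sync_spaces)
        self.quorum = quorum  # acks needed, the leader's own write included
        self.queue = []  # txns waiting for confirmation, in LSN order

    def is_sync(self, txn):
        return bool(txn.spaces & self.sync_spaces)

    def write(self, txn):
        """Called when txn's last operation has reached the leader's WAL."""
        txn.acks.add("leader")
        if self.queue or self.is_sync(txn):
            self.queue.append(txn)  # a sync txn is pending: wait behind it
        else:
            txn.state = "committed"  # nothing pending: async txn commits now
        self._advance()

    def ack(self, replica, lsn):
        """A replica reports that its WAL contains everything up to lsn."""
        for txn in self.queue:
            if txn.lsn <= lsn:
                txn.acks.add(replica)
        self._advance()

    def _advance(self):
        # Commit from the head while the head is either an async txn with no
        # pending sync txn ahead of it, or a sync txn that reached the quorum.
        while self.queue:
            head = self.queue[0]
            if self.is_sync(head) and len(head.acks) < self.quorum:
                break
            self.queue.pop(0).state = "committed"

    def rollback(self):
        """Quorum not achieved: the pending sync txn and all txns after it
        are rolled back, sync or not."""
        for txn in self.queue:
            txn.state = "rolled_back"
        self.queue.clear()
```

With `quorum=2`, an async transaction written after a pending sync one stays `pending` until one replica acknowledges an LSN that covers both, at which point both commit; calling `rollback()` instead marks the sync transaction and everything queued after it as `rolled_back`.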
> 1) It's unclear what happens here if an async tx follows a sync tx.
>    Does it wait for the sync tx?

Definitely yes, unless we keep the 'dirty read' as it is at the moment
in memtx. This is the essence of the design, and it is temporary until
an MVCC similar to the vinyl machinery appears. I intentionally didn't
include this big task in this RFC.

MVCC will provide similar capabilities, although it will keep only
dependent transactions in the undo log. It also looks like it will fit
well into the machinery of this RFC.

>    This reduces availability for async txs, so it's hardly
>    acceptable. Besides, with group=local spaces, one can quickly run
>    out of memory for undo.
>    
> 
> Then it should be allowed to proceed and commit.
> 
> Then mixing sync and async tables in a single transaction
> shouldn't be allowed.
> 
> Imagine t1 is sync and t2 is async. tx1 changes t1 and t2, tx2
> changes t2. tx1 is not confirmed and must be rolled back, but it
> cannot revert the changes of tx2.
> 
> The spec should clarify that.
> 
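The t1/t2 scenario can be reproduced in a toy model. The sketch assumes a naive per-transaction undo log of (space, key, old value) records and lets the async tx2 commit without waiting, as proposed above; `apply`, `revert` and the `check_txn` guard for the proposed ban on mixed transactions are hypothetical helpers, not Tarantool's API.

```python
_MISSING = object()  # marks a key that did not exist before the write


def apply(db, writes):
    """Apply (space, key, value) writes and return the txn's undo log."""
    undo = []
    for space, key, value in writes:
        undo.append((space, key, db[space].get(key, _MISSING)))
        db[space][key] = value
    return undo


def revert(db, undo):
    """Naive rollback: restore old values in reverse order."""
    for space, key, old in reversed(undo):
        if old is _MISSING:
            db[space].pop(key, None)
        else:
            db[space][key] = old


class MixedTxnError(Exception):
    pass


def check_txn(spaces, sync_spaces):
    """Hypothetical guard: refuse a txn that mixes sync and async spaces."""
    sync_part = spaces & sync_spaces
    async_part = spaces - sync_spaces
    if sync_part and async_part:
        raise MixedTxnError(
            f"txn mixes sync {sorted(sync_part)} and async {sorted(async_part)}"
        )


# t1 is sync, t2 is async.
db = {"t1": {}, "t2": {"k": 0}}
undo1 = apply(db, [("t1", "a", 1), ("t2", "k", 1)])  # tx1: waits for quorum
apply(db, [("t2", "k", 2)])  # tx2: async, allowed to commit at once
revert(db, undo1)  # tx1 fails to get the quorum
print(db)  # {'t1': {}, 't2': {'k': 0}}: tx2's committed t2.k = 2 is lost
```

Here the naive rollback of tx1 silently discards tx2's committed write to t2, which is the inconsistency described above; a guard such as `check_txn` would reject tx1 up front instead.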
> 2) The first candidates for "sync" spaces are system spaces, especially
>    _schema (to fix box.once()) and _cluster (to fix parallel join
>    of multiple replicas).
> 
> I can't imagine it's possible to make system spaces synchronous
> with an external coordinator - the coordinator may not be
> available during box.cfg{}.

If it is not available, there is no coordination, which means the server
can't start. Again, we're not trying to build a self-driven cluster at
this moment; we rely on external coordination.
> 
> 3) One can quickly run out of memory for undo. Any sync
>    transaction should be capped with a timeout to avoid OOMs. I
>    don't know how many times I should repeat it. The only good
>    solution for load control is an in-memory WAL, which will allow
>    rolling back all transactions as soon as network partitioning is
>    detected.

How can an in-memory WAL help save on _undo_ memory?
To roll back any number of transactions, one needs to store their undo.
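The undo-memory argument can be put into a toy model: each unconfirmed transaction keeps its undo until it is confirmed or rolled back, regardless of where the WAL lives, and the per-transaction timeout proposed in point 3 is what bounds that set. `PendingUndo`, its methods and the byte counts are hypothetical, not Tarantool code.

```python
from collections import deque


class PendingUndo:
    """Hypothetical model: undo for unconfirmed txns stays in memory until
    the txn is confirmed or rolled back, wherever the WAL itself lives."""

    def __init__(self, timeout):
        self.timeout = timeout  # assumed per-txn cap on waiting for quorum
        self.pending = deque()  # (lsn, deadline, undo_bytes), in LSN order

    def add(self, lsn, now, undo_bytes):
        self.pending.append((lsn, now + self.timeout, undo_bytes))

    def confirm(self, lsn):
        """Quorum reached up to lsn: the undo of those txns can be freed."""
        while self.pending and self.pending[0][0] <= lsn:
            self.pending.popleft()

    def expire(self, now):
        """If the oldest txn missed its deadline, roll it back together with
        every later txn, and return how many were rolled back."""
        if self.pending and self.pending[0][1] <= now:
            rolled_back = len(self.pending)
            self.pending.clear()
            return rolled_back
        return 0

    def undo_bytes(self):
        return sum(size for _, _, size in self.pending)
```

As long as `expire()` is called regularly, the undo held at any moment is roughly limited to what was produced during the last `timeout` seconds, since every older entry has either been confirmed or rolled back.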

> 
> -- 
> Konstantin Osipov, Moscow, Russia

