[Tarantool-patches] [RFC] Quorum-based synchronous replication

Sergey Ostanevich sergos at tarantool.org
Wed Apr 8 12:18:44 MSK 2020


Hi!
Thanks for review!

Latest version is availabe at
https://github.com/tarantool/tarantool/blob/sergos/quorum-based-synchro/doc/rfc/quorum-based-synchro.md

> > The quorum should be collected as a table for a list of transactions
> > waiting for quorum. The latest transaction that collects the quorum is
> > considered as complete, as well as all transactions prior to it, since
> > all transactions should be applied in order. Leader writes a 'quorum'
> > message to the WAL and it is delivered to Replicas.
> I think we should cal the the message something like 'confirm'
> (not 'quorum'), and mention here that it has its own LSN.

I believe it was clear from the mention that it goes to WAL. Updated.

The quorum should be collected as a table for a list of transactions
waiting for quorum. The latest transaction that collects the quorum is
considered as complete, as well as all transactions prior to it, since
all transactions should be applied in order. Leader writes a 'confirm'
message to the WAL that refers to the transaction's LSN and it has its
own LSN. This confirm message is delivered to all replicas through the 
existing replication mechanism.

> Besides, it's very similar to phase two of two-phase-commit,
> we'll need it later.

We already discussed this, similarity is ended as soon as one quorum
means confirmation of the whole bunch of transactions before it, not the
one. 

> > Replica should report a positive or a negative result of the TXN to the
> > Leader via the IPROTO explicitly to allow Leader to collect the quorum
> > or anti-quorum for the TXN. In case negative result for the TXN received
> > from minor number of Replicas, then Leader has to send an error message
> > to each Replica, which in turn has to disconnect from the replication
> > the same way as it is done now in case of conflict.
> I'm sure that unconfirmed transactions must not be visible both
> on master and on replica since the could be aborted.
> We need read-committed.

So far I don't envision any problems with read-committed after we enable
transaction manager similar to vinyl. From the standpoint of replication
the rollback message will cancel all transactions that are later than
confirmed one. No matter if they are visible or not.

> > ### Snapshot generation.
> > We also can reuse current machinery of snapshot generation. Upon
> > receiving a request to create a snapshot an instance should request a
> > readview for the current commit operation. Although start of the
> > snapshot generation should be postponed until this commit operation
> > receives its quorum. In case operation is rolled back, the snapshot
> > generation should be aborted and restarted using current transaction
> > after rollback is complete.
> There is no guarantee that the replica will ever receive 'confirm'
> ('quorum') message, for example when the master is dead forever.
> That means that in some cases we are unable to make a snapshot..
> But if we make unconfirmed transactions invisible, the current
> read view will give us exactly what we need, but I have no idea
> how to handle WAL rotation ('restart') in this case.

Updated.

In case master appears unavailable a replica still have to be able to
create a snapshot. Replica can perform rollback for all transactions that
are not confirmed and claim its LSN as the latest confirmed txn. Then it
can create a snapshot in a regular way and start with blank xlog file.
All rolled back transactions will appear through the regular replication
in case master reappears later on.

> > After snapshot is created the WAL should start from the first operation
> > that follows the commit operation snapshot is generated for. That means
> > WAL will contain a quorum message that refers to a transaction that is
> > not present in the WAL. Apparently, we have to allow this for the case
> > quorum refers to a transaction with LSN less than the first entry in the
> > WAL and only once.
> Not 'only once', there could be several unconfirmed transactions
> and thus several 'confirm' messages.

Updated.

After snapshot is created the WAL should start from the first operation
that follows the commit operation snapshot is generated for. That means
WAL will contain 'confirm' messages that refer to transactions that are
not present in the WAL. Apparently, we have to allow this for the case
'confirm' refers to a transaction with LSN less than the first entry in
the WAL.



More information about the Tarantool-patches mailing list