Tarantool development patches archive
 help / color / mirror / Atom feed
From: Sergey Ostanevich <sergos@tarantool.org>
To: Aleksandr Lyapunov <alyapunov@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication
Date: Wed, 8 Apr 2020 12:18:44 +0300	[thread overview]
Message-ID: <20200408091844.GC18283@tarantool.org> (raw)
In-Reply-To: <b05475aa-754a-911c-94d9-1e87aa9e80be@tarantool.org>

Hi!
Thanks for review!

Latest version is availabe at
https://github.com/tarantool/tarantool/blob/sergos/quorum-based-synchro/doc/rfc/quorum-based-synchro.md

> > The quorum should be collected as a table for a list of transactions
> > waiting for quorum. The latest transaction that collects the quorum is
> > considered as complete, as well as all transactions prior to it, since
> > all transactions should be applied in order. Leader writes a 'quorum'
> > message to the WAL and it is delivered to Replicas.
> I think we should cal the the message something like 'confirm'
> (not 'quorum'), and mention here that it has its own LSN.

I believe it was clear from the mention that it goes to WAL. Updated.

The quorum should be collected as a table for a list of transactions
waiting for quorum. The latest transaction that collects the quorum is
considered as complete, as well as all transactions prior to it, since
all transactions should be applied in order. Leader writes a 'confirm'
message to the WAL that refers to the transaction's LSN and it has its
own LSN. This confirm message is delivered to all replicas through the 
existing replication mechanism.

> Besides, it's very similar to phase two of two-phase-commit,
> we'll need it later.

We already discussed this, similarity is ended as soon as one quorum
means confirmation of the whole bunch of transactions before it, not the
one. 

> > Replica should report a positive or a negative result of the TXN to the
> > Leader via the IPROTO explicitly to allow Leader to collect the quorum
> > or anti-quorum for the TXN. In case negative result for the TXN received
> > from minor number of Replicas, then Leader has to send an error message
> > to each Replica, which in turn has to disconnect from the replication
> > the same way as it is done now in case of conflict.
> I'm sure that unconfirmed transactions must not be visible both
> on master and on replica since the could be aborted.
> We need read-committed.

So far I don't envision any problems with read-committed after we enable
transaction manager similar to vinyl. From the standpoint of replication
the rollback message will cancel all transactions that are later than
confirmed one. No matter if they are visible or not.

> > ### Snapshot generation.
> > We also can reuse current machinery of snapshot generation. Upon
> > receiving a request to create a snapshot an instance should request a
> > readview for the current commit operation. Although start of the
> > snapshot generation should be postponed until this commit operation
> > receives its quorum. In case operation is rolled back, the snapshot
> > generation should be aborted and restarted using current transaction
> > after rollback is complete.
> There is no guarantee that the replica will ever receive 'confirm'
> ('quorum') message, for example when the master is dead forever.
> That means that in some cases we are unable to make a snapshot..
> But if we make unconfirmed transactions invisible, the current
> read view will give us exactly what we need, but I have no idea
> how to handle WAL rotation ('restart') in this case.

Updated.

In case master appears unavailable a replica still have to be able to
create a snapshot. Replica can perform rollback for all transactions that
are not confirmed and claim its LSN as the latest confirmed txn. Then it
can create a snapshot in a regular way and start with blank xlog file.
All rolled back transactions will appear through the regular replication
in case master reappears later on.

> > After snapshot is created the WAL should start from the first operation
> > that follows the commit operation snapshot is generated for. That means
> > WAL will contain a quorum message that refers to a transaction that is
> > not present in the WAL. Apparently, we have to allow this for the case
> > quorum refers to a transaction with LSN less than the first entry in the
> > WAL and only once.
> Not 'only once', there could be several unconfirmed transactions
> and thus several 'confirm' messages.

Updated.

After snapshot is created the WAL should start from the first operation
that follows the commit operation snapshot is generated for. That means
WAL will contain 'confirm' messages that refer to transactions that are
not present in the WAL. Apparently, we have to allow this for the case
'confirm' refers to a transaction with LSN less than the first entry in
the WAL.

  reply	other threads:[~2020-04-08  9:18 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-03 21:08 Sergey Ostanevich
2020-04-07 13:02 ` Aleksandr Lyapunov
2020-04-08  9:18   ` Sergey Ostanevich [this message]
2020-04-08 14:05     ` Konstantin Osipov
2020-04-08 15:06       ` Sergey Ostanevich
2020-04-14 12:58 ` Sergey Bronnikov
2020-04-14 14:43   ` Sergey Ostanevich
2020-04-15 11:09     ` sergos
2020-04-15 14:50       ` sergos
2020-04-16  7:13         ` Aleksandr Lyapunov
2020-04-17 10:10         ` Konstantin Osipov
2020-04-17 13:45           ` Sergey Ostanevich
2020-04-20 11:20         ` Serge Petrenko
2020-04-20 23:32 ` Vladislav Shpilevoy
2020-04-21 10:49   ` Sergey Ostanevich
2020-04-21 22:17     ` Vladislav Shpilevoy
2020-04-22 16:50       ` Sergey Ostanevich
2020-04-22 20:28         ` Vladislav Shpilevoy
2020-04-23  6:58       ` Konstantin Osipov
2020-04-23  9:14         ` Konstantin Osipov
2020-04-23 11:27           ` Sergey Ostanevich
2020-04-23 11:43             ` Konstantin Osipov
2020-04-23 15:11               ` Sergey Ostanevich
2020-04-23 20:39                 ` Konstantin Osipov
2020-04-23 21:38 ` Vladislav Shpilevoy
2020-04-23 22:28   ` Konstantin Osipov
2020-04-30 14:50   ` Sergey Ostanevich
2020-05-06  8:52     ` Konstantin Osipov
2020-05-06 16:39       ` Sergey Ostanevich
2020-05-06 18:44         ` Konstantin Osipov
2020-05-12 15:55           ` Sergey Ostanevich
2020-05-12 16:42             ` Konstantin Osipov
2020-05-13 21:39             ` Vladislav Shpilevoy
2020-05-13 23:54               ` Konstantin Osipov
2020-05-14 20:38               ` Sergey Ostanevich
2020-05-20 20:59                 ` Sergey Ostanevich
2020-05-25 23:41                   ` Vladislav Shpilevoy
2020-05-27 21:17                     ` Sergey Ostanevich
2020-06-09 16:19                       ` Sergey Ostanevich
2020-06-11 15:17                         ` Vladislav Shpilevoy
2020-06-12 20:31                           ` Sergey Ostanevich
2020-05-13 21:36         ` Vladislav Shpilevoy
2020-05-13 23:45           ` Konstantin Osipov
2020-05-06 18:55     ` Konstantin Osipov
2020-05-06 19:10       ` Konstantin Osipov
2020-05-12 16:03         ` Sergey Ostanevich
2020-05-13 21:42       ` Vladislav Shpilevoy
2020-05-14  0:05         ` Konstantin Osipov
2020-05-07 23:01     ` Konstantin Osipov
2020-05-12 16:40       ` Sergey Ostanevich
2020-05-12 17:47         ` Konstantin Osipov
2020-05-13 21:34           ` Vladislav Shpilevoy
2020-05-13 23:31             ` Konstantin Osipov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200408091844.GC18283@tarantool.org \
    --to=sergos@tarantool.org \
    --cc=alyapunov@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --subject='Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox