From: Sergey Ostanevich <sergos@tarantool.org>
To: Aleksandr Lyapunov <alyapunov@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication
Date: Wed, 8 Apr 2020 12:18:44 +0300
Message-ID: <20200408091844.GC18283@tarantool.org>
In-Reply-To: <b05475aa-754a-911c-94d9-1e87aa9e80be@tarantool.org>

Hi!

Thanks for the review!

The latest version is available at
https://github.com/tarantool/tarantool/blob/sergos/quorum-based-synchro/doc/rfc/quorum-based-synchro.md

> > The quorum should be collected as a table for a list of transactions
> > waiting for quorum. The latest transaction that collects the quorum is
> > considered as complete, as well as all transactions prior to it, since
> > all transactions should be applied in order. Leader writes a 'quorum'
> > message to the WAL and it is delivered to Replicas.
> I think we should call the message something like 'confirm'
> (not 'quorum'), and mention here that it has its own LSN.

I believe it was clear from the mention that it goes to the WAL. Updated.

The quorum should be collected as a table for a list of transactions
waiting for quorum. The latest transaction that collects the quorum is
considered complete, as well as all transactions prior to it, since
all transactions should be applied in order. The leader writes a
'confirm' message to the WAL; the message refers to the transaction's
LSN and has its own LSN. This confirm message is delivered to all
replicas through the existing replication mechanism.

> Besides, it's very similar to phase two of two-phase-commit,
> we'll need it later.

We already discussed this: the similarity ends as soon as one quorum
means confirmation of the whole bunch of transactions before it, not
just the one.

> > Replica should report a positive or a negative result of the TXN to the
> > Leader via the IPROTO explicitly to allow Leader to collect the quorum
> > or anti-quorum for the TXN.
> > In case a negative result for the TXN is received
> > from a minor number of Replicas, then Leader has to send an error message
> > to each Replica, which in turn has to disconnect from the replication
> > the same way as it is done now in case of conflict.
> I'm sure that unconfirmed transactions must not be visible both
> on master and on replica since they could be aborted.
> We need read-committed.

So far I don't envision any problems with read-committed after we
enable a transaction manager similar to the one in vinyl. From the
standpoint of replication, the rollback message will cancel all
transactions that are later than the confirmed one, no matter whether
they are visible or not.

> > ### Snapshot generation.
> > We also can reuse current machinery of snapshot generation. Upon
> > receiving a request to create a snapshot an instance should request a
> > readview for the current commit operation. Although start of the
> > snapshot generation should be postponed until this commit operation
> > receives its quorum. In case operation is rolled back, the snapshot
> > generation should be aborted and restarted using current transaction
> > after rollback is complete.
> There is no guarantee that the replica will ever receive 'confirm'
> ('quorum') message, for example when the master is dead forever.
> That means that in some cases we are unable to make a snapshot.
> But if we make unconfirmed transactions invisible, the current
> read view will give us exactly what we need, but I have no idea
> how to handle WAL rotation ('restart') in this case.

Updated.

In case the master appears unavailable, a replica still has to be able
to create a snapshot. The replica can roll back all transactions that
are not confirmed and claim the LSN of the latest confirmed txn as its
own. Then it can create a snapshot in the regular way and start with a
blank xlog file. All rolled-back transactions will arrive again through
regular replication in case the master reappears later on.
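
To make the snapshot rule above concrete, here is a minimal sketch in
Python. It is a toy model, not Tarantool code: the `Replica` class, its
fields, and the method names are all hypothetical, and the LSN that the
'confirm' message itself occupies is ignored for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica model; every name here is hypothetical."""
    confirmed_lsn: int = 0                       # LSN of the latest confirmed txn
    pending: dict = field(default_factory=dict)  # lsn -> payload, awaiting confirm
    xlog: list = field(default_factory=list)     # entries since the last snapshot

    def apply_txn(self, lsn, payload):
        # A replicated txn is written to the xlog but stays unconfirmed.
        self.pending[lsn] = payload
        self.xlog.append(("txn", lsn))

    def apply_confirm(self, ref_lsn):
        # A confirm covers the referenced txn and every txn before it.
        for lsn in sorted(self.pending):
            if lsn <= ref_lsn:
                del self.pending[lsn]
        self.confirmed_lsn = max(self.confirmed_lsn, ref_lsn)

    def snapshot_without_master(self):
        # Roll back everything that is not confirmed, newest first.
        rolled_back = sorted(self.pending, reverse=True)
        self.pending.clear()
        # The snapshot is taken at the latest confirmed LSN and the
        # replica continues with a blank xlog.
        self.xlog = []
        return {"snapshot_lsn": self.confirmed_lsn, "rolled_back": rolled_back}
```

With txns 1..4 applied and a confirm for LSN 2 received, the call
rolls back 4 and 3 and snapshots at LSN 2; the rolled-back txns would
have to come again from the master if it returns.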
> > After snapshot is created the WAL should start from the first operation
> > that follows the commit operation snapshot is generated for. That means
> > WAL will contain a quorum message that refers to a transaction that is
> > not present in the WAL. Apparently, we have to allow this for the case
> > quorum refers to a transaction with LSN less than the first entry in the
> > WAL and only once.
> Not 'only once', there could be several unconfirmed transactions
> and thus several 'confirm' messages.

Updated.

After the snapshot is created, the WAL should start from the first
operation that follows the commit operation the snapshot is generated
for. That means the WAL will contain 'confirm' messages that refer to
transactions that are not present in the WAL. Apparently, we have to
allow this when a 'confirm' refers to a transaction whose LSN is less
than that of the first entry in the WAL.
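
The recovery rule above could look roughly like the following Python
sketch. The entry layout `(kind, lsn, ref_lsn)` and the function name
`replay_wal` are assumptions made for illustration only; they do not
describe the actual xlog format. Treating any other unknown reference
as an error is also an assumption of this sketch.

```python
def replay_wal(wal):
    """Replay a WAL that follows a snapshot (hypothetical entry layout).

    Each entry is a tuple (kind, lsn, ref_lsn): kind is "txn" or
    "confirm", lsn is the entry's own LSN, and ref_lsn is the LSN a
    confirm refers to (None for a txn).
    Returns (confirmed, pending) lists of txn LSNs found in this WAL.
    """
    first_lsn = wal[0][1] if wal else 0
    pending = []    # txns read from this WAL, still awaiting a confirm
    confirmed = []  # txns from this WAL that received a confirm
    for kind, lsn, ref_lsn in wal:
        if kind == "txn":
            pending.append(lsn)
        elif kind == "confirm":
            if ref_lsn < first_lsn:
                # The txn is older than the first WAL entry, so it is
                # already in the snapshot. This is allowed any number
                # of times, not only once.
                continue
            if ref_lsn not in pending:
                raise ValueError(f"confirm {lsn} refers to unknown txn {ref_lsn}")
            # A confirm covers the referenced txn and all txns before it.
            confirmed += [p for p in pending if p <= ref_lsn]
            pending = [p for p in pending if p > ref_lsn]
        else:
            raise ValueError(f"unknown entry kind: {kind!r}")
    return confirmed, pending
```

Any txns left in `pending` at the end of the replay are the ones still
waiting for a 'confirm' or a rollback.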