Date: Wed, 8 Apr 2020 12:18:44 +0300
From: Sergey Ostanevich
Message-ID: <20200408091844.GC18283@tarantool.org>
References: <20200403210836.GB18283@tarantool.org>
Subject: Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication
List-Id: Tarantool development patches
To: Aleksandr Lyapunov
Cc: tarantool-patches@dev.tarantool.org

Hi!

Thanks for the review!

The latest version is available at
https://github.com/tarantool/tarantool/blob/sergos/quorum-based-synchro/doc/rfc/quorum-based-synchro.md

> > The quorum should be collected as a table for a list of transactions
> > waiting for quorum. The latest transaction that collects the quorum is
> > considered as complete, as well as all transactions prior to it, since
> > all transactions should be applied in order. Leader writes a 'quorum'
> > message to the WAL and it is delivered to Replicas.
> I think we should call the message something like 'confirm'
> (not 'quorum'), and mention here that it has its own LSN.

I believe it was clear from the mention that it goes to the WAL. Updated.

The quorum should be collected as a table for a list of transactions
waiting for quorum. The latest transaction that collects the quorum is
considered complete, as well as all transactions prior to it, since
all transactions should be applied in order. Leader writes a 'confirm'
message to the WAL that refers to the transaction's LSN and has its
own LSN. This confirm message is delivered to all replicas through the
existing replication mechanism.
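To make the paragraph above concrete, here is a minimal sketch of the
leader-side table. It is an illustration only, not the RFC's design: the
class name, the entry layout, the replica ids and the assumption that the
quorum counts replica acks only are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class QuorumTracker:
    """Hypothetical leader-side table of transactions waiting for quorum."""
    quorum: int                                  # acks needed (assumption: replica acks only)
    next_lsn: int = 1
    pending: dict = field(default_factory=dict)  # txn LSN -> set of replica ids that acked
    wal: list = field(default_factory=list)      # in-memory stand-in for the WAL

    def _append(self, kind, **payload):
        # Every WAL entry, including 'confirm', gets its own LSN.
        lsn = self.next_lsn
        self.next_lsn += 1
        self.wal.append({"lsn": lsn, "type": kind, **payload})
        return lsn

    def write_txn(self, data):
        lsn = self._append("txn", data=data)
        self.pending[lsn] = set()
        return lsn

    def ack(self, replica_id, txn_lsn):
        """Record an ack; return the LSN of the 'confirm' entry if one was written."""
        if txn_lsn not in self.pending:
            return None  # already confirmed, or unknown
        self.pending[txn_lsn].add(replica_id)
        if len(self.pending[txn_lsn]) < self.quorum:
            return None
        # Transactions are applied in order, so a quorum for txn_lsn
        # completes every earlier pending transaction as well.
        for lsn in [l for l in self.pending if l <= txn_lsn]:
            del self.pending[lsn]
        return self._append("confirm", confirmed_lsn=txn_lsn)
```

With a quorum of two, acks from two replicas for the second of three
transactions produce a single 'confirm' entry that covers the first two and
leaves the third one pending.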
> Besides, it's very similar to phase two of two-phase-commit,
> we'll need it later.

We already discussed this: the similarity ends as soon as one quorum
means confirmation of the whole batch of transactions before it, not
just one.

> > Replica should report a positive or a negative result of the TXN to the
> > Leader via the IPROTO explicitly to allow Leader to collect the quorum
> > or anti-quorum for the TXN. In case negative result for the TXN received
> > from minor number of Replicas, then Leader has to send an error message
> > to each Replica, which in turn has to disconnect from the replication
> > the same way as it is done now in case of conflict.
> I'm sure that unconfirmed transactions must not be visible both
> on master and on replica since they could be aborted.
> We need read-committed.

So far I don't envision any problems with read-committed once we enable
a transaction manager similar to vinyl's. From the standpoint of
replication, the rollback message will cancel all transactions that are
later than the confirmed one, no matter whether they are visible or not.

> > ### Snapshot generation.
> > We also can reuse current machinery of snapshot generation. Upon
> > receiving a request to create a snapshot an instance should request a
> > readview for the current commit operation. Although start of the
> > snapshot generation should be postponed until this commit operation
> > receives its quorum. In case operation is rolled back, the snapshot
> > generation should be aborted and restarted using current transaction
> > after rollback is complete.
> There is no guarantee that the replica will ever receive 'confirm'
> ('quorum') message, for example when the master is dead forever.
> That means that in some cases we are unable to make a snapshot..
> But if we make unconfirmed transactions invisible, the current
> read view will give us exactly what we need, but I have no idea
> how to handle WAL rotation ('restart') in this case.

Updated.

In case the master becomes unavailable, a replica still has to be able
to create a snapshot. The replica can roll back all transactions that
are not confirmed and claim the LSN of the latest confirmed txn as its
own. Then it can create a snapshot in the regular way and start with a
blank xlog file. All rolled-back transactions will reappear through
regular replication in case the master comes back later on.

> > After snapshot is created the WAL should start from the first operation
> > that follows the commit operation snapshot is generated for. That means
> > WAL will contain a quorum message that refers to a transaction that is
> > not present in the WAL. Apparently, we have to allow this for the case
> > quorum refers to a transaction with LSN less than the first entry in the
> > WAL and only once.
> Not 'only once', there could be several unconfirmed transactions
> and thus several 'confirm' messages.

Updated.

After the snapshot is created the WAL should start from the first
operation that follows the commit operation the snapshot is generated
for. That means the WAL will contain 'confirm' messages that refer to
transactions that are not present in the WAL. Apparently, we have to
allow this for the case where 'confirm' refers to a transaction with an
LSN less than the first entry in the WAL.
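A rough sketch of how a WAL reader could apply the rule from the last
paragraph. The entry layout (`lsn`, `type`, `confirmed_lsn`) and the
function name are assumptions made for illustration, not part of the RFC
or of the existing xlog format.

```python
def replay(entries):
    """Scan WAL entries in order.

    Returns (confirmed, unconfirmed): the LSNs of transactions from this
    WAL that were confirmed, and those still waiting for a 'confirm'.
    """
    if not entries:
        return [], []
    first_lsn = entries[0]["lsn"]
    pending = []      # LSNs of transactions seen but not yet confirmed
    confirmed = []
    for entry in entries:
        if entry["type"] == "txn":
            pending.append(entry["lsn"])
        elif entry["type"] == "confirm":
            target = entry["confirmed_lsn"]
            if target < first_lsn:
                # The transaction lives in the snapshot, not in this WAL.
                # This is allowed, and may happen more than once.
                continue
            if target not in pending:
                raise ValueError(f"confirm refers to unknown txn LSN {target}")
            # A confirm covers the target and every earlier pending txn.
            while pending and pending[0] <= target:
                confirmed.append(pending.pop(0))
    return confirmed, pending
```

The second return value is the set of transactions a replica would roll
back before taking a snapshot when the master is gone, under the same
assumptions.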