Date: Fri, 17 Apr 2020 16:45:18 +0300
From: Sergey Ostanevich
To: Konstantin Osipov, Николай Карлов, Тимур Сафин, Mons Anderson,
    Aleksandr Lyapunov, Sergey Bronnikov,
    tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication
Message-ID: <20200417134518.GA15133@tarantool.org>
In-Reply-To: <20200417101017.GA17411@atlas>

Hi,

thanks for the review!

On 17 Apr 13:10, Konstantin Osipov wrote:
> * sergos@tarantool.org [20/04/15 17:51]:
> > ### Quorum commit
>
> This part looks correct. It only describes two paths out of many
> though:
> - leader is able to collect the majority
> - leader is not able to collect the majority
>
> What happens when a leader receives a message for a round which is
> complete?

It just ignores it; for the reason, see the next comment.

> How does a replica which missed a round catch up?
> What happens if replica fails to apply txn 1 (e.g. because of a
> duplicate key), but confirms txn 2?

This should never happen: each replica applies txns in strict order,
so a failure of txn 1 happens before any confirmation of txn 2. As soon
as a replica fails to apply a txn, it should report an error, disconnect
and roll back all txns in its pipeline. After that the replica will be
in a state consistent with the Leader's LSN just before txn 1.

>
> What happens if txn1 gets no majority at the leader, but txn 2
> gets a majority? How are the followers rolled back?

This situation means that some of the ACKs from replicas didn't arrive,
which doesn't mean the replicas failed to apply txn 1. However, success
of txn 2 means that txn 1 was also applied; hence, receiving an ACK for
txn N from a replica means an ACK for each txn M: M < N.

> > In case of a leader failure a replica with the biggest LSN with former
> > leader's ID is elected as a new leader.
>
> As long as multi-master is not banned, there may be multiple
> leaders. Does this proposal suggest multi-master is banned? Then
> it should describe the implementation of this, and in absence of
> transparent query forwarding it will break all clients.
>

It was mentioned at the top of the RFC:

> What this RFC is not:
>
> - high availability (HA) solution with automated failover, roles
>   assignments and so on
> - master-master configuration support

Which I tend to describe as 'do not recommend'. Similar to what we have
in the documentation about the cascading replication configuration.
Although I heard from some users that they successfully use such a config.

Regards,
Sergos
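
To make the cumulative ACK rule above concrete (an ACK for txn N from a
replica implies an ACK for each txn M: M < N), here is a minimal Python
sketch of leader-side bookkeeping. It is an illustration under stated
assumptions, not Tarantool code: the names QuorumTracker, on_ack and
confirmed_lsn are hypothetical, LSNs are plain integers starting at 1,
and the quorum is counted over replicas only (the leader's own write is
not counted).

```python
class QuorumTracker:
    """Leader-side quorum bookkeeping with cumulative ACKs.

    Hypothetical illustration only; names and structure are assumptions.
    """

    def __init__(self, replica_ids, quorum):
        # Highest LSN each replica has acknowledged so far (0 = none yet).
        self.acked = {rid: 0 for rid in replica_ids}
        # Number of replica ACKs required to confirm a txn (assumption:
        # the leader itself is not counted).
        self.quorum = quorum
        # Every txn with LSN <= confirmed_lsn has gathered a quorum.
        self.confirmed_lsn = 0

    def on_ack(self, replica_id, lsn):
        """Record an ACK and return the highest confirmed LSN."""
        if lsn <= self.acked[replica_id]:
            # Late or duplicate ACK that a newer ACK from the same
            # replica already covers: ignore it.
            return self.confirmed_lsn
        # An ACK for `lsn` covers every earlier txn from this replica.
        self.acked[replica_id] = lsn
        # The quorum-th largest acknowledged LSN is covered by at least
        # `quorum` replicas, so everything up to it is confirmed.
        ranked = sorted(self.acked.values(), reverse=True)
        self.confirmed_lsn = max(self.confirmed_lsn, ranked[self.quorum - 1])
        return self.confirmed_lsn


tracker = QuorumTracker(["a", "b", "c"], quorum=2)
tracker.on_ack("a", 1)  # one ACK for txn 1: no quorum yet
tracker.on_ack("b", 2)  # b's ACK for txn 1 was lost, but txn 2 covers it
tracker.on_ack("c", 2)  # txn 2 now has a quorum as well
tracker.on_ack("c", 1)  # late ACK for an already completed round: ignored
```

In this sketch txn 1 is confirmed as soon as replica b acknowledges
txn 2, even though b's ACK for txn 1 never arrived, and the late ACK
for a completed round changes nothing.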