From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp59.i.mail.ru (smtp59.i.mail.ru [217.69.128.39])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (No client certificate requested)
    by dev.tarantool.org (Postfix) with ESMTPS id 29B77469710
    for ; Thu, 14 May 2020 00:42:39 +0300 (MSK)
References: <20200403210836.GB18283@tarantool.org>
    <20200430145033.GF112@tarantool.org>
    <20200506185559.GA2749@atlas>
From: Vladislav Shpilevoy
Date: Wed, 13 May 2020 23:42:37 +0200
MIME-Version: 1.0
In-Reply-To: <20200506185559.GA2749@atlas>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication
List-Id: Tarantool development patches
To: Konstantin Osipov, Sergey Ostanevich,
    tarantool-patches@dev.tarantool.org

Thanks for the discussion!

On 06/05/2020 20:55, Konstantin Osipov wrote:
> * Sergey Ostanevich [20/04/30 17:51]:
>
> A few more issues:
>
> - the spec assumes there is a full mesh. In any other
>   topology electing a leader based on the longest wal can easily
>   deadlock. Yet it provides no protection against non-full-mesh
>   setups. Currently the server can't even detect that this is not
>   a full-mesh setup, so can't check if the precondition for this
>   to work correctly is met.

Yes, this is a very unstable construction. But so far we have failed to
come up with a solution that would protect against an accidental
non-full-mesh setup. For example, how would it work when I add a new
node? If non-full-mesh is forbidden, the new node can never be added,
because it can't be registered on all nodes simultaneously.

> - the spec assumes that quorum is identical to the
>   number of replicas, and the number of replicas is stable across
>   cluster life time. Can I have quorum=2 while the number of
>   replicas is 4?
>   Am I allowed to increase the number of replicas
>   online? What happens when a replica is added,
>   how exactly and starting from which transaction is the leader
>   required to collect a bigger quorum?

Quorum <= number of replicas. It is a parameter, just like
replication_connect_quorum.

I think you are allowed to add new replicas. When a replica is added,
it goes through the normal join process.

> - the same goes for removing a replica. How is the quorum reduced?

The node is just removed, I guess. If the total number of nodes becomes
less than the quorum, obviously no transactions will be served.

However, what should be done with the existing pending transactions
that have already counted the removed replica in their quorums? Should
their counters be decremented?

Everything I am saying here is a guess, which should be clarified in
the RFC in an ideal world, of course.

Tbh, we discussed sync replication for many hours in voice, and it is a
surprise that all of that fit into such a small update of the RFC. Or
rather, it didn't fit, since we obviously still haven't clarified many
things, especially the exact look of the API.
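
To make the open question about pending transactions concrete, here is
a minimal Python sketch. It is not taken from the RFC or from Tarantool
code: the names Leader, PendingTxn, ack(), remove_replica() and the
forget_acks flag are all hypothetical. It assumes a leader that stores,
per pending transaction, the set of replica ids that acknowledged it,
and it does not decide whether the leader counts towards the quorum.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PendingTxn:
    """A synchronous transaction waiting for acks (hypothetical model)."""
    lsn: int
    acks: set[str] = field(default_factory=set)  # replica ids that confirmed


class Leader:
    """Toy leader-side bookkeeping; an assumption, not the RFC's design."""

    def __init__(self, replicas: set[str], quorum: int) -> None:
        # The only rule stated in the thread: quorum <= number of replicas.
        if quorum > len(replicas):
            raise ValueError("quorum must be <= number of replicas")
        self.replicas = set(replicas)
        self.quorum = quorum
        self.pending: dict[int, PendingTxn] = {}

    def begin(self, lsn: int) -> None:
        self.pending[lsn] = PendingTxn(lsn)

    def ack(self, lsn: int, replica_id: str) -> bool:
        """Record an ack; return True once the quorum is collected."""
        if replica_id in self.replicas:  # ignore acks from unknown nodes
            self.pending[lsn].acks.add(replica_id)
        return self.is_committable(lsn)

    def is_committable(self, lsn: int) -> bool:
        return len(self.pending[lsn].acks) >= self.quorum

    def can_serve(self) -> bool:
        """With fewer nodes than the quorum, nothing can be committed."""
        return len(self.replicas) >= self.quorum

    def remove_replica(self, replica_id: str, forget_acks: bool) -> None:
        """Remove a node; forget_acks selects one of the two open options."""
        self.replicas.discard(replica_id)
        if forget_acks:
            # Option A: "decrement" pending quorums by dropping its acks.
            for txn in self.pending.values():
                txn.acks.discard(replica_id)
        # Option B (forget_acks=False): acks already collected stay valid.
```

With four replicas and quorum=2, a transaction acknowledged by "a" and
"b" is committable. After remove_replica("b", forget_acks=True) it goes
back to waiting, while with forget_acks=False it stays committable.
Which of the two behaviours is right is exactly what the RFC would have
to state.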