From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from smtp45.i.mail.ru (smtp45.i.mail.ru [94.100.177.105])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by dev.tarantool.org (Postfix) with ESMTPS id 74B414696C3
 for ; Wed, 22 Apr 2020 23:28:05 +0300 (MSK)
References: <20200403210836.GB18283@tarantool.org>
 <20200421104918.GA112@tarantool.org>
 <20200422165029.GB112@tarantool.org>
From: Vladislav Shpilevoy
Message-ID: <87b94067-69c6-9c80-be83-8a4a029d52a8@tarantool.org>
Date: Wed, 22 Apr 2020 22:28:03 +0200
MIME-Version: 1.0
In-Reply-To: <20200422165029.GB112@tarantool.org>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [Tarantool-patches] [RFC] Quorum-based synchronous replication
List-Id: Tarantool development patches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Sergey Ostanevich
Cc: tarantool-patches@dev.tarantool.org

>>>> Basically, it would be nice to see the split-brain problem description here,
>>>> and its solution for us.
>>>>
>>> I believe the split-brain is under orchestrator control either - we
>>> should provide API to switch leader in the cluster, so that when a
>>> former leader came back it will not get quorum for any txn it has,
>>> replying to customers with failure as a result.
>>
>> Exactly. We should provide something for this from inside. But are there
>> any details? How should that work? Should all the healthy replicas reject
>> everything from the false-leader? Should the false-leader somehow realize,
>> that it is not considered a leader anymore, and should stop itself? If we
>> choose the former way, how a replica defines who is the true leader? For
>> example, some replicas still may consider the old leader as a true master.
>> If we choose the latter way, what is the algorithm of determining that we
>> are not a leader anymore?
>>
> It is all about external orchestration - if replica can't get ping from
> leader it stops, reporting its status to orchestrator.
> If leader lost number of replicas that makes quorum impossible - it
> stops replication, reporting to the orchestrator.
> Will it be sufficient to cover the question?

Perhaps.
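
To make the two rules quoted above concrete, here is a rough sketch of how I read them. It is only an illustration under my own assumptions, not anything from the RFC: the names Leader, quorum_possible, on_replica_lost and replica_action are hypothetical, the leader is assumed to count its own ack toward the quorum, and "reporting to the orchestrator" is reduced to a returned string.

```python
from dataclasses import dataclass


@dataclass
class Leader:
    """Hypothetical leader-side view of the cluster (illustrative only)."""
    quorum: int          # acks required to commit a txn
    alive_replicas: int  # replicas the leader can currently reach

    def quorum_possible(self) -> bool:
        # Assumption of this sketch: the leader's own ack counts toward quorum.
        return 1 + self.alive_replicas >= self.quorum

    def on_replica_lost(self) -> str:
        # Rule 2 from the quote: once the quorum becomes impossible,
        # the leader stops replication and reports to the orchestrator.
        self.alive_replicas = max(0, self.alive_replicas - 1)
        if not self.quorum_possible():
            return "stop_replication_and_report"
        return "continue"


def replica_action(seconds_since_leader_ping: float, timeout: float) -> str:
    # Rule 1 from the quote: a replica that gets no ping from the leader
    # within the timeout stops and reports its status to the orchestrator.
    if seconds_since_leader_ping > timeout:
        return "stop_and_report"
    return "continue"
```

With quorum = 2 and two reachable replicas, the leader keeps going after losing one replica and stops after losing the second. Note that the sketch covers only the "stop and report" part; it says nothing about how a replica decides who the true leader is, which is the part of the question that stays open.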