Date: Wed, 21 Jul 2021 02:20:57 +0300
To: Cyrill Gorcunov
Message-ID: <20210720232057.GA85781@starling>
References: <0c92a88ff1d392f8b03de59be8cb19a162bf78f8.1626392372.git.v.shpilevoy@tarantool.org> <20210716142959.GC146960@starling> <20210719091248.GA4257@starling>
Subject: Re: [Tarantool-patches] [PATCH 1/2] replication: introduce ballot.can_be_leader
From: Konstantin Osipov via Tarantool-patches
Reply-To: Konstantin Osipov
Cc: tarantool-patches@dev.tarantool.org, Vladislav Shpilevoy

* Cyrill Gorcunov [21/07/21 00:17]:
> > Imagine there are nodes A, B, C, D, E.
> > A is a leader, E is a voter which can not become a leader.
> >
> > Imagine A's log index is 5, B = 4, C = 3, D = 2, E = 5.
> >
> > The majority's log index is 4, so entry 4 is committed. A dies, B
> > is partitioned away. The cluster is stuck, because neither C nor B
> > can get a quorum (3 votes).
> >
> > Worse yet, if E's (voter) commit index is low, not high, it can vote for a
> > node which doesn't have a committed entry. In that case you can
> > lose a committed entry.
>
> Wait, Kostya, here is a set
>
> A B C D E
> {5, 4, 3, 2, 5}
> * * *
> L F F F V
>
> where L - leader, F - follower, V - voter, LCI is 4 (least common index),
> Q(uorum) = 3, then
>
> A B C D E
> {-, -, 3, 2, 5}
>     F F V
>
> The Voter E won't be able to choose C or D because its log
> is bigger and the cluster gets stuck (this is guaranteed by
> vclock comparison).

Right. That's what I am saying - the cluster is stuck even though
the quorum (3 nodes) is present. And this is not something
consistent, such clusters will get stuck simply based on the state
of the voter - sometimes they will, sometimes they won't.

> Let's assume E's index is low, say 3
>
> A B C D E
> {5, 4, 4, 3, 3}
> * *
> L F F F V
>
> in this config the leader won't commit record 5 until one
> of {C,D,E} writes the new record(s) since otherwise the quorum
> won't be reached. Assume A and B get out of the set without
> record 4 written to C
>
> A B C D E
> {-, -, 4, 3, 3}
>     F F V
>
> Now the node E can vote for C and D because its index is LE.
> And since C's index is bigger than others it will be elected
> next as far as I understand, no?

You're right, assuming the voter never casts a vote for a
candidate with a shorter log the safety is not violated. I wasn't
sure it's the case, and presumed that the voter may have no log of
its own.

But still there are issues with liveness. The Raft PhD thesis has
learners, so why not use them instead?

> The E won't be leader but will
> help C to gather the majority. So the cluster should be safe
> if only I'm not missing something obvious.

-- 
Konstantin Osipov, Moscow, Russia
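
The two scenarios discussed above can be checked with a small sketch. It is
only an illustration under stated assumptions, not Tarantool's election code:
a single integer log index stands in for the vclock comparison, a node grants
its vote only to a candidate whose index is not lower than its own, and the
names Node, votes_for and electable are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    index: int                  # last log index (simplified stand-in for a vclock)
    can_be_leader: bool = True  # False models a voter-only node such as E


def votes_for(candidate, alive):
    """Names of alive nodes that would grant their vote to `candidate`.

    Assumed rule: a node never votes for a candidate with a shorter log.
    """
    return [n.name for n in alive if n.index <= candidate.index]


def electable(alive, cluster_size):
    """Names of alive nodes that may become leader and can collect a quorum."""
    quorum = cluster_size // 2 + 1
    return [
        c.name
        for c in alive
        if c.can_be_leader and len(votes_for(c, alive)) >= quorum
    ]


# Scenario 1: A and B are gone, the voter E has the longest log.
stuck_case = [Node("C", 3), Node("D", 2), Node("E", 5, can_be_leader=False)]

# Scenario 2: A and B are gone, the voter E has a short log.
live_case = [Node("C", 4), Node("D", 3), Node("E", 3, can_be_leader=False)]

print(electable(stuck_case, cluster_size=5))
print(electable(live_case, cluster_size=5))
```

Under these assumptions the first call finds no electable node although three
of five nodes are alive, and the second finds only C, which matches the
liveness concern and the safety argument in the thread.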