From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <33c8217d-c839-1b1e-7595-db44c640eeac@tarantool.org>
Date: Thu, 20 Jan 2022 16:22:43 +0300
To: Vladislav Shpilevoy, tarantool-patches@dev.tarantool.org
References: <72798befd5d6e32f4386aaeeb3209cc93c0e44b4.1642639079.git.v.shpilevoy@tarantool.org>
In-Reply-To: <72798befd5d6e32f4386aaeeb3209cc93c0e44b4.1642639079.git.v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v2 4/5] raft: introduce split vote detection
List-Id: Tarantool development patches
From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko

On 20.01.2022 03:43, Vladislav Shpilevoy wrote:
> Split vote is a situation when during election nobody can win and
> will not win in this term for sure. Not a single node could get
> enough votes. For example, with 4 nodes one could get 2 votes and
> other also 2 votes. Nobody will get quorum 3 in this term.
>
> The patch makes raft able to notice that situation and speed up
> the term bump. It is not bumped immediately though, because nodes
> might do that simultaneously and will get the split vote again
> after voting for self. There is a random delay. But it is just max
> 10% of election timeout, so it should speed up split vote
> resolution on 90% at least.
>
> Part of #5285
> ---
>  src/lib/raft/raft.c         | 129 +++++++++++++++-
>  src/lib/raft/raft.h         |   6 +
>  test/unit/raft.c            | 301 +++++++++++++++++++++++++++++++++++-
>  test/unit/raft.result       |  64 +++++++-
>  test/unit/raft_test_utils.c |  12 ++
>  test/unit/raft_test_utils.h |   5 +
>  6 files changed, 512 insertions(+), 5 deletions(-)
>

Thanks for the fixes! Please find 2 comments below.

> +static void
> +raft_check_split_vote(struct raft *raft)
> +{
> +	/* When leader is known, there is no election. Thus no vote to split. */
> +	if (raft->leader != 0)
> +		return;
> +	/* Not a candidate = can't trigger term bump anyway. */
> +	if (!raft->is_candidate)
> +		return;
> +	/*
> +	 * WAL write in progress means the state is changing. All is rechecked
> +	 * when it is done.
> +	 */
> +	if (raft->is_write_in_progress)
> +		return;
> +	if (!raft_has_split_vote(raft))
> +		return;
> +	assert(raft_ev_is_active(&raft->timer));
> +	/*
> +	 * Could be already detected before. The timeout would be updated by now
> +	 * then.
> +	 */
> +	if (raft->timer.repeat < raft->election_timeout)
> +		return;

I don't think you should decrease timer.repeat. This 'vote speedup'
applies to a single term only. Besides, the `delay >= remaining` check
below is enough to tell whether split vote detection was already
triggered.
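
For reference, the split vote condition from the commit message can be
written down as a small standalone check. This is only a sketch under my
own assumptions, not the raft_has_split_vote() from the patch (its body
is not quoted here): the name has_split_vote(), the per-candidate vote
array, and the parameters are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not taken from the patch. Returns true when no
 * candidate can reach the quorum in this term, even if the current
 * front-runner received every vote that has not been cast yet.
 *
 * votes[i] is the number of votes candidate i has collected so far.
 */
static bool
has_split_vote(const int *votes, int candidate_count, int cluster_size,
	       int quorum)
{
	int cast = 0;
	int max_votes = 0;
	for (int i = 0; i < candidate_count; ++i) {
		cast += votes[i];
		if (votes[i] > max_votes)
			max_votes = votes[i];
	}
	/* Each node votes at most once per term. */
	assert(cast <= cluster_size);
	int uncast = cluster_size - cast;
	/* The front-runner has the best chance; if it can't win, nobody can. */
	return max_votes + uncast < quorum;
}
```

With the numbers from the commit message (4 nodes, quorum 3, votes 2:2)
the check reports a split vote. With votes 2:1 and one vote still
pending, it does not, because the front-runner can still reach 3.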

> +
> +	assert(raft->state == RAFT_STATE_FOLLOWER ||
> +	       raft->state == RAFT_STATE_CANDIDATE);
> +	struct ev_loop *loop = raft_loop();
> +	struct ev_timer *timer = &raft->timer;
> +	double delay = raft_new_random_election_shift(raft);
> +	/*
> +	 * Could be too late to speed up anything - probably the term is almost
> +	 * over anyway.
> +	 */
> +	double remaining = raft_ev_timer_remaining(loop, timer);
> +	if (delay >= remaining)
> +		delay = remaining;
> +	say_info("RAFT: split vote is discovered - %s, new term in %lf sec",
> +		 raft_scores_str(raft), delay);
> +	raft_ev_timer_stop(loop, timer);
> +	raft_ev_timer_set(timer, delay, delay);

...

> diff --git a/test/unit/raft_test_utils.h b/test/unit/raft_test_utils.h
> index c68dc3b22..2138a829e 100644
> --- a/test/unit/raft_test_utils.h
> +++ b/test/unit/raft_test_utils.h
> @@ -32,6 +32,7 @@
>  #include "fakesys/fakeev.h"
>  #include "fiber.h"
>  #include "raft/raft.h"
> +#include "raft/raft_ev.h"

Why do you need it here?

>  #include "unit.h"
>
>  /** WAL simulation. It stores a list of rows which raft wanted to persist. */
> @@ -105,6 +106,7 @@ struct raft_node {
>  	int cfg_election_quorum;
>  	double cfg_death_timeout;
>  	uint32_t cfg_instance_id;
> +	int cfg_cluster_size;
>  	struct vclock *cfg_vclock;
>  };
>
> @@ -227,6 +229,9 @@ raft_node_cfg_is_enabled(struct raft_node *node, bool value);
>  void
>  raft_node_cfg_is_candidate(struct raft_node *node, bool value);
>
> +void
> +raft_node_cfg_cluster_size(struct raft_node *node, int value);
> +
>  void
>  raft_node_cfg_election_timeout(struct raft_node *node, double value);
>

-- 
Serge Petrenko