From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko
To: Vladislav Shpilevoy, tarantool-patches@dev.tarantool.org, gorcunov@gmail.com
Date: Fri, 16 Jul 2021 14:30:36 +0300
Message-ID: <911781cb-0aac-de47-3e08-e888b646a429@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH 2/2] election: during bootstrap prefer candidates

On 16.07.2021 02:49, Vladislav Shpilevoy wrote:
> During cluster bootstrap the boot master election algorithm didn't
> take into account the election modes of the instances. It could happen
> that all nodes have box.cfg.read_only = false and none is booted, so
> all are read-only. Then the node with the smallest UUID was
> chosen even if it was a box.cfg.election_mode='voter' node.
>
> Such a node could neither boot nor register other nodes, and the
> cluster couldn't start.
>
> The patch makes the boot master election prefer the instances
> which can become a Raft leader, when all the other parameters don't
> help.
>
> Closes #6018
> ---

Hi! Thanks for the patch!

>  .../unreleased/gh-6018-election-boot-voter.md |   4 +
>  src/box/box.cc                                |  25 +++-
>  src/box/replication.cc                        |  11 +-
>  .../gh-6018-election-boot-voter.result        | 116 ++++++++++++++++++
>  .../gh-6018-election-boot-voter.test.lua      |  59 +++++++++
>  test/replication/gh-6018-master.lua           |  17 +++
>  test/replication/gh-6018-replica.lua          |  15 +++
>  test/replication/suite.cfg                    |   1 +
>  8 files changed, 245 insertions(+), 3 deletions(-)
>  create mode 100644 changelogs/unreleased/gh-6018-election-boot-voter.md
>  create mode 100644 test/replication/gh-6018-election-boot-voter.result
>  create mode 100644 test/replication/gh-6018-election-boot-voter.test.lua
>  create mode 100644 test/replication/gh-6018-master.lua
>  create mode 100644 test/replication/gh-6018-replica.lua
>
> diff --git a/changelogs/unreleased/gh-6018-election-boot-voter.md b/changelogs/unreleased/gh-6018-election-boot-voter.md
> new file mode 100644
> index 000000000..080484bbe
> --- /dev/null
> +++ b/changelogs/unreleased/gh-6018-election-boot-voter.md
> @@ -0,0 +1,4 @@
> +## bugfix/replication
> +
> +* Fixed a cluster sometimes being unable to bootstrap if it contains nodes with
> +  `election_mode` `manual` or `voter` (gh-6018).
> diff --git a/src/box/box.cc b/src/box/box.cc
> index ef3efe3e0..3105b04b6 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -3519,7 +3519,30 @@ box_cfg_xc(void)
>  		 * should take the control over the situation and start a new
>  		 * term immediately.
>  		 */
> -		raft_new_term(box_raft());
> +		struct raft *raft = box_raft();
> +		if (box_election_mode == ELECTION_MODE_MANUAL) {
> +			raft_start_candidate(raft);
> +			raft_new_term(raft);
> +			int rc = box_raft_wait_leader_found();
> +			/*
> +			 * No need to check if the mode is still manual - it
> +			 * couldn't change because box.cfg is protected with a
> +			 * fiber lock.
> +			 */
> +			assert(box_election_mode == ELECTION_MODE_MANUAL);
> +			raft_stop_candidate(raft, false);
> +			/*
> +			 * It should not fail, because on bootstrap the node is
> +			 * a single registered instance. It can't lose the
> +			 * elections while being the only participant. But still
> +			 * check the result so as not to ignore potential
> +			 * problems.
> +			 */
> +			if (rc != 0)
> +				diag_raise();
> +		} else {
> +			raft_new_term(raft);
> +		}

Could you please extract this fix into a separate commit?

Regarding your problems with raft_try_candidate: I can't think of a good
enough alternative either.

For promote it would be nice to do:

do {
    raft_try_candidate_for_1_term();
} while (leader is not known);

and simply

raft_try_candidate_for_1_term();

for bootstrap.

But raft_try_candidate_for_1_term() looks hard to implement.

-- 
Serge Petrenko