From mboxrd@z Thu Jan  1 00:00:00 1970
To: tarantool-patches@dev.tarantool.org, gorcunov@gmail.com,
	sergepetrenko@tarantool.org
Date: Fri, 16 Jul 2021 01:49:50 +0200
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH 2/2] election: during bootstrap prefer candidates
List-Id: Tarantool development patches
From: Vladislav Shpilevoy via Tarantool-patches
Reply-To: Vladislav Shpilevoy
During cluster bootstrap the boot master election algorithm didn't
take the election modes of the instances into account. It could
happen that all nodes have box.cfg.read_only = false, none is booted
yet, and therefore all are read-only at the moment. Then the node
with the smallest UUID was chosen, even if it was a
box.cfg.election_mode = 'voter' node. Such a node could neither boot
itself nor register the other nodes, so the cluster couldn't start.

The patch makes the boot master election prefer the instances which
can become a Raft leader, in case all the other parameters are
equal.

Closes #6018
---
 .../unreleased/gh-6018-election-boot-voter.md |   4 +
 src/box/box.cc                                |  25 +++-
 src/box/replication.cc                        |  11 +-
 .../gh-6018-election-boot-voter.result        | 116 ++++++++++++++++++
 .../gh-6018-election-boot-voter.test.lua      |  59 +++++++++
 test/replication/gh-6018-master.lua           |  17 +++
 test/replication/gh-6018-replica.lua          |  15 +++
 test/replication/suite.cfg                    |   1 +
 8 files changed, 245 insertions(+), 3 deletions(-)
 create mode 100644 changelogs/unreleased/gh-6018-election-boot-voter.md
 create mode 100644 test/replication/gh-6018-election-boot-voter.result
 create mode 100644 test/replication/gh-6018-election-boot-voter.test.lua
 create mode 100644 test/replication/gh-6018-master.lua
 create mode 100644 test/replication/gh-6018-replica.lua

diff --git a/changelogs/unreleased/gh-6018-election-boot-voter.md b/changelogs/unreleased/gh-6018-election-boot-voter.md
new file mode 100644
index 000000000..080484bbe
--- /dev/null
+++ b/changelogs/unreleased/gh-6018-election-boot-voter.md
@@ -0,0 +1,4 @@
+## bugfix/replication
+
+* Fixed a cluster sometimes being unable to bootstrap if it contains nodes with
+  `election_mode` `manual` or `voter` (gh-6018).
diff --git a/src/box/box.cc b/src/box/box.cc
index ef3efe3e0..3105b04b6 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -3519,7 +3519,30 @@ box_cfg_xc(void)
	 * should take the control over the situation and start a new
	 * term immediately.
	 */
-	raft_new_term(box_raft());
+	struct raft *raft = box_raft();
+	if (box_election_mode == ELECTION_MODE_MANUAL) {
+		raft_start_candidate(raft);
+		raft_new_term(raft);
+		int rc = box_raft_wait_leader_found();
+		/*
+		 * No need to check if the mode is still manual - it
+		 * couldn't change because box.cfg is protected with a
+		 * fiber lock.
+		 */
+		assert(box_election_mode == ELECTION_MODE_MANUAL);
+		raft_stop_candidate(raft, false);
+		/*
+		 * It should not fail, because on bootstrap the node is
+		 * the only registered instance. It can't lose the
+		 * elections while being a lone participant. But still
+		 * check the result so as not to ignore potential
+		 * problems.
+		 */
+		if (rc != 0)
+			diag_raise();
+	} else {
+		raft_new_term(raft);
+	}
 }

 /* box.cfg.read_only is not read yet. */

diff --git a/src/box/replication.cc b/src/box/replication.cc
index a0b3e0186..622d12f74 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -978,12 +978,19 @@ replicaset_find_join_master(void)
	 * config is stronger because if it is configured as read-only,
	 * it is in read-only state for sure, until the config is
	 * changed.
+	 *
+	 * In a cluster with leader election enabled all instances might
+	 * look equal by the scores above. Then we must prefer the ones
+	 * which can be elected as a leader, because only they would be
+	 * able to boot themselves and register the others.
	 */
	if (ballot->is_booted)
-		score += 10;
+		score += 1000;
	if (!ballot->is_ro_cfg)
-		score += 5;
+		score += 100;
	if (!ballot->is_ro)
+		score += 10;
+	if (ballot->can_be_leader)
		score += 1;
	if (leader_score < score)
		goto elect;

diff --git a/test/replication/gh-6018-election-boot-voter.result b/test/replication/gh-6018-election-boot-voter.result
new file mode 100644
index 000000000..c960aa4bd
--- /dev/null
+++ b/test/replication/gh-6018-election-boot-voter.result
@@ -0,0 +1,116 @@
+-- test-run result file version 2
+--
+-- gh-6018: in an auto-election cluster, nodes with the voter state could be
+-- selected as bootstrap leaders. They should not be, because a voter can never
+-- be writable, and so can neither boot itself nor register other nodes.
+--
+-- A similar situation existed with manual election. All instances might have
+-- the manual election mode. Such a cluster wouldn't be able to boot unless its
+-- bootstrap master became an elected leader automatically at least once.
+--
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+function boot_with_master_election_mode(mode) \
+    test_run:cmd('create server master with '.. \
+                 'script="replication/gh-6018-master.lua"') \
+    test_run:cmd('start server master with wait=False, args="'..mode..'"') \
+    test_run:cmd('create server replica with '.. \
+                 'script="replication/gh-6018-replica.lua"') \
+    test_run:cmd('start server replica') \
+end
+ | ---
+ | ...
+
+function stop_cluster() \
+    test_run:cmd('stop server replica') \
+    test_run:cmd('stop server master') \
+    test_run:cmd('delete server replica') \
+    test_run:cmd('delete server master') \
+end
+ | ---
+ | ...
+
+--
+-- Candidate leader.
+--
+boot_with_master_election_mode('candidate')
+ | ---
+ | ...
+
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return not box.info.ro end)
+ | ---
+ | - true
+ | ...
+assert(box.info.election.state == 'leader')
+ | ---
+ | - true
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+assert(box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.election.state == 'follower')
+ | ---
+ | - true
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+stop_cluster()
+ | ---
+ | ...
+
+--
+-- Manual leader.
+--
+boot_with_master_election_mode('manual')
+ | ---
+ | ...
+
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return not box.info.ro end)
+ | ---
+ | - true
+ | ...
+assert(box.info.election.state == 'leader')
+ | ---
+ | - true
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+assert(box.info.ro)
+ | ---
+ | - true
+ | ...
+assert(box.info.election.state == 'follower')
+ | ---
+ | - true
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+stop_cluster()
+ | ---
+ | ...

diff --git a/test/replication/gh-6018-election-boot-voter.test.lua b/test/replication/gh-6018-election-boot-voter.test.lua
new file mode 100644
index 000000000..800e20c8f
--- /dev/null
+++ b/test/replication/gh-6018-election-boot-voter.test.lua
@@ -0,0 +1,59 @@
+--
+-- gh-6018: in an auto-election cluster, nodes with the voter state could be
+-- selected as bootstrap leaders. They should not be, because a voter can never
+-- be writable, and so can neither boot itself nor register other nodes.
+--
+-- A similar situation existed with manual election. All instances might have
+-- the manual election mode. Such a cluster wouldn't be able to boot unless its
+-- bootstrap master became an elected leader automatically at least once.
+--
+test_run = require('test_run').new()
+
+function boot_with_master_election_mode(mode) \
+    test_run:cmd('create server master with '.. \
+                 'script="replication/gh-6018-master.lua"') \
+    test_run:cmd('start server master with wait=False, args="'..mode..'"') \
+    test_run:cmd('create server replica with '.. \
+                 'script="replication/gh-6018-replica.lua"') \
+    test_run:cmd('start server replica') \
+end
+
+function stop_cluster() \
+    test_run:cmd('stop server replica') \
+    test_run:cmd('stop server master') \
+    test_run:cmd('delete server replica') \
+    test_run:cmd('delete server master') \
+end
+
+--
+-- Candidate leader.
+--
+boot_with_master_election_mode('candidate')
+
+test_run:switch('master')
+test_run:wait_cond(function() return not box.info.ro end)
+assert(box.info.election.state == 'leader')
+
+test_run:switch('replica')
+assert(box.info.ro)
+assert(box.info.election.state == 'follower')
+
+test_run:switch('default')
+stop_cluster()
+
+--
+-- Manual leader.
+--
+boot_with_master_election_mode('manual')
+
+test_run:switch('master')
+test_run:wait_cond(function() return not box.info.ro end)
+assert(box.info.election.state == 'leader')
+
+test_run:switch('replica')
+assert(box.info.ro)
+assert(box.info.election.state == 'follower')
+
+test_run:switch('default')
+stop_cluster()

diff --git a/test/replication/gh-6018-master.lua b/test/replication/gh-6018-master.lua
new file mode 100644
index 000000000..1192204ff
--- /dev/null
+++ b/test/replication/gh-6018-master.lua
@@ -0,0 +1,17 @@
+#!/usr/bin/env tarantool
+
+require('console').listen(os.getenv('ADMIN'))
+
+box.cfg({
+    listen = 'unix/:./gh-6018-master.sock',
+    replication = {
+        'unix/:./gh-6018-master.sock',
+        'unix/:./gh-6018-replica.sock',
+    },
+    election_mode = arg[1],
+    instance_uuid = 'cbf06940-0790-498b-948d-042b62cf3d29',
+    replication_timeout = 0.1,
+})
+
+box.ctl.wait_rw()
+box.schema.user.grant('guest', 'super')

diff --git a/test/replication/gh-6018-replica.lua b/test/replication/gh-6018-replica.lua
new file mode 100644
index 000000000..71e669141
--- /dev/null
+++ b/test/replication/gh-6018-replica.lua
@@ -0,0 +1,15 @@
+#!/usr/bin/env tarantool
+
+require('console').listen(os.getenv('ADMIN'))
+
+box.cfg({
+    listen = 'unix/:./gh-6018-replica.sock',
+    replication = {
+        'unix/:./gh-6018-master.sock',
+        'unix/:./gh-6018-replica.sock',
+    },
+    election_mode = 'voter',
+    -- Smaller than master UUID.
+    instance_uuid = 'cbf06940-0790-498b-948d-042b62cf3d28',
+    replication_timeout = 0.1,
+})

diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 69f2f3511..2bfc3b845 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -45,6 +45,7 @@
     "gh-5536-wal-limit.test.lua": {},
     "gh-5566-final-join-synchro.test.lua": {},
     "gh-5613-bootstrap-prefer-booted.test.lua": {},
+    "gh-6018-election-boot-voter.test.lua": {},
     "gh-6027-applier-error-show.test.lua": {},
     "gh-6032-promote-wal-write.test.lua": {},
     "gh-6057-qsync-confirm-async-no-wal.test.lua": {},
-- 
2.24.3 (Apple Git-128)