To: tarantool-patches@dev.tarantool.org, sergepetrenko@tarantool.org
Date: Sat, 15 Jan 2022 01:48:56 +0100
Subject: [Tarantool-patches] [PATCH 4/4] election: activate raft split vote handling
From: Vladislav Shpilevoy via Tarantool-patches
Reply-To: Vladislav Shpilevoy

Raft needs to know the cluster size in order to detect and handle a
split vote. The patch uses the registered server count as the cluster
size.

The change is neither documented nor accompanied by a changelog file
because it is an optimization. It can't be observed except in logs or
with a watch.

Closes #5285
---
 src/box/raft.c                                |  4 +-
 .../election_split_vote_test.lua              | 92 +++++++++++++++++++
 2 files changed, 95 insertions(+), 1 deletion(-)
 create mode 100644 test/replication-luatest/election_split_vote_test.lua

diff --git a/src/box/raft.c b/src/box/raft.c
index 1e360dc88..1908b71b6 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -229,7 +229,9 @@ box_raft_update_election_quorum(void)
 	 * be lost.
 	 */
 	int quorum = MIN(replication_synchro_quorum, max);
-	raft_cfg_election_quorum(box_raft(), quorum);
+	struct raft *raft = box_raft();
+	raft_cfg_election_quorum(raft, quorum);
+	raft_cfg_cluster_size(raft, replicaset.registered_count);
 }
 
 void
diff --git a/test/replication-luatest/election_split_vote_test.lua b/test/replication-luatest/election_split_vote_test.lua
new file mode 100644
index 000000000..f31bfd7f3
--- /dev/null
+++ b/test/replication-luatest/election_split_vote_test.lua
@@ -0,0 +1,92 @@
+local t = require('luatest')
+local cluster = require('test.luatest_helpers.cluster')
+local helpers = require('test.luatest_helpers')
+local wait_timeout = 120
+
+--
+-- gh-5285: split vote is when in the current term there can't be a winner of
+-- the leader role. Number of unused votes is not enough for anyone to get the
+-- quorum. It can be detected to speed up the term bump.
+--
+local g = t.group('split-vote')
+
+g.before_each(function()
+    g.cluster = cluster:new({})
+    local node1_uri = helpers.instance_uri('node1')
+    local node2_uri = helpers.instance_uri('node2')
+    local replication = {node1_uri, node2_uri}
+    local box_cfg = {
+        listen = node1_uri,
+        replication = replication,
+        -- To speed up the new term when trying to elect the first leader.
+        replication_timeout = 0.1,
+        replication_synchro_quorum = 2,
+        election_timeout = 1000000,
+    }
+    g.node1 = g.cluster:build_server({alias = 'node1', box_cfg = box_cfg})
+
+    box_cfg.listen = node2_uri
+    g.node2 = g.cluster:build_server({alias = 'node2', box_cfg = box_cfg})
+
+    g.cluster:add_server(g.node1)
+    g.cluster:add_server(g.node2)
+    g.cluster:start()
+end)
+
+g.after_each(function()
+    g.cluster:drop()
+end)
+
+g.test_split_vote = function(g)
+    -- Stop the replication so the nodes can't request votes from each other.
+    local node1_repl = g.node1:exec(function()
+        local repl = box.cfg.replication
+        box.cfg{replication = {}}
+        return repl
+    end)
+    local node2_repl = g.node2:exec(function()
+        local repl = box.cfg.replication
+        box.cfg{replication = {}}
+        return repl
+    end)
+
+    -- Both vote for self but don't see the split-vote yet.
+    g.node1:exec(function()
+        box.cfg{election_mode = 'candidate'}
+    end)
+    g.node2:exec(function()
+        box.cfg{election_mode = 'candidate'}
+    end)
+
+    -- Wait for the votes to actually happen.
+    t.helpers.retrying({timeout = wait_timeout}, function()
+        local func = function()
+            return box.info.election.vote == box.info.id
+        end
+        assert(g.node1:exec(func))
+        assert(g.node2:exec(func))
+    end)
+
+    -- Now let the nodes notice the split vote.
+    g.node1:exec(function(repl)
+        box.cfg{replication = repl}
+    end, {node1_repl})
+    g.node2:exec(function(repl)
+        box.cfg{replication = repl}
+    end, {node2_repl})
+
+    t.helpers.retrying({timeout = wait_timeout}, function()
+        local msg = 'split vote is discovered'
+        assert(g.node1:grep_log(msg) or g.node2:grep_log(msg))
+    end)
+
+    -- Ensure a leader is eventually elected. Nothing is broken for good.
+    g.node1:exec(function()
+        box.cfg{election_timeout = 1}
+    end)
+    g.node2:exec(function()
+        box.cfg{election_timeout = 1}
+    end)
+    g.node1:wait_election_leader_found()
+    g.node2:wait_election_leader_found()
+end
-- 
2.24.3 (Apple Git-128)
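
Note on the check itself: the split vote condition described in the test
comment (the unused votes are not enough for anyone to reach the quorum) can
be sketched in C as below. This is only an illustration under stated
assumptions, not the code from Tarantool's raft library: the function name
`split_vote_is_certain` and the flat `votes` array are hypothetical, and each
node is assumed to cast at most one vote per term. It shows why the cluster
size is needed: without it the number of unused votes can't be computed.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not Tarantool's implementation. Returns true when no
 * candidate can reach the quorum in the current term, even if every vote
 * that is not cast yet goes to the current front-runner.
 *
 * votes[i] is the number of votes candidate i has collected so far.
 */
static bool
split_vote_is_certain(const int *votes, size_t candidate_count,
		      int cluster_size, int quorum)
{
	int cast = 0;
	int best = 0;
	for (size_t i = 0; i < candidate_count; ++i) {
		cast += votes[i];
		if (votes[i] > best)
			best = votes[i];
	}
	/* Votes of the nodes which didn't vote in this term yet. */
	int unused = cluster_size - cast;
	return best + unused < quorum;
}
```

With two nodes, a quorum of 2, and each node having voted for itself (the
situation the test creates), the sketch reports a split vote: the best
candidate has 1 vote and there are no unused votes left. With a third,
undecided node in the cluster the outcome is still open.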