From mboxrd@z Thu Jan 1 00:00:00 1970
To: tarantool-patches@dev.tarantool.org, sergepetrenko@tarantool.org
Date: Sat, 15 Jan 2022 01:48:53 +0100
Message-Id: <8573bc1d91d28b77c7aa87ebb5fb398c1287fcc0.1642207647.git.v.shpilevoy@tarantool.org>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH 1/4] raft: fix crash on election_timeout reconfig
List-Id: Tarantool development patches
From: Vladislav Shpilevoy via Tarantool-patches
Reply-To: Vladislav Shpilevoy
Errors-To: tarantool-patches-bounces@dev.tarantool.org
Sender: "Tarantool-patches"

Reconfiguring election_timeout used to crash if done during an
election on a node that had voted for anybody, was a candidate, and
did not know a leader yet, but had a WAL write in progress. Thus it
could only happen if the term was bumped by a message from a
non-leader node and was not flushed to disk yet.

The patch makes the reconfig check whether a WAL write is in
progress. If it is, the new timeout is stored but the timer is left
untouched.

Checking the volatile vote instead of the persistent one was also an
option, but it would create the same problem for the case when the
node has started writing a vote for itself and the write has not
finished yet. Reconfig would crash.
---
 .../unreleased/election-timeout-cfg-crash.md  |  5 ++
 src/lib/raft/raft.c                           |  3 +-
 test/unit/raft.c                              | 75 ++++++++++++++++++-
 test/unit/raft.result                         | 16 +++-
 4 files changed, 96 insertions(+), 3 deletions(-)
 create mode 100644 changelogs/unreleased/election-timeout-cfg-crash.md

diff --git a/changelogs/unreleased/election-timeout-cfg-crash.md b/changelogs/unreleased/election-timeout-cfg-crash.md
new file mode 100644
index 000000000..e870e3665
--- /dev/null
+++ b/changelogs/unreleased/election-timeout-cfg-crash.md
@@ -0,0 +1,5 @@
+## bugfix/raft
+
+* Reconfiguration of `box.cfg.election_timeout` could lead to a crash or
+  undefined behaviour if done during an ongoing election with a special WAL
+  write in progress.
diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
index 0e6dc6155..1ae8fe87f 100644
--- a/src/lib/raft/raft.c
+++ b/src/lib/raft/raft.c
@@ -935,7 +935,8 @@ raft_cfg_election_timeout(struct raft *raft, double timeout)
 		return;
 
 	raft->election_timeout = timeout;
-	if (raft->vote != 0 && raft->leader == 0 && raft->is_candidate) {
+	if (raft->vote != 0 && raft->leader == 0 && raft->is_candidate &&
+	    !raft->is_write_in_progress) {
 		assert(raft_ev_is_active(&raft->timer));
 		struct ev_loop *loop = raft_loop();
 		double timeout = raft_ev_timer_remaining(loop, &raft->timer) -
diff --git a/test/unit/raft.c b/test/unit/raft.c
index 8f4d2dd9a..520b94466 100644
--- a/test/unit/raft.c
+++ b/test/unit/raft.c
@@ -1296,7 +1296,7 @@ raft_test_enable_disable(void)
 static void
 raft_test_too_long_wal_write(void)
 {
-	raft_start_test(8);
+	raft_start_test(22);
 	struct raft_node node;
 	raft_node_create(&node);
 
@@ -1362,6 +1362,79 @@ raft_test_too_long_wal_write(void)
 	ok(raft_time() - ts == node.cfg_death_timeout, "timer works again");
 	is(node.raft.state, RAFT_STATE_CANDIDATE, "became candidate");
 
+	/*
+	 * During WAL write it is possible to reconfigure election timeout.
+	 * The dangerous case is when the timer is active already. It happens
+	 * when the node voted and is a candidate, but leader is unknown.
+	 */
+	raft_node_destroy(&node);
+	raft_node_create(&node);
+
+	raft_node_cfg_election_timeout(&node, 100);
+	raft_run_next_event();
+	is(node.raft.term, 2, "term is bumped");
+
+	/* Bump term again but it is not written to WAL yet. */
+	raft_node_block(&node);
+	is(raft_node_send_vote_response(&node,
+		3 /* Term. */,
+		3 /* Vote. */,
+		2 /* Source. */
+	), 0, "2 votes for 3 in a new term");
+	raft_run_next_event();
+	is(node.raft.term, 2, "term is old");
+	is(node.raft.vote, 1, "vote is used for self");
+	is(node.raft.volatile_term, 3, "volatile term is new");
+	is(node.raft.volatile_vote, 0, "volatile vote is unused");
+
+	raft_node_cfg_election_timeout(&node, 50);
+	raft_node_unblock(&node);
+	ts = raft_time();
+	raft_run_next_event();
+	ts = raft_time() - ts;
+	/* 50 + <= 10% random delay. */
+	ok(ts >= 50 && ts <= 55, "new election timeout works");
+	ok(raft_node_check_full_state(&node,
+		RAFT_STATE_CANDIDATE /* State. */,
+		0 /* Leader. */,
+		4 /* Term. */,
+		1 /* Vote. */,
+		4 /* Volatile term. */,
+		1 /* Volatile vote. */,
+		"{0: 4}" /* Vclock. */
+	), "new term is started with vote for self");
+
+	/*
+	 * Similar case when a vote is being written but not finished yet.
+	 */
+	raft_node_destroy(&node);
+	raft_node_create(&node);
+
+	raft_node_cfg_election_timeout(&node, 100);
+	raft_node_block(&node);
+	raft_run_next_event();
+	is(node.raft.term, 1, "term is old");
+	is(node.raft.vote, 0, "vote is unused");
+	is(node.raft.volatile_term, 2, "volatile term is new");
+	is(node.raft.volatile_vote, 1, "volatile vote is self");
+
+	raft_node_cfg_election_timeout(&node, 50);
+	raft_node_unblock(&node);
+	ts = raft_time();
+	raft_run_next_event();
+	ts = raft_time() - ts;
+	/* 50 + <= 10% random delay. */
+	ok(ts >= 50 && ts <= 55, "new election timeout works");
+	ok(raft_node_check_full_state(&node,
+		RAFT_STATE_CANDIDATE /* State. */,
+		0 /* Leader. */,
+		3 /* Term. */,
+		1 /* Vote. */,
+		3 /* Volatile term. */,
+		1 /* Volatile vote. */,
+		"{0: 2}" /* Vclock. */
+	), "new term is started with vote for self");
+
 	raft_node_destroy(&node);
 	raft_finish_test();
 }
diff --git a/test/unit/raft.result b/test/unit/raft.result
index 14689ea81..764d82969 100644
--- a/test/unit/raft.result
+++ b/test/unit/raft.result
@@ -220,7 +220,7 @@ ok 11 - subtests
 ok 12 - subtests
 	*** raft_test_enable_disable: done ***
 	*** raft_test_too_long_wal_write ***
-    1..8
+    1..22
     ok 1 - vote for 2
     ok 2 - vote is volatile
     ok 3 - message from leader
@@ -229,6 +229,20 @@ ok 12 - subtests
     ok 6 - wal write is finished
     ok 7 - timer works again
     ok 8 - became candidate
+    ok 9 - term is bumped
+    ok 10 - 2 votes for 3 in a new term
+    ok 11 - term is old
+    ok 12 - vote is used for self
+    ok 13 - volatile term is new
+    ok 14 - volatile vote is unused
+    ok 15 - new election timeout works
+    ok 16 - new term is started with vote for self
+    ok 17 - term is old
+    ok 18 - vote is unused
+    ok 19 - volatile term is new
+    ok 20 - volatile vote is self
+    ok 21 - new election timeout works
+    ok 22 - new term is started with vote for self
 ok 13 - subtests
 	*** raft_test_too_long_wal_write: done ***
 	*** raft_test_promote_restore ***
-- 
2.24.3 (Apple Git-128)
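
The guard added to raft_cfg_election_timeout() can be modelled in
isolation. The sketch below is not Tarantool code: struct raft_model,
its is_timer_active flag and raft_model_should_restart_timer() are
hypothetical names used only for illustration. It also assumes that the
election timer is stopped while a WAL write is pending, which would
explain why the unguarded assertion could fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the fields the guard reads. */
struct raft_model {
	/* Persisted vote, 0 if the node has not voted. */
	uint32_t vote;
	/* Known leader ID, 0 if unknown. */
	uint32_t leader;
	bool is_candidate;
	bool is_write_in_progress;
	/* Stand-in for raft_ev_is_active(&raft->timer). */
	bool is_timer_active;
};

/*
 * Return true when a changed election timeout should be applied to the
 * running timer. While a WAL write is pending the timer is assumed to
 * be stopped, so the function reports false and never reaches the
 * assertion.
 */
static bool
raft_model_should_restart_timer(const struct raft_model *r)
{
	bool in_election = r->vote != 0 && r->leader == 0 &&
			   r->is_candidate;
	if (!in_election || r->is_write_in_progress)
		return false;
	/* Same invariant the patched code asserts. */
	assert(r->is_timer_active);
	return true;
}
```

With vote = 1, leader = 0, is_candidate = true and
is_write_in_progress = true the function returns false even though
is_timer_active is false. This corresponds to the first new test case,
where the persisted vote is for self and a term bump is not flushed yet.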