[Tarantool-patches] [PATCH 1/4] raft: fix crash on election_timeout reconfig

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Sat Jan 15 03:48:53 MSK 2022


It used to crash if done during election on a node voted for
anybody, it is a candidate, it doesn't know a leader yet, but has
a WAL write in progress.

Thus it could only happen if the term was bumped by a message from
a non-leader node and wasn't flushed to the disk yet.

The patch makes the reconfig check if there is a WAL write in
progress. Then don't do anything.

Could also check for volatile vote instead of persistent, but it
would create the same problem for the case when started writing
vote for self and didn't finish yet. Reconfig would crash.
---
 .../unreleased/election-timeout-cfg-crash.md  |  5 ++
 src/lib/raft/raft.c                           |  3 +-
 test/unit/raft.c                              | 75 ++++++++++++++++++-
 test/unit/raft.result                         | 16 +++-
 4 files changed, 96 insertions(+), 3 deletions(-)
 create mode 100644 changelogs/unreleased/election-timeout-cfg-crash.md

diff --git a/changelogs/unreleased/election-timeout-cfg-crash.md b/changelogs/unreleased/election-timeout-cfg-crash.md
new file mode 100644
index 000000000..e870e3665
--- /dev/null
+++ b/changelogs/unreleased/election-timeout-cfg-crash.md
@@ -0,0 +1,5 @@
+## bugfix/raft
+
+* Reconfiguration of `box.cfg.election_timeout` could lead to a crash or
+  undefined behaviour if done during an ongoing election with a special WAL
+  write in progress.
diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
index 0e6dc6155..1ae8fe87f 100644
--- a/src/lib/raft/raft.c
+++ b/src/lib/raft/raft.c
@@ -935,7 +935,8 @@ raft_cfg_election_timeout(struct raft *raft, double timeout)
 		return;
 
 	raft->election_timeout = timeout;
-	if (raft->vote != 0 && raft->leader == 0 && raft->is_candidate) {
+	if (raft->vote != 0 && raft->leader == 0 && raft->is_candidate &&
+	    !raft->is_write_in_progress) {
 		assert(raft_ev_is_active(&raft->timer));
 		struct ev_loop *loop = raft_loop();
 		double timeout = raft_ev_timer_remaining(loop, &raft->timer) -
diff --git a/test/unit/raft.c b/test/unit/raft.c
index 8f4d2dd9a..520b94466 100644
--- a/test/unit/raft.c
+++ b/test/unit/raft.c
@@ -1296,7 +1296,7 @@ raft_test_enable_disable(void)
 static void
 raft_test_too_long_wal_write(void)
 {
-	raft_start_test(8);
+	raft_start_test(22);
 	struct raft_node node;
 	raft_node_create(&node);
 
@@ -1362,6 +1362,79 @@ raft_test_too_long_wal_write(void)
 	ok(raft_time() - ts == node.cfg_death_timeout, "timer works again");
 	is(node.raft.state, RAFT_STATE_CANDIDATE, "became candidate");
 
+	/*
+	 * During WAL write it is possible to reconfigure election timeout.
+	 * The dangerous case is when the timer is active already. It happens
+	 * when the node voted and is a candidate, but leader is unknown.
+	 */
+	raft_node_destroy(&node);
+	raft_node_create(&node);
+
+	raft_node_cfg_election_timeout(&node, 100);
+	raft_run_next_event();
+	is(node.raft.term, 2, "term is bumped");
+
+	/* Bump term again but it is not written to WAL yet. */
+	raft_node_block(&node);
+	is(raft_node_send_vote_response(&node,
+		3 /* Term. */,
+		3 /* Vote. */,
+		2 /* Source. */
+	), 0, "2 votes for 3 in a new term");
+	raft_run_next_event();
+	is(node.raft.term, 2, "term is old");
+	is(node.raft.vote, 1, "vote is used for self");
+	is(node.raft.volatile_term, 3, "volatile term is new");
+	is(node.raft.volatile_vote, 0, "volatile vote is unused");
+
+	raft_node_cfg_election_timeout(&node, 50);
+	raft_node_unblock(&node);
+	ts = raft_time();
+	raft_run_next_event();
+	ts = raft_time() - ts;
+	/* 50 + <= 10% random delay. */
+	ok(ts >= 50 && ts <= 55, "new election timeout works");
+	ok(raft_node_check_full_state(&node,
+		RAFT_STATE_CANDIDATE /* State. */,
+		0 /* Leader. */,
+		4 /* Term. */,
+		1 /* Vote. */,
+		4 /* Volatile term. */,
+		1 /* Volatile vote. */,
+		"{0: 4}" /* Vclock. */
+	), "new term is started with vote for self");
+
+	/*
+	 * Similar case when a vote is being written but not finished yet.
+	 */
+	raft_node_destroy(&node);
+	raft_node_create(&node);
+
+	raft_node_cfg_election_timeout(&node, 100);
+	raft_node_block(&node);
+	raft_run_next_event();
+	is(node.raft.term, 1, "term is old");
+	is(node.raft.vote, 0, "vote is unused");
+	is(node.raft.volatile_term, 2, "volatile term is new");
+	is(node.raft.volatile_vote, 1, "volatile vote is self");
+
+	raft_node_cfg_election_timeout(&node, 50);
+	raft_node_unblock(&node);
+	ts = raft_time();
+	raft_run_next_event();
+	ts = raft_time() - ts;
+	/* 50 + <= 10% random delay. */
+	ok(ts >= 50 && ts <= 55, "new election timeout works");
+	ok(raft_node_check_full_state(&node,
+		RAFT_STATE_CANDIDATE /* State. */,
+		0 /* Leader. */,
+		3 /* Term. */,
+		1 /* Vote. */,
+		3 /* Volatile term. */,
+		1 /* Volatile vote. */,
+		"{0: 2}" /* Vclock. */
+	), "new term is started with vote for self");
+
 	raft_node_destroy(&node);
 	raft_finish_test();
 }
diff --git a/test/unit/raft.result b/test/unit/raft.result
index 14689ea81..764d82969 100644
--- a/test/unit/raft.result
+++ b/test/unit/raft.result
@@ -220,7 +220,7 @@ ok 11 - subtests
 ok 12 - subtests
 	*** raft_test_enable_disable: done ***
 	*** raft_test_too_long_wal_write ***
-    1..8
+    1..22
     ok 1 - vote for 2
     ok 2 - vote is volatile
     ok 3 - message from leader
@@ -229,6 +229,20 @@ ok 12 - subtests
     ok 6 - wal write is finished
     ok 7 - timer works again
     ok 8 - became candidate
+    ok 9 - term is bumped
+    ok 10 - 2 votes for 3 in a new term
+    ok 11 - term is old
+    ok 12 - vote is used for self
+    ok 13 - volatile term is new
+    ok 14 - volatile vote is unused
+    ok 15 - new election timeout works
+    ok 16 - new term is started with vote for self
+    ok 17 - term is old
+    ok 18 - vote is unused
+    ok 19 - volatile term is new
+    ok 20 - volatile vote is self
+    ok 21 - new election timeout works
+    ok 22 - new term is started with vote for self
 ok 13 - subtests
 	*** raft_test_too_long_wal_write: done ***
 	*** raft_test_promote_restore ***
-- 
2.24.3 (Apple Git-128)



More information about the Tarantool-patches mailing list