Tarantool development patches archive
* [Tarantool-patches] [PATCH 0/4] Split vote
@ 2022-01-15  0:48 Vladislav Shpilevoy via Tarantool-patches
  2022-01-15  0:48 ` [Tarantool-patches] [PATCH 1/4] raft: fix crash on election_timeout reconfig Vladislav Shpilevoy via Tarantool-patches
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2022-01-15  0:48 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

This series adds split vote handling to Raft, enables its usage in the
storage layer, and, in the first commit, fixes a bug found while
working on this.

Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5285-raft-split-vote
Issue: https://github.com/tarantool/tarantool/issues/5285

Vladislav Shpilevoy (4):
  raft: fix crash on election_timeout reconfig
  raft: track all votes, even not own
  raft: introduce split vote detection
  election: activate raft split vote handling

 .../unreleased/election-timeout-cfg-crash.md  |   5 +
 src/box/raft.c                                |   4 +-
 src/lib/raft/raft.c                           | 142 ++++++++-
 src/lib/raft/raft.h                           |  30 +-
 .../election_split_vote_test.lua              |  92 ++++++
 test/unit/raft.c                              | 299 +++++++++++++++++-
 test/unit/raft.result                         |  64 +++-
 test/unit/raft_test_utils.c                   |  12 +
 test/unit/raft_test_utils.h                   |   5 +
 9 files changed, 622 insertions(+), 31 deletions(-)
 create mode 100644 changelogs/unreleased/election-timeout-cfg-crash.md
 create mode 100644 test/replication-luatest/election_split_vote_test.lua

-- 
2.24.3 (Apple Git-128)



* [Tarantool-patches] [PATCH 1/4] raft: fix crash on election_timeout reconfig
  2022-01-15  0:48 [Tarantool-patches] [PATCH 0/4] Split vote Vladislav Shpilevoy via Tarantool-patches
@ 2022-01-15  0:48 ` Vladislav Shpilevoy via Tarantool-patches
  2022-01-15  0:48 ` [Tarantool-patches] [PATCH 2/4] raft: track all votes, even not own Vladislav Shpilevoy via Tarantool-patches
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2022-01-15  0:48 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

Reconfiguration of election_timeout used to crash if done during an
election on a node which has voted for somebody, is a candidate, does
not know a leader yet, but has a WAL write in progress.

Thus it could only happen if the term was bumped by a message from a
non-leader node and was not yet flushed to disk.

The patch makes the reconfiguration check whether a WAL write is in
progress and, if so, do nothing.

It could also check the volatile vote instead of the persistent one,
but that would create the same problem for the case when the node has
started writing a vote for itself and has not finished yet: the
reconfiguration would still crash.
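
For readability, a minimal sketch of the fixed function with the new
guard; the identifiers are taken from the diff below and the timer
rescheduling arithmetic is elided, so this is an illustration of the
fix rather than a verbatim copy of the patch:

void
raft_cfg_election_timeout(struct raft *raft, double timeout)
{
	if (timeout == raft->election_timeout)
		return;
	raft->election_timeout = timeout;
	/*
	 * Touch the running election timer only when it is safe: the
	 * node has voted, sees no leader, is a candidate, and has no
	 * WAL write in flight. While a write is in progress the state
	 * is changing, so do nothing - everything is rechecked when
	 * the write finishes.
	 */
	if (raft->vote != 0 && raft->leader == 0 && raft->is_candidate &&
	    !raft->is_write_in_progress) {
		/* ... shrink or extend the remaining timeout as before ... */
	}
}
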
---
 .../unreleased/election-timeout-cfg-crash.md  |  5 ++
 src/lib/raft/raft.c                           |  3 +-
 test/unit/raft.c                              | 75 ++++++++++++++++++-
 test/unit/raft.result                         | 16 +++-
 4 files changed, 96 insertions(+), 3 deletions(-)
 create mode 100644 changelogs/unreleased/election-timeout-cfg-crash.md

diff --git a/changelogs/unreleased/election-timeout-cfg-crash.md b/changelogs/unreleased/election-timeout-cfg-crash.md
new file mode 100644
index 000000000..e870e3665
--- /dev/null
+++ b/changelogs/unreleased/election-timeout-cfg-crash.md
@@ -0,0 +1,5 @@
+## bugfix/raft
+
+* Reconfiguration of `box.cfg.election_timeout` could lead to a crash or
+  undefined behaviour if done during an ongoing election with a special WAL
+  write in progress.
diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
index 0e6dc6155..1ae8fe87f 100644
--- a/src/lib/raft/raft.c
+++ b/src/lib/raft/raft.c
@@ -935,7 +935,8 @@ raft_cfg_election_timeout(struct raft *raft, double timeout)
 		return;
 
 	raft->election_timeout = timeout;
-	if (raft->vote != 0 && raft->leader == 0 && raft->is_candidate) {
+	if (raft->vote != 0 && raft->leader == 0 && raft->is_candidate &&
+	    !raft->is_write_in_progress) {
 		assert(raft_ev_is_active(&raft->timer));
 		struct ev_loop *loop = raft_loop();
 		double timeout = raft_ev_timer_remaining(loop, &raft->timer) -
diff --git a/test/unit/raft.c b/test/unit/raft.c
index 8f4d2dd9a..520b94466 100644
--- a/test/unit/raft.c
+++ b/test/unit/raft.c
@@ -1296,7 +1296,7 @@ raft_test_enable_disable(void)
 static void
 raft_test_too_long_wal_write(void)
 {
-	raft_start_test(8);
+	raft_start_test(22);
 	struct raft_node node;
 	raft_node_create(&node);
 
@@ -1362,6 +1362,79 @@ raft_test_too_long_wal_write(void)
 	ok(raft_time() - ts == node.cfg_death_timeout, "timer works again");
 	is(node.raft.state, RAFT_STATE_CANDIDATE, "became candidate");
 
+	/*
+	 * During WAL write it is possible to reconfigure election timeout.
+	 * The dangerous case is when the timer is active already. It happens
+	 * when the node voted and is a candidate, but leader is unknown.
+	 */
+	raft_node_destroy(&node);
+	raft_node_create(&node);
+
+	raft_node_cfg_election_timeout(&node, 100);
+	raft_run_next_event();
+	is(node.raft.term, 2, "term is bumped");
+
+	/* Bump term again but it is not written to WAL yet. */
+	raft_node_block(&node);
+	is(raft_node_send_vote_response(&node,
+		3 /* Term. */,
+		3 /* Vote. */,
+		2 /* Source. */
+	), 0, "2 votes for 3 in a new term");
+	raft_run_next_event();
+	is(node.raft.term, 2, "term is old");
+	is(node.raft.vote, 1, "vote is used for self");
+	is(node.raft.volatile_term, 3, "volatile term is new");
+	is(node.raft.volatile_vote, 0, "volatile vote is unused");
+
+	raft_node_cfg_election_timeout(&node, 50);
+	raft_node_unblock(&node);
+	ts = raft_time();
+	raft_run_next_event();
+	ts = raft_time() - ts;
+	/* 50 + <= 10% random delay. */
+	ok(ts >= 50 && ts <= 55, "new election timeout works");
+	ok(raft_node_check_full_state(&node,
+		RAFT_STATE_CANDIDATE /* State. */,
+		0 /* Leader. */,
+		4 /* Term. */,
+		1 /* Vote. */,
+		4 /* Volatile term. */,
+		1 /* Volatile vote. */,
+		"{0: 4}" /* Vclock. */
+	), "new term is started with vote for self");
+
+	/*
+	 * Similar case when a vote is being written but not finished yet.
+	 */
+	raft_node_destroy(&node);
+	raft_node_create(&node);
+
+	raft_node_cfg_election_timeout(&node, 100);
+	raft_node_block(&node);
+	raft_run_next_event();
+	is(node.raft.term, 1, "term is old");
+	is(node.raft.vote, 0, "vote is unused");
+	is(node.raft.volatile_term, 2, "volatile term is new");
+	is(node.raft.volatile_vote, 1, "volatile vote is self");
+
+	raft_node_cfg_election_timeout(&node, 50);
+	raft_node_unblock(&node);
+	ts = raft_time();
+	raft_run_next_event();
+	ts = raft_time() - ts;
+	/* 50 + <= 10% random delay. */
+	ok(ts >= 50 && ts <= 55, "new election timeout works");
+	ok(raft_node_check_full_state(&node,
+		RAFT_STATE_CANDIDATE /* State. */,
+		0 /* Leader. */,
+		3 /* Term. */,
+		1 /* Vote. */,
+		3 /* Volatile term. */,
+		1 /* Volatile vote. */,
+		"{0: 2}" /* Vclock. */
+	), "new term is started with vote for self");
+
 	raft_node_destroy(&node);
 	raft_finish_test();
 }
diff --git a/test/unit/raft.result b/test/unit/raft.result
index 14689ea81..764d82969 100644
--- a/test/unit/raft.result
+++ b/test/unit/raft.result
@@ -220,7 +220,7 @@ ok 11 - subtests
 ok 12 - subtests
 	*** raft_test_enable_disable: done ***
 	*** raft_test_too_long_wal_write ***
-    1..8
+    1..22
     ok 1 - vote for 2
     ok 2 - vote is volatile
     ok 3 - message from leader
@@ -229,6 +229,20 @@ ok 12 - subtests
     ok 6 - wal write is finished
     ok 7 - timer works again
     ok 8 - became candidate
+    ok 9 - term is bumped
+    ok 10 - 2 votes for 3 in a new term
+    ok 11 - term is old
+    ok 12 - vote is used for self
+    ok 13 - volatile term is new
+    ok 14 - volatile vote is unused
+    ok 15 - new election timeout works
+    ok 16 - new term is started with vote for self
+    ok 17 - term is old
+    ok 18 - vote is unused
+    ok 19 - volatile term is new
+    ok 20 - volatile vote is self
+    ok 21 - new election timeout works
+    ok 22 - new term is started with vote for self
 ok 13 - subtests
 	*** raft_test_too_long_wal_write: done ***
 	*** raft_test_promote_restore ***
-- 
2.24.3 (Apple Git-128)



* [Tarantool-patches] [PATCH 2/4] raft: track all votes, even not own
  2022-01-15  0:48 [Tarantool-patches] [PATCH 0/4] Split vote Vladislav Shpilevoy via Tarantool-patches
  2022-01-15  0:48 ` [Tarantool-patches] [PATCH 1/4] raft: fix crash on election_timeout reconfig Vladislav Shpilevoy via Tarantool-patches
@ 2022-01-15  0:48 ` Vladislav Shpilevoy via Tarantool-patches
  2022-01-15  0:48 ` [Tarantool-patches] [PATCH 3/4] raft: introduce split vote detection Vladislav Shpilevoy via Tarantool-patches
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2022-01-15  0:48 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

To detect a split vote a node needs to see that the number of not yet
used votes is not enough for any candidate to win, even if that
candidate gets all of them.

Hence every node needs to count the votes given to all other nodes,
not only the votes for itself.

The patch makes raft store the votes in a slightly more complicated
way than the old simple bitmap of votes for the current instance.

Part of #5285
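
A condensed sketch of the new bookkeeping, with the names taken from
the diff below (the real patch also resets the statistics on term
bumps and restores them on recovery):

/** Vote statistics of a single node, one slot per instance ID. */
struct raft_vote {
	/** Whether the node has voted for anybody this term. */
	bool did_vote;
	/** How many votes the node has received this term. */
	int count;
};

/** Remember that @a src voted for @a dst, at most once per term. */
static inline void
raft_add_vote(struct raft *raft, int src, int dst)
{
	struct raft_vote *v = &raft->votes[src];
	if (v->did_vote)
		return;
	v->did_vote = true;
	++raft->votes[dst].count;
}

/** Number of votes collected by this instance. */
static inline int
raft_vote_count(const struct raft *raft)
{
	return raft->votes[raft->self].count;
}
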
---
 src/lib/raft/raft.c   | 41 +++++++++++++++++++++++++++++++++--------
 src/lib/raft/raft.h   | 24 +++++++++++++++++-------
 test/unit/raft.c      | 17 ++++++++++-------
 test/unit/raft.result |  7 ++++---
 4 files changed, 64 insertions(+), 25 deletions(-)

diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
index 1ae8fe87f..289d53fd5 100644
--- a/src/lib/raft/raft.c
+++ b/src/lib/raft/raft.c
@@ -152,6 +152,16 @@ raft_can_vote_for(const struct raft *raft, const struct vclock *v)
 	return cmp == 0 || cmp == 1;
 }
 
+static inline void
+raft_add_vote(struct raft *raft, int src, int dst)
+{
+	struct raft_vote *v = &raft->votes[src];
+	if (v->did_vote)
+		return;
+	v->did_vote = true;
+	++raft->votes[dst].count;
+}
+
 /** Schedule broadcast of the complete Raft state to all the followers. */
 static void
 raft_schedule_broadcast(struct raft *raft);
@@ -279,6 +289,12 @@ void
 raft_process_recovery(struct raft *raft, const struct raft_msg *req)
 {
 	say_verbose("RAFT: recover %s", raft_msg_to_string(req));
+	/*
+	 * Instance ID is not known until recovery ends, because apparently it
+	 * can change during join. In Raft it is set only one time when recovery
+	 * ends for good.
+	 */
+	assert(raft->self == 0);
 	if (req->term != 0) {
 		raft->term = req->term;
 		raft->volatile_term = req->term;
@@ -335,6 +351,8 @@ raft_process_msg(struct raft *raft, const struct raft_msg *req, uint32_t source)
 	 * persisted long time ago and still broadcasted. Or a vote response.
 	 */
 	if (req->vote != 0) {
+		raft_add_vote(raft, source, req->vote);
+
 		switch (raft->state) {
 		case RAFT_STATE_FOLLOWER:
 		case RAFT_STATE_LEADER:
@@ -395,11 +413,10 @@ raft_process_msg(struct raft *raft, const struct raft_msg *req, uint32_t source)
 			 * and now was answered by some other instance.
 			 */
 			assert(raft->volatile_vote == raft->self);
-			bool was_set = bit_set(&raft->vote_mask, source);
-			raft->vote_count += !was_set;
-			if (raft->vote_count < raft->election_quorum) {
+			int vote_count = raft_vote_count(raft);
+			if (vote_count < raft->election_quorum) {
 				say_info("RAFT: accepted vote for self, vote "
-					 "count is %d/%d", raft->vote_count,
+					 "count is %d/%d", vote_count,
 					 raft->election_quorum);
 				break;
 			}
@@ -640,13 +657,11 @@ raft_sm_become_candidate(struct raft *raft)
 	assert(raft->state == RAFT_STATE_FOLLOWER);
 	assert(raft->leader == 0);
 	assert(raft->vote == raft->self);
+	assert(raft_vote_count(raft) >= 1);
 	assert(raft->is_candidate);
 	assert(!raft->is_write_in_progress);
 	assert(raft->election_quorum > 1);
 	raft->state = RAFT_STATE_CANDIDATE;
-	raft->vote_count = 1;
-	raft->vote_mask = 0;
-	bit_set(&raft->vote_mask, raft->self);
 	raft_sm_wait_election_end(raft);
 	/* State is visible and it is changed - broadcast. */
 	raft_schedule_broadcast(raft);
@@ -663,6 +678,7 @@ raft_sm_schedule_new_term(struct raft *raft, uint64_t new_term)
 	raft->volatile_vote = 0;
 	raft->leader = 0;
 	raft->state = RAFT_STATE_FOLLOWER;
+	memset(raft->votes, 0, sizeof(raft->votes));
 	/*
 	 * The instance could be promoted for the previous term. But promotion
 	 * has no effect on following terms.
@@ -685,6 +701,8 @@ raft_sm_schedule_new_vote(struct raft *raft, uint32_t new_vote)
 	assert(raft->leader == 0);
 	assert(raft->state == RAFT_STATE_FOLLOWER);
 	raft->volatile_vote = new_vote;
+	assert(!raft->votes[raft->self].did_vote);
+	raft_add_vote(raft, raft->self, raft->self);
 	raft_sm_pause_and_dump(raft);
 	/* Nothing visible is changed - no broadcast. */
 }
@@ -956,7 +974,7 @@ raft_cfg_election_quorum(struct raft *raft, int election_quorum)
 	assert(election_quorum > 0);
 	raft->election_quorum = election_quorum;
 	if (raft->state == RAFT_STATE_CANDIDATE &&
-	    raft->vote_count >= raft->election_quorum)
+	    raft_vote_count(raft) >= raft->election_quorum)
 		raft_sm_become_leader(raft);
 }
 
@@ -984,6 +1002,13 @@ raft_cfg_instance_id(struct raft *raft, uint32_t instance_id)
 	assert(raft->self == 0);
 	assert(instance_id != 0);
 	raft->self = instance_id;
+	/*
+	 * Couldn't do that reliably during recovery. Instance ID can change
+	 * more than once during join. Here instance ID is configured when it is
+	 * known forever and is safe to use.
+	 */
+	if (raft->vote != 0)
+		raft_add_vote(raft, instance_id, raft->vote);
 }
 
 void
diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h
index 53e2c58a8..4fbfaa91a 100644
--- a/src/lib/raft/raft.h
+++ b/src/lib/raft/raft.h
@@ -138,6 +138,14 @@ struct raft_vtab {
 	raft_schedule_async_f schedule_async;
 };
 
+/** Vote descriptor of a single node. */
+struct raft_vote {
+	/** Whether the node voted for anybody. */
+	bool did_vote;
+	/** How many votes the node got in the current term. */
+	int count;
+};
+
 struct raft {
 	/** Instance ID of this node. */
 	uint32_t self;
@@ -189,13 +197,8 @@ struct raft {
 	 */
 	uint64_t term;
 	uint32_t vote;
-	/**
-	 * Bit 1 on position N means that a vote from instance with ID = N was
-	 * obtained.
-	 */
-	vclock_map_t vote_mask;
-	/** Number of votes for this instance. Valid only in candidate state. */
-	int vote_count;
+	/** Statistics which node voted for who. */
+	struct raft_vote votes[VCLOCK_MAX];
 	/** Number of votes necessary for successful election. */
 	int election_quorum;
 	/**
@@ -243,6 +246,13 @@ raft_is_enabled(const struct raft *raft)
 	return raft->is_enabled;
 }
 
+/** Number of votes for self. */
+static inline int
+raft_vote_count(const struct raft *raft)
+{
+	return raft->votes[raft->self].count;
+}
+
 /** Process a raft entry stored in WAL/snapshot. */
 void
 raft_process_recovery(struct raft *raft, const struct raft_msg *req);
diff --git a/test/unit/raft.c b/test/unit/raft.c
index 520b94466..9f18f4d8e 100644
--- a/test/unit/raft.c
+++ b/test/unit/raft.c
@@ -70,7 +70,7 @@ raft_test_leader_election(void)
 		1 /* Volatile vote. */,
 		"{0: 1}" /* Vclock. */
 	), "elections with a new term");
-	is(node.raft.vote_count, 1, "single vote for self");
+	is(raft_vote_count(&node.raft), 1, "single vote for self");
 	ok(node.update_count > 0, "trigger worked");
 	node.update_count = 0;
 
@@ -100,7 +100,7 @@ raft_test_leader_election(void)
 		1 /* Vote. */,
 		2 /* Source. */
 	), 0, "vote response from 2");
-	is(node.raft.vote_count, 2, "2 votes - 1 self and 1 foreign");
+	is(raft_vote_count(&node.raft), 2, "2 votes - 1 self and 1 foreign");
 	ok(!node.has_work, "no work to do - not enough votes yet");
 
 	raft_run_for(node.cfg_election_timeout / 2);
@@ -115,7 +115,7 @@ raft_test_leader_election(void)
 		1 /* Vote. */,
 		3 /* Source. */
 	), 0, "vote response from 3");
-	is(node.raft.vote_count, 3, "2 votes - 1 self and 2 foreign");
+	is(raft_vote_count(&node.raft), 3, "2 votes - 1 self and 2 foreign");
 	is(node.raft.state, RAFT_STATE_LEADER, "became leader");
 	ok(node.update_count > 0, "trigger worked");
 	node.update_count = 0;
@@ -142,7 +142,7 @@ raft_test_leader_election(void)
 static void
 raft_test_recovery(void)
 {
-	raft_start_test(12);
+	raft_start_test(13);
 	struct raft_msg msg;
 	struct raft_node node;
 	raft_node_create(&node);
@@ -224,6 +224,8 @@ raft_test_recovery(void)
 		"{0: 1}" /* Vclock. */
 	), "restart always as a follower");
 
+	is(raft_vote_count(&node.raft), 1, "vote count is restored correctly");
+
 	raft_checkpoint_remote(&node.raft, &msg);
 	ok(raft_msg_check(&msg,
 		RAFT_STATE_FOLLOWER /* State. */,
@@ -383,7 +385,7 @@ raft_test_vote_skip(void)
 		1 /* Vote. */,
 		2 /* Source. */
 	), 0, "message is accepted");
-	is(node.raft.vote_count, 1, "but ignored - too old term");
+	is(raft_vote_count(&node.raft), 1, "but ignored - too old term");
 
 	/* Competing vote requests are skipped. */
 
@@ -392,7 +394,8 @@ raft_test_vote_skip(void)
 		3 /* Vote. */,
 		2 /* Source. */
 	), 0, "message is accepted");
-	is(node.raft.vote_count, 1, "but ignored - vote not for this node");
+	is(raft_vote_count(&node.raft), 1,
+	   "but ignored - vote not for this node");
 	is(node.raft.state, RAFT_STATE_CANDIDATE, "this node does not give up");
 
 	/* Vote requests are ignored when node is disabled. */
@@ -1071,7 +1074,7 @@ raft_test_election_quorum(void)
 		1 /* Vote. */,
 		2 /* Source. */
 	), 0, "send vote response from second node");
-	is(node.raft.vote_count, 2, "vote is accepted");
+	is(raft_vote_count(&node.raft), 2, "vote is accepted");
 	is(node.raft.state, RAFT_STATE_CANDIDATE, "but still candidate");
 
 	raft_node_cfg_election_quorum(&node, 2);
diff --git a/test/unit/raft.result b/test/unit/raft.result
index 764d82969..3fcdf8180 100644
--- a/test/unit/raft.result
+++ b/test/unit/raft.result
@@ -29,7 +29,7 @@
 ok 1 - subtests
 	*** raft_test_leader_election: done ***
 	*** raft_test_recovery ***
-    1..12
+    1..13
     ok 1 - became candidate
     ok 2 - remote checkpoint of a candidate
     ok 3 - local checkpoint of a candidate
@@ -40,8 +40,9 @@ ok 1 - subtests
     ok 8 - remote checkpoint of a leader
     ok 9 - local checkpoint of a leader
     ok 10 - restart always as a follower
-    ok 11 - remote checkpoint of a leader
-    ok 12 - local checkpoint of a leader
+    ok 11 - vote count is restored correctly
+    ok 12 - remote checkpoint of a leader
+    ok 13 - local checkpoint of a leader
 ok 2 - subtests
 	*** raft_test_recovery: done ***
 	*** raft_test_bad_msg ***
-- 
2.24.3 (Apple Git-128)



* [Tarantool-patches] [PATCH 3/4] raft: introduce split vote detection
  2022-01-15  0:48 [Tarantool-patches] [PATCH 0/4] Split vote Vladislav Shpilevoy via Tarantool-patches
  2022-01-15  0:48 ` [Tarantool-patches] [PATCH 1/4] raft: fix crash on election_timeout reconfig Vladislav Shpilevoy via Tarantool-patches
  2022-01-15  0:48 ` [Tarantool-patches] [PATCH 2/4] raft: track all votes, even not own Vladislav Shpilevoy via Tarantool-patches
@ 2022-01-15  0:48 ` Vladislav Shpilevoy via Tarantool-patches
  2022-01-15  0:48 ` [Tarantool-patches] [PATCH 4/4] election: activate raft split vote handling Vladislav Shpilevoy via Tarantool-patches
  2022-01-16 14:10 ` [Tarantool-patches] [PATCH 0/4] Split vote Konstantin Osipov via Tarantool-patches
  4 siblings, 0 replies; 6+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2022-01-15  0:48 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

A split vote is a situation when during an election nobody can win
and will certainly not win in this term: not a single node can
collect enough votes. For example, with 4 nodes one candidate could
get 2 votes and another candidate the other 2 votes - nobody reaches
the quorum of 3 in this term.

The patch makes raft able to notice that situation and speed up the
term bump. The term is not bumped immediately though, because the
nodes might do that simultaneously and get a split vote again right
after voting for themselves. Instead each node waits a random delay
of at most 10% of the election timeout, so split vote resolution
should be sped up by at least 90%.

Part of #5285
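
As an illustration, the detection boils down to the predicate below
(the names follow the diff). With cluster_size = 4, election_quorum =
3 and two candidates holding 2 votes each, max_vote is 2 and no votes
remain unused, so the vote is split:

static bool
raft_has_split_vote(const struct raft *raft)
{
	int max_vote = 0;
	/* Votes not given to anybody yet. */
	int vote_vac = raft->cluster_size;
	int quorum = raft->election_quorum;
	for (int i = 0; i < VCLOCK_MAX; ++i) {
		int count = raft->votes[i].count;
		vote_vac -= count;
		if (count > max_vote)
			max_vote = count;
	}
	/*
	 * Even the current leader of the race can't reach the quorum,
	 * not even if it gets all the remaining votes.
	 */
	return max_vote < quorum && max_vote + vote_vac < quorum;
}
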
---
 src/lib/raft/raft.c         | 104 +++++++++++++++++-
 src/lib/raft/raft.h         |   6 ++
 test/unit/raft.c            | 207 +++++++++++++++++++++++++++++++++++-
 test/unit/raft.result       |  41 ++++++-
 test/unit/raft_test_utils.c |  12 +++
 test/unit/raft_test_utils.h |   5 +
 6 files changed, 370 insertions(+), 5 deletions(-)

diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c
index 289d53fd5..5dcbc7821 100644
--- a/src/lib/raft/raft.c
+++ b/src/lib/raft/raft.c
@@ -152,20 +152,69 @@ raft_can_vote_for(const struct raft *raft, const struct vclock *v)
 	return cmp == 0 || cmp == 1;
 }
 
-static inline void
+static inline bool
 raft_add_vote(struct raft *raft, int src, int dst)
 {
 	struct raft_vote *v = &raft->votes[src];
 	if (v->did_vote)
-		return;
+		return false;
 	v->did_vote = true;
 	++raft->votes[dst].count;
+	return true;
+}
+
+static bool
+raft_has_split_vote(const struct raft *raft)
+{
+	int max_vote = 0;
+	int vote_vac = raft->cluster_size;
+	int quorum = raft->election_quorum;
+	for (int i = 0; i < VCLOCK_MAX; ++i) {
+		int count = raft->votes[i].count;
+		vote_vac -= count;
+		if (count > max_vote)
+			max_vote = count;
+	}
+	return max_vote < quorum && max_vote + vote_vac < quorum;
+}
+
+static int
+raft_scores_snprintf(const struct raft *raft, char *buf, int size)
+{
+	int total = 0;
+	bool is_empty = true;
+	SNPRINT(total, snprintf, buf, size, "{");
+	for (int i = 0; i < VCLOCK_MAX; ++i) {
+		int count = raft->votes[i].count;
+		if (count == 0)
+			continue;
+		if (!is_empty)
+			SNPRINT(total, snprintf, buf, size, ", ");
+		is_empty = false;
+		SNPRINT(total, snprintf, buf, size, "%d: %d", i, count);
+	}
+	SNPRINT(total, snprintf, buf, size, "}");
+	return total;
+}
+
+static const char *
+raft_scores_str(const struct raft *raft)
+{
+	char *buf = tt_static_buf();
+	int rc = raft_scores_snprintf(raft, buf, TT_STATIC_BUF_LEN);
+	assert(rc >= 0);
+	(void)rc;
+	return buf;
 }
 
 /** Schedule broadcast of the complete Raft state to all the followers. */
 static void
 raft_schedule_broadcast(struct raft *raft);
 
+/** If there is split vote, the node might reduce the next term delay. */
+static void
+raft_check_split_vote(struct raft *raft);
+
 /** Raft state machine methods. 'sm' stands for State Machine. */
 
 /**
@@ -351,7 +400,8 @@ raft_process_msg(struct raft *raft, const struct raft_msg *req, uint32_t source)
 	 * persisted long time ago and still broadcasted. Or a vote response.
 	 */
 	if (req->vote != 0) {
-		raft_add_vote(raft, source, req->vote);
+		if (raft_add_vote(raft, source, req->vote))
+		    raft_check_split_vote(raft);
 
 		switch (raft->state) {
 		case RAFT_STATE_FOLLOWER:
@@ -1019,6 +1069,13 @@ raft_cfg_vclock(struct raft *raft, const struct vclock *vclock)
 	raft->vclock = vclock;
 }
 
+void
+raft_cfg_cluster_size(struct raft *raft, int size)
+{
+	raft->cluster_size = size;
+	raft_check_split_vote(raft);
+}
+
 void
 raft_new_term(struct raft *raft)
 {
@@ -1043,6 +1100,46 @@ raft_schedule_broadcast(struct raft *raft)
 	raft_schedule_async(raft);
 }
 
+static void
+raft_check_split_vote(struct raft *raft)
+{
+	/* When leader is known, there is no election. Thus no vote to split. */
+	if (raft->leader != 0)
+		return;
+	/* Not a candidate = can't trigger term bump anyway. */
+	if (!raft->is_candidate)
+		return;
+	/*
+	 * WAL write in progress means the state is changing. All is rechecked
+	 * when it is done.
+	 */
+	if (raft->is_write_in_progress)
+		return;
+	if (!raft_has_split_vote(raft))
+		return;
+	assert(raft_ev_is_active(&raft->timer));
+	if (raft->timer.at < raft->election_timeout)
+		return;
+
+	assert(raft->state == RAFT_STATE_FOLLOWER ||
+	       raft->state == RAFT_STATE_CANDIDATE);
+	struct ev_loop *loop = raft_loop();
+	struct ev_timer *timer = &raft->timer;
+	double delay = raft_new_random_election_shift(raft);
+	/*
+	 * Could be too late to speed up anything - probably the term is almost
+	 * over anyway.
+	 */
+	double remaining = raft_ev_timer_remaining(loop, timer);
+	if (delay >= remaining)
+		delay = remaining;
+	say_info("RAFT: split vote is discovered - %s, new term in %lf sec",
+		 raft_scores_str(raft), delay);
+	raft_ev_timer_stop(loop, timer);
+	raft_ev_timer_set(timer, delay, delay);
+	raft_ev_timer_start(loop, timer);
+}
+
 void
 raft_create(struct raft *raft, const struct raft_vtab *vtab)
 {
@@ -1053,6 +1150,7 @@ raft_create(struct raft *raft, const struct raft_vtab *vtab)
 		.election_quorum = 1,
 		.election_timeout = 5,
 		.death_timeout = 5,
+		.cluster_size = VCLOCK_MAX,
 		.vtab = vtab,
 	};
 	raft_ev_timer_init(&raft->timer, raft_sm_schedule_new_election_cb,
diff --git a/src/lib/raft/raft.h b/src/lib/raft/raft.h
index 4fbfaa91a..33597d153 100644
--- a/src/lib/raft/raft.h
+++ b/src/lib/raft/raft.h
@@ -219,6 +219,8 @@ struct raft {
 	 * elections can be started.
 	 */
 	double death_timeout;
+	/** Number of instances registered in the cluster. */
+	int cluster_size;
 	/** Virtual table to perform application-specific actions. */
 	const struct raft_vtab *vtab;
 	/**
@@ -333,6 +335,10 @@ raft_cfg_instance_id(struct raft *raft, uint32_t instance_id);
 void
 raft_cfg_vclock(struct raft *raft, const struct vclock *vclock);
 
+/** Configure number of registered instances in the cluster. */
+void
+raft_cfg_cluster_size(struct raft *raft, int size);
+
 /**
  * Bump the term. When it is persisted, the node checks if there is a leader,
  * and if there is not, a new election is started. That said, this function can
diff --git a/test/unit/raft.c b/test/unit/raft.c
index 9f18f4d8e..9b462f755 100644
--- a/test/unit/raft.c
+++ b/test/unit/raft.c
@@ -1497,10 +1497,214 @@ raft_test_promote_restore(void)
 	raft_finish_test();
 }
 
+static void
+raft_test_split_vote(void)
+{
+	raft_start_test(35);
+	struct raft_node node;
+	raft_node_create(&node);
+
+	/* Normal split vote. */
+
+	raft_node_cfg_cluster_size(&node, 4);
+	raft_node_cfg_election_quorum(&node, 3);
+	raft_run_next_event();
+
+	ok(raft_node_check_full_state(&node,
+		RAFT_STATE_CANDIDATE /* State. */,
+		0 /* Leader. */,
+		2 /* Term. */,
+		1 /* Vote. */,
+		2 /* Volatile term. */,
+		1 /* Volatile vote. */,
+		"{0: 1}" /* Vclock. */
+	), "elections with a new term");
+
+	/* Make so node 1 has votes 1 and 2. Node 3 has votes 3 and 4. */
+	is(raft_node_send_vote_response(&node,
+		2 /* Term. */,
+		1 /* Vote. */,
+		2 /* Source. */
+	), 0, "vote response for 1 from 2");
+	is(raft_node_send_vote_response(&node,
+		2 /* Term. */,
+		3 /* Vote. */,
+		3 /* Source. */
+	), 0, "vote response for 3 from 3");
+
+	ok(node.raft.timer.at >= node.raft.election_timeout, "term timeout >= "
+	   "election timeout normally");
+
+	is(raft_node_send_vote_response(&node,
+		2 /* Term. */,
+		3 /* Vote. */,
+		4 /* Source. */
+	), 0, "vote response for 3 from 4");
+
+	ok(node.raft.timer.at < node.raft.election_timeout, "split vote "
+	   "reduced the term timeout");
+
+	raft_run_next_event();
+	ok(raft_node_check_full_state(&node,
+		RAFT_STATE_CANDIDATE /* State. */,
+		0 /* Leader. */,
+		3 /* Term. */,
+		1 /* Vote. */,
+		3 /* Volatile term. */,
+		1 /* Volatile vote. */,
+		"{0: 2}" /* Vclock. */
+	), "a new term");
+
+	ok(node.raft.timer.at >= node.raft.election_timeout, "timeout is "
+	   "normal again");
+
+	/*
+	 * Cluster size change can make split vote.
+	 */
+	raft_node_destroy(&node);
+	raft_node_create(&node);
+	raft_node_cfg_cluster_size(&node, 3);
+	raft_node_cfg_election_quorum(&node, 2);
+	raft_run_next_event();
+	is(node.raft.state, RAFT_STATE_CANDIDATE, "is candidate");
+	is(node.raft.vote, 1, "voted for self");
+
+	is(raft_node_send_vote_response(&node,
+		2 /* Term. */,
+		2 /* Vote. */,
+		2 /* Source. */
+	), 0, "vote response for 2 from 2");
+	ok(node.raft.timer.at >= node.raft.election_timeout, "the vote is not "
+	   "split yet");
+
+	raft_node_cfg_cluster_size(&node, 2);
+	ok(node.raft.timer.at < node.raft.election_timeout, "cluster size "
+	   "change makes split vote");
+
+	raft_run_next_event();
+	ok(raft_node_check_full_state(&node,
+		RAFT_STATE_CANDIDATE /* State. */,
+		0 /* Leader. */,
+		3 /* Term. */,
+		1 /* Vote. */,
+		3 /* Volatile term. */,
+		1 /* Volatile vote. */,
+		"{0: 2}" /* Vclock. */
+	), "a new term");
+
+	/*
+	 * Split vote can happen even when the leader is already known - then
+	 * there is nothing to do. Votes are just left from the beginning of the
+	 * term and then probably the cluster size was reduced a bit.
+	 */
+	raft_node_destroy(&node);
+	raft_node_create(&node);
+	raft_node_cfg_cluster_size(&node, 3);
+	raft_node_cfg_election_quorum(&node, 2);
+	is(raft_node_send_vote_response(&node,
+		2 /* Term. */,
+		2 /* Vote. */,
+		2 /* Source. */
+	), 0, "vote response for 2 from 2");
+	/* There is also a vote from 3 for 2. But it wasn't delivered to 1. */
+	is(raft_node_send_leader(&node,
+		2 /* Term. */,
+		2 /* Source. */
+	), 0, "message is accepted");
+	is(node.raft.leader, 2, "other node's leadership is accepted");
+	is(raft_vote_count(&node.raft), 1, "the only own vote was from self");
+
+	raft_node_cfg_cluster_size(&node, 2);
+	ok(node.raft.timer.at >= node.raft.death_timeout, "cluster change does "
+	   "not affect the leader's death waiting");
+
+	/*
+	 * Non-candidate should ignore split vote.
+	 */
+	raft_node_destroy(&node);
+	raft_node_create(&node);
+	raft_node_cfg_cluster_size(&node, 3);
+	raft_node_cfg_election_quorum(&node, 3);
+	raft_node_cfg_is_candidate(&node, false);
+
+	is(raft_node_send_vote_response(&node,
+		2 /* Term. */,
+		2 /* Vote. */,
+		2 /* Source. */
+	), 0, "vote response for 2 from 2");
+	is(raft_node_send_vote_response(&node,
+		2 /* Term. */,
+		3 /* Vote. */,
+		3 /* Source. */
+	), 0, "vote response for 3 from 3");
+
+	ok(!raft_ev_is_active(&node.raft.timer), "voter couldn't schedule "
+	    "new term");
+
+	/*
+	 * Split vote can get worse, but it shouldn't lead to new term delay
+	 * restart.
+	 */
+	raft_node_destroy(&node);
+	raft_node_create(&node);
+	raft_node_cfg_cluster_size(&node, 3);
+	raft_node_cfg_election_quorum(&node, 3);
+
+	raft_run_next_event();
+	is(node.raft.term, 2, "bump term");
+	is(node.raft.vote, 1, "vote for self");
+	is(raft_node_send_vote_response(&node,
+		2 /* Term. */,
+		2 /* Vote. */,
+		2 /* Source. */
+	), 0, "vote response for 2 from 2");
+
+	double delay = node.raft.timer.at;
+	ok(delay < node.raft.election_timeout, "split vote is already "
+	   "inevitable");
+
+	is(raft_node_send_vote_response(&node,
+		2 /* Term. */,
+		3 /* Vote. */,
+		3 /* Source. */
+	), 0, "vote response for 3 from 3");
+
+	is(delay, node.raft.timer.at, "split vote got worse, but delay didn't "
+	   "change");
+
+	/*
+	 * Handle split vote when WAL write is in progress.
+	 */
+	raft_node_destroy(&node);
+	raft_node_create(&node);
+	raft_node_cfg_cluster_size(&node, 2);
+	raft_node_cfg_election_quorum(&node, 2);
+
+	raft_node_block(&node);
+	raft_run_next_event();
+	is(node.raft.term, 1, "old term");
+	is(node.raft.vote, 0, "unused vote");
+	is(node.raft.volatile_term, 2, "new volatile term");
+	is(node.raft.volatile_vote, 1, "new volatile vote");
+	is(raft_node_send_vote_response(&node,
+		2 /* Term. */,
+		2 /* Vote. */,
+		2 /* Source. */
+	), 0, "vote response for 2 from 2");
+
+	raft_node_unblock(&node);
+	raft_run_next_event();
+	is(node.raft.term, 3, "bump term");
+	is(node.raft.vote, 1, "vote for self");
+
+	raft_node_destroy(&node);
+	raft_finish_test();
+}
+
 static int
 main_f(va_list ap)
 {
-	raft_start_test(14);
+	raft_start_test(15);
 
 	(void) ap;
 	fakeev_init();
@@ -1519,6 +1723,7 @@ main_f(va_list ap)
 	raft_test_enable_disable();
 	raft_test_too_long_wal_write();
 	raft_test_promote_restore();
+	raft_test_split_vote();
 
 	fakeev_free();
 
diff --git a/test/unit/raft.result b/test/unit/raft.result
index 3fcdf8180..d12aae710 100644
--- a/test/unit/raft.result
+++ b/test/unit/raft.result
@@ -1,5 +1,5 @@
 	*** main_f ***
-1..14
+1..15
 	*** raft_test_leader_election ***
     1..24
     ok 1 - 1 pending message at start
@@ -262,4 +262,43 @@ ok 13 - subtests
     ok 12 - not a candidate
 ok 14 - subtests
 	*** raft_test_promote_restore: done ***
+	*** raft_test_split_vote ***
+    1..35
+    ok 1 - elections with a new term
+    ok 2 - vote response for 1 from 2
+    ok 3 - vote response for 3 from 3
+    ok 4 - term timeout >= election timeout normally
+    ok 5 - vote response for 3 from 4
+    ok 6 - split vote reduced the term timeout
+    ok 7 - a new term
+    ok 8 - timeout is normal again
+    ok 9 - is candidate
+    ok 10 - voted for self
+    ok 11 - vote response for 2 from 2
+    ok 12 - the vote is not split yet
+    ok 13 - cluster size change makes split vote
+    ok 14 - a new term
+    ok 15 - vote response for 2 from 2
+    ok 16 - message is accepted
+    ok 17 - other node's leadership is accepted
+    ok 18 - the only own vote was from self
+    ok 19 - cluster change does not affect the leader's death waiting
+    ok 20 - vote response for 2 from 2
+    ok 21 - vote response for 3 from 3
+    ok 22 - voter couldn't schedule new term
+    ok 23 - bump term
+    ok 24 - vote for self
+    ok 25 - vote response for 2 from 2
+    ok 26 - split vote is already inevitable
+    ok 27 - vote response for 3 from 3
+    ok 28 - split vote got worse, but delay didn't change
+    ok 29 - old term
+    ok 30 - unused vote
+    ok 31 - new volatile term
+    ok 32 - new volatile vote
+    ok 33 - vote response for 2 from 2
+    ok 34 - bump term
+    ok 35 - vote for self
+ok 15 - subtests
+	*** raft_test_split_vote: done ***
 	*** main_f: done ***
diff --git a/test/unit/raft_test_utils.c b/test/unit/raft_test_utils.c
index b5754cf78..1d7dd01ba 100644
--- a/test/unit/raft_test_utils.c
+++ b/test/unit/raft_test_utils.c
@@ -194,6 +194,7 @@ raft_node_create(struct raft_node *node)
 	node->cfg_election_quorum = 3;
 	node->cfg_death_timeout = 5;
 	node->cfg_instance_id = 1;
+	node->cfg_cluster_size = VCLOCK_MAX;
 	node->cfg_vclock = &node->journal.vclock;
 	raft_journal_create(&node->journal, node->cfg_instance_id);
 	raft_node_start(node);
@@ -365,6 +366,7 @@ raft_node_start(struct raft_node *node)
 	raft_cfg_election_quorum(&node->raft, node->cfg_election_quorum);
 	raft_cfg_death_timeout(&node->raft, node->cfg_death_timeout);
 	raft_cfg_instance_id(&node->raft, node->cfg_instance_id);
+	raft_cfg_cluster_size(&node->raft, node->cfg_cluster_size);
 	raft_cfg_vclock(&node->raft, node->cfg_vclock);
 	raft_run_async_work();
 }
@@ -423,6 +425,16 @@ raft_node_cfg_is_candidate(struct raft_node *node, bool value)
 	}
 }
 
+void
+raft_node_cfg_cluster_size(struct raft_node *node, int value)
+{
+	node->cfg_cluster_size = value;
+	if (raft_node_is_started(node)) {
+		raft_cfg_cluster_size(&node->raft, value);
+		raft_run_async_work();
+	}
+}
+
 void
 raft_node_cfg_election_timeout(struct raft_node *node, double value)
 {
diff --git a/test/unit/raft_test_utils.h b/test/unit/raft_test_utils.h
index c68dc3b22..2138a829e 100644
--- a/test/unit/raft_test_utils.h
+++ b/test/unit/raft_test_utils.h
@@ -32,6 +32,7 @@
 #include "fakesys/fakeev.h"
 #include "fiber.h"
 #include "raft/raft.h"
+#include "raft/raft_ev.h"
 #include "unit.h"
 
 /** WAL simulation. It stores a list of rows which raft wanted to persist. */
@@ -105,6 +106,7 @@ struct raft_node {
 	int cfg_election_quorum;
 	double cfg_death_timeout;
 	uint32_t cfg_instance_id;
+	int cfg_cluster_size;
 	struct vclock *cfg_vclock;
 };
 
@@ -227,6 +229,9 @@ raft_node_cfg_is_enabled(struct raft_node *node, bool value);
 void
 raft_node_cfg_is_candidate(struct raft_node *node, bool value);
 
+void
+raft_node_cfg_cluster_size(struct raft_node *node, int value);
+
 void
 raft_node_cfg_election_timeout(struct raft_node *node, double value);
 
-- 
2.24.3 (Apple Git-128)



* [Tarantool-patches] [PATCH 4/4] election: activate raft split vote handling
  2022-01-15  0:48 [Tarantool-patches] [PATCH 0/4] Split vote Vladislav Shpilevoy via Tarantool-patches
                   ` (2 preceding siblings ...)
  2022-01-15  0:48 ` [Tarantool-patches] [PATCH 3/4] raft: introduce split vote detection Vladislav Shpilevoy via Tarantool-patches
@ 2022-01-15  0:48 ` Vladislav Shpilevoy via Tarantool-patches
  2022-01-16 14:10 ` [Tarantool-patches] [PATCH 0/4] Split vote Konstantin Osipov via Tarantool-patches
  4 siblings, 0 replies; 6+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2022-01-15  0:48 UTC (permalink / raw)
  To: tarantool-patches, sergepetrenko

Raft needs to know the cluster size in order to detect and handle a
split vote. The patch uses the registered server count as the cluster
size.

This is neither documented nor given a changelog entry, because it is
an optimization: it cannot be observed except in the logs or with a
stopwatch.

Closes #5285
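
The wiring is a one-liner in the quorum update helper; a sketch
matching the diff below, with the quorum computation itself elided:

static void
box_raft_update_election_quorum(void)
{
	/* ... the quorum itself is computed as before ... */
	int quorum = MIN(replication_synchro_quorum, max);
	struct raft *raft = box_raft();
	raft_cfg_election_quorum(raft, quorum);
	/* Tell raft how many registered instances can vote. */
	raft_cfg_cluster_size(raft, replicaset.registered_count);
}
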
---
 src/box/raft.c                                |  4 +-
 .../election_split_vote_test.lua              | 92 +++++++++++++++++++
 2 files changed, 95 insertions(+), 1 deletion(-)
 create mode 100644 test/replication-luatest/election_split_vote_test.lua

diff --git a/src/box/raft.c b/src/box/raft.c
index 1e360dc88..1908b71b6 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -229,7 +229,9 @@ box_raft_update_election_quorum(void)
 	 *   be lost.
 	 */
 	int quorum = MIN(replication_synchro_quorum, max);
-	raft_cfg_election_quorum(box_raft(), quorum);
+	struct raft *raft = box_raft();
+	raft_cfg_election_quorum(raft, quorum);
+	raft_cfg_cluster_size(raft, replicaset.registered_count);
 }
 
 void
diff --git a/test/replication-luatest/election_split_vote_test.lua b/test/replication-luatest/election_split_vote_test.lua
new file mode 100644
index 000000000..f31bfd7f3
--- /dev/null
+++ b/test/replication-luatest/election_split_vote_test.lua
@@ -0,0 +1,92 @@
+local t = require('luatest')
+local cluster = require('test.luatest_helpers.cluster')
+local helpers = require('test.luatest_helpers')
+local wait_timeout = 120
+
+--
+-- gh-5285: a split vote is when in the current term there can't be a winner of
+-- the leader role. The number of unused votes is not enough for anyone to get
+-- the quorum. It can be detected to speed up the term bump.
+--
+local g = t.group('split-vote')
+
+g.before_each(function()
+    g.cluster = cluster:new({})
+    local node1_uri = helpers.instance_uri('node1')
+    local node2_uri = helpers.instance_uri('node2')
+    local replication = {node1_uri, node2_uri}
+    local box_cfg = {
+        listen = node1_uri,
+        replication = replication,
+        -- To speed up a new term when trying to elect the first leader.
+        replication_timeout = 0.1,
+        replication_synchro_quorum = 2,
+        election_timeout = 1000000,
+    }
+    g.node1 = g.cluster:build_server({alias = 'node1', box_cfg = box_cfg})
+
+    box_cfg.listen = node2_uri
+    g.node2 = g.cluster:build_server({alias = 'node2', box_cfg = box_cfg})
+
+    g.cluster:add_server(g.node1)
+    g.cluster:add_server(g.node2)
+    g.cluster:start()
+end)
+
+g.after_each(function()
+    g.cluster:drop()
+end)
+
+g.test_split_vote = function(g)
+    -- Stop the replication so that the nodes can't request votes from each other.
+    local node1_repl = g.node1:exec(function()
+        local repl = box.cfg.replication
+        box.cfg{replication = {}}
+        return repl
+    end)
+    local node2_repl = g.node2:exec(function()
+        local repl = box.cfg.replication
+        box.cfg{replication = {}}
+        return repl
+    end)
+
+    -- Both vote for self but don't see the split-vote yet.
+    g.node1:exec(function()
+        box.cfg{election_mode = 'candidate'}
+    end)
+    g.node2:exec(function()
+        box.cfg{election_mode = 'candidate'}
+    end)
+
+    -- Wait for the votes to actually happen.
+    t.helpers.retrying({timeout = wait_timeout}, function()
+        local func = function()
+            return box.info.election.vote == box.info.id
+        end
+        assert(g.node1:exec(func))
+        assert(g.node2:exec(func))
+    end)
+
+    -- Now let the nodes notice the split vote.
+    g.node1:exec(function(repl)
+        box.cfg{replication = repl}
+    end, {node1_repl})
+    g.node2:exec(function(repl)
+        box.cfg{replication = repl}
+    end, {node2_repl})
+
+    t.helpers.retrying({timeout = wait_timeout}, function()
+        local msg = 'split vote is discovered'
+        assert(g.node1:grep_log(msg) or g.node2:grep_log(msg))
+    end)
+
+    -- Ensure a leader is eventually elected. Nothing is broken for good.
+    g.node1:exec(function()
+        box.cfg{election_timeout = 1}
+    end)
+    g.node2:exec(function()
+        box.cfg{election_timeout = 1}
+    end)
+    g.node1:wait_election_leader_found()
+    g.node2:wait_election_leader_found()
+end
-- 
2.24.3 (Apple Git-128)



* Re: [Tarantool-patches] [PATCH 0/4] Split vote
  2022-01-15  0:48 [Tarantool-patches] [PATCH 0/4] Split vote Vladislav Shpilevoy via Tarantool-patches
                   ` (3 preceding siblings ...)
  2022-01-15  0:48 ` [Tarantool-patches] [PATCH 4/4] election: activate raft split vote handling Vladislav Shpilevoy via Tarantool-patches
@ 2022-01-16 14:10 ` Konstantin Osipov via Tarantool-patches
  4 siblings, 0 replies; 6+ messages in thread
From: Konstantin Osipov via Tarantool-patches @ 2022-01-16 14:10 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

* Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org> [22/01/15 03:52]:
> Split vote handling in Raft, its usage in storage, and a bug fix found while
> working on this in the first commit.
> 
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5285-raft-split-vote
> Issue: https://github.com/tarantool/tarantool/issues/5285

While I don't mind this patch, it is an unnecessary complication on
top of the spec. To prevent the deteriorating effects of split vote
on liveness one needs pre-voting, not split-vote detection.

> Vladislav Shpilevoy (4):
>   raft: fix crash on election_timeout reconfig
>   raft: track all votes, even not own
>   raft: introduce split vote detection
>   election: activate raft split vote handling

-- 
Konstantin Osipov, Moscow, Russia
