Tarantool development patches archive
* [Tarantool-patches] [PATCH] raft: fix an assertion failure on transition to voter
@ 2020-10-22 10:44 Serge Petrenko
From: Serge Petrenko @ 2020-10-22 10:44 UTC (permalink / raw)
  To: kyukhin, v.shpilevoy; +Cc: tarantool-patches

When an instance is configured as a candidate, it has a leader death timer
ticking constantly, so that it can schedule an election as soon as the
leader disappears. Whenever the instance receives a heartbeat from the
leader, it resets the timer to its initial value.

A voter ignores heartbeats, since it has nothing to wait for, so its
timer must be stopped. Otherwise, when the timer fires, the instance
tries to schedule a new election and fails an assertion.

Stop the timer on transition from candidate to voter.
---
https://github.com/tarantool/tarantool/tree/sp/raft-crash-on-voter

 src/box/raft.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/src/box/raft.c b/src/box/raft.c
index b70f47006..4a8e54cac 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -952,12 +952,18 @@ raft_cfg_is_candidate(bool is_candidate)
 			 * until the new state is fully persisted.
 			 */
 		}
-	} else if (raft.state != RAFT_STATE_FOLLOWER) {
-		if (raft.state == RAFT_STATE_LEADER)
-			raft.leader = 0;
-		raft.state = RAFT_STATE_FOLLOWER;
-		/* State is visible and changed - broadcast. */
-		raft_schedule_broadcast();
+	} else {
+		if (raft.state != RAFT_STATE_LEADER) {
+			/* Do not wait for anything while being a voter. */
+			ev_timer_stop(loop(), &raft.timer);
+		}
+		if (raft.state != RAFT_STATE_FOLLOWER) {
+			if (raft.state == RAFT_STATE_LEADER)
+				raft.leader = 0;
+			raft.state = RAFT_STATE_FOLLOWER;
+			/* State is visible and changed - broadcast. */
+			raft_schedule_broadcast();
+		}
 	}
 	box_update_ro_summary();
 }
-- 
2.24.3 (Apple Git-128)


* Re: [Tarantool-patches] [PATCH] raft: fix an assertion failure on transition to voter
@ 2020-10-22 11:05 ` Serge Petrenko
From: Serge Petrenko @ 2020-10-22 11:05 UTC (permalink / raw)
  To: kyukhin, v.shpilevoy; +Cc: tarantool-patches


On 22.10.2020 13:44, Serge Petrenko wrote:
> When an instance is configured as a candidate, it has a leader death timer
> ticking constantly, so that it can schedule an election as soon as the
> leader disappears. Whenever the instance receives a heartbeat from the
> leader, it resets the timer to its initial value.
>
> A voter ignores heartbeats, since it has nothing to wait for, so its
> timer must be stopped. Otherwise, when the timer fires, the instance
> tries to schedule a new election and fails an assertion.
>
> Stop the timer on transition from candidate to voter.


I've added a small test. It crashes on master and passes with this patch.

The diff is below.

diff --git a/test/replication/gh-5426-election-on-off.result b/test/replication/gh-5426-election-on-off.result
index 1abfb9154..7444ef7f2 100644
--- a/test/replication/gh-5426-election-on-off.result
+++ b/test/replication/gh-5426-election-on-off.result
@@ -111,6 +111,45 @@ test_run:wait_cond(function() return box.info.election.leader == 0 end)
  | - true
  | ...

+-- A crash when follower transitions from candidate to voter.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{election_mode='candidate'}
+ | ---
+ | ...
+test_run:wait_cond(function() return box.info.election.state == 'leader' end)
+ | ---
+ | - true
+ | ...
+box.cfg{replication_timeout=0.01}
+ | ---
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+-- A small timeout so that the timer goes off faster and the crash happens.
+box.cfg{replication_timeout=0.01}
+ | ---
+ | ...
+test_run:wait_cond(function() return box.info.election.leader ~= 0 end)
+ | ---
+ | - true
+ | ...
+box.cfg{election_mode='candidate'}
+ | ---
+ | ...
+box.cfg{election_mode='voter'}
+ | ---
+ | ...
+-- Wait for the timer to go off.
+require('fiber').sleep(4 * box.cfg.replication_timeout)
+ | ---
+ | ...
+
 test_run:switch('default')
  | ---
  | - true
diff --git a/test/replication/gh-5426-election-on-off.test.lua b/test/replication/gh-5426-election-on-off.test.lua
index d6b980d0a..bdf06903b 100644
--- a/test/replication/gh-5426-election-on-off.test.lua
+++ b/test/replication/gh-5426-election-on-off.test.lua
@@ -47,6 +47,21 @@ box.cfg{election_mode = 'off'}
 test_run:switch('replica')
 test_run:wait_cond(function() return box.info.election.leader == 0 end)

+-- A crash when follower transitions from candidate to voter.
+test_run:switch('default')
+box.cfg{election_mode='candidate'}
+test_run:wait_cond(function() return box.info.election.state == 'leader' end)
+box.cfg{replication_timeout=0.01}
+
+test_run:switch('replica')
+-- A small timeout so that the timer goes off faster and the crash happens.
+box.cfg{replication_timeout=0.01}
+test_run:wait_cond(function() return box.info.election.leader ~= 0 end)
+box.cfg{election_mode='candidate'}
+box.cfg{election_mode='voter'}
+-- Wait for the timer to go off.
+require('fiber').sleep(4 * box.cfg.replication_timeout)
+
 test_run:switch('default')
 test_run:cmd('stop server replica')
 test_run:cmd('delete server replica')


-- 
Serge Petrenko
