Tarantool development patches archive
From: Serge Petrenko <sergepetrenko@tarantool.org>
To: kyukhin@tarantool.org, v.shpilevoy@tarantool.org
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] raft: fix an assertion failure on transition to voter
Date: Thu, 22 Oct 2020 14:05:07 +0300
Message-ID: <dcbe07ee-57f2-511d-1ed4-5776dc6aee14@tarantool.org>
In-Reply-To: <20201022104456.51722-1-sergepetrenko@tarantool.org>


On 22.10.2020 13:44, Serge Petrenko wrote:
> When an instance is configured as a candidate, it has a leader death timer
> ticking constantly to schedule an election as soon as the leader disappears.
> When the instance receives the leader's heartbeat, it resets the timer
> to its initial value.
>
> As a voter, the instance ignores heartbeats, since it has nothing to
> wait for, so its timer must be stopped. Otherwise the timer goes off and
> the instance tries to schedule a new election and fails an assertion.
>
> Stop the timer on transition from candidate to voter.
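
To illustrate the failure mode described above, here is a minimal
standalone sketch. It is a toy model, not the code from src/box/raft.c:
the struct toy_raft and all toy_* functions are hypothetical names, the
timer is reduced to a boolean flag, and the assertion is modelled as a
false return value. It also ignores the leader state, which the real
patch treats separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the election state; not Tarantool's struct raft. */
struct toy_raft {
	bool is_candidate;	/* election_mode == 'candidate' */
	bool timer_active;	/* leader death timer is armed */
	int elections_started;	/* elections scheduled by the timer */
};

/* Becoming a candidate arms the leader death timer. */
static void
toy_cfg_candidate(struct toy_raft *r)
{
	assert(r != NULL);
	r->is_candidate = true;
	r->timer_active = true;
}

/*
 * Becoming a voter. With stop_timer == false the timer keeps ticking,
 * which mimics the behaviour before the patch.
 */
static void
toy_cfg_voter(struct toy_raft *r, bool stop_timer)
{
	assert(r != NULL);
	r->is_candidate = false;
	if (stop_timer)
		r->timer_active = false;
}

/*
 * Timer expiry. Returns false when the callback would run on a
 * non-candidate, i.e. the situation the patch is meant to prevent.
 */
static bool
toy_timer_fire(struct toy_raft *r)
{
	assert(r != NULL);
	if (!r->timer_active)
		return true;	/* nothing armed, nothing to do */
	if (!r->is_candidate)
		return false;	/* a voter got a leader death timeout */
	r->elections_started++;
	return true;
}
```

With a zero-initialized struct toy_raft, calling toy_cfg_candidate() and
then toy_cfg_voter(r, false) makes toy_timer_fire() return false, while
toy_cfg_voter(r, true) makes it return true without starting an election.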


I've added a small test. It crashes on master and passes with this patch applied.

The diff's below.

diff --git a/test/replication/gh-5426-election-on-off.result b/test/replication/gh-5426-election-on-off.result
index 1abfb9154..7444ef7f2 100644
--- a/test/replication/gh-5426-election-on-off.result
+++ b/test/replication/gh-5426-election-on-off.result
@@ -111,6 +111,45 @@ test_run:wait_cond(function() return box.info.election.leader == 0 end)
  | - true
  | ...

+-- A crash when follower transitions from candidate to voter.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{election_mode='candidate'}
+ | ---
+ | ...
+test_run:wait_cond(function() return box.info.election.state == 'leader' end)
+ | ---
+ | - true
+ | ...
+box.cfg{replication_timeout=0.01}
+ | ---
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+-- A small timeout so that the timer goes off faster and the crash happens.
+box.cfg{replication_timeout=0.01}
+ | ---
+ | ...
+test_run:wait_cond(function() return box.info.election.leader ~= 0 end)
+ | ---
+ | - true
+ | ...
+box.cfg{election_mode='candidate'}
+ | ---
+ | ...
+box.cfg{election_mode='voter'}
+ | ---
+ | ...
+-- Wait for the timer to go off.
+require('fiber').sleep(4 * box.cfg.replication_timeout)
+ | ---
+ | ...
+
 test_run:switch('default')
  | ---
  | - true
diff --git a/test/replication/gh-5426-election-on-off.test.lua b/test/replication/gh-5426-election-on-off.test.lua
index d6b980d0a..bdf06903b 100644
--- a/test/replication/gh-5426-election-on-off.test.lua
+++ b/test/replication/gh-5426-election-on-off.test.lua
@@ -47,6 +47,21 @@ box.cfg{election_mode = 'off'}
 test_run:switch('replica')
 test_run:wait_cond(function() return box.info.election.leader == 0 end)

+-- A crash when follower transitions from candidate to voter.
+test_run:switch('default')
+box.cfg{election_mode='candidate'}
+test_run:wait_cond(function() return box.info.election.state == 'leader' end)
+box.cfg{replication_timeout=0.01}
+
+test_run:switch('replica')
+-- A small timeout so that the timer goes off faster and the crash happens.
+box.cfg{replication_timeout=0.01}
+test_run:wait_cond(function() return box.info.election.leader ~= 0 end)
+box.cfg{election_mode='candidate'}
+box.cfg{election_mode='voter'}
+-- Wait for the timer to go off.
+require('fiber').sleep(4 * box.cfg.replication_timeout)
+
 test_run:switch('default')
 test_run:cmd('stop server replica')
 test_run:cmd('delete server replica')

> ---
> https://github.com/tarantool/tarantool/tree/sp/raft-crash-on-voter
>
>   src/box/raft.c | 18 ++++++++++++------
>   1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/src/box/raft.c b/src/box/raft.c
> index b70f47006..4a8e54cac 100644
> --- a/src/box/raft.c
> +++ b/src/box/raft.c
> @@ -952,12 +952,18 @@ raft_cfg_is_candidate(bool is_candidate)
>   			 * until the new state is fully persisted.
>   			 */
>   		}
> -	} else if (raft.state != RAFT_STATE_FOLLOWER) {
> -		if (raft.state == RAFT_STATE_LEADER)
> -			raft.leader = 0;
> -		raft.state = RAFT_STATE_FOLLOWER;
> -		/* State is visible and changed - broadcast. */
> -		raft_schedule_broadcast();
> +	} else {
> +		if (raft.state != RAFT_STATE_LEADER) {
> +			/* Do not wait for anything while being a voter. */
> +			ev_timer_stop(loop(), &raft.timer);
> +		}
> +		if (raft.state != RAFT_STATE_FOLLOWER) {
> +			if (raft.state == RAFT_STATE_LEADER)
> +				raft.leader = 0;
> +			raft.state = RAFT_STATE_FOLLOWER;
> +			/* State is visible and changed - broadcast. */
> +			raft_schedule_broadcast();
> +		}
>   	}
>   	box_update_ro_summary();
>   }

-- 
Serge Petrenko
