From: Serge Petrenko <sergepetrenko@tarantool.org>
To: kyukhin@tarantool.org, v.shpilevoy@tarantool.org
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] raft: fix an assertion failure on transition to voter
Date: Thu, 22 Oct 2020 14:05:07 +0300
Message-ID: <dcbe07ee-57f2-511d-1ed4-5776dc6aee14@tarantool.org>
In-Reply-To: <20201022104456.51722-1-sergepetrenko@tarantool.org>

22.10.2020 13:44, Serge Petrenko wrote:
> When an instance is configured as candidate, it has a leader death timer
> ticking constantly to schedule an election as soon as leader disappears.
> When the instance receives the leader's heartbeat, it resets the timer
> to its initial value.
>
> When being a voter, the instance ignores heartbeats, since it has
> nothing to wait for. So its timer must be stopped. Otherwise it'll try
> to schedule a new election and fail.
>
> Stop the timer on transition from candidate to voter.

I've added a small test. It crashes on master and passes on this patch.
The diff's below.

diff --git a/test/replication/gh-5426-election-on-off.result b/test/replication/gh-5426-election-on-off.result
index 1abfb9154..7444ef7f2 100644
--- a/test/replication/gh-5426-election-on-off.result
+++ b/test/replication/gh-5426-election-on-off.result
@@ -111,6 +111,45 @@ test_run:wait_cond(function() return box.info.election.leader == 0 end)
  | - true
  | ...
 
+-- A crash when follower transitions from candidate to voter.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{election_mode='candidate'}
+ | ---
+ | ...
+test_run:wait_cond(function() return box.info.election.state == 'leader' end)
+ | ---
+ | - true
+ | ...
+box.cfg{replication_timeout=0.01}
+ | ---
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+-- A small timeout so that the timer goes off faster and the crash happens.
+box.cfg{replication_timeout=0.01}
+ | ---
+ | ...
+test_run:wait_cond(function() return box.info.election.leader ~= 0 end)
+ | ---
+ | - true
+ | ...
+box.cfg{election_mode='candidate'}
+ | ---
+ | ...
+box.cfg{election_mode='voter'}
+ | ---
+ | ...
+-- Wait for the timer to go off.
+require('fiber').sleep(4 * box.cfg.replication_timeout)
+ | ---
+ | ...
+
 test_run:switch('default')
  | ---
  | - true
diff --git a/test/replication/gh-5426-election-on-off.test.lua b/test/replication/gh-5426-election-on-off.test.lua
index d6b980d0a..bdf06903b 100644
--- a/test/replication/gh-5426-election-on-off.test.lua
+++ b/test/replication/gh-5426-election-on-off.test.lua
@@ -47,6 +47,21 @@ box.cfg{election_mode = 'off'}
 test_run:switch('replica')
 test_run:wait_cond(function() return box.info.election.leader == 0 end)
 
+-- A crash when follower transitions from candidate to voter.
+test_run:switch('default')
+box.cfg{election_mode='candidate'}
+test_run:wait_cond(function() return box.info.election.state == 'leader' end)
+box.cfg{replication_timeout=0.01}
+
+test_run:switch('replica')
+-- A small timeout so that the timer goes off faster and the crash happens.
+box.cfg{replication_timeout=0.01}
+test_run:wait_cond(function() return box.info.election.leader ~= 0 end)
+box.cfg{election_mode='candidate'}
+box.cfg{election_mode='voter'}
+-- Wait for the timer to go off.
+require('fiber').sleep(4 * box.cfg.replication_timeout)
+
 test_run:switch('default')
 test_run:cmd('stop server replica')
 test_run:cmd('delete server replica')

> ---
> https://github.com/tarantool/tarantool/tree/sp/raft-crash-on-voter
>
>  src/box/raft.c | 18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/src/box/raft.c b/src/box/raft.c
> index b70f47006..4a8e54cac 100644
> --- a/src/box/raft.c
> +++ b/src/box/raft.c
> @@ -952,12 +952,18 @@ raft_cfg_is_candidate(bool is_candidate)
>  			 * until the new state is fully persisted.
>  			 */
>  		}
> -	} else if (raft.state != RAFT_STATE_FOLLOWER) {
> -		if (raft.state == RAFT_STATE_LEADER)
> -			raft.leader = 0;
> -		raft.state = RAFT_STATE_FOLLOWER;
> -		/* State is visible and changed - broadcast. */
> -		raft_schedule_broadcast();
> +	} else {
> +		if (raft.state != RAFT_STATE_LEADER) {
> +			/* Do not wait for anything while being a voter. */
> +			ev_timer_stop(loop(), &raft.timer);
> +		}
> +		if (raft.state != RAFT_STATE_FOLLOWER) {
> +			if (raft.state == RAFT_STATE_LEADER)
> +				raft.leader = 0;
> +			raft.state = RAFT_STATE_FOLLOWER;
> +			/* State is visible and changed - broadcast. */
> +			raft_schedule_broadcast();
> +		}
>  	}
>  	box_update_ro_summary();
>  }

-- 
Serge Petrenko
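
For readers following the thread, here is a minimal sketch of why the old branch left the timer running. It is a toy model, not the code from src/box/raft.c: the `toy_*` names, the `timer_active` flag (standing in for the libev timer) and the `stop_timer` switch are all hypothetical and exist only for illustration. The point it tries to show is that a follower configured as candidate is already in the follower state, so the old `else if (state != FOLLOWER)` branch was skipped entirely and nothing stopped the timer.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these are not Tarantool's real types or functions. */
enum toy_state { TOY_FOLLOWER, TOY_CANDIDATE, TOY_LEADER };

struct toy_raft {
	enum toy_state state;
	/* True when election_mode is 'candidate'. */
	bool is_candidate;
	/* Stands in for the leader death timer being armed. */
	bool timer_active;
	/* Known leader id, 0 when there is none. */
	int leader;
};

/*
 * Reconfigure the instance as a voter.
 * stop_timer == false mimics the behavior before the patch,
 * stop_timer == true mimics the behavior after it.
 */
static void
toy_cfg_voter(struct toy_raft *r, bool stop_timer)
{
	r->is_candidate = false;
	if (stop_timer && r->state != TOY_LEADER) {
		/* A voter has nothing to wait for. */
		r->timer_active = false;
	}
	if (r->state != TOY_FOLLOWER) {
		if (r->state == TOY_LEADER)
			r->leader = 0;
		r->state = TOY_FOLLOWER;
	}
}

/*
 * True when the timer is still armed on an instance that is no
 * longer allowed to start elections, i.e. the situation the patch
 * is meant to rule out.
 */
static bool
toy_timer_would_misfire(const struct toy_raft *r)
{
	return r->timer_active && !r->is_candidate;
}
```

Starting from a follower with `is_candidate` and `timer_active` both set, calling `toy_cfg_voter(&r, false)` leaves the timer armed, while `toy_cfg_voter(&r, true)` disarms it. This mirrors the candidate-then-voter sequence on the replica in the test above.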