From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org> To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>, tarantool-patches@dev.tarantool.org, gorcunov@gmail.com Subject: Re: [Tarantool-patches] [PATCH 1/1] raft: handle remote leader resign during WAL write Date: Thu, 10 Jun 2021 17:24:44 +0300 [thread overview] Message-ID: <b117eb18-54de-6dc5-928f-b75fb521410e@tarantool.org> (raw) In-Reply-To: <08a836b17506f5b6e9edfcaee36bbeea5754f917.1623098844.git.v.shpilevoy@tarantool.org> 07.06.2021 23:48, Vladislav Shpilevoy пишет: > If Raft state machine sees the current leader has explicitly > resigned from its role, it starts a new election round right away. > > But in the code starting a new round there was an assumption that > there is no a volatile state. There was, in fact. > > The patch makes the election start code use the volatile state to > bump the term. It should be safe, because the other nodes won't > receive it anyway until the new term is persisted. > > There was an alternative - do not schedule new election until the > current WAL write ends. It wasn't done, because would achieve the > same (the term would be bumped and persisted) but with bigger a > latency. > > Another reason is that if the leader would appear and resign > during WAL write on another candidate, in the end of its WAL write > the latter would see 0 leader and would think this term didn't > have one yet. And would try to elect self now, in the current > term. It makes little sense, because it won't win - the current > term had already had a leader and the majority of votes is > already taken. > > Closes #6129 Thanks for the patch! LGTM. > --- > Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-6129-raft-crash > Issue: https://github.com/tarantool/tarantool/issues/6129 > > .../gh-6129-raft-resign-during-wal.md | 7 ++ > src/lib/raft/raft.c | 8 +- > test/unit/raft.c | 80 ++++++++++++++++++- > test/unit/raft.result | 10 ++- > 4 files changed, 101 insertions(+), 4 deletions(-) > create mode 100644 changelogs/unreleased/gh-6129-raft-resign-during-wal.md > > diff --git a/changelogs/unreleased/gh-6129-raft-resign-during-wal.md b/changelogs/unreleased/gh-6129-raft-resign-during-wal.md > new file mode 100644 > index 000000000..eb6cf3984 > --- /dev/null > +++ b/changelogs/unreleased/gh-6129-raft-resign-during-wal.md > @@ -0,0 +1,7 @@ > +## bugfix/raft > + > +* Fixed a rare crash with the leader election enabled (any mode except `off`), > + which could happen if a leader resigned from its role at the same time as some > + other node was writing something related to the elections to WAL. The crash > + was in debug build and in the release build it would lead to undefined > + behaviour (gh-6129). > diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c > index 247c7a71a..eacdddb7e 100644 > --- a/src/lib/raft/raft.c > +++ b/src/lib/raft/raft.c > @@ -687,10 +687,9 @@ static void > raft_sm_schedule_new_election(struct raft *raft) > { > say_info("RAFT: begin new election round"); > - assert(raft_is_fully_on_disk(raft)); > assert(raft->is_candidate); > /* Everyone is a follower until its vote for self is persisted. */ > - raft_sm_schedule_new_term(raft, raft->term + 1); > + raft_sm_schedule_new_term(raft, raft->volatile_term + 1); > raft_sm_schedule_new_vote(raft, raft->self); > } > > @@ -701,6 +700,11 @@ raft_sm_schedule_new_election_cb(struct ev_loop *loop, struct ev_timer *timer, > (void)events; > struct raft *raft = timer->data; > assert(timer == &raft->timer); > + /* > + * Otherwise the timer would be stopped and the callback wouldn't be > + * invoked. > + */ > + assert(raft_is_fully_on_disk(raft)); > raft_ev_timer_stop(loop, timer); > raft_sm_schedule_new_election(raft); > } > diff --git a/test/unit/raft.c b/test/unit/raft.c > index 29e48da13..6369c42d3 100644 > --- a/test/unit/raft.c > +++ b/test/unit/raft.c > @@ -572,7 +572,7 @@ raft_test_vote_skip(void) > static void > raft_test_leader_resign(void) > { > - raft_start_test(15); > + raft_start_test(23); > struct raft_node node; > > /* > @@ -672,6 +672,84 @@ raft_test_leader_resign(void) > NULL /* Vclock. */ > ), "resign notification is sent"); > > + /* > + * gh-6129: resign of a remote leader during a local WAL write should > + * schedule a new election after the WAL write. > + * > + * Firstly start a new term. > + */ > + raft_node_block(&node); > + raft_node_cfg_is_candidate(&node, true); > + raft_run_next_event(); > + /* Volatile term is new, but the persistent one is not updated yet. */ > + ok(raft_node_check_full_state(&node, > + RAFT_STATE_FOLLOWER /* State. */, > + 0 /* Leader. */, > + 2 /* Term. */, > + 1 /* Vote. */, > + 3 /* Volatile term. */, > + 1 /* Volatile vote. */, > + "{0: 1}" /* Vclock. */ > + ), "new election is waiting for WAL write"); > + > + /* Now another node wins the election earlier. */ > + is(raft_node_send_leader(&node, > + 3 /* Term. */, > + 2 /* Source. */ > + ), 0, "message is accepted"); > + ok(raft_node_check_full_state(&node, > + RAFT_STATE_FOLLOWER /* State. */, > + 2 /* Leader. */, > + 2 /* Term. */, > + 1 /* Vote. */, > + 3 /* Volatile term. */, > + 1 /* Volatile vote. */, > + "{0: 1}" /* Vclock. */ > + ), "the leader is accepted"); > + > + /* > + * The leader resigns and triggers a new election round on the first > + * node. A new election is triggered, but still waiting for the previous > + * WAL write to end. > + */ > + is(raft_node_send_follower(&node, > + 3 /* Term. */, > + 2 /* Source. */ > + ), 0, "message is accepted"); > + /* Note how the volatile term is updated again. */ > + ok(raft_node_check_full_state(&node, > + RAFT_STATE_FOLLOWER /* State. */, > + 0 /* Leader. */, > + 2 /* Term. */, > + 1 /* Vote. */, > + 4 /* Volatile term. */, > + 1 /* Volatile vote. */, > + "{0: 1}" /* Vclock. */ > + ), "the leader has resigned, new election is scheduled"); > + raft_node_unblock(&node); > + > + /* Ensure the node still collects votes after the WAL write. */ > + is(raft_node_send_vote_response(&node, > + 4 /* Term. */, > + 1 /* Vote. */, > + 2 /* Source. */ > + ), 0, "vote from 2"); > + is(raft_node_send_vote_response(&node, > + 4 /* Term. */, > + 1 /* Vote. */, > + 3 /* Source. */ > + ), 0, "vote from 3"); > + raft_run_next_event(); > + ok(raft_node_check_full_state(&node, > + RAFT_STATE_LEADER /* State. */, > + 1 /* Leader. */, > + 4 /* Term. */, > + 1 /* Vote. */, > + 4 /* Volatile term. */, > + 1 /* Volatile vote. */, > + "{0: 2}" /* Vclock. */ > + ), "the leader is elected"); > + > raft_node_destroy(&node); > > raft_finish_test(); > diff --git a/test/unit/raft.result b/test/unit/raft.result > index 3a3dc5dd2..598a7e760 100644 > --- a/test/unit/raft.result > +++ b/test/unit/raft.result > @@ -111,7 +111,7 @@ ok 4 - subtests > ok 5 - subtests > *** raft_test_vote_skip: done *** > *** raft_test_leader_resign *** > - 1..15 > + 1..23 > ok 1 - message is accepted > ok 2 - leader is elected > ok 3 - message is accepted > @@ -127,6 +127,14 @@ ok 5 - subtests > ok 13 - became leader > ok 14 - the leader has resigned > ok 15 - resign notification is sent > + ok 16 - new election is waiting for WAL write > + ok 17 - message is accepted > + ok 18 - the leader is accepted > + ok 19 - message is accepted > + ok 20 - the leader has resigned, new election is scheduled > + ok 21 - vote from 2 > + ok 22 - vote from 3 > + ok 23 - the leader is elected > ok 6 - subtests > *** raft_test_leader_resign: done *** > *** raft_test_split_brain *** -- Serge Petrenko
next prev parent reply other threads:[~2021-06-10 14:24 UTC|newest] Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-06-07 20:48 Vladislav Shpilevoy via Tarantool-patches 2021-06-10 14:24 ` Serge Petrenko via Tarantool-patches [this message] 2021-06-10 20:01 ` Vladislav Shpilevoy via Tarantool-patches
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=b117eb18-54de-6dc5-928f-b75fb521410e@tarantool.org \ --to=tarantool-patches@dev.tarantool.org \ --cc=gorcunov@gmail.com \ --cc=sergepetrenko@tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH 1/1] raft: handle remote leader resign during WAL write' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox