From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id AA03B6EC55; Thu, 10 Jun 2021 17:24:46 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org AA03B6EC55 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1623335086; bh=GiqmZQutrzscjszergiV3nb0fhOCbGi3O+2wf7fU+7A=; h=To:References:Date:In-Reply-To:Subject:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=WDSMPdXYuXbzMZg1xKLgDBG9jdztkKuWN/1pBDrptp7rFQhJXeyEqKGG8Q3L0+zBO qOtBg7dJhsBnR5sXvgGsP4SgDe12WAK1gw5v1xdcHr6r9RNYhfyKbK19YjBY9Sh4yo K8ARvR7SyJgVopGakjbComj/pIB+gxuxicUNXmW4= Received: from smtp49.i.mail.ru (smtp49.i.mail.ru [94.100.177.109]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 7F3D06EC55 for ; Thu, 10 Jun 2021 17:24:45 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org 7F3D06EC55 Received: by smtp49.i.mail.ru with esmtpa (envelope-from ) id 1lrLbw-0004b2-NQ; Thu, 10 Jun 2021 17:24:45 +0300 To: Vladislav Shpilevoy , tarantool-patches@dev.tarantool.org, gorcunov@gmail.com References: <08a836b17506f5b6e9edfcaee36bbeea5754f917.1623098844.git.v.shpilevoy@tarantool.org> Message-ID: Date: Thu, 10 Jun 2021 17:24:44 +0300 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: <08a836b17506f5b6e9edfcaee36bbeea5754f917.1623098844.git.v.shpilevoy@tarantool.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-GB X-7564579A: B8F34718100C35BD X-77F55803: 4F1203BC0FB41BD9D5B0DA836B685C54F2E6026ACF3A18D093F1E8B1F5B66022182A05F538085040B4DDEBB119DDD43B28E92546FA77615AD7290B7E991E5546EA360FDCEE14460F X-7FA49CB5: FF5795518A3D127A4AD6D5ED66289B5278DA827A17800CE781C216450ED815FAEA1F7E6F0F101C67BD4B6F7A4D31EC0BCC500DACC3FED6E28638F802B75D45FF8AA50765F790063727BBC20C3D5F36038638F802B75D45FF36EB9D2243A4F8B5A6FCA7DBDB1FC311F39EFFDF887939037866D6147AF826D81DAE89C25E56206431E3D6E2504A43F9117882F4460429724CE54428C33FAD305F5C1EE8F4F765FCF1175FABE1C0F9B6A471835C12D1D9774AD6D5ED66289B52BA9C0B312567BB23117882F44604297287769387670735201E561CDFBCA1751FCB629EEF1311BF91D2E47CDBA5A96583BA9C0B312567BB2376E601842F6C81A19E625A9149C048EECCD848CCB6FE560C28765F5520A300B2D8FC6C240DEA7642DBF02ECDB25306B2B78CF848AE20165D0A6AB1C7CE11FEE317119E5299B287EE2D242C3BD2E3F4C6C4224003CC836476EA7A3FFF5B025636E2021AF6380DFAD1A18204E546F3947CB11811A4A51E3B096D1867E19FE1407959CC434672EE6371089D37D7C0E48F6C8AA50765F7900637BC468E7E89D8C5D6EFF80C71ABB335746BA297DBC24807EABDAD6C7F3747799A X-B7AD71C0: AC4F5C86D027EB782CDD5689AFBDA7A24209795067102C07E8F7B195E1C97831C7CB5FA6484C0E277E6ED54993E2D80B X-C1DE0DAB: 0D63561A33F958A567F36172397E8665988BF09B769F1CE951F0557F9267B192D59269BC5F550898D99A6476B3ADF6B47008B74DF8BB9EF7333BD3B22AA88B938A852937E12ACA75448CF9D3A7B2C848410CA545F18667F91A7EA1CDA0B5A7A0 X-C8649E89: 4E36BF7865823D7055A7F0CF078B5EC49A30900B95165D340E23B77EA18078174A37A6DD1D0A0FF5DF5A5237CA44D61BE9DD0502FD37E5E1F9AF44316EF61EE51D7E09C32AA3244CC11B760A98F3B07A213693B400D965193FD9C8CA1B0515E0FACE5A9C96DEB163 X-D57D3AED: 3ZO7eAau8CL7WIMRKs4sN3D3tLDjz0dLbV79QFUyzQ2Ujvy7cMT6pYYqY16iZVKkSc3dCLJ7zSJH7+u4VD18S7Vl4ZUrpaVfd2+vE6kuoey4m4VkSEu530nj6fImhcD4MUrOEAnl0W826KZ9Q+tr5ycPtXkTV4k65bRjmOUUP8cvGozZ33TWg5HZplvhhXbhDGzqmQDTd6OAevLeAnq3Ra9uf7zvY2zzsIhlcp/Y7m53TZgf2aB4JOg4gkr2biojCpYK6nkTlbEIEEludukxOQ== X-Mailru-Sender: 583F1D7ACE8F49BD9DF7A8DAE6E2B08AA501CF572C2C81C04F3FB02807AF2F3128C6A8407667C890424AE0EB1F3D1D21E2978F233C3FAE6EE63DB1732555E4A8EE80603BA4A5B0BC112434F685709FCF0DA7A0AF5A3A8387 X-Mras: Ok Subject: Re: [Tarantool-patches] [PATCH 1/1] raft: handle remote leader resign during WAL write X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Serge Petrenko via Tarantool-patches Reply-To: Serge Petrenko Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" 07.06.2021 23:48, Vladislav Shpilevoy пишет: > If Raft state machine sees the current leader has explicitly > resigned from its role, it starts a new election round right away. > > But in the code starting a new round there was an assumption that > there is no a volatile state. There was, in fact. > > The patch makes the election start code use the volatile state to > bump the term. It should be safe, because the other nodes won't > receive it anyway until the new term is persisted. > > There was an alternative - do not schedule new election until the > current WAL write ends. It wasn't done, because would achieve the > same (the term would be bumped and persisted) but with bigger a > latency. > > Another reason is that if the leader would appear and resign > during WAL write on another candidate, in the end of its WAL write > the latter would see 0 leader and would think this term didn't > have one yet. And would try to elect self now, in the current > term. It makes little sense, because it won't win - the current > term had already had a leader and the majority of votes is > already taken. > > Closes #6129 Thanks for the patch! LGTM. > --- > Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-6129-raft-crash > Issue: https://github.com/tarantool/tarantool/issues/6129 > > .../gh-6129-raft-resign-during-wal.md | 7 ++ > src/lib/raft/raft.c | 8 +- > test/unit/raft.c | 80 ++++++++++++++++++- > test/unit/raft.result | 10 ++- > 4 files changed, 101 insertions(+), 4 deletions(-) > create mode 100644 changelogs/unreleased/gh-6129-raft-resign-during-wal.md > > diff --git a/changelogs/unreleased/gh-6129-raft-resign-during-wal.md b/changelogs/unreleased/gh-6129-raft-resign-during-wal.md > new file mode 100644 > index 000000000..eb6cf3984 > --- /dev/null > +++ b/changelogs/unreleased/gh-6129-raft-resign-during-wal.md > @@ -0,0 +1,7 @@ > +## bugfix/raft > + > +* Fixed a rare crash with the leader election enabled (any mode except `off`), > + which could happen if a leader resigned from its role at the same time as some > + other node was writing something related to the elections to WAL. The crash > + was in debug build and in the release build it would lead to undefined > + behaviour (gh-6129). > diff --git a/src/lib/raft/raft.c b/src/lib/raft/raft.c > index 247c7a71a..eacdddb7e 100644 > --- a/src/lib/raft/raft.c > +++ b/src/lib/raft/raft.c > @@ -687,10 +687,9 @@ static void > raft_sm_schedule_new_election(struct raft *raft) > { > say_info("RAFT: begin new election round"); > - assert(raft_is_fully_on_disk(raft)); > assert(raft->is_candidate); > /* Everyone is a follower until its vote for self is persisted. */ > - raft_sm_schedule_new_term(raft, raft->term + 1); > + raft_sm_schedule_new_term(raft, raft->volatile_term + 1); > raft_sm_schedule_new_vote(raft, raft->self); > } > > @@ -701,6 +700,11 @@ raft_sm_schedule_new_election_cb(struct ev_loop *loop, struct ev_timer *timer, > (void)events; > struct raft *raft = timer->data; > assert(timer == &raft->timer); > + /* > + * Otherwise the timer would be stopped and the callback wouldn't be > + * invoked. > + */ > + assert(raft_is_fully_on_disk(raft)); > raft_ev_timer_stop(loop, timer); > raft_sm_schedule_new_election(raft); > } > diff --git a/test/unit/raft.c b/test/unit/raft.c > index 29e48da13..6369c42d3 100644 > --- a/test/unit/raft.c > +++ b/test/unit/raft.c > @@ -572,7 +572,7 @@ raft_test_vote_skip(void) > static void > raft_test_leader_resign(void) > { > - raft_start_test(15); > + raft_start_test(23); > struct raft_node node; > > /* > @@ -672,6 +672,84 @@ raft_test_leader_resign(void) > NULL /* Vclock. */ > ), "resign notification is sent"); > > + /* > + * gh-6129: resign of a remote leader during a local WAL write should > + * schedule a new election after the WAL write. > + * > + * Firstly start a new term. > + */ > + raft_node_block(&node); > + raft_node_cfg_is_candidate(&node, true); > + raft_run_next_event(); > + /* Volatile term is new, but the persistent one is not updated yet. */ > + ok(raft_node_check_full_state(&node, > + RAFT_STATE_FOLLOWER /* State. */, > + 0 /* Leader. */, > + 2 /* Term. */, > + 1 /* Vote. */, > + 3 /* Volatile term. */, > + 1 /* Volatile vote. */, > + "{0: 1}" /* Vclock. */ > + ), "new election is waiting for WAL write"); > + > + /* Now another node wins the election earlier. */ > + is(raft_node_send_leader(&node, > + 3 /* Term. */, > + 2 /* Source. */ > + ), 0, "message is accepted"); > + ok(raft_node_check_full_state(&node, > + RAFT_STATE_FOLLOWER /* State. */, > + 2 /* Leader. */, > + 2 /* Term. */, > + 1 /* Vote. */, > + 3 /* Volatile term. */, > + 1 /* Volatile vote. */, > + "{0: 1}" /* Vclock. */ > + ), "the leader is accepted"); > + > + /* > + * The leader resigns and triggers a new election round on the first > + * node. A new election is triggered, but still waiting for the previous > + * WAL write to end. > + */ > + is(raft_node_send_follower(&node, > + 3 /* Term. */, > + 2 /* Source. */ > + ), 0, "message is accepted"); > + /* Note how the volatile term is updated again. */ > + ok(raft_node_check_full_state(&node, > + RAFT_STATE_FOLLOWER /* State. */, > + 0 /* Leader. */, > + 2 /* Term. */, > + 1 /* Vote. */, > + 4 /* Volatile term. */, > + 1 /* Volatile vote. */, > + "{0: 1}" /* Vclock. */ > + ), "the leader has resigned, new election is scheduled"); > + raft_node_unblock(&node); > + > + /* Ensure the node still collects votes after the WAL write. */ > + is(raft_node_send_vote_response(&node, > + 4 /* Term. */, > + 1 /* Vote. */, > + 2 /* Source. */ > + ), 0, "vote from 2"); > + is(raft_node_send_vote_response(&node, > + 4 /* Term. */, > + 1 /* Vote. */, > + 3 /* Source. */ > + ), 0, "vote from 3"); > + raft_run_next_event(); > + ok(raft_node_check_full_state(&node, > + RAFT_STATE_LEADER /* State. */, > + 1 /* Leader. */, > + 4 /* Term. */, > + 1 /* Vote. */, > + 4 /* Volatile term. */, > + 1 /* Volatile vote. */, > + "{0: 2}" /* Vclock. */ > + ), "the leader is elected"); > + > raft_node_destroy(&node); > > raft_finish_test(); > diff --git a/test/unit/raft.result b/test/unit/raft.result > index 3a3dc5dd2..598a7e760 100644 > --- a/test/unit/raft.result > +++ b/test/unit/raft.result > @@ -111,7 +111,7 @@ ok 4 - subtests > ok 5 - subtests > *** raft_test_vote_skip: done *** > *** raft_test_leader_resign *** > - 1..15 > + 1..23 > ok 1 - message is accepted > ok 2 - leader is elected > ok 3 - message is accepted > @@ -127,6 +127,14 @@ ok 5 - subtests > ok 13 - became leader > ok 14 - the leader has resigned > ok 15 - resign notification is sent > + ok 16 - new election is waiting for WAL write > + ok 17 - message is accepted > + ok 18 - the leader is accepted > + ok 19 - message is accepted > + ok 20 - the leader has resigned, new election is scheduled > + ok 21 - vote from 2 > + ok 22 - vote from 3 > + ok 23 - the leader is elected > ok 6 - subtests > *** raft_test_leader_resign: done *** > *** raft_test_split_brain *** -- Serge Petrenko