From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> To: tarantool-patches@dev.tarantool.org, sergepetrenko@tarantool.org Subject: [Tarantool-patches] [PATCH 3/3] [tosquash] raft: don't start new election immediately after restart Date: Fri, 25 Sep 2020 23:29:13 +0200 [thread overview] Message-ID: <88552bc37d2211cac244f308da2605a4dbc637d5.1601069274.git.v.shpilevoy@tarantool.org> (raw) In-Reply-To: <cover.1601069274.git.v.shpilevoy@tarantool.org> That creates problems in a cluster when followers restart. Because if they would bump terms after restart, they would turn the current leader into follower, therefore interrupting the cluster progression. Better let them wait for the leader death timeout if a leader will appear. Because this is what will happen in 99% cases. --- src/box/box.cc | 12 +++++++++++- src/box/raft.c | 49 ++++++++++++++++++++++++++++++++++++++++++++----- src/box/raft.h | 8 ++++++++ 3 files changed, 63 insertions(+), 6 deletions(-) diff --git a/src/box/box.cc b/src/box/box.cc index 1169f2cd5..957bf4d7a 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -2799,8 +2799,18 @@ box_cfg_xc(void) title("running"); say_info("ready to accept requests"); - if (!is_bootstrap_leader) + if (!is_bootstrap_leader) { replicaset_sync(); + } else { + /* + * When the cluster is just bootstrapped and this instance is a + * leader, it makes no sense to wait for a leader appearance. + * There is no one. Moreover this node *is* a leader, so it + * should take the control over the situation and start a new + * term immediately. + */ + raft_new_term(); + } /* box.cfg.read_only is not read yet. */ assert(box_is_ro()); diff --git a/src/box/raft.c b/src/box/raft.c index 8e5f7803c..608722824 100644 --- a/src/box/raft.c +++ b/src/box/raft.c @@ -209,6 +209,17 @@ raft_sm_stop(void); static void raft_sm_wait_leader_dead(void); +/** + * Wait for the leader death timeout until a leader lets the node know he is + * alive. Otherwise the node will start a new term. Can be useful when it is not + * known whether the leader is alive, but it is undesirable to start a new term + * immediately. Because in case the leader is alive, a new term would stun him + * and therefore would stun DB write requests. Usually happens when a follower + * restarts and may need some time to hear something from the leader. + */ +static void +raft_sm_wait_leader_found(void); + /** * If election is started by this node, or it voted for some other node started * the election, and it can be a leader itself, it will wait until the current @@ -785,6 +796,19 @@ raft_sm_wait_leader_dead(void) ev_timer_start(loop(), &raft.timer); } +static void +raft_sm_wait_leader_found(void) +{ + assert(!ev_is_active(&raft.timer)); + assert(!raft.is_write_in_progress); + assert(raft.is_candidate); + assert(raft.state == RAFT_STATE_FOLLOWER); + assert(raft.leader == 0); + double death_timeout = replication_disconnect_timeout(); + ev_timer_set(&raft.timer, death_timeout, death_timeout); + ev_timer_start(loop(), &raft.timer); +} + static void raft_sm_wait_election_end(void) { @@ -811,12 +835,20 @@ raft_sm_start(void) assert(raft.state == RAFT_STATE_FOLLOWER); raft.is_enabled = true; raft.is_candidate = raft.is_cfg_candidate; - if (!raft.is_candidate) + if (!raft.is_candidate) { /* Nop. */; - else if (raft.leader != 0) + } else if (raft.leader != 0) { raft_sm_wait_leader_dead(); - else - raft_sm_schedule_new_election(); + } else { + /* + * Don't start new election. The situation is most likely + * happened because this node was restarted. Instance restarts + * may happen in the cluster, and each restart shouldn't + * disturb the current leader. Give it time to notify this node + * that there is a leader. + */ + raft_sm_wait_leader_found(); + } box_update_ro_summary(); } @@ -892,7 +924,7 @@ raft_cfg_is_candidate(bool is_candidate) * node who sent newer data to this node. */ if (raft.leader == 0 && raft_is_fully_on_disk()) - raft_sm_schedule_new_election(); + raft_sm_wait_leader_found(); } else if (raft.state != RAFT_STATE_FOLLOWER) { if (raft.state == RAFT_STATE_LEADER) raft.leader = 0; @@ -946,6 +978,13 @@ raft_cfg_death_timeout(void) } } +void +raft_new_term(void) +{ + if (raft.is_enabled) + raft_sm_schedule_new_term(raft.volatile_term + 1); +} + static void raft_schedule_broadcast(void) { diff --git a/src/box/raft.h b/src/box/raft.h index 8c1cf467f..be77a5473 100644 --- a/src/box/raft.h +++ b/src/box/raft.h @@ -229,6 +229,14 @@ raft_cfg_election_quorum(void); void raft_cfg_death_timeout(void); +/** + * Bump the term. When it is persisted, the node checks if there is a leader, + * and if there is not, a new election is started. That said, this function can + * be used as tool to forcefully start new election, or restart an existing. + */ +void +raft_new_term(void); + /** * Save complete Raft state into a request to be sent to other instances of the * cluster. It is allowed to save anything here, not only persistent state. -- 2.21.1 (Apple Git-122.3)
next prev parent reply other threads:[~2020-09-25 21:29 UTC|newest] Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-09-25 21:29 [Tarantool-patches] [PATCH 0/3] Raft follow-ups Vladislav Shpilevoy 2020-09-25 21:29 ` [Tarantool-patches] [PATCH 1/3] [tosquash] raft: vote without a new term when possible Vladislav Shpilevoy 2020-09-25 21:29 ` [Tarantool-patches] [PATCH 2/3] [tosquash] raft: fix 99% CPU issue in raft io worker Vladislav Shpilevoy 2020-09-25 21:29 ` Vladislav Shpilevoy [this message] 2020-09-28 7:09 ` [Tarantool-patches] [PATCH 0/3] Raft follow-ups Serge Petrenko
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=88552bc37d2211cac244f308da2605a4dbc637d5.1601069274.git.v.shpilevoy@tarantool.org \ --to=v.shpilevoy@tarantool.org \ --cc=sergepetrenko@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH 3/3] [tosquash] raft: don'\''t start new election immediately after restart' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox