Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: tarantool-patches@dev.tarantool.org, sergepetrenko@tarantool.org
Subject: [Tarantool-patches] [PATCH 3/3] [tosquash] raft: don't start new election immediately after restart
Date: Fri, 25 Sep 2020 23:29:13 +0200	[thread overview]
Message-ID: <88552bc37d2211cac244f308da2605a4dbc637d5.1601069274.git.v.shpilevoy@tarantool.org> (raw)
In-Reply-To: <cover.1601069274.git.v.shpilevoy@tarantool.org>

That creates problems in a cluster when followers restart.
Because if they would bump terms after restart, they would turn
the current leader into follower, therefore interrupting the
cluster progression.

Better let them wait for the leader death timeout if a leader
will appear. Because this is what will happen in 99% cases.
---
 src/box/box.cc | 12 +++++++++++-
 src/box/raft.c | 49 ++++++++++++++++++++++++++++++++++++++++++++-----
 src/box/raft.h |  8 ++++++++
 3 files changed, 63 insertions(+), 6 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 1169f2cd5..957bf4d7a 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -2799,8 +2799,18 @@ box_cfg_xc(void)
 	title("running");
 	say_info("ready to accept requests");
 
-	if (!is_bootstrap_leader)
+	if (!is_bootstrap_leader) {
 		replicaset_sync();
+	} else {
+		/*
+		 * When the cluster is just bootstrapped and this instance is a
+		 * leader, it makes no sense to wait for a leader appearance.
+		 * There is no one. Moreover this node *is* a leader, so it
+		 * should take the control over the situation and start a new
+		 * term immediately.
+		 */
+		raft_new_term();
+	}
 
 	/* box.cfg.read_only is not read yet. */
 	assert(box_is_ro());
diff --git a/src/box/raft.c b/src/box/raft.c
index 8e5f7803c..608722824 100644
--- a/src/box/raft.c
+++ b/src/box/raft.c
@@ -209,6 +209,17 @@ raft_sm_stop(void);
 static void
 raft_sm_wait_leader_dead(void);
 
+/**
+ * Wait for the leader death timeout until a leader lets the node know he is
+ * alive. Otherwise the node will start a new term. Can be useful when it is not
+ * known whether the leader is alive, but it is undesirable to start a new term
+ * immediately. Because in case the leader is alive, a new term would stun him
+ * and therefore would stun DB write requests. Usually happens when a follower
+ * restarts and may need some time to hear something from the leader.
+ */
+static void
+raft_sm_wait_leader_found(void);
+
 /**
  * If election is started by this node, or it voted for some other node started
  * the election, and it can be a leader itself, it will wait until the current
@@ -785,6 +796,19 @@ raft_sm_wait_leader_dead(void)
 	ev_timer_start(loop(), &raft.timer);
 }
 
+static void
+raft_sm_wait_leader_found(void)
+{
+	assert(!ev_is_active(&raft.timer));
+	assert(!raft.is_write_in_progress);
+	assert(raft.is_candidate);
+	assert(raft.state == RAFT_STATE_FOLLOWER);
+	assert(raft.leader == 0);
+	double death_timeout = replication_disconnect_timeout();
+	ev_timer_set(&raft.timer, death_timeout, death_timeout);
+	ev_timer_start(loop(), &raft.timer);
+}
+
 static void
 raft_sm_wait_election_end(void)
 {
@@ -811,12 +835,20 @@ raft_sm_start(void)
 	assert(raft.state == RAFT_STATE_FOLLOWER);
 	raft.is_enabled = true;
 	raft.is_candidate = raft.is_cfg_candidate;
-	if (!raft.is_candidate)
+	if (!raft.is_candidate) {
 		/* Nop. */;
-	else if (raft.leader != 0)
+	} else if (raft.leader != 0) {
 		raft_sm_wait_leader_dead();
-	else
-		raft_sm_schedule_new_election();
+	} else {
+		/*
+		 * Don't start new election. The situation is most likely
+		 * happened because this node was restarted. Instance restarts
+		 * may happen in the cluster, and each restart shouldn't
+		 * disturb the current leader. Give it time to notify this node
+		 * that there is a leader.
+		 */
+		raft_sm_wait_leader_found();
+	}
 	box_update_ro_summary();
 }
 
@@ -892,7 +924,7 @@ raft_cfg_is_candidate(bool is_candidate)
 		 * node who sent newer data to this node.
 		 */
 		if (raft.leader == 0 && raft_is_fully_on_disk())
-			raft_sm_schedule_new_election();
+			raft_sm_wait_leader_found();
 	} else if (raft.state != RAFT_STATE_FOLLOWER) {
 		if (raft.state == RAFT_STATE_LEADER)
 			raft.leader = 0;
@@ -946,6 +978,13 @@ raft_cfg_death_timeout(void)
 	}
 }
 
+void
+raft_new_term(void)
+{
+	if (raft.is_enabled)
+		raft_sm_schedule_new_term(raft.volatile_term + 1);
+}
+
 static void
 raft_schedule_broadcast(void)
 {
diff --git a/src/box/raft.h b/src/box/raft.h
index 8c1cf467f..be77a5473 100644
--- a/src/box/raft.h
+++ b/src/box/raft.h
@@ -229,6 +229,14 @@ raft_cfg_election_quorum(void);
 void
 raft_cfg_death_timeout(void);
 
+/**
+ * Bump the term. When it is persisted, the node checks if there is a leader,
+ * and if there is not, a new election is started. That said, this function can
+ * be used as tool to forcefully start new election, or restart an existing.
+ */
+void
+raft_new_term(void);
+
 /**
  * Save complete Raft state into a request to be sent to other instances of the
  * cluster. It is allowed to save anything here, not only persistent state.
-- 
2.21.1 (Apple Git-122.3)

  parent reply	other threads:[~2020-09-25 21:29 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-25 21:29 [Tarantool-patches] [PATCH 0/3] Raft follow-ups Vladislav Shpilevoy
2020-09-25 21:29 ` [Tarantool-patches] [PATCH 1/3] [tosquash] raft: vote without a new term when possible Vladislav Shpilevoy
2020-09-25 21:29 ` [Tarantool-patches] [PATCH 2/3] [tosquash] raft: fix 99% CPU issue in raft io worker Vladislav Shpilevoy
2020-09-25 21:29 ` Vladislav Shpilevoy [this message]
2020-09-28  7:09 ` [Tarantool-patches] [PATCH 0/3] Raft follow-ups Serge Petrenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=88552bc37d2211cac244f308da2605a4dbc637d5.1601069274.git.v.shpilevoy@tarantool.org \
    --to=v.shpilevoy@tarantool.org \
    --cc=sergepetrenko@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 3/3] [tosquash] raft: don'\''t start new election immediately after restart' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox