Tarantool development patches archive
From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: tarantool-patches@dev.tarantool.org, gorcunov@gmail.com,
	sergepetrenko@tarantool.org
Subject: [Tarantool-patches] [PATCH 5/6] replication: use 'score' to find a join-master
Date: Sat,  5 Jun 2021 01:37:59 +0200
Message-ID: <6675abcfa409f1fd6e05a7e7852b42e1a08d1795.1622849790.git.v.shpilevoy@tarantool.org> (raw)
In-Reply-To: <cover.1622849790.git.v.shpilevoy@tarantool.org>

The patch refactors the algorithm of finding a join-master (in
replicaset_find_join_master()) to use scores instead of multiple
iterations with different criteria.

The original code was relatively fine as long as it had only
one parameter to change - whether it should skip
`box.cfg{read_only = true}` nodes.

Even so, it was already "on the edge" of acceptable complexity
because of a second, non-configurable parameter: whether a
replica is in read-only state regardless of its config.

It is going to get more complicated once the algorithm takes
into account a third parameter: whether an instance is
bootstrapped.

Then it should make decisions like "among bootstrapped nodes try
to prefer instances not having read_only=true, and not being in
read-only state". The easiest way to do so is to use
scores/weights incremented according to the instance's parameters
matching certain "good points".

Part of #5613
---
 src/box/replication.cc | 62 ++++++++++++++++--------------------------
 1 file changed, 23 insertions(+), 39 deletions(-)
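
For readers who want to try the scoring idea outside of the Tarantool
tree, here is a minimal standalone sketch. It is not the code from this
patch: `struct candidate`, `candidate_score()` and `find_join_master()`
are hypothetical names, the vclock is reduced to a single integer, and
the UUID is reduced to an `int` id. Only the weights (5 and 1) and the
tie-breaking order (higher score, then more advanced vclock, then lower
uuid) are taken from the diff below.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a replica and its ballot. */
struct candidate {
	bool is_ro_cfg;  /* configured with box.cfg{read_only = true} */
	bool is_ro;      /* in read-only state for any reason */
	long vclock_sum; /* assumption: scalar stand-in for a vclock */
	int id;          /* assumption: stand-in for a UUID */
};

/* Sum up the "good points" of a candidate; higher is better. */
static int
candidate_score(const struct candidate *c)
{
	int score = 0;
	/* The config weighs more than the current read-only state. */
	if (!c->is_ro_cfg)
		score += 5;
	if (!c->is_ro)
		score += 1;
	return score;
}

/* One pass over all candidates; returns NULL if there are none. */
static const struct candidate *
find_join_master(const struct candidate *cands, size_t count)
{
	const struct candidate *leader = NULL;
	int leader_score = -1;
	for (size_t i = 0; i < count; i++) {
		const struct candidate *c = &cands[i];
		int score = candidate_score(c);
		if (score < leader_score)
			continue;
		if (score == leader_score) {
			/* Same score: prefer the more advanced vclock. */
			if (c->vclock_sum < leader->vclock_sum)
				continue;
			/* Same vclock too: prefer the lower id. */
			if (c->vclock_sum == leader->vclock_sum &&
			    c->id > leader->id)
				continue;
		}
		leader = c;
		leader_score = score;
	}
	return leader;
}
```

In this sketch a new criterion would only need one more `score +=`
line with a suitable weight, rather than another loop over the
candidates.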

diff --git a/src/box/replication.cc b/src/box/replication.cc
index 990f6239c..d33e70f28 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -960,71 +960,55 @@ replicaset_next(struct replica *replica)
  * replicas, choose a read-only replica with biggest vclock
  * as a leader, in hope it will become read-write soon.
  */
-static struct replica *
-replicaset_round(bool skip_ro)
+struct replica *
+replicaset_find_join_master(void)
 {
 	struct replica *leader = NULL;
+	int leader_score = -1;
 	replicaset_foreach(replica) {
 		struct applier *applier = replica->applier;
 		if (applier == NULL)
 			continue;
 		const struct ballot *ballot = &applier->ballot;
-		/**
-		 * While bootstrapping a new cluster, read-only
-		 * replicas shouldn't be considered as a leader.
-		 * The only exception if there is no read-write
-		 * replicas since there is still a possibility
-		 * that all replicas exist in cluster table.
-		 */
-		if (skip_ro && ballot->is_ro_cfg)
-			continue;
-		if (leader == NULL) {
-			leader = replica;
-			continue;
-		}
-		const struct ballot *leader_ballot = &leader->applier->ballot;
+		int score = 0;
 		/*
-		 * Try to find a replica which has already left
-		 * orphan mode.
+		 * Prefer instances not configured as read-only via box.cfg, and
+		 * not being in read-only state due to any other reason. The
+		 * config is stronger because if it is configured as read-only,
+		 * it is in read-only state for sure, until the config is
+		 * changed.
 		 */
-		if (ballot->is_ro && !leader_ballot->is_ro)
+		if (!ballot->is_ro_cfg)
+			score += 5;
+		if (!ballot->is_ro)
+			score += 1;
+		if (leader_score < score)
+			goto elect;
+		if (score < leader_score)
 			continue;
+		const struct ballot *leader_ballot;
+		leader_ballot = &leader->applier->ballot;
 		/*
 		 * Choose the replica with the most advanced
 		 * vclock. If there are two or more replicas
 		 * with the same vclock, prefer the one with
 		 * the lowest uuid.
 		 */
-		int cmp = vclock_compare_ignore0(&ballot->vclock,
-						 &leader_ballot->vclock);
+		int cmp;
+		cmp = vclock_compare_ignore0(&ballot->vclock,
+					     &leader_ballot->vclock);
 		if (cmp < 0)
 			continue;
 		if (cmp == 0 && tt_uuid_compare(&replica->uuid,
 						&leader->uuid) > 0)
 			continue;
+	elect:
 		leader = replica;
+		leader_score = score;
 	}
 	return leader;
 }
 
-struct replica *
-replicaset_find_join_master(void)
-{
-	bool skip_ro = true;
-	/**
-	 * Two loops, first prefers read-write replicas among others.
-	 * Second for backward compatibility, if there is no such
-	 * replicas at all.
-	 */
-	struct replica *leader = replicaset_round(skip_ro);
-	if (leader == NULL) {
-		skip_ro = false;
-		leader = replicaset_round(skip_ro);
-	}
-
-	return leader;
-}
-
 struct replica *
 replica_by_uuid(const struct tt_uuid *uuid)
 {
-- 
2.24.3 (Apple Git-128)


Thread overview: 22+ messages
2021-06-04 23:37 [Tarantool-patches] [PATCH 0/6] Instance join should prefer booted instances Vladislav Shpilevoy via Tarantool-patches
2021-06-04 23:37 ` [Tarantool-patches] [PATCH 1/6] replication: refactor replicaset_leader() Vladislav Shpilevoy via Tarantool-patches
2021-06-10 13:54   ` Serge Petrenko via Tarantool-patches
2021-06-04 23:37 ` [Tarantool-patches] [PATCH 2/6] replication: ballot.is_ro -> is_ro_cfg Vladislav Shpilevoy via Tarantool-patches
2021-06-10 13:56   ` Serge Petrenko via Tarantool-patches
2021-06-04 23:37 ` [Tarantool-patches] [PATCH 3/6] replication: ballot.is_loading -> is_ro Vladislav Shpilevoy via Tarantool-patches
2021-06-10 13:58   ` Serge Petrenko via Tarantool-patches
2021-06-04 23:37 ` [Tarantool-patches] [PATCH 4/6] replication: introduce ballot.is_booted Vladislav Shpilevoy via Tarantool-patches
2021-06-10 14:02   ` Serge Petrenko via Tarantool-patches
2021-06-04 23:37 ` Vladislav Shpilevoy via Tarantool-patches [this message]
2021-06-10 14:06   ` [Tarantool-patches] [PATCH 5/6] replication: use 'score' to find a join-master Serge Petrenko via Tarantool-patches
2021-06-10 15:02   ` Cyrill Gorcunov via Tarantool-patches
2021-06-10 20:09     ` Vladislav Shpilevoy via Tarantool-patches
2021-06-10 21:28       ` Cyrill Gorcunov via Tarantool-patches
2021-06-04 23:38 ` [Tarantool-patches] [PATCH 6/6] replication: prefer to join from booted replicas Vladislav Shpilevoy via Tarantool-patches
2021-06-06 17:06   ` Vladislav Shpilevoy via Tarantool-patches
2021-06-10 14:14   ` Serge Petrenko via Tarantool-patches
2021-06-06 17:03 ` [Tarantool-patches] [PATCH 7/6] raft: test join to a raft cluster Vladislav Shpilevoy via Tarantool-patches
2021-06-06 23:01   ` Vladislav Shpilevoy via Tarantool-patches
2021-06-10 14:17   ` Serge Petrenko via Tarantool-patches
2021-06-10 15:03 ` [Tarantool-patches] [PATCH 0/6] Instance join should prefer booted instances Cyrill Gorcunov via Tarantool-patches
2021-06-11 20:56 ` Vladislav Shpilevoy via Tarantool-patches
