Tarantool development patches archive
 help / color / mirror / Atom feed
From: Cyrill Gorcunov <gorcunov@gmail.com>
To: tml <tarantool-patches@dev.tarantool.org>
Cc: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: [Tarantool-patches] [PATCH v2 2/3] cfg: support symbolic evaluation of replication_synchro_quorum
Date: Tue, 24 Nov 2020 18:24:04 +0300	[thread overview]
Message-ID: <20201124152405.1174898-3-gorcunov@gmail.com> (raw)
In-Reply-To: <20201124152405.1174898-1-gorcunov@gmail.com>

When synchronous replication is used we prefer a user to specify
a quorum number, ie the number of replicas where data must be
replicated before the master node continue accepting new
transactions.

This is not very convenient since a user may not know initially
how many replicas will be used. Moreover the number of replicas
may vary dynamically. For this sake we allow to specify the
number of quorum in a symbolic way.

For example

box.cfg {
	replication_synchro_quorum = "n/2+1",
}

where `n` is a number of registered replicas in a cluster.
Once new replica attached or old one detached the number
is renewed and propagated.

Internally on each replica_set_id() and replica_clear_id(),
ie at moment when replica get registered or unregistered,
we call box_renew_replication_synchro_quorum() helper which
finds out if evaluation of replication_synchro_quorum is
needed and if so we calculate new replication_synchro_quorum
value based on number of currently registered replicas. Then
we notify dependent systems such as qsync and raft to update
their guts.

Closes #5446

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

@TarantoolBot document
Title: Synchronous replication

The `replication_synchro_quorum` parameter allows to specify
value not just as a plain integer number but as a formula too.
The formula should use symbol `n` to represent amount of
registered replicas.

For example the canonical definition for a quorum (ie majority
of members in a set) of `n` replicas is `n/2+1`. For such
configuration one can define

```
box.cfg {replication_synchro_quorum = "n/2+1"}
```

Note that for sake of simplicity quorum evaluation never returns
negative values thus for the case of formula say `n-2` the result
will be 1 until number of replicas become 4 and more.
---
 src/box/box.cc           | 115 +++++++++++++++++++++++++++++++++++++--
 src/box/box.h            |   1 +
 src/box/lua/load_cfg.lua |   2 +-
 src/box/replication.cc   |   4 ++
 4 files changed, 117 insertions(+), 5 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 1f7dec362..e4cb013c3 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -562,10 +562,81 @@ box_check_replication_sync_lag(void)
 	return lag;
 }
 
+/**
+ * Evaluate replication syncro quorum number from a formula.
+ */
+static int
+eval_replication_synchro_quorum(int nr_replicas)
+{
+	const char fmt[] =
+		"local f, err = loadstring(\"return (%s)\")\n"
+		"if not f then return 'failed to load \"%s\"' end\n"
+		"setfenv(f, { n = %d })\n"
+		"local ok, res = pcall(f)\n"
+		"if not ok then return res end\n"
+		"return math.floor(res)\n";
+	char buf[512];
+	int value = -1;
+
+	const char *expr = cfg_gets("replication_synchro_quorum");
+	size_t ret = snprintf(buf, sizeof(buf), fmt, expr,
+			      expr, nr_replicas);
+	if (ret >= sizeof(buf)) {
+		diag_set(ClientError, ER_CFG,
+			 "replication_synchro_quorum",
+			 "the expression is too big");
+		return -1;
+	}
+
+	luaL_loadstring(tarantool_L, buf);
+	lua_call(tarantool_L, 0, 1);
+
+	if (lua_isnumber(tarantool_L, -1)) {
+		value = (int)lua_tonumber(tarantool_L, -1);
+	} else {
+		diag_set(ClientError, ER_CFG,
+			 "replication_synchro_quorum",
+			 lua_tostring(tarantool_L, -1));
+		return -1;
+	}
+	lua_pop(tarantool_L, 1);
+
+	/*
+	 * At least we should have 1 node to sync, thus
+	 * if the formula has evaluated to some negative
+	 * value (say it was n-2) do not treat it as an
+	 * error but just yield a minimum valid magnitude.
+	 */
+	if (value < 0) {
+		say_warn("replication_synchro_quorum evaluated "
+			 "to the negative value %d, ignore", value);
+	}
+	return MAX(1, MIN(value, VCLOCK_MAX-1));
+}
+
 static int
 box_check_replication_synchro_quorum(void)
 {
-	int quorum = cfg_geti("replication_synchro_quorum");
+	int quorum = 0;
+
+	if (!cfg_isnumber("replication_synchro_quorum")) {
+		/*
+		 * The formula uses symbolic name 'n' as
+		 * a number of currently registered replicas
+		 * thus lets pass a sane value here.
+		 *
+		 * Note though at moment of bootstrap this value
+		 * is zero but the evaluator will return a valid
+		 * number back, we rather use this variable in
+		 * a sake of "sense" pointing out that we're
+		 * depending on number of replicas.
+		 */
+		int value = replicaset.registered_count;
+		quorum = eval_replication_synchro_quorum(value);
+	} else {
+		quorum = cfg_geti("replication_synchro_quorum");
+	}
+
 	if (quorum <= 0 || quorum >= VCLOCK_MAX) {
 		diag_set(ClientError, ER_CFG, "replication_synchro_quorum",
 			 "the value must be greater than zero and less than "
@@ -918,18 +989,54 @@ box_set_replication_sync_lag(void)
 	replication_sync_lag = box_check_replication_sync_lag();
 }
 
+/**
+ * Assign new replication_synchro_quorum value
+ * and notify dependent subsystems.
+ */
+static void
+set_replication_synchro_quorum(int quorum)
+{
+	assert(quorum > 0 && quorum < VCLOCK_MAX);
+
+	replication_synchro_quorum = quorum;
+	txn_limbo_on_parameters_change(&txn_limbo);
+	raft_cfg_election_quorum(box_raft());
+}
+
 int
 box_set_replication_synchro_quorum(void)
 {
 	int value = box_check_replication_synchro_quorum();
 	if (value < 0)
 		return -1;
-	replication_synchro_quorum = value;
-	txn_limbo_on_parameters_change(&txn_limbo);
-	raft_cfg_election_quorum(box_raft());
+	set_replication_synchro_quorum(value);
 	return 0;
 }
 
+/**
+ * Renew replication_synchro_quorum value if defined
+ * as a formula and we need to recalculate it.
+ */
+void
+box_renew_replication_synchro_quorum(void)
+{
+	if (cfg_isnumber("replication_synchro_quorum"))
+		return;
+
+	/*
+	 * The formula has been verified already on the bootstrap
+	 * stage (and on dynamic reconfig as well), still there
+	 * is a Lua call inside, heck knowns what could go wrong
+	 * there thus panic if we're screwed.
+	 */
+	int value = replicaset.registered_count;
+	int quorum = eval_replication_synchro_quorum(value);
+	if (quorum < 0)
+		panic("failed to eval replication_synchro_quorum");
+	say_info("renew replication_synchro_quorum = %d", quorum);
+	set_replication_synchro_quorum(quorum);
+}
+
 int
 box_set_replication_synchro_timeout(void)
 {
diff --git a/src/box/box.h b/src/box/box.h
index b47a220b7..dd943f169 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -252,6 +252,7 @@ void box_set_replication_connect_timeout(void);
 void box_set_replication_connect_quorum(void);
 void box_set_replication_sync_lag(void);
 int box_set_replication_synchro_quorum(void);
+void box_renew_replication_synchro_quorum(void);
 int box_set_replication_synchro_timeout(void);
 void box_set_replication_sync_timeout(void);
 void box_set_replication_skip_conflict(void);
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index 76e2e92c2..af66c0e46 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -172,7 +172,7 @@ local template_cfg = {
     replication_timeout = 'number',
     replication_sync_lag = 'number',
     replication_sync_timeout = 'number',
-    replication_synchro_quorum = 'number',
+    replication_synchro_quorum = 'string, number',
     replication_synchro_timeout = 'number',
     replication_connect_timeout = 'number',
     replication_connect_quorum = 'number',
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 65512cf0f..f89b8dbc3 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -250,6 +250,8 @@ replica_set_id(struct replica *replica, uint32_t replica_id)
 	say_info("assigned id %d to replica %s",
 		 replica->id, tt_uuid_str(&replica->uuid));
 	replica->anon = false;
+
+	box_renew_replication_synchro_quorum();
 }
 
 void
@@ -298,6 +300,8 @@ replica_clear_id(struct replica *replica)
 		assert(!replica->anon);
 		replica_delete(replica);
 	}
+
+	box_renew_replication_synchro_quorum();
 }
 
 void
-- 
2.26.2

  parent reply	other threads:[~2020-11-24 15:24 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-24 15:24 [Tarantool-patches] [PATCH v2 0/3] qsync: evaluate replication_synchro_quorum dynamically Cyrill Gorcunov
2020-11-24 15:24 ` [Tarantool-patches] [PATCH v2 1/3] cfg: add cfg_isnumber helper Cyrill Gorcunov
2020-11-24 15:24 ` Cyrill Gorcunov [this message]
2020-11-25 11:36   ` [Tarantool-patches] [PATCH v2 2/3] cfg: support symbolic evaluation of replication_synchro_quorum Serge Petrenko
2020-11-25 11:55     ` Cyrill Gorcunov
2020-11-25 12:10       ` Serge Petrenko
2020-11-25 12:19         ` Cyrill Gorcunov
2020-11-25 12:04   ` Serge Petrenko
2020-11-25 12:12     ` Cyrill Gorcunov
2020-11-25 12:46       ` Serge Petrenko
2020-11-25 12:53         ` Cyrill Gorcunov
2020-11-25 13:49           ` Serge Petrenko
2020-11-24 15:24 ` [Tarantool-patches] [PATCH v2 3/3] test: add replication/gh-5446-sqync-eval-quorum.test.lua Cyrill Gorcunov
2020-11-25 13:57   ` Serge Petrenko
2020-11-25 14:10     ` Cyrill Gorcunov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201124152405.1174898-3-gorcunov@gmail.com \
    --to=gorcunov@gmail.com \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v2 2/3] cfg: support symbolic evaluation of replication_synchro_quorum' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox