Tarantool development patches archive
From: Serge Petrenko <sergepetrenko@tarantool.org>
To: Cyrill Gorcunov <gorcunov@gmail.com>,
	tml <tarantool-patches@dev.tarantool.org>
Cc: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v2 2/3] cfg: support symbolic evaluation of replication_synchro_quorum
Date: Wed, 25 Nov 2020 14:36:18 +0300
Message-ID: <f5e82316-0a58-09f8-17be-3a062a4780d7@tarantool.org>
In-Reply-To: <20201124152405.1174898-3-gorcunov@gmail.com>


On 24.11.2020 18:24, Cyrill Gorcunov wrote:
> When synchronous replication is used we prefer a user to specify
> a quorum number, ie the number of replicas where data must be
> replicated before the master node continue accepting new
> transactions.
>
> This is not very convenient since a user may not know initially
> how many replicas will be used. Moreover the number of replicas
> may vary dynamically. For this sake we allow to specify the
> number of quorum in a symbolic way.
>
> For example
>
> box.cfg {
> 	replication_synchro_quorum = "n/2+1",
> }
>
> where `n` is a number of registered replicas in a cluster.
> Once new replica attached or old one detached the number
> is renewed and propagated.
>
> Internally on each replica_set_id() and replica_clear_id(),
> ie at moment when replica get registered or unregistered,
> we call box_renew_replication_synchro_quorum() helper which
> finds out if evaluation of replication_synchro_quorum is
> needed and if so we calculate new replication_synchro_quorum
> value based on number of currently registered replicas. Then
> we notify dependent systems such as qsync and raft to update
> their guts.
>
> Closes #5446
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
>
> @TarantoolBot document
> Title: Synchronous replication
>
> The `replication_synchro_quorum` parameter allows to specify
> value not just as a plain integer number but as a formula too.
> The formula should use symbol `n` to represent amount of
> registered replicas.
>
> For example the canonical definition for a quorum (ie majority
> of members in a set) of `n` replicas is `n/2+1`. For such
> configuration one can define
>
> ```
> box.cfg {replication_synchro_quorum = "n/2+1"}
> ```
>
> Note that for sake of simplicity quorum evaluation never returns
> negative values thus for the case of formula say `n-2` the result
> will be 1 until number of replicas become 4 and more.


Hi! Thanks for the patch!

Please find my comments below.

> ---
>   src/box/box.cc           | 115 +++++++++++++++++++++++++++++++++++++--
>   src/box/box.h            |   1 +
>   src/box/lua/load_cfg.lua |   2 +-
>   src/box/replication.cc   |   4 ++
>   4 files changed, 117 insertions(+), 5 deletions(-)
>
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 1f7dec362..e4cb013c3 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -562,10 +562,81 @@ box_check_replication_sync_lag(void)
>   	return lag;
>   }
>   
> +/**
> + * Evaluate replication syncro quorum number from a formula.
> + */
> +static int
> +eval_replication_synchro_quorum(int nr_replicas)
> +{
> +	const char fmt[] =
> +		"local f, err = loadstring(\"return (%s)\")\n"
> +		"if not f then return 'failed to load \"%s\"' end\n"
> +		"setfenv(f, { n = %d })\n"
> +		"local ok, res = pcall(f)\n"
> +		"if not ok then return res end\n"
> +		"return math.floor(res)\n";
> +	char buf[512];
> +	int value = -1;
> +
> +	const char *expr = cfg_gets("replication_synchro_quorum");
> +	size_t ret = snprintf(buf, sizeof(buf), fmt, expr,
> +			      expr, nr_replicas);
> +	if (ret >= sizeof(buf)) {
> +		diag_set(ClientError, ER_CFG,
> +			 "replication_synchro_quorum",
> +			 "the expression is too big");
> +		return -1;
> +	}
> +
> +	luaL_loadstring(tarantool_L, buf);
> +	lua_call(tarantool_L, 0, 1);
> +
> +	if (lua_isnumber(tarantool_L, -1)) {
> +		value = (int)lua_tonumber(tarantool_L, -1);
> +	} else {
> +		diag_set(ClientError, ER_CFG,
> +			 "replication_synchro_quorum",
> +			 lua_tostring(tarantool_L, -1));
> +		return -1;
> +	}
> +	lua_pop(tarantool_L, 1);
> +
> +	/*
> +	 * At least we should have 1 node to sync, thus
> +	 * if the formula has evaluated to some negative
> +	 * value (say it was n-2) do not treat it as an
> +	 * error but just yield a minimum valid magnitude.
> +	 */
> +	if (value < 0) {
> +		say_warn("replication_synchro_quorum evaluated "
> +			 "to the negative value %d, ignore", value);
> +	}
> +	return MAX(1, MIN(value, VCLOCK_MAX-1));


I'd add a warning for value >= VCLOCK_MAX too.

And since you're checking for value explicitly anyway, IMO it'd be easier to
follow the code this way (up to you):


if (value < 0) {
     say_warn(...);
     value = 1;
} else if (value >= VCLOCK_MAX) {
     say_warn(...);
     value = VCLOCK_MAX - 1;
}
return value;
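
Spelled out a bit more, the branches above might look like the sketch
below. It is standalone for illustration only: clamp_synchro_quorum is a
hypothetical name, VCLOCK_MAX is defined locally as 32 (an assumption
standing in for the real constant), and fprintf stands in for say_warn.
The lower branch tests `value < 1` rather than `value < 0`, so that a
formula evaluating to zero still yields 1, as MAX(1, ...) does in the
patch.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: stand-in for the real VCLOCK_MAX constant. */
#ifndef VCLOCK_MAX
#define VCLOCK_MAX 32
#endif

/*
 * Hypothetical helper: clamp an evaluated quorum into
 * [1, VCLOCK_MAX - 1], warning whenever it had to be adjusted.
 */
static int
clamp_synchro_quorum(int value)
{
	if (value < 1) {
		/* fprintf stands in for say_warn() here. */
		fprintf(stderr, "replication_synchro_quorum evaluated "
			"to %d, using 1\n", value);
		value = 1;
	} else if (value >= VCLOCK_MAX) {
		fprintf(stderr, "replication_synchro_quorum evaluated "
			"to %d, using %d\n", value, VCLOCK_MAX - 1);
		value = VCLOCK_MAX - 1;
	}
	/* Same invariant set_replication_synchro_quorum() asserts. */
	assert(value > 0 && value < VCLOCK_MAX);
	return value;
}
```

Whether zero should be clamped silently, clamped with a warning, or
rejected is a separate question; the sketch just picks the middle option.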


> +}
> +
>   static int
>   box_check_replication_synchro_quorum(void)
>   {
> -	int quorum = cfg_geti("replication_synchro_quorum");
> +	int quorum = 0;
> +
> +	if (!cfg_isnumber("replication_synchro_quorum")) {
> +		/*
> +		 * The formula uses symbolic name 'n' as
> +		 * a number of currently registered replicas
> +		 * thus lets pass a sane value here.
> +		 *
> +		 * Note though at moment of bootstrap this value
> +		 * is zero but the evaluator will return a valid
> +		 * number back, we rather use this variable in
> +		 * a sake of "sense" pointing out that we're
> +		 * depending on number of replicas.
> +		 */


This comment is not quite correct. This code runs not only on
bootstrap, but also on reconfiguration (box_set_replication_synchro_quorum
is called each time you issue box.cfg{replication_synchro_quorum=...}).

So the value we pass below is not just sane, it's the only correct value
to use.

I'd leave only this part of the comment (or any variation of it you like):

Note that at the moment of bootstrap this value is zero.
We rely on evaluator returning a correct result (quorum = 1) in this case.


Or maybe simply pass MAX(1, replicaset.registered_count) each time?
With a relevant comment about bootstrap.
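
To illustrate the MAX(1, ...) variant, here is a sketch that floors the
replica count at 1 before evaluating the canonical `n/2+1` formula. Both
function names are hypothetical: eval_majority is a stand-in for the Lua
evaluator with the formula hardcoded, and quorum_for_registered only
shows where the floor would be applied.

```c
#include <assert.h>

/* Local MAX so the sketch is standalone. */
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/*
 * Hypothetical stand-in for the Lua evaluator, hardcoding "n/2+1".
 * For n >= 0, C integer division matches math.floor(n/2+1) in Lua.
 */
static int
eval_majority(int n)
{
	assert(n >= 0);
	return n / 2 + 1;
}

/*
 * Never pass zero replicas to the formula: during bootstrap
 * registered_count is still 0, so evaluate it as a single node.
 */
static int
quorum_for_registered(int registered_count)
{
	return eval_majority(SKETCH_MAX(1, registered_count));
}
```

For `n/2+1` the result at bootstrap is 1 either way; the floor only
guarantees that the formula never sees n = 0. Formulas such as `n-2`
still need the clamping in the evaluator.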


> +		int value = replicaset.registered_count;
> +		quorum = eval_replication_synchro_quorum(value);
> +	} else {
> +		quorum = cfg_geti("replication_synchro_quorum");
> +	}
> +
>   	if (quorum <= 0 || quorum >= VCLOCK_MAX) {
>   		diag_set(ClientError, ER_CFG, "replication_synchro_quorum",
>   			 "the value must be greater than zero and less than "
> @@ -918,18 +989,54 @@ box_set_replication_sync_lag(void)
>   	replication_sync_lag = box_check_replication_sync_lag();
>   }
>   
> +/**
> + * Assign new replication_synchro_quorum value
> + * and notify dependent subsystems.
> + */
> +static void
> +set_replication_synchro_quorum(int quorum)
> +{
> +	assert(quorum > 0 && quorum < VCLOCK_MAX);
> +
> +	replication_synchro_quorum = quorum;
> +	txn_limbo_on_parameters_change(&txn_limbo);
> +	raft_cfg_election_quorum(box_raft());
> +}
> +
>   int
>   box_set_replication_synchro_quorum(void)
>   {
>   	int value = box_check_replication_synchro_quorum();
>   	if (value < 0)
>   		return -1;
> -	replication_synchro_quorum = value;
> -	txn_limbo_on_parameters_change(&txn_limbo);
> -	raft_cfg_election_quorum(box_raft());
> +	set_replication_synchro_quorum(value);
>   	return 0;
>   }
>   
> +/**
> + * Renew replication_synchro_quorum value if defined
> + * as a formula and we need to recalculate it.
> + */
> +void
> +box_renew_replication_synchro_quorum(void)


What do you think of `box_update_replication_synchro_quorum`?


> +{
> +	if (cfg_isnumber("replication_synchro_quorum"))
> +		return;
> +
> +	/*
> +	 * The formula has been verified already on the bootstrap
> +	 * stage (and on dynamic reconfig as well), still there
> +	 * is a Lua call inside, heck knowns what could go wrong
> +	 * there thus panic if we're screwed.
> +	 */
> +	int value = replicaset.registered_count;
> +	int quorum = eval_replication_synchro_quorum(value);
> +	if (quorum < 0)
> +		panic("failed to eval replication_synchro_quorum");


I propose using the same check we have in
box_check_replication_synchro_quorum():

quorum <= 0 || quorum >= VCLOCK_MAX

For now the values are always in the correct range (or < 0), but who knows
how the quorum evaluation may change in the future?
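
One way to keep the two call sites in agreement is a shared predicate,
roughly as sketched below. The names synchro_quorum_is_valid and
renew_quorum_sketch are hypothetical, and VCLOCK_MAX is defined locally
as 32 only so the snippet stands alone.

```c
#include <stdbool.h>

/* Assumption: stand-in for the real VCLOCK_MAX constant. */
#ifndef VCLOCK_MAX
#define VCLOCK_MAX 32
#endif

/* The negation of: quorum <= 0 || quorum >= VCLOCK_MAX. */
static bool
synchro_quorum_is_valid(int quorum)
{
	return quorum > 0 && quorum < VCLOCK_MAX;
}

/*
 * Hypothetical renew path: stores the quorum and returns 0 on
 * success, returns -1 where the real code would panic().
 */
static int
renew_quorum_sketch(int evaluated, int *out)
{
	if (!synchro_quorum_is_valid(evaluated))
		return -1;
	*out = evaluated;
	return 0;
}
```

The same predicate could then back both the check in
box_check_replication_synchro_quorum() and the panic condition here.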


> +	say_info("renew replication_synchro_quorum = %d", quorum);
> +	set_replication_synchro_quorum(quorum);
> +}
> +
>   int
>   box_set_replication_synchro_timeout(void)
>   {
> diff --git a/src/box/box.h b/src/box/box.h
> index b47a220b7..dd943f169 100644
> --- a/src/box/box.h
> +++ b/src/box/box.h
> @@ -252,6 +252,7 @@ void box_set_replication_connect_timeout(void);
>   void box_set_replication_connect_quorum(void);
>   void box_set_replication_sync_lag(void);
>   int box_set_replication_synchro_quorum(void);
> +void box_renew_replication_synchro_quorum(void);
>   int box_set_replication_synchro_timeout(void);
>   void box_set_replication_sync_timeout(void);
>   void box_set_replication_skip_conflict(void);
> diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
> index 76e2e92c2..af66c0e46 100644
> --- a/src/box/lua/load_cfg.lua
> +++ b/src/box/lua/load_cfg.lua
> @@ -172,7 +172,7 @@ local template_cfg = {
>       replication_timeout = 'number',
>       replication_sync_lag = 'number',
>       replication_sync_timeout = 'number',
> -    replication_synchro_quorum = 'number',
> +    replication_synchro_quorum = 'string, number',
>       replication_synchro_timeout = 'number',
>       replication_connect_timeout = 'number',
>       replication_connect_quorum = 'number',
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index 65512cf0f..f89b8dbc3 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -250,6 +250,8 @@ replica_set_id(struct replica *replica, uint32_t replica_id)
>   	say_info("assigned id %d to replica %s",
>   		 replica->id, tt_uuid_str(&replica->uuid));
>   	replica->anon = false;
> +
> +	box_renew_replication_synchro_quorum();
>   }
>   
>   void
> @@ -298,6 +300,8 @@ replica_clear_id(struct replica *replica)
>   		assert(!replica->anon);
>   		replica_delete(replica);
>   	}
> +
> +	box_renew_replication_synchro_quorum();
>   }
>   
>   void

-- 
Serge Petrenko

