From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
To: Cyrill Gorcunov <gorcunov@gmail.com>,
tml <tarantool-patches@dev.tarantool.org>
Cc: Mons Anderson <v.perepelitsa@corp.mail.ru>
Subject: Re: [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum
Date: Sat, 5 Dec 2020 00:52:23 +0100
Message-ID: <bae58957-3699-cc0b-aae5-ebe223617e67@tarantool.org>
In-Reply-To: <20201203140446.66312-3-gorcunov@gmail.com>
Hi! Thanks for the patch!
See 6 comments below.
On 03.12.2020 15:04, Cyrill Gorcunov wrote:
> When synchronous replication is used we prefer a user to specify
> a quorum number, ie the number of replicas where data must be
> replicated before the master node continue accepting new
> transactions.
>
> This is not very convenient since a user may not know initially
> how many replicas will be used. Moreover the number of replicas
> may vary dynamically. For this sake we allow to specify the
> number of quorum in a symbolic way.
1. It would be better to have this paragraph in the docbot request.
> For example
>
> box.cfg {
> replication_synchro_quorum = "N/2+1",
> }
>
> where `N` is a number of registered replicas in a cluster.
> Once new replica attached or old one detached the number
> is renewed and propagated.
>
> Internally on each replica_set_id() and replica_clear_id(),
> ie at moment when replica get registered or unregistered,
> we call box_update_replication_synchro_quorum() helper which
> finds out if evaluation of replication_synchro_quorum is
> needed and if so we calculate new replication_synchro_quorum
> value based on number of currently registered replicas. Then
> we notify dependent systems such as qsync and raft to update
> their guts.
>
> Note: we do *not* change the default settings for this option,
> it remains 1 by default for now. Change the default option should
> be done as a separate commit once we make sure that everything is
> fine.
>
> Closes #5446
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
>
> @TarantoolBot document
> Title: Synchronous replication
>
> The `replication_synchro_quorum` parameter allows to specify
> value not just as a plain integer number but as a formula too.
> The formula should use symbol `N` to represent amount of
> registered replicas.
>
> For example the canonical definition for a quorum (ie majority
> of members in a set) of `N` replicas is `N/2+1`. For such
> configuration one can define
>
> ```
> box.cfg {replication_synchro_quorum = "N/2+1"}
> ```
>
> Note that for sake of simplicity quorum evaluation never returns
> negative values thus for the case of formula say `N-2` the result
> will be 1 until number of replicas become 4 and more.
> ---
> src/box/box.cc | 142 +++++++++++++++++++++++++++++++++++++--
> src/box/box.h | 1 +
> src/box/lua/load_cfg.lua | 2 +-
> src/box/replication.cc | 4 +-
> 4 files changed, 142 insertions(+), 7 deletions(-)
>
> diff --git a/src/box/box.cc b/src/box/box.cc
> index a8bc3471d..b9d078de4 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -554,10 +554,101 @@ box_check_replication_sync_lag(void)
> return lag;
> }
>
> +/**
> + * Evaluate replication syncro quorum number from a formula.
> + */
> +static int
> +eval_replication_synchro_quorum(int nr_replicas)
> +{
> + const char fmt[] =
> + "local expr = [[%s]]\n"
> + "local f, err = loadstring('return ('..expr..')')\n"
> + "if not f then "
> + "error(string.format('Failed to load \%\%s:"
> + "\%\%s', expr, err)) "
> + "end\n"
> + "setfenv(f, {N = %d, math = {"
> + "ceil = math.ceil,"
> + "floor = math.floor,"
> + "abs = math.abs,"
> + "random = math.random,"
> + "min = math.min,"
> + "max = math.abs,"
> + "sqrt = math.sqrt,"
> + "fmod = math.fmod,"
> + "}})\n"
> + "return math.floor(f())\n";
> + char buf[1024];
2. It is probably better to use tt_sprintf(), but this is OK too.
> + int value = -1;
> +
> + const char *expr = cfg_gets("replication_synchro_quorum");
> + size_t ret = snprintf(buf, sizeof(buf), fmt, expr, nr_replicas);
> + if (ret >= sizeof(buf)) {
> + diag_set(ClientError, ER_CFG,
> + "replication_synchro_quorum",
> + "the expression is too big");
> + return -1;
> + }
> +
> + luaL_loadstring(tarantool_L, buf);
> + if (lua_pcall(tarantool_L, 0, 1, 0) != 0) {
> + diag_set(ClientError, ER_CFG,
> + "replication_synchro_quorum",
> + lua_tostring(tarantool_L, -1));
> + return -1;
> + }
> +
> + if (lua_isnumber(tarantool_L, -1))
> + value = (int)lua_tonumber(tarantool_L, -1);
> + lua_pop(tarantool_L, 1);
> +
> + /*
> + * At least we should have 1 node to sync, thus
> + * if the formula has evaluated to some negative
> + * value (say it was n-2) do not treat it as an
> + * error but just yield a minimum valid magnitude.
> + */
> + if (value < 0) {
> + const int value_min = 1;
> + say_warn_ratelimited("replication_synchro_quorum evaluated "
> + "to the negative value %d, set to %d",
> + value, value_min);
3. Why is it ratelimited? This does not happen on each request, so it
cannot clog the logs.
4. What if the number of replicas is 1, and the formula is N - 1? I don't
think 0 would be valid in this case. 0 is valid if and only if the
replica count is also 0.
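A minimal sketch of one way to read comment 4, written in plain C outside the Tarantool tree. The names quorum_is_valid() and QUORUM_LIMIT are hypothetical; QUORUM_LIMIT stands in for VCLOCK_MAX, and its value of 32 is an assumption rather than something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for VCLOCK_MAX; the value 32 is an assumption. */
enum { QUORUM_LIMIT = 32 };

/*
 * Return true if an evaluated quorum is acceptable for the given
 * number of registered replicas. Zero is accepted only while no
 * replica is registered yet.
 */
bool
quorum_is_valid(int value, int nr_replicas)
{
	assert(nr_replicas >= 0);
	if (value == 0)
		return nr_replicas == 0;
	return value > 0 && value < QUORUM_LIMIT;
}
```

Under this reading, N - 1 with one registered replica yields 0 and is reported as invalid rather than adjusted.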
> + value = value_min;
> + } else if (value >= VCLOCK_MAX) {
> + const int value_max = VCLOCK_MAX - 1;
> + say_warn_ratelimited("replication_synchro_quorum evaluated "
> + "to value %d, set to %d",
> + value, value_max);
> + value = value_max;
> + }
> +
> + /*
> + * We never return 0, even if we're in bootstrap
> + * stage were number of replicas equals zero we
> + * should consider the node itself as a minimum
> + * quorum number.
> + */
> + return MAX(1, value);
> @@ -910,18 +1001,61 @@ box_set_replication_sync_lag(void)
> replication_sync_lag = box_check_replication_sync_lag();
> }
>
> +/**
> + * Assign new replication_synchro_quorum value
> + * and notify dependent subsystems.
> + */
> +static void
> +set_replication_synchro_quorum(int quorum)
> +{
> + assert(quorum > 0 && quorum < VCLOCK_MAX);
> +
> + replication_synchro_quorum = quorum;
> + txn_limbo_on_parameters_change(&txn_limbo);
> + box_raft_update_election_quorum();
> +}
> +
> int
> box_set_replication_synchro_quorum(void)
> {
> int value = box_check_replication_synchro_quorum();
> if (value < 0)
> return -1;
> - replication_synchro_quorum = value;
> - txn_limbo_on_parameters_change(&txn_limbo);
> - box_raft_update_election_quorum();
> + set_replication_synchro_quorum(value);
> return 0;
> }
>
> +/**
> + * Renew replication_synchro_quorum value if defined
> + * as a formula and we need to recalculate it.
> + */
> +void
> +box_update_replication_synchro_quorum(void)
> +{
> + if (cfg_isnumber("replication_synchro_quorum")) {
> + /*
> + * Even if replication_synchro_quorum is a constant
> + * number the RAFT engine should be notified on
> + * change of replicas amount.
> + */
> + box_raft_update_election_quorum();
> + return;
> + }
5. Too complex. It would be better to move set_replication_synchro_quorum()
into box_update_replication_synchro_quorum(), and call the latter
from box_set_replication_synchro_quorum(), the same way the limbo and
raft updates were called there before.
Having more than one quorum-updater function is confusing.
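A rough sketch of the shape suggested in comment 5: one updater that assigns the quorum and notifies dependents, with the configuration entry point calling it. This is a standalone model, not Tarantool code. Every name in it (quorum_cfg, quorum_update(), quorum_set(), the counters) is hypothetical, and the counters only stand in for the limbo and raft notifications. As a simplification, the constant case notifies both subsystems here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real option storage and subsystems. */
int quorum_value = 1;
int limbo_updates;
int raft_updates;

/* Simplified view of the option: a constant or a formula over N. */
struct quorum_cfg {
	bool is_number;
	int number;
	int (*eval)(int nr_replicas);
};

/* Example formula, equivalent to "N/2+1". */
int
quorum_majority(int nr_replicas)
{
	return nr_replicas / 2 + 1;
}

/* The only place that assigns the quorum and notifies dependents. */
void
quorum_update(const struct quorum_cfg *cfg, int nr_replicas)
{
	assert(cfg->is_number || cfg->eval != NULL);
	int quorum = cfg->is_number ? cfg->number : cfg->eval(nr_replicas);
	assert(quorum > 0);
	quorum_value = quorum;
	limbo_updates++;
	raft_updates++;
}

/* Configuration entry point: validate, then reuse the updater. */
int
quorum_set(const struct quorum_cfg *cfg, int nr_replicas)
{
	if (cfg->is_number && cfg->number <= 0)
		return -1;
	quorum_update(cfg, nr_replicas);
	return 0;
}
```

Both the reconfiguration path (quorum_set) and the replica registration path (a direct quorum_update call) then go through a single function, so there is only one place where the value changes.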
> +
> + /*
> + * The formula has been verified already on the bootstrap
> + * stage (and on dynamic reconfig as well), still there
> + * is a Lua call inside, heck knowns what could go wrong
> + * there thus panic if we're screwed.
> + */
> + int value = replicaset.registered_count;
> + int quorum = eval_replication_synchro_quorum(value);
> + if (quorum < 0 || quorum >= VCLOCK_MAX)
> + panic("failed to eval replication_synchro_quorum");
> + say_info("update replication_synchro_quorum = %d", quorum);
> + set_replication_synchro_quorum(quorum);
> +}
6.
tarantool> box.cfg{replication_synchro_quorum='"1"'}
2020-12-05 00:49:02.251 [59985] main/103/interactive I> set 'replication_synchro_quorum' configuration option to "\"1\""
---
...
I passed the string "1", and it was accepted. Can this be fixed?
Also, why don't I see the log line "update replication_synchro_quorum",
which you use in the test in the next commit?