From: Serge Petrenko <sergepetrenko@tarantool.org>
To: Cyrill Gorcunov <gorcunov@gmail.com>, tml <tarantool-patches@dev.tarantool.org>
Cc: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v2 2/3] cfg: support symbolic evaluation of replication_synchro_quorum
Date: Wed, 25 Nov 2020 14:36:18 +0300
Message-ID: <f5e82316-0a58-09f8-17be-3a062a4780d7@tarantool.org>
In-Reply-To: <20201124152405.1174898-3-gorcunov@gmail.com>

24.11.2020 18:24, Cyrill Gorcunov wrote:
> When synchronous replication is used we prefer a user to specify
> a quorum number, ie the number of replicas where data must be
> replicated before the master node continue accepting new
> transactions.
>
> This is not very convenient since a user may not know initially
> how many replicas will be used. Moreover the number of replicas
> may vary dynamically. For this sake we allow to specify the
> number of quorum in a symbolic way.
>
> For example
>
> box.cfg {
> 	replication_synchro_quorum = "n/2+1",
> }
>
> where `n` is a number of registered replicas in a cluster.
> Once new replica attached or old one detached the number
> is renewed and propagated.
>
> Internally on each replica_set_id() and replica_clear_id(),
> ie at moment when replica get registered or unregistered,
> we call box_renew_replication_synchro_quorum() helper which
> finds out if evaluation of replication_synchro_quorum is
> needed and if so we calculate new replication_synchro_quorum
> value based on number of currently registered replicas. Then
> we notify dependent systems such as qsync and raft to update
> their guts.
>
> Closes #5446
>
> Signed-off-by: Cyrill Gorcunov<gorcunov@gmail.com>
>
> @TarantoolBot document
> Title: Synchronous replication
>
> The `replication_synchro_quorum` parameter allows to specify
> value not just as a plain integer number but as a formula too.
> The formula should use symbol `n` to represent amount of
> registered replicas.
>
> For example the canonical definition for a quorum (ie majority
> of members in a set) of `n` replicas is `n/2+1`. For such
> configuration one can define
>
> ```
> box.cfg {replication_synchro_quorum = "n/2+1"}
> ```
>
> Note that for sake of simplicity quorum evaluation never returns
> negative values thus for the case of formula say `n-2` the result
> will be 1 until number of replicas become 4 and more.

Hi! Thanks for the patch!

Please find my comments below.

> ---
>  src/box/box.cc           | 115 +++++++++++++++++++++++++++++++++++++--
>  src/box/box.h            |   1 +
>  src/box/lua/load_cfg.lua |   2 +-
>  src/box/replication.cc   |   4 ++
>  4 files changed, 117 insertions(+), 5 deletions(-)
>
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 1f7dec362..e4cb013c3 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -562,10 +562,81 @@ box_check_replication_sync_lag(void)
>  	return lag;
>  }
>
> +/**
> + * Evaluate replication syncro quorum number from a formula.
> + */
> +static int
> +eval_replication_synchro_quorum(int nr_replicas)
> +{
> +	const char fmt[] =
> +		"local f, err = loadstring(\"return (%s)\")\n"
> +		"if not f then return 'failed to load \"%s\"' end\n"
> +		"setfenv(f, { n = %d })\n"
> +		"local ok, res = pcall(f)\n"
> +		"if not ok then return res end\n"
> +		"return math.floor(res)\n";
> +	char buf[512];
> +	int value = -1;
> +
> +	const char *expr = cfg_gets("replication_synchro_quorum");
> +	size_t ret = snprintf(buf, sizeof(buf), fmt, expr,
> +			      expr, nr_replicas);
> +	if (ret >= sizeof(buf)) {
> +		diag_set(ClientError, ER_CFG,
> +			 "replication_synchro_quorum",
> +			 "the expression is too big");
> +		return -1;
> +	}
> +
> +	luaL_loadstring(tarantool_L, buf);
> +	lua_call(tarantool_L, 0, 1);
> +
> +	if (lua_isnumber(tarantool_L, -1)) {
> +		value = (int)lua_tonumber(tarantool_L, -1);
> +	} else {
> +		diag_set(ClientError, ER_CFG,
> +			 "replication_synchro_quorum",
> +			 lua_tostring(tarantool_L, -1));
> +		return -1;
> +	}
> +	lua_pop(tarantool_L, 1);
> +
> +	/*
> +	 * At least we should have 1 node to sync, thus
> +	 * if the formula has evaluated to some negative
> +	 * value (say it was n-2) do not treat it as an
> +	 * error but just yield a minimum valid magnitude.
> +	 */
> +	if (value < 0) {
> +		say_warn("replication_synchro_quorum evaluated "
> +			 "to the negative value %d, ignore", value);
> +	}
> +	return MAX(1, MIN(value, VCLOCK_MAX-1));

I'd add a warning for value >= VCLOCK_MAX too.

And since you're checking for value explicitly anyway, IMO it'd be
easier to follow the code this way (up to you):

	if (value < 0) {
		say_warn(...);
		value = 1;
	} else if (value >= VCLOCK_MAX) {
		say_warn(...);
		value = VCLOCK_MAX - 1;
	}
	return value;

> +}
> +
>  static int
>  box_check_replication_synchro_quorum(void)
>  {
> -	int quorum = cfg_geti("replication_synchro_quorum");
> +	int quorum = 0;
> +
> +	if (!cfg_isnumber("replication_synchro_quorum")) {
> +		/*
> +		 * The formula uses symbolic name 'n' as
> +		 * a number of currently registered replicas
> +		 * thus lets pass a sane value here.
> +		 *
> +		 * Note though at moment of bootstrap this value
> +		 * is zero but the evaluator will return a valid
> +		 * number back, we rather use this variable in
> +		 * a sake of "sense" pointing out that we're
> +		 * depending on number of replicas.
> +		 */

This comment is not quite correct. This code will work not only on
bootstrap, but also on reconfiguration
(box_set_replication_synchro_quorum is called each time you issue
box.cfg{replication_synchro_quorum=...}).

So the value we pass below is not only sane, it's the only correct
value to use.

I'd leave only this part of the comment (or any variation of it you
like):

	Note that at the moment of bootstrap this value is zero.
	We rely on evaluator returning a correct result (quorum = 1)
	in this case.

Or maybe simply pass MAX(1, replicaset.registered_count) each time?
With a relevant comment about bootstrap.
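
To illustrate the two suggestions so far, here is a rough standalone
sketch, not code for the patch. Everything prefixed with `sketch_` is
hypothetical: `SKETCH_VCLOCK_MAX` stands in for VCLOCK_MAX (32 is an
assumed value), fprintf() stands in for say_warn(), and
sketch_eval_majority() replaces the Lua evaluation with a hard-coded
`n/2+1`. Note that it checks `value < 1` rather than `value < 0`, so a
formula that evaluates to exactly 0 is still raised to 1, as
MAX(1, ...) does in the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: illustrative stand-in for Tarantool's VCLOCK_MAX. */
enum { SKETCH_VCLOCK_MAX = 32 };

/*
 * Clamp an evaluated quorum into [1, SKETCH_VCLOCK_MAX - 1] and
 * report both kinds of out-of-range results. fprintf() is used
 * here in place of say_warn().
 */
static int
sketch_clamp_quorum(int value)
{
	if (value < 1) {
		fprintf(stderr, "quorum evaluated to %d, using 1\n", value);
		value = 1;
	} else if (value >= SKETCH_VCLOCK_MAX) {
		fprintf(stderr, "quorum evaluated to %d, using %d\n",
			value, SKETCH_VCLOCK_MAX - 1);
		value = SKETCH_VCLOCK_MAX - 1;
	}
	/* Same range the quorum setter asserts in the patch. */
	assert(value > 0 && value < SKETCH_VCLOCK_MAX);
	return value;
}

/* Hypothetical stand-in for evaluating the formula "n/2+1" in Lua. */
static int
sketch_eval_majority(int n)
{
	return n / 2 + 1;
}

/*
 * Quorum for a given number of registered replicas. At bootstrap
 * the count is zero, so it is raised to 1 before evaluation.
 */
static int
sketch_quorum(int registered_count)
{
	int n = registered_count > 1 ? registered_count : 1;
	return sketch_clamp_quorum(sketch_eval_majority(n));
}
```

With this, sketch_quorum(0) gives 1 at bootstrap and sketch_quorum(3)
gives 2.
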
> +		int value = replicaset.registered_count;
> +		quorum = eval_replication_synchro_quorum(value);
> +	} else {
> +		quorum = cfg_geti("replication_synchro_quorum");
> +	}
> +
>  	if (quorum <= 0 || quorum >= VCLOCK_MAX) {
>  		diag_set(ClientError, ER_CFG, "replication_synchro_quorum",
>  			 "the value must be greater than zero and less than "
> @@ -918,18 +989,54 @@ box_set_replication_sync_lag(void)
>  	replication_sync_lag = box_check_replication_sync_lag();
>  }
>
> +/**
> + * Assign new replication_synchro_quorum value
> + * and notify dependent subsystems.
> + */
> +static void
> +set_replication_synchro_quorum(int quorum)
> +{
> +	assert(quorum > 0 && quorum < VCLOCK_MAX);
> +
> +	replication_synchro_quorum = quorum;
> +	txn_limbo_on_parameters_change(&txn_limbo);
> +	raft_cfg_election_quorum(box_raft());
> +}
> +
>  int
>  box_set_replication_synchro_quorum(void)
>  {
>  	int value = box_check_replication_synchro_quorum();
>  	if (value < 0)
>  		return -1;
> -	replication_synchro_quorum = value;
> -	txn_limbo_on_parameters_change(&txn_limbo);
> -	raft_cfg_election_quorum(box_raft());
> +	set_replication_synchro_quorum(value);
>  	return 0;
>  }
>
> +/**
> + * Renew replication_synchro_quorum value if defined
> + * as a formula and we need to recalculate it.
> + */
> +void
> +box_renew_replication_synchro_quorum(void)

What do you think of `box_update_replication_synchro_quorum`?

> +{
> +	if (cfg_isnumber("replication_synchro_quorum"))
> +		return;
> +
> +	/*
> +	 * The formula has been verified already on the bootstrap
> +	 * stage (and on dynamic reconfig as well), still there
> +	 * is a Lua call inside, heck knowns what could go wrong
> +	 * there thus panic if we're screwed.
> +	 */
> +	int value = replicaset.registered_count;
> +	int quorum = eval_replication_synchro_quorum(value);
> +	if (quorum < 0)
> +		panic("failed to eval replication_synchro_quorum");

I propose to use the same check we had in
box_check_replication_synchro_quorum():

	quorum <= 0 || quorum >= VCLOCK_MAX

For now the values are always in the correct range (or < 0), but who
knows how quorum eval may change in the future?

> +	say_info("renew replication_synchro_quorum = %d", quorum);
> +	set_replication_synchro_quorum(quorum);
> +}
> +
>  int
>  box_set_replication_synchro_timeout(void)
>  {
> diff --git a/src/box/box.h b/src/box/box.h
> index b47a220b7..dd943f169 100644
> --- a/src/box/box.h
> +++ b/src/box/box.h
> @@ -252,6 +252,7 @@ void box_set_replication_connect_timeout(void);
>  void box_set_replication_connect_quorum(void);
>  void box_set_replication_sync_lag(void);
>  int box_set_replication_synchro_quorum(void);
> +void box_renew_replication_synchro_quorum(void);
>  int box_set_replication_synchro_timeout(void);
>  void box_set_replication_sync_timeout(void);
>  void box_set_replication_skip_conflict(void);
> diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
> index 76e2e92c2..af66c0e46 100644
> --- a/src/box/lua/load_cfg.lua
> +++ b/src/box/lua/load_cfg.lua
> @@ -172,7 +172,7 @@ local template_cfg = {
>      replication_timeout = 'number',
>      replication_sync_lag = 'number',
>      replication_sync_timeout = 'number',
> -    replication_synchro_quorum = 'number',
> +    replication_synchro_quorum = 'string, number',
>      replication_synchro_timeout = 'number',
>      replication_connect_timeout = 'number',
>      replication_connect_quorum = 'number',
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index 65512cf0f..f89b8dbc3 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -250,6 +250,8 @@ replica_set_id(struct replica *replica, uint32_t replica_id)
>  	say_info("assigned id %d to replica %s",
>  		 replica->id, tt_uuid_str(&replica->uuid));
>  	replica->anon = false;
> +
> +	box_renew_replication_synchro_quorum();
>  }
>
>  void
> @@ -298,6 +300,8 @@ replica_clear_id(struct replica *replica)
>  		assert(!replica->anon);
>  		replica_delete(replica);
>  	}
> +
> +	box_renew_replication_synchro_quorum();
>  }
>
>  void

--
Serge Petrenko