* [Tarantool-patches] [PATCH v3 0/3] qsync: evaluate replication_synchro_quorum dynamically @ 2020-12-03 14:04 Cyrill Gorcunov 2020-12-03 14:04 ` [Tarantool-patches] [PATCH v3 1/3] cfg: add cfg_isnumber helper Cyrill Gorcunov ` (3 more replies) 0 siblings, 4 replies; 16+ messages in thread From: Cyrill Gorcunov @ 2020-12-03 14:04 UTC (permalink / raw) To: tml; +Cc: Mons Anderson, Vladislav Shpilevoy From the issue description: > The reason is that users will likely not understand how the option > should be configured properly, and will break something accidentally. > So this idea about allowance to write an expression on the cluster > size allows to specify the canonical N/2 + 1 formula, and not update > it manually on all instances, when a new node is added, or an existing > one is deleted. v2 (by Serge): - keep replication_synchro_quorum been skipped at bootstrap in load_cfg.lua - eliminate redundant say_info calls - call quorum update routine from replica_set_id/replica_clear_id - use replicaset.registered_count directly when evaluating the formula - make quorum evaluation procedure always return value in allowed range, the only error which may happen here is some syntax error or Lua evaluation errors - a test has been added v3 (by Serge, Mons, Vlad): - use replica.lua in tests - use N symbol in formula - use lua_pcall when evaluating a formula - make formula more safe itself, provide various math helpers - use box_update_replication_synchro_quorum name as a general updater from replication code - do not forget to update raft election quorum inside box_update_replication_synchro_quorum - print warns inside functions evaluator if value get out of bounds issue https://github.com/tarantool/tarantool/issues/5446 branch gorcunov/gh-5446-eval-quorum-3 Cyrill Gorcunov (3): cfg: add cfg_isnumber helper cfg: support symbolic evaluation of replication_synchro_quorum test: add replication/gh-5446-sqync-eval-quorum.test.lua src/box/box.cc | 142 +++++++++++++++- src/box/box.h | 1 + src/box/lua/load_cfg.lua | 2 +- src/box/replication.cc | 4 +- src/cfg.c | 9 + src/cfg.h | 6 + .../gh-5446-sqync-eval-quorum.result | 156 ++++++++++++++++++ .../gh-5446-sqync-eval-quorum.test.lua | 62 +++++++ test/replication/replica-quorum-1.lua | 1 + test/replication/replica-quorum-2.lua | 1 + test/replication/replica-quorum-3.lua | 1 + test/replication/replica-quorum-4.lua | 1 + 12 files changed, 379 insertions(+), 7 deletions(-) create mode 100644 test/replication/gh-5446-sqync-eval-quorum.result create mode 100644 test/replication/gh-5446-sqync-eval-quorum.test.lua create mode 120000 test/replication/replica-quorum-1.lua create mode 120000 test/replication/replica-quorum-2.lua create mode 120000 test/replication/replica-quorum-3.lua create mode 120000 test/replication/replica-quorum-4.lua base-commit: bd03dfc76c0b76f56374c3e66052e2af0c50ae65 -- 2.26.2 ^ permalink raw reply [flat|nested] 16+ messages in thread
* [Tarantool-patches] [PATCH v3 1/3] cfg: add cfg_isnumber helper 2020-12-03 14:04 [Tarantool-patches] [PATCH v3 0/3] qsync: evaluate replication_synchro_quorum dynamically Cyrill Gorcunov @ 2020-12-03 14:04 ` Cyrill Gorcunov 2020-12-03 14:04 ` [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum Cyrill Gorcunov ` (2 subsequent siblings) 3 siblings, 0 replies; 16+ messages in thread From: Cyrill Gorcunov @ 2020-12-03 14:04 UTC (permalink / raw) To: tml; +Cc: Mons Anderson, Vladislav Shpilevoy We will need it to figure out if parameter is a numeric value when doing configuration check. Part-of #5446 Acked-by: Serge Petrenko <sergepetrenko@tarantool.org> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> --- src/cfg.c | 9 +++++++++ src/cfg.h | 6 ++++++ 2 files changed, 15 insertions(+) diff --git a/src/cfg.c b/src/cfg.c index 46cff1999..cf07d5817 100644 --- a/src/cfg.c +++ b/src/cfg.c @@ -57,6 +57,15 @@ cfg_geti(const char *param) return val; } +bool +cfg_isnumber(const char *param) +{ + cfg_get(param); + bool ret = !!lua_isnumber(tarantool_L, -1); + lua_pop(tarantool_L, 1); + return ret; +} + int cfg_getb(const char *param) { diff --git a/src/cfg.h b/src/cfg.h index 140bdddb8..e2955e6b2 100644 --- a/src/cfg.h +++ b/src/cfg.h @@ -40,6 +40,12 @@ extern "C" { int cfg_geti(const char *param); +/** + * Test if cfg parameter is a number. + */ +bool +cfg_isnumber(const char *param); + /** * Gets boolean parameter of cfg. * Returns -1 in case of nil -- 2.26.2 ^ permalink raw reply [flat|nested] 16+ messages in thread
* [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum 2020-12-03 14:04 [Tarantool-patches] [PATCH v3 0/3] qsync: evaluate replication_synchro_quorum dynamically Cyrill Gorcunov 2020-12-03 14:04 ` [Tarantool-patches] [PATCH v3 1/3] cfg: add cfg_isnumber helper Cyrill Gorcunov @ 2020-12-03 14:04 ` Cyrill Gorcunov 2020-12-04 23:52 ` Vladislav Shpilevoy 2020-12-03 14:04 ` [Tarantool-patches] [PATCH v3 3/3] test: add replication/gh-5446-sqync-eval-quorum.test.lua Cyrill Gorcunov 2020-12-04 10:15 ` [Tarantool-patches] [PATCH v3 0/3] qsync: evaluate replication_synchro_quorum dynamically Serge Petrenko 3 siblings, 1 reply; 16+ messages in thread From: Cyrill Gorcunov @ 2020-12-03 14:04 UTC (permalink / raw) To: tml; +Cc: Mons Anderson, Vladislav Shpilevoy When synchronous replication is used we prefer a user to specify a quorum number, ie the number of replicas where data must be replicated before the master node continue accepting new transactions. This is not very convenient since a user may not know initially how many replicas will be used. Moreover the number of replicas may vary dynamically. For this sake we allow to specify the number of quorum in a symbolic way. For example box.cfg { replication_synchro_quorum = "N/2+1", } where `N` is a number of registered replicas in a cluster. Once new replica attached or old one detached the number is renewed and propagated. Internally on each replica_set_id() and replica_clear_id(), ie at moment when replica get registered or unregistered, we call box_update_replication_synchro_quorum() helper which finds out if evaluation of replication_synchro_quorum is needed and if so we calculate new replication_synchro_quorum value based on number of currently registered replicas. Then we notify dependent systems such as qsync and raft to update their guts. Note: we do *not* change the default settings for this option, it remains 1 by default for now. Change the default option should be done as a separate commit once we make sure that everything is fine. Closes #5446 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> @TarantoolBot document Title: Synchronous replication The `replication_synchro_quorum` parameter allows to specify value not just as a plain integer number but as a formula too. The formula should use symbol `N` to represent amount of registered replicas. For example the canonical definition for a quorum (ie majority of members in a set) of `N` replicas is `N/2+1`. For such configuration one can define ``` box.cfg {replication_synchro_quorum = "N/2+1"} ``` Note that for sake of simplicity quorum evaluation never returns negative values thus for the case of formula say `N-2` the result will be 1 until number of replicas become 4 and more. --- src/box/box.cc | 142 +++++++++++++++++++++++++++++++++++++-- src/box/box.h | 1 + src/box/lua/load_cfg.lua | 2 +- src/box/replication.cc | 4 +- 4 files changed, 142 insertions(+), 7 deletions(-) diff --git a/src/box/box.cc b/src/box/box.cc index a8bc3471d..b9d078de4 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -554,10 +554,101 @@ box_check_replication_sync_lag(void) return lag; } +/** + * Evaluate replication syncro quorum number from a formula. + */ +static int +eval_replication_synchro_quorum(int nr_replicas) +{ + const char fmt[] = + "local expr = [[%s]]\n" + "local f, err = loadstring('return ('..expr..')')\n" + "if not f then " + "error(string.format('Failed to load \%\%s:" + "\%\%s', expr, err)) " + "end\n" + "setfenv(f, {N = %d, math = {" + "ceil = math.ceil," + "floor = math.floor," + "abs = math.abs," + "random = math.random," + "min = math.min," + "max = math.abs," + "sqrt = math.sqrt," + "fmod = math.fmod," + "}})\n" + "return math.floor(f())\n"; + char buf[1024]; + int value = -1; + + const char *expr = cfg_gets("replication_synchro_quorum"); + size_t ret = snprintf(buf, sizeof(buf), fmt, expr, nr_replicas); + if (ret >= sizeof(buf)) { + diag_set(ClientError, ER_CFG, + "replication_synchro_quorum", + "the expression is too big"); + return -1; + } + + luaL_loadstring(tarantool_L, buf); + if (lua_pcall(tarantool_L, 0, 1, 0) != 0) { + diag_set(ClientError, ER_CFG, + "replication_synchro_quorum", + lua_tostring(tarantool_L, -1)); + return -1; + } + + if (lua_isnumber(tarantool_L, -1)) + value = (int)lua_tonumber(tarantool_L, -1); + lua_pop(tarantool_L, 1); + + /* + * At least we should have 1 node to sync, thus + * if the formula has evaluated to some negative + * value (say it was n-2) do not treat it as an + * error but just yield a minimum valid magnitude. + */ + if (value < 0) { + const int value_min = 1; + say_warn_ratelimited("replication_synchro_quorum evaluated " + "to the negative value %d, set to %d", + value, value_min); + value = value_min; + } else if (value >= VCLOCK_MAX) { + const int value_max = VCLOCK_MAX - 1; + say_warn_ratelimited("replication_synchro_quorum evaluated " + "to value %d, set to %d", + value, value_max); + value = value_max; + } + + /* + * We never return 0, even if we're in bootstrap + * stage were number of replicas equals zero we + * should consider the node itself as a minimum + * quorum number. + */ + return MAX(1, value); +} + static int box_check_replication_synchro_quorum(void) { - int quorum = cfg_geti("replication_synchro_quorum"); + int quorum = 0; + + if (!cfg_isnumber("replication_synchro_quorum")) { + /* + * The formula uses symbolic name 'N' as + * a number of currently registered replicas. + */ + int value = replicaset.registered_count; + quorum = eval_replication_synchro_quorum(value); + if (quorum < 0) + return -1; + } else { + quorum = cfg_geti("replication_synchro_quorum"); + } + if (quorum <= 0 || quorum >= VCLOCK_MAX) { diag_set(ClientError, ER_CFG, "replication_synchro_quorum", "the value must be greater than zero and less than " @@ -910,18 +1001,61 @@ box_set_replication_sync_lag(void) replication_sync_lag = box_check_replication_sync_lag(); } +/** + * Assign new replication_synchro_quorum value + * and notify dependent subsystems. + */ +static void +set_replication_synchro_quorum(int quorum) +{ + assert(quorum > 0 && quorum < VCLOCK_MAX); + + replication_synchro_quorum = quorum; + txn_limbo_on_parameters_change(&txn_limbo); + box_raft_update_election_quorum(); +} + int box_set_replication_synchro_quorum(void) { int value = box_check_replication_synchro_quorum(); if (value < 0) return -1; - replication_synchro_quorum = value; - txn_limbo_on_parameters_change(&txn_limbo); - box_raft_update_election_quorum(); + set_replication_synchro_quorum(value); return 0; } +/** + * Renew replication_synchro_quorum value if defined + * as a formula and we need to recalculate it. + */ +void +box_update_replication_synchro_quorum(void) +{ + if (cfg_isnumber("replication_synchro_quorum")) { + /* + * Even if replication_synchro_quorum is a constant + * number the RAFT engine should be notified on + * change of replicas amount. + */ + box_raft_update_election_quorum(); + return; + } + + /* + * The formula has been verified already on the bootstrap + * stage (and on dynamic reconfig as well), still there + * is a Lua call inside, heck knowns what could go wrong + * there thus panic if we're screwed. + */ + int value = replicaset.registered_count; + int quorum = eval_replication_synchro_quorum(value); + if (quorum < 0 || quorum >= VCLOCK_MAX) + panic("failed to eval replication_synchro_quorum"); + say_info("update replication_synchro_quorum = %d", quorum); + set_replication_synchro_quorum(quorum); +} + int box_set_replication_synchro_timeout(void) { diff --git a/src/box/box.h b/src/box/box.h index b47a220b7..c3e1a1276 100644 --- a/src/box/box.h +++ b/src/box/box.h @@ -252,6 +252,7 @@ void box_set_replication_connect_timeout(void); void box_set_replication_connect_quorum(void); void box_set_replication_sync_lag(void); int box_set_replication_synchro_quorum(void); +void box_update_replication_synchro_quorum(void); int box_set_replication_synchro_timeout(void); void box_set_replication_sync_timeout(void); void box_set_replication_skip_conflict(void); diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua index 76e2e92c2..af66c0e46 100644 --- a/src/box/lua/load_cfg.lua +++ b/src/box/lua/load_cfg.lua @@ -172,7 +172,7 @@ local template_cfg = { replication_timeout = 'number', replication_sync_lag = 'number', replication_sync_timeout = 'number', - replication_synchro_quorum = 'number', + replication_synchro_quorum = 'string, number', replication_synchro_timeout = 'number', replication_connect_timeout = 'number', replication_connect_quorum = 'number', diff --git a/src/box/replication.cc b/src/box/replication.cc index 931c73a37..3126d86ac 100644 --- a/src/box/replication.cc +++ b/src/box/replication.cc @@ -251,7 +251,7 @@ replica_set_id(struct replica *replica, uint32_t replica_id) say_info("assigned id %d to replica %s", replica->id, tt_uuid_str(&replica->uuid)); replica->anon = false; - box_raft_update_election_quorum(); + box_update_replication_synchro_quorum(); } void @@ -300,7 +300,7 @@ replica_clear_id(struct replica *replica) assert(!replica->anon); replica_delete(replica); } - box_raft_update_election_quorum(); + box_update_replication_synchro_quorum(); } void -- 2.26.2 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum 2020-12-03 14:04 ` [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum Cyrill Gorcunov @ 2020-12-04 23:52 ` Vladislav Shpilevoy 2020-12-07 20:17 ` Cyrill Gorcunov 2020-12-07 21:25 ` Vladislav Shpilevoy 0 siblings, 2 replies; 16+ messages in thread From: Vladislav Shpilevoy @ 2020-12-04 23:52 UTC (permalink / raw) To: Cyrill Gorcunov, tml; +Cc: Mons Anderson Hi! Thanks for the patch! See 6 comments below. On 03.12.2020 15:04, Cyrill Gorcunov wrote: > When synchronous replication is used we prefer a user to specify > a quorum number, ie the number of replicas where data must be > replicated before the master node continue accepting new > transactions. > > This is not very convenient since a user may not know initially > how many replicas will be used. Moreover the number of replicas > may vary dynamically. For this sake we allow to specify the > number of quorum in a symbolic way. 1. Would be better to have this paragraph in the docbot request. > For example > > box.cfg { > replication_synchro_quorum = "N/2+1", > } > > where `N` is a number of registered replicas in a cluster. > Once new replica attached or old one detached the number > is renewed and propagated. > > Internally on each replica_set_id() and replica_clear_id(), > ie at moment when replica get registered or unregistered, > we call box_update_replication_synchro_quorum() helper which > finds out if evaluation of replication_synchro_quorum is > needed and if so we calculate new replication_synchro_quorum > value based on number of currently registered replicas. Then > we notify dependent systems such as qsync and raft to update > their guts. > > Note: we do *not* change the default settings for this option, > it remains 1 by default for now. Change the default option should > be done as a separate commit once we make sure that everything is > fine. > > Closes #5446 > > Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> > > @TarantoolBot document > Title: Synchronous replication > > The `replication_synchro_quorum` parameter allows to specify > value not just as a plain integer number but as a formula too. > The formula should use symbol `N` to represent amount of > registered replicas. > > For example the canonical definition for a quorum (ie majority > of members in a set) of `N` replicas is `N/2+1`. For such > configuration one can define > > ``` > box.cfg {replication_synchro_quorum = "N/2+1"} > ``` > > Note that for sake of simplicity quorum evaluation never returns > negative values thus for the case of formula say `N-2` the result > will be 1 until number of replicas become 4 and more. > --- > src/box/box.cc | 142 +++++++++++++++++++++++++++++++++++++-- > src/box/box.h | 1 + > src/box/lua/load_cfg.lua | 2 +- > src/box/replication.cc | 4 +- > 4 files changed, 142 insertions(+), 7 deletions(-) > > diff --git a/src/box/box.cc b/src/box/box.cc > index a8bc3471d..b9d078de4 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -554,10 +554,101 @@ box_check_replication_sync_lag(void) > return lag; > } > > +/** > + * Evaluate replication syncro quorum number from a formula. > + */ > +static int > +eval_replication_synchro_quorum(int nr_replicas) > +{ > + const char fmt[] = > + "local expr = [[%s]]\n" > + "local f, err = loadstring('return ('..expr..')')\n" > + "if not f then " > + "error(string.format('Failed to load \%\%s:" > + "\%\%s', expr, err)) " > + "end\n" > + "setfenv(f, {N = %d, math = {" > + "ceil = math.ceil," > + "floor = math.floor," > + "abs = math.abs," > + "random = math.random," > + "min = math.min," > + "max = math.abs," > + "sqrt = math.sqrt," > + "fmod = math.fmod," > + "}})\n" > + "return math.floor(f())\n"; > + char buf[1024]; 2. Probably better use tt_sprintf(). But this is ok too. > + int value = -1; > + > + const char *expr = cfg_gets("replication_synchro_quorum"); > + size_t ret = snprintf(buf, sizeof(buf), fmt, expr, nr_replicas); > + if (ret >= sizeof(buf)) { > + diag_set(ClientError, ER_CFG, > + "replication_synchro_quorum", > + "the expression is too big"); > + return -1; > + } > + > + luaL_loadstring(tarantool_L, buf); > + if (lua_pcall(tarantool_L, 0, 1, 0) != 0) { > + diag_set(ClientError, ER_CFG, > + "replication_synchro_quorum", > + lua_tostring(tarantool_L, -1)); > + return -1; > + } > + > + if (lua_isnumber(tarantool_L, -1)) > + value = (int)lua_tonumber(tarantool_L, -1); > + lua_pop(tarantool_L, 1); > + > + /* > + * At least we should have 1 node to sync, thus > + * if the formula has evaluated to some negative > + * value (say it was n-2) do not treat it as an > + * error but just yield a minimum valid magnitude. > + */ > + if (value < 0) { > + const int value_min = 1; > + say_warn_ratelimited("replication_synchro_quorum evaluated " > + "to the negative value %d, set to %d", > + value, value_min); 3. Why is it ratelimited? It is not a thing happening on each request, which could clog the logs. 4. What if number of replicas is 1, and the formula is N - 1? I don't think 0 would be valid in this case. 0 is valid only and only when replica count is also 0. > + value = value_min; > + } else if (value >= VCLOCK_MAX) { > + const int value_max = VCLOCK_MAX - 1; > + say_warn_ratelimited("replication_synchro_quorum evaluated " > + "to value %d, set to %d", > + value, value_max); > + value = value_max; > + } > + > + /* > + * We never return 0, even if we're in bootstrap > + * stage were number of replicas equals zero we > + * should consider the node itself as a minimum > + * quorum number. > + */ > + return MAX(1, value);> @@ -910,18 +1001,61 @@ box_set_replication_sync_lag(void) > replication_sync_lag = box_check_replication_sync_lag(); > } > > +/** > + * Assign new replication_synchro_quorum value > + * and notify dependent subsystems. > + */ > +static void > +set_replication_synchro_quorum(int quorum) > +{ > + assert(quorum > 0 && quorum < VCLOCK_MAX); > + > + replication_synchro_quorum = quorum; > + txn_limbo_on_parameters_change(&txn_limbo); > + box_raft_update_election_quorum(); > +} > + > int > box_set_replication_synchro_quorum(void) > { > int value = box_check_replication_synchro_quorum(); > if (value < 0) > return -1; > - replication_synchro_quorum = value; > - txn_limbo_on_parameters_change(&txn_limbo); > - box_raft_update_election_quorum(); > + set_replication_synchro_quorum(value); > return 0; > } > > +/** > + * Renew replication_synchro_quorum value if defined > + * as a formula and we need to recalculate it. > + */ > +void > +box_update_replication_synchro_quorum(void) > +{ > + if (cfg_isnumber("replication_synchro_quorum")) { > + /* > + * Even if replication_synchro_quorum is a constant > + * number the RAFT engine should be notified on > + * change of replicas amount. > + */ > + box_raft_update_election_quorum(); > + return; > + } 5. Too complex. Better move set_replication_synchro_quorum into box_update_replication_synchro_quorum(), and call the latter from box_set_replication_synchro_quorum(), like it was before with the limbo and raft. Having more than one quorum-updater function looks confusing. > + > + /* > + * The formula has been verified already on the bootstrap > + * stage (and on dynamic reconfig as well), still there > + * is a Lua call inside, heck knowns what could go wrong > + * there thus panic if we're screwed. > + */ > + int value = replicaset.registered_count; > + int quorum = eval_replication_synchro_quorum(value); > + if (quorum < 0 || quorum >= VCLOCK_MAX) > + panic("failed to eval replication_synchro_quorum"); > + say_info("update replication_synchro_quorum = %d", quorum); > + set_replication_synchro_quorum(quorum); > +} 6. tarantool> box.cfg{replication_synchro_quorum='"1"'} 2020-12-05 00:49:02.251 [59985] main/103/interactive I> set 'replication_synchro_quorum' configuration option to "\"1\"" --- ... I gave string "1", but it is accepted. Can it be fixed? Why don't I see this log line "update replication_synchro_quorum", which you use in the test in the next commit? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum 2020-12-04 23:52 ` Vladislav Shpilevoy @ 2020-12-07 20:17 ` Cyrill Gorcunov 2020-12-07 21:25 ` Vladislav Shpilevoy 1 sibling, 0 replies; 16+ messages in thread From: Cyrill Gorcunov @ 2020-12-07 20:17 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: Mons Anderson, tml Here is a diff I force pushed --- diff --git a/src/box/box.cc b/src/box/box.cc index b9d078de4..2dfb5bc1c 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -578,17 +578,9 @@ eval_replication_synchro_quorum(int nr_replicas) "fmod = math.fmod," "}})\n" "return math.floor(f())\n"; - char buf[1024]; int value = -1; - const char *expr = cfg_gets("replication_synchro_quorum"); - size_t ret = snprintf(buf, sizeof(buf), fmt, expr, nr_replicas); - if (ret >= sizeof(buf)) { - diag_set(ClientError, ER_CFG, - "replication_synchro_quorum", - "the expression is too big"); - return -1; - } + const char *buf = tt_sprintf(fmt, expr, nr_replicas); luaL_loadstring(tarantool_L, buf); if (lua_pcall(tarantool_L, 0, 1, 0) != 0) { @@ -608,17 +600,11 @@ eval_replication_synchro_quorum(int nr_replicas) * value (say it was n-2) do not treat it as an * error but just yield a minimum valid magnitude. */ - if (value < 0) { - const int value_min = 1; - say_warn_ratelimited("replication_synchro_quorum evaluated " - "to the negative value %d, set to %d", - value, value_min); - value = value_min; - } else if (value >= VCLOCK_MAX) { + if (value >= VCLOCK_MAX) { const int value_max = VCLOCK_MAX - 1; - say_warn_ratelimited("replication_synchro_quorum evaluated " - "to value %d, set to %d", - value, value_max); + say_warn("replication_synchro_quorum evaluated " + "to value %d, set to %d", + value, value_max); value = value_max; } We use tt_sprintf and drop value < 0 test (because we return MAX(1, value) @@ -1001,30 +987,6 @@ box_set_replication_sync_lag(void) replication_sync_lag = box_check_replication_sync_lag(); } -/** - * Assign new replication_synchro_quorum value - * and notify dependent subsystems. - */ -static void -set_replication_synchro_quorum(int quorum) -{ - assert(quorum > 0 && quorum < VCLOCK_MAX); - - replication_synchro_quorum = quorum; - txn_limbo_on_parameters_change(&txn_limbo); - box_raft_update_election_quorum(); -} - -int -box_set_replication_synchro_quorum(void) -{ - int value = box_check_replication_synchro_quorum(); - if (value < 0) - return -1; - set_replication_synchro_quorum(value); - return 0; -} - /** * Renew replication_synchro_quorum value if defined * as a formula and we need to recalculate it. @@ -1053,7 +1015,20 @@ box_update_replication_synchro_quorum(void) if (quorum < 0 || quorum >= VCLOCK_MAX) panic("failed to eval replication_synchro_quorum"); say_info("update replication_synchro_quorum = %d", quorum); - set_replication_synchro_quorum(quorum); + + replication_synchro_quorum = quorum; + txn_limbo_on_parameters_change(&txn_limbo); + box_raft_update_election_quorum(); +} + +int +box_set_replication_synchro_quorum(void) +{ + int value = box_check_replication_synchro_quorum(); + if (value < 0) + return -1; + box_update_replication_synchro_quorum(); + return 0; } Use box_update_replication_synchro_quorum from inside of box_set_replication_synchro_quorum. I'm working on test now but it is yet incomplete. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum 2020-12-04 23:52 ` Vladislav Shpilevoy 2020-12-07 20:17 ` Cyrill Gorcunov @ 2020-12-07 21:25 ` Vladislav Shpilevoy 2020-12-07 21:48 ` Cyrill Gorcunov 1 sibling, 1 reply; 16+ messages in thread From: Vladislav Shpilevoy @ 2020-12-07 21:25 UTC (permalink / raw) To: Cyrill Gorcunov, tml; +Cc: Mons Anderson Hi! > 6. > tarantool> box.cfg{replication_synchro_quorum='"1"'} > 2020-12-05 00:49:02.251 [59985] main/103/interactive I> set 'replication_synchro_quorum' configuration option to "\"1\"" > --- > ... > > I gave string "1", but it is accepted. Can it be fixed? > > Why don't I see this log line "update replication_synchro_quorum", > which you use in the test in the next commit? You asked, how to check if the expression is not a number. This diff works on the example above - raises an error. ==================== @@ -577,7 +577,11 @@ eval_replication_synchro_quorum(int nr_replicas) "sqrt = math.sqrt," "fmod = math.fmod," "}})\n" - "return math.floor(f())\n"; + "local res = f()\n" + "if type(res) ~= 'number' then\n" + "error('Expression should return a number')\n" + "end\n" + "return math.floor(res)\n"; char buf[1024]; int value = -1; ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum 2020-12-07 21:25 ` Vladislav Shpilevoy @ 2020-12-07 21:48 ` Cyrill Gorcunov 2020-12-08 8:02 ` Cyrill Gorcunov 0 siblings, 1 reply; 16+ messages in thread From: Cyrill Gorcunov @ 2020-12-07 21:48 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: Mons Anderson, tml On Mon, Dec 07, 2020 at 10:25:43PM +0100, Vladislav Shpilevoy wrote: > Hi! > > > 6. > > tarantool> box.cfg{replication_synchro_quorum='"1"'} > > 2020-12-05 00:49:02.251 [59985] main/103/interactive I> set 'replication_synchro_quorum' configuration option to "\"1\"" > > --- > > ... > > > > I gave string "1", but it is accepted. Can it be fixed? > > > > Why don't I see this log line "update replication_synchro_quorum", > > which you use in the test in the next commit? > > You asked, how to check if the expression is not a number. > This diff works on the example above - raises an error. > Thanks! Force pushed. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum 2020-12-07 21:48 ` Cyrill Gorcunov @ 2020-12-08 8:02 ` Cyrill Gorcunov 2020-12-09 23:22 ` Vladislav Shpilevoy 0 siblings, 1 reply; 16+ messages in thread From: Cyrill Gorcunov @ 2020-12-08 8:02 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: Mons Anderson, tml The whole patch force pushed --- From: Cyrill Gorcunov <gorcunov@gmail.com> Date: Thu, 3 Dec 2020 12:43:53 +0300 Subject: [PATCH 2/3] cfg: support symbolic evaluation of replication_synchro_quorum When synchronous replication is used we prefer a user to specify a quorum number, ie the number of replicas where data must be replicated before the master node continue accepting new transactions. This is not very convenient since a user may not know initially how many replicas will be used. Moreover the number of replicas may vary dynamically. For this sake we allow to specify the number of quorum in a symbolic way. For example box.cfg { replication_synchro_quorum = "N/2+1", } where `N` is a number of registered replicas in a cluster. Once new replica attached or old one detached the number is renewed and propagated. Internally on each replica_set_id() and replica_clear_id(), ie at moment when replica get registered or unregistered, we call box_update_replication_synchro_quorum() helper which finds out if evaluation of replication_synchro_quorum is needed and if so we calculate new replication_synchro_quorum value based on number of currently registered replicas. Then we notify dependent systems such as qsync and raft to update their guts. Note: we do *not* change the default settings for this option, it remains 1 by default for now. Change the default option should be done as a separate commit once we make sure that everything is fine. Closes #5446 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> @TarantoolBot document Title: Synchronous replication The plain integer value might be convenient for simple scenarios. But in case if number of replicas is dynamic a better way should be to describe a quorum via formula. The formula should use symbol `N` to represent amount of registered replicas. For example the canonical definition for a quorum (ie majority of members in a set) of `N` replicas is `N/2+1`. For such configuration one can define ``` box.cfg {replication_synchro_quorum = "N/2+1"} ``` Note that for sake of simplicity quorum evaluation never returns negative values thus for the case of formula say `N-2` the result will be 1 until number of replicas become 4 and more. --- src/box/box.cc | 121 +++++++++++++++++++++++++++++++++++++-- src/box/box.h | 1 + src/box/lua/load_cfg.lua | 2 +- src/box/replication.cc | 4 +- 4 files changed, 121 insertions(+), 7 deletions(-) diff --git a/src/box/box.cc b/src/box/box.cc index a8bc3471d..2076a8dc3 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -554,10 +554,91 @@ box_check_replication_sync_lag(void) return lag; } +/** + * Evaluate replication syncro quorum number from a formula. + */ +static int +eval_replication_synchro_quorum(int nr_replicas) +{ + const char fmt[] = + "local expr = [[%s]]\n" + "local f, err = loadstring('return ('..expr..')')\n" + "if not f then " + "error(string.format('Failed to load \%\%s:" + "\%\%s', expr, err)) " + "end\n" + "setfenv(f, {N = %d, math = {" + "ceil = math.ceil," + "floor = math.floor," + "abs = math.abs," + "random = math.random," + "min = math.min," + "max = math.abs," + "sqrt = math.sqrt," + "fmod = math.fmod," + "}})\n" + "local res = f()\n" + "if type(res) ~= 'number' then\n" + "error('Expression should return a number')\n" + "end\n" + "return math.floor(res)\n"; + int value = -1; + const char *expr = cfg_gets("replication_synchro_quorum"); + const char *buf = tt_sprintf(fmt, expr, nr_replicas); + + luaL_loadstring(tarantool_L, buf); + if (lua_pcall(tarantool_L, 0, 1, 0) != 0) { + diag_set(ClientError, ER_CFG, + "replication_synchro_quorum", + lua_tostring(tarantool_L, -1)); + return -1; + } + + if (lua_isnumber(tarantool_L, -1)) + value = (int)lua_tonumber(tarantool_L, -1); + lua_pop(tarantool_L, 1); + + /* + * At least we should have 1 node to sync, thus + * if the formula has evaluated to some negative + * value (say it was n-2) do not treat it as an + * error but just yield a minimum valid magnitude. + */ + if (value >= VCLOCK_MAX) { + const int value_max = VCLOCK_MAX - 1; + say_warn("replication_synchro_quorum evaluated " + "to value %d, set to %d", + value, value_max); + value = value_max; + } + + /* + * We never return 0, even if we're in bootstrap + * stage were number of replicas equals zero we + * should consider the node itself as a minimum + * quorum number. + */ + return MAX(1, value); +} + static int box_check_replication_synchro_quorum(void) { - int quorum = cfg_geti("replication_synchro_quorum"); + int quorum = 0; + + if (!cfg_isnumber("replication_synchro_quorum")) { + /* + * The formula uses symbolic name 'N' as + * a number of currently registered replicas. + */ + int value = replicaset.registered_count; + quorum = eval_replication_synchro_quorum(value); + if (quorum < 0) + return -1; + } else { + quorum = cfg_geti("replication_synchro_quorum"); + } + if (quorum <= 0 || quorum >= VCLOCK_MAX) { diag_set(ClientError, ER_CFG, "replication_synchro_quorum", "the value must be greater than zero and less than " @@ -910,15 +991,47 @@ box_set_replication_sync_lag(void) replication_sync_lag = box_check_replication_sync_lag(); } +/** + * Renew replication_synchro_quorum value if defined + * as a formula and we need to recalculate it. + */ +void +box_update_replication_synchro_quorum(void) +{ + if (cfg_isnumber("replication_synchro_quorum")) { + /* + * Even if replication_synchro_quorum is a constant + * number the RAFT engine should be notified on + * change of replicas amount. + */ + box_raft_update_election_quorum(); + return; + } + + /* + * The formula has been verified already on the bootstrap + * stage (and on dynamic reconfig as well), still there + * is a Lua call inside, heck knowns what could go wrong + * there thus panic if we're screwed. + */ + int value = replicaset.registered_count; + int quorum = eval_replication_synchro_quorum(value); + if (quorum < 0 || quorum >= VCLOCK_MAX) + panic("failed to eval replication_synchro_quorum"); + say_info("update replication_synchro_quorum = %d", quorum); + + replication_synchro_quorum = quorum; + txn_limbo_on_parameters_change(&txn_limbo); + box_raft_update_election_quorum(); +} + int box_set_replication_synchro_quorum(void) { int value = box_check_replication_synchro_quorum(); if (value < 0) return -1; - replication_synchro_quorum = value; - txn_limbo_on_parameters_change(&txn_limbo); - box_raft_update_election_quorum(); + box_update_replication_synchro_quorum(); return 0; } diff --git a/src/box/box.h b/src/box/box.h index b47a220b7..c3e1a1276 100644 --- a/src/box/box.h +++ b/src/box/box.h @@ -252,6 +252,7 @@ void box_set_replication_connect_timeout(void); void box_set_replication_connect_quorum(void); void box_set_replication_sync_lag(void); int box_set_replication_synchro_quorum(void); +void box_update_replication_synchro_quorum(void); int box_set_replication_synchro_timeout(void); void box_set_replication_sync_timeout(void); void box_set_replication_skip_conflict(void); diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua index 76e2e92c2..af66c0e46 100644 --- a/src/box/lua/load_cfg.lua +++ b/src/box/lua/load_cfg.lua @@ -172,7 +172,7 @@ local template_cfg = { replication_timeout = 'number', replication_sync_lag = 'number', replication_sync_timeout = 'number', - replication_synchro_quorum = 'number', + replication_synchro_quorum = 'string, number', replication_synchro_timeout = 'number', replication_connect_timeout = 'number', replication_connect_quorum = 'number', diff --git a/src/box/replication.cc b/src/box/replication.cc index 931c73a37..3126d86ac 100644 --- a/src/box/replication.cc +++ b/src/box/replication.cc @@ -251,7 +251,7 @@ replica_set_id(struct replica *replica, uint32_t replica_id) say_info("assigned id %d to replica %s", replica->id, tt_uuid_str(&replica->uuid)); replica->anon = false; - box_raft_update_election_quorum(); + box_update_replication_synchro_quorum(); } void @@ -300,7 +300,7 @@ replica_clear_id(struct replica *replica) assert(!replica->anon); replica_delete(replica); } - box_raft_update_election_quorum(); + box_update_replication_synchro_quorum(); } void -- 2.26.2 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum 2020-12-08 8:02 ` Cyrill Gorcunov @ 2020-12-09 23:22 ` Vladislav Shpilevoy 2020-12-11 12:25 ` Cyrill Gorcunov 0 siblings, 1 reply; 16+ messages in thread From: Vladislav Shpilevoy @ 2020-12-09 23:22 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: Mons Anderson, tml Hi! Thanks for the patch! Your branch fails all the qsync tests in CI: https://gitlab.com/tarantool/tarantool/-/pipelines/226651134 See 6 comments below. > Subject: [PATCH 2/3] cfg: support symbolic evaluation of replication_synchro_quorum > > When synchronous replication is used we prefer a user to specify > a quorum number, ie the number of replicas where data must be > replicated before the master node continue accepting new > transactions. > > This is not very convenient since a user may not know initially > how many replicas will be used. Moreover the number of replicas > may vary dynamically. For this sake we allow to specify the > number of quorum in a symbolic way. > > For example > > box.cfg { > replication_synchro_quorum = "N/2+1", > } > > where `N` is a number of registered replicas in a cluster. > Once new replica attached or old one detached the number > is renewed and propagated. > > Internally on each replica_set_id() and replica_clear_id(), > ie at moment when replica get registered or unregistered, > we call box_update_replication_synchro_quorum() helper which > finds out if evaluation of replication_synchro_quorum is > needed and if so we calculate new replication_synchro_quorum > value based on number of currently registered replicas. Then > we notify dependent systems such as qsync and raft to update > their guts. > > Note: we do *not* change the default settings for this option, > it remains 1 by default for now. Change the default option should > be done as a separate commit once we make sure that everything is > fine. > > Closes #5446 > > Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> > > @TarantoolBot document > Title: Synchronous replication 1. Please, be more specific. Imagine if all github tickets about qsync would have the same title 'Synchronous replication'. > The plain integer value might be convenient for simple scenarios. The plain integer for what? Please, keep in mind that the docteam will see everything below "@TarantoolBot document" mark. It means, at this sentence the reader is already lost, because no context at all. This text will be read not by developers, and out of the commit message context. Only by replication_synchro_quorum below the reader may assume, that as a 'plain integer' you mean the old way of specifying replication_synchro_quorum. Which we know, but the docteam does not remember, that replication_synchro_quorum was an integer before, and now can be a string. State explicitly what is the request is about, how it worked before, what changed now, and why. 'Why' part is good - that it handles the 'dynamic' part for you. > But in case if number of replicas is dynamic a better way should > be to describe a quorum via formula. The formula should use symbol > `N` to represent amount of registered replicas. > > For example the canonical definition for a quorum (ie majority > of members in a set) of `N` replicas is `N/2+1`. For such > configuration one can define > > ``` > box.cfg {replication_synchro_quorum = "N/2+1"} > ``` > > Note that for sake of simplicity quorum evaluation never returns > negative values thus for the case of formula say `N-2` the result > will be 1 until number of replicas become 4 and more. > --- > src/box/box.cc | 121 +++++++++++++++++++++++++++++++++++++-- > src/box/box.h | 1 + > src/box/lua/load_cfg.lua | 2 +- > src/box/replication.cc | 4 +- > 4 files changed, 121 insertions(+), 7 deletions(-) > > diff --git a/src/box/box.cc b/src/box/box.cc > index a8bc3471d..2076a8dc3 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -554,10 +554,91 @@ box_check_replication_sync_lag(void) > return lag; > } > > +/** > + * Evaluate replication syncro quorum number from a formula. > + */ > +static int > +eval_replication_synchro_quorum(int nr_replicas) 2. Number of replicas is always passed as replicaset.registered_count. I suggest you to make this function take no args, and read replicaset.registered_count internally. Also would be good to rename it to box_eval_... . Because you touch box things here. Such as cfg_gets("replication_synchro_quorum"), for example. Which reads box.cfg. > +{ > + const char fmt[] = > + "local expr = [[%s]]\n" > + "local f, err = loadstring('return ('..expr..')')\n" > + "if not f then " > + "error(string.format('Failed to load \%\%s:" > + "\%\%s', expr, err)) " > + "end\n" > + "setfenv(f, {N = %d, math = {" > + "ceil = math.ceil," > + "floor = math.floor," > + "abs = math.abs," > + "random = math.random," > + "min = math.min," > + "max = math.abs," > + "sqrt = math.sqrt," > + "fmod = math.fmod," > + "}})\n" > + "local res = f()\n" > + "if type(res) ~= 'number' then\n" > + "error('Expression should return a number')\n" > + "end\n" > + "return math.floor(res)\n"; > + int value = -1; > + const char *expr = cfg_gets("replication_synchro_quorum"); > + const char *buf = tt_sprintf(fmt, expr, nr_replicas); 3. What is the result is >= TT_STATIC_BUF_LEN? I suspect a user will get a surprising error message, or will even get no error, but the expression will be just truncated. Does not look good. Oh, shit. I just found that cfg_gets() also uses the static buffer. Besides, it just truncates the string value to 256 chars. So whatever you specify as replication_synchro_quorum, if it is longer than 256, it is silently truncated. Also does not look good. But don't know how to fix it, and if we want to fix it now. > + > + luaL_loadstring(tarantool_L, buf); > + if (lua_pcall(tarantool_L, 0, 1, 0) != 0) { > + diag_set(ClientError, ER_CFG, > + "replication_synchro_quorum", > + lua_tostring(tarantool_L, -1)); > + return -1; > + } > + > + if (lua_isnumber(tarantool_L, -1)) > + value = (int)lua_tonumber(tarantool_L, -1); > + lua_pop(tarantool_L, 1); > + > + /* > + * At least we should have 1 node to sync, thus > + * if the formula has evaluated to some negative > + * value (say it was n-2) do not treat it as an > + * error but just yield a minimum valid magnitude. > + */ > + if (value >= VCLOCK_MAX) { > + const int value_max = VCLOCK_MAX - 1; > + say_warn("replication_synchro_quorum evaluated " > + "to value %d, set to %d", > + value, value_max); > + value = value_max; 4. When I said I want to see a warning when the quorum is evaluated to 0, while number of replicas is > 0, I didn't mean to delete the validation at all. Your example about 'n-2' is a proof that a negative value means an issue in user's code. Because if node count is 3, the quorum will be 1, and synchro guarantees simply don't work. But since we are here, there are two options: - Delete the upper bound validation as well. Because it makes no sense to check it if we allow to overflow the other bound. This warning does not warn about all invalid values anyway. Moreover, a "<= 0" quorum is more dangerous than a too big quorum IMO, as the user will commit data but with weaker guarantees. - Return the lower bound check and properly catch the case when the quorum is 0 illegally. It is easy. If the formula returned a negative value, it is always a warning. If the formula returned 0, but the number of replicas is > 0, then this is a warning. Everything else is correct (if no upper overflow). Including the case when the formula returned 0, and the number of replicas is 0 (happens at bootstrap). Also we could warn even in the latter case (0 quorum, 0 replicas). Because it signals that the user does not use N/2+1 formula. If user will want to do strange things like N*3/4 or N-2, then he will see warnings, will think more, and will add 'if's into his code or min/max calls, or will fix an issue in his code. Another third special option is kinda stupid, but reliable as fuck. When formula is changed, you can try it with all replica counts from 0 to 31. And if any returns an out of range value, we return an error saying on which size the bad value was returned. Then during cluster size changes we will never get a bad value. And the user won't need to read the logs to see the errors. Personally, I would just do it from the beginning. Box.cfg is rare, and all 32 values will be checked in the order of ones of microseconds I think. > + } > + > + /* > + * We never return 0, even if we're in bootstrap > + * stage were number of replicas equals zero we > + * should consider the node itself as a minimum > + * quorum number. > + */ > + return MAX(1, value); > +} > + > static int > box_check_replication_synchro_quorum(void) > { > - int quorum = cfg_geti("replication_synchro_quorum"); > + int quorum = 0; > + > + if (!cfg_isnumber("replication_synchro_quorum")) { > + /* > + * The formula uses symbolic name 'N' as > + * a number of currently registered replicas. > + */ > + int value = replicaset.registered_count; > + quorum = eval_replication_synchro_quorum(value); > + if (quorum < 0) > + return -1;> + } else { > + quorum = cfg_geti("replication_synchro_quorum"); > + } > + > if (quorum <= 0 || quorum >= VCLOCK_MAX) { > diag_set(ClientError, ER_CFG, "replication_synchro_quorum", > "the value must be greater than zero and less than " > @@ -910,15 +991,47 @@ box_set_replication_sync_lag(void) > replication_sync_lag = box_check_replication_sync_lag(); > } > > +/** > + * Renew replication_synchro_quorum value if defined > + * as a formula and we need to recalculate it. > + */ > +void > +box_update_replication_synchro_quorum(void) > +{ > + if (cfg_isnumber("replication_synchro_quorum")) { > + /* > + * Even if replication_synchro_quorum is a constant > + * number the RAFT engine should be notified on > + * change of replicas amount. > + */ > + box_raft_update_election_quorum(); 5. Why don't you update the limbo? And why don't you change replication_synchro_quorum global variable? It is not changed anywhere now. > + return; > + } > + > + /* > + * The formula has been verified already on the bootstrap > + * stage (and on dynamic reconfig as well), still there > + * is a Lua call inside, heck knowns what could go wrong 6. knowns -> knows. > + * there thus panic if we're screwed. > + */ > + int value = replicaset.registered_count; > + int quorum = eval_replication_synchro_quorum(value); > + if (quorum < 0 || quorum >= VCLOCK_MAX) > + panic("failed to eval replication_synchro_quorum"); > + say_info("update replication_synchro_quorum = %d", quorum); > + > + replication_synchro_quorum = quorum; > + txn_limbo_on_parameters_change(&txn_limbo); > + box_raft_update_election_quorum(); > +} > + > int > box_set_replication_synchro_quorum(void) > { > int value = box_check_replication_synchro_quorum(); > if (value < 0) > return -1; > - replication_synchro_quorum = value; > - txn_limbo_on_parameters_change(&txn_limbo); > - box_raft_update_election_quorum(); > + box_update_replication_synchro_quorum(); > return 0; > } ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum 2020-12-09 23:22 ` Vladislav Shpilevoy @ 2020-12-11 12:25 ` Cyrill Gorcunov 2020-12-13 18:12 ` Vladislav Shpilevoy 0 siblings, 1 reply; 16+ messages in thread From: Cyrill Gorcunov @ 2020-12-11 12:25 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: Mons Anderson, tml On Thu, Dec 10, 2020 at 12:22:28AM +0100, Vladislav Shpilevoy wrote: > > > > @TarantoolBot document > > Title: Synchronous replication > > 1. Please, be more specific. Imagine if all github tickets about > qsync would have the same title 'Synchronous replication'. > > > The plain integer value might be convenient for simple scenarios. > > The plain integer for what? Please, keep in mind that the docteam > will see everything below "@TarantoolBot document" mark. It means, > at this sentence the reader is already lost, because no context at > all. This text will be read not by developers, and out of the > commit message context. > > Only by replication_synchro_quorum below the reader may assume, > that as a 'plain integer' you mean the old way of specifying > replication_synchro_quorum. Which we know, but the docteam does > not remember, that replication_synchro_quorum was an integer before, > and now can be a string. > > State explicitly what is the request is about, how it worked before, > what changed now, and why. 'Why' part is good - that it handles the > 'dynamic' part for you. OK, thanks! I suppose the plain integers are allowed for simplicity mostly, right? > > > > +/** > > + * Evaluate replication syncro quorum number from a formula. > > + */ > > +static int > > +eval_replication_synchro_quorum(int nr_replicas) > > 2. Number of replicas is always passed as replicaset.registered_count. > I suggest you to make this function take no args, and read > replicaset.registered_count internally. > > Also would be good to rename it to box_eval_... . Because you > touch box things here. Such as cfg_gets("replication_synchro_quorum"), > for example. Which reads box.cfg. Done, I'll resend a new version. > > + int value = -1; > > + const char *expr = cfg_gets("replication_synchro_quorum"); > > + const char *buf = tt_sprintf(fmt, expr, nr_replicas); > > 3. What is the result is >= TT_STATIC_BUF_LEN? I suspect a user will > get a surprising error message, or will even get no error, but the > expression will be just truncated. Does not look good. > > Oh, shit. I just found that cfg_gets() also uses the static buffer. > Besides, it just truncates the string value to 256 chars. So whatever > you specify as replication_synchro_quorum, if it is longer than 256, > it is silently truncated. Also does not look good. But don't know > how to fix it, and if we want to fix it now. For now I simply revert back to local stack 1K buffer for formula evaluation, like it was before. I think 1K would be more than enough and allows us to detect if trimming happened. /* * cfg_gets uses static buffer as well so we need a local * one, 1K should be enough to carry arbitrary but sane * formula. */ char buf[1024]; int len = snprintf(buf, sizeof(buf), fmt, expr, replicaset.registered_count); if (len >= (int)sizeof(buf)) { diag_set(ClientError, ER_CFG, "replication_synchro_quorum", "the formula is too big"); return -1; } > > + /* > > + * At least we should have 1 node to sync, thus > > + * if the formula has evaluated to some negative > > + * value (say it was n-2) do not treat it as an > > + * error but just yield a minimum valid magnitude. > > + */ > > + if (value >= VCLOCK_MAX) { > > + const int value_max = VCLOCK_MAX - 1; > > + say_warn("replication_synchro_quorum evaluated " > > + "to value %d, set to %d", > > + value, value_max); > > + value = value_max; > > 4. When I said I want to see a warning when the quorum is > evaluated to 0, while number of replicas is > 0, I didn't mean to > delete the validation at all. > > Your example about 'n-2' is a proof that a negative value means > an issue in user's code. Because if node count is 3, the quorum > will be 1, and synchro guarantees simply don't work. This is not anyhow different from using plain integers here. You know I told Mons several times already -- I don't like this "formula" approach at all. I don't think users gonna be using some complex formulas here and I don't understand where it might be needed. When one start using synchronious replication the only thing he is interested in -- data consistency, ie canonical N/2+1 quorum. And that's all. Instead we provide some strange interface forcing a user to figure out which exactly number of nodes he needs to guarantee that there won't be data loss :( I think this is simply not needed. But since I didn't manage to convince Mons we do have to implement formula evaluation. > But since we are here, there are two options: > > - Delete the upper bound validation as well. Because it makes no > sense to check it if we allow to overflow the other bound. This > warning does not warn about all invalid values anyway. Moreover, > a "<= 0" quorum is more dangerous than a too big quorum IMO, as the > user will commit data but with weaker guarantees. > > - Return the lower bound check and properly catch the case when the > quorum is 0 illegally. It is easy. If the formula returned a negative > value, it is always a warning. If the formula returned 0, but the > number of replicas is > 0, then this is a warning. Everything else is > correct (if no upper overflow). Including the case when the formula > returned 0, and the number of replicas is 0 (happens at bootstrap). > Also we could warn even in the latter case (0 quorum, 0 replicas). > Because it signals that the user does not use N/2+1 formula. > > If user will want to do strange things like N*3/4 or N-2, then he > will see warnings, will think more, and will add 'if's into his > code or min/max calls, or will fix an issue in his code. > > Another third special option is kinda stupid, but reliable as fuck. > > When formula is changed, you can try it with all replica counts > from 0 to 31. And if any returns an out of range value, we return > an error saying on which size the bad value was returned. Then > during cluster size changes we will never get a bad value. And the > user won't need to read the logs to see the errors. Personally, I > would just do it from the beginning. Box.cfg is rare, and all 32 > values will be checked in the order of ones of microseconds I think. OK, gimme some time to think about. Thanks! > > > > +/** > > + * Renew replication_synchro_quorum value if defined > > + * as a formula and we need to recalculate it. > > + */ > > +void > > +box_update_replication_synchro_quorum(void) > > +{ > > + if (cfg_isnumber("replication_synchro_quorum")) { > > + /* > > + * Even if replication_synchro_quorum is a constant > > + * number the RAFT engine should be notified on > > + * change of replicas amount. > > + */ > > + box_raft_update_election_quorum(); > > 5. Why don't you update the limbo? And why don't you change > replication_synchro_quorum global variable? It is not changed > anywhere now. Good catch, thanks! > > + > > + /* > > + * The formula has been verified already on the bootstrap > > + * stage (and on dynamic reconfig as well), still there > > + * is a Lua call inside, heck knowns what could go wrong > > 6. knowns -> knows. +1 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum 2020-12-11 12:25 ` Cyrill Gorcunov @ 2020-12-13 18:12 ` Vladislav Shpilevoy 0 siblings, 0 replies; 16+ messages in thread From: Vladislav Shpilevoy @ 2020-12-13 18:12 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: Mons Anderson, tml On 11.12.2020 13:25, Cyrill Gorcunov via Tarantool-patches wrote: > On Thu, Dec 10, 2020 at 12:22:28AM +0100, Vladislav Shpilevoy wrote: >>> >>> @TarantoolBot document >>> Title: Synchronous replication >> >> 1. Please, be more specific. Imagine if all github tickets about >> qsync would have the same title 'Synchronous replication'. >> >>> The plain integer value might be convenient for simple scenarios. >> >> The plain integer for what? Please, keep in mind that the docteam >> will see everything below "@TarantoolBot document" mark. It means, >> at this sentence the reader is already lost, because no context at >> all. This text will be read not by developers, and out of the >> commit message context. >> >> Only by replication_synchro_quorum below the reader may assume, >> that as a 'plain integer' you mean the old way of specifying >> replication_synchro_quorum. Which we know, but the docteam does >> not remember, that replication_synchro_quorum was an integer before, >> and now can be a string. >> >> State explicitly what is the request is about, how it worked before, >> what changed now, and why. 'Why' part is good - that it handles the >> 'dynamic' part for you. > > OK, thanks! I suppose the plain integers are allowed for simplicity > mostly, right? No. Mostly because we didn't think of having the option as an expression. Because we never did that. This option is a first dynamically auto-configured one. >>> + int value = -1; >>> + const char *expr = cfg_gets("replication_synchro_quorum"); >>> + const char *buf = tt_sprintf(fmt, expr, nr_replicas); >> >> 3. What is the result is >= TT_STATIC_BUF_LEN? I suspect a user will >> get a surprising error message, or will even get no error, but the >> expression will be just truncated. Does not look good. >> >> Oh, shit. I just found that cfg_gets() also uses the static buffer. >> Besides, it just truncates the string value to 256 chars. So whatever >> you specify as replication_synchro_quorum, if it is longer than 256, >> it is silently truncated. Also does not look good. But don't know >> how to fix it, and if we want to fix it now. > > For now I simply revert back to local stack 1K buffer for formula > evaluation, like it was before. I think 1K would be more than enough > and allows us to detect if trimming happened. No, the problem is not in the local buffer only. The problem is that cfg_gets() does the trim. So you didn't remove the trim. It still happens inside of cfg_gets(). And I assume that is worth fixing somehow. Current cfg module implementation is very naive and easy to break. Perhaps an easy fix would be to make cfg_gets() take a buffer + buffer size parameters, where it would copy the value. And would return something bad if it didn't fit. A more complex solution - make cfg_gets() return an out parameter + string pointer + string size. And make it keep the string on Lua stack. So the string pointer remains valid. Then after you would finish with it, you would call cfg_puts() with the out parameter you got earlier. This approach is 0 copy. But still it is not related to your patch directly, so I am fine with using a stack buffer here. > /* > * cfg_gets uses static buffer as well so we need a local > * one, 1K should be enough to carry arbitrary but sane > * formula. > */ > char buf[1024]; > int len = snprintf(buf, sizeof(buf), fmt, expr, > replicaset.registered_count); > if (len >= (int)sizeof(buf)) { > diag_set(ClientError, ER_CFG, > "replication_synchro_quorum", > "the formula is too big"); > return -1; > } > > >>> + /* >>> + * At least we should have 1 node to sync, thus >>> + * if the formula has evaluated to some negative >>> + * value (say it was n-2) do not treat it as an >>> + * error but just yield a minimum valid magnitude. >>> + */ >>> + if (value >= VCLOCK_MAX) { >>> + const int value_max = VCLOCK_MAX - 1; >>> + say_warn("replication_synchro_quorum evaluated " >>> + "to value %d, set to %d", >>> + value, value_max); >>> + value = value_max; >> >> 4. When I said I want to see a warning when the quorum is >> evaluated to 0, while number of replicas is > 0, I didn't mean to >> delete the validation at all. >> >> Your example about 'n-2' is a proof that a negative value means >> an issue in user's code. Because if node count is 3, the quorum >> will be 1, and synchro guarantees simply don't work. > > This is not anyhow different from using plain integers here. You know > I told Mons several times already -- I don't like this "formula" approach > at all. I don't think users gonna be using some complex formulas here > and I don't understand where it might be needed. N-2 is not complex. It is just stupid. But users may want to use N*3/4 or max(N-1, 1), or something like that. N/2 + 1 is a minimal possible value when guarantees start working. Some users may want better guarantees. And you can't cover them all with special hardcoded values. Here I agree with Mons, that being able to calculate the option dynamically is super cool, and potentially may be useful for other options in future. > When one start using synchronious replication the only thing he is interested > in -- data consistency, ie canonical N/2+1 quorum. And that's all. N/2 + 1 is a minimal limit. You may want it bigger in case you want more safety and you are ready to sacrifice some latency for that. ^ permalink raw reply [flat|nested] 16+ messages in thread
* [Tarantool-patches] [PATCH v3 3/3] test: add replication/gh-5446-sqync-eval-quorum.test.lua 2020-12-03 14:04 [Tarantool-patches] [PATCH v3 0/3] qsync: evaluate replication_synchro_quorum dynamically Cyrill Gorcunov 2020-12-03 14:04 ` [Tarantool-patches] [PATCH v3 1/3] cfg: add cfg_isnumber helper Cyrill Gorcunov 2020-12-03 14:04 ` [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum Cyrill Gorcunov @ 2020-12-03 14:04 ` Cyrill Gorcunov 2020-12-04 23:52 ` Vladislav Shpilevoy 2020-12-04 10:15 ` [Tarantool-patches] [PATCH v3 0/3] qsync: evaluate replication_synchro_quorum dynamically Serge Petrenko 3 siblings, 1 reply; 16+ messages in thread From: Cyrill Gorcunov @ 2020-12-03 14:04 UTC (permalink / raw) To: tml; +Cc: Mons Anderson, Vladislav Shpilevoy Part-of #5446 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> --- .../gh-5446-sqync-eval-quorum.result | 156 ++++++++++++++++++ .../gh-5446-sqync-eval-quorum.test.lua | 62 +++++++ test/replication/replica-quorum-1.lua | 1 + test/replication/replica-quorum-2.lua | 1 + test/replication/replica-quorum-3.lua | 1 + test/replication/replica-quorum-4.lua | 1 + 6 files changed, 222 insertions(+) create mode 100644 test/replication/gh-5446-sqync-eval-quorum.result create mode 100644 test/replication/gh-5446-sqync-eval-quorum.test.lua create mode 120000 test/replication/replica-quorum-1.lua create mode 120000 test/replication/replica-quorum-2.lua create mode 120000 test/replication/replica-quorum-3.lua create mode 120000 test/replication/replica-quorum-4.lua diff --git a/test/replication/gh-5446-sqync-eval-quorum.result b/test/replication/gh-5446-sqync-eval-quorum.result new file mode 100644 index 000000000..14050aa4b --- /dev/null +++ b/test/replication/gh-5446-sqync-eval-quorum.result @@ -0,0 +1,156 @@ +-- test-run result file version 2 +test_run = require('test_run').new() + | --- + | ... +engine = test_run:get_cfg('engine') + | --- + | ... + +box.schema.user.grant('guest', 'replication') + | --- + | ... + +box.cfg { replication_synchro_quorum = "N/2+1", replication_synchro_timeout = 1000 } + | --- + | ... + +test_run:cmd('create server replica1 with rpl_master=default,\ + script="replication/replica-quorum-1.lua"') + | --- + | - true + | ... +test_run:cmd('start server replica1 with wait=True, wait_load=True') + | --- + | - true + | ... + +test_run:cmd('create server replica2 with rpl_master=default,\ + script="replication/replica-quorum-2.lua"') + | --- + | - true + | ... +test_run:cmd('start server replica2 with wait=True, wait_load=True') + | --- + | - true + | ... + +test_run:cmd('create server replica3 with rpl_master=default,\ + script="replication/replica-quorum-3.lua"') + | --- + | - true + | ... +test_run:cmd('start server replica3 with wait=True, wait_load=True') + | --- + | - true + | ... + +test_run:cmd('create server replica4 with rpl_master=default,\ + script="replication/replica-quorum-4.lua"') + | --- + | - true + | ... +test_run:cmd('start server replica4 with wait=True, wait_load=True') + | --- + | - true + | ... + +_ = box.schema.space.create('sync', {is_sync = true, engine = engine}) + | --- + | ... +s = box.space.sync + | --- + | ... + +s:format({{name = 'id', type = 'unsigned'}, {name = 'band_name', type = 'string'}}) + | --- + | ... + +_ = s:create_index('primary', {parts = {'id'}}) + | --- + | ... +s:insert({1, '1'}) + | --- + | - [1, '1'] + | ... +s:insert({2, '2'}) + | --- + | - [2, '2'] + | ... +s:insert({3, '3'}) + | --- + | - [3, '3'] + | ... + +-- Wait the data to be propagated, there is no much +-- need for this since we're testing the quorum number +-- update only but just in case. +test_run:wait_lsn('replica1', 'default') + | --- + | ... +test_run:wait_lsn('replica2', 'default') + | --- + | ... +test_run:wait_lsn('replica3', 'default') + | --- + | ... +test_run:wait_lsn('replica4', 'default') + | --- + | ... + +-- Make sure we were configured in a proper way +match = 'set \'replication_synchro_quorum\' configuration option to \"N\\/2%+1' + | --- + | ... +test_run:grep_log("default", match) ~= nil + | --- + | - true + | ... + +-- There are 4 replicas -> replication_synchro_quorum = 4/2 + 1 = 3 +match = 'update replication_synchro_quorum = 3' + | --- + | ... +test_run:grep_log("default", match) ~= nil + | --- + | - true + | ... + +test_run:cmd('stop server replica1') + | --- + | - true + | ... +test_run:cmd('delete server replica1') + | --- + | - true + | ... + +test_run:cmd('stop server replica2') + | --- + | - true + | ... +test_run:cmd('delete server replica2') + | --- + | - true + | ... + +test_run:cmd('stop server replica3') + | --- + | - true + | ... +test_run:cmd('delete server replica3') + | --- + | - true + | ... + +test_run:cmd('stop server replica4') + | --- + | - true + | ... +test_run:cmd('delete server replica4') + | --- + | - true + | ... + +box.schema.user.revoke('guest', 'replication') + | --- + | ... diff --git a/test/replication/gh-5446-sqync-eval-quorum.test.lua b/test/replication/gh-5446-sqync-eval-quorum.test.lua new file mode 100644 index 000000000..8a188e5b2 --- /dev/null +++ b/test/replication/gh-5446-sqync-eval-quorum.test.lua @@ -0,0 +1,62 @@ +test_run = require('test_run').new() +engine = test_run:get_cfg('engine') + +box.schema.user.grant('guest', 'replication') + +box.cfg { replication_synchro_quorum = "N/2+1", replication_synchro_timeout = 1000 } + +test_run:cmd('create server replica1 with rpl_master=default,\ + script="replication/replica-quorum-1.lua"') +test_run:cmd('start server replica1 with wait=True, wait_load=True') + +test_run:cmd('create server replica2 with rpl_master=default,\ + script="replication/replica-quorum-2.lua"') +test_run:cmd('start server replica2 with wait=True, wait_load=True') + +test_run:cmd('create server replica3 with rpl_master=default,\ + script="replication/replica-quorum-3.lua"') +test_run:cmd('start server replica3 with wait=True, wait_load=True') + +test_run:cmd('create server replica4 with rpl_master=default,\ + script="replication/replica-quorum-4.lua"') +test_run:cmd('start server replica4 with wait=True, wait_load=True') + +_ = box.schema.space.create('sync', {is_sync = true, engine = engine}) +s = box.space.sync + +s:format({{name = 'id', type = 'unsigned'}, {name = 'band_name', type = 'string'}}) + +_ = s:create_index('primary', {parts = {'id'}}) +s:insert({1, '1'}) +s:insert({2, '2'}) +s:insert({3, '3'}) + +-- Wait the data to be propagated, there is no much +-- need for this since we're testing the quorum number +-- update only but just in case. +test_run:wait_lsn('replica1', 'default') +test_run:wait_lsn('replica2', 'default') +test_run:wait_lsn('replica3', 'default') +test_run:wait_lsn('replica4', 'default') + +-- Make sure we were configured in a proper way +match = 'set \'replication_synchro_quorum\' configuration option to \"N\\/2%+1' +test_run:grep_log("default", match) ~= nil + +-- There are 4 replicas -> replication_synchro_quorum = 4/2 + 1 = 3 +match = 'update replication_synchro_quorum = 3' +test_run:grep_log("default", match) ~= nil + +test_run:cmd('stop server replica1') +test_run:cmd('delete server replica1') + +test_run:cmd('stop server replica2') +test_run:cmd('delete server replica2') + +test_run:cmd('stop server replica3') +test_run:cmd('delete server replica3') + +test_run:cmd('stop server replica4') +test_run:cmd('delete server replica4') + +box.schema.user.revoke('guest', 'replication') diff --git a/test/replication/replica-quorum-1.lua b/test/replication/replica-quorum-1.lua new file mode 120000 index 000000000..da69ac81c --- /dev/null +++ b/test/replication/replica-quorum-1.lua @@ -0,0 +1 @@ +replica.lua \ No newline at end of file diff --git a/test/replication/replica-quorum-2.lua b/test/replication/replica-quorum-2.lua new file mode 120000 index 000000000..da69ac81c --- /dev/null +++ b/test/replication/replica-quorum-2.lua @@ -0,0 +1 @@ +replica.lua \ No newline at end of file diff --git a/test/replication/replica-quorum-3.lua b/test/replication/replica-quorum-3.lua new file mode 120000 index 000000000..da69ac81c --- /dev/null +++ b/test/replication/replica-quorum-3.lua @@ -0,0 +1 @@ +replica.lua \ No newline at end of file diff --git a/test/replication/replica-quorum-4.lua b/test/replication/replica-quorum-4.lua new file mode 120000 index 000000000..da69ac81c --- /dev/null +++ b/test/replication/replica-quorum-4.lua @@ -0,0 +1 @@ +replica.lua \ No newline at end of file -- 2.26.2 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 3/3] test: add replication/gh-5446-sqync-eval-quorum.test.lua 2020-12-03 14:04 ` [Tarantool-patches] [PATCH v3 3/3] test: add replication/gh-5446-sqync-eval-quorum.test.lua Cyrill Gorcunov @ 2020-12-04 23:52 ` Vladislav Shpilevoy 2020-12-08 8:48 ` Cyrill Gorcunov 0 siblings, 1 reply; 16+ messages in thread From: Vladislav Shpilevoy @ 2020-12-04 23:52 UTC (permalink / raw) To: Cyrill Gorcunov, tml; +Cc: Mons Anderson Thanks for the patch! See 5 comments below. On 03.12.2020 15:04, Cyrill Gorcunov via Tarantool-patches wrote: > Part-of #5446 > > Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> > --- > .../gh-5446-sqync-eval-quorum.result | 156 ++++++++++++++++++ > .../gh-5446-sqync-eval-quorum.test.lua | 62 +++++++ > test/replication/replica-quorum-1.lua | 1 + > test/replication/replica-quorum-2.lua | 1 + > test/replication/replica-quorum-3.lua | 1 + > test/replication/replica-quorum-4.lua | 1 + > 6 files changed, 222 insertions(+) > create mode 100644 test/replication/gh-5446-sqync-eval-quorum.result > create mode 100644 test/replication/gh-5446-sqync-eval-quorum.test.lua 1. sqync? > create mode 120000 test/replication/replica-quorum-1.lua > create mode 120000 test/replication/replica-quorum-2.lua > create mode 120000 test/replication/replica-quorum-3.lua > create mode 120000 test/replication/replica-quorum-4.lua > > diff --git a/test/replication/gh-5446-sqync-eval-quorum.result b/test/replication/gh-5446-sqync-eval-quorum.result > new file mode 100644 > index 000000000..14050aa4b > --- /dev/null > +++ b/test/replication/gh-5446-sqync-eval-quorum.result > @@ -0,0 +1,156 @@ > +-- test-run result file version 2 > +test_run = require('test_run').new() > + | --- > + | ... > +engine = test_run:get_cfg('engine') > + | --- > + | ... > + > +box.schema.user.grant('guest', 'replication') > + | --- > + | ... > + > +box.cfg { replication_synchro_quorum = "N/2+1", replication_synchro_timeout = 1000 } > + | --- > + | ... > + > +test_run:cmd('create server replica1 with rpl_master=default,\ > + script="replication/replica-quorum-1.lua"') 2. There is a function test_run:create_cluster for such cases. But what I actually would want to see is how the quorum value changes as we add more replicas. Here you just add 4, and that is it. We don't see how the value changes dynamically. > + | --- > + | - true > + | ... > +test_run:cmd('start server replica1 with wait=True, wait_load=True') > + | --- > + | - true > + | ... > + > +test_run:cmd('create server replica2 with rpl_master=default,\ > + script="replication/replica-quorum-2.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('start server replica2 with wait=True, wait_load=True') > + | --- > + | - true > + | ... > + > +test_run:cmd('create server replica3 with rpl_master=default,\ > + script="replication/replica-quorum-3.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('start server replica3 with wait=True, wait_load=True') > + | --- > + | - true > + | ... > + > +test_run:cmd('create server replica4 with rpl_master=default,\ > + script="replication/replica-quorum-4.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('start server replica4 with wait=True, wait_load=True') > + | --- > + | - true > + | ... > + > +_ = box.schema.space.create('sync', {is_sync = true, engine = engine}) > + | --- > + | ... > +s = box.space.sync > + | --- > + | ... > + > +s:format({{name = 'id', type = 'unsigned'}, {name = 'band_name', type = 'string'}}) 3. Why do you need the format? Why isn't the space dropped in the end? Why do you need the tuples below, if you never use them? > + | --- > + | ... > + > +_ = s:create_index('primary', {parts = {'id'}}) > + | --- > + | ... > +s:insert({1, '1'}) > + | --- > + | - [1, '1'] > + | ... > +s:insert({2, '2'}) > + | --- > + | - [2, '2'] > + | ... > +s:insert({3, '3'}) > + | --- > + | - [3, '3'] > + | ... > + > +-- Wait the data to be propagated, there is no much > +-- need for this since we're testing the quorum number > +-- update only but just in case. > +test_run:wait_lsn('replica1', 'default') > + | --- > + | ... > +test_run:wait_lsn('replica2', 'default') > + | --- > + | ... > +test_run:wait_lsn('replica3', 'default') > + | --- > + | ... > +test_run:wait_lsn('replica4', 'default') 4. Why do you need these wait_lsns? The whole purpose of is_sync is to make sure the data is already replicated before commit. > + | --- > + | ... > + > +-- Make sure we were configured in a proper way > +match = 'set \'replication_synchro_quorum\' configuration option to \"N\\/2%+1' > + | --- > + | ... > +test_run:grep_log("default", match) ~= nil > + | --- > + | - true > + | ... > + > +-- There are 4 replicas -> replication_synchro_quorum = 4/2 + 1 = 3 > +match = 'update replication_synchro_quorum = 3' > + | --- > + | ... > +test_run:grep_log("default", match) ~= nil 5. I would better expect you to test the quorum in action. The logs may say anything, but we must validate it. For example, here quorum must be 3. So if you kill 2 replicas, it should stop working. Would be also interesting how the quorum decreases as we remove replicas from _cluster and kill them for good. Besides, I don't see tests for errors in the quorum expression. Also we usually add tests in the same commit as the feature. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 3/3] test: add replication/gh-5446-sqync-eval-quorum.test.lua 2020-12-04 23:52 ` Vladislav Shpilevoy @ 2020-12-08 8:48 ` Cyrill Gorcunov 2020-12-09 23:22 ` Vladislav Shpilevoy 0 siblings, 1 reply; 16+ messages in thread From: Cyrill Gorcunov @ 2020-12-08 8:48 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: Mons Anderson, tml On Sat, Dec 05, 2020 at 12:52:25AM +0100, Vladislav Shpilevoy wrote: > Thanks for the patch! > > See 5 comments below. > Here is a force-pushed update - I keep use explicit "start server replica1" which allows me to grep update of quorum one by one - I dropped unused space creation - Raised number of repicas up to 6 - Since the patch is big i prefer to have it as a separate commit It is unclear for me yet how to detect the replication failure when we would stop replica. Could you please help? --- From: Cyrill Gorcunov <gorcunov@gmail.com> Date: Tue, 24 Nov 2020 17:11:23 +0300 Subject: [PATCH 3/3] test: add replication/gh-5446-qsync-eval-quorum.test.lua Part-of #5446 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> --- .../gh-5446-qsync-eval-quorum.result | 193 ++++++++++++++++++ .../gh-5446-qsync-eval-quorum.test.lua | 76 +++++++ test/replication/replica-quorum-1.lua | 1 + test/replication/replica-quorum-2.lua | 1 + test/replication/replica-quorum-3.lua | 1 + test/replication/replica-quorum-4.lua | 1 + test/replication/replica-quorum-5.lua | 1 + test/replication/replica-quorum-6.lua | 1 + 8 files changed, 275 insertions(+) create mode 100644 test/replication/gh-5446-qsync-eval-quorum.result create mode 100644 test/replication/gh-5446-qsync-eval-quorum.test.lua create mode 120000 test/replication/replica-quorum-1.lua create mode 120000 test/replication/replica-quorum-2.lua create mode 120000 test/replication/replica-quorum-3.lua create mode 120000 test/replication/replica-quorum-4.lua create mode 120000 test/replication/replica-quorum-5.lua create mode 120000 test/replication/replica-quorum-6.lua diff --git a/test/replication/gh-5446-qsync-eval-quorum.result b/test/replication/gh-5446-qsync-eval-quorum.result new file mode 100644 index 000000000..ebe0aede7 --- /dev/null +++ b/test/replication/gh-5446-qsync-eval-quorum.result @@ -0,0 +1,193 @@ +-- test-run result file version 2 +test_run = require('test_run').new() + | --- + | ... +engine = test_run:get_cfg('engine') + | --- + | ... + +box.schema.user.grant('guest', 'replication') + | --- + | ... + +-- Test syntax error +box.cfg{replication_synchro_quorum = "aaa"} + | --- + | - error: 'Incorrect value for option ''replication_synchro_quorum'': [string "local + | expr = [[aaa]]..."]:7: Expression should return a number' + | ... + +-- Make sure we were configured in a proper way +box.cfg { replication_synchro_quorum = "N/2+1", replication_synchro_timeout = 1000 } + | --- + | ... +match = 'set \'replication_synchro_quorum\' configuration option to \"N\\/2%+1' + | --- + | ... +test_run:grep_log("default", match) ~= nil + | --- + | - true + | ... + +test_run:cmd('create server replica1 with rpl_master=default,\ + script="replication/replica-quorum-1.lua"') + | --- + | - true + | ... +test_run:cmd('start server replica1 with wait=True, wait_load=True') + | --- + | - true + | ... + +-- 1 replica -> replication_synchro_quorum = 1/2 + 1 = 1 +match = 'update replication_synchro_quorum = 1' + | --- + | ... +test_run:grep_log("default", match) ~= nil + | --- + | - true + | ... + +test_run:cmd('create server replica2 with rpl_master=default,\ + script="replication/replica-quorum-2.lua"') + | --- + | - true + | ... +test_run:cmd('start server replica2 with wait=True, wait_load=True') + | --- + | - true + | ... + +-- 2 replicas -> replication_synchro_quorum = 2/2 + 1 = 2 +match = 'update replication_synchro_quorum = 2' + | --- + | ... +test_run:grep_log("default", match) ~= nil + | --- + | - true + | ... + +test_run:cmd('create server replica3 with rpl_master=default,\ + script="replication/replica-quorum-3.lua"') + | --- + | - true + | ... +test_run:cmd('start server replica3 with wait=True, wait_load=True') + | --- + | - true + | ... + +-- 3 replicas -> replication_synchro_quorum = 3/2 + 1 = 2 +match = 'update replication_synchro_quorum = 2' + | --- + | ... +test_run:grep_log("default", match) ~= nil + | --- + | - true + | ... + +test_run:cmd('create server replica4 with rpl_master=default,\ + script="replication/replica-quorum-4.lua"') + | --- + | - true + | ... +test_run:cmd('start server replica4 with wait=True, wait_load=True') + | --- + | - true + | ... + +-- 4 replicas -> replication_synchro_quorum = 4/2 + 1 = 3 +match = 'update replication_synchro_quorum = 3' + | --- + | ... +test_run:grep_log("default", match) ~= nil + | --- + | - true + | ... + +test_run:cmd('create server replica5 with rpl_master=default,\ + script="replication/replica-quorum-5.lua"') + | --- + | - true + | ... +test_run:cmd('start server replica5 with wait=True, wait_load=True') + | --- + | - true + | ... + +test_run:cmd('create server replica6 with rpl_master=default,\ + script="replication/replica-quorum-6.lua"') + | --- + | - true + | ... +test_run:cmd('start server replica6 with wait=True, wait_load=True') + | --- + | - true + | ... + +-- 6 replicas -> replication_synchro_quorum = 6/2 + 1 = 4 +match = 'update replication_synchro_quorum = 4' + | --- + | ... +test_run:grep_log("default", match) ~= nil + | --- + | - true + | ... + +test_run:cmd('stop server replica1') + | --- + | - true + | ... +test_run:cmd('delete server replica1') + | --- + | - true + | ... + +test_run:cmd('stop server replica2') + | --- + | - true + | ... +test_run:cmd('delete server replica2') + | --- + | - true + | ... + +test_run:cmd('stop server replica3') + | --- + | - true + | ... +test_run:cmd('delete server replica3') + | --- + | - true + | ... + +test_run:cmd('stop server replica4') + | --- + | - true + | ... +test_run:cmd('delete server replica4') + | --- + | - true + | ... + +test_run:cmd('stop server replica5') + | --- + | - true + | ... +test_run:cmd('delete server replica5') + | --- + | - true + | ... + +test_run:cmd('stop server replica6') + | --- + | - true + | ... +test_run:cmd('delete server replica6') + | --- + | - true + | ... + +box.schema.user.revoke('guest', 'replication') + | --- + | ... diff --git a/test/replication/gh-5446-qsync-eval-quorum.test.lua b/test/replication/gh-5446-qsync-eval-quorum.test.lua new file mode 100644 index 000000000..71adcad90 --- /dev/null +++ b/test/replication/gh-5446-qsync-eval-quorum.test.lua @@ -0,0 +1,76 @@ +test_run = require('test_run').new() +engine = test_run:get_cfg('engine') + +box.schema.user.grant('guest', 'replication') + +-- Test syntax error +box.cfg{replication_synchro_quorum = "aaa"} + +-- Make sure we were configured in a proper way +box.cfg { replication_synchro_quorum = "N/2+1", replication_synchro_timeout = 1000 } +match = 'set \'replication_synchro_quorum\' configuration option to \"N\\/2%+1' +test_run:grep_log("default", match) ~= nil + +test_run:cmd('create server replica1 with rpl_master=default,\ + script="replication/replica-quorum-1.lua"') +test_run:cmd('start server replica1 with wait=True, wait_load=True') + +-- 1 replica -> replication_synchro_quorum = 1/2 + 1 = 1 +match = 'update replication_synchro_quorum = 1' +test_run:grep_log("default", match) ~= nil + +test_run:cmd('create server replica2 with rpl_master=default,\ + script="replication/replica-quorum-2.lua"') +test_run:cmd('start server replica2 with wait=True, wait_load=True') + +-- 2 replicas -> replication_synchro_quorum = 2/2 + 1 = 2 +match = 'update replication_synchro_quorum = 2' +test_run:grep_log("default", match) ~= nil + +test_run:cmd('create server replica3 with rpl_master=default,\ + script="replication/replica-quorum-3.lua"') +test_run:cmd('start server replica3 with wait=True, wait_load=True') + +-- 3 replicas -> replication_synchro_quorum = 3/2 + 1 = 2 +match = 'update replication_synchro_quorum = 2' +test_run:grep_log("default", match) ~= nil + +test_run:cmd('create server replica4 with rpl_master=default,\ + script="replication/replica-quorum-4.lua"') +test_run:cmd('start server replica4 with wait=True, wait_load=True') + +-- 4 replicas -> replication_synchro_quorum = 4/2 + 1 = 3 +match = 'update replication_synchro_quorum = 3' +test_run:grep_log("default", match) ~= nil + +test_run:cmd('create server replica5 with rpl_master=default,\ + script="replication/replica-quorum-5.lua"') +test_run:cmd('start server replica5 with wait=True, wait_load=True') + +test_run:cmd('create server replica6 with rpl_master=default,\ + script="replication/replica-quorum-6.lua"') +test_run:cmd('start server replica6 with wait=True, wait_load=True') + +-- 6 replicas -> replication_synchro_quorum = 6/2 + 1 = 4 +match = 'update replication_synchro_quorum = 4' +test_run:grep_log("default", match) ~= nil + +test_run:cmd('stop server replica1') +test_run:cmd('delete server replica1') + +test_run:cmd('stop server replica2') +test_run:cmd('delete server replica2') + +test_run:cmd('stop server replica3') +test_run:cmd('delete server replica3') + +test_run:cmd('stop server replica4') +test_run:cmd('delete server replica4') + +test_run:cmd('stop server replica5') +test_run:cmd('delete server replica5') + +test_run:cmd('stop server replica6') +test_run:cmd('delete server replica6') + +box.schema.user.revoke('guest', 'replication') diff --git a/test/replication/replica-quorum-1.lua b/test/replication/replica-quorum-1.lua new file mode 120000 index 000000000..da69ac81c --- /dev/null +++ b/test/replication/replica-quorum-1.lua @@ -0,0 +1 @@ +replica.lua \ No newline at end of file diff --git a/test/replication/replica-quorum-2.lua b/test/replication/replica-quorum-2.lua new file mode 120000 index 000000000..da69ac81c --- /dev/null +++ b/test/replication/replica-quorum-2.lua @@ -0,0 +1 @@ +replica.lua \ No newline at end of file diff --git a/test/replication/replica-quorum-3.lua b/test/replication/replica-quorum-3.lua new file mode 120000 index 000000000..da69ac81c --- /dev/null +++ b/test/replication/replica-quorum-3.lua @@ -0,0 +1 @@ +replica.lua \ No newline at end of file diff --git a/test/replication/replica-quorum-4.lua b/test/replication/replica-quorum-4.lua new file mode 120000 index 000000000..da69ac81c --- /dev/null +++ b/test/replication/replica-quorum-4.lua @@ -0,0 +1 @@ +replica.lua \ No newline at end of file diff --git a/test/replication/replica-quorum-5.lua b/test/replication/replica-quorum-5.lua new file mode 120000 index 000000000..da69ac81c --- /dev/null +++ b/test/replication/replica-quorum-5.lua @@ -0,0 +1 @@ +replica.lua \ No newline at end of file diff --git a/test/replication/replica-quorum-6.lua b/test/replication/replica-quorum-6.lua new file mode 120000 index 000000000..da69ac81c --- /dev/null +++ b/test/replication/replica-quorum-6.lua @@ -0,0 +1 @@ +replica.lua \ No newline at end of file -- 2.26.2 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 3/3] test: add replication/gh-5446-sqync-eval-quorum.test.lua 2020-12-08 8:48 ` Cyrill Gorcunov @ 2020-12-09 23:22 ` Vladislav Shpilevoy 0 siblings, 0 replies; 16+ messages in thread From: Vladislav Shpilevoy @ 2020-12-09 23:22 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: Mons Anderson, tml On 08.12.2020 09:48, Cyrill Gorcunov wrote: > On Sat, Dec 05, 2020 at 12:52:25AM +0100, Vladislav Shpilevoy wrote: >> Thanks for the patch! >> >> See 5 comments below. >> > > Here is a force-pushed update > > - I keep use explicit "start server replica1" which allows me > to grep update of quorum one by one > - I dropped unused space creation > - Raised number of repicas up to 6 > - Since the patch is big i prefer to have it as a separate commit > > It is unclear for me yet how to detect the replication failure > when we would stop replica. Could you please help? You try to make a synchronous transaction. If not enough replicas are alive, the transaction will timeout. I also ask you to add a test on what happens, when quorum formula is changed while there are waiting transactions in the limbo. For example, if I set the formula to return a smaller value than the current number of alive replicas, the transactions should get committed. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Tarantool-patches] [PATCH v3 0/3] qsync: evaluate replication_synchro_quorum dynamically 2020-12-03 14:04 [Tarantool-patches] [PATCH v3 0/3] qsync: evaluate replication_synchro_quorum dynamically Cyrill Gorcunov ` (2 preceding siblings ...) 2020-12-03 14:04 ` [Tarantool-patches] [PATCH v3 3/3] test: add replication/gh-5446-sqync-eval-quorum.test.lua Cyrill Gorcunov @ 2020-12-04 10:15 ` Serge Petrenko 3 siblings, 0 replies; 16+ messages in thread From: Serge Petrenko @ 2020-12-04 10:15 UTC (permalink / raw) To: Cyrill Gorcunov, tml; +Cc: Mons Anderson, Vladislav Shpilevoy 03.12.2020 17:04, Cyrill Gorcunov пишет: > From the issue description: > > > The reason is that users will likely not understand how the option > > should be configured properly, and will break something accidentally. > > So this idea about allowance to write an expression on the cluster > > size allows to specify the canonical N/2 + 1 formula, and not update > > it manually on all instances, when a new node is added, or an existing > > one is deleted. > > v2 (by Serge): > - keep replication_synchro_quorum been skipped at bootstrap in load_cfg.lua > - eliminate redundant say_info calls > - call quorum update routine from replica_set_id/replica_clear_id > - use replicaset.registered_count directly when evaluating the formula > - make quorum evaluation procedure always return value in allowed range, > the only error which may happen here is some syntax error or Lua evaluation > errors > - a test has been added > > v3 (by Serge, Mons, Vlad): > - use replica.lua in tests > - use N symbol in formula > - use lua_pcall when evaluating a formula > - make formula more safe itself, provide various math helpers > - use box_update_replication_synchro_quorum name as a general > updater from replication code > - do not forget to update raft election quorum inside > box_update_replication_synchro_quorum > - print warns inside functions evaluator if value get out of bounds > > issue https://github.com/tarantool/tarantool/issues/5446 > branch gorcunov/gh-5446-eval-quorum-3 > > Cyrill Gorcunov (3): > cfg: add cfg_isnumber helper > cfg: support symbolic evaluation of replication_synchro_quorum > test: add replication/gh-5446-sqync-eval-quorum.test.lua > > src/box/box.cc | 142 +++++++++++++++- > src/box/box.h | 1 + > src/box/lua/load_cfg.lua | 2 +- > src/box/replication.cc | 4 +- > src/cfg.c | 9 + > src/cfg.h | 6 + > .../gh-5446-sqync-eval-quorum.result | 156 ++++++++++++++++++ > .../gh-5446-sqync-eval-quorum.test.lua | 62 +++++++ > test/replication/replica-quorum-1.lua | 1 + > test/replication/replica-quorum-2.lua | 1 + > test/replication/replica-quorum-3.lua | 1 + > test/replication/replica-quorum-4.lua | 1 + > 12 files changed, 379 insertions(+), 7 deletions(-) > create mode 100644 test/replication/gh-5446-sqync-eval-quorum.result > create mode 100644 test/replication/gh-5446-sqync-eval-quorum.test.lua > create mode 120000 test/replication/replica-quorum-1.lua > create mode 120000 test/replication/replica-quorum-2.lua > create mode 120000 test/replication/replica-quorum-3.lua > create mode 120000 test/replication/replica-quorum-4.lua > > > base-commit: bd03dfc76c0b76f56374c3e66052e2af0c50ae65 Thanks for the patchset! LGTM. -- Serge Petrenko ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2020-12-13 18:12 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-12-03 14:04 [Tarantool-patches] [PATCH v3 0/3] qsync: evaluate replication_synchro_quorum dynamically Cyrill Gorcunov 2020-12-03 14:04 ` [Tarantool-patches] [PATCH v3 1/3] cfg: add cfg_isnumber helper Cyrill Gorcunov 2020-12-03 14:04 ` [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum Cyrill Gorcunov 2020-12-04 23:52 ` Vladislav Shpilevoy 2020-12-07 20:17 ` Cyrill Gorcunov 2020-12-07 21:25 ` Vladislav Shpilevoy 2020-12-07 21:48 ` Cyrill Gorcunov 2020-12-08 8:02 ` Cyrill Gorcunov 2020-12-09 23:22 ` Vladislav Shpilevoy 2020-12-11 12:25 ` Cyrill Gorcunov 2020-12-13 18:12 ` Vladislav Shpilevoy 2020-12-03 14:04 ` [Tarantool-patches] [PATCH v3 3/3] test: add replication/gh-5446-sqync-eval-quorum.test.lua Cyrill Gorcunov 2020-12-04 23:52 ` Vladislav Shpilevoy 2020-12-08 8:48 ` Cyrill Gorcunov 2020-12-09 23:22 ` Vladislav Shpilevoy 2020-12-04 10:15 ` [Tarantool-patches] [PATCH v3 0/3] qsync: evaluate replication_synchro_quorum dynamically Serge Petrenko
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox