From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> To: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Mons Anderson <v.perepelitsa@corp.mail.ru>, tml <tarantool-patches@dev.tarantool.org> Subject: Re: [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum Date: Thu, 10 Dec 2020 00:22:28 +0100 [thread overview] Message-ID: <4091c2f3-9990-80a2-1096-5c195903dac9@tarantool.org> (raw) In-Reply-To: <20201208080230.GG2303@grain> Hi! Thanks for the patch! Your branch fails all the qsync tests in CI: https://gitlab.com/tarantool/tarantool/-/pipelines/226651134 See 6 comments below. > Subject: [PATCH 2/3] cfg: support symbolic evaluation of replication_synchro_quorum > > When synchronous replication is used we prefer a user to specify > a quorum number, ie the number of replicas where data must be > replicated before the master node continue accepting new > transactions. > > This is not very convenient since a user may not know initially > how many replicas will be used. Moreover the number of replicas > may vary dynamically. For this sake we allow to specify the > number of quorum in a symbolic way. > > For example > > box.cfg { > replication_synchro_quorum = "N/2+1", > } > > where `N` is a number of registered replicas in a cluster. > Once new replica attached or old one detached the number > is renewed and propagated. > > Internally on each replica_set_id() and replica_clear_id(), > ie at moment when replica get registered or unregistered, > we call box_update_replication_synchro_quorum() helper which > finds out if evaluation of replication_synchro_quorum is > needed and if so we calculate new replication_synchro_quorum > value based on number of currently registered replicas. Then > we notify dependent systems such as qsync and raft to update > their guts. > > Note: we do *not* change the default settings for this option, > it remains 1 by default for now. Change the default option should > be done as a separate commit once we make sure that everything is > fine. > > Closes #5446 > > Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> > > @TarantoolBot document > Title: Synchronous replication 1. Please, be more specific. Imagine if all github tickets about qsync would have the same title 'Synchronous replication'. > The plain integer value might be convenient for simple scenarios. The plain integer for what? Please, keep in mind that the docteam will see everything below "@TarantoolBot document" mark. It means, at this sentence the reader is already lost, because no context at all. This text will be read not by developers, and out of the commit message context. Only by replication_synchro_quorum below the reader may assume, that as a 'plain integer' you mean the old way of specifying replication_synchro_quorum. Which we know, but the docteam does not remember, that replication_synchro_quorum was an integer before, and now can be a string. State explicitly what is the request is about, how it worked before, what changed now, and why. 'Why' part is good - that it handles the 'dynamic' part for you. > But in case if number of replicas is dynamic a better way should > be to describe a quorum via formula. The formula should use symbol > `N` to represent amount of registered replicas. > > For example the canonical definition for a quorum (ie majority > of members in a set) of `N` replicas is `N/2+1`. For such > configuration one can define > > ``` > box.cfg {replication_synchro_quorum = "N/2+1"} > ``` > > Note that for sake of simplicity quorum evaluation never returns > negative values thus for the case of formula say `N-2` the result > will be 1 until number of replicas become 4 and more. > --- > src/box/box.cc | 121 +++++++++++++++++++++++++++++++++++++-- > src/box/box.h | 1 + > src/box/lua/load_cfg.lua | 2 +- > src/box/replication.cc | 4 +- > 4 files changed, 121 insertions(+), 7 deletions(-) > > diff --git a/src/box/box.cc b/src/box/box.cc > index a8bc3471d..2076a8dc3 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -554,10 +554,91 @@ box_check_replication_sync_lag(void) > return lag; > } > > +/** > + * Evaluate replication syncro quorum number from a formula. > + */ > +static int > +eval_replication_synchro_quorum(int nr_replicas) 2. Number of replicas is always passed as replicaset.registered_count. I suggest you to make this function take no args, and read replicaset.registered_count internally. Also would be good to rename it to box_eval_... . Because you touch box things here. Such as cfg_gets("replication_synchro_quorum"), for example. Which reads box.cfg. > +{ > + const char fmt[] = > + "local expr = [[%s]]\n" > + "local f, err = loadstring('return ('..expr..')')\n" > + "if not f then " > + "error(string.format('Failed to load \%\%s:" > + "\%\%s', expr, err)) " > + "end\n" > + "setfenv(f, {N = %d, math = {" > + "ceil = math.ceil," > + "floor = math.floor," > + "abs = math.abs," > + "random = math.random," > + "min = math.min," > + "max = math.abs," > + "sqrt = math.sqrt," > + "fmod = math.fmod," > + "}})\n" > + "local res = f()\n" > + "if type(res) ~= 'number' then\n" > + "error('Expression should return a number')\n" > + "end\n" > + "return math.floor(res)\n"; > + int value = -1; > + const char *expr = cfg_gets("replication_synchro_quorum"); > + const char *buf = tt_sprintf(fmt, expr, nr_replicas); 3. What is the result is >= TT_STATIC_BUF_LEN? I suspect a user will get a surprising error message, or will even get no error, but the expression will be just truncated. Does not look good. Oh, shit. I just found that cfg_gets() also uses the static buffer. Besides, it just truncates the string value to 256 chars. So whatever you specify as replication_synchro_quorum, if it is longer than 256, it is silently truncated. Also does not look good. But don't know how to fix it, and if we want to fix it now. > + > + luaL_loadstring(tarantool_L, buf); > + if (lua_pcall(tarantool_L, 0, 1, 0) != 0) { > + diag_set(ClientError, ER_CFG, > + "replication_synchro_quorum", > + lua_tostring(tarantool_L, -1)); > + return -1; > + } > + > + if (lua_isnumber(tarantool_L, -1)) > + value = (int)lua_tonumber(tarantool_L, -1); > + lua_pop(tarantool_L, 1); > + > + /* > + * At least we should have 1 node to sync, thus > + * if the formula has evaluated to some negative > + * value (say it was n-2) do not treat it as an > + * error but just yield a minimum valid magnitude. > + */ > + if (value >= VCLOCK_MAX) { > + const int value_max = VCLOCK_MAX - 1; > + say_warn("replication_synchro_quorum evaluated " > + "to value %d, set to %d", > + value, value_max); > + value = value_max; 4. When I said I want to see a warning when the quorum is evaluated to 0, while number of replicas is > 0, I didn't mean to delete the validation at all. Your example about 'n-2' is a proof that a negative value means an issue in user's code. Because if node count is 3, the quorum will be 1, and synchro guarantees simply don't work. But since we are here, there are two options: - Delete the upper bound validation as well. Because it makes no sense to check it if we allow to overflow the other bound. This warning does not warn about all invalid values anyway. Moreover, a "<= 0" quorum is more dangerous than a too big quorum IMO, as the user will commit data but with weaker guarantees. - Return the lower bound check and properly catch the case when the quorum is 0 illegally. It is easy. If the formula returned a negative value, it is always a warning. If the formula returned 0, but the number of replicas is > 0, then this is a warning. Everything else is correct (if no upper overflow). Including the case when the formula returned 0, and the number of replicas is 0 (happens at bootstrap). Also we could warn even in the latter case (0 quorum, 0 replicas). Because it signals that the user does not use N/2+1 formula. If user will want to do strange things like N*3/4 or N-2, then he will see warnings, will think more, and will add 'if's into his code or min/max calls, or will fix an issue in his code. Another third special option is kinda stupid, but reliable as fuck. When formula is changed, you can try it with all replica counts from 0 to 31. And if any returns an out of range value, we return an error saying on which size the bad value was returned. Then during cluster size changes we will never get a bad value. And the user won't need to read the logs to see the errors. Personally, I would just do it from the beginning. Box.cfg is rare, and all 32 values will be checked in the order of ones of microseconds I think. > + } > + > + /* > + * We never return 0, even if we're in bootstrap > + * stage were number of replicas equals zero we > + * should consider the node itself as a minimum > + * quorum number. > + */ > + return MAX(1, value); > +} > + > static int > box_check_replication_synchro_quorum(void) > { > - int quorum = cfg_geti("replication_synchro_quorum"); > + int quorum = 0; > + > + if (!cfg_isnumber("replication_synchro_quorum")) { > + /* > + * The formula uses symbolic name 'N' as > + * a number of currently registered replicas. > + */ > + int value = replicaset.registered_count; > + quorum = eval_replication_synchro_quorum(value); > + if (quorum < 0) > + return -1;> + } else { > + quorum = cfg_geti("replication_synchro_quorum"); > + } > + > if (quorum <= 0 || quorum >= VCLOCK_MAX) { > diag_set(ClientError, ER_CFG, "replication_synchro_quorum", > "the value must be greater than zero and less than " > @@ -910,15 +991,47 @@ box_set_replication_sync_lag(void) > replication_sync_lag = box_check_replication_sync_lag(); > } > > +/** > + * Renew replication_synchro_quorum value if defined > + * as a formula and we need to recalculate it. > + */ > +void > +box_update_replication_synchro_quorum(void) > +{ > + if (cfg_isnumber("replication_synchro_quorum")) { > + /* > + * Even if replication_synchro_quorum is a constant > + * number the RAFT engine should be notified on > + * change of replicas amount. > + */ > + box_raft_update_election_quorum(); 5. Why don't you update the limbo? And why don't you change replication_synchro_quorum global variable? It is not changed anywhere now. > + return; > + } > + > + /* > + * The formula has been verified already on the bootstrap > + * stage (and on dynamic reconfig as well), still there > + * is a Lua call inside, heck knowns what could go wrong 6. knowns -> knows. > + * there thus panic if we're screwed. > + */ > + int value = replicaset.registered_count; > + int quorum = eval_replication_synchro_quorum(value); > + if (quorum < 0 || quorum >= VCLOCK_MAX) > + panic("failed to eval replication_synchro_quorum"); > + say_info("update replication_synchro_quorum = %d", quorum); > + > + replication_synchro_quorum = quorum; > + txn_limbo_on_parameters_change(&txn_limbo); > + box_raft_update_election_quorum(); > +} > + > int > box_set_replication_synchro_quorum(void) > { > int value = box_check_replication_synchro_quorum(); > if (value < 0) > return -1; > - replication_synchro_quorum = value; > - txn_limbo_on_parameters_change(&txn_limbo); > - box_raft_update_election_quorum(); > + box_update_replication_synchro_quorum(); > return 0; > }
next prev parent reply other threads:[~2020-12-09 23:22 UTC|newest] Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-12-03 14:04 [Tarantool-patches] [PATCH v3 0/3] qsync: evaluate replication_synchro_quorum dynamically Cyrill Gorcunov 2020-12-03 14:04 ` [Tarantool-patches] [PATCH v3 1/3] cfg: add cfg_isnumber helper Cyrill Gorcunov 2020-12-03 14:04 ` [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum Cyrill Gorcunov 2020-12-04 23:52 ` Vladislav Shpilevoy 2020-12-07 20:17 ` Cyrill Gorcunov 2020-12-07 21:25 ` Vladislav Shpilevoy 2020-12-07 21:48 ` Cyrill Gorcunov 2020-12-08 8:02 ` Cyrill Gorcunov 2020-12-09 23:22 ` Vladislav Shpilevoy [this message] 2020-12-11 12:25 ` Cyrill Gorcunov 2020-12-13 18:12 ` Vladislav Shpilevoy 2020-12-03 14:04 ` [Tarantool-patches] [PATCH v3 3/3] test: add replication/gh-5446-sqync-eval-quorum.test.lua Cyrill Gorcunov 2020-12-04 23:52 ` Vladislav Shpilevoy 2020-12-08 8:48 ` Cyrill Gorcunov 2020-12-09 23:22 ` Vladislav Shpilevoy 2020-12-04 10:15 ` [Tarantool-patches] [PATCH v3 0/3] qsync: evaluate replication_synchro_quorum dynamically Serge Petrenko
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=4091c2f3-9990-80a2-1096-5c195903dac9@tarantool.org \ --to=v.shpilevoy@tarantool.org \ --cc=gorcunov@gmail.com \ --cc=tarantool-patches@dev.tarantool.org \ --cc=v.perepelitsa@corp.mail.ru \ --subject='Re: [Tarantool-patches] [PATCH v3 2/3] cfg: support symbolic evaluation of replication_synchro_quorum' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox