* [Tarantool-patches] [RFC 0/4] qsync: evaluate replication_synchro_quorum dynamically
@ 2020-11-19 19:40 Cyrill Gorcunov
2020-11-19 19:40 ` [Tarantool-patches] [RFC 1/4] cfg: add cfg_isnumber helper Cyrill Gorcunov
` (3 more replies)
0 siblings, 4 replies; 21+ messages in thread
From: Cyrill Gorcunov @ 2020-11-19 19:40 UTC (permalink / raw)
To: tml; +Cc: Vladislav Shpilevoy
From the issue description:
> The reason is that users will likely not understand how the option
> should be configured properly, and will break something accidentally.
> So this idea about allowance to write an expression on the cluster
> size allows to specify the canonical N/2 + 1 formula, and not update
> it manually on all instances, when a new node is added, or an existing
> one is deleted.
The series provides a first draft/rfc for review. As to me current pucture
is muddy. The whole cfg subsystem is too overloaded - some parts are configured
on lua level then passed to C level where the values are finally stored. I think
we need to keep configuration core on C level, simply providing a thin interface
to lua this will allow to share settings (for example I've had a huge problems
when been implementing separate log module config).
Anyway in the series we start to allow cofigure replication_synchro_quorum
settings via symbolic representation. Please take a look. There is NO tests
yet, no docs update just an early draft to share. Maybe you share some ideas
how to make this code less messy since now I'm far from being happy with it.
Thus any comments are highly appreciated!
issue https://github.com/tarantool/tarantool/issues/5446
branch gorcunov/gh-5446-eval-quorum
Cyrill Gorcunov (4):
cfg: add cfg_isnumber helper
qsync: move synchro quorum update to separate routine
cfg: prepare symbolic evaluation of replication_synchro_quorum
qsync: allow to specify replication_synchro_quorum as a formula
src/box/alter.cc | 2 +
src/box/box.cc | 81 ++++++++++++++++++++++++++++++++++++++--
src/box/box.h | 1 +
src/box/lua/load_cfg.lua | 3 +-
src/box/replication.cc | 49 ++++++++++++++++++++++++
src/box/replication.h | 12 ++++++
src/cfg.c | 9 +++++
src/cfg.h | 6 +++
8 files changed, 157 insertions(+), 6 deletions(-)
--
2.26.2
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Tarantool-patches] [RFC 1/4] cfg: add cfg_isnumber helper
2020-11-19 19:40 [Tarantool-patches] [RFC 0/4] qsync: evaluate replication_synchro_quorum dynamically Cyrill Gorcunov
@ 2020-11-19 19:40 ` Cyrill Gorcunov
2020-11-20 9:53 ` Serge Petrenko
2020-11-19 19:40 ` [Tarantool-patches] [RFC 2/4] qsync: move synchro quorum update to separate routine Cyrill Gorcunov
` (2 subsequent siblings)
3 siblings, 1 reply; 21+ messages in thread
From: Cyrill Gorcunov @ 2020-11-19 19:40 UTC (permalink / raw)
To: tml; +Cc: Vladislav Shpilevoy
We will need it to figure out if parameter
is a numeric value when doing configuration
check.
Part-of #5446
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/cfg.c | 9 +++++++++
src/cfg.h | 6 ++++++
2 files changed, 15 insertions(+)
diff --git a/src/cfg.c b/src/cfg.c
index 46cff1999..cf07d5817 100644
--- a/src/cfg.c
+++ b/src/cfg.c
@@ -57,6 +57,15 @@ cfg_geti(const char *param)
return val;
}
+bool
+cfg_isnumber(const char *param)
+{
+ cfg_get(param);
+ bool ret = !!lua_isnumber(tarantool_L, -1);
+ lua_pop(tarantool_L, 1);
+ return ret;
+}
+
int
cfg_getb(const char *param)
{
diff --git a/src/cfg.h b/src/cfg.h
index 140bdddb8..e2955e6b2 100644
--- a/src/cfg.h
+++ b/src/cfg.h
@@ -40,6 +40,12 @@ extern "C" {
int
cfg_geti(const char *param);
+/**
+ * Test if cfg parameter is a number.
+ */
+bool
+cfg_isnumber(const char *param);
+
/**
* Gets boolean parameter of cfg.
* Returns -1 in case of nil
--
2.26.2
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Tarantool-patches] [RFC 2/4] qsync: move synchro quorum update to separate routine
2020-11-19 19:40 [Tarantool-patches] [RFC 0/4] qsync: evaluate replication_synchro_quorum dynamically Cyrill Gorcunov
2020-11-19 19:40 ` [Tarantool-patches] [RFC 1/4] cfg: add cfg_isnumber helper Cyrill Gorcunov
@ 2020-11-19 19:40 ` Cyrill Gorcunov
2020-11-20 10:06 ` Serge Petrenko
2020-11-19 19:40 ` [Tarantool-patches] [RFC 3/4] cfg: prepare symbolic evaluation of replication_synchro_quorum Cyrill Gorcunov
2020-11-19 19:41 ` [Tarantool-patches] [RFC 4/4] qsync: allow to specify replication_synchro_quorum as a formula Cyrill Gorcunov
3 siblings, 1 reply; 21+ messages in thread
From: Cyrill Gorcunov @ 2020-11-19 19:40 UTC (permalink / raw)
To: tml; +Cc: Vladislav Shpilevoy
We will be updating replication_synchro_quorum parameter
dynamically once we add support for number evaluation
via symbolic formula, thus the update will be called
not only from manual configuration changes but from
cluster updates as well.
Thus to reuse the code lets gather it into a separate
routine.
Part-of #5446
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/box.cc | 4 +---
src/box/replication.cc | 18 ++++++++++++++++++
src/box/replication.h | 3 +++
3 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index 1f7dec362..5fcf28cb3 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -924,9 +924,7 @@ box_set_replication_synchro_quorum(void)
int value = box_check_replication_synchro_quorum();
if (value < 0)
return -1;
- replication_synchro_quorum = value;
- txn_limbo_on_parameters_change(&txn_limbo);
- raft_cfg_election_quorum(box_raft());
+ replication_synchro_quorum_update(value);
return 0;
}
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 65512cf0f..c83392f81 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -40,7 +40,9 @@
#include "gc.h"
#include "error.h"
#include "relay.h"
+#include "raft.h"
#include "sio.h"
+#include "txn_limbo.h"
uint32_t instance_id = REPLICA_ID_NIL;
struct tt_uuid INSTANCE_UUID;
@@ -82,6 +84,22 @@ replicaset_quorum(void)
return MIN(replication_connect_quorum, replicaset.applier.total);
}
+/**
+ * Update the synchro quorum number and propagate the
+ * change to dependent subsystems.
+ */
+void
+replication_synchro_quorum_update(int value)
+{
+ assert(value > 0 && value < VCLOCK_MAX);
+
+ say_info("replication: replication_synchro_quorum = %d", value);
+
+ replication_synchro_quorum = value;
+ txn_limbo_on_parameters_change(&txn_limbo);
+ raft_cfg_election_quorum(box_raft());
+}
+
void
replication_init(void)
{
diff --git a/src/box/replication.h b/src/box/replication.h
index e57912848..ced519612 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -175,6 +175,9 @@ replication_disconnect_timeout(void)
return replication_timeout * 4;
}
+void
+replication_synchro_quorum_update(int value);
+
void
replication_init(void);
--
2.26.2
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Tarantool-patches] [RFC 3/4] cfg: prepare symbolic evaluation of replication_synchro_quorum
2020-11-19 19:40 [Tarantool-patches] [RFC 0/4] qsync: evaluate replication_synchro_quorum dynamically Cyrill Gorcunov
2020-11-19 19:40 ` [Tarantool-patches] [RFC 1/4] cfg: add cfg_isnumber helper Cyrill Gorcunov
2020-11-19 19:40 ` [Tarantool-patches] [RFC 2/4] qsync: move synchro quorum update to separate routine Cyrill Gorcunov
@ 2020-11-19 19:40 ` Cyrill Gorcunov
2020-11-20 10:32 ` Serge Petrenko
2020-11-26 14:38 ` Mons Anderson
2020-11-19 19:41 ` [Tarantool-patches] [RFC 4/4] qsync: allow to specify replication_synchro_quorum as a formula Cyrill Gorcunov
3 siblings, 2 replies; 21+ messages in thread
From: Cyrill Gorcunov @ 2020-11-19 19:40 UTC (permalink / raw)
To: tml; +Cc: Vladislav Shpilevoy
Prepare scaffolds to evaluate replication_synchro_quorum,
since we don't yet update the undeneath engines we accept
and verify the formulas but refuse to proceed.
In next patch we will support dynamic evaluation of
the quorum number.
Part-of #5446
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/box.cc | 82 +++++++++++++++++++++++++++++++++++++++-
src/box/box.h | 1 +
src/box/lua/load_cfg.lua | 3 +-
3 files changed, 83 insertions(+), 3 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index 5fcf28cb3..5f7ddfa99 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -562,10 +562,90 @@ box_check_replication_sync_lag(void)
return lag;
}
+/**
+ * Evaluate replicaion syncro quorum number from a formula.
+ */
+int
+eval_replication_synchro_quorum(int nr_replicas)
+{
+ const char fmt[] =
+ "local f, err = loadstring(\"return (%s)\")\n"
+ "if not f then return 'failed to load \"%s\"' end\n"
+ "setfenv(f, { n = %d })\n"
+ "local ok, res = pcall(f)\n"
+ "if not ok then return res end\n"
+ "return math.floor(res)\n";
+ char buf[512];
+ int value = -1;
+
+ errno = 0;
+
+ const char *expr = cfg_gets("replication_synchro_quorum");
+ size_t ret = snprintf(buf, sizeof(buf), fmt, expr,
+ expr, nr_replicas);
+ if (ret >= sizeof(buf)) {
+ errno = EINVAL;
+ diag_set(ClientError, ER_CFG,
+ "replication_synchro_quorum",
+ "the expression is too big");
+ return -1;
+ }
+
+ luaL_loadstring(tarantool_L, buf);
+ lua_call(tarantool_L, 0, 1);
+
+ if (lua_isnumber(tarantool_L, -1)) {
+ value = (int)lua_tonumber(tarantool_L, -1);
+ } else {
+ assert(lua_isstring(tarantool_L, -1));
+ errno = EINVAL;
+ diag_set(ClientError, ER_CFG,
+ "replication_synchro_quorum",
+ lua_tostring(tarantool_L, -1));
+ }
+ lua_pop(tarantool_L, 1);
+ return value;
+}
+
static int
box_check_replication_synchro_quorum(void)
{
- int quorum = cfg_geti("replication_synchro_quorum");
+ int quorum = 0;
+
+ if (!cfg_isnumber("replication_synchro_quorum")) {
+ /*
+ * When validating a formula it must return a
+ * positive value for a single node and maximum
+ * possible replicas because the quorum will be
+ * evaluated on each new replica registration,
+ * starting from a single node.
+ */
+ int v[] = {1, VCLOCK_MAX-1};
+ for (size_t i = 0; i < lengthof(v); i++) {
+ quorum = eval_replication_synchro_quorum(v[i]);
+ if (quorum < 0 && errno == EINVAL)
+ return -1;
+ }
+
+ /*
+ * Once syntax is valid we should pass the real
+ * default value from replication module itself
+ * to evaluate the actual value to use.
+ */
+ int value = replication_synchro_quorum;
+ quorum = eval_replication_synchro_quorum(value);
+ /*
+ * FIXME: Until we get full support.
+ */
+ diag_set(ClientError, ER_CFG,
+ "replication_synchro_quorum",
+ "symbolic evaluation is not yet supported");
+ diag_log();
+ quorum = -1;
+ } else {
+ quorum = cfg_geti("replication_synchro_quorum");
+ }
+
if (quorum <= 0 || quorum >= VCLOCK_MAX) {
diag_set(ClientError, ER_CFG, "replication_synchro_quorum",
"the value must be greater than zero and less than "
diff --git a/src/box/box.h b/src/box/box.h
index b47a220b7..8f438faab 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -252,6 +252,7 @@ void box_set_replication_connect_timeout(void);
void box_set_replication_connect_quorum(void);
void box_set_replication_sync_lag(void);
int box_set_replication_synchro_quorum(void);
+int eval_replication_synchro_quorum(int nr_replicas);
int box_set_replication_synchro_timeout(void);
void box_set_replication_sync_timeout(void);
void box_set_replication_skip_conflict(void);
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index 76e2e92c2..26725e08d 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -172,7 +172,7 @@ local template_cfg = {
replication_timeout = 'number',
replication_sync_lag = 'number',
replication_sync_timeout = 'number',
- replication_synchro_quorum = 'number',
+ replication_synchro_quorum = 'string, number',
replication_synchro_timeout = 'number',
replication_connect_timeout = 'number',
replication_connect_quorum = 'number',
@@ -368,7 +368,6 @@ local dynamic_cfg_skip_at_load = {
replication_connect_quorum = true,
replication_sync_lag = true,
replication_sync_timeout = true,
- replication_synchro_quorum = true,
replication_synchro_timeout = true,
replication_skip_conflict = true,
replication_anon = true,
--
2.26.2
^ permalink raw reply [flat|nested] 21+ messages in thread
* [Tarantool-patches] [RFC 4/4] qsync: allow to specify replication_synchro_quorum as a formula
2020-11-19 19:40 [Tarantool-patches] [RFC 0/4] qsync: evaluate replication_synchro_quorum dynamically Cyrill Gorcunov
` (2 preceding siblings ...)
2020-11-19 19:40 ` [Tarantool-patches] [RFC 3/4] cfg: prepare symbolic evaluation of replication_synchro_quorum Cyrill Gorcunov
@ 2020-11-19 19:41 ` Cyrill Gorcunov
2020-11-20 10:50 ` Serge Petrenko
3 siblings, 1 reply; 21+ messages in thread
From: Cyrill Gorcunov @ 2020-11-19 19:41 UTC (permalink / raw)
To: tml; +Cc: Vladislav Shpilevoy
When synchronous replication is used we prefer a user to specify
a quorum number, ie the number of replicas where data must be
replicated before the master node continue accepting new operations.
This is not that convenient since a user may not know initially
how many replicas will be used. Moreover the number of replicas
may variaty dynamically. For this sake we allow to specify the
number of quorum in symbolic way.
For example
box.cfg {
replication_synchro_quorum = "n/2+1",
}
where n is number of registered replicas in the cluster.
Once new replica attached or old one detached the number
is evaluated and propagated.
Internally on each "_cluster" system space update the trigger
runs replication_on_cluster_update() helper which counts the
number of registered replicas and update the quorum value notifying
limbo and raft about a change.
Closes #5446
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
src/box/alter.cc | 2 ++
src/box/box.cc | 11 +++--------
src/box/replication.cc | 31 +++++++++++++++++++++++++++++++
src/box/replication.h | 9 +++++++++
4 files changed, 45 insertions(+), 8 deletions(-)
diff --git a/src/box/alter.cc b/src/box/alter.cc
index 075b79d33..bd291ad4f 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -4196,6 +4196,7 @@ register_replica(struct trigger *trigger, void * /* event */)
} else {
try {
replica = replicaset_add(id, &uuid);
+ replication_on_cluster_update();
/* Can't throw exceptions from on_commit trigger */
} catch(Exception *e) {
panic("Can't register replica: %s", e->errmsg);
@@ -4216,6 +4217,7 @@ unregister_replica(struct trigger *trigger, void * /* event */)
struct replica *replica = replica_by_uuid(&old_uuid);
assert(replica != NULL);
replica_clear_id(replica);
+ replication_on_cluster_update();
return 0;
}
diff --git a/src/box/box.cc b/src/box/box.cc
index 5f7ddfa99..558a71468 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -634,14 +634,6 @@ box_check_replication_synchro_quorum(void)
*/
int value = replication_synchro_quorum;
quorum = eval_replication_synchro_quorum(value);
- /*
- * FIXME: Until we get full support.
- */
- diag_set(ClientError, ER_CFG,
- "replication_synchro_quorum",
- "symbolic evaluation is not yet supported");
- diag_log();
- quorum = -1;
} else {
quorum = cfg_geti("replication_synchro_quorum");
}
@@ -1004,6 +996,9 @@ box_set_replication_synchro_quorum(void)
int value = box_check_replication_synchro_quorum();
if (value < 0)
return -1;
+
+ bool isnumber = cfg_isnumber("replication_synchro_quorum");
+ replication_synchro_quorum_eval = !isnumber;
replication_synchro_quorum_update(value);
return 0;
}
diff --git a/src/box/replication.cc b/src/box/replication.cc
index c83392f81..bde850c1c 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -53,6 +53,7 @@ double replication_connect_timeout = 30.0; /* seconds */
int replication_connect_quorum = REPLICATION_CONNECT_QUORUM_ALL;
double replication_sync_lag = 10.0; /* seconds */
int replication_synchro_quorum = 1;
+bool replication_synchro_quorum_eval = false;
double replication_synchro_timeout = 5.0; /* seconds */
double replication_sync_timeout = 300.0; /* seconds */
bool replication_skip_conflict = false;
@@ -100,6 +101,36 @@ replication_synchro_quorum_update(int value)
raft_cfg_election_quorum(box_raft());
}
+/**
+ * Evaluate the new synchro quorum number when replica
+ * get registered/unregistered and the quorum depends on
+ * their amount via formula in config.
+ */
+void
+replication_on_cluster_update(void)
+{
+ if (!replication_synchro_quorum_eval)
+ return;
+
+ /*
+ * Account only registered replicas when evaluating
+ * quorum number from a fromula present in config.
+ */
+ int value = replicaset.registered_count - replicaset.anon_count;
+ int quorum = eval_replication_synchro_quorum(value);
+
+ /*
+ * Upon node bootstrap we verify configuration so there
+ * must never be a value out of bounds, still better to
+ * be sure since evaluation code lays far from here.
+ */
+ if (quorum <= 0 || quorum >= VCLOCK_MAX)
+ panic("Unexpected result for replication_synchro_quorum eval");
+
+ say_info("replication: evaluated quorum is %d", quorum);
+ replication_synchro_quorum_update(quorum);
+}
+
void
replication_init(void)
{
diff --git a/src/box/replication.h b/src/box/replication.h
index ced519612..fa651d1c5 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -131,6 +131,12 @@ extern double replication_sync_lag;
*/
extern int replication_synchro_quorum;
+/**
+ * A flag to point that replication_synchro_quorum needs
+ * to be evaluated as a formula.
+ */
+extern bool replication_synchro_quorum_eval;
+
/**
* Time in seconds which the master node is able to wait for ACKs
* for a synchronous transaction until it is rolled back.
@@ -178,6 +184,9 @@ replication_disconnect_timeout(void)
void
replication_synchro_quorum_update(int value);
+void
+replication_on_cluster_update(void);
+
void
replication_init(void);
--
2.26.2
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Tarantool-patches] [RFC 1/4] cfg: add cfg_isnumber helper
2020-11-19 19:40 ` [Tarantool-patches] [RFC 1/4] cfg: add cfg_isnumber helper Cyrill Gorcunov
@ 2020-11-20 9:53 ` Serge Petrenko
0 siblings, 0 replies; 21+ messages in thread
From: Serge Petrenko @ 2020-11-20 9:53 UTC (permalink / raw)
To: Cyrill Gorcunov, tml; +Cc: Vladislav Shpilevoy
19.11.2020 22:40, Cyrill Gorcunov пишет:
> We will need it to figure out if parameter
> is a numeric value when doing configuration
> check.
>
> Part-of #5446
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
> src/cfg.c | 9 +++++++++
> src/cfg.h | 6 ++++++
> 2 files changed, 15 insertions(+)
>
> diff --git a/src/cfg.c b/src/cfg.c
> index 46cff1999..cf07d5817 100644
> --- a/src/cfg.c
> +++ b/src/cfg.c
> @@ -57,6 +57,15 @@ cfg_geti(const char *param)
> return val;
> }
>
> +bool
> +cfg_isnumber(const char *param)
> +{
> + cfg_get(param);
> + bool ret = !!lua_isnumber(tarantool_L, -1);
> + lua_pop(tarantool_L, 1);
> + return ret;
> +}
> +
> int
> cfg_getb(const char *param)
> {
> diff --git a/src/cfg.h b/src/cfg.h
> index 140bdddb8..e2955e6b2 100644
> --- a/src/cfg.h
> +++ b/src/cfg.h
> @@ -40,6 +40,12 @@ extern "C" {
> int
> cfg_geti(const char *param);
>
> +/**
> + * Test if cfg parameter is a number.
> + */
> +bool
> +cfg_isnumber(const char *param);
> +
> /**
> * Gets boolean parameter of cfg.
> * Returns -1 in case of nil
LGTM.
--
Serge Petrenko
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Tarantool-patches] [RFC 2/4] qsync: move synchro quorum update to separate routine
2020-11-19 19:40 ` [Tarantool-patches] [RFC 2/4] qsync: move synchro quorum update to separate routine Cyrill Gorcunov
@ 2020-11-20 10:06 ` Serge Petrenko
2020-11-20 11:01 ` Cyrill Gorcunov
0 siblings, 1 reply; 21+ messages in thread
From: Serge Petrenko @ 2020-11-20 10:06 UTC (permalink / raw)
To: Cyrill Gorcunov, tml; +Cc: Vladislav Shpilevoy
19.11.2020 22:40, Cyrill Gorcunov пишет:
> We will be updating replication_synchro_quorum parameter
> dynamically once we add support for number evaluation
> via symbolic formula, thus the update will be called
> not only from manual configuration changes but from
> cluster updates as well.
>
> Thus to reuse the code lets gather it into a separate
> routine.
>
> Part-of #5446
Hi! Thanks for the patch!
Please find my comments below.
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
> src/box/box.cc | 4 +---
> src/box/replication.cc | 18 ++++++++++++++++++
> src/box/replication.h | 3 +++
> 3 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 1f7dec362..5fcf28cb3 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -924,9 +924,7 @@ box_set_replication_synchro_quorum(void)
> int value = box_check_replication_synchro_quorum();
> if (value < 0)
> return -1;
> - replication_synchro_quorum = value;
> - txn_limbo_on_parameters_change(&txn_limbo);
> - raft_cfg_election_quorum(box_raft());
> + replication_synchro_quorum_update(value);
> return 0;
> }
IMO it'd be better to name the new function somewhat similar to
`box_update_replication_synchro_quorum` and leave it in box.cc
This way you avoid adding 2 new dependencies to replication.cc
and leave all the code dealing with reconfiguration at one place.
Besides, replication.cc already depends on box, so it won't be a problem
to call update_replication_synchro_quorum there.
>
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index 65512cf0f..c83392f81 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -40,7 +40,9 @@
> #include "gc.h"
> #include "error.h"
> #include "relay.h"
> +#include "raft.h"
> #include "sio.h"
> +#include "txn_limbo.h"
>
> uint32_t instance_id = REPLICA_ID_NIL;
> struct tt_uuid INSTANCE_UUID;
> @@ -82,6 +84,22 @@ replicaset_quorum(void)
> return MIN(replication_connect_quorum, replicaset.applier.total);
> }
>
> +/**
> + * Update the synchro quorum number and propagate the
> + * change to dependent subsystems.
> + */
> +void
> +replication_synchro_quorum_update(int value)
> +{
> + assert(value > 0 && value < VCLOCK_MAX);
> +
> + say_info("replication: replication_synchro_quorum = %d", value);
Load_cfg.lua will say something similar, when replication_synchro_quorum is
a number:
```
log.info("set '%s' configuration option to %s", key,
json.encode(val))
```
So this say_info belongs to the trigger you invoke on replica
register/unregister.
> +
> + replication_synchro_quorum = value;
> + txn_limbo_on_parameters_change(&txn_limbo);
> + raft_cfg_election_quorum(box_raft());
> +}
> +
> void
> replication_init(void)
> {
> diff --git a/src/box/replication.h b/src/box/replication.h
> index e57912848..ced519612 100644
> --- a/src/box/replication.h
> +++ b/src/box/replication.h
> @@ -175,6 +175,9 @@ replication_disconnect_timeout(void)
> return replication_timeout * 4;
> }
>
> +void
> +replication_synchro_quorum_update(int value);
> +
> void
> replication_init(void);
>
--
Serge Petrenko
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Tarantool-patches] [RFC 3/4] cfg: prepare symbolic evaluation of replication_synchro_quorum
2020-11-19 19:40 ` [Tarantool-patches] [RFC 3/4] cfg: prepare symbolic evaluation of replication_synchro_quorum Cyrill Gorcunov
@ 2020-11-20 10:32 ` Serge Petrenko
2020-11-20 11:34 ` Cyrill Gorcunov
2020-11-26 14:38 ` Mons Anderson
1 sibling, 1 reply; 21+ messages in thread
From: Serge Petrenko @ 2020-11-20 10:32 UTC (permalink / raw)
To: Cyrill Gorcunov, tml; +Cc: Vladislav Shpilevoy
19.11.2020 22:40, Cyrill Gorcunov пишет:
> Prepare scaffolds to evaluate replication_synchro_quorum,
> since we don't yet update the undeneath engines we accept
> and verify the formulas but refuse to proceed.
>
> In next patch we will support dynamic evaluation of
> the quorum number.
>
> Part-of #5446
Hi! Thanks for the patch!
Please find my comments below.
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
> src/box/box.cc | 82 +++++++++++++++++++++++++++++++++++++++-
> src/box/box.h | 1 +
> src/box/lua/load_cfg.lua | 3 +-
> 3 files changed, 83 insertions(+), 3 deletions(-)
>
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 5fcf28cb3..5f7ddfa99 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -562,10 +562,90 @@ box_check_replication_sync_lag(void)
> return lag;
> }
>
> +/**
> + * Evaluate replicaion syncro quorum number from a formula.
> + */
> +int
> +eval_replication_synchro_quorum(int nr_replicas)
> +{
> + const char fmt[] =
> + "local f, err = loadstring(\"return (%s)\")\n"
> + "if not f then return 'failed to load \"%s\"' end\n"
> + "setfenv(f, { n = %d })\n"
> + "local ok, res = pcall(f)\n"
> + "if not ok then return res end\n"
> + "return math.floor(res)\n";
> + char buf[512];
> + int value = -1;
> +
> + errno = 0;
> +
> + const char *expr = cfg_gets("replication_synchro_quorum");
> + size_t ret = snprintf(buf, sizeof(buf), fmt, expr,
> + expr, nr_replicas);
> + if (ret >= sizeof(buf)) {
> + errno = EINVAL;
> + diag_set(ClientError, ER_CFG,
> + "replication_synchro_quorum",
> + "the expression is too big");
> + return -1;
> + }
> +
> + luaL_loadstring(tarantool_L, buf);
> + lua_call(tarantool_L, 0, 1);
> +
> + if (lua_isnumber(tarantool_L, -1)) {
> + value = (int)lua_tonumber(tarantool_L, -1);
> + } else {
> + assert(lua_isstring(tarantool_L, -1));
> + errno = EINVAL;
> + diag_set(ClientError, ER_CFG,
> + "replication_synchro_quorum",
> + lua_tostring(tarantool_L, -1));
> + }
> + lua_pop(tarantool_L, 1);
> + return value;
> +}
Better make the function return -1 on an error, and when the function
evaluates to something negative,
log the eval result and return 1. This way you won't need errno.
> +
> static int
> box_check_replication_synchro_quorum(void)
> {
> - int quorum = cfg_geti("replication_synchro_quorum");
> + int quorum = 0;
> +
> + if (!cfg_isnumber("replication_synchro_quorum")) {
> + /*
> + * When validating a formula it must return a
> + * positive value for a single node and maximum
> + * possible replicas because the quorum will be
> + * evaluated on each new replica registration,
> + * starting from a single node.
> + */
> + int v[] = {1, VCLOCK_MAX-1};
> + for (size_t i = 0; i < lengthof(v); i++) {
> + quorum = eval_replication_synchro_quorum(v[i]);
> + if (quorum < 0 && errno == EINVAL)
> + return -1;
> + }
You should either check every possible value, from 1 to VCLOCK_MAX - 1,
to make sure, say, that no division by zero is involved for some input.
or check a single value, say, 1, or 2 or whatever, to make sure that the
expression in replication_synchro_quorum is at least valid Lua code.
Why check only min and max inputs?
> +
> + /*
> + * Once syntax is valid we should pass the real
> + * default value from replication module itself
> + * to evaluate the actual value to use.
> + */
> + int value = replication_synchro_quorum;
> + quorum = eval_replication_synchro_quorum(value);
> + /*
> + * FIXME: Until we get full support.
> + */
> + diag_set(ClientError, ER_CFG,
> + "replication_synchro_quorum",
> + "symbolic evaluation is not yet supported");
> + diag_log();
> + quorum = -1;
> + } else {
> + quorum = cfg_geti("replication_synchro_quorum");
> + }
> +
> if (quorum <= 0 || quorum >= VCLOCK_MAX) {
> diag_set(ClientError, ER_CFG, "replication_synchro_quorum",
> "the value must be greater than zero and less than "
> diff --git a/src/box/box.h b/src/box/box.h
> index b47a220b7..8f438faab 100644
> --- a/src/box/box.h
> +++ b/src/box/box.h
> @@ -252,6 +252,7 @@ void box_set_replication_connect_timeout(void);
> void box_set_replication_connect_quorum(void);
> void box_set_replication_sync_lag(void);
> int box_set_replication_synchro_quorum(void);
> +int eval_replication_synchro_quorum(int nr_replicas);
> int box_set_replication_synchro_timeout(void);
> void box_set_replication_sync_timeout(void);
> void box_set_replication_skip_conflict(void);
> diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
> index 76e2e92c2..26725e08d 100644
> --- a/src/box/lua/load_cfg.lua
> +++ b/src/box/lua/load_cfg.lua
> @@ -172,7 +172,7 @@ local template_cfg = {
> replication_timeout = 'number',
> replication_sync_lag = 'number',
> replication_sync_timeout = 'number',
> - replication_synchro_quorum = 'number',
> + replication_synchro_quorum = 'string, number',
> replication_synchro_timeout = 'number',
> replication_connect_timeout = 'number',
> replication_connect_quorum = 'number',
> @@ -368,7 +368,6 @@ local dynamic_cfg_skip_at_load = {
> replication_connect_quorum = true,
> replication_sync_lag = true,
> replication_sync_timeout = true,
> - replication_synchro_quorum = true,
You shouldn't remove replication_synchro_quorum from here.
This table lists the options which are set directly from `box_cfg` in
specific order.
I.e. not lua calls box_cfg_set_..., but box_cfg() itself does. But only
on the first box.cfg call.
> replication_synchro_timeout = true,
> replication_skip_conflict = true,
> replication_anon = true,
--
Serge Petrenko
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Tarantool-patches] [RFC 4/4] qsync: allow to specify replication_synchro_quorum as a formula
2020-11-19 19:41 ` [Tarantool-patches] [RFC 4/4] qsync: allow to specify replication_synchro_quorum as a formula Cyrill Gorcunov
@ 2020-11-20 10:50 ` Serge Petrenko
2020-11-20 12:01 ` Cyrill Gorcunov
0 siblings, 1 reply; 21+ messages in thread
From: Serge Petrenko @ 2020-11-20 10:50 UTC (permalink / raw)
To: Cyrill Gorcunov, tml; +Cc: Vladislav Shpilevoy
19.11.2020 22:41, Cyrill Gorcunov пишет:
> When synchronous replication is used we prefer a user to specify
> a quorum number, ie the number of replicas where data must be
> replicated before the master node continue accepting new operations.
>
> This is not that convenient since a user may not know initially
> how many replicas will be used. Moreover the number of replicas
> may variaty dynamically. For this sake we allow to specify the
> number of quorum in symbolic way.
>
> For example
>
> box.cfg {
> replication_synchro_quorum = "n/2+1",
> }
>
> where n is number of registered replicas in the cluster.
> Once new replica attached or old one detached the number
> is evaluated and propagated.
>
> Internally on each "_cluster" system space update the trigger
> runs replication_on_cluster_update() helper which counts the
> number of registered replicas and update the quorum value notifying
> limbo and raft about a change.
>
> Closes #5446
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
> src/box/alter.cc | 2 ++
> src/box/box.cc | 11 +++--------
> src/box/replication.cc | 31 +++++++++++++++++++++++++++++++
> src/box/replication.h | 9 +++++++++
> 4 files changed, 45 insertions(+), 8 deletions(-)
>
> diff --git a/src/box/alter.cc b/src/box/alter.cc
> index 075b79d33..bd291ad4f 100644
> --- a/src/box/alter.cc
> +++ b/src/box/alter.cc
> @@ -4196,6 +4196,7 @@ register_replica(struct trigger *trigger, void * /* event */)
> } else {
> try {
> replica = replicaset_add(id, &uuid);
> + replication_on_cluster_update();
> /* Can't throw exceptions from on_commit trigger */
> } catch(Exception *e) {
> panic("Can't register replica: %s", e->errmsg);
> @@ -4216,6 +4217,7 @@ unregister_replica(struct trigger *trigger, void * /* event */)
> struct replica *replica = replica_by_uuid(&old_uuid);
> assert(replica != NULL);
> replica_clear_id(replica);
> + replication_on_cluster_update();
> return 0;
> }
>
We usually perform all the work related to replica register/unregister
directly in replica_set_id
and replica_clear_id.
Besides, these are the places where replicaset.registered_count is updated,
so it'd be nice to call replication_on_cluster_update from there.
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 5f7ddfa99..558a71468 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -634,14 +634,6 @@ box_check_replication_synchro_quorum(void)
> */
> int value = replication_synchro_quorum;
> quorum = eval_replication_synchro_quorum(value);
Pass replicaset.registered_count instead of replication_synchro_quorum here.
> - /*
> - * FIXME: Until we get full support.
> - */
> - diag_set(ClientError, ER_CFG,
> - "replication_synchro_quorum",
> - "symbolic evaluation is not yet supported");
> - diag_log();
> - quorum = -1;
> } else {
> quorum = cfg_geti("replication_synchro_quorum");
> }
> @@ -1004,6 +996,9 @@ box_set_replication_synchro_quorum(void)
> int value = box_check_replication_synchro_quorum();
> if (value < 0)
> return -1;
> +
> + bool isnumber = cfg_isnumber("replication_synchro_quorum");
> + replication_synchro_quorum_eval = !isnumber;
> replication_synchro_quorum_update(value);
> return 0;
> }
> diff --git a/src/box/replication.cc b/src/box/replication.cc
> index c83392f81..bde850c1c 100644
> --- a/src/box/replication.cc
> +++ b/src/box/replication.cc
> @@ -53,6 +53,7 @@ double replication_connect_timeout = 30.0; /* seconds */
> int replication_connect_quorum = REPLICATION_CONNECT_QUORUM_ALL;
> double replication_sync_lag = 10.0; /* seconds */
> int replication_synchro_quorum = 1;
> +bool replication_synchro_quorum_eval = false;
> double replication_synchro_timeout = 5.0; /* seconds */
> double replication_sync_timeout = 300.0; /* seconds */
> bool replication_skip_conflict = false;
> @@ -100,6 +101,36 @@ replication_synchro_quorum_update(int value)
> raft_cfg_election_quorum(box_raft());
> }
>
> +/**
> + * Evaluate the new synchro quorum number when replica
> + * get registered/unregistered and the quorum depends on
> + * their amount via formula in config.
> + */
> +void
> +replication_on_cluster_update(void)
> +{
> + if (!replication_synchro_quorum_eval)
> + return;
> +
> + /*
> + * Account only registered replicas when evaluating
> + * quorum number from a fromula present in config.
> + */
> + int value = replicaset.registered_count - replicaset.anon_count;
registered_count stands for 'normal' replica count, so no need to subtract
anon_count from it.
> + int quorum = eval_replication_synchro_quorum(value);
> +
> + /*
> + * Upon node bootstrap we verify configuration so there
> + * must never be a value out of bounds, still better to
> + * be sure since evaluation code lays far from here.
> + */
> + if (quorum <= 0 || quorum >= VCLOCK_MAX)
> + panic("Unexpected result for replication_synchro_quorum eval");
> +
> + say_info("replication: evaluated quorum is %d", quorum);
quorum -> replication_synchro_quorum
> + replication_synchro_quorum_update(quorum);
> +}
> +
> void
> replication_init(void)
> {
> diff --git a/src/box/replication.h b/src/box/replication.h
> index ced519612..fa651d1c5 100644
> --- a/src/box/replication.h
> +++ b/src/box/replication.h
> @@ -131,6 +131,12 @@ extern double replication_sync_lag;
> */
> extern int replication_synchro_quorum;
>
> +/**
> + * A flag to point that replication_synchro_quorum needs
> + * to be evaluated as a formula.
> + */
> +extern bool replication_synchro_quorum_eval;
> +
> /**
> * Time in seconds which the master node is able to wait for ACKs
> * for a synchronous transaction until it is rolled back.
> @@ -178,6 +184,9 @@ replication_disconnect_timeout(void)
> void
> replication_synchro_quorum_update(int value);
>
> +void
> +replication_on_cluster_update(void);
> +
> void
> replication_init(void);
>
--
Serge Petrenko
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Tarantool-patches] [RFC 2/4] qsync: move synchro quorum update to separate routine
2020-11-20 10:06 ` Serge Petrenko
@ 2020-11-20 11:01 ` Cyrill Gorcunov
2020-11-20 11:39 ` Serge Petrenko
0 siblings, 1 reply; 21+ messages in thread
From: Cyrill Gorcunov @ 2020-11-20 11:01 UTC (permalink / raw)
To: Serge Petrenko; +Cc: tml, Vladislav Shpilevoy
On Fri, Nov 20, 2020 at 01:06:55PM +0300, Serge Petrenko wrote:
>
> IMO it'd be better to name the new function somewhat similar to
> `box_update_replication_synchro_quorum` and leave it in box.cc
>
>
> This way you avoid adding 2 new dependencies to replication.cc
> and leave all the code dealing with reconfiguration at one place.
>
> Besides, replication.cc already depends on box, so it won't be a problem
> to call update_replication_synchro_quorum there.
Yeah, I don't mind, thanks!
> Load_cfg.lua will say something similar, when replication_synchro_quorum is
> a number:
>
> ```
>
> log.info("set '%s' configuration option to %s", key,
> json.encode(val))
>
> ```
>
> So this say_info belongs to the trigger you invoke on replica
> register/unregister.
It will say "set ... to n+2/1" while we're printing the evaluated
integer value which is more informative.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Tarantool-patches] [RFC 3/4] cfg: prepare symbolic evaluation of replication_synchro_quorum
2020-11-20 10:32 ` Serge Petrenko
@ 2020-11-20 11:34 ` Cyrill Gorcunov
2020-11-20 11:56 ` Serge Petrenko
0 siblings, 1 reply; 21+ messages in thread
From: Cyrill Gorcunov @ 2020-11-20 11:34 UTC (permalink / raw)
To: Serge Petrenko; +Cc: tml, Vladislav Shpilevoy
On Fri, Nov 20, 2020 at 01:32:42PM +0300, Serge Petrenko wrote:
> > +/**
> > + * Evaluate replicaion syncro quorum number from a formula.
> > + */
> > +int
> > +eval_replication_synchro_quorum(int nr_replicas)
> > +{
> > + const char fmt[] =
> > + "local f, err = loadstring(\"return (%s)\")\n"
> > + "if not f then return 'failed to load \"%s\"' end\n"
> > + "setfenv(f, { n = %d })\n"
> > + "local ok, res = pcall(f)\n"
> > + "if not ok then return res end\n"
> > + "return math.floor(res)\n";
> > + char buf[512];
> > + int value = -1;
> > +
> > + errno = 0;
> > +
> > + const char *expr = cfg_gets("replication_synchro_quorum");
> > + size_t ret = snprintf(buf, sizeof(buf), fmt, expr,
> > + expr, nr_replicas);
> > + if (ret >= sizeof(buf)) {
> > + errno = EINVAL;
> > + diag_set(ClientError, ER_CFG,
> > + "replication_synchro_quorum",
> > + "the expression is too big");
> > + return -1;
> > + }
> > +
> > + luaL_loadstring(tarantool_L, buf);
> > + lua_call(tarantool_L, 0, 1);
> > +
> > + if (lua_isnumber(tarantool_L, -1)) {
> > + value = (int)lua_tonumber(tarantool_L, -1);
> > + } else {
> > + assert(lua_isstring(tarantool_L, -1));
> > + errno = EINVAL;
> > + diag_set(ClientError, ER_CFG,
> > + "replication_synchro_quorum",
> > + lua_tostring(tarantool_L, -1));
> > + }
> > + lua_pop(tarantool_L, 1);
> > + return value;
> > +}
>
> Better make the function return -1 on an error, and when the function
> evaluates to something negative,
> log the eval result and return 1. This way you won't need errno.
We need to distinguish wrong syntax and wrong evaluated values somehow
which means using -1 for anything wont work here.
- wrong syntax (which includes too long formula): -1 && errno
- on correct syntax we simply return the result the caller has
to check if returned value fits the allowed range.
Or I'm missing something.
> You should either check every possible value, from 1 to VCLOCK_MAX - 1,
> to make sure, say, that no division by zero is involved for some input.
That's the good point. Another question if we should allow formulas like
n-2, and while n <= 2 assume the quorum to be 1? Ie max(1, eval(n))
> or check a single value, say, 1, or 2 or whatever, to make sure that the
> expression in replication_synchro_quorum is at least valid Lua code.
>
> Why check only min and max inputs?
Lets consider the evaluation a bit closer. Denote arguments as x, thus
we will have x ∈ N, where are x's are strictly ordered, thus to get
f(x) < 0 between min/max values we need to use nonlinear f. I must confess
I don't know what is allowed inside Lua evaluator and really don't like
that we're allowing to pass completely arbitrary formula here. Moreover
what would happen if f would be using some weird data so that if will be
linear on bootstrap procedure but then switched to something more complex?
All this arbitrary evaluation looks very very fishy and I bet there will
be problems because this is the data we do not control completely.
Back to the former question -- initially I assume the f gonna be linear
and eval in min/max will be enough. But of course this is not correct.
You know I can pass all N's here but still this doesn't guarantee anything :(
That's why I'm for more strict rules here:
- allow some symbolic names such as
"all" -> (alias for f(x) = n)
"canonical" -> (alias for f(x) = n/2 + 1)
- implement own syntax analizyer/evaluator which would allow only
basic operation {+,/} over the set of {n} (which will give us
a linear functioni by definition). This of course will require
a way more code so not right now.
> > @@ -368,7 +368,6 @@ local dynamic_cfg_skip_at_load = {
> > replication_connect_quorum = true,
> > replication_sync_lag = true,
> > replication_sync_timeout = true,
> > - replication_synchro_quorum = true,
>
> You shouldn't remove replication_synchro_quorum from here.
> This table lists the options which are set directly from `box_cfg` in
> specific order.
No, this table is to _skip_ evaluation on bootup. But we have to verify
the default value to evaluate.
> I.e. not lua calls box_cfg_set_..., but box_cfg() itself does. But only on
> the first box.cfg call.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Tarantool-patches] [RFC 2/4] qsync: move synchro quorum update to separate routine
2020-11-20 11:01 ` Cyrill Gorcunov
@ 2020-11-20 11:39 ` Serge Petrenko
2020-11-20 11:47 ` Cyrill Gorcunov
0 siblings, 1 reply; 21+ messages in thread
From: Serge Petrenko @ 2020-11-20 11:39 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: tml, Vladislav Shpilevoy
20.11.2020 14:01, Cyrill Gorcunov пишет:
> On Fri, Nov 20, 2020 at 01:06:55PM +0300, Serge Petrenko wrote:
>> IMO it'd be better to name the new function somewhat similar to
>> `box_update_replication_synchro_quorum` and leave it in box.cc
>>
>>
>> This way you avoid adding 2 new dependencies to replication.cc
>> and leave all the code dealing with reconfiguration at one place.
>>
>> Besides, replication.cc already depends on box, so it won't be a problem
>> to call update_replication_synchro_quorum there.
> Yeah, I don't mind, thanks!
>
>> Load_cfg.lua will say something similar, when replication_synchro_quorum is
>> a number:
>>
>> ```
>>
>> log.info("set '%s' configuration option to %s", key,
>> json.encode(val))
>>
>> ```
>>
>> So this say_info belongs to the trigger you invoke on replica
>> register/unregister.
> It will say "set ... to n+2/1" while we're printing the evaluated
> integer value which is more informative.
Yes, but when it's actually configured to number, there'll be duplicate
messages.
You already have the needed log message in the next commit.
--
Serge Petrenko
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Tarantool-patches] [RFC 2/4] qsync: move synchro quorum update to separate routine
2020-11-20 11:39 ` Serge Petrenko
@ 2020-11-20 11:47 ` Cyrill Gorcunov
0 siblings, 0 replies; 21+ messages in thread
From: Cyrill Gorcunov @ 2020-11-20 11:47 UTC (permalink / raw)
To: Serge Petrenko; +Cc: tml, Vladislav Shpilevoy
On Fri, Nov 20, 2020 at 02:39:03PM +0300, Serge Petrenko wrote:
> > It will say "set ... to n+2/1" while we're printing the evaluated
> > integer value which is more informative.
>
> Yes, but when it's actually configured to number, there'll be duplicate
> messages.
>
> You already have the needed log message in the next commit.
+1, thanks!
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Tarantool-patches] [RFC 3/4] cfg: prepare symbolic evaluation of replication_synchro_quorum
2020-11-20 11:34 ` Cyrill Gorcunov
@ 2020-11-20 11:56 ` Serge Petrenko
2020-11-20 12:14 ` Cyrill Gorcunov
0 siblings, 1 reply; 21+ messages in thread
From: Serge Petrenko @ 2020-11-20 11:56 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: tml, Vladislav Shpilevoy
20.11.2020 14:34, Cyrill Gorcunov пишет:
> On Fri, Nov 20, 2020 at 01:32:42PM +0300, Serge Petrenko wrote:
>>> +/**
>>> + * Evaluate replicaion syncro quorum number from a formula.
>>> + */
>>> +int
>>> +eval_replication_synchro_quorum(int nr_replicas)
>>> +{
>>> + const char fmt[] =
>>> + "local f, err = loadstring(\"return (%s)\")\n"
>>> + "if not f then return 'failed to load \"%s\"' end\n"
>>> + "setfenv(f, { n = %d })\n"
>>> + "local ok, res = pcall(f)\n"
>>> + "if not ok then return res end\n"
>>> + "return math.floor(res)\n";
>>> + char buf[512];
>>> + int value = -1;
>>> +
>>> + errno = 0;
>>> +
>>> + const char *expr = cfg_gets("replication_synchro_quorum");
>>> + size_t ret = snprintf(buf, sizeof(buf), fmt, expr,
>>> + expr, nr_replicas);
>>> + if (ret >= sizeof(buf)) {
>>> + errno = EINVAL;
>>> + diag_set(ClientError, ER_CFG,
>>> + "replication_synchro_quorum",
>>> + "the expression is too big");
>>> + return -1;
>>> + }
>>> +
>>> + luaL_loadstring(tarantool_L, buf);
>>> + lua_call(tarantool_L, 0, 1);
>>> +
>>> + if (lua_isnumber(tarantool_L, -1)) {
>>> + value = (int)lua_tonumber(tarantool_L, -1);
>>> + } else {
>>> + assert(lua_isstring(tarantool_L, -1));
>>> + errno = EINVAL;
>>> + diag_set(ClientError, ER_CFG,
>>> + "replication_synchro_quorum",
>>> + lua_tostring(tarantool_L, -1));
>>> + }
>>> + lua_pop(tarantool_L, 1);
>>> + return value;
>>> +}
>> Better make the function return -1 on an error, and when the function
>> evaluates to something negative,
>> log the eval result and return 1. This way you won't need errno.
> We need to distinguish wrong syntax and wrong evaluated values somehow
> which means using -1 for anything wont work here.
>
> - wrong syntax (which includes too long formula): -1 && errno
> - on correct syntax we simply return the result the caller has
> to check if returned value fits the allowed range.
>
> Or I'm missing something.
I was thinking that we're gonna return something like max(1, min(value, 31))
so that any evaluated number is correct. Lets better discuss this
verbally then.
>
>> You should either check every possible value, from 1 to VCLOCK_MAX - 1,
>> to make sure, say, that no division by zero is involved for some input.
> That's the good point. Another question if we should allow formulas like
> n-2, and while n <= 2 assume the quorum to be 1? Ie max(1, eval(n))
Yes, that's what I was speaking about above. So that when the formula may
be evaluated correctly (i.e. without division by zero or syntax errors) its
result will automatically be correct.
>
>> or check a single value, say, 1, or 2 or whatever, to make sure that the
>> expression in replication_synchro_quorum is at least valid Lua code.
>>
>> Why check only min and max inputs?
> Lets consider the evaluation a bit closer. Denote arguments as x, thus
> we will have x ∈ N, where are x's are strictly ordered, thus to get
> f(x) < 0 between min/max values we need to use nonlinear f. I must confess
> I don't know what is allowed inside Lua evaluator and really don't like
> that we're allowing to pass completely arbitrary formula here. Moreover
> what would happen if f would be using some weird data so that if will be
> linear on bootstrap procedure but then switched to something more complex?
> All this arbitrary evaluation looks very very fishy and I bet there will
> be problems because this is the data we do not control completely.
I guess we shouldn't be this crazy about what is allowed in this formula
and what's not.
If a user has access to box.cfg{}, he may evaluate any expression he
wishes anyway.
Anyway, this is subject of a verbal discussion.
>
> Back to the former question -- initially I assume the f gonna be linear
> and eval in min/max will be enough. But of course this is not correct.
>
> You know I can pass all N's here but still this doesn't guarantee anything :(
> That's why I'm for more strict rules here:
>
> - allow some symbolic names such as
> "all" -> (alias for f(x) = n)
> "canonical" -> (alias for f(x) = n/2 + 1)
Sounds good to me. AFAIR others were agains it, though.
>
> - implement own syntax analizyer/evaluator which would allow only
> basic operation {+,/} over the set of {n} (which will give us
> a linear functioni by definition). This of course will require
> a way more code so not right now.
>
>>> @@ -368,7 +368,6 @@ local dynamic_cfg_skip_at_load = {
>>> replication_connect_quorum = true,
>>> replication_sync_lag = true,
>>> replication_sync_timeout = true,
>>> - replication_synchro_quorum = true,
>> You shouldn't remove replication_synchro_quorum from here.
>> This table lists the options which are set directly from `box_cfg` in
>> specific order.
> No, this table is to _skip_ evaluation on bootup. But we have to verify
> the default value to evaluate.
Yes, that's what I'm talking about.
Even if the cfg option from this list is 'skipped' in lua, it's
referenced
directly from box_cfg_xc(). Othervise the `box_cfg_set_...` will be
called twice.
Once from box_cfg_xc(), second time from this lua code.
To be more verbose, all the setters from dynamic_cfg_skip_at_load are
called on
bootstrap. But from box_cfg_xc() in C, not from Lua. If you remove an entry
from dynamic_cfg_skip_at_load, the corresponding setter will be called
twice.
>
>> I.e. not lua calls box_cfg_set_..., but box_cfg() itself does. But only on
>> the first box.cfg call.
--
Serge Petrenko
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Tarantool-patches] [RFC 4/4] qsync: allow to specify replication_synchro_quorum as a formula
2020-11-20 10:50 ` Serge Petrenko
@ 2020-11-20 12:01 ` Cyrill Gorcunov
2020-11-20 12:41 ` Serge Petrenko
0 siblings, 1 reply; 21+ messages in thread
From: Cyrill Gorcunov @ 2020-11-20 12:01 UTC (permalink / raw)
To: Serge Petrenko; +Cc: tml, Vladislav Shpilevoy
On Fri, Nov 20, 2020 at 01:50:50PM +0300, Serge Petrenko wrote:
> > @@ -4216,6 +4217,7 @@ unregister_replica(struct trigger *trigger, void * /* event */)
> > struct replica *replica = replica_by_uuid(&old_uuid);
> > assert(replica != NULL);
> > replica_clear_id(replica);
> > + replication_on_cluster_update();
> > return 0;
> > }
>
> We usually perform all the work related to replica register/unregister
> directly in replica_set_id
> and replica_clear_id.
>
> Besides, these are the places where replicaset.registered_count is updated,
> so it'd be nice to call replication_on_cluster_update from there.
Sure.
> > --- a/src/box/box.cc
> > +++ b/src/box/box.cc
> > @@ -634,14 +634,6 @@ box_check_replication_synchro_quorum(void)
> > */
> > int value = replication_synchro_quorum;
> > quorum = eval_replication_synchro_quorum(value);
>
> > Pass replicaset.registered_count instead of replication_synchro_quorum here.
Wait, this is bootstrap, replicaset.registered_count is 0 at this moment, no?
> > }
> > +/**
> > + * Evaluate the new synchro quorum number when replica
> > + * get registered/unregistered and the quorum depends on
> > + * their amount via formula in config.
> > + */
> > +void
> > +replication_on_cluster_update(void)
> > +{
> > + if (!replication_synchro_quorum_eval)
> > + return;
> > +
> > + /*
> > + * Account only registered replicas when evaluating
> > + * quorum number from a fromula present in config.
> > + */
> > + int value = replicaset.registered_count - replicaset.anon_count;
>
>
> registered_count stands for 'normal' replica count, so no need to subtract
> anon_count from it.
Yeah, thanks!
> > +
> > + say_info("replication: evaluated quorum is %d", quorum);
>
>
> quorum -> replication_synchro_quorum
ok
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Tarantool-patches] [RFC 3/4] cfg: prepare symbolic evaluation of replication_synchro_quorum
2020-11-20 11:56 ` Serge Petrenko
@ 2020-11-20 12:14 ` Cyrill Gorcunov
0 siblings, 0 replies; 21+ messages in thread
From: Cyrill Gorcunov @ 2020-11-20 12:14 UTC (permalink / raw)
To: Serge Petrenko; +Cc: tml, Vladislav Shpilevoy
On Fri, Nov 20, 2020 at 02:56:12PM +0300, Serge Petrenko wrote:
>
> I was thinking that we're gonna return something like max(1, min(value, 31))
>
> so that any evaluated number is correct. Lets better discuss this verbally
> then.
Ah, I see. Actually I don't mind to change it this way. Looks reasonable.
>
> > > You should either check every possible value, from 1 to VCLOCK_MAX - 1,
> > > to make sure, say, that no division by zero is involved for some input.
> > That's the good point. Another question if we should allow formulas like
> > n-2, and while n <= 2 assume the quorum to be 1? Ie max(1, eval(n))
>
> Yes, that's what I was speaking about above. So that when the formula may
> be evaluated correctly (i.e. without division by zero or syntax errors) its
> result will automatically be correct.
+1
> I guess we shouldn't be this crazy about what is allowed in this formula and
> what's not.
> If a user has access to box.cfg{}, he may evaluate any expression he wishes
> anyway.
>
> Anyway, this is subject of a verbal discussion.
Yes, better discuss. All this formalism is done in a sake "lets provide users
options to make sync replication guaranteed" and this contradict the requirements
with ability to run arbitrary formula :(
> > Back to the former question -- initially I assume the f gonna be linear
> > and eval in min/max will be enough. But of course this is not correct.
> >
> > You know I can pass all N's here but still this doesn't guarantee anything :(
> > That's why I'm for more strict rules here:
> >
> > - allow some symbolic names such as
> > "all" -> (alias for f(x) = n)
> > "canonical" -> (alias for f(x) = n/2 + 1)
>
> Sounds good to me. AFAIR others were agains it, though.
Yeah, except noone gave a good reason how manually defined formulas
are better than predefined ones. Users usually doesn't care about
config specifics they simply need a guaranteed replication to not
loose their data.
> > > You shouldn't remove replication_synchro_quorum from here.
> > > This table lists the options which are set directly from `box_cfg` in
> > > specific order.
> > No, this table is to _skip_ evaluation on bootup. But we have to verify
> > the default value to evaluate.
>
>
> Yes, that's what I'm talking about.
> Even if the cfg option from this list is 'skipped' in lua, it's
> referenced
> directly from box_cfg_xc(). Othervise the `box_cfg_set_...` will be called
> twice.
> Once from box_cfg_xc(), second time from this lua code.
>
> To be more verbose, all the setters from dynamic_cfg_skip_at_load are called
> on
> bootstrap. But from box_cfg_xc() in C, not from Lua. If you remove an entry
> from dynamic_cfg_skip_at_load, the corresponding setter will be called
> twice.
I'll recheck, thanks Serge!
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Tarantool-patches] [RFC 4/4] qsync: allow to specify replication_synchro_quorum as a formula
2020-11-20 12:01 ` Cyrill Gorcunov
@ 2020-11-20 12:41 ` Serge Petrenko
2020-11-20 15:00 ` Cyrill Gorcunov
0 siblings, 1 reply; 21+ messages in thread
From: Serge Petrenko @ 2020-11-20 12:41 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: tml, Vladislav Shpilevoy
20.11.2020 15:01, Cyrill Gorcunov пишет:
> On Fri, Nov 20, 2020 at 01:50:50PM +0300, Serge Petrenko wrote:
>>> @@ -4216,6 +4217,7 @@ unregister_replica(struct trigger *trigger, void * /* event */)
>>> struct replica *replica = replica_by_uuid(&old_uuid);
>>> assert(replica != NULL);
>>> replica_clear_id(replica);
>>> + replication_on_cluster_update();
>>> return 0;
>>> }
>> We usually perform all the work related to replica register/unregister
>> directly in replica_set_id
>> and replica_clear_id.
>>
>> Besides, these are the places where replicaset.registered_count is updated,
>> so it'd be nice to call replication_on_cluster_update from there.
> Sure.
>
>>> --- a/src/box/box.cc
>>> +++ b/src/box/box.cc
>>> @@ -634,14 +634,6 @@ box_check_replication_synchro_quorum(void)
>>> */
>>> int value = replication_synchro_quorum;
>>> quorum = eval_replication_synchro_quorum(value);
>>> Pass replicaset.registered_count instead of replication_synchro_quorum here.
> Wait, this is bootstrap, replicaset.registered_count is 0 at this moment, no?
Hm, I didn't think of this.
Yes, this is either bootstrap or reconfiguration.
Well, if we have a max(1, value) guard nothing bad should happen even if we
pass 0 on bootstrap. The formula will be reevaluated each time
a replica is registered anyway. Even when the node registers itself.
Anyway, passing replication_synchro_quorum as parameter to evaluate quorum
is even more random.
>
>>> }
>>> +/**
>>> + * Evaluate the new synchro quorum number when replica
>>> + * get registered/unregistered and the quorum depends on
>>> + * their amount via formula in config.
>>> + */
>>> +void
>>> +replication_on_cluster_update(void)
>>> +{
>>> + if (!replication_synchro_quorum_eval)
>>> + return;
>>> +
>>> + /*
>>> + * Account only registered replicas when evaluating
>>> + * quorum number from a fromula present in config.
>>> + */
>>> + int value = replicaset.registered_count - replicaset.anon_count;
>>
>> registered_count stands for 'normal' replica count, so no need to subtract
>> anon_count from it.
> Yeah, thanks!
>
>>> +
>>> + say_info("replication: evaluated quorum is %d", quorum);
>>
>> quorum -> replication_synchro_quorum
> ok
--
Serge Petrenko
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Tarantool-patches] [RFC 4/4] qsync: allow to specify replication_synchro_quorum as a formula
2020-11-20 12:41 ` Serge Petrenko
@ 2020-11-20 15:00 ` Cyrill Gorcunov
0 siblings, 0 replies; 21+ messages in thread
From: Cyrill Gorcunov @ 2020-11-20 15:00 UTC (permalink / raw)
To: Serge Petrenko; +Cc: tml, Vladislav Shpilevoy
On Fri, Nov 20, 2020 at 03:41:12PM +0300, Serge Petrenko wrote:
> > > > Pass replicaset.registered_count instead of replication_synchro_quorum here.
> > Wait, this is bootstrap, replicaset.registered_count is 0 at this moment, no?
>
> Hm, I didn't think of this.
>
> Yes, this is either bootstrap or reconfiguration.
> Well, if we have a max(1, value) guard nothing bad should happen even if we
> pass 0 on bootstrap. The formula will be reevaluated each time
> a replica is registered anyway. Even when the node registers itself.
>
> Anyway, passing replication_synchro_quorum as parameter to evaluate quorum
> is even more random.
ok, thanks!
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Tarantool-patches] [RFC 3/4] cfg: prepare symbolic evaluation of replication_synchro_quorum
2020-11-19 19:40 ` [Tarantool-patches] [RFC 3/4] cfg: prepare symbolic evaluation of replication_synchro_quorum Cyrill Gorcunov
2020-11-20 10:32 ` Serge Petrenko
@ 2020-11-26 14:38 ` Mons Anderson
2020-11-26 14:44 ` Cyrill Gorcunov
1 sibling, 1 reply; 21+ messages in thread
From: Mons Anderson @ 2020-11-26 14:38 UTC (permalink / raw)
To: tarantool-patches
Hi, thanks! Some comments below.
On 19.11.2020 22:40, Cyrill Gorcunov wrote:
>
> +/**
> + * Evaluate replicaion syncro quorum number from a formula.
> + */
> +int
> +eval_replication_synchro_quorum(int nr_replicas)
> +{
> + const char fmt[] =
> + "local f, err = loadstring(\"return (%s)\")\n"
> + "if not f then return 'failed to load \"%s\"' end\n"
> + "setfenv(f, { n = %d })\n"
> + "local ok, res = pcall(f)\n"
> + "if not ok then return res end\n"
> + "return math.floor(res)\n";
For the formula evaluation I'd propose the following snippet:
local expr = [[%s]]
local f, err = loadstring('return ('..expr..')')
if not f then error(string.format('Failed to load %%s: %%s',expr, err)) end
setfenv(f, { N = %d, math = math })
return math.floor( f() )
The reasons are:
- [[ ]] quotes are not interpolated, no need to repeat substitution again
- You need to handle errors anyway, so no need to return errors as
return values. Throw them.
- Also, since error handling (like a pcall from C layer) is required, no
need to do pcall inside
- Use of uppercase N, not lowercase is desired. Lowercase is used for
variables, uppercase for constants. Here N is more constant than
variable. (Or we may include eihter N or n)
- math module may be used for formulas, so pass it also.
> + luaL_loadstring(tarantool_L, buf);
> + lua_call(tarantool_L, 0, 1);
Use lua_pcall instead lua_call. So then check for absence of error and
lua_isnumber
> + if (lua_isnumber(tarantool_L, -1)) {
> + value = (int)lua_tonumber(tarantool_L, -1);
> + } else {
> + assert(lua_isstring(tarantool_L, -1));
> + errno = EINVAL;
> + diag_set(ClientError, ER_CFG,
> + "replication_synchro_quorum",
> + lua_tostring(tarantool_L, -1));
> + }
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Tarantool-patches] [RFC 3/4] cfg: prepare symbolic evaluation of replication_synchro_quorum
2020-11-26 14:38 ` Mons Anderson
@ 2020-11-26 14:44 ` Cyrill Gorcunov
2020-11-26 16:01 ` Mons Anderson
0 siblings, 1 reply; 21+ messages in thread
From: Cyrill Gorcunov @ 2020-11-26 14:44 UTC (permalink / raw)
To: Mons Anderson; +Cc: tarantool-patches
On Thu, Nov 26, 2020 at 05:38:32PM +0300, Mons Anderson wrote:
>
> For the formula evaluation I'd propose the following snippet:
>
> local expr = [[%s]]
> local f, err = loadstring('return ('..expr..')')
> if not f then error(string.format('Failed to load %%s: %%s',expr, err)) end
> setfenv(f, { N = %d, math = math })
> return math.floor( f() )
Will do. And will address the rest of comments. Thanks
for feedback!
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Tarantool-patches] [RFC 3/4] cfg: prepare symbolic evaluation of replication_synchro_quorum
2020-11-26 14:44 ` Cyrill Gorcunov
@ 2020-11-26 16:01 ` Mons Anderson
0 siblings, 0 replies; 21+ messages in thread
From: Mons Anderson @ 2020-11-26 16:01 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: tarantool-patches
Some feedback on behalf of Vlad Grubov:
Quote:
Good idea, but it is not secure to export whole math to the
configuration. User must not have access to math table itself because it
can modify it's function which will take effect through entire
application. Moreover user must not have access to math.randomseed
for security reasons. User is allowed to have access to ceil, floor,
min, max, fmod, random, abs, sqrt.
So, I'd chande setfenv with:
setfenv(f, { N = %d, math = { ceil = math.ceil, floor = math.floor, abs
= math.abs,
random = math.random, min = math.min, max = math.abs,
sqrt = math.sqrt, fmod = math.fmod } })
On 26.11.2020 17:44, Cyrill Gorcunov wrote:
> On Thu, Nov 26, 2020 at 05:38:32PM +0300, Mons Anderson wrote:
>> For the formula evaluation I'd propose the following snippet:
>>
>> local expr = [[%s]]
>> local f, err = loadstring('return ('..expr..')')
>> if not f then error(string.format('Failed to load %%s: %%s',expr, err)) end
>> setfenv(f, { N = %d, math = math })
>> return math.floor( f() )
> Will do. And will address the rest of comments. Thanks
> for feedback!
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2020-11-26 16:01 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-19 19:40 [Tarantool-patches] [RFC 0/4] qsync: evaluate replication_synchro_quorum dynamically Cyrill Gorcunov
2020-11-19 19:40 ` [Tarantool-patches] [RFC 1/4] cfg: add cfg_isnumber helper Cyrill Gorcunov
2020-11-20 9:53 ` Serge Petrenko
2020-11-19 19:40 ` [Tarantool-patches] [RFC 2/4] qsync: move synchro quorum update to separate routine Cyrill Gorcunov
2020-11-20 10:06 ` Serge Petrenko
2020-11-20 11:01 ` Cyrill Gorcunov
2020-11-20 11:39 ` Serge Petrenko
2020-11-20 11:47 ` Cyrill Gorcunov
2020-11-19 19:40 ` [Tarantool-patches] [RFC 3/4] cfg: prepare symbolic evaluation of replication_synchro_quorum Cyrill Gorcunov
2020-11-20 10:32 ` Serge Petrenko
2020-11-20 11:34 ` Cyrill Gorcunov
2020-11-20 11:56 ` Serge Petrenko
2020-11-20 12:14 ` Cyrill Gorcunov
2020-11-26 14:38 ` Mons Anderson
2020-11-26 14:44 ` Cyrill Gorcunov
2020-11-26 16:01 ` Mons Anderson
2020-11-19 19:41 ` [Tarantool-patches] [RFC 4/4] qsync: allow to specify replication_synchro_quorum as a formula Cyrill Gorcunov
2020-11-20 10:50 ` Serge Petrenko
2020-11-20 12:01 ` Cyrill Gorcunov
2020-11-20 12:41 ` Serge Petrenko
2020-11-20 15:00 ` Cyrill Gorcunov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox