[Tarantool-patches] [RFC 3/4] cfg: prepare symbolic evaluation of replication_synchro_quorum

Fri Nov 20 14:34:54 MSK 2020

On Fri, Nov 20, 2020 at 01:32:42PM +0300, Serge Petrenko wrote:
> > +/**
> > + * Evaluate replicaion syncro quorum number from a formula.
> > + */
> > +int
> > +eval_replication_synchro_quorum(int nr_replicas)
> > +{
> > +	const char fmt[] =
> > +		"local f, err = loadstring(\"return (%s)\")\n"
> > +		"if not f then return 'failed to load \"%s\"' end\n"
> > +		"setfenv(f, { n = %d })\n"
> > +		"local ok, res = pcall(f)\n"
> > +		"if not ok then return res end\n"
> > +		"return math.floor(res)\n";
> > +	char buf[512];
> > +	int value = -1;
> > +
> > +	errno = 0;
> > +
> > +	const char *expr = cfg_gets("replication_synchro_quorum");
> > +	size_t ret = snprintf(buf, sizeof(buf), fmt, expr,
> > +			      expr, nr_replicas);
> > +	if (ret >= sizeof(buf)) {
> > +		errno = EINVAL;
> > +		diag_set(ClientError, ER_CFG,
> > +			 "replication_synchro_quorum",
> > +			 "the expression is too big");
> > +		return -1;
> > +	}
> > +
> > +	luaL_loadstring(tarantool_L, buf);
> > +	lua_call(tarantool_L, 0, 1);
> > +
> > +	if (lua_isnumber(tarantool_L, -1)) {
> > +		value = (int)lua_tonumber(tarantool_L, -1);
> > +	} else {
> > +		assert(lua_isstring(tarantool_L, -1));
> > +		errno = EINVAL;
> > +		diag_set(ClientError, ER_CFG,
> > +			 "replication_synchro_quorum",
> > +			 lua_tostring(tarantool_L, -1));
> > +	}
> > +	lua_pop(tarantool_L, 1);
> > +	return value;
> > +}
> 
> Better make  the function return -1 on an error, and when the function
> evaluates to something negative,
> log the eval result and return 1. This way you won't need errno.

We need to distinguish wrong syntax and wrong evaluated values somehow
which means using -1 for anything wont work here.

 - wrong syntax (which includes too long formula): -1 && errno
 - on correct syntax we simply return the result the caller has
   to check if returned value fits the allowed range.

Or I'm missing something.

> You should either check every possible value, from 1 to VCLOCK_MAX - 1,
> to make sure, say, that no division by zero is involved for some input.

That's the good point. Another question if we should allow formulas like
n-2, and while n <= 2 assume the quorum to be 1? Ie max(1, eval(n))

> or check a single value, say, 1, or 2 or whatever, to make sure that the
> expression in replication_synchro_quorum is  at least valid Lua code.
> 
> Why check only min and max inputs?

Lets consider the evaluation a bit closer. Denote arguments as x, thus
we will have x ∈ N, where are x's are strictly ordered, thus to get
f(x) < 0 between min/max values we need to use nonlinear f. I must confess
I don't know what is allowed inside Lua evaluator and really don't like
that we're allowing to pass completely arbitrary formula here. Moreover
what would happen if f would be using some weird data so that if will be
linear on bootstrap procedure but then switched to something more complex?
All this arbitrary evaluation looks very very fishy and I bet there will
be problems because this is the data we do not control completely.

Back to the former question -- initially I assume the f gonna be linear
and eval in min/max will be enough. But of course this is not correct.

You know I can pass all N's here but still this doesn't guarantee anything :(
That's why I'm for more strict rules here:

 - allow some symbolic names such as
   "all" -> (alias for f(x) = n)
   "canonical" -> (alias for f(x) = n/2 + 1)

 - implement own syntax analizyer/evaluator which would allow only
   basic operation {+,/} over the set of {n} (which will give us
   a linear functioni by definition). This of course will require
   a way more code so not right now.

> > @@ -368,7 +368,6 @@ local dynamic_cfg_skip_at_load = {
> >       replication_connect_quorum = true,
> >       replication_sync_lag    = true,
> >       replication_sync_timeout = true,
> > -    replication_synchro_quorum = true,
> 
> You shouldn't remove replication_synchro_quorum from here.
> This table  lists the options which are set directly from `box_cfg` in
> specific order.

No, this table is to _skip_ evaluation on bootup. But we have to verify
the default value to evaluate.

> I.e. not lua calls  box_cfg_set_..., but box_cfg() itself does. But only on
> the first box.cfg call.