[Tarantool-patches] [PATCH v12 7/8] box/cmod: implement cmod Lua module
Cyrill Gorcunov
gorcunov at gmail.com
Mon Jan 25 19:50:00 MSK 2021
On Sun, Jan 24, 2021 at 05:28:27PM +0100, Vladislav Shpilevoy wrote:
> >
> > Module functions
> > ================
> >
> > `require('cmod').load(path) -> obj | nil, err`
>
> 1. I don't like it, but I must say: according to one another
> change in Lua guidelines now we must again throw exceptions
> from all new functions present on the board instead of returning
> errors. You can try to raise it in the chat if you want.
Sigh... Thanks for pointing, will switch to a new approach.
> >
> > Modules are loaded with that named local binding which means
> > that reload of module symbols won't affect the functions which
> > are started execution already, only new calls will be rerouted.
> >
> > Possible errors:
> > - IllegalParams: a module is not supplied.
> > - ClientError: a module does not exist.
>
> 2. What happens to the existing function objects on reload()?
> Modules are loaded with that named local binding which means
> that reload of module symbols won't affect the functions which
> are started execution already, only new calls will be rerouted.
I suspect the paragraph above was not clear. What if I rephrase it as "Modules
are loaded with that named local binding which means that reload of the module
updates all functions tied up with, the only place where a kind of race
possible is function execution: a function may start execution and call
reschedule inside, thus this function will not be unloaded until its
execution complete. Next call to the same function will be routed via
new symbol".
Does it sound better?
> How is reload different from unload() + load()?
This is rather a shortcut to unload+load where previously allocated
functions on Lua level are simply updates their internal state.
> > module:load(name) -> obj | nil, err`
> > ------------------------------------
> >
> > Loads a new function with name `name` from the `module` object
> > and return a callable object instance associate with the function.
> >
> > Possible errors:
> > - IllegalParams: function name is either not supplied
> > or not a string.
> > - OutOfMemory: unable to allocate a function.
>
> 3. What if the function is not found?
ClientError. I'll add, thanks!
> >
> > `function:unload() -> true | nil, err`
> > --------------------------------------
> >
> > Unloads a function.
> >
> > Possible errors:
> > - IllegalParams: function name is either not supplied
> > or not a string.
> > - IllegalParams: the function does not exist.
>
> 4. Is this unloaded automatically, when the function object is
> GCed?
True. When __gc called on variable we unload function automatically.
Probably I should mention this in documentation.
> > On success the first returned value set to `true` and `res` represent
> > function execution result.
> >
> > ``` Lua
> > local ok, res = cfunc_sum(1, 2)
> > assert(ok);
> > print(res)
> > ```
> >
> > We will see the number `3` in output.
>
> 5. What happens when I do multireturn?
Not sure I follow you here. But if you mean tradtional
multireturn then we have it in test case (the next patch)
| cfunc_multireturn()
| | ---
| | - true
| | - [1]
| | - [1]
The function should prepare the result by self. The calling
convention is the same as to old known `box.func`. Or
you mean something else?
>
> 6. This member is never used. I removed it and nothing
> changed.
>
> ====================
> @@ -31,10 +31,6 @@ struct cmod_func {
> * Number of loads of the function.
> */
> int64_t load_count;
> - /**
> - * The function name (without a package path).
> - */
> - const char *func_name;
> /**
Initially I thought to print short function name in obj:__index,
but since the function hash is path+name I switched to the long
function name (but continue keeping short "func_name" if we deside
to print a short name as well). Probably indeed better to drop it
and don't confuse code readers.
> > +
> > +/** A type to find a function handle from an object. */
> > +static const char *cmod_func_handle_uname = "cmod_func_handle";
>
> 7. Please, move global declarations to the beginning of the file
> so as it would be easier to find them, and to be consistent with
> other files.
OK. I put them first at the top but then desided to group by function
context (ie close to the functions which use them).
>
> > +
> > +static struct cmod_func *
> > +cmod_get_func_handle(struct lua_State *L, bool mandatory)
>
> 8. According to the code style, flags must have 'is' prefix or
> a similar one like 'has', 'does', etc. Here it should be 'is_mandatory'.
Could you please point me where this rule sits? I see both usage (with
and without prefix) in our code. Surely I update it (or probably drop
as you suggest below).
> But a better idea - just drop it. The single place where it is not
> mandatory is GC handler.
>
> Another option - introduce separate get() and check(). Get() only
> returns it without checking for NULL. Check() will ensure it is not NULL
> and set the error otherwise.
>
> Also the function takes a Lua state, so it seems it must have lcmod
> prefix, not cmod. According to the naming convention you established
> in this file.
Yeah, let me try this way.
> > +{
> > + struct cmod_func *cf = cmod_get_handle(L, cmod_func_handle_uname);
> > + if (mandatory && cf == NULL) {
> > + const char *fmt = "The function is already unloaded";
> > + diag_set(IllegalParams, fmt);
> > + }
> > + return cf;
> > +}
> > +
> > +static void
> > +cmod_set_func_handle(struct lua_State *L, struct cmod_func *cf)
> > +{
> > + cmod_set_handle(L, cmod_func_handle_uname, cf);
> > +}
> > +
> > +static void
> > +cmod_setup_func_handle(struct lua_State *L, struct cmod_func *cf)
> > +{
> > + cmod_setup_handle(L, cmod_func_handle_uname, cf);
> > +}
>
> 9. Amount of boilerplate code and double checks of the same
> conditions you need just to push some udata is terrifying, tbh.
>
> For instance, take lcmod_func_handle_call(). It calls
> cmod_get_func_handle() and checks result for NULL. This one
> calls cmod_get_handle() and also checks result for NULL.
> This one calls luaL_testudata() and also checks result for NULL.
> You did 3 NULL checks which could be one, if you would have
> a set of simpler functions working directly with luaL_testudata().
No, you're wrong. The minimum number of check is 2 (1 - fetch the
double pointer to userdata, 2 - dereference first level). Instead
I use two helpers where one sets diag in case of error which allows
me to not duplicate code. This is a way better. Until you meant
something different.
> 10. Why do you have 'handle' word in each function name? For
> example, how 'handle_call' is any different from just 'call' in
> scope of code in this file? Or how is 'cmod_setup_func' different
> from 'cmod_setup_func_handle'? It seems that by 'handle' you
> try not to clash names with the non-Lua part of cmod. But you
> should do it via prefixes, not via suffixes. lcmod instead of cmod.
The base idea of handle is that "handle" is sitting in udata of Lua
variable and the functions are fetching it from there. User may unload
a function then its handle set to NULL but variable is still alive
and subsequent call should print an error.
If you prefer the prefix then sure, gimme some time to try out.
> > +struct cmod_func *
> > +cmod_func_find(const char *name, size_t name_len)
>
> 11. The function must be static.
Yup. Thanks!
> > +{
> > + mh_int_t e = mh_strnptr_find_inp(func_hash, name, name_len);
> > + if (e == mh_end(func_hash))
> > + return NULL;
> > + return mh_strnptr_node(func_hash, e)->val;
> > +}
> > +
> > +/**
> > + * Delete a function instance from the hash or decrease
> > + * a reference if the function is still loaded.
> > + *
> > + * @param cf function descriptor.
> > + *
> > + * @retval true if the function has no more loaded instances
> > + * and removed from the hash.
> > + *
> > + * @retval false if there are loaded instances left and function
> > + * is kept in the hash.
> > + */
> > +static bool
> > +cmod_func_del(struct cmod_func *cf)
>
> 12. This whole cmod and lcmod API looks extremely inconsistent and complex.
> Here are the functions you have (not counting the boilerplate code above):
>
> // Find a function in the cache, ok.
> cmod_func_find(name)
>
> // It is called 'delete'. But it does not do the
> // delete. Moreover, it also decreases 'load_count',
> // which is super surprising. What does it have to do
> // with loads? You don't call module_sym_unload() here,
> // but decrease load_count.
> cmod_func_del(f)
The same function may be loaded twise into different
Lua variables, thus we count number of loads. You proposed
it. I think we can drop this countint and simply use module
`refs` field instead as you proposed in another email.
cmod_func_add/del operates with function hash they don't manipulate
symbols. The symbols are handled via module_cache. This is because
we have two interfaces for functions: box.func and cmod.
In turn cmod_func_new/free calls module_sym_x helpers to run
a real work with symbols.
>
> // Ok, this looks like it should add the function to the
> // hash. But wait, it also increases load_count, and does
> // not call module_sym_load()! WTF?
> cmod_func_add(f)
Tracks number of same function loads.
>
> // It should free the memory, right? But it calls
> // module_sym_unload(). Why!? Also, we have a code style
> // agreement, that function freeing the memory should
> // have suffix 'delete', not 'free'. As soon as you have
> // 'new'.
> cmod_func_free(f)
OK for naming. As to in general -- new function allocation
goes into module_cache and resolve the function symbol. The
symbol resides in memory until last unload/__gc and then
function gets deleted and module_cache unloads the symbol.
>
> // Fine, it should allocate a new function object, right?
> // But it also calls module_sym_load()! And does not increase
> // load_count! I don't understand. It seems like the names
> // of these functions has very little to do with what they are
> // really doing. The only way to make them work is to call them
> // in some special sequence, in which a next function fixes
"some special sequence" is all our code I fear, the functions
flow is context dependent grammar not context free.
> // leftovers of previous functions.
> cmod_func_new(name)
>
> This must be seriously redesigned. Starting from the function
> names.
>
> Here is an option:
>
> // Find the function in the hash. If found, increase load_count
> // and return. Otherwise allocate a new function object, fill its
> // members, do the load, and make load_count = 1. Add to the hash.
> cmod_func_load(name)
>
> // Decrease load_count. If became 0, remove from the hash,
> // unload mod_sym, and free the memory.
> cmod_func_unload(f)
>
> I don't really see why would you need the other functions. These 2
> are going to be extra simple anyway. Module_cache API for module
> objects is literally that simple - load, unload, reload. Why do
> you need so many for cmod, especially with so unclear behaviour?
Because otherwise we will have a long function where we will have
error path to handle. Let me check if I manged to make it readable.
Note thouh that the number of actions are not goiing to disappear
it will simply sit in one big function instead.
> > +static int
> > +cmod_func_add(struct cmod_func *cf)
> > +{
> > + assert(cf->load_count >= 0);
> > + if (cf->load_count++ != 0)
>
> 13. I already proposed an option how to rework cmod API above, but
> still will leave some comments alongside.
Thanks!
>
> Why the heck 'add' function increases load_count?
Because it doesn't need to add same function into hash.
> > + return 0;
> > +
> > + const struct mh_strnptr_node_t nd = {
> > + .str = cf->name,
> > + .len = cf->len,
> > + .hash = mh_strn_hash(cf->name, cf->len),
> > + .val = cf,
> > + };
> > +
> > + mh_int_t e = mh_strnptr_put(func_hash, &nd, NULL, NULL);
> > + if (e == mh_end(func_hash)) {
> > + diag_set(OutOfMemory, sizeof(nd),
> > + "malloc", "cmod_func node");
> > + return -1;
> > + }
> > + return 0;
>
> 14. Why so complex? Just add it to the hash when it is created. This
> function is used in a single place, where you already do the check
> if it exists.
I would like to separate hash helpers from other code. But sure.
> > +}
> > +
> > +/**
> > + * Unload a symbol and free a function instance.
> > + *
> > + * @param cf function descriptor.
> > + */
> > +static void
> > +cmod_func_free(struct cmod_func *cf)
> > +{
> > + module_sym_unload(&cf->mod_sym);
>
> 15. Worth adding load_count == 0 assertion.
> Or rather do the unload where it belongs - where
> load_count becomes 0.
OK
>
> > + TRASH(cf);
> > + free(cf);
> > +}
> > +
> > +/**
> > + * Allocate a new function instance and resolve a symbol address.
> > + *
> > + * @param name package path and a function name, ie "module.foo"
> > + * @param len length of @a name.
> > + * @param func_name_len function name length, ie "3" for "module.foo"
> > + *
> > + * @returns function instance on success, NULL otherwise setting diag area.
> > + */
> > +static struct cmod_func *
> > +cmod_func_new(const char *name, size_t len, size_t func_name_len)
> > +{
> > + const ssize_t cf_size = sizeof(struct cmod_func);
> > + size_t size = cf_size + len + 1;
> > + struct cmod_func *cf = malloc(size);
> > + if (cf == NULL) {
> > + diag_set(OutOfMemory, size, "malloc", "cf");
> > + return NULL;
> > + }
> > +
> > + cf->mod_sym.addr = NULL;
> > + cf->mod_sym.module = NULL;
> > + cf->load_count = 0;
> > + cf->mod_sym.name = cf->name;
> > + cf->func_name = &cf->name[len - func_name_len];
> > + cf->len = len;
> > +
> > + memcpy(cf->name, name, len);
> > + cf->name[len] = '\0';
> > +
> > + if (module_sym_load(&cf->mod_sym) != 0) {
>
> 16. You load, and don't increase load_count. Does not it look
> contradictory to you?
Well, there are two sides: initial values where address comes from
module subsystem and the state when function is ready to be used.
The function is "ready" when it is not only initilized and has
symbol resolved but when it sits inside function cache, and because
of this we increas load_count only in cmod_func_add. That is why
load_count increased not at moment of creation.
> > +static int
> > +lcmod_func_load(struct lua_State *L)
> > +{
> > + const char *method = "function = module:load";
> > + struct cmod_func *cf = NULL;
> > +
> > + if (lua_gettop(L) != 2 || !lua_isstring(L, 2)) {
> > + const char *fmt =
> > + "Expects %s(\'name\') but no name passed";
> > + diag_set(IllegalParams, fmt, method);
>
> 17. Why couldn't you just inline 'fmt' string? The current version
> is not any shorter nor easier to read. The same in many other places.
To not split the string - it will overflow 80 symbols iirc
> > + return luaT_push_nil_and_error(L);
> > + }
> > +
> > + struct module *m = cmod_get_module_handle(L, false);
> > + if (m == NULL) {
> > + const char *fmt =
> > + "Expects %s(\'name\') but not module object passed";
> > + diag_set(IllegalParams, fmt, method);
> > + return luaT_push_nil_and_error(L);
> > + }
> > +
> > + const char *func_name = lua_tostring(L, 2);
> > + const char *name = tt_sprintf("%s.%s", m->package, func_name);
> > + size_t len = strlen(name);
> > +
> > + cf = cmod_func_find(name, len);
>
> 18. Or you could pass package name and function name to the constructor
> and copy them right to the allocated memory, without double copying
> via the static buffer.
Will take a look, though the hash requires the long name which means
we will need to count number of symbols in merged name. Will try.
> > + if (cf == NULL) {
> > + cf = cmod_func_new(name, len, strlen(func_name));
> > + if (cf == NULL)
> > + return luaT_push_nil_and_error(L);
> > + }
> > +
> > + if (cmod_func_add(cf) != 0) {
>
> 19. From this code it looks like you can fail addition of an
> already existing function and you just delete it. I know it
> can't happen, but the code says the opposite, and it is
> confusing.
>
> Another reason why 'add' is a bad name - it seems like it
> should fail when added second time due to being already
> added.
OK
> > +static int
> > +lcmod_module_load(struct lua_State *L)
> > +{
> > + if (lua_gettop(L) != 1 || !lua_isstring(L, 1)) {
> > + const char *fmt =
> > + "Expects cmod.load(\'name\') but no name passed";
> > + diag_set(IllegalParams, fmt);
> > + return luaT_push_nil_and_error(L);
> > + }
> > +
> > + size_t name_len;
> > + const char *name = lua_tolstring(L, 1, &name_len);
> > +
> > + struct module *module = module_load(name, name_len);
> > + if (module == NULL)
> > + return luaT_push_nil_and_error(L);
> > +
> > + cmod_setup_module_handle(L, module);
> > + /*
> > + * Make sure the module won't disappear until
> > + * it is GC'ed or unloaded explicitly.
> > + */
> > + module->ref++;
>
> 20. The fact that you need to touch reference counter of an object
> from a different subsystem should make you start to question
> sanity of the API. Why module_load() does not do the increase,
> and module_unload() - decrease?
Because there were no such API and I though this gonna be over
the top taking into account the size of the patch. Moreover there
is different senatics of how modules are working in box.fun and
our cmod module.
I agree that this is ugly, maybe it is a good time to improve
modules code.
> > +static int
> > +lcmod_module_handle_serialize(struct lua_State *L)
> > +{
> > + struct module *m = cmod_get_module_handle(L, true);
> > + if (m == NULL) {
> > + lua_pushnil(L);
> > + return 1;
> > + }
> > +
> > + lua_createtable(L, 0, 0);
>
> 21. Since you decided to use createtable instead of newtable,
> I suggest you to fill the proper parameters, which should
> be (0, 1) in this case I think.
OK.
Thanks a huge for review, Vlad!
More information about the Tarantool-patches
mailing list