From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from [87.239.111.99] (localhost [127.0.0.1]) by dev.tarantool.org (Postfix) with ESMTP id C0E907030E; Mon, 25 Jan 2021 19:50:05 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org C0E907030E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tarantool.org; s=dev; t=1611593405; bh=YXElWeZfHzo0QbuzflGEpjdVNfHZ0ElasjGsDp5fs7w=; h=Date:To:Cc:References:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=gcWe5Zok4wKev9+orZwRo1puGWxi1oXlEtEBKP+OrmvcBH/B5RZOg7FZiySiXNB5f 3ZGJCTD1DEt1HIP9UBnljaO+PUn/zs+MxvcDVN5rswrXo3iGBfMrG+SZQ0PxtLga+M 6vRQjT8IIfelvBjE0St/eCVvCrr2Tf0j5OflItCM= Received: from mail-lj1-f176.google.com (mail-lj1-f176.google.com [209.85.208.176]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id CFDDC7030E for ; Mon, 25 Jan 2021 19:50:04 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 dev.tarantool.org CFDDC7030E Received: by mail-lj1-f176.google.com with SMTP id f17so16105476ljg.12 for ; Mon, 25 Jan 2021 08:50:04 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=m6UtQHCuok3s3EvOhrUEN+zlvUMXM6xwzjW1+skv4q4=; b=sKlOhV7o2p6Lg0EbkK3QmIPvj9Gamvpi9cegd+HuwG8mnB3DpodDX7qPwrkzztagyv MqM6k7g03Ubno8y9wxva3gtBbB/Ezx5T37CFIVMfh5ohRc7UlLn1MieGLRVH6zKRykap xnNt3vILeHzI7aN4vgA9oSoY95z5KiGQ0CbS23Sbgsc6ozmgCsVFfthVOYJdZjCPypVQ Fjk/VtiVHf4VIFnH+6gq8P570F8sq3o+5ntpBDgM/hxGwftI1zsok0+LQhWbMMdIBn7H w0JZY/RgtNFxVoyTTnbOgMh68sbPAuJdFAgJKF+a6KGAgxHd3vMKS/hGly8dNZCTGpFW EjIQ== X-Gm-Message-State: AOAM533eJHVMD3sA0W3bdmPFjZR7pWf9aGhWfIFdbC7/EoaYItBU4u2R /Cu+X0Lchxr11i5OFpJ/qHAK3X374e4= X-Google-Smtp-Source: ABdhPJzfrgrWlXwAm18b6N5qVCnskSt0TYXdp92DjfEKWizj5FO4nx9N4nc4RS2cIcR4Cbtuy+982Q== X-Received: by 2002:a2e:a303:: with SMTP id l3mr587431lje.395.1611593403051; Mon, 25 Jan 2021 08:50:03 -0800 (PST) Received: from grain.localdomain ([5.18.91.94]) by smtp.gmail.com with ESMTPSA id 195sm360297lfk.67.2021.01.25.08.50.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Jan 2021 08:50:01 -0800 (PST) Received: by grain.localdomain (Postfix, from userid 1000) id 0A768560113; Mon, 25 Jan 2021 19:50:00 +0300 (MSK) Date: Mon, 25 Jan 2021 19:50:00 +0300 To: Vladislav Shpilevoy Cc: tml Message-ID: <20210125165000.GG2174@grain> References: <20210118203556.281700-1-gorcunov@gmail.com> <20210118203556.281700-8-gorcunov@gmail.com> <5703e5a2-47b1-bcde-7cd8-7bca6e9c606f@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5703e5a2-47b1-bcde-7cd8-7bca6e9c606f@tarantool.org> User-Agent: Mutt/1.14.6 (2020-07-11) Subject: Re: [Tarantool-patches] [PATCH v12 7/8] box/cmod: implement cmod Lua module X-BeenThere: tarantool-patches@dev.tarantool.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Cyrill Gorcunov via Tarantool-patches Reply-To: Cyrill Gorcunov Errors-To: tarantool-patches-bounces@dev.tarantool.org Sender: "Tarantool-patches" On Sun, Jan 24, 2021 at 05:28:27PM +0100, Vladislav Shpilevoy wrote: > > > > Module functions > > ================ > > > > `require('cmod').load(path) -> obj | nil, err` > > 1. I don't like it, but I must say: according to one another > change in Lua guidelines now we must again throw exceptions > from all new functions present on the board instead of returning > errors. You can try to raise it in the chat if you want. Sigh... Thanks for pointing, will switch to a new approach. > > > > Modules are loaded with that named local binding which means > > that reload of module symbols won't affect the functions which > > are started execution already, only new calls will be rerouted. > > > > Possible errors: > > - IllegalParams: a module is not supplied. > > - ClientError: a module does not exist. > > 2. What happens to the existing function objects on reload()? > Modules are loaded with that named local binding which means > that reload of module symbols won't affect the functions which > are started execution already, only new calls will be rerouted. I suspect the paragraph above was not clear. What if I rephrase it as "Modules are loaded with that named local binding which means that reload of the module updates all functions tied up with, the only place where a kind of race possible is function execution: a function may start execution and call reschedule inside, thus this function will not be unloaded until its execution complete. Next call to the same function will be routed via new symbol". Does it sound better? > How is reload different from unload() + load()? This is rather a shortcut to unload+load where previously allocated functions on Lua level are simply updates their internal state. > > module:load(name) -> obj | nil, err` > > ------------------------------------ > > > > Loads a new function with name `name` from the `module` object > > and return a callable object instance associate with the function. > > > > Possible errors: > > - IllegalParams: function name is either not supplied > > or not a string. > > - OutOfMemory: unable to allocate a function. > > 3. What if the function is not found? ClientError. I'll add, thanks! > > > > `function:unload() -> true | nil, err` > > -------------------------------------- > > > > Unloads a function. > > > > Possible errors: > > - IllegalParams: function name is either not supplied > > or not a string. > > - IllegalParams: the function does not exist. > > 4. Is this unloaded automatically, when the function object is > GCed? True. When __gc called on variable we unload function automatically. Probably I should mention this in documentation. > > On success the first returned value set to `true` and `res` represent > > function execution result. > > > > ``` Lua > > local ok, res = cfunc_sum(1, 2) > > assert(ok); > > print(res) > > ``` > > > > We will see the number `3` in output. > > 5. What happens when I do multireturn? Not sure I follow you here. But if you mean tradtional multireturn then we have it in test case (the next patch) | cfunc_multireturn() | | --- | | - true | | - [1] | | - [1] The function should prepare the result by self. The calling convention is the same as to old known `box.func`. Or you mean something else? > > 6. This member is never used. I removed it and nothing > changed. > > ==================== > @@ -31,10 +31,6 @@ struct cmod_func { > * Number of loads of the function. > */ > int64_t load_count; > - /** > - * The function name (without a package path). > - */ > - const char *func_name; > /** Initially I thought to print short function name in obj:__index, but since the function hash is path+name I switched to the long function name (but continue keeping short "func_name" if we deside to print a short name as well). Probably indeed better to drop it and don't confuse code readers. > > + > > +/** A type to find a function handle from an object. */ > > +static const char *cmod_func_handle_uname = "cmod_func_handle"; > > 7. Please, move global declarations to the beginning of the file > so as it would be easier to find them, and to be consistent with > other files. OK. I put them first at the top but then desided to group by function context (ie close to the functions which use them). > > > + > > +static struct cmod_func * > > +cmod_get_func_handle(struct lua_State *L, bool mandatory) > > 8. According to the code style, flags must have 'is' prefix or > a similar one like 'has', 'does', etc. Here it should be 'is_mandatory'. Could you please point me where this rule sits? I see both usage (with and without prefix) in our code. Surely I update it (or probably drop as you suggest below). > But a better idea - just drop it. The single place where it is not > mandatory is GC handler. > > Another option - introduce separate get() and check(). Get() only > returns it without checking for NULL. Check() will ensure it is not NULL > and set the error otherwise. > > Also the function takes a Lua state, so it seems it must have lcmod > prefix, not cmod. According to the naming convention you established > in this file. Yeah, let me try this way. > > +{ > > + struct cmod_func *cf = cmod_get_handle(L, cmod_func_handle_uname); > > + if (mandatory && cf == NULL) { > > + const char *fmt = "The function is already unloaded"; > > + diag_set(IllegalParams, fmt); > > + } > > + return cf; > > +} > > + > > +static void > > +cmod_set_func_handle(struct lua_State *L, struct cmod_func *cf) > > +{ > > + cmod_set_handle(L, cmod_func_handle_uname, cf); > > +} > > + > > +static void > > +cmod_setup_func_handle(struct lua_State *L, struct cmod_func *cf) > > +{ > > + cmod_setup_handle(L, cmod_func_handle_uname, cf); > > +} > > 9. Amount of boilerplate code and double checks of the same > conditions you need just to push some udata is terrifying, tbh. > > For instance, take lcmod_func_handle_call(). It calls > cmod_get_func_handle() and checks result for NULL. This one > calls cmod_get_handle() and also checks result for NULL. > This one calls luaL_testudata() and also checks result for NULL. > You did 3 NULL checks which could be one, if you would have > a set of simpler functions working directly with luaL_testudata(). No, you're wrong. The minimum number of check is 2 (1 - fetch the double pointer to userdata, 2 - dereference first level). Instead I use two helpers where one sets diag in case of error which allows me to not duplicate code. This is a way better. Until you meant something different. > 10. Why do you have 'handle' word in each function name? For > example, how 'handle_call' is any different from just 'call' in > scope of code in this file? Or how is 'cmod_setup_func' different > from 'cmod_setup_func_handle'? It seems that by 'handle' you > try not to clash names with the non-Lua part of cmod. But you > should do it via prefixes, not via suffixes. lcmod instead of cmod. The base idea of handle is that "handle" is sitting in udata of Lua variable and the functions are fetching it from there. User may unload a function then its handle set to NULL but variable is still alive and subsequent call should print an error. If you prefer the prefix then sure, gimme some time to try out. > > +struct cmod_func * > > +cmod_func_find(const char *name, size_t name_len) > > 11. The function must be static. Yup. Thanks! > > +{ > > + mh_int_t e = mh_strnptr_find_inp(func_hash, name, name_len); > > + if (e == mh_end(func_hash)) > > + return NULL; > > + return mh_strnptr_node(func_hash, e)->val; > > +} > > + > > +/** > > + * Delete a function instance from the hash or decrease > > + * a reference if the function is still loaded. > > + * > > + * @param cf function descriptor. > > + * > > + * @retval true if the function has no more loaded instances > > + * and removed from the hash. > > + * > > + * @retval false if there are loaded instances left and function > > + * is kept in the hash. > > + */ > > +static bool > > +cmod_func_del(struct cmod_func *cf) > > 12. This whole cmod and lcmod API looks extremely inconsistent and complex. > Here are the functions you have (not counting the boilerplate code above): > > // Find a function in the cache, ok. > cmod_func_find(name) > > // It is called 'delete'. But it does not do the > // delete. Moreover, it also decreases 'load_count', > // which is super surprising. What does it have to do > // with loads? You don't call module_sym_unload() here, > // but decrease load_count. > cmod_func_del(f) The same function may be loaded twise into different Lua variables, thus we count number of loads. You proposed it. I think we can drop this countint and simply use module `refs` field instead as you proposed in another email. cmod_func_add/del operates with function hash they don't manipulate symbols. The symbols are handled via module_cache. This is because we have two interfaces for functions: box.func and cmod. In turn cmod_func_new/free calls module_sym_x helpers to run a real work with symbols. > > // Ok, this looks like it should add the function to the > // hash. But wait, it also increases load_count, and does > // not call module_sym_load()! WTF? > cmod_func_add(f) Tracks number of same function loads. > > // It should free the memory, right? But it calls > // module_sym_unload(). Why!? Also, we have a code style > // agreement, that function freeing the memory should > // have suffix 'delete', not 'free'. As soon as you have > // 'new'. > cmod_func_free(f) OK for naming. As to in general -- new function allocation goes into module_cache and resolve the function symbol. The symbol resides in memory until last unload/__gc and then function gets deleted and module_cache unloads the symbol. > > // Fine, it should allocate a new function object, right? > // But it also calls module_sym_load()! And does not increase > // load_count! I don't understand. It seems like the names > // of these functions has very little to do with what they are > // really doing. The only way to make them work is to call them > // in some special sequence, in which a next function fixes "some special sequence" is all our code I fear, the functions flow is context dependent grammar not context free. > // leftovers of previous functions. > cmod_func_new(name) > > This must be seriously redesigned. Starting from the function > names. > > Here is an option: > > // Find the function in the hash. If found, increase load_count > // and return. Otherwise allocate a new function object, fill its > // members, do the load, and make load_count = 1. Add to the hash. > cmod_func_load(name) > > // Decrease load_count. If became 0, remove from the hash, > // unload mod_sym, and free the memory. > cmod_func_unload(f) > > I don't really see why would you need the other functions. These 2 > are going to be extra simple anyway. Module_cache API for module > objects is literally that simple - load, unload, reload. Why do > you need so many for cmod, especially with so unclear behaviour? Because otherwise we will have a long function where we will have error path to handle. Let me check if I manged to make it readable. Note thouh that the number of actions are not goiing to disappear it will simply sit in one big function instead. > > +static int > > +cmod_func_add(struct cmod_func *cf) > > +{ > > + assert(cf->load_count >= 0); > > + if (cf->load_count++ != 0) > > 13. I already proposed an option how to rework cmod API above, but > still will leave some comments alongside. Thanks! > > Why the heck 'add' function increases load_count? Because it doesn't need to add same function into hash. > > + return 0; > > + > > + const struct mh_strnptr_node_t nd = { > > + .str = cf->name, > > + .len = cf->len, > > + .hash = mh_strn_hash(cf->name, cf->len), > > + .val = cf, > > + }; > > + > > + mh_int_t e = mh_strnptr_put(func_hash, &nd, NULL, NULL); > > + if (e == mh_end(func_hash)) { > > + diag_set(OutOfMemory, sizeof(nd), > > + "malloc", "cmod_func node"); > > + return -1; > > + } > > + return 0; > > 14. Why so complex? Just add it to the hash when it is created. This > function is used in a single place, where you already do the check > if it exists. I would like to separate hash helpers from other code. But sure. > > +} > > + > > +/** > > + * Unload a symbol and free a function instance. > > + * > > + * @param cf function descriptor. > > + */ > > +static void > > +cmod_func_free(struct cmod_func *cf) > > +{ > > + module_sym_unload(&cf->mod_sym); > > 15. Worth adding load_count == 0 assertion. > Or rather do the unload where it belongs - where > load_count becomes 0. OK > > > + TRASH(cf); > > + free(cf); > > +} > > + > > +/** > > + * Allocate a new function instance and resolve a symbol address. > > + * > > + * @param name package path and a function name, ie "module.foo" > > + * @param len length of @a name. > > + * @param func_name_len function name length, ie "3" for "module.foo" > > + * > > + * @returns function instance on success, NULL otherwise setting diag area. > > + */ > > +static struct cmod_func * > > +cmod_func_new(const char *name, size_t len, size_t func_name_len) > > +{ > > + const ssize_t cf_size = sizeof(struct cmod_func); > > + size_t size = cf_size + len + 1; > > + struct cmod_func *cf = malloc(size); > > + if (cf == NULL) { > > + diag_set(OutOfMemory, size, "malloc", "cf"); > > + return NULL; > > + } > > + > > + cf->mod_sym.addr = NULL; > > + cf->mod_sym.module = NULL; > > + cf->load_count = 0; > > + cf->mod_sym.name = cf->name; > > + cf->func_name = &cf->name[len - func_name_len]; > > + cf->len = len; > > + > > + memcpy(cf->name, name, len); > > + cf->name[len] = '\0'; > > + > > + if (module_sym_load(&cf->mod_sym) != 0) { > > 16. You load, and don't increase load_count. Does not it look > contradictory to you? Well, there are two sides: initial values where address comes from module subsystem and the state when function is ready to be used. The function is "ready" when it is not only initilized and has symbol resolved but when it sits inside function cache, and because of this we increas load_count only in cmod_func_add. That is why load_count increased not at moment of creation. > > +static int > > +lcmod_func_load(struct lua_State *L) > > +{ > > + const char *method = "function = module:load"; > > + struct cmod_func *cf = NULL; > > + > > + if (lua_gettop(L) != 2 || !lua_isstring(L, 2)) { > > + const char *fmt = > > + "Expects %s(\'name\') but no name passed"; > > + diag_set(IllegalParams, fmt, method); > > 17. Why couldn't you just inline 'fmt' string? The current version > is not any shorter nor easier to read. The same in many other places. To not split the string - it will overflow 80 symbols iirc > > + return luaT_push_nil_and_error(L); > > + } > > + > > + struct module *m = cmod_get_module_handle(L, false); > > + if (m == NULL) { > > + const char *fmt = > > + "Expects %s(\'name\') but not module object passed"; > > + diag_set(IllegalParams, fmt, method); > > + return luaT_push_nil_and_error(L); > > + } > > + > > + const char *func_name = lua_tostring(L, 2); > > + const char *name = tt_sprintf("%s.%s", m->package, func_name); > > + size_t len = strlen(name); > > + > > + cf = cmod_func_find(name, len); > > 18. Or you could pass package name and function name to the constructor > and copy them right to the allocated memory, without double copying > via the static buffer. Will take a look, though the hash requires the long name which means we will need to count number of symbols in merged name. Will try. > > + if (cf == NULL) { > > + cf = cmod_func_new(name, len, strlen(func_name)); > > + if (cf == NULL) > > + return luaT_push_nil_and_error(L); > > + } > > + > > + if (cmod_func_add(cf) != 0) { > > 19. From this code it looks like you can fail addition of an > already existing function and you just delete it. I know it > can't happen, but the code says the opposite, and it is > confusing. > > Another reason why 'add' is a bad name - it seems like it > should fail when added second time due to being already > added. OK > > +static int > > +lcmod_module_load(struct lua_State *L) > > +{ > > + if (lua_gettop(L) != 1 || !lua_isstring(L, 1)) { > > + const char *fmt = > > + "Expects cmod.load(\'name\') but no name passed"; > > + diag_set(IllegalParams, fmt); > > + return luaT_push_nil_and_error(L); > > + } > > + > > + size_t name_len; > > + const char *name = lua_tolstring(L, 1, &name_len); > > + > > + struct module *module = module_load(name, name_len); > > + if (module == NULL) > > + return luaT_push_nil_and_error(L); > > + > > + cmod_setup_module_handle(L, module); > > + /* > > + * Make sure the module won't disappear until > > + * it is GC'ed or unloaded explicitly. > > + */ > > + module->ref++; > > 20. The fact that you need to touch reference counter of an object > from a different subsystem should make you start to question > sanity of the API. Why module_load() does not do the increase, > and module_unload() - decrease? Because there were no such API and I though this gonna be over the top taking into account the size of the patch. Moreover there is different senatics of how modules are working in box.fun and our cmod module. I agree that this is ugly, maybe it is a good time to improve modules code. > > +static int > > +lcmod_module_handle_serialize(struct lua_State *L) > > +{ > > + struct module *m = cmod_get_module_handle(L, true); > > + if (m == NULL) { > > + lua_pushnil(L); > > + return 1; > > + } > > + > > + lua_createtable(L, 0, 0); > > 21. Since you decided to use createtable instead of newtable, > I suggest you to fill the proper parameters, which should > be (0, 1) in this case I think. OK. Thanks a huge for review, Vlad!