Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v8 0/4] box/cbox: implement cfunc Lua module
@ 2020-10-14 13:35 Cyrill Gorcunov
  2020-10-14 13:35 ` [Tarantool-patches] [PATCH v8 1/4] box/func: factor out c function entry structure Cyrill Gorcunov
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Cyrill Gorcunov @ 2020-10-14 13:35 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

The cbox module provide a way to execute C stored procedures
on read only nodes without registring them in `_func` system space.

The series implements a bare minimum. Vlad, please take a look
once time permit. The issue with `call` simplification where
we plan to drop `tarantooL` will be addressed in another issue
which I already assigned to myself.

v1-v3 are development ones and not sent.

v5 (by vlad):
 - drop exists, list methods: they are redundant
 - rename cfunc to cbox
 - when create a function make it callable Lua object
 - initialize cbox out of modules
 - fix error in passing module name for reloading
 - make api been cbox.func.[create|drop] and
   cbox.module.reload
 - fix test for OSX sake

v6 (by vlad):
 - move module handling into module_cache file.
v7:
 - development
v8:
 - use rbtree for function instance storange, since
   i don't like the idea of unexpected rehashing of
   values in case of massive number of functions
   allocated
 - use reference counter and free function instance
   if only load/unload are coupled
 - keep a pointer to the function inside Lua object
   so we don't need to lookup on every function call.
   this force us to implement __gc method
 - use new API and update docs

branch gorcunov/gh-4642-func-ro-8
issue https://github.com/tarantool/tarantool/issues/4642

Cyrill Gorcunov (4):
  box/func: factor out c function entry structure
  module_cache: move module handling into own subsystem
  box/cbox: implement cbox Lua module
  test: box/cfunc -- add simple module test

 src/box/CMakeLists.txt  |   2 +
 src/box/box.cc          |   7 +-
 src/box/func.c          | 475 +------------------------------------
 src/box/func.h          |  41 +---
 src/box/lua/cbox.c      | 486 ++++++++++++++++++++++++++++++++++++++
 src/box/lua/cbox.h      |  39 ++++
 src/box/lua/init.c      |   2 +
 src/box/module_cache.c  | 503 ++++++++++++++++++++++++++++++++++++++++
 src/box/module_cache.h  | 156 +++++++++++++
 test/box/CMakeLists.txt |   2 +
 test/box/cbox.result    | 134 +++++++++++
 test/box/cbox.test.lua  |  51 ++++
 test/box/cfunc1.c       |  49 ++++
 test/box/cfunc2.c       | 112 +++++++++
 test/box/suite.ini      |   2 +-
 15 files changed, 1553 insertions(+), 508 deletions(-)
 create mode 100644 src/box/lua/cbox.c
 create mode 100644 src/box/lua/cbox.h
 create mode 100644 src/box/module_cache.c
 create mode 100644 src/box/module_cache.h
 create mode 100644 test/box/cbox.result
 create mode 100644 test/box/cbox.test.lua
 create mode 100644 test/box/cfunc1.c
 create mode 100644 test/box/cfunc2.c


base-commit: 0dc72812fb78a192945612f0e954026a0ffe4053
-- 
2.26.2

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Tarantool-patches] [PATCH v8 1/4] box/func: factor out c function entry structure
  2020-10-14 13:35 [Tarantool-patches] [PATCH v8 0/4] box/cbox: implement cfunc Lua module Cyrill Gorcunov
@ 2020-10-14 13:35 ` Cyrill Gorcunov
  2020-10-29 22:15   ` Vladislav Shpilevoy
  2020-10-14 13:35 ` [Tarantool-patches] [PATCH v8 2/4] module_cache: move module handling into own subsystem Cyrill Gorcunov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Cyrill Gorcunov @ 2020-10-14 13:35 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Currently func_c structure is a wrapper over struct func
which in turn handles function definition, credentials
and etc, mostly to handle internal "_func" space.

Such tight bound doesn't allow to reuse func_c in any
other context. But we're going to support C function
execution in read-only instances and for this we better
reuse already existing structures as much as possible
instead of breeding new ones.

Thus lets extract module symbols into module_sym structure,
this allows us to reuse module path cache in other code.

While extracting the structure rename "func" member to
"addr" since this exactly what the member represents:
an address of symbol in memory.

Same time to be completely independent of "_func" specific lets
carry symbolic name inplace (ie member "name" is kind of redundant
when module_sym is used for "_func" handling where we can retrieve
the name via function definition but such definition won't be
available for non-persistent C functions which we will support
in next patches).

The new structure allows us to load/unload general symbols via
module_sym_load() and module_sym_unload() helpers.

Part-of #4642

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/func.c | 156 +++++++++++++++++++++++--------------------------
 src/box/func.h |  43 ++++++++++++++
 2 files changed, 116 insertions(+), 83 deletions(-)

diff --git a/src/box/func.c b/src/box/func.c
index 8087c953f..75d2abb5f 100644
--- a/src/box/func.c
+++ b/src/box/func.c
@@ -59,19 +59,9 @@ struct func_name {
 struct func_c {
 	/** Function object base class. */
 	struct func base;
-	/**
-	 * Anchor for module membership.
-	 */
-	struct rlist item;
-	/**
-	 * For C functions, the body of the function.
-	 */
-	box_function_f func;
-	/**
-	 * Each stored function keeps a handle to the
-	 * dynamic library for the C callback.
-	 */
-	struct module *module;
+
+	/** C function module symbol. */
+	struct module_sym mod_sym;
 };
 
 /***
@@ -371,6 +361,53 @@ module_sym(struct module *module, const char *name)
 	return f;
 }
 
+int
+module_sym_load(struct module_sym *mod_sym)
+{
+	assert(mod_sym->addr == NULL);
+
+	struct func_name name;
+	func_split_name(mod_sym->name, &name);
+
+	struct module *module = module_cache_find(name.package,
+						  name.package_end);
+	if (module == NULL) {
+		/* Try to find loaded module in the cache */
+		module = module_load(name.package, name.package_end);
+		if (module == NULL)
+			return -1;
+		if (module_cache_put(module)) {
+			module_delete(module);
+			return -1;
+		}
+	}
+
+	mod_sym->addr = module_sym(module, name.sym);
+	if (mod_sym->addr == NULL)
+		return -1;
+	mod_sym->module = module;
+	rlist_add(&module->funcs, &mod_sym->item);
+	return 0;
+}
+
+void
+module_sym_unload(struct module_sym *mod_sym)
+{
+	if (mod_sym->module == NULL)
+		return;
+
+	rlist_del(&mod_sym->item);
+	if (rlist_empty(&mod_sym->module->funcs)) {
+		struct func_name name;
+		func_split_name(mod_sym->name, &name);
+		module_cache_del(name.package, name.package_end);
+	}
+	module_gc(mod_sym->module);
+
+	mod_sym->module = NULL;
+	mod_sym->addr = NULL;
+}
+
 int
 module_reload(const char *package, const char *package_end, struct module **module)
 {
@@ -385,15 +422,15 @@ module_reload(const char *package, const char *package_end, struct module **modu
 	if (new_module == NULL)
 		return -1;
 
-	struct func_c *func, *tmp_func;
-	rlist_foreach_entry_safe(func, &old_module->funcs, item, tmp_func) {
+	struct module_sym *mod_sym, *tmp;
+	rlist_foreach_entry_safe(mod_sym, &old_module->funcs, item, tmp) {
 		struct func_name name;
-		func_split_name(func->base.def->name, &name);
-		func->func = module_sym(new_module, name.sym);
-		if (func->func == NULL)
+		func_split_name(mod_sym->name, &name);
+		mod_sym->addr = module_sym(new_module, name.sym);
+		if (mod_sym->addr == NULL)
 			goto restore;
-		func->module = new_module;
-		rlist_move(&new_module->funcs, &func->item);
+		mod_sym->module = new_module;
+		rlist_move(&new_module->funcs, &mod_sym->item);
 	}
 	module_cache_del(package, package_end);
 	if (module_cache_put(new_module) != 0)
@@ -408,9 +445,9 @@ module_reload(const char *package, const char *package_end, struct module **modu
 	 */
 	do {
 		struct func_name name;
-		func_split_name(func->base.def->name, &name);
-		func->func = module_sym(old_module, name.sym);
-		if (func->func == NULL) {
+		func_split_name(mod_sym->name, &name);
+		mod_sym->addr = module_sym(old_module, name.sym);
+		if (mod_sym->addr == NULL) {
 			/*
 			 * Something strange was happen, an early loaden
 			 * function was not found in an old dso.
@@ -418,10 +455,11 @@ module_reload(const char *package, const char *package_end, struct module **modu
 			panic("Can't restore module function, "
 			      "server state is inconsistent");
 		}
-		func->module = old_module;
-		rlist_move(&old_module->funcs, &func->item);
-	} while (func != rlist_first_entry(&old_module->funcs,
-					   struct func_c, item));
+		mod_sym->module = old_module;
+		rlist_move(&old_module->funcs, &mod_sym->item);
+	} while (mod_sym != rlist_first_entry(&old_module->funcs,
+					      struct module_sym,
+					      item));
 	assert(rlist_empty(&new_module->funcs));
 	module_delete(new_module);
 	return -1;
@@ -484,79 +522,31 @@ func_c_new(MAYBE_UNUSED struct func_def *def)
 		return NULL;
 	}
 	func->base.vtab = &func_c_vtab;
-	func->func = NULL;
-	func->module = NULL;
+	func->mod_sym.addr = NULL;
+	func->mod_sym.module = NULL;
+	func->mod_sym.name = def->name;
 	return &func->base;
 }
 
-static void
-func_c_unload(struct func_c *func)
-{
-	if (func->module) {
-		rlist_del(&func->item);
-		if (rlist_empty(&func->module->funcs)) {
-			struct func_name name;
-			func_split_name(func->base.def->name, &name);
-			module_cache_del(name.package, name.package_end);
-		}
-		module_gc(func->module);
-	}
-	func->module = NULL;
-	func->func = NULL;
-}
-
 static void
 func_c_destroy(struct func *base)
 {
 	assert(base->vtab == &func_c_vtab);
 	assert(base != NULL && base->def->language == FUNC_LANGUAGE_C);
 	struct func_c *func = (struct func_c *) base;
-	func_c_unload(func);
+	module_sym_unload(&func->mod_sym);
 	TRASH(base);
 	free(func);
 }
 
-/**
- * Resolve func->func (find the respective DLL and fetch the
- * symbol from it).
- */
-static int
-func_c_load(struct func_c *func)
-{
-	assert(func->func == NULL);
-
-	struct func_name name;
-	func_split_name(func->base.def->name, &name);
-
-	struct module *module = module_cache_find(name.package,
-						  name.package_end);
-	if (module == NULL) {
-		/* Try to find loaded module in the cache */
-		module = module_load(name.package, name.package_end);
-		if (module == NULL)
-			return -1;
-		if (module_cache_put(module)) {
-			module_delete(module);
-			return -1;
-		}
-	}
-
-	func->func = module_sym(module, name.sym);
-	if (func->func == NULL)
-		return -1;
-	func->module = module;
-	rlist_add(&module->funcs, &func->item);
-	return 0;
-}
-
 int
 func_c_call(struct func *base, struct port *args, struct port *ret)
 {
 	assert(base->vtab == &func_c_vtab);
 	assert(base != NULL && base->def->language == FUNC_LANGUAGE_C);
 	struct func_c *func = (struct func_c *) base;
-	if (func->func == NULL) {
-		if (func_c_load(func) != 0)
+	if (func->mod_sym.addr == NULL) {
+		if (module_sym_load(&func->mod_sym) != 0)
 			return -1;
 	}
 
@@ -571,10 +561,10 @@ func_c_call(struct func *base, struct port *args, struct port *ret)
 	box_function_ctx_t ctx = { ret };
 
 	/* Module can be changed after function reload. */
-	struct module *module = func->module;
+	struct module *module = func->mod_sym.module;
 	assert(module != NULL);
 	++module->calls;
-	int rc = func->func(&ctx, data, data + data_sz);
+	int rc = func->mod_sym.addr(&ctx, data, data + data_sz);
 	--module->calls;
 	module_gc(module);
 	region_truncate(region, region_svp);
diff --git a/src/box/func.h b/src/box/func.h
index 581e468cb..acf2df9a9 100644
--- a/src/box/func.h
+++ b/src/box/func.h
@@ -58,6 +58,29 @@ struct module {
 	char package[0];
 };
 
+/**
+ * Callable symbol bound to a module.
+ */
+struct module_sym {
+	/**
+	 * Anchor for module membership.
+	 */
+	struct rlist item;
+	/**
+	 * For C functions, address of the function.
+	 */
+	box_function_f addr;
+	/**
+	 * Each stored function keeps a handle to the
+	 * dynamic library for the C callback.
+	 */
+	struct module *module;
+	/**
+	 * Function name definition.
+	 */
+	char *name;
+};
+
 /** Virtual method table for func object. */
 struct func_vtab {
 	/** Call function with given arguments. */
@@ -108,6 +131,26 @@ func_delete(struct func *func);
 int
 func_call(struct func *func, struct port *args, struct port *ret);
 
+/**
+ * Resolve C entry (find the respective DLL and fetch the
+ * symbol from it).
+ *
+ * @param mod_sym module symbol pointer.
+ * @retval -1 on error.
+ * @retval 0 on success.
+ */
+int
+module_sym_load(struct module_sym *mod_sym);
+
+/**
+ * Unload module symbol and drop it from the package
+ * cache if there is no users left.
+ *
+ * @param mod_sym module symbol pointer.
+ */
+void
+module_sym_unload(struct module_sym *mod_sym);
+
 /**
  * Reload dynamically loadable module.
  *
-- 
2.26.2

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Tarantool-patches] [PATCH v8 2/4] module_cache: move module handling into own subsystem
  2020-10-14 13:35 [Tarantool-patches] [PATCH v8 0/4] box/cbox: implement cfunc Lua module Cyrill Gorcunov
  2020-10-14 13:35 ` [Tarantool-patches] [PATCH v8 1/4] box/func: factor out c function entry structure Cyrill Gorcunov
@ 2020-10-14 13:35 ` Cyrill Gorcunov
  2020-10-29 22:15   ` Vladislav Shpilevoy
  2020-10-14 13:35 ` [Tarantool-patches] [PATCH v8 3/4] box/cbox: implement cbox Lua module Cyrill Gorcunov
  2020-10-14 13:35 ` [Tarantool-patches] [PATCH v8 4/4] test: box/cfunc -- add simple module test Cyrill Gorcunov
  3 siblings, 1 reply; 22+ messages in thread
From: Cyrill Gorcunov @ 2020-10-14 13:35 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

The module handling should not be bound to particular
function implementation (we will have two users: already
existing functions for "_func" space, and a new upcoming
one which will be serving cbox submodule in next patch).

For this sake all module related code is moved to
module_cache file where we do symbol resolving,
calling and tracking of symbol usage.

Part-of #4642

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/CMakeLists.txt |   1 +
 src/box/box.cc         |   2 +-
 src/box/func.c         | 453 +------------------------------------
 src/box/func.h         |  84 +------
 src/box/module_cache.c | 503 +++++++++++++++++++++++++++++++++++++++++
 src/box/module_cache.h | 156 +++++++++++++
 6 files changed, 665 insertions(+), 534 deletions(-)
 create mode 100644 src/box/module_cache.c
 create mode 100644 src/box/module_cache.h

diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt
index 8b2e704cf..91d85c6db 100644
--- a/src/box/CMakeLists.txt
+++ b/src/box/CMakeLists.txt
@@ -190,6 +190,7 @@ add_library(box STATIC
     sql_stmt_cache.c
     wal.c
     call.c
+    module_cache.c
     merger.c
     ${sql_sources}
     ${lua_sources}
diff --git a/src/box/box.cc b/src/box/box.cc
index 6ec813c12..2485b79f3 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -74,7 +74,7 @@
 #include "sql.h"
 #include "systemd.h"
 #include "call.h"
-#include "func.h"
+#include "module_cache.h"
 #include "sequence.h"
 #include "sql_stmt_cache.h"
 #include "msgpack.h"
diff --git a/src/box/func.c b/src/box/func.c
index 75d2abb5f..6119f182b 100644
--- a/src/box/func.c
+++ b/src/box/func.c
@@ -31,30 +31,12 @@
 #include "func.h"
 #include "fiber.h"
 #include "trivia/config.h"
-#include "assoc.h"
 #include "lua/utils.h"
 #include "lua/call.h"
-#include "error.h"
-#include "errinj.h"
 #include "diag.h"
 #include "port.h"
 #include "schema.h"
 #include "session.h"
-#include "libeio/eio.h"
-#include <fcntl.h>
-#include <dlfcn.h>
-
-/**
- * Parsed symbol and package names.
- */
-struct func_name {
-	/** Null-terminated symbol name, e.g. "func" for "mod.submod.func" */
-	const char *sym;
-	/** Package name, e.g. "mod.submod" for "mod.submod.func" */
-	const char *package;
-	/** A pointer to the last character in ->package + 1 */
-	const char *package_end;
-};
 
 struct func_c {
 	/** Function object base class. */
@@ -64,407 +46,6 @@ struct func_c {
 	struct module_sym mod_sym;
 };
 
-/***
- * Split function name to symbol and package names.
- * For example, str = foo.bar.baz => sym = baz, package = foo.bar
- * @param str function name, e.g. "module.submodule.function".
- * @param[out] name parsed symbol and package names.
- */
-static void
-func_split_name(const char *str, struct func_name *name)
-{
-	name->package = str;
-	name->package_end = strrchr(str, '.');
-	if (name->package_end != NULL) {
-		/* module.submodule.function => module.submodule, function */
-		name->sym = name->package_end + 1; /* skip '.' */
-	} else {
-		/* package == function => function, function */
-		name->sym = name->package;
-		name->package_end = str + strlen(str);
-	}
-}
-
-/**
- * Arguments for luaT_module_find used by lua_cpcall()
- */
-struct module_find_ctx {
-	const char *package;
-	const char *package_end;
-	char *path;
-	size_t path_len;
-};
-
-/**
- * A cpcall() helper for module_find()
- */
-static int
-luaT_module_find(lua_State *L)
-{
-	struct module_find_ctx *ctx = (struct module_find_ctx *)
-		lua_topointer(L, 1);
-
-	/*
-	 * Call package.searchpath(name, package.cpath) and use
-	 * the path to the function in dlopen().
-	 */
-	lua_getglobal(L, "package");
-
-	lua_getfield(L, -1, "search");
-
-	/* Argument of search: name */
-	lua_pushlstring(L, ctx->package, ctx->package_end - ctx->package);
-
-	lua_call(L, 1, 1);
-	if (lua_isnil(L, -1))
-		return luaL_error(L, "module not found");
-	/* Convert path to absolute */
-	char resolved[PATH_MAX];
-	if (realpath(lua_tostring(L, -1), resolved) == NULL) {
-		diag_set(SystemError, "realpath");
-		return luaT_error(L);
-	}
-
-	snprintf(ctx->path, ctx->path_len, "%s", resolved);
-	return 0;
-}
-
-/**
- * Find path to module using Lua's package.cpath
- * @param package package name
- * @param package_end a pointer to the last byte in @a package + 1
- * @param[out] path path to shared library
- * @param path_len size of @a path buffer
- * @retval 0 on success
- * @retval -1 on error, diag is set
- */
-static int
-module_find(const char *package, const char *package_end, char *path,
-	    size_t path_len)
-{
-	struct module_find_ctx ctx = { package, package_end, path, path_len };
-	lua_State *L = tarantool_L;
-	int top = lua_gettop(L);
-	if (luaT_cpcall(L, luaT_module_find, &ctx) != 0) {
-		int package_len = (int) (package_end - package);
-		diag_set(ClientError, ER_LOAD_MODULE, package_len, package,
-			 lua_tostring(L, -1));
-		lua_settop(L, top);
-		return -1;
-	}
-	assert(top == lua_gettop(L)); /* cpcall discard results */
-	return 0;
-}
-
-static struct mh_strnptr_t *modules = NULL;
-
-static void
-module_gc(struct module *module);
-
-int
-module_init(void)
-{
-	modules = mh_strnptr_new();
-	if (modules == NULL) {
-		diag_set(OutOfMemory, sizeof(*modules), "malloc",
-			  "modules hash table");
-		return -1;
-	}
-	return 0;
-}
-
-void
-module_free(void)
-{
-	while (mh_size(modules) > 0) {
-		mh_int_t i = mh_first(modules);
-		struct module *module =
-			(struct module *) mh_strnptr_node(modules, i)->val;
-		/* Can't delete modules if they have active calls */
-		module_gc(module);
-	}
-	mh_strnptr_delete(modules);
-}
-
-/**
- * Look up a module in the modules cache.
- */
-static struct module *
-module_cache_find(const char *name, const char *name_end)
-{
-	mh_int_t i = mh_strnptr_find_inp(modules, name, name_end - name);
-	if (i == mh_end(modules))
-		return NULL;
-	return (struct module *)mh_strnptr_node(modules, i)->val;
-}
-
-/**
- * Save module to the module cache.
- */
-static inline int
-module_cache_put(struct module *module)
-{
-	size_t package_len = strlen(module->package);
-	uint32_t name_hash = mh_strn_hash(module->package, package_len);
-	const struct mh_strnptr_node_t strnode = {
-		module->package, package_len, name_hash, module};
-
-	if (mh_strnptr_put(modules, &strnode, NULL, NULL) == mh_end(modules)) {
-		diag_set(OutOfMemory, sizeof(strnode), "malloc", "modules");
-		return -1;
-	}
-	return 0;
-}
-
-/**
- * Delete a module from the module cache
- */
-static void
-module_cache_del(const char *name, const char *name_end)
-{
-	mh_int_t i = mh_strnptr_find_inp(modules, name, name_end - name);
-	if (i == mh_end(modules))
-		return;
-	mh_strnptr_del(modules, i, NULL);
-}
-
-/*
- * Load a dso.
- * Create a new symlink based on temporary directory and try to
- * load via this symink to load a dso twice for cases of a function
- * reload.
- */
-static struct module *
-module_load(const char *package, const char *package_end)
-{
-	char path[PATH_MAX];
-	if (module_find(package, package_end, path, sizeof(path)) != 0)
-		return NULL;
-
-	int package_len = package_end - package;
-	struct module *module = (struct module *)
-		malloc(sizeof(*module) + package_len + 1);
-	if (module == NULL) {
-		diag_set(OutOfMemory, sizeof(struct module) + package_len + 1,
-			 "malloc", "struct module");
-		return NULL;
-	}
-	memcpy(module->package, package, package_len);
-	module->package[package_len] = 0;
-	rlist_create(&module->funcs);
-	module->calls = 0;
-
-	const char *tmpdir = getenv("TMPDIR");
-	if (tmpdir == NULL)
-		tmpdir = "/tmp";
-	char dir_name[PATH_MAX];
-	int rc = snprintf(dir_name, sizeof(dir_name), "%s/tntXXXXXX", tmpdir);
-	if (rc < 0 || (size_t) rc >= sizeof(dir_name)) {
-		diag_set(SystemError, "failed to generate path to tmp dir");
-		goto error;
-	}
-
-	if (mkdtemp(dir_name) == NULL) {
-		diag_set(SystemError, "failed to create unique dir name: %s",
-			 dir_name);
-		goto error;
-	}
-	char load_name[PATH_MAX];
-	rc = snprintf(load_name, sizeof(load_name), "%s/%.*s." TARANTOOL_LIBEXT,
-		      dir_name, package_len, package);
-	if (rc < 0 || (size_t) rc >= sizeof(dir_name)) {
-		diag_set(SystemError, "failed to generate path to DSO");
-		goto error;
-	}
-
-	struct stat st;
-	if (stat(path, &st) < 0) {
-		diag_set(SystemError, "failed to stat() module %s", path);
-		goto error;
-	}
-
-	int source_fd = open(path, O_RDONLY);
-	if (source_fd < 0) {
-		diag_set(SystemError, "failed to open module %s file for" \
-			 " reading", path);
-		goto error;
-	}
-	int dest_fd = open(load_name, O_WRONLY|O_CREAT|O_TRUNC,
-			   st.st_mode & 0777);
-	if (dest_fd < 0) {
-		diag_set(SystemError, "failed to open file %s for writing ",
-			 load_name);
-		close(source_fd);
-		goto error;
-	}
-
-	off_t ret = eio_sendfile_sync(dest_fd, source_fd, 0, st.st_size);
-	close(source_fd);
-	close(dest_fd);
-	if (ret != st.st_size) {
-		diag_set(SystemError, "failed to copy DSO %s to %s",
-			 path, load_name);
-		goto error;
-	}
-
-	module->handle = dlopen(load_name, RTLD_NOW | RTLD_LOCAL);
-	if (unlink(load_name) != 0)
-		say_warn("failed to unlink dso link %s", load_name);
-	if (rmdir(dir_name) != 0)
-		say_warn("failed to delete temporary dir %s", dir_name);
-	if (module->handle == NULL) {
-		diag_set(ClientError, ER_LOAD_MODULE, package_len,
-			  package, dlerror());
-		goto error;
-	}
-	struct errinj *e = errinj(ERRINJ_DYN_MODULE_COUNT, ERRINJ_INT);
-	if (e != NULL)
-		++e->iparam;
-	return module;
-error:
-	free(module);
-	return NULL;
-}
-
-static void
-module_delete(struct module *module)
-{
-	struct errinj *e = errinj(ERRINJ_DYN_MODULE_COUNT, ERRINJ_INT);
-	if (e != NULL)
-		--e->iparam;
-	dlclose(module->handle);
-	TRASH(module);
-	free(module);
-}
-
-/*
- * Check if a dso is unused and can be closed.
- */
-static void
-module_gc(struct module *module)
-{
-	if (rlist_empty(&module->funcs) && module->calls == 0)
-		module_delete(module);
-}
-
-/*
- * Import a function from the module.
- */
-static box_function_f
-module_sym(struct module *module, const char *name)
-{
-	box_function_f f = (box_function_f)dlsym(module->handle, name);
-	if (f == NULL) {
-		diag_set(ClientError, ER_LOAD_FUNCTION, name, dlerror());
-		return NULL;
-	}
-	return f;
-}
-
-int
-module_sym_load(struct module_sym *mod_sym)
-{
-	assert(mod_sym->addr == NULL);
-
-	struct func_name name;
-	func_split_name(mod_sym->name, &name);
-
-	struct module *module = module_cache_find(name.package,
-						  name.package_end);
-	if (module == NULL) {
-		/* Try to find loaded module in the cache */
-		module = module_load(name.package, name.package_end);
-		if (module == NULL)
-			return -1;
-		if (module_cache_put(module)) {
-			module_delete(module);
-			return -1;
-		}
-	}
-
-	mod_sym->addr = module_sym(module, name.sym);
-	if (mod_sym->addr == NULL)
-		return -1;
-	mod_sym->module = module;
-	rlist_add(&module->funcs, &mod_sym->item);
-	return 0;
-}
-
-void
-module_sym_unload(struct module_sym *mod_sym)
-{
-	if (mod_sym->module == NULL)
-		return;
-
-	rlist_del(&mod_sym->item);
-	if (rlist_empty(&mod_sym->module->funcs)) {
-		struct func_name name;
-		func_split_name(mod_sym->name, &name);
-		module_cache_del(name.package, name.package_end);
-	}
-	module_gc(mod_sym->module);
-
-	mod_sym->module = NULL;
-	mod_sym->addr = NULL;
-}
-
-int
-module_reload(const char *package, const char *package_end, struct module **module)
-{
-	struct module *old_module = module_cache_find(package, package_end);
-	if (old_module == NULL) {
-		/* Module wasn't loaded - do nothing. */
-		*module = NULL;
-		return 0;
-	}
-
-	struct module *new_module = module_load(package, package_end);
-	if (new_module == NULL)
-		return -1;
-
-	struct module_sym *mod_sym, *tmp;
-	rlist_foreach_entry_safe(mod_sym, &old_module->funcs, item, tmp) {
-		struct func_name name;
-		func_split_name(mod_sym->name, &name);
-		mod_sym->addr = module_sym(new_module, name.sym);
-		if (mod_sym->addr == NULL)
-			goto restore;
-		mod_sym->module = new_module;
-		rlist_move(&new_module->funcs, &mod_sym->item);
-	}
-	module_cache_del(package, package_end);
-	if (module_cache_put(new_module) != 0)
-		goto restore;
-	module_gc(old_module);
-	*module = new_module;
-	return 0;
-restore:
-	/*
-	 * Some old-dso func can't be load from new module, restore old
-	 * functions.
-	 */
-	do {
-		struct func_name name;
-		func_split_name(mod_sym->name, &name);
-		mod_sym->addr = module_sym(old_module, name.sym);
-		if (mod_sym->addr == NULL) {
-			/*
-			 * Something strange was happen, an early loaden
-			 * function was not found in an old dso.
-			 */
-			panic("Can't restore module function, "
-			      "server state is inconsistent");
-		}
-		mod_sym->module = old_module;
-		rlist_move(&old_module->funcs, &mod_sym->item);
-	} while (mod_sym != rlist_first_entry(&old_module->funcs,
-					      struct module_sym,
-					      item));
-	assert(rlist_empty(&new_module->funcs));
-	module_delete(new_module);
-	return -1;
-}
-
 static struct func *
 func_c_new(struct func_def *def);
 
@@ -544,39 +125,9 @@ func_c_call(struct func *base, struct port *args, struct port *ret)
 {
 	assert(base->vtab == &func_c_vtab);
 	assert(base != NULL && base->def->language == FUNC_LANGUAGE_C);
-	struct func_c *func = (struct func_c *) base;
-	if (func->mod_sym.addr == NULL) {
-		if (module_sym_load(&func->mod_sym) != 0)
-			return -1;
-	}
-
-	struct region *region = &fiber()->gc;
-	size_t region_svp = region_used(region);
-	uint32_t data_sz;
-	const char *data = port_get_msgpack(args, &data_sz);
-	if (data == NULL)
-		return -1;
 
-	port_c_create(ret);
-	box_function_ctx_t ctx = { ret };
-
-	/* Module can be changed after function reload. */
-	struct module *module = func->mod_sym.module;
-	assert(module != NULL);
-	++module->calls;
-	int rc = func->mod_sym.addr(&ctx, data, data + data_sz);
-	--module->calls;
-	module_gc(module);
-	region_truncate(region, region_svp);
-	if (rc != 0) {
-		if (diag_last_error(&fiber()->diag) == NULL) {
-			/* Stored procedure forget to set diag  */
-			diag_set(ClientError, ER_PROC_C, "unknown error");
-		}
-		port_destroy(ret);
-		return -1;
-	}
-	return rc;
+	struct func_c *func = (struct func_c *) base;
+	return module_sym_call(&func->mod_sym, args, ret);
 }
 
 static struct func_vtab func_c_vtab = {
diff --git a/src/box/func.h b/src/box/func.h
index acf2df9a9..11a466b28 100644
--- a/src/box/func.h
+++ b/src/box/func.h
@@ -34,7 +34,8 @@
 #include <stddef.h>
 #include <stdint.h>
 #include <stdbool.h>
-#include "small/rlist.h"
+
+#include "module_cache.h"
 #include "func_def.h"
 #include "user_def.h"
 
@@ -44,43 +45,6 @@ extern "C" {
 
 struct func;
 
-/**
- * Dynamic shared module.
- */
-struct module {
-	/** Module dlhandle. */
-	void *handle;
-	/** List of imported functions. */
-	struct rlist funcs;
-	/** Count of active calls. */
-	size_t calls;
-	/** Module's package name. */
-	char package[0];
-};
-
-/**
- * Callable symbol bound to a module.
- */
-struct module_sym {
-	/**
-	 * Anchor for module membership.
-	 */
-	struct rlist item;
-	/**
-	 * For C functions, address of the function.
-	 */
-	box_function_f addr;
-	/**
-	 * Each stored function keeps a handle to the
-	 * dynamic library for the C callback.
-	 */
-	struct module *module;
-	/**
-	 * Function name definition.
-	 */
-	char *name;
-};
-
 /** Virtual method table for func object. */
 struct func_vtab {
 	/** Call function with given arguments. */
@@ -107,18 +71,6 @@ struct func {
 	struct access access[BOX_USER_MAX];
 };
 
-/**
- * Initialize modules subsystem.
- */
-int
-module_init(void);
-
-/**
- * Cleanup modules subsystem.
- */
-void
-module_free(void);
-
 struct func *
 func_new(struct func_def *def);
 
@@ -131,38 +83,6 @@ func_delete(struct func *func);
 int
 func_call(struct func *func, struct port *args, struct port *ret);
 
-/**
- * Resolve C entry (find the respective DLL and fetch the
- * symbol from it).
- *
- * @param mod_sym module symbol pointer.
- * @retval -1 on error.
- * @retval 0 on success.
- */
-int
-module_sym_load(struct module_sym *mod_sym);
-
-/**
- * Unload module symbol and drop it from the package
- * cache if there is no users left.
- *
- * @param mod_sym module symbol pointer.
- */
-void
-module_sym_unload(struct module_sym *mod_sym);
-
-/**
- * Reload dynamically loadable module.
- *
- * @param package name begin pointer.
- * @param package_end package_end name end pointer.
- * @param[out] module a pointer to store module object on success.
- * @retval -1 on error.
- * @retval 0 on success.
- */
-int
-module_reload(const char *package, const char *package_end, struct module **module);
-
 #if defined(__cplusplus)
 } /* extern "C" */
 #endif /* defined(__cplusplus) */
diff --git a/src/box/module_cache.c b/src/box/module_cache.c
new file mode 100644
index 000000000..a44f3d110
--- /dev/null
+++ b/src/box/module_cache.c
@@ -0,0 +1,503 @@
+/*
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
+ */
+
+#include <dlfcn.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "assoc.h"
+#include "diag.h"
+#include "error.h"
+#include "errinj.h"
+#include "fiber.h"
+#include "port.h"
+
+#include "box/error.h"
+#include "lua/utils.h"
+#include "libeio/eio.h"
+
+#include "module_cache.h"
+
+/** Modules name to descriptor hash. */
+static struct mh_strnptr_t *mod_hash = NULL;
+
+/**
+ * Parse function descriptor from a string.
+ */
+void
+parse_func_name(const char *str, struct func_name_desc *d)
+{
+	d->package = str;
+	d->package_end = strrchr(str, '.');
+	if (d->package_end != NULL) {
+		/* module.submodule.function => module.submodule, function */
+		d->sym = d->package_end + 1; /* skip '.' */
+	} else {
+		/* package == function => function, function */
+		d->sym = d->package;
+		d->package_end = str + strlen(str);
+	}
+}
+
+/**
+ * Look up a module in the modules cache.
+ */
+static struct module *
+module_cache_find(const char *name, const char *name_end)
+{
+	mh_int_t e = mh_strnptr_find_inp(mod_hash, name, name_end - name);
+	if (e == mh_end(mod_hash))
+		return NULL;
+	return mh_strnptr_node(mod_hash, e)->val;
+}
+
+/**
+ * Save a module to the modules cache.
+ */
+static int
+module_cache_add(struct module *module)
+{
+	size_t package_len = strlen(module->package);
+	const struct mh_strnptr_node_t nd = {
+		.str	= module->package,
+		.len	= package_len,
+		.hash	= mh_strn_hash(module->package, package_len),
+		.val	= module,
+	};
+
+	if (mh_strnptr_put(mod_hash, &nd, NULL, NULL) == mh_end(mod_hash)) {
+		diag_set(OutOfMemory, sizeof(nd), "malloc",
+			 "module cache node");
+		return -1;
+	}
+	return 0;
+}
+
+/**
+ * Delete a module from the modules cache.
+ */
+static void
+module_cache_del(const char *name, const char *name_end)
+{
+	mh_int_t e = mh_strnptr_find_inp(mod_hash, name, name_end - name);
+	if (e != mh_end(mod_hash))
+		mh_strnptr_del(mod_hash, e, NULL);
+}
+
+/**
+ * Arguments for luaT_module_find used by lua_cpcall().
+ */
+struct module_find_ctx {
+	const char *package;
+	const char *package_end;
+	char *path;
+	size_t path_len;
+};
+
+/**
+ * A cpcall() helper for module_find().
+ */
+static int
+luaT_module_find(lua_State *L)
+{
+	struct module_find_ctx *ctx = (void *)lua_topointer(L, 1);
+
+	/*
+	 * Call package.searchpath(name, package.cpath) and use
+	 * the path to the function in dlopen().
+	 */
+	lua_getglobal(L, "package");
+	lua_getfield(L, -1, "search");
+
+	/* Argument of search: name */
+	lua_pushlstring(L, ctx->package, ctx->package_end - ctx->package);
+
+	lua_call(L, 1, 1);
+	if (lua_isnil(L, -1))
+		return luaL_error(L, "module not found");
+
+	/* Convert path to absolute */
+	char resolved[PATH_MAX];
+	if (realpath(lua_tostring(L, -1), resolved) == NULL) {
+		diag_set(SystemError, "realpath");
+		return luaT_error(L);
+	}
+
+	snprintf(ctx->path, ctx->path_len, "%s", resolved);
+	return 0;
+}
+
+/**
+ * Find a path to a module using Lua's package.cpath.
+ *
+ * @param ctx module find context
+ * @param[out] ctx->path path to shared library.
+ * @param[out] ctx->path_len size of @a ctx->path buffer.
+ *
+ * @return 0 on succes, -1 otherwise, diag is set.
+ */
+static int
+module_find(struct module_find_ctx *ctx)
+{
+	lua_State *L = tarantool_L;
+	int top = lua_gettop(L);
+	if (luaT_cpcall(L, luaT_module_find, ctx) != 0) {
+		diag_set(ClientError, ER_LOAD_MODULE,
+			 (int)(ctx->package_end - ctx->package),
+			 ctx->package, lua_tostring(L, -1));
+		lua_settop(L, top);
+		return -1;
+	}
+	assert(top == lua_gettop(L)); /* cpcall discard results */
+	return 0;
+}
+
+/**
+ * Load dynamic shared object, ie module library.
+ *
+ * Create a new symlink based on temporary directory
+ * and try to load via this symink to load a dso twice
+ * for cases of a function reload.
+ */
+static struct module *
+module_load(const char *package, const char *package_end)
+{
+	char path[PATH_MAX];
+	struct module_find_ctx ctx = {
+		.package	= package,
+		.package_end	= package_end,
+		.path		= path,
+		.path_len	= sizeof(path),
+	};
+	if (module_find(&ctx) != 0)
+		return NULL;
+
+	int package_len = package_end - package;
+	struct module *module = malloc(sizeof(*module) + package_len + 1);
+	if (module == NULL) {
+		diag_set(OutOfMemory, sizeof(*module) + package_len + 1,
+			 "malloc", "struct module");
+		return NULL;
+	}
+	memcpy(module->package, package, package_len);
+	module->package[package_len] = 0;
+	rlist_create(&module->mod_sym_head);
+	module->calls = 0;
+
+	const char *tmpdir = getenv("TMPDIR");
+	if (tmpdir == NULL)
+		tmpdir = "/tmp";
+
+	char dir_name[PATH_MAX];
+	int rc = snprintf(dir_name, sizeof(dir_name), "%s/tntXXXXXX", tmpdir);
+	if (rc < 0 || (size_t)rc >= sizeof(dir_name)) {
+		diag_set(SystemError, "failed to generate path to tmp dir");
+		goto error;
+	}
+
+	if (mkdtemp(dir_name) == NULL) {
+		diag_set(SystemError, "failed to create unique dir name: %s",
+			 dir_name);
+		goto error;
+	}
+
+	char load_name[PATH_MAX];
+	rc = snprintf(load_name, sizeof(load_name), "%s/%.*s." TARANTOOL_LIBEXT,
+		      dir_name, package_len, package);
+	if (rc < 0 || (size_t)rc >= sizeof(dir_name)) {
+		diag_set(SystemError, "failed to generate path to DSO");
+		goto error;
+	}
+
+	struct stat st;
+	if (stat(path, &st) < 0) {
+		diag_set(SystemError, "failed to stat() module %s", path);
+		goto error;
+	}
+
+	int source_fd = open(path, O_RDONLY);
+	if (source_fd < 0) {
+		diag_set(SystemError, "failed to open module %s", path);
+		goto error;
+	}
+
+	int dest_fd = open(load_name, O_WRONLY | O_CREAT | O_TRUNC,
+			   st.st_mode & (S_IRWXU | S_IRWXG | S_IRWXO));
+	if (dest_fd < 0) {
+		diag_set(SystemError, "failed to open file %s for writing ",
+			 load_name);
+		close(source_fd);
+		goto error;
+	}
+
+	off_t ret = eio_sendfile_sync(dest_fd, source_fd, 0, st.st_size);
+	close(source_fd);
+	close(dest_fd);
+	if (ret != st.st_size) {
+		diag_set(SystemError, "failed to copy DSO %s to %s",
+			 path, load_name);
+		goto error;
+	}
+
+	module->handle = dlopen(load_name, RTLD_NOW | RTLD_LOCAL);
+	if (unlink(load_name) != 0)
+		say_warn("failed to unlink dso link %s", load_name);
+	if (rmdir(dir_name) != 0)
+		say_warn("failed to delete temporary dir %s", dir_name);
+	if (module->handle == NULL) {
+		diag_set(ClientError, ER_LOAD_MODULE, package_len,
+			  package, dlerror());
+		goto error;
+	}
+
+	struct errinj *e = errinj(ERRINJ_DYN_MODULE_COUNT, ERRINJ_INT);
+	if (e != NULL)
+		++e->iparam;
+	return module;
+
+error:
+	free(module);
+	return NULL;
+}
+
+/**
+ * Delete a module.
+ */
+static void
+module_delete(struct module *module)
+{
+	struct errinj *e = errinj(ERRINJ_DYN_MODULE_COUNT, ERRINJ_INT);
+	if (e != NULL)
+		--e->iparam;
+	dlclose(module->handle);
+	TRASH(module);
+	free(module);
+}
+
+/**
+ * Check if a module is unused and delete it then.
+ */
+static void
+module_gc(struct module *module)
+{
+	if (rlist_empty(&module->mod_sym_head) && module->calls == 0)
+		module_delete(module);
+}
+
+/**
+ * Import a function from a module.
+ */
+static box_function_f
+module_sym(struct module *module, const char *name)
+{
+	box_function_f f = dlsym(module->handle, name);
+	if (f == NULL) {
+		diag_set(ClientError, ER_LOAD_FUNCTION, name, dlerror());
+		return NULL;
+	}
+	return f;
+}
+
+/**
+ * Load a new module symbol.
+ */
+int
+module_sym_load(struct module_sym *mod_sym)
+{
+	assert(mod_sym->addr == NULL);
+
+	struct func_name_desc d;
+	parse_func_name(mod_sym->name, &d);
+
+	struct module *module = module_cache_find(d.package, d.package_end);
+	if (module == NULL) {
+		module = module_load(d.package, d.package_end);
+		if (module == NULL)
+			return -1;
+		if (module_cache_add(module)) {
+			module_delete(module);
+			return -1;
+		}
+	}
+
+	mod_sym->addr = module_sym(module, d.sym);
+	if (mod_sym->addr == NULL)
+		return -1;
+
+	mod_sym->module = module;
+	rlist_add(&module->mod_sym_head, &mod_sym->list);
+	return 0;
+}
+
+/**
+ * Unload a module's symbol.
+ */
+void
+module_sym_unload(struct module_sym *mod_sym)
+{
+	if (mod_sym->module == NULL)
+		return;
+
+	rlist_del(&mod_sym->list);
+	if (rlist_empty(&mod_sym->module->mod_sym_head)) {
+		struct func_name_desc d;
+		parse_func_name(mod_sym->name, &d);
+		module_cache_del(d.package, d.package_end);
+	}
+	module_gc(mod_sym->module);
+
+	mod_sym->module = NULL;
+	mod_sym->addr = NULL;
+}
+
+/**
+ * Execute module symbol (run a function).
+ */
+int
+module_sym_call(struct module_sym *mod_sym, struct port *args,
+		struct port *ret)
+{
+	if (mod_sym->addr == NULL) {
+		if (module_sym_load(mod_sym) != 0)
+			return -1;
+	}
+
+	struct region *region = &fiber()->gc;
+	size_t region_svp = region_used(region);
+
+	uint32_t data_sz;
+	const char *data = port_get_msgpack(args, &data_sz);
+	if (data == NULL)
+		return -1;
+
+	port_c_create(ret);
+	box_function_ctx_t ctx = {
+		.port = ret,
+	};
+
+	/*
+	 * Module can be changed after function reload. Also
+	 * keep in mind that stored C procedure may yield inside.
+	 */
+	struct module *module = mod_sym->module;
+	assert(module != NULL);
+	++module->calls;
+	int rc = mod_sym->addr(&ctx, data, data + data_sz);
+	--module->calls;
+	module_gc(module);
+	region_truncate(region, region_svp);
+
+	if (rc != 0) {
+		if (diag_last_error(&fiber()->diag) == NULL) {
+			/* Stored procedure forget to set diag  */
+			diag_set(ClientError, ER_PROC_C, "unknown error");
+		}
+		port_destroy(ret);
+		return -1;
+	}
+
+	return rc;
+}
+
+/**
+ * Reload a module and all associated symbols.
+ */
+int
+module_reload(const char *package, const char *package_end,
+	      struct module **module)
+{
+	struct module *old = module_cache_find(package, package_end);
+	if (old == NULL) {
+		/* Module wasn't loaded - do nothing. */
+		*module = NULL;
+		return 0;
+	}
+
+	struct module *new = module_load(package, package_end);
+	if (new == NULL)
+		return -1;
+
+	struct module_sym *mod_sym, *tmp;
+	rlist_foreach_entry_safe(mod_sym, &old->mod_sym_head, list, tmp) {
+		struct func_name_desc d;
+		parse_func_name(mod_sym->name, &d);
+
+		mod_sym->addr = module_sym(new, d.sym);
+		if (mod_sym->addr == NULL) {
+			say_error("module: reload %s, symbol %s not found",
+				  package, d.sym);
+			goto restore;
+		}
+
+		mod_sym->module = new;
+		rlist_move(&new->mod_sym_head, &mod_sym->list);
+	}
+
+	module_cache_del(package, package_end);
+	if (module_cache_add(new) != 0)
+		goto restore;
+
+	module_gc(old);
+	*module = new;
+	return 0;
+
+restore:
+	/*
+	 * Some old-dso func can't be load from new module,
+	 * restore old functions.
+	 */
+	do {
+		struct func_name_desc d;
+		parse_func_name(mod_sym->name, &d);
+		mod_sym->addr = module_sym(old, d.sym);
+		if (mod_sym->addr == NULL) {
+			/*
+			 * Something strange was happen, an early loaden
+			 * function was not found in an old dso.
+			 */
+			panic("Can't restore module function, "
+			      "server state is inconsistent");
+		}
+		mod_sym->module = old;
+		rlist_move(&old->mod_sym_head, &mod_sym->list);
+	} while (mod_sym != rlist_first_entry(&old->mod_sym_head,
+					      struct module_sym,
+					      list));
+	assert(rlist_empty(&new->mod_sym_head));
+	module_delete(new);
+	return -1;
+}
+
+/**
+ * Initialize modules subsystem.
+ */
+int
+module_init(void)
+{
+	mod_hash = mh_strnptr_new();
+	if (mod_hash == NULL) {
+		diag_set(OutOfMemory, sizeof(*mod_hash),
+			 "malloc", "modules hash table");
+		return -1;
+	}
+	return 0;
+}
+
+/**
+ * Free modules subsystem.
+ */
+void
+module_free(void)
+{
+	while (mh_size(mod_hash) > 0) {
+		mh_int_t i = mh_first(mod_hash);
+		struct module *m = mh_strnptr_node(mod_hash, i)->val;
+		module_gc(m);
+	}
+	mh_strnptr_delete(mod_hash);
+	mod_hash = NULL;
+}
diff --git a/src/box/module_cache.h b/src/box/module_cache.h
new file mode 100644
index 000000000..3f275024a
--- /dev/null
+++ b/src/box/module_cache.h
@@ -0,0 +1,156 @@
+/*
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
+ */
+
+#pragma once
+
+#include <small/rlist.h>
+
+#include "func_def.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif /* defined(__cplusplus) */
+
+/**
+ * Function name descriptor: a symbol and a package.
+ */
+struct func_name_desc {
+	/**
+	 * Null-terminated symbol name, e.g.
+	 * "func" for "mod.submod.func".
+	 */
+	const char *sym;
+	/**
+	 * Package name, e.g. "mod.submod" for
+	 * "mod.submod.func".
+	 */
+	const char *package;
+	/**
+	 * A pointer to the last character in ->package + 1.
+	 */
+	const char *package_end;
+};
+
+/***
+ * Parse function name into a name descriptor.
+ * 
+ * For example, str = foo.bar.baz => sym = baz, package = foo.bar
+ *
+ * @param str function name, e.g. "module.submodule.function".
+ * @param[out] d parsed symbol and a package name.
+ */
+void
+parse_func_name(const char *str, struct func_name_desc *d);
+
+/**
+ * Dynamic shared module.
+ */
+struct module {
+	/**
+	 * Module dlhandle.
+	 */
+	void *handle;
+	/**
+	 * List of associated symbols (functions).
+	 */
+	struct rlist mod_sym_head;
+	/**
+	 * Count of active calls.
+	 */
+	size_t calls;
+	/**
+	 * Module's package name.
+	 */
+	char package[0];
+};
+
+/**
+ * Callable symbol bound to a module.
+ */
+struct module_sym {
+	/**
+	 * Anchor for module membership.
+	 */
+	struct rlist list;
+	/**
+	 * For C functions, address of the function.
+	 */
+	box_function_f addr;
+	/**
+	 * Each stored function keeps a handle to the
+	 * dynamic library for the C callback.
+	 */
+	struct module *module;
+	/**
+	 * Symbol (function) name definition.
+	 */
+	char *name;
+};
+
+/**
+ * Load a new module symbol.
+ *
+ * @param mod_sym symbol to load.
+ *
+ * @returns 0 on succse, -1 otherwise, diag is set.
+ */
+int
+module_sym_load(struct module_sym *mod_sym);
+
+/**
+ * Unload a module's symbol.
+ *
+ * @param mod_sym symbol to unload.
+ */
+void
+module_sym_unload(struct module_sym *mod_sym);
+
+/**
+ * Execute a module symbol (run a function).
+ *
+ * The function packs function arguments into a message pack
+ * and send it as a function argument. Function may return
+ * results via @a ret stream.
+ *
+ * @param mod_sym module symbol to run.
+ * @param args function arguments
+ * @param[out] execution results
+ *
+ * @returns 0 on success, -1 otherwise, diag is set.
+ */
+int
+module_sym_call(struct module_sym *mod_sym, struct port *args,
+		struct port *ret);
+
+/**
+ * Reload a module and all associated symbols.
+ *
+ * @param str function name, e.g. "module.submodule.function".
+ * @param[out] d parsed symbol and a package name.
+ *
+ * @return 0 on succes, -1 otherwise, diag is set.
+ */
+int
+module_reload(const char *package, const char *package_end,
+	      struct module **module);
+
+/**
+ * Initialize modules subsystem.
+ *
+ * @return 0 on succes, -1 otherwise, diag is set.
+ */
+int
+module_init(void);
+
+/**
+ * Free modules subsystem.
+ */
+void
+module_free(void);
+
+#if defined(__cplusplus)
+}
+#endif /* defined(__plusplus) */
-- 
2.26.2

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Tarantool-patches] [PATCH v8 3/4] box/cbox: implement cbox Lua module
  2020-10-14 13:35 [Tarantool-patches] [PATCH v8 0/4] box/cbox: implement cfunc Lua module Cyrill Gorcunov
  2020-10-14 13:35 ` [Tarantool-patches] [PATCH v8 1/4] box/func: factor out c function entry structure Cyrill Gorcunov
  2020-10-14 13:35 ` [Tarantool-patches] [PATCH v8 2/4] module_cache: move module handling into own subsystem Cyrill Gorcunov
@ 2020-10-14 13:35 ` Cyrill Gorcunov
  2020-10-29 22:15   ` Vladislav Shpilevoy
  2020-10-14 13:35 ` [Tarantool-patches] [PATCH v8 4/4] test: box/cfunc -- add simple module test Cyrill Gorcunov
  3 siblings, 1 reply; 22+ messages in thread
From: Cyrill Gorcunov @ 2020-10-14 13:35 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Currently to run "C" function from some external module
one have to register it first in "_func" system space. This
is a problem if node is in read-only mode (replica).

Still people would like to have a way to run such functions
even in ro mode. For this sake we implement "cbox" lua module.

Fixes #4692

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

@TarantoolBot document
Title: cbox module

Overview
========

`cbox` module provides a way to create, delete and execute
`C` procedures. Unlinke `box.schema.func` functionality this
the functions created with `cbox` help are not persistent and
live purely in memory. Once a node get turned off they are
vanished. An initial purpose for them is to execute them on
nodes which are running in read-only mode.

Module functions
================

`cbox.func.load([dso.]name) -> obj | nil, err`
----------------------------------------------

Loads a new function with name `name` from module `dso.`.
The module name is optional and if not provided implies
to be the same as `name`.

The `load` call must be paired with `unload` and these
calls are accounted. Until coupled `unload` is called
the instance is present in memory. Any `load` calls
followed by another `load` with same name simply
increase a reference to the existing function.

Possible errors:
 - IllegalParams: function name is either not supplied
   or not a string.
 - IllegalParams: function name is too long.
 - IllegalParams: function references limit exceeded.
 - OutOfMemory: unable to allocate a function.

On success a new callable object is returned,
otherwise `nil, error` pair.

Example:

``` Lua
f, err = require('cbox').func.load('func')
if not f then
    print(err)
end
```

Once function is loaded it is possible to execute it
in a traditional Lua way, ie to call it as a function.

``` Lua
-- call with agruments arg1 and arg2
f(arg1, arg2)
```

`cbox.func.unload([dso.]name) -> true | nil, err`
-------------------------------------------------

Unload a function with name `[dso.]name`. Since function
instances are accounted the function is not unloaded until
number of `unload` calls matched to the number of `load`
calls.

Possible errors:
 - IllegalParams: function name is either not supplied
   or not a string.
 - IllegalParams: the function does not exist.

On success `true` is returned, otherwise `nil, error` pair.

Example:

``` Lua
ok, err = require('cbox').func.unload('func')
if not ok then
    print(err)
end
```

`cbox.module.reload(name) -> true | nil, err`
---------------------------------------------

Reloads a module with name `name` and all functions which
were associated for the module. Each module keeps a list of
functions belonging to the module and reload procedure cause
the bound function to update their addresses such that
function execution will be routed via a new library.

Modules are loaded with that named local binding which means
that reload of module symbols won't affect the functions which
are started execution already, only new calls will be rerouted.

Possible errors:
 - IllegalParams: module name is either not supplied
   or not a string.
 - ClientError: a module with the name provided does
   not exist.

On success `true` is returned, otherwise `nil,error` pair.

Example:

``` Lua
ok, err = require('cbox').module.reload('func')
if not ok then
    print(err)
end
```
---
 src/box/CMakeLists.txt |   1 +
 src/box/box.cc         |   5 +
 src/box/lua/cbox.c     | 486 +++++++++++++++++++++++++++++++++++++++++
 src/box/lua/cbox.h     |  39 ++++
 src/box/lua/init.c     |   2 +
 5 files changed, 533 insertions(+)
 create mode 100644 src/box/lua/cbox.c
 create mode 100644 src/box/lua/cbox.h

diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt
index 91d85c6db..1202697c5 100644
--- a/src/box/CMakeLists.txt
+++ b/src/box/CMakeLists.txt
@@ -197,6 +197,7 @@ add_library(box STATIC
     lua/init.c
     lua/call.c
     lua/cfg.cc
+    lua/cbox.c
     lua/console.c
     lua/serialize_lua.c
     lua/tuple.c
diff --git a/src/box/box.cc b/src/box/box.cc
index 2485b79f3..f20761e8f 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -75,6 +75,7 @@
 #include "systemd.h"
 #include "call.h"
 #include "module_cache.h"
+#include "lua/cbox.h"
 #include "sequence.h"
 #include "sql_stmt_cache.h"
 #include "msgpack.h"
@@ -2246,6 +2247,7 @@ box_free(void)
 		tuple_free();
 		port_free();
 #endif
+		cbox_free();
 		iproto_free();
 		replication_free();
 		sequence_free();
@@ -2647,6 +2649,9 @@ box_init(void)
 	if (module_init() != 0)
 		diag_raise();
 
+	if (cbox_init() != 0)
+		diag_raise();
+
 	if (tuple_init(lua_hash) != 0)
 		diag_raise();
 
diff --git a/src/box/lua/cbox.c b/src/box/lua/cbox.c
new file mode 100644
index 000000000..0d8208024
--- /dev/null
+++ b/src/box/lua/cbox.c
@@ -0,0 +1,486 @@
+/*
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
+ */
+
+#include <string.h>
+#include <lua.h>
+
+#define RB_COMPACT 1
+#include <small/rb.h>
+
+#include "diag.h"
+
+#include "box/module_cache.h"
+#include "box/error.h"
+#include "box/port.h"
+
+#include "trivia/util.h"
+#include "lua/utils.h"
+
+/**
+ * Function descriptor.
+ */
+struct cbox_func {
+	/**
+	 * Gather functions into rbtree.
+	 */
+	rb_node(struct cbox_func) nd;
+
+	/**
+	 * Symbol descriptor for the function in
+	 * an associated module.
+	 */
+	struct module_sym mod_sym;
+
+	/**
+	 * Number of references to the function
+	 * instance.
+	 */
+	ssize_t ref;
+
+	/** Function name. */
+	const char *name;
+
+	/** Function name length. */
+	size_t name_len;
+
+	/** Function name keeper. */
+	char inplace[0];
+};
+
+/**
+ * A tree to lookup functions by name.
+ */
+typedef rb_tree(struct cbox_func) func_rb_t;
+static func_rb_t func_rb_root;
+
+static int
+cbox_func_cmp(const struct cbox_func *a, const struct cbox_func *b)
+{
+	ssize_t len = (ssize_t)a->name_len - (ssize_t)b->name_len;
+	if (len == 0)
+		return strcmp(a->name, b->name);
+	return len < 0 ? -1 : 1;
+}
+
+rb_gen(MAYBE_UNUSED static, func_rb_, func_rb_t,
+       struct cbox_func, nd, cbox_func_cmp);
+
+/**
+ * Find function in a tree.
+ */
+struct cbox_func *
+cbox_func_find(const char *name, size_t name_len)
+{
+	struct cbox_func v = {
+		.name		= name,
+		.name_len	= name_len,
+	};
+	return func_rb_search(&func_rb_root, &v);
+}
+
+/**
+ * Unreference a function instance.
+ */
+static void
+cbox_func_unref(struct cbox_func *cf)
+{
+	assert(cf->ref > 0);
+	if (cf->ref-- == 1)
+		func_rb_remove(&func_rb_root, cf);
+}
+
+/**
+ * Reference a function instance.
+ */
+static bool
+cbox_func_ref(struct cbox_func *cf)
+{
+	assert(cf->ref >= 0);
+
+	/*
+	 * Hardly to ever happen but just
+	 * to make sure.
+	 */
+	if (cf->ref == SSIZE_MAX) {
+		const char *fmt =
+			"Too many function references (max %zd)";
+		diag_set(IllegalParams, fmt, SSIZE_MAX);
+		return false;
+	}
+
+	if (cf->ref++ == 0)
+		func_rb_insert(&func_rb_root, cf);
+
+	return true;
+}
+
+/**
+ * Allocate a new function instance.
+ */
+static struct cbox_func *
+cbox_func_new(const char *name, size_t name_len)
+{
+	const ssize_t cf_size = sizeof(struct cbox_func);
+	ssize_t size = cf_size + name_len + 1;
+	if (size < 0) {
+		const size_t max_len = SSIZE_MAX - cf_size - 1;
+		const char *fmt = "Function name is too long (max %zd)";
+		diag_set(IllegalParams, fmt, max_len);
+		return NULL;
+	}
+
+	struct cbox_func *cf = malloc(size);
+	if (cf == NULL) {
+		diag_set(OutOfMemory, size, "malloc", "cf");
+		return NULL;
+	}
+
+	cf->mod_sym.addr	= NULL;
+	cf->mod_sym.module	= NULL;
+	cf->ref			= 0;
+	cf->mod_sym.name	= cf->inplace;
+	cf->name		= cf->inplace;
+	cf->name_len		= name_len;
+
+	memcpy(cf->inplace, name, name_len);
+	cf->inplace[name_len] = '\0';
+
+	memset(&cf->nd, 0, sizeof(cf->nd));
+	return cf;
+}
+
+/**
+ * Fetch a function instance from the Lua object.
+ */
+static struct cbox_func *
+lcbox_func_get_handler(struct lua_State *L)
+{
+	struct cbox_func *cf = NULL;
+	int top = lua_gettop(L);
+
+	if (lua_getmetatable(L, -top) != 0) {
+		lua_getfield(L, -1, "__cbox_func");
+		if (lua_isuserdata(L, -1)) {
+			struct cbox_func **pcf = lua_touserdata(L, -1);
+			cf = pcf[0];
+		}
+		lua_pop(L, 2);
+	}
+	return cf;
+}
+
+/**
+ * Free a function instance.
+ *
+ * It is called by Lua itself when a variable has no more reference.
+ * Since we associate a function instance for each variable we
+ * can't just free the memory immediately, instead we must be sure
+ * the final unload() is called and there are no more instances left
+ * in the tree thus next load() will have to allocate a new instance.
+ */
+static int
+lcbox_func_gc(struct lua_State *L)
+{
+	struct cbox_func *cf = lcbox_func_get_handler(L);
+	if (cf->ref == 0) {
+		TRASH(cf);
+		free(cf);
+	}
+	return 0;
+}
+
+/**
+ * Call a function by its name from the Lua code.
+ */
+static int
+lcbox_func_call(struct lua_State *L)
+{
+	struct cbox_func *cf = lcbox_func_get_handler(L);
+	if (cf == NULL) {
+		/*
+		 * How can this happen? Someone screwed
+		 * internal object data intentionally?
+		 * Whatever, the pointer is ruined we
+		 * can't do anything.
+		 */
+		const char *fmt = "Function is corrupted";
+		diag_set(IllegalParams, fmt);
+		return luaT_push_nil_and_error(L);
+	}
+
+	/*
+	 * FIXME: We should get rid of luaT_newthread but this
+	 * requires serious modifications. In particular
+	 * port_lua_do_dump uses tarantool_L reference and
+	 * coro_ref must be valid as well.
+	 */
+	lua_State *args_L = luaT_newthread(tarantool_L);
+	if (args_L == NULL)
+		return luaT_push_nil_and_error(L);
+
+	int coro_ref = luaL_ref(tarantool_L, LUA_REGISTRYINDEX);
+	lua_xmove(L, args_L, lua_gettop(L) - 1);
+
+	struct port args;
+	port_lua_create(&args, args_L);
+	((struct port_lua *)&args)->ref = coro_ref;
+
+	struct port ret;
+	if (module_sym_call(&cf->mod_sym, &args, &ret) != 0) {
+		port_destroy(&args);
+		return luaT_push_nil_and_error(L);
+	}
+
+	int top = lua_gettop(L);
+	lua_pushboolean(L, true);
+	port_dump_lua(&ret, L, true);
+	int cnt = lua_gettop(L) - top;
+
+	port_destroy(&ret);
+	port_destroy(&args);
+
+	return cnt;
+}
+
+/**
+ * Load a new function.
+ *
+ * This function takes a function name from the caller
+ * stack @a L and creates a new function object. If
+ * the function is already loaded we simply return
+ * a reference to existing one.
+ *
+ * Possible errors:
+ *
+ * - IllegalParams: function name is either not supplied
+ *   or not a string.
+ * - IllegalParams: function references limit exceeded.
+ * - OutOfMemory: unable to allocate a function.
+ *
+ * @returns function object on success or {nil,error} on error,
+ * the error is set to the diagnostics area.
+ */
+static int
+lcbox_func_load(struct lua_State *L)
+{
+	const char *method = "cbox.func.load";
+	struct cbox_func *cf = NULL;
+
+	if (lua_gettop(L) != 1 || !lua_isstring(L, 1)) {
+		const char *fmt =
+			"Expects %s(\'name\') but no name passed";
+		diag_set(IllegalParams, fmt, method);
+		return luaT_push_nil_and_error(L);
+	}
+
+	size_t name_len;
+	const char *name = lua_tolstring(L, 1, &name_len);
+
+	cf = cbox_func_find(name, name_len);
+	if (cf == NULL) {
+		cf = cbox_func_new(name, name_len);
+		if (cf == NULL)
+			return luaT_push_nil_and_error(L);
+	}
+	if (!cbox_func_ref(cf))
+		return luaT_push_nil_and_error(L);
+
+	lua_newtable(L);
+
+	lua_pushstring(L, "name");
+	lua_pushstring(L, cf->name);
+	lua_settable(L, -3);
+
+	lua_newtable(L);
+
+	/*
+	 * A new variable should be callable for
+	 * convenient use in Lua.
+	 */
+	lua_pushstring(L, "__call");
+	lua_pushcfunction(L, lcbox_func_call);
+	lua_settable(L, -3);
+
+	/*
+	 * We will release the memory associated
+	 * with the objet if only no active refs
+	 * are left.
+	 */
+	lua_pushstring(L, "__gc");
+	lua_pushcfunction(L, lcbox_func_gc);
+	lua_settable(L, -3);
+
+	/*
+	 * Carry the pointer to the function so
+	 * we won't need to run a lookup when
+	 * calling.
+	 */
+	lua_pushstring(L, "__cbox_func");
+	*(struct cbox_func **)lua_newuserdata(L, sizeof(cf)) = cf;
+	lua_settable(L, -3);
+
+	lua_setmetatable(L, -2);
+	return 1;
+}
+
+/**
+ * Unload a function.
+ *
+ * This function takes a function name from the caller
+ * stack @a L and unloads a function object.
+ *
+ * Possible errors:
+ *
+ * - IllegalParams: function name is either not supplied
+ *   or not a string.
+ * - IllegalParams: the function does not exist.
+ *
+ * @returns true on success or {nil,error} on error,
+ * the error is set to the diagnostics area.
+ */
+static int
+lcbox_func_unload(struct lua_State *L)
+{
+	const char *method = "cbox.func.unload";
+	const char *name = NULL;
+
+	if (lua_gettop(L) != 1 || !lua_isstring(L, 1)) {
+		const char *fmt =
+			"Expects %s(\'name\') but no name passed";
+		diag_set(IllegalParams, fmt, method);
+		return luaT_push_nil_and_error(L);
+	}
+
+	size_t name_len;
+	name = lua_tolstring(L, 1, &name_len);
+
+	struct cbox_func *cf = cbox_func_find(name, name_len);
+	if (cf == NULL) {
+		const char *fmt = tnt_errcode_desc(ER_NO_SUCH_FUNCTION);
+		diag_set(IllegalParams, fmt, name);
+		return luaT_push_nil_and_error(L);
+	}
+
+	cbox_func_unref(cf);
+	lua_pushboolean(L, true);
+	return 1;
+}
+
+/**
+ * Reload a module.
+ *
+ * This function takes a module name from the caller
+ * stack @a L and reloads all functions associated with
+ * the module.
+ *
+ * Possible errors:
+ *
+ * - IllegalParams: module name is either not supplied
+ *   or not a string.
+ * - IllegalParams: the function does not exist.
+ * - ClientError: a module with the name provided does
+ *   not exist.
+ *
+ * @returns true on success or {nil,error} on error,
+ * the error is set to the diagnostics area.
+ */
+static int
+lcbox_module_reload(struct lua_State *L)
+{
+	const char *method = "cbox.module.reload";
+	const char *fmt = "Expects %s(\'name\') but no name passed";
+
+	if (lua_gettop(L) != 1 || !lua_isstring(L, 1)) {
+		diag_set(IllegalParams, fmt, method);
+		return luaT_push_nil_and_error(L);
+	}
+
+	size_t name_len;
+	const char *name = lua_tolstring(L, 1, &name_len);
+	if (name == NULL || name_len < 1) {
+		diag_set(IllegalParams, fmt, method);
+		return luaT_push_nil_and_error(L);
+	}
+
+	struct module *module = NULL;
+	if (module_reload(name, &name[name_len], &module) == 0) {
+		if (module != NULL) {
+			lua_pushboolean(L, true);
+			return 1;
+		}
+		diag_set(ClientError, ER_NO_SUCH_MODULE, name);
+	}
+	return luaT_push_nil_and_error(L);
+}
+
+/**
+ * Initialize cbox Lua module.
+ *
+ * @param L Lua state where to register the cbox module.
+ */
+void
+box_lua_cbox_init(struct lua_State *L)
+{
+	static const struct luaL_Reg cbox_methods[] = {
+		{ NULL, NULL }
+	};
+	luaL_register_module(L, "cbox", cbox_methods);
+
+	/* func table */
+	static const struct luaL_Reg func_table[] = {
+		{ "load",	lcbox_func_load },
+		{ "unload",	lcbox_func_unload },
+	};
+
+	lua_newtable(L);
+	for (size_t i = 0; i < lengthof(func_table); i++) {
+		lua_pushstring(L, func_table[i].name);
+		lua_pushcfunction(L, func_table[i].func);
+		lua_settable(L, -3);
+	}
+	lua_setfield(L, -2, "func");
+
+	/* module table */
+	static const struct luaL_Reg module_table[] = {
+		{ "reload",	lcbox_module_reload },
+	};
+
+	lua_newtable(L);
+	for (size_t i = 0; i < lengthof(module_table); i++) {
+		lua_pushstring(L, module_table[i].name);
+		lua_pushcfunction(L, module_table[i].func);
+		lua_settable(L, -3);
+	}
+	lua_setfield(L, -2, "module");
+
+	lua_pop(L, 1);
+}
+
+/**
+ * Initialize cbox module.
+ *
+ * @return 0 on success, -1 on error (diag is set).	
+ */
+int
+cbox_init(void)
+{
+	func_rb_new(&func_rb_root);
+	return 0;
+}
+
+/**
+ * Free cbox module.
+ */
+void
+cbox_free(void)
+{
+	struct cbox_func *cf = func_rb_first(&func_rb_root);
+	while (cf != NULL) {
+		func_rb_remove(&func_rb_root, cf);
+		cf = func_rb_first(&func_rb_root);
+	}
+	func_rb_new(&func_rb_root);
+}
diff --git a/src/box/lua/cbox.h b/src/box/lua/cbox.h
new file mode 100644
index 000000000..1e6cd9a9b
--- /dev/null
+++ b/src/box/lua/cbox.h
@@ -0,0 +1,39 @@
+/*
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
+ */
+
+#pragma once
+
+#if defined(__cplusplus)
+extern "C" {
+#endif /* defined(__cplusplus) */
+
+struct lua_State;
+
+/**
+ * Initialize cbox Lua module.
+ *
+ * @param L Lua state where to register the cbox module.
+ */
+void
+box_lua_cbox_init(struct lua_State *L);
+
+/**
+ * Initialize cbox module.
+ *
+ * @return 0 on success, -1 on error (diag is set).	
+ */
+int
+cbox_init(void);
+
+/**
+ * Free cbox module.
+ */
+void
+cbox_free(void);
+
+#if defined(__cplusplus)
+}
+#endif /* defined(__plusplus) */
diff --git a/src/box/lua/init.c b/src/box/lua/init.c
index d0316ef86..b37aa284a 100644
--- a/src/box/lua/init.c
+++ b/src/box/lua/init.c
@@ -61,6 +61,7 @@
 #include "box/lua/cfg.h"
 #include "box/lua/xlog.h"
 #include "box/lua/console.h"
+#include "box/lua/cbox.h"
 #include "box/lua/tuple.h"
 #include "box/lua/execute.h"
 #include "box/lua/key_def.h"
@@ -466,6 +467,7 @@ box_lua_init(struct lua_State *L)
 	box_lua_tuple_init(L);
 	box_lua_call_init(L);
 	box_lua_cfg_init(L);
+	box_lua_cbox_init(L);
 	box_lua_slab_init(L);
 	box_lua_index_init(L);
 	box_lua_space_init(L);
-- 
2.26.2

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Tarantool-patches] [PATCH v8 4/4] test: box/cfunc -- add simple module test
  2020-10-14 13:35 [Tarantool-patches] [PATCH v8 0/4] box/cbox: implement cfunc Lua module Cyrill Gorcunov
                   ` (2 preceding siblings ...)
  2020-10-14 13:35 ` [Tarantool-patches] [PATCH v8 3/4] box/cbox: implement cbox Lua module Cyrill Gorcunov
@ 2020-10-14 13:35 ` Cyrill Gorcunov
  3 siblings, 0 replies; 22+ messages in thread
From: Cyrill Gorcunov @ 2020-10-14 13:35 UTC (permalink / raw)
  To: tml; +Cc: Vladislav Shpilevoy

Note that the test is disabled for a while since we
need to update test-run first like

 | diff --git a/pretest_clean.lua b/pretest_clean.lua
 | index 9b5ac9d..b0280c4 100644
 | --- a/pretest_clean.lua
 | +++ b/pretest_clean.lua
 | @@ -272,6 +272,7 @@ local function clean()
 |          package = true,
 |          pickle = true,
 |          popen = true,
 | +        cbox = true,
 |          pwd = true,
 |          socket = true,
 |          strict = true,

ie need to enable cbox module.

Part-of #4642

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 test/box/CMakeLists.txt |   2 +
 test/box/cbox.result    | 134 ++++++++++++++++++++++++++++++++++++++++
 test/box/cbox.test.lua  |  51 +++++++++++++++
 test/box/cfunc1.c       |  49 +++++++++++++++
 test/box/cfunc2.c       | 112 +++++++++++++++++++++++++++++++++
 test/box/suite.ini      |   2 +-
 6 files changed, 349 insertions(+), 1 deletion(-)
 create mode 100644 test/box/cbox.result
 create mode 100644 test/box/cbox.test.lua
 create mode 100644 test/box/cfunc1.c
 create mode 100644 test/box/cfunc2.c

diff --git a/test/box/CMakeLists.txt b/test/box/CMakeLists.txt
index 06bfbbe9d..2afbeadc3 100644
--- a/test/box/CMakeLists.txt
+++ b/test/box/CMakeLists.txt
@@ -3,3 +3,5 @@ build_module(function1 function1.c)
 build_module(reload1 reload1.c)
 build_module(reload2 reload2.c)
 build_module(tuple_bench tuple_bench.c)
+build_module(cfunc1 cfunc1.c)
+build_module(cfunc2 cfunc2.c)
diff --git a/test/box/cbox.result b/test/box/cbox.result
new file mode 100644
index 000000000..234b57fff
--- /dev/null
+++ b/test/box/cbox.result
@@ -0,0 +1,134 @@
+-- test-run result file version 2
+build_path = os.getenv("BUILDDIR")
+ | ---
+ | ...
+package.cpath = build_path..'/test/box/?.so;'..build_path..'/test/box/?.dylib;'..package.cpath
+ | ---
+ | ...
+
+cbox = require('cbox')
+ | ---
+ | ...
+fio = require('fio')
+ | ---
+ | ...
+
+ext = (jit.os == "OSX" and "dylib" or "so")
+ | ---
+ | ...
+
+cfunc_path = fio.pathjoin(build_path, "test/box/cfunc.") .. ext
+ | ---
+ | ...
+cfunc1_path = fio.pathjoin(build_path, "test/box/cfunc1.") .. ext
+ | ---
+ | ...
+cfunc2_path = fio.pathjoin(build_path, "test/box/cfunc2.") .. ext
+ | ---
+ | ...
+
+_ = pcall(fio.unlink(cfunc_path))
+ | ---
+ | ...
+fio.symlink(cfunc1_path, cfunc_path)
+ | ---
+ | - true
+ | ...
+
+--
+-- They all are sitting in cfunc.so
+cfunc_nop = cbox.func.load('cfunc.cfunc_nop')
+ | ---
+ | ...
+cfunc_fetch_evens = cbox.func.load('cfunc.cfunc_fetch_evens')
+ | ---
+ | ...
+cfunc_multireturn = cbox.func.load('cfunc.cfunc_multireturn')
+ | ---
+ | ...
+cfunc_args = cbox.func.load('cfunc.cfunc_args')
+ | ---
+ | ...
+
+--
+-- Make sure they all are callable
+cfunc_nop()
+ | ---
+ | - true
+ | ...
+cfunc_fetch_evens()
+ | ---
+ | - true
+ | ...
+cfunc_multireturn()
+ | ---
+ | - true
+ | ...
+cfunc_args()
+ | ---
+ | - true
+ | ...
+
+--
+-- Clean old function references and reload a new one.
+_ = pcall(fio.unlink(cfunc_path))
+ | ---
+ | ...
+fio.symlink(cfunc2_path, cfunc_path)
+ | ---
+ | - true
+ | ...
+
+cbox.module.reload('cfunc')
+ | ---
+ | - true
+ | ...
+
+cfunc_nop()
+ | ---
+ | - true
+ | ...
+cfunc_multireturn()
+ | ---
+ | - true
+ | - [1]
+ | - [1]
+ | ...
+cfunc_fetch_evens({2,4,6})
+ | ---
+ | - true
+ | ...
+cfunc_fetch_evens({1,2,3})  -- error
+ | ---
+ | - null
+ | - invalid argument 1 != 2
+ | ...
+cfunc_args(1, "hello")
+ | ---
+ | - true
+ | - [1, 'hello']
+ | ...
+--
+-- Clean it up
+cbox.func.unload('cfunc.cfunc_nop')
+ | ---
+ | - true
+ | ...
+cbox.func.unload('cfunc.cfunc_fetch_evens')
+ | ---
+ | - true
+ | ...
+cbox.func.unload('cfunc.cfunc_multireturn')
+ | ---
+ | - true
+ | ...
+cbox.func.unload('cfunc.cfunc_args')
+ | ---
+ | - true
+ | ...
+
+--
+-- Cleanup the generated symlink
+_ = pcall(fio.unlink(cfunc_path))
+ | ---
+ | ...
diff --git a/test/box/cbox.test.lua b/test/box/cbox.test.lua
new file mode 100644
index 000000000..1228fac11
--- /dev/null
+++ b/test/box/cbox.test.lua
@@ -0,0 +1,51 @@
+build_path = os.getenv("BUILDDIR")
+package.cpath = build_path..'/test/box/?.so;'..build_path..'/test/box/?.dylib;'..package.cpath
+
+cbox = require('cbox')
+fio = require('fio')
+
+ext = (jit.os == "OSX" and "dylib" or "so")
+
+cfunc_path = fio.pathjoin(build_path, "test/box/cfunc.") .. ext
+cfunc1_path = fio.pathjoin(build_path, "test/box/cfunc1.") .. ext
+cfunc2_path = fio.pathjoin(build_path, "test/box/cfunc2.") .. ext
+
+_ = pcall(fio.unlink(cfunc_path))
+fio.symlink(cfunc1_path, cfunc_path)
+
+--
+-- They all are sitting in cfunc.so
+cfunc_nop = cbox.func.load('cfunc.cfunc_nop')
+cfunc_fetch_evens = cbox.func.load('cfunc.cfunc_fetch_evens')
+cfunc_multireturn = cbox.func.load('cfunc.cfunc_multireturn')
+cfunc_args = cbox.func.load('cfunc.cfunc_args')
+
+--
+-- Make sure they all are callable
+cfunc_nop()
+cfunc_fetch_evens()
+cfunc_multireturn()
+cfunc_args()
+
+--
+-- Clean old function references and reload a new one.
+_ = pcall(fio.unlink(cfunc_path))
+fio.symlink(cfunc2_path, cfunc_path)
+
+cbox.module.reload('cfunc')
+
+cfunc_nop()
+cfunc_multireturn()
+cfunc_fetch_evens({2,4,6})
+cfunc_fetch_evens({1,2,3})  -- error
+cfunc_args(1, "hello")
+--
+-- Clean it up
+cbox.func.unload('cfunc.cfunc_nop')
+cbox.func.unload('cfunc.cfunc_fetch_evens')
+cbox.func.unload('cfunc.cfunc_multireturn')
+cbox.func.unload('cfunc.cfunc_args')
+
+--
+-- Cleanup the generated symlink
+_ = pcall(fio.unlink(cfunc_path))
diff --git a/test/box/cfunc1.c b/test/box/cfunc1.c
new file mode 100644
index 000000000..50c979d63
--- /dev/null
+++ b/test/box/cfunc1.c
@@ -0,0 +1,49 @@
+#include <stdio.h>
+#include <stdbool.h>
+#include <msgpuck.h>
+
+#include "module.h"
+
+/*
+ * Before the reload functions are just declared
+ * and simply exit with zero.
+ *
+ * After the module reload we should provide real
+ * functionality.
+ */
+
+int
+cfunc_nop(box_function_ctx_t *ctx, const char *args, const char *args_end)
+{
+	(void)ctx;
+	(void)args;
+	(void)args_end;
+	return 0;
+}
+
+int
+cfunc_fetch_evens(box_function_ctx_t *ctx, const char *args, const char *args_end)
+{
+	(void)ctx;
+	(void)args;
+	(void)args_end;
+	return 0;
+}
+
+int
+cfunc_multireturn(box_function_ctx_t *ctx, const char *args, const char *args_end)
+{
+	(void)ctx;
+	(void)args;
+	(void)args_end;
+	return 0;
+}
+
+int
+cfunc_args(box_function_ctx_t *ctx, const char *args, const char *args_end)
+{
+	(void)ctx;
+	(void)args;
+	(void)args_end;
+	return 0;
+}
diff --git a/test/box/cfunc2.c b/test/box/cfunc2.c
new file mode 100644
index 000000000..7a4ece843
--- /dev/null
+++ b/test/box/cfunc2.c
@@ -0,0 +1,112 @@
+#include <stdio.h>
+#include <stdbool.h>
+#include <msgpuck.h>
+
+#include "module.h"
+
+/*
+ * Just make sure we've been called.
+ */
+int
+cfunc_nop(box_function_ctx_t *ctx, const char *args, const char *args_end)
+{
+	(void)ctx;
+	(void)args;
+	(void)args_end;
+	return 0;
+}
+
+/*
+ * Fetch first N even numbers (just to make sure the order of
+ * arguments is not screwed).
+ */
+int
+cfunc_fetch_evens(box_function_ctx_t *ctx, const char *args, const char *args_end)
+{
+	int arg_count = mp_decode_array(&args);
+	if (arg_count != 1) {
+		return box_error_set(__FILE__, __LINE__, ER_PROC_C, "%s",
+			"invalid argument count");
+	}
+	int field_count = mp_decode_array(&args);
+	if (field_count < 1) {
+		return box_error_set(__FILE__, __LINE__, ER_PROC_C, "%s",
+			"invalid array size");
+	}
+
+	for (int i = 0; i < field_count; i++) {
+		int val = mp_decode_uint(&args);
+		int needed = 2 * (i+1);
+		if (val != needed) {
+			char res[128];
+			snprintf(res, sizeof(res), "%s %d != %d",
+				 "invalid argument", val, needed);
+			return box_error_set(__FILE__, __LINE__,
+					     ER_PROC_C, "%s", res);
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * Return one element array twice.
+ */
+int
+cfunc_multireturn(box_function_ctx_t *ctx, const char *args, const char *args_end)
+{
+	char tuple_buf[512];
+	char *d = tuple_buf;
+	d = mp_encode_array(d, 1);
+	d = mp_encode_uint(d, 1);
+	assert(d <= tuple_buf + sizeof(tuple_buf));
+
+	box_tuple_format_t *fmt = box_tuple_format_default();
+	box_tuple_t *tuple_a = box_tuple_new(fmt, tuple_buf, d);
+	if (tuple_a == NULL)
+		return -1;
+	int rc = box_return_tuple(ctx, tuple_a);
+	if (rc != 0)
+		return rc;
+	return box_return_tuple(ctx, tuple_a);
+}
+
+/*
+ * Encode int + string pair back.
+ */
+int
+cfunc_args(box_function_ctx_t *ctx, const char *args, const char *args_end)
+{
+	uint32_t arg_count = mp_decode_array(&args);
+	if (arg_count != 2) {
+		return box_error_set(__FILE__, __LINE__, ER_PROC_C, "%s",
+			"invalid argument count");
+	}
+
+	if (mp_typeof(*args) != MP_UINT) {
+		return box_error_set(__FILE__, __LINE__, ER_PROC_C, "%s",
+			"tuple field must be uint");
+	}
+	uint32_t num = mp_decode_uint(&args);
+
+	if (mp_typeof(*args) != MP_STR) {
+		return box_error_set(__FILE__, __LINE__, ER_PROC_C, "%s",
+			"tuple field must be string");
+	}
+	const char *str = args;
+	uint32_t len = mp_decode_strl(&str);
+
+	char tuple_buf[512];
+	char *d = tuple_buf;
+	d = mp_encode_array(d, 2);
+	d = mp_encode_uint(d, num);
+	d = mp_encode_str(d, str, len);
+	assert(d <= tuple_buf + sizeof(tuple_buf));
+
+	box_tuple_format_t *fmt = box_tuple_format_default();
+	box_tuple_t *tuple = box_tuple_new(fmt, tuple_buf, d);
+	if (tuple == NULL)
+		return -1;
+
+	return box_return_tuple(ctx, tuple);
+}
diff --git a/test/box/suite.ini b/test/box/suite.ini
index 793b8bf76..95fa6e660 100644
--- a/test/box/suite.ini
+++ b/test/box/suite.ini
@@ -2,7 +2,7 @@
 core = tarantool
 description = Database tests
 script = box.lua
-disabled = rtree_errinj.test.lua tuple_bench.test.lua
+disabled = rtree_errinj.test.lua tuple_bench.test.lua cbox.test.lua
 long_run = huge_field_map_long.test.lua
 config = engine.cfg
 release_disabled = errinj.test.lua errinj_index.test.lua rtree_errinj.test.lua upsert_errinj.test.lua iproto_stress.test.lua gh-4648-func-load-unload.test.lua
-- 
2.26.2

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 1/4] box/func: factor out c function entry structure
  2020-10-14 13:35 ` [Tarantool-patches] [PATCH v8 1/4] box/func: factor out c function entry structure Cyrill Gorcunov
@ 2020-10-29 22:15   ` Vladislav Shpilevoy
  2020-10-30  9:51     ` Cyrill Gorcunov
  0 siblings, 1 reply; 22+ messages in thread
From: Vladislav Shpilevoy @ 2020-10-29 22:15 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Hi! Thanks for the patch!

See 3 comments below.

On 14.10.2020 15:35, Cyrill Gorcunov wrote:
> Currently func_c structure is a wrapper over struct func
> which in turn handles function definition, credentials
> and etc, mostly to handle internal "_func" space.
> 
> Such tight bound doesn't allow to reuse func_c in any
> other context. But we're going to support C function
> execution in read-only instances and for this we better
> reuse already existing structures as much as possible
> instead of breeding new ones.
> 
> Thus lets extract module symbols into module_sym structure,
> this allows us to reuse module path cache in other code.
> 
> While extracting the structure rename "func" member to
> "addr" since this exactly what the member represents:
> an address of symbol in memory.
> 
> Same time to be completely independent of "_func" specific lets
> carry symbolic name inplace (ie member "name" is kind of redundant
> when module_sym is used for "_func" handling where we can retrieve
> the name via function definition but such definition won't be
> available for non-persistent C functions which we will support
> in next patches).
> 
> The new structure allows us to load/unload general symbols via
> module_sym_load() and module_sym_unload() helpers.
> 
> Part-of #4642
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> ---
>  src/box/func.c | 156 +++++++++++++++++++++++--------------------------
>  src/box/func.h |  43 ++++++++++++++
>  2 files changed, 116 insertions(+), 83 deletions(-)
> 
> diff --git a/src/box/func.c b/src/box/func.c
> index 8087c953f..75d2abb5f 100644
> --- a/src/box/func.c
> +++ b/src/box/func.c
> @@ -371,6 +361,53 @@ module_sym(struct module *module, const char *name)
>  	return f;
>  }
>  
> +int
> +module_sym_load(struct module_sym *mod_sym)
> +{
> +	assert(mod_sym->addr == NULL);
> +
> +	struct func_name name;
> +	func_split_name(mod_sym->name, &name);
> +
> +	struct module *module = module_cache_find(name.package,
> +						  name.package_end);
> +	if (module == NULL) {
> +		/* Try to find loaded module in the cache */
> +		module = module_load(name.package, name.package_end);
> +		if (module == NULL)
> +			return -1;
> +		if (module_cache_put(module)) {

1. Please, use explicit != 0.
https://github.com/tarantool/tarantool/wiki/Code-review-procedure#code-style

> +			module_delete(module);
> +			return -1;
> +		}
> +	}
> +
> +	mod_sym->addr = module_sym(module, name.sym);
> +	if (mod_sym->addr == NULL)
> +		return -1;

2. If the module was loaded first time here, it is not unloaded in case of
an error in this place.

> +	mod_sym->module = module;
> +	rlist_add(&module->funcs, &mod_sym->item);
> +	return 0;
> +}
> diff --git a/src/box/func.h b/src/box/func.h
> index 581e468cb..acf2df9a9 100644
> --- a/src/box/func.h
> +++ b/src/box/func.h
> @@ -58,6 +58,29 @@ struct module {
>  	char package[0];
>  };
>  
> +/**
> + * Callable symbol bound to a module.
> + */
> +struct module_sym {
> +	/**
> +	 * Anchor for module membership.
> +	 */
> +	struct rlist item;
> +	/**
> +	 * For C functions, address of the function.
> +	 */
> +	box_function_f addr;
> +	/**
> +	 * Each stored function keeps a handle to the
> +	 * dynamic library for the C callback.
> +	 */

3. Can't parse the comment. What is the 'C callback'?
And why is this function stored? After you extracted it
from struct func_c, it is not related to _func space, and
is not stored.

> +	struct module *module;
> +	/**
> +	 * Function name definition.
> +	 */
> +	char *name;
> +};

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 2/4] module_cache: move module handling into own subsystem
  2020-10-14 13:35 ` [Tarantool-patches] [PATCH v8 2/4] module_cache: move module handling into own subsystem Cyrill Gorcunov
@ 2020-10-29 22:15   ` Vladislav Shpilevoy
  2020-10-30 10:15     ` Cyrill Gorcunov
  0 siblings, 1 reply; 22+ messages in thread
From: Vladislav Shpilevoy @ 2020-10-29 22:15 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Hi! Thanks for the patch!

See 5 comments below.

> diff --git a/src/box/module_cache.c b/src/box/module_cache.c
> new file mode 100644
> index 000000000..a44f3d110
> --- /dev/null
> +++ b/src/box/module_cache.c
> @@ -0,0 +1,503 @@
> +/*
> + * SPDX-License-Identifier: BSD-2-Clause
> + *
> + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
> + */
> +
> +#include <dlfcn.h>
> +#include <fcntl.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +#include "assoc.h"
> +#include "diag.h"
> +#include "error.h"
> +#include "errinj.h"
> +#include "fiber.h"
> +#include "port.h"
> +
> +#include "box/error.h"
> +#include "lua/utils.h"
> +#include "libeio/eio.h"
> +
> +#include "module_cache.h"
> +
> +/** Modules name to descriptor hash. */
> +static struct mh_strnptr_t *mod_hash = NULL;
> +
> +/**
> + * Parse function descriptor from a string.
> + */
> +void
> +parse_func_name(const char *str, struct func_name_desc *d)

1. The function is not used out of this file. Please, make it
static.

> +{
> +	d->package = str;
> +	d->package_end = strrchr(str, '.');
> +	if (d->package_end != NULL) {
> +		/* module.submodule.function => module.submodule, function */
> +		d->sym = d->package_end + 1; /* skip '.' */
> +	} else {
> +		/* package == function => function, function */
> +		d->sym = d->package;
> +		d->package_end = str + strlen(str);
> +	}
> +}
> +
> +/**
> + * A cpcall() helper for module_find().
> + */
> +static int
> +luaT_module_find(lua_State *L)
> +{
> +	struct module_find_ctx *ctx = (void *)lua_topointer(L, 1);
> +
> +	/*
> +	 * Call package.searchpath(name, package.cpath) and use
> +	 * the path to the function in dlopen().
> +	 */
> +	lua_getglobal(L, "package");
> +	lua_getfield(L, -1, "search");
> +
> +	/* Argument of search: name */
> +	lua_pushlstring(L, ctx->package, ctx->package_end - ctx->package);
> +
> +	lua_call(L, 1, 1);
> +	if (lua_isnil(L, -1))
> +		return luaL_error(L, "module not found");
> +
> +	/* Convert path to absolute */
> +	char resolved[PATH_MAX];
> +	if (realpath(lua_tostring(L, -1), resolved) == NULL) {
> +		diag_set(SystemError, "realpath");
> +		return luaT_error(L);
> +	}
> +
> +	snprintf(ctx->path, ctx->path_len, "%s", resolved);
> +	return 0;
> +}
> +
> +/**
> + * Find a path to a module using Lua's package.cpath.
> + *
> + * @param ctx module find context
> + * @param[out] ctx->path path to shared library.
> + * @param[out] ctx->path_len size of @a ctx->path buffer.

2. The last two lines are not params. The only parameter is ctx. If
you want to describe it, it should be done under the same @param.
Otherwise the comment looks mismatching the signature. I gonna say
that again. I don't force to use Doxygen, but if you decide to use it,
it should be done correctly.

> + *
> + * @return 0 on succes, -1 otherwise, diag is set.
> + */
> +static int
> +module_find(struct module_find_ctx *ctx)
> diff --git a/src/box/module_cache.h b/src/box/module_cache.h
> new file mode 100644
> index 000000000..3f275024a
> --- /dev/null
> +++ b/src/box/module_cache.h
> @@ -0,0 +1,156 @@
> +/*
> + * SPDX-License-Identifier: BSD-2-Clause
> + *
> + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
> + */
> +
> +#pragma once
> +
> +#include <small/rlist.h>
> +
> +#include "func_def.h"

3. Func_def heavily depends on schema. Also it is a part of
all the _func handling, meaning this is a cyclic dependency now -
module_cache depends on func and vice versa.

> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif /* defined(__cplusplus) */
> +
> +/**
> + * Function name descriptor: a symbol and a package.
> + */
> +struct func_name_desc {
> +	/**
> +	 * Null-terminated symbol name, e.g.
> +	 * "func" for "mod.submod.func".
> +	 */
> +	const char *sym;
> +	/**
> +	 * Package name, e.g. "mod.submod" for
> +	 * "mod.submod.func".
> +	 */
> +	const char *package;
> +	/**
> +	 * A pointer to the last character in ->package + 1.
> +	 */
> +	const char *package_end;
> +};
> +
> +/***
> + * Parse function name into a name descriptor.
> + * 

4. Trailing whitespace.

> + * For example, str = foo.bar.baz => sym = baz, package = foo.bar
> + *
> + * @param str function name, e.g. "module.submodule.function".
> + * @param[out] d parsed symbol and a package name.
> + */
> +void
> +parse_func_name(const char *str, struct func_name_desc *d);
> +
> +/**
> + * Load a new module symbol.
> + *
> + * @param mod_sym symbol to load.
> + *
> + * @returns 0 on succse, -1 otherwise, diag is set.
> + */

5. I see you added comments for each function twice - in its
declaration and implementation. Please, don't. Such double
comments usually quicky get desynchronized. Even now they are
not the same. We always comment functions on their declaration,
if it is separated from implementation.

The same for the other commits and other places in this commit.

> +int
> +module_sym_load(struct module_sym *mod_sym);

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 3/4] box/cbox: implement cbox Lua module
  2020-10-14 13:35 ` [Tarantool-patches] [PATCH v8 3/4] box/cbox: implement cbox Lua module Cyrill Gorcunov
@ 2020-10-29 22:15   ` Vladislav Shpilevoy
  2020-10-30 12:51     ` Cyrill Gorcunov
  2020-11-12 22:54     ` Vladislav Shpilevoy
  0 siblings, 2 replies; 22+ messages in thread
From: Vladislav Shpilevoy @ 2020-10-29 22:15 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

Thanks for the patch!

The module leaks somewhere. It never unloads the shared file. I
tried this:

	cbox = require('cbox')
	f = cbox.func.load('function1.test_push')
	f()
	f = nil
	cbox.func.unload('function1.test_push')
	collectgarbage('collect')

Then I use 'lsof -p' to see what files are loaded, and I see
function1.dylib here. When I do all the same using _func space
and box.func.call, the file is unloaded correctly.

(function1.dylib I took from test/box/.)

See 14 comments below. I didn't review the last commit yet due
to too many comments here. Will do after we fix everything here.

On 14.10.2020 15:35, Cyrill Gorcunov wrote:
> Currently to run "C" function from some external module
> one have to register it first in "_func" system space. This
> is a problem if node is in read-only mode (replica).
> 
> Still people would like to have a way to run such functions
> even in ro mode. For this sake we implement "cbox" lua module.
> 
> Fixes #4692
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> 
> @TarantoolBot document
> Title: cbox module
> 
> Overview
> ========
> 
> `cbox` module provides a way to create, delete and execute
> `C` procedures. Unlinke `box.schema.func` functionality this
> the functions created with `cbox` help are not persistent and
> live purely in memory. Once a node get turned off they are
> vanished. An initial purpose for them is to execute them on
> nodes which are running in read-only mode.
> 
> Module functions
> ================
> 
> `cbox.func.load([dso.]name) -> obj | nil, err`
> ----------------------------------------------
> 
> Loads a new function with name `name` from module `dso.`.
> The module name is optional and if not provided implies
> to be the same as `name`.
> 
> The `load` call must be paired with `unload` and these
> calls are accounted. Until coupled `unload` is called
> the instance is present in memory. Any `load` calls
> followed by another `load` with same name simply
> increase a reference to the existing function.
> 
> Possible errors:
>  - IllegalParams: function name is either not supplied
>    or not a string.
>  - IllegalParams: function name is too long.
>  - IllegalParams: function references limit exceeded.
>  - OutOfMemory: unable to allocate a function.
> 
> On success a new callable object is returned,
> otherwise `nil, error` pair.
> 
> Example:
> 
> ``` Lua
> f, err = require('cbox').func.load('func')
> if not f then
>     print(err)
> end
> ```
> 
> Once function is loaded it is possible to execute it
> in a traditional Lua way, ie to call it as a function.
> 
> ``` Lua
> -- call with agruments arg1 and arg2

1. agruments -> arguments.

> f(arg1, arg2)
> ```
> 
> `cbox.func.unload([dso.]name) -> true | nil, err`
> -------------------------------------------------
> 
> Unload a function with name `[dso.]name`. Since function
> instances are accounted the function is not unloaded until
> number of `unload` calls matched to the number of `load`
> calls.
> 
> Possible errors:
>  - IllegalParams: function name is either not supplied
>    or not a string.
>  - IllegalParams: the function does not exist.
> 
> On success `true` is returned, otherwise `nil, error` pair.
> 
> Example:
> 
> ``` Lua
> ok, err = require('cbox').func.unload('func')
> if not ok then
>     print(err)
> end
> ```
> 
> `cbox.module.reload(name) -> true | nil, err`
> ---------------------------------------------
> 
> Reloads a module with name `name` and all functions which
> were associated for the module. Each module keeps a list of
> functions belonging to the module and reload procedure cause
> the bound function to update their addresses such that
> function execution will be routed via a new library.
> 
> Modules are loaded with that named local binding which means
> that reload of module symbols won't affect the functions which
> are started execution already, only new calls will be rerouted.
> 
> Possible errors:
>  - IllegalParams: module name is either not supplied
>    or not a string.
>  - ClientError: a module with the name provided does
>    not exist.
> 
> On success `true` is returned, otherwise `nil,error` pair.
> 
> Example:
> 
> ``` Lua
> ok, err = require('cbox').module.reload('func')
> if not ok then
>     print(err)
> end
> ```

2. In the previous review I said:

	3. It seems the function invocation itself is not documented?

You answered:

	Good point, seems this would be useful. I though 'cause this
	is mostly an alias for "_func" space it would be obvious. But
	indeed without real example the doc looks incomplete. Thanks!

Tbh, I don't see any difference regarding this. It is
still not documented how to call a function after you
loaded it, and what does it return.

> ---
>  src/box/CMakeLists.txt |   1 +
>  src/box/box.cc         |   5 +
>  src/box/lua/cbox.c     | 486 +++++++++++++++++++++++++++++++++++++++++
>  src/box/lua/cbox.h     |  39 ++++
>  src/box/lua/init.c     |   2 +
>  5 files changed, 533 insertions(+)
>  create mode 100644 src/box/lua/cbox.c
>  create mode 100644 src/box/lua/cbox.h
> 
> diff --git a/src/box/box.cc b/src/box/box.cc
> index 2485b79f3..f20761e8f 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -75,6 +75,7 @@
>  #include "systemd.h"
>  #include "call.h"
>  #include "module_cache.h"
> +#include "lua/cbox.h"
>  #include "sequence.h"
>  #include "sql_stmt_cache.h"
>  #include "msgpack.h"
> @@ -2246,6 +2247,7 @@ box_free(void)
>  		tuple_free();
>  		port_free();
>  #endif
> +		cbox_free();

3. box_init and box_free are not for Lua modules. Please,
use box_lua_init(). Also I don't unserstand, why do you have two
cbox init functions. If it is a Lua module, it should have lua init
function and be in lua/ folder. If it is a C module, it should have
a usual init function and not be in a lua/ folder. Not both. See
examples such as tuple and error modules. They have separate Lua
and C part. In your case I don't understand why would you need a
separate C part. Cbox is entirely for Lua. We don't need cbox in C.
So for your case box_lua_init should be enough.

>  		iproto_free();
>  		replication_free();
>  		sequence_free();
> @@ -2647,6 +2649,9 @@ box_init(void)
>  	if (module_init() != 0)
>  		diag_raise();
>  
> +	if (cbox_init() != 0)
> +		diag_raise();
> +
>  	if (tuple_init(lua_hash) != 0)
>  		diag_raise();
>  
> diff --git a/src/box/lua/cbox.c b/src/box/lua/cbox.c
> new file mode 100644
> index 000000000..0d8208024
> --- /dev/null
> +++ b/src/box/lua/cbox.c
> @@ -0,0 +1,486 @@
> +/*
> + * SPDX-License-Identifier: BSD-2-Clause
> + *
> + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
> + */
> +
> +#include <string.h>
> +#include <lua.h>
> +
> +#define RB_COMPACT 1
> +#include <small/rb.h>
> +
> +#include "diag.h"
> +
> +#include "box/module_cache.h"
> +#include "box/error.h"
> +#include "box/port.h"
> +
> +#include "trivia/util.h"
> +#include "lua/utils.h"
> +
> +/**
> + * Function descriptor.
> + */
> +struct cbox_func {
> +	/**
> +	 * Gather functions into rbtree.
> +	 */
> +	rb_node(struct cbox_func) nd;

4. Why rbtree? Search in the tree is going to be slower than in
the hash. Hash would be perfect here, because all the searches
are always by ==, not < or >, no iteration. Only lookup.

In the cover letter you said

	"i don't like the idea of unexpected rehashing of
	values in case of massive number of functions
	allocated"

but AFAIK, according to our benches, when a hash is used as an index,
it is always faster than any tree on lookups. Although I don't
remember benches hash vs tree out of an index. Since your are doing
such dubious "optimizations", is there a bench you can show, that
a tree is faster than a hash?

> +
> +	/**
> +	 * Symbol descriptor for the function in
> +	 * an associated module.
> +	 */
> +	struct module_sym mod_sym;
> +
> +	/**
> +	 * Number of references to the function
> +	 * instance.
> +	 */
> +	ssize_t ref;

5. Why is it ssize_t? Is there any single place, where we use ssize_t
for reference counting? Why ssize_t, if it is has nothing to do with
size of anything?

Also it probably would be better to name it 'load_count'. Because
to the end of the patch I was thinking you decided to count Lua refs
too somewhy, and realized it is load count only afterwards, when
started writing the comments.

7. Why are there so many extra empty lines after each member? I am
going to ask again you not to invent a new code style on each next
patch you send.

> +	/** Function name. */
> +	const char *name;
> +
> +	/** Function name length. */
> +	size_t name_len;
> +
> +	/** Function name keeper. */
> +	char inplace[0];> +};
> +
> +/**
> + * A tree to lookup functions by name.
> + */
> +typedef rb_tree(struct cbox_func) func_rb_t;
> +static func_rb_t func_rb_root;
> +
> +static int
> +cbox_func_cmp(const struct cbox_func *a, const struct cbox_func *b)> +{
> +	ssize_t len = (ssize_t)a->name_len - (ssize_t)b->name_len;
> +	if (len == 0)
> +		return strcmp(a->name, b->name);
> +	return len < 0 ? -1 : 1;
> +}
> +
> +rb_gen(MAYBE_UNUSED static, func_rb_, func_rb_t,
> +       struct cbox_func, nd, cbox_func_cmp);
> +
> +/**
> + * Find function in a tree.
> + */
> +struct cbox_func *
> +cbox_func_find(const char *name, size_t name_len)

8. It should be static.

> +{
> +	struct cbox_func v = {
> +		.name		= name,
> +		.name_len	= name_len,
> +	};
> +	return func_rb_search(&func_rb_root, &v);
> +}
> +
> +/**
> + * Unreference a function instance.
> + */
> +static void
> +cbox_func_unref(struct cbox_func *cf)
> +{
> +	assert(cf->ref > 0);
> +	if (cf->ref-- == 1)
> +		func_rb_remove(&func_rb_root, cf);
> +}
> +
> +/**
> + * Reference a function instance.
> + */
> +static bool
> +cbox_func_ref(struct cbox_func *cf)
> +{
> +	assert(cf->ref >= 0);
> +
> +	/*
> +	 * Hardly to ever happen but just
> +	 * to make sure.

9. We already discussed it a ton of times, literally exactly the
same 'problem' with struct error objects. A 64 bit integer
will not overflow for your lifetime even if it would increment
every nanosecond. How is it possible? Why the code should get
complicated by making a simple 'ref' function be able to fail?

> +	 */
> +	if (cf->ref == SSIZE_MAX) {
> +		const char *fmt =
> +			"Too many function references (max %zd)";
> +		diag_set(IllegalParams, fmt, SSIZE_MAX);
> +		return false;
> +	}
> +
> +	if (cf->ref++ == 0)
> +		func_rb_insert(&func_rb_root, cf);
> +
> +	return true;
> +}
> +
> +/**
> + * Allocate a new function instance.
> + */
> +static struct cbox_func *
> +cbox_func_new(const char *name, size_t name_len)
> +{
> +	const ssize_t cf_size = sizeof(struct cbox_func);
> +	ssize_t size = cf_size + name_len + 1;
> +	if (size < 0) {

10. If this is intended to check for an overflow, it does not work.
If I will pass name_len = UINT64_MAX (on my machine size_t == 8 bytes),
then UINT64_MAX + 1 = 0, you will allocate only sizeof(), and then
will do memcpy for UINT64_MAX below.

I would suggest not to get crazy with such checks. You can't cover
everything, and it is not needed.

> +		const size_t max_len = SSIZE_MAX - cf_size - 1;
> +		const char *fmt = "Function name is too long (max %zd)";
> +		diag_set(IllegalParams, fmt, max_len);
> +		return NULL;
> +	}
> +
> +	struct cbox_func *cf = malloc(size);
> +	if (cf == NULL) {
> +		diag_set(OutOfMemory, size, "malloc", "cf");
> +		return NULL;
> +	}
> +
> +	cf->mod_sym.addr	= NULL;
> +	cf->mod_sym.module	= NULL;
> +	cf->ref			= 0;
> +	cf->mod_sym.name	= cf->inplace;
> +	cf->name		= cf->inplace;
> +	cf->name_len		= name_len;
> +
> +	memcpy(cf->inplace, name, name_len);
> +	cf->inplace[name_len] = '\0';
> +
> +	memset(&cf->nd, 0, sizeof(cf->nd));
> +	return cf;
> +}
> +
> +/**
> + * Load a new function.
> + *
> + * This function takes a function name from the caller
> + * stack @a L and creates a new function object. If
> + * the function is already loaded we simply return
> + * a reference to existing one.
> + *
> + * Possible errors:
> + *
> + * - IllegalParams: function name is either not supplied
> + *   or not a string.
> + * - IllegalParams: function references limit exceeded.
> + * - OutOfMemory: unable to allocate a function.
> + *
> + * @returns function object on success or {nil,error} on error,
> + * the error is set to the diagnostics area.
> + */
> +static int
> +lcbox_func_load(struct lua_State *L)
> +{
> +	const char *method = "cbox.func.load";
> +	struct cbox_func *cf = NULL;
> +
> +	if (lua_gettop(L) != 1 || !lua_isstring(L, 1)) {
> +		const char *fmt =
> +			"Expects %s(\'name\') but no name passed";
> +		diag_set(IllegalParams, fmt, method);
> +		return luaT_push_nil_and_error(L);
> +	}
> +
> +	size_t name_len;
> +	const char *name = lua_tolstring(L, 1, &name_len);
> +
> +	cf = cbox_func_find(name, name_len);
> +	if (cf == NULL) {
> +		cf = cbox_func_new(name, name_len);
> +		if (cf == NULL)
> +			return luaT_push_nil_and_error(L);
> +	}
> +	if (!cbox_func_ref(cf))
> +		return luaT_push_nil_and_error(L);

11. 'cf' leaks if it was just created a few lines above.

> +
> +	lua_newtable(L);
> +
> +	lua_pushstring(L, "name");
> +	lua_pushstring(L, cf->name);
> +	lua_settable(L, -3);
> +
> +	lua_newtable(L);
> +
> +	/*
> +	 * A new variable should be callable for
> +	 * convenient use in Lua.
> +	 */
> +	lua_pushstring(L, "__call");
> +	lua_pushcfunction(L, lcbox_func_call);
> +	lua_settable(L, -3);> +
> +	/*
> +	 * We will release the memory associated
> +	 * with the objet if only no active refs
> +	 * are left.
> +	 */
> +	lua_pushstring(L, "__gc");
> +	lua_pushcfunction(L, lcbox_func_gc);
> +	lua_settable(L, -3);
> +
> +	/*
> +	 * Carry the pointer to the function so
> +	 * we won't need to run a lookup when
> +	 * calling.
> +	 */
> +	lua_pushstring(L, "__cbox_func");
> +	*(struct cbox_func **)lua_newuserdata(L, sizeof(cf)) = cf;
> +	lua_settable(L, -3);

12. I can assure you, that creation of all these Lua GC objects:
C functions, metatables, is going to cost much much more than the
optimizations you tried to do with the tree vs hash. Order of
magnitude slower. I suggest you to look at lua/tuple.c to check
how to avoid creation of a new metatable on each function object.

A single metatable object should be shared by all cbox function
objects to reduce the GC pressure.

Also you don't need to push a table with __cbox_func field in it.
You can push userdata right away and make it look like a proper Lua
object. See luaT_pusherror(), luaT_pushtuple().

> +
> +	lua_setmetatable(L, -2);
> +	return 1;
> +}
> +
> +/**
> + * Initialize cbox Lua module.
> + *
> + * @param L Lua state where to register the cbox module.
> + */
> +void
> +box_lua_cbox_init(struct lua_State *L)
> +{
> +	static const struct luaL_Reg cbox_methods[] = {
> +		{ NULL, NULL }
> +	};
> +	luaL_register_module(L, "cbox", cbox_methods);
> +
> +	/* func table */
> +	static const struct luaL_Reg func_table[] = {
> +		{ "load",	lcbox_func_load },
> +		{ "unload",	lcbox_func_unload },
> +	};
> +
> +	lua_newtable(L);
> +	for (size_t i = 0; i < lengthof(func_table); i++) {
> +		lua_pushstring(L, func_table[i].name);
> +		lua_pushcfunction(L, func_table[i].func);
> +		lua_settable(L, -3);
> +	}
> +	lua_setfield(L, -2, "func");
> +
> +	/* module table */
> +	static const struct luaL_Reg module_table[] = {
> +		{ "reload",	lcbox_module_reload },
> +	};
> +
> +	lua_newtable(L);
> +	for (size_t i = 0; i < lengthof(module_table); i++) {
> +		lua_pushstring(L, module_table[i].name);
> +		lua_pushcfunction(L, module_table[i].func);
> +		lua_settable(L, -3);
> +	}
> +	lua_setfield(L, -2, "module");

13. Why couldn't you simply use luaL_register() instead of these
cycles?

> +
> +	lua_pop(L, 1);
> +}
> +
> +/**
> + * Initialize cbox module.
> + *
> + * @return 0 on success, -1 on error (diag is set).	

14. Trailing tab. In the header file too.

> + */
> +int
> +cbox_init(void)
> +{
> +	func_rb_new(&func_rb_root);
> +	return 0;
> +}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 1/4] box/func: factor out c function entry structure
  2020-10-29 22:15   ` Vladislav Shpilevoy
@ 2020-10-30  9:51     ` Cyrill Gorcunov
  2020-10-31  0:13       ` Vladislav Shpilevoy
  0 siblings, 1 reply; 22+ messages in thread
From: Cyrill Gorcunov @ 2020-10-30  9:51 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Thu, Oct 29, 2020 at 11:15:51PM +0100, Vladislav Shpilevoy wrote:
...
> > +int
> > +module_sym_load(struct module_sym *mod_sym)
> > +{
> > +	assert(mod_sym->addr == NULL);
> > +
> > +	struct func_name name;
> > +	func_split_name(mod_sym->name, &name);
> > +
> > +	struct module *module = module_cache_find(name.package,
> > +						  name.package_end);
> > +	if (module == NULL) {
> > +		/* Try to find loaded module in the cache */
> > +		module = module_load(name.package, name.package_end);
> > +		if (module == NULL)
> > +			return -1;
> > +		if (module_cache_put(module)) {
> 
> 1. Please, use explicit != 0.
> https://github.com/tarantool/tarantool/wiki/Code-review-procedure#code-style

OK

> 
> > +			module_delete(module);
> > +			return -1;
> > +		}
> > +	}
> > +
> > +	mod_sym->addr = module_sym(module, name.sym);
> > +	if (mod_sym->addr == NULL)
> > +		return -1;
> 
> 2. If the module was loaded first time here, it is not unloaded in case of
> an error in this place.

Just like it was before the patch. The patch simply factor outs the old
code. It doesn't improve it (because, lets be honest this is a min problem
for module management -- we've to check for module symbol not at moment
of calling it but rather at the moment when we load a function). That
said the issue with module management is known and I think we need to
rework modules code more deeply, but not in this series. In the
series it remains exactly as it was.

> >  
> > +/**
> > + * Callable symbol bound to a module.
> > + */
> > +struct module_sym {
> > +	/**
> > +	 * Anchor for module membership.
> > +	 */
> > +	struct rlist item;
> > +	/**
> > +	 * For C functions, address of the function.
> > +	 */
> > +	box_function_f addr;
> > +	/**
> > +	 * Each stored function keeps a handle to the
> > +	 * dynamic library for the C callback.
> > +	 */
> 
> 3. Can't parse the comment. What is the 'C callback'?
> And why is this function stored? After you extracted it
> from struct func_c, it is not related to _func space, and
> is not stored.

It is stored in memory. The C callback is the function we
call, so we keep a pointer to a module. If shuch comment
confuses lets change it to something like "Each function
keeps a handle to the module where the symbol relies"?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 2/4] module_cache: move module handling into own subsystem
  2020-10-29 22:15   ` Vladislav Shpilevoy
@ 2020-10-30 10:15     ` Cyrill Gorcunov
  2020-10-31  0:15       ` Vladislav Shpilevoy
  0 siblings, 1 reply; 22+ messages in thread
From: Cyrill Gorcunov @ 2020-10-30 10:15 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Thu, Oct 29, 2020 at 11:15:54PM +0100, Vladislav Shpilevoy wrote:
...
> > +
> > +/**
> > + * Parse function descriptor from a string.
> > + */
> > +void
> > +parse_func_name(const char *str, struct func_name_desc *d)
> 
> 1. The function is not used out of this file. Please, make it
> static.

Yup, thanks!

> > +
> > +/**
> > + * Find a path to a module using Lua's package.cpath.
> > + *
> > + * @param ctx module find context
> > + * @param[out] ctx->path path to shared library.
> > + * @param[out] ctx->path_len size of @a ctx->path buffer.
> 
> 2. The last two lines are not params. The only parameter is ctx. If
> you want to describe it, it should be done under the same @param.
> Otherwise the comment looks mismatching the signature. I gonna say
> that again. I don't force to use Doxygen, but if you decide to use it,
> it should be done correctly.

Well, I didnt find any reference for the correct way to describe such
updates of internal structures. I think I better drop this doxy-style
comment.

> 
> > + *
> > + * @return 0 on succes, -1 otherwise, diag is set.
> > + */
> > +static int
> > +module_find(struct module_find_ctx *ctx)
> > diff --git a/src/box/module_cache.h b/src/box/module_cache.h
> > new file mode 100644
> > index 000000000..3f275024a
> > --- /dev/null
> > +++ b/src/box/module_cache.h
> > @@ -0,0 +1,156 @@
> > +/*
> > + * SPDX-License-Identifier: BSD-2-Clause
> > + *
> > + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
> > + */
> > +
> > +#pragma once
> > +
> > +#include <small/rlist.h>
> > +
> > +#include "func_def.h"
> 
> 3. Func_def heavily depends on schema. Also it is a part of
> all the _func handling, meaning this is a cyclic dependency now -
> module_cache depends on func and vice versa.

Nope. Where you get it? func_def.h is not touched in this series
at all. What we have now

module_cache.h: func_def.h
func.h: module_cache.h func_def.h
func_def.h: trivia/util.h field_def.h opt_def.h

I don't understand which cyclic dependency you mean.

> > +
> > +/***
> > + * Parse function name into a name descriptor.
> > + * 
> 
> 4. Trailing whitespace.

+1

> > +/**
> > + * Load a new module symbol.
> > + *
> > + * @param mod_sym symbol to load.
> > + *
> > + * @returns 0 on succse, -1 otherwise, diag is set.
> > + */
> 
> 5. I see you added comments for each function twice - in its
> declaration and implementation. Please, don't. Such double
> comments usually quicky get desynchronized. Even now they are
> not the same. We always comment functions on their declaration,
> if it is separated from implementation.
> 
> The same for the other commits and other places in this commit.

OK

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 3/4] box/cbox: implement cbox Lua module
  2020-10-29 22:15   ` Vladislav Shpilevoy
@ 2020-10-30 12:51     ` Cyrill Gorcunov
  2020-10-31  0:21       ` Vladislav Shpilevoy
  2020-11-12 22:54     ` Vladislav Shpilevoy
  1 sibling, 1 reply; 22+ messages in thread
From: Cyrill Gorcunov @ 2020-10-30 12:51 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Thu, Oct 29, 2020 at 11:15:58PM +0100, Vladislav Shpilevoy wrote:
> Thanks for the patch!
> 
> The module leaks somewhere. It never unloads the shared file. I
> tried this:
> 
> 	cbox = require('cbox')
> 	f = cbox.func.load('function1.test_push')
> 	f()
> 	f = nil
> 	cbox.func.unload('function1.test_push')
> 	collectgarbage('collect')
> 
> Then I use 'lsof -p' to see what files are loaded, and I see
> function1.dylib here. When I do all the same using _func space
> and box.func.call, the file is unloaded correctly.
> 
> (function1.dylib I took from test/box/.)

I'll try to repeat. That;s interesting, actually I've been trying
to force gc to trigger but without much luck. Letme investigate
this.

> > 
> > ``` Lua
> > -- call with agruments arg1 and arg2
> 
> 1. agruments -> arguments.

Thanks! I managed to miss this typo while been running aspell.

> > 
> > ``` Lua
> > ok, err = require('cbox').module.reload('func')
> > if not ok then
> >     print(err)
> > end
> > ```
> 
> 2. In the previous review I said:
> 
> 	3. It seems the function invocation itself is not documented?
> 
> You answered:
> 
> 	Good point, seems this would be useful. I though 'cause this
> 	is mostly an alias for "_func" space it would be obvious. But
> 	indeed without real example the doc looks incomplete. Thanks!
> 
> Tbh, I don't see any difference regarding this. It is
> still not documented how to call a function after you
> loaded it, and what does it return.

Here is a paragraph above
---
    Once function is loaded it is possible to execute it
    in a traditional Lua way, ie to call it as a function.

    ``` Lua
    -- call with agruments arg1 and arg2
    f(arg1, arg2)
    ```
---

I'm not sure what else we could put here?

> > 
> > diff --git a/src/box/box.cc b/src/box/box.cc
> > index 2485b79f3..f20761e8f 100644
> > --- a/src/box/box.cc
> > +++ b/src/box/box.cc
> > @@ -75,6 +75,7 @@
> >  #include "systemd.h"
> >  #include "call.h"
> >  #include "module_cache.h"
> > +#include "lua/cbox.h"
> >  #include "sequence.h"
> >  #include "sql_stmt_cache.h"
> >  #include "msgpack.h"
> > @@ -2246,6 +2247,7 @@ box_free(void)
> >  		tuple_free();
> >  		port_free();
> >  #endif
> > +		cbox_free();
> 
> 3. box_init and box_free are not for Lua modules. Please,
> use box_lua_init(). Also I don't unserstand, why do you have two
> cbox init functions. If it is a Lua module, it should have lua init
> function and be in lua/ folder. If it is a C module, it should have
> a usual init function and not be in a lua/ folder. Not both. See
> examples such as tuple and error modules. They have separate Lua
> and C part. In your case I don't understand why would you need a
> separate C part. Cbox is entirely for Lua. We don't need cbox in C.
> So for your case box_lua_init should be enough.

cbox engine uses rbtree to keep functions in memory and we have
to initialize it first and clean up it on exit. The lua init provides
Lua's interface over the engine. I can move initialization to box_lua_init
but where to clean things up?! Please look into popen module -- it does
exactly the same: there is C part and Lua parts.

I'll read tuple and error modules maybe I miss something obvious. Thanks!

> > +/**
> > + * Function descriptor.
> > + */
> > +struct cbox_func {
> > +	/**
> > +	 * Gather functions into rbtree.
> > +	 */
> > +	rb_node(struct cbox_func) nd;
> 
> 4. Why rbtree? Search in the tree is going to be slower than in
> the hash. Hash would be perfect here, because all the searches
> are always by ==, not < or >, no iteration. Only lookup.

And lookup in our hash table causes iterations as well in case
of collisions.

> In the cover letter you said
> 
> 	"i don't like the idea of unexpected rehashing of
> 	values in case of massive number of functions
> 	allocated"
> 
> but AFAIK, according to our benches, when a hash is used as an index,
> it is always faster than any tree on lookups. Although I don't
> remember benches hash vs tree out of an index. Since your are doing
> such dubious "optimizations", is there a bench you can show, that
> a tree is faster than a hash?

The main reason for rbtree is the memory. Initially I used our mhash
engine but I don't expect there be that many functions which allows
me to not allocate such big structure as a hash table, instead I only
need two pointers for lookup (ie 16 bytes per entry).

Do you really think it worth it to allocate hash table instead? I'm ready
to make lookup a bit slower just to eliminate redundant memory overhead.

I expect at worst case to have up to 1000 functions which may require up
to 10 attempts.  I hate (with a passon) idea that in a middle of inserting
new key our hash engine start a rehashing procedure.

And note that lookup happens only _once_ when you loading a new function,
we keep the pointer later and calling function goes directly without any
lookups. So I think keeping low memory usage a way more preferred. Also
we might consider moving memory allocation for functions in our own "small"
instace (where mh hashes simply use system's malloc/calloc calls).

And no, I didn't measure performance here because it was not a priority
when _inserting_ a new function. And I'm very sceptical about "we ran
a benchmark and it worked a way more better". We've been using both
trees and hashes for decades and open-hashing (like ours) works fine
until some moment where collision starts to happen and performace flushed
to the toiled because you start walking over the table causing pages
to swap in, massive tlb flushed and all this friends. On big data and
higly loaded systems it is very fragile area and there is still no
answer which structures are better :( Sometimes if your data is small
enough at sits near in memory old plain linear lookup a way more
faster than enything else, because theory is theory but in real life
too many components with own limists involved.

Summarizing: I can easily switch back to mhash if you prefer.

> > +
> > +	/**
> > +	 * Symbol descriptor for the function in
> > +	 * an associated module.
> > +	 */
> > +	struct module_sym mod_sym;
> > +
> > +	/**
> > +	 * Number of references to the function
> > +	 * instance.
> > +	 */
> > +	ssize_t ref;
> 
> 5. Why is it ssize_t? Is there any single place, where we use ssize_t
> for reference counting? Why ssize_t, if it is has nothing to do with
> size of anything?
> 
> Also it probably would be better to name it 'load_count'. Because
> to the end of the patch I was thinking you decided to count Lua refs
> too somewhy, and realized it is load count only afterwards, when
> started writing the comments.

Strictly speaking plain int should fit here better but signed size
just guaranteed to be large enough to cover address space. That said
I agree that using ssize_t here is a git misleading. Will change and
rename. Thanks!

> 7. Why are there so many extra empty lines after each member? I am
> going to ask again you not to invent a new code style on each next
> patch you send.

As far as I see there is no any requirements about spacing outs so
for me it looks a way more readable. I'll shrink it if you prefer.

> > +/**
> > + * Find function in a tree.
> > + */
> > +struct cbox_func *
> > +cbox_func_find(const char *name, size_t name_len)
> 
> 8. It should be static.

+1

> > +
> > +/**
> > + * Reference a function instance.
> > + */
> > +static bool
> > +cbox_func_ref(struct cbox_func *cf)
> > +{
> > +	assert(cf->ref >= 0);
> > +
> > +	/*
> > +	 * Hardly to ever happen but just
> > +	 * to make sure.
> 
> 9. We already discussed it a ton of times, literally exactly the
> same 'problem' with struct error objects. A 64 bit integer
> will not overflow for your lifetime even if it would increment
> every nanosecond. How is it possible? Why the code should get
> complicated by making a simple 'ref' function be able to fail?

This has nothing to do for fair use. It prevents from errors where
you occasionally screwed the counter. I don't know why but for some
reason you think that counters are never screwed. What we really
need is a general kind of kref_t and it *must* include saturation
check somewhere inside and panic on overflow.

> 
> > +	 */
> > +	if (cf->ref == SSIZE_MAX) {
> > +		const char *fmt =
> > +			"Too many function references (max %zd)";
> > +		diag_set(IllegalParams, fmt, SSIZE_MAX);
> > +		return false;
> > +	}

...

> > +/**
> > + * Allocate a new function instance.
> > + */
> > +static struct cbox_func *
> > +cbox_func_new(const char *name, size_t name_len)
> > +{
> > +	const ssize_t cf_size = sizeof(struct cbox_func);
> > +	ssize_t size = cf_size + name_len + 1;
> > +	if (size < 0) {
> 
> 10. If this is intended to check for an overflow, it does not work.
> If I will pass name_len = UINT64_MAX (on my machine size_t == 8 bytes),
> then UINT64_MAX + 1 = 0, you will allocate only sizeof(), and then
> will do memcpy for UINT64_MAX below.
> 
> I would suggest not to get crazy with such checks. You can't cover
> everything, and it is not needed.

It is a typo, it should be `if (size <= 0)`. I think we must check
what users pass us and don't allow to screw a program. So this test
is not crazy at all. I can drop it of course but at least we can
set up some assert() here? Seriously we must not trust user data.

> > +static int
> > +lcbox_func_load(struct lua_State *L)
> > +{
> > +
> 
> 11. 'cf' leaks if it was just created a few lines above.

Good cactch! Thanks!

> > +	lua_pushstring(L, "__cbox_func");
> > +	*(struct cbox_func **)lua_newuserdata(L, sizeof(cf)) = cf;
> > +	lua_settable(L, -3);
> 
> 12. I can assure you, that creation of all these Lua GC objects:
> C functions, metatables, is going to cost much much more than the
> optimizations you tried to do with the tree vs hash. Order of
> magnitude slower. I suggest you to look at lua/tuple.c to check
> how to avoid creation of a new metatable on each function object.
> 
> A single metatable object should be shared by all cbox function
> objects to reduce the GC pressure.
> 
> Also you don't need to push a table with __cbox_func field in it.
> You can push userdata right away and make it look like a proper Lua
> object. See luaT_pusherror(), luaT_pushtuple().

Will do, thanks for suggestion.

> > +
> > +	/* module table */
> > +	static const struct luaL_Reg module_table[] = {
> > +		{ "reload",	lcbox_module_reload },
> > +	};
> > +
> > +	lua_newtable(L);
> > +	for (size_t i = 0; i < lengthof(module_table); i++) {
> > +		lua_pushstring(L, module_table[i].name);
> > +		lua_pushcfunction(L, module_table[i].func);
> > +		lua_settable(L, -3);
> > +	}
> > +	lua_setfield(L, -2, "module");
> 
> 13. Why couldn't you simply use luaL_register() instead of these
> cycles?

For some reason when I've been using luaL_register it didn't appear
on newly create objects. Letme try to use luaL_register again maybe
I simply miss something obvious.

> > +
> > +/**
> > + * Initialize cbox module.
> > + *
> > + * @return 0 on success, -1 on error (diag is set).	
> 
> 14. Trailing tab. In the header file too.

Ok, thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 1/4] box/func: factor out c function entry structure
  2020-10-30  9:51     ` Cyrill Gorcunov
@ 2020-10-31  0:13       ` Vladislav Shpilevoy
  2020-10-31 15:27         ` Cyrill Gorcunov
  0 siblings, 1 reply; 22+ messages in thread
From: Vladislav Shpilevoy @ 2020-10-31  0:13 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tml

>>> +			module_delete(module);
>>> +			return -1;
>>> +		}
>>> +	}
>>> +
>>> +	mod_sym->addr = module_sym(module, name.sym);
>>> +	if (mod_sym->addr == NULL)
>>> +		return -1;
>>
>> 2. If the module was loaded first time here, it is not unloaded in case of
>> an error in this place.
> 
> Just like it was before the patch. The patch simply factor outs the old
> code. It doesn't improve it (because, lets be honest this is a min problem
> for module management -- we've to check for module symbol not at moment
> of calling it but rather at the moment when we load a function). That
> said the issue with module management is known and I think we need to
> rework modules code more deeply, but not in this series. In the
> series it remains exactly as it was.

There is no even a ticket for that. And if you add it, it will go to wishlist,
I can assure you. The rule is that if you see a bug, the first thing you do is
ensure there is a ticket on it. This issue is definitely a bug. So you can't rely
on some refactoring of something sometime in the future. If there is a bug, it
must not be lost, and must be fixed. Bugs have higher priority than refactoring
almost always, and there is more chances it will be fixed when filed separately
from refactoring.

https://github.com/tarantool/tarantool/issues/5475

>>> +/**
>>> + * Callable symbol bound to a module.
>>> + */
>>> +struct module_sym {
>>> +	/**
>>> +	 * Anchor for module membership.
>>> +	 */
>>> +	struct rlist item;
>>> +	/**
>>> +	 * For C functions, address of the function.
>>> +	 */
>>> +	box_function_f addr;
>>> +	/**
>>> +	 * Each stored function keeps a handle to the
>>> +	 * dynamic library for the C callback.
>>> +	 */
>>
>> 3. Can't parse the comment. What is the 'C callback'?
>> And why is this function stored? After you extracted it
>> from struct func_c, it is not related to _func space, and
>> is not stored.
> 
> It is stored in memory.

Everything is stored in memory in Tarantool except Vinyl. I
see the comment is just copy-pasted from the previous place
in struct func_c, where it meant stored as stored in the
schema. In spaces.

> The C callback is the function we
> call, so we keep a pointer to a module.

You don't need a module pointer to call a function. It can be 
called by (), without a module. It is saved here to load it and
unload. At unload it may delete the module if the function object
was the last user of the module.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 2/4] module_cache: move module handling into own subsystem
  2020-10-30 10:15     ` Cyrill Gorcunov
@ 2020-10-31  0:15       ` Vladislav Shpilevoy
  2020-10-31 15:29         ` Cyrill Gorcunov
  0 siblings, 1 reply; 22+ messages in thread
From: Vladislav Shpilevoy @ 2020-10-31  0:15 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tml

>>> +
>>> +/**
>>> + * Find a path to a module using Lua's package.cpath.
>>> + *
>>> + * @param ctx module find context
>>> + * @param[out] ctx->path path to shared library.
>>> + * @param[out] ctx->path_len size of @a ctx->path buffer.
>>
>> 2. The last two lines are not params. The only parameter is ctx. If
>> you want to describe it, it should be done under the same @param.
>> Otherwise the comment looks mismatching the signature. I gonna say
>> that again. I don't force to use Doxygen, but if you decide to use it,
>> it should be done correctly.
> 
> Well, I didnt find any reference for the correct way to describe such
> updates of internal structures.

Doxygen syntax can be found on the official site, available on a first
link from google when you type 'doxygen'. If it does not provide a special
syntax for describing changes of individual fields of a single parameter,
then you need to describe them in the comment main part, or the @param
describing the parameter, like I proposed above.

> I think I better drop this doxy-style
> comment.

Or drop it, yes.

>>> + *
>>> + * @return 0 on succes, -1 otherwise, diag is set.
>>> + */
>>> +static int
>>> +module_find(struct module_find_ctx *ctx)
>>> diff --git a/src/box/module_cache.h b/src/box/module_cache.h
>>> new file mode 100644
>>> index 000000000..3f275024a
>>> --- /dev/null
>>> +++ b/src/box/module_cache.h
>>> @@ -0,0 +1,156 @@
>>> +/*
>>> + * SPDX-License-Identifier: BSD-2-Clause
>>> + *
>>> + * Copyright 2010-2020, Tarantool AUTHORS, please see AUTHORS file.
>>> + */
>>> +
>>> +#pragma once
>>> +
>>> +#include <small/rlist.h>
>>> +
>>> +#include "func_def.h"
>>
>> 3. Func_def heavily depends on schema. Also it is a part of
>> all the _func handling, meaning this is a cyclic dependency now -
>> module_cache depends on func and vice versa.
> 
> Nope. Where you get it? func_def.h is not touched in this series
> at all. What we have now

I didn't say you touched it. I said it depends on schema, and is a part of
stored functions code. Including func.h, func.c, call.h, call.c, and the
others.

> module_cache.h: func_def.h
> func.h: module_cache.h func_def.h
> func_def.h: trivia/util.h field_def.h opt_def.h
> 
> I don't understand which cyclic dependency you mean.

func.h/func.c/func_def.h/func_def.c are a part of the subsystem for
calling stored functions. Inside of the schema. It also depends on
module_cache.

module_cache is not related to schema, but depends on func_def.h
somewhy. This is the cycle. Module_cache and stored functions depend
on each other.

Talking of the schema details: func_def.h operates on user ID, on
function ID, on Lua specifics like sandboxing, knows about multikey,
and so on. It heavily depends on schema, and module_cache should not
depend on schema.

AFAIS, you included it simply because it defined box_function_ctx and
box_function_f. But it should be vice-versa.

I did this and all was built fine:

====================
diff --git a/src/box/func_def.h b/src/box/func_def.h
index d99d89190..75cd6a0d3 100644
--- a/src/box/func_def.h
+++ b/src/box/func_def.h
@@ -168,20 +168,6 @@ func_def_dup(struct func_def *def);
 int
 func_def_check(struct func_def *def);
 
-/**
- * API of C stored function.
- */
-
-struct port;
-
-struct box_function_ctx {
-	struct port *port;
-};
-
-typedef struct box_function_ctx box_function_ctx_t;
-typedef int (*box_function_f)(box_function_ctx_t *ctx,
-	     const char *args, const char *args_end);
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/src/box/module_cache.h b/src/box/module_cache.h
index 3f275024a..1ec7f0042 100644
--- a/src/box/module_cache.h
+++ b/src/box/module_cache.h
@@ -8,12 +8,20 @@
 
 #include <small/rlist.h>
 
-#include "func_def.h"
-
 #if defined(__cplusplus)
 extern "C" {
 #endif /* defined(__cplusplus) */
 
+struct port;
+
+struct box_function_ctx {
+	struct port *port;
+};
+
+typedef struct box_function_ctx box_function_ctx_t;
+typedef int (*box_function_f)(box_function_ctx_t *ctx,
+	     const char *args, const char *args_end);
+
 /**
  * Function name descriptor: a symbol and a package.
  */

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 3/4] box/cbox: implement cbox Lua module
  2020-10-30 12:51     ` Cyrill Gorcunov
@ 2020-10-31  0:21       ` Vladislav Shpilevoy
  2020-10-31 21:59         ` Cyrill Gorcunov
  0 siblings, 1 reply; 22+ messages in thread
From: Vladislav Shpilevoy @ 2020-10-31  0:21 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tml

>> The module leaks somewhere. It never unloads the shared file. I
>> tried this:
>>
>> 	cbox = require('cbox')
>> 	f = cbox.func.load('function1.test_push')
>> 	f()
>> 	f = nil
>> 	cbox.func.unload('function1.test_push')
>> 	collectgarbage('collect')
>>
>> Then I use 'lsof -p' to see what files are loaded, and I see
>> function1.dylib here. When I do all the same using _func space
>> and box.func.call, the file is unloaded correctly.
>>
>> (function1.dylib I took from test/box/.)
> 
> I'll try to repeat. That;s interesting, actually I've been trying
> to force gc to trigger but without much luck. Letme investigate
> this.

It is not about Lua GC. The collectgarbage('collect') works fine
and the function object is deleted. But the module is not unloaded.
Because you never call module_sym_unload().

>>> ``` Lua
>>> ok, err = require('cbox').module.reload('func')
>>> if not ok then
>>>     print(err)
>>> end
>>> ```
>>
>> 2. In the previous review I said:
>>
>> 	3. It seems the function invocation itself is not documented?
>>
>> You answered:
>>
>> 	Good point, seems this would be useful. I though 'cause this
>> 	is mostly an alias for "_func" space it would be obvious. But
>> 	indeed without real example the doc looks incomplete. Thanks!
>>
>> Tbh, I don't see any difference regarding this. It is
>> still not documented how to call a function after you
>> loaded it, and what does it return.
> 
> Here is a paragraph above
> ---
>     Once function is loaded it is possible to execute it
>     in a traditional Lua way, ie to call it as a function.
> 
>     ``` Lua
>     -- call with agruments arg1 and arg2
>     f(arg1, arg2)
>     ```
> ---
> 
> I'm not sure what else we could put here?

This paragraph is false, it seems. Because it does not work like Lua
functions 1-to-1. It returns nil,err or true,results. Lua functions
return whatever you return from them so just 'results'. These cbox
functions change the returned set a bit. And this is good, but you
didn't document it.

>>> diff --git a/src/box/box.cc b/src/box/box.cc
>>> index 2485b79f3..f20761e8f 100644
>>> --- a/src/box/box.cc
>>> +++ b/src/box/box.cc
>>> @@ -75,6 +75,7 @@
>>>  #include "systemd.h"
>>>  #include "call.h"
>>>  #include "module_cache.h"
>>> +#include "lua/cbox.h"
>>>  #include "sequence.h"
>>>  #include "sql_stmt_cache.h"
>>>  #include "msgpack.h"
>>> @@ -2246,6 +2247,7 @@ box_free(void)
>>>  		tuple_free();
>>>  		port_free();
>>>  #endif
>>> +		cbox_free();
>>
>> 3. box_init and box_free are not for Lua modules. Please,
>> use box_lua_init(). Also I don't unserstand, why do you have two
>> cbox init functions. If it is a Lua module, it should have lua init
>> function and be in lua/ folder. If it is a C module, it should have
>> a usual init function and not be in a lua/ folder. Not both. See
>> examples such as tuple and error modules. They have separate Lua
>> and C part. In your case I don't understand why would you need a
>> separate C part. Cbox is entirely for Lua. We don't need cbox in C.
>> So for your case box_lua_init should be enough.
> 
> cbox engine uses rbtree to keep functions in memory and we have
> to initialize it first and clean up it on exit. The lua init provides
> Lua's interface over the engine. I can move initialization to box_lua_init
> but where to clean things up?!

Option 1 - nowhere like other Lua modules. Option 2 - add box_lua_free
and add your cbox destruction there.

> Please look into popen module -- it does
> exactly the same: there is C part and Lua parts.

It is not the same. Popen, on the contrary, proves my point.

lua/popen.h has only tarantool_lua_popen_init(). lua/popen
defines a new module on top of lib/core/popen.

lib/core/popen.h has popen_init().

These two modules are separate. They are not one. lib/core/popen works
and builds without lua part. In your case you just threw everything into
a huge pile inside of lua/cbox.c, which is also fine, but then it is a
single monolithic module, and should have one init function. Also since
it is in Lua, it should not be used in files having nothing to do with
Lua.

>>> +/**
>>> + * Function descriptor.
>>> + */
>>> +struct cbox_func {
>>> +	/**
>>> +	 * Gather functions into rbtree.
>>> +	 */
>>> +	rb_node(struct cbox_func) nd;
>>
>> 4. Why rbtree? Search in the tree is going to be slower than in
>> the hash. Hash would be perfect here, because all the searches
>> are always by ==, not < or >, no iteration. Only lookup.
> 
> And lookup in our hash table causes iterations as well in case
> of collisions.
> 
>> In the cover letter you said
>>
>> 	"i don't like the idea of unexpected rehashing of
>> 	values in case of massive number of functions
>> 	allocated"
>>
>> but AFAIK, according to our benches, when a hash is used as an index,
>> it is always faster than any tree on lookups. Although I don't
>> remember benches hash vs tree out of an index. Since your are doing
>> such dubious "optimizations", is there a bench you can show, that
>> a tree is faster than a hash?
> 
> The main reason for rbtree is the memory.

Ok. Did you measure how much more efficient it is in terms of memory?
Even in theory?

> Initially I used our mhash
> engine but I don't expect there be that many functions which allows
> me to not allocate such big structure as a hash table, instead I only
> need two pointers for lookup (ie 16 bytes per entry).
> 
> Do you really think it worth it to allocate hash table instead? I'm ready
> to make lookup a bit slower just to eliminate redundant memory overhead.

Yes, I think the hash is better here. But since you said the tree
solution is better in terms of speed because of rehash, and in terms of
memory because of I don't know what, you gotta prove it somehow. I can
prove my point because Alexanded had benchmarks showing hash is faster
inside of indexes. Also there is an issue in vshard about hash index being
better: https://github.com/tarantool/vshard/issues/128. Alexey did a bench,
AFAIR, even though we didn't post it in the issue.

We use the hash in a lot of far more critical places in terms of perf.
So either you are right and we need to fix them too, or we need to fix the
hash implementation, or you are not right and we need to keep using the
hash, when all operations are lookup. In either way we can't just say the
tree is better for lookups and silently use it here.

> I expect at worst case to have up to 1000 functions which may require up
> to 10 attempts.  I hate (with a passon) idea that in a middle of inserting
> new key our hash engine start a rehashing procedure.

If a user loads and unloads functions often, the rehash issue is going to
be the least severe one, because much more time will be spent on doing
mmap/munmap to load/unload the shared files.

> And note that lookup happens only _once_ when you loading a new function,
> we keep the pointer later and calling function goes directly without any
> lookups. So I think keeping low memory usage a way more preferred.

I am going to ask for numbers, what is the memory usage difference we are
talking about. Otherwise it all sounds sus.

> Also
> we might consider moving memory allocation for functions in our own "small"
> instace (where mh hashes simply use system's malloc/calloc calls).

I didn't understand this. Can you elaborate?

> And no, I didn't measure performance here because it was not a priority
> when _inserting_ a new function. And I'm very sceptical about "we ran
> a benchmark and it worked a way more better". We've been using both
> trees and hashes for decades and open-hashing (like ours) works fine
> until some moment where collision starts to happen and performace flushed
> to the toiled because you start walking over the table causing pages
> to swap in, massive tlb flushed and all this friends. On big data and
> higly loaded systems it is very fragile area and there is still no
> answer which structures are better :(

Please, go to Alexander L. He has concrete numbers how much faster hash is.
I remember he did it at least in scope of an issue about Lua tables vs spaces
performance.

> Sometimes if your data is small
> enough at sits near in memory old plain linear lookup a way more
> faster than enything else, because theory is theory but in real life
> too many components with own limists involved.
> 
> Summarizing: I can easily switch back to mhash if you prefer.

I think at this point we have gone too far to simply choose one solution,
and we need to do the comparison. Memory comparison is simple to do - insert
the same functions into a hash, printf its size. Insert into a tree, and
printf its size. For perf you can either write your own micro-bench or ask
Alexander L. for existing results.

>>> +
>>> +	/**
>>> +	 * Symbol descriptor for the function in
>>> +	 * an associated module.
>>> +	 */
>>> +	struct module_sym mod_sym;
>>> +
>>> +	/**
>>> +	 * Number of references to the function
>>> +	 * instance.
>>> +	 */
>>> +	ssize_t ref;
>>
>> 5. Why is it ssize_t? Is there any single place, where we use ssize_t
>> for reference counting? Why ssize_t, if it is has nothing to do with
>> size of anything?
>>
>> Also it probably would be better to name it 'load_count'. Because
>> to the end of the patch I was thinking you decided to count Lua refs
>> too somewhy, and realized it is load count only afterwards, when
>> started writing the comments.
> 
> Strictly speaking plain int should fit here better but signed size
> just guaranteed to be large enough to cover address space.

But it is not an address either ...

>>> +/**
>>> + * Reference a function instance.
>>> + */
>>> +static bool
>>> +cbox_func_ref(struct cbox_func *cf)
>>> +{
>>> +	assert(cf->ref >= 0);
>>> +
>>> +	/*
>>> +	 * Hardly to ever happen but just
>>> +	 * to make sure.
>>
>> 9. We already discussed it a ton of times, literally exactly the
>> same 'problem' with struct error objects. A 64 bit integer
>> will not overflow for your lifetime even if it would increment
>> every nanosecond. How is it possible? Why the code should get
>> complicated by making a simple 'ref' function be able to fail?
> 
> This has nothing to do for fair use. It prevents from errors where
> you occasionally screwed the counter.

In the code you always either increment or decrement it. How many
lifes you will need to get it 'screwed'?

> I don't know why but for some
> reason you think that counters are never screwed.

With that we can get to the point that it is not safe to use TCP,
and we need to ensure packet order, send checksums and so on. Also
we can't rely on the main memory stability - some electrons may leak,
and screw a bit in a number. So we need to check each number before
we use it.

What I think is that there is a border of sanity, when the checks
must be stopped in order to keep the code sane and simple. An overflow
of a 64 bit number, changed only by increments and decrements, is beyond
that border.

If the counter is somehow broken, it means its memory was overridden,
and the whole process is compromised. It makes no sense to continue
the execution. You not only continue, you also expose this error to a
user. What is he supposed to do with that error?

> What we really
> need is a general kind of kref_t and it *must* include saturation
> check somewhere inside and panic on overflow.

This is not really applicable in Tarantool. At least not everywhere.
Struct tuple, for instance, is referenced, but it uses a very special
reference counter to save every single byte, because tuple count is
millions and billions.

>>> +/**
>>> + * Allocate a new function instance.
>>> + */
>>> +static struct cbox_func *
>>> +cbox_func_new(const char *name, size_t name_len)
>>> +{
>>> +	const ssize_t cf_size = sizeof(struct cbox_func);
>>> +	ssize_t size = cf_size + name_len + 1;
>>> +	if (size < 0) {
>>
>> 10. If this is intended to check for an overflow, it does not work.
>> If I will pass name_len = UINT64_MAX (on my machine size_t == 8 bytes),
>> then UINT64_MAX + 1 = 0, you will allocate only sizeof(), and then
>> will do memcpy for UINT64_MAX below.
>>
>> I would suggest not to get crazy with such checks. You can't cover
>> everything, and it is not needed.
> 
> It is a typo, it should be `if (size <= 0)`.

It won't work either. Because size won't be 0. Name_len + 1 will be,
and size will be = sizeof(struct cbox_func).

> I think we must check
> what users pass us and don't allow to screw a program.

Name_len is returned by lua_tolstring(). If there is a string, whose
size overflows size_t, I am afraid the program is already screwed,
because there is not enough memory in the world to fit such a string.

> So this test
> is not crazy at all.

It really is, see above.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 1/4] box/func: factor out c function entry structure
  2020-10-31  0:13       ` Vladislav Shpilevoy
@ 2020-10-31 15:27         ` Cyrill Gorcunov
  0 siblings, 0 replies; 22+ messages in thread
From: Cyrill Gorcunov @ 2020-10-31 15:27 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Sat, Oct 31, 2020 at 01:13:18AM +0100, Vladislav Shpilevoy wrote:
> >>> +			module_delete(module);
> >>> +			return -1;
> >>> +		}
> >>> +	}
> >>> +
> >>> +	mod_sym->addr = module_sym(module, name.sym);
> >>> +	if (mod_sym->addr == NULL)
> >>> +		return -1;
> >>
> >> 2. If the module was loaded first time here, it is not unloaded in case of
> >> an error in this place.
> > 
> > Just like it was before the patch. The patch simply factor outs the old
> > code. It doesn't improve it (because, lets be honest this is a min problem
> > for module management -- we've to check for module symbol not at moment
> > of calling it but rather at the moment when we load a function). That
> > said the issue with module management is known and I think we need to
> > rework modules code more deeply, but not in this series. In the
> > series it remains exactly as it was.
> 
> There is no even a ticket for that. And if you add it, it will go to wishlist,
> I can assure you. The rule is that if you see a bug, the first thing you do is
> ensure there is a ticket on it. This issue is definitely a bug. So you can't rely
> on some refactoring of something sometime in the future. If there is a bug, it
> must not be lost, and must be fixed. Bugs have higher priority than refactoring
> almost always, and there is more chances it will be fixed when filed separately
> from refactoring.
> 
> https://github.com/tarantool/tarantool/issues/5475

I assigned this bug for myself. I see your point, and I rather tend to agree
in general. I'll wix.

> >>
> >> 3. Can't parse the comment. What is the 'C callback'?
> >> And why is this function stored? After you extracted it
> >> from struct func_c, it is not related to _func space, and
> >> is not stored.
> > 
> > It is stored in memory.
> 
> Everything is stored in memory in Tarantool except Vinyl. I
> see the comment is just copy-pasted from the previous place
> in struct func_c, where it meant stored as stored in the
> schema. In spaces.
> 
> > The C callback is the function we
> > call, so we keep a pointer to a module.
> 
> You don't need a module pointer to call a function. It can be 
> called by (), without a module. It is saved here to load it and
> unload. At unload it may delete the module if the function object
> was the last user of the module.

Seems we don't understad each other. This is a symbol we retrieve
from the module and we literally "call" it then. The symbol has
a name and an address, and address used as callable object. I'm
not sure how else I could rephrase it.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 2/4] module_cache: move module handling into own subsystem
  2020-10-31  0:15       ` Vladislav Shpilevoy
@ 2020-10-31 15:29         ` Cyrill Gorcunov
  0 siblings, 0 replies; 22+ messages in thread
From: Cyrill Gorcunov @ 2020-10-31 15:29 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Sat, Oct 31, 2020 at 01:15:38AM +0100, Vladislav Shpilevoy wrote:
...
> 
> AFAIS, you included it simply because it defined box_function_ctx and
> box_function_f. But it should be vice-versa.
> 
> I did this and all was built fine:

OK, I'll take a precise look on the week. Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 3/4] box/cbox: implement cbox Lua module
  2020-10-31  0:21       ` Vladislav Shpilevoy
@ 2020-10-31 21:59         ` Cyrill Gorcunov
  2020-11-01  8:26           ` Cyrill Gorcunov
  2020-11-02 22:25           ` Vladislav Shpilevoy
  0 siblings, 2 replies; 22+ messages in thread
From: Cyrill Gorcunov @ 2020-10-31 21:59 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Sat, Oct 31, 2020 at 01:21:34AM +0100, Vladislav Shpilevoy wrote:
> > 
> > I'll try to repeat. That;s interesting, actually I've been trying
> > to force gc to trigger but without much luck. Letme investigate
> > this.
> 
> It is not about Lua GC. The collectgarbage('collect') works fine
> and the function object is deleted. But the module is not unloaded.
> Because you never call module_sym_unload().

I'll investigate.

> > Here is a paragraph above
> > ---
> >     Once function is loaded it is possible to execute it
> >     in a traditional Lua way, ie to call it as a function.
> > 
> >     ``` Lua
> >     -- call with agruments arg1 and arg2
> >     f(arg1, arg2)
> >     ```
> > ---
> > 
> > I'm not sure what else we could put here?
> 
> This paragraph is false, it seems. Because it does not work like Lua
> functions 1-to-1. It returns nil,err or true,results. Lua functions
> return whatever you return from them so just 'results'. These cbox
> functions change the returned set a bit. And this is good, but you
> didn't document it.

Vlad, frankly I don't understand what else we can put here in documentation,
lets talk f2f about this.

> > 
> > cbox engine uses rbtree to keep functions in memory and we have
> > to initialize it first and clean up it on exit. The lua init provides
> > Lua's interface over the engine. I can move initialization to box_lua_init
> > but where to clean things up?!
> 
> Option 1 - nowhere like other Lua modules. Option 2 - add box_lua_free
> and add your cbox destruction there.

Then box_lua_free will suite us I think. I dont like that we might
exit without explicit cleanup (surely OS will clean the memory by
its own but still).

> >> but AFAIK, according to our benches, when a hash is used as an index,
> >> it is always faster than any tree on lookups. Although I don't
> >> remember benches hash vs tree out of an index. Since your are doing
> >> such dubious "optimizations", is there a bench you can show, that
> >> a tree is faster than a hash?
> > 
> > The main reason for rbtree is the memory.
> 
> Ok. Did you measure how much more efficient it is in terms of memory?
> Even in theory?

There is no need for theory. The cbox_func requires only one pointer
for tree root, and two nodes per each node. In turn our mhash requires
memoy for hash table and entries. My results shows

rbtree
==546461==    still reachable: 39,168 bytes in 256 blocks
==546461==   total heap usage: 257 allocs, 1 frees, 40,192 bytes allocated

mhash
==546471==    still reachable: 51,828 bytes in 260 blocks
==546471==   total heap usage: 273 allocs, 13 frees, 65,124 bytes allocated

almost twice in size. Note that it highly depends on number of nodes/cells.
In the example above we use 256 functions. I did a number of measurements and
hash always use more memory, this is expected because of mhash internals.

Taking into account that I don't expect that many functions to be present
in the system this might not be critical but still.

> > Initially I used our mhash
> > engine but I don't expect there be that many functions which allows
> > me to not allocate such big structure as a hash table, instead I only
> > need two pointers for lookup (ie 16 bytes per entry).
> > 
> > Do you really think it worth it to allocate hash table instead? I'm ready
> > to make lookup a bit slower just to eliminate redundant memory overhead.
> 
> Yes, I think the hash is better here. But since you said the tree
> solution is better in terms of speed because of rehash, and in terms of
> memory because of I don't know what, you gotta prove it somehow. I can
> prove my point because Alexanded had benchmarks showing hash is faster
> inside of indexes. Also there is an issue in vshard about hash index being
> better: https://github.com/tarantool/vshard/issues/128. Alexey did a bench,
> AFAIR, even though we didn't post it in the issue.
> 
> We use the hash in a lot of far more critical places in terms of perf.
> So either you are right and we need to fix them too, or we need to fix the
> hash implementation, or you are not right and we need to keep using the
> hash, when all operations are lookup. In either way we can't just say the
> tree is better for lookups and silently use it here.

Guys, you need to understand one important thing -- benchmarks shows picture
on particular data. They don't explain what is happening. The heart of mhash
engine is MurmurHash3. And it is known to work well on general data while the
distribution of incoming data do not cause too many collisions. I'll push
my benchmark into github so you could take a look (maybe I simply did some
mistakes in benchmarking https://github.com/cyrillos/cbox).

Here is an examples of "good" data, ie random "good" strings where hashing
shows a way better behaviour

(256 strings, create -- nanoseconds spent while inserting all 256 entries,
 lookup -- time spent while been looking every string from 256 entries,
 benchmark built with gcc and -O2).

[cyrill@grain cbox] ./hvsrb --nr 256 --rb --mh
rb create diff :               384353 ns
rb lookup diff :               144715 ns

mh create diff :               258046 ns
mh lookup diff :                92049 ns

Creation time is comparable while lookup almost twice faster. Note though
this is nanoseconds. And we call for lookup only when inserting a new
function. Adding and removing functions won't be a frequent case.

Now lets look into "bad" data, where each string is causing collission.

[cyrill@grain cbox] ./hvsrb --rb --mh
rb create diff :               370554 ns
rb lookup diff :               138824 ns

mh create diff :              1473945 ns
mh lookup diff :               940340 ns

Here I created 256 entries where each of it is known to be "bad" and
mh works 9 times worse. This ponits only to the fact that hashing
procedure is very sensitive to the incoming data while plain strcmp
doesn't care much and rbtree works a way more faster.

Thus when you use some hash table you must be sure that it produce
acceptable results in a worst case. And above been proven that in
worst case things are incredibly bad. I'm showing this to prove
you that hashes are not panacea. And if I have to choose between
rbtree and hashes here I choose the first because 1) it requires
less memory 2) it guarantees us acceptable results even in bad
cases. And we do not expect millions of functions thus on good
cases they work acceptably fast as well.

Still the example above is rather synthetical where incoming
data choosen the way to produce as many collisions as possible
and there is no easy way to generate such symbols in modules
(but still possible, and passing it via lua script should be
possible as well).

And I think we should consider moving to xxhash instead

git@github.com:Cyan4973/xxHash.git

which shows better result so far. Still as any noncrypto
hashes it has collissions.

Hopefully now you agree with me that hashes are not that simple
as they look.

> > I expect at worst case to have up to 1000 functions which may require up
> > to 10 attempts.  I hate (with a passon) idea that in a middle of inserting
> > new key our hash engine start a rehashing procedure.
> 
> If a user loads and unloads functions often, the rehash issue is going to
> be the least severe one, because much more time will be spent on doing
> mmap/munmap to load/unload the shared files.

Yes, and exactly for this reason I don't understand why we fight for
using hashing, while looking up via rbtree gonna be more than enough
here and saves us a bit of memory.

To be honest I'm tired already of this conversation, I think I simply
switch to use our mhash and lets go further.

> > And note that lookup happens only _once_ when you loading a new function,
> > we keep the pointer later and calling function goes directly without any
> > lookups. So I think keeping low memory usage a way more preferred.
> 
> I am going to ask for numbers, what is the memory usage difference we are
> talking about. Otherwise it all sounds sus.

Pointed above.

> > Also
> > we might consider moving memory allocation for functions in our own "small"
> > instace (where mh hashes simply use system's malloc/calloc calls).
> 
> I didn't understand this. Can you elaborate?

I mean our mhash internally use calloc calls while we probably better use
our "small" engine there. But I'm not 100% sure. We have that good own
memory manager why we use so many malloc/calloc calls it is a mistery for
me. I though that we could use "small" everywhere and a user could set
a limit over every bit of tarantool allocations.

> > And no, I didn't measure performance here because it was not a priority
> > when _inserting_ a new function. And I'm very sceptical about "we ran
> > a benchmark and it worked a way more better". We've been using both
> > trees and hashes for decades and open-hashing (like ours) works fine
> > until some moment where collision starts to happen and performace flushed
> > to the toiled because you start walking over the table causing pages
> > to swap in, massive tlb flushed and all this friends. On big data and
> > higly loaded systems it is very fragile area and there is still no
> > answer which structures are better :(
> 
> Please, go to Alexander L. He has concrete numbers how much faster hash is.
> I remember he did it at least in scope of an issue about Lua tables vs spaces
> performance.
> 
> > Sometimes if your data is small
> > enough at sits near in memory old plain linear lookup a way more
> > faster than enything else, because theory is theory but in real life
> > too many components with own limists involved.
> > 
> > Summarizing: I can easily switch back to mhash if you prefer.
> 
> I think at this point we have gone too far to simply choose one solution,
> and we need to do the comparison. Memory comparison is simple to do - insert
> the same functions into a hash, printf its size. Insert into a tree, and
> printf its size. For perf you can either write your own micro-bench or ask
> Alexander L. for existing results.

Already did. And I agree that the conversation is too long already. Lets
jump into mhash use and continue. There are more important problems in
cbox patches you pointed and mhash is the least of them.

> >>
> >> 5. Why is it ssize_t? Is there any single place, where we use ssize_t
> >> for reference counting? Why ssize_t, if it is has nothing to do with
> >> size of anything?
> >>
> >> Also it probably would be better to name it 'load_count'. Because
> >> to the end of the patch I was thinking you decided to count Lua refs
> >> too somewhy, and realized it is load count only afterwards, when
> >> started writing the comments.
> > 
> > Strictly speaking plain int should fit here better but signed size
> > just guaranteed to be large enough to cover address space.
> 
> But it is not an address either ...

I must admit you're right here.

> >> 9. We already discussed it a ton of times, literally exactly the
> >> same 'problem' with struct error objects. A 64 bit integer
> >> will not overflow for your lifetime even if it would increment
> >> every nanosecond. How is it possible? Why the code should get
> >> complicated by making a simple 'ref' function be able to fail?
> > 
> > This has nothing to do for fair use. It prevents from errors where
> > you occasionally screwed the counter.
> 
> In the code you always either increment or decrement it. How many
> lifes you will need to get it 'screwed'?

Counters are not always screwed by correct calls of inc/dec but sometime
it might be an indirrect errors due to addressing bugs and such. IOW
if we can check counters for saturation -- we should. If you really
think it has such big penalty we could put it under #ifdef.

> > I don't know why but for some
> > reason you think that counters are never screwed.
> 
> With that we can get to the point that it is not safe to use TCP,
> and we need to ensure packet order, send checksums and so on. Also
> we can't rely on the main memory stability - some electrons may leak,
> and screw a bit in a number. So we need to check each number before
> we use it.
> 
> What I think is that there is a border of sanity, when the checks
> must be stopped in order to keep the code sane and simple. An overflow
> of a 64 bit number, changed only by increments and decrements, is beyond
> that border.

Sigh, Vlad, I already saw bugs when counters were screwed due to
addressing issue and it required heck lot of time to catch why it
happened. Surely tarantool is not a kernel code and in worse situation
it simply crash but still...

To stop this controversy (because we hardly persuade each other) I simply
drop this saturation check. And surely we won't going to check every
number in memory (thanks we're not writting medical software).

> If the counter is somehow broken, it means its memory was overridden,
> and the whole process is compromised. It makes no sense to continue
> the execution. You not only continue, you also expose this error to a
> user. What is he supposed to do with that error?

He should report us. This is better than continue operating with code
which is already broken. I don't expect it ever happen but if the
test for saturation is so cheap I would save it. Whatever, I'll drop
it.

> > What we really
> > need is a general kind of kref_t and it *must* include saturation
> > check somewhere inside and panic on overflow.
> 
> This is not really applicable in Tarantool. At least not everywhere.
> Struct tuple, for instance, is referenced, but it uses a very special
> reference counter to save every single byte, because tuple count is
> millions and billions.

And I don't propose it for bigrefs but IIRC we've a number of places
where plain reference counters are used?

> >>> +/**
> >>> + * Allocate a new function instance.
> >>> + */
> >>> +static struct cbox_func *
> >>> +cbox_func_new(const char *name, size_t name_len)
> >>> +{
> >>> +	const ssize_t cf_size = sizeof(struct cbox_func);
> >>> +	ssize_t size = cf_size + name_len + 1;
> >>> +	if (size < 0) {
> >>
> >> 10. If this is intended to check for an overflow, it does not work.
> >> If I will pass name_len = UINT64_MAX (on my machine size_t == 8 bytes),
> >> then UINT64_MAX + 1 = 0, you will allocate only sizeof(), and then
> >> will do memcpy for UINT64_MAX below.
> >>
> >> I would suggest not to get crazy with such checks. You can't cover
> >> everything, and it is not needed.
> > 
> > It is a typo, it should be `if (size <= 0)`.
> 
> It won't work either. Because size won't be 0. Name_len + 1 will be,
> and size will be = sizeof(struct cbox_func).
> 
> > I think we must check
> > what users pass us and don't allow to screw a program.
> 
> Name_len is returned by lua_tolstring(). If there is a string, whose
> size overflows size_t, I am afraid the program is already screwed,
> because there is not enough memory in the world to fit such a string.

That's good point.

> 
> > So this test
> > is not crazy at all.
> 
> It really is, see above.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 3/4] box/cbox: implement cbox Lua module
  2020-10-31 21:59         ` Cyrill Gorcunov
@ 2020-11-01  8:26           ` Cyrill Gorcunov
  2020-11-02 22:25           ` Vladislav Shpilevoy
  1 sibling, 0 replies; 22+ messages in thread
From: Cyrill Gorcunov @ 2020-11-01  8:26 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Sun, Nov 01, 2020 at 12:59:46AM +0300, Cyrill Gorcunov wrote:
...
> 
> And I think we should consider moving to xxhash instead
> 
> git@github.com:Cyan4973/xxHash.git
> 
> which shows better result so far. Still as any noncrypto
> hashes it has collissions.

FWIW, I changed Murmur3/32 to xxhash/32 and here are results
---
murmur3
[cyrill@grain cbox] ./hvsrb --mh
mh create diff :               459890 ns
mh lookup diff :               194201 ns

xxhash
[cyrill@grain cbox] ./hvsrb --mh
mh create diff :               421618 ns
mh lookup diff :               123385 ns

lookup works almost twice faster on random data,
creation time is comparable.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 3/4] box/cbox: implement cbox Lua module
  2020-10-31 21:59         ` Cyrill Gorcunov
  2020-11-01  8:26           ` Cyrill Gorcunov
@ 2020-11-02 22:25           ` Vladislav Shpilevoy
  2020-11-03  7:26             ` Cyrill Gorcunov
  1 sibling, 1 reply; 22+ messages in thread
From: Vladislav Shpilevoy @ 2020-11-02 22:25 UTC (permalink / raw)
  To: Cyrill Gorcunov; +Cc: tml

On 31.10.2020 22:59, Cyrill Gorcunov wrote:
> On Sat, Oct 31, 2020 at 01:21:34AM +0100, Vladislav Shpilevoy wrote:
>>>
>>> I'll try to repeat. That;s interesting, actually I've been trying
>>> to force gc to trigger but without much luck. Letme investigate
>>> this.
>>
>> It is not about Lua GC. The collectgarbage('collect') works fine
>> and the function object is deleted. But the module is not unloaded.
>> Because you never call module_sym_unload().
> 
> I'll investigate.
> 
>>> Here is a paragraph above
>>> ---
>>>     Once function is loaded it is possible to execute it
>>>     in a traditional Lua way, ie to call it as a function.
>>>
>>>     ``` Lua
>>>     -- call with agruments arg1 and arg2
>>>     f(arg1, arg2)
>>>     ```
>>> ---
>>>
>>> I'm not sure what else we could put here?
>>
>> This paragraph is false, it seems. Because it does not work like Lua
>> functions 1-to-1. It returns nil,err or true,results. Lua functions
>> return whatever you return from them so just 'results'. These cbox
>> functions change the returned set a bit. And this is good, but you
>> didn't document it.
> 
> Vlad, frankly I don't understand what else we can put here in documentation,
> lets talk f2f about this.

Давай на русском.

Не понимаю. Как это нечего положить в доку? У тебя С функции
возвращают всегда не то, что юзер из них вернул. Ты добавляешь туда
true, или nil,err. Как до этого юзер догадается, что функция
при вызове через cbox модифицирует набор возвращаемых значений? Луа
так не делает, поэтому ты не можешь сказать, что оно работает как
Луа.

Если написать функцию, которая делает сумму двух чисел, то в Луа это
будет выглядеть так:

	local f = function(a, b) return a + b end
	assert(f(1, 2) == 3)

В С через cbox это будет написано так:

	test.c:

	int
	sum(box_function_ctx_t *ctx, const char *args, const char *args_end)
	{
		uint64_t a = mp_decode_uint(&args);
		uint64_t b = mp_decode_uint(&args);
		char res[16];
		char *end = mp_encode_uint(res, a + b);
		box_return_mp(ctx, res, end);
		return 0;
	}

Но в Lua вести оно себя будет нифига не так же:

	local f = cbox.load('test.sum')
	assert(f(1, 2) == 3) -- Это ебанет.

Потому что у тебя f будет возвращать true + result.

	local ok, res = f(1, 2)
	assert(ok)
	assert(res == 3) -- Вот это пройдет.

Это не точно как луа, и явно должно быть в документации.

Ошибки тоже надо задокументировать - что функции cbox никогда не
кидают исключения. Только возвращают nil, err.

По поводу бенчей - да, походу дерево выглядит лучше или по крайней
мере предсказуемо. Мне норм оставить дерево. По хешу я создал тикет,
так как оставлять это без действия нельзя. Если текущий хеш так
плох, его надо поменять. Либо его метод расчета, либо вообще на
дерево его заменить в некоторых других местах тоже. Но я все еще не
уверен, что твои данные для "худшего случая" вообще возможны в
реальности, на типичных именах функций, например. Так что это надо
еще до-проверять.

https://github.com/tarantool/tarantool/issues/5492

>>>> 9. We already discussed it a ton of times, literally exactly the
>>>> same 'problem' with struct error objects. A 64 bit integer
>>>> will not overflow for your lifetime even if it would increment
>>>> every nanosecond. How is it possible? Why the code should get
>>>> complicated by making a simple 'ref' function be able to fail?
>>>
>>> This has nothing to do for fair use. It prevents from errors where
>>> you occasionally screwed the counter.
>>
>> In the code you always either increment or decrement it. How many
>> lifes you will need to get it 'screwed'?
> 
> Counters are not always screwed by correct calls of inc/dec but sometime
> it might be an indirrect errors due to addressing bugs and such. IOW
> if we can check counters for saturation -- we should. If you really
> think it has such big penalty we could put it under #ifdef.

Это будет еще хуже с ifdef. Все равно придется проверять, что ref не
вернул -1, что выглядит супер стремно.

>>> What we really
>>> need is a general kind of kref_t and it *must* include saturation
>>> check somewhere inside and panic on overflow.
>>
>> This is not really applicable in Tarantool. At least not everywhere.
>> Struct tuple, for instance, is referenced, but it uses a very special
>> reference counter to save every single byte, because tuple count is
>> millions and billions.
> 
> And I don't propose it for bigrefs but IIRC we've a number of places
> where plain reference counters are used?

Таких мест точно не много, но да, есть структуры, которые рефаются
обычными интами. Все из-за того, что рефы очень не нравились Косте,
и их супер мало применяли. Я не найду сходу сейчас.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 3/4] box/cbox: implement cbox Lua module
  2020-11-02 22:25           ` Vladislav Shpilevoy
@ 2020-11-03  7:26             ` Cyrill Gorcunov
  0 siblings, 0 replies; 22+ messages in thread
From: Cyrill Gorcunov @ 2020-11-03  7:26 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Mon, Nov 02, 2020 at 11:25:53PM +0100, Vladislav Shpilevoy wrote:
...
> > 
> > Vlad, frankly I don't understand what else we can put here in documentation,
> > lets talk f2f about this.
> 
> Давай на русском.
> 
> Не понимаю. Как это нечего положить в доку? У тебя С функции
> возвращают всегда не то, что юзер из них вернул. Ты добавляешь туда
> true, или nil,err. Как до этого юзер догадается, что функция
> при вызове через cbox модифицирует набор возвращаемых значений? Луа
> так не делает, поэтому ты не можешь сказать, что оно работает как
> Луа.
...
> 
> Это не точно как луа, и явно должно быть в документации.

Thanks a huge, Vlad! Now I see what you mean.

> Ошибки тоже надо задокументировать - что функции cbox никогда не
> кидают исключения. Только возвращают nil, err.

+1

> 
> По поводу бенчей - да, походу дерево выглядит лучше или по крайней
> мере предсказуемо. Мне норм оставить дерево. По хешу я создал тикет,
> так как оставлять это без действия нельзя. Если текущий хеш так
> плох, его надо поменять. Либо его метод расчета, либо вообще на
> дерево его заменить в некоторых других местах тоже. Но я все еще не
> уверен, что твои данные для "худшего случая" вообще возможны в
> реальности, на типичных именах функций, например. Так что это надо
> еще до-проверять.
> 
> https://github.com/tarantool/tarantool/issues/5492

Я эту багу себе взял. И да "худший случай" что я привел -- он синтетический.
Я специально использовал генератор коллизий, поэтому хэш настолько просел.
В нормальных условиях конечно не будет такого потока коллизий. В любом случае
xxhash выглядит лучше, так что посмотрим повнимательней.

> >>>> 9. We already discussed it a ton of times, literally exactly the
> >>>> same 'problem' with struct error objects. A 64 bit integer
> >>>> will not overflow for your lifetime even if it would increment
> >>>> every nanosecond. How is it possible? Why the code should get
> >>>> complicated by making a simple 'ref' function be able to fail?
> >>>
> >>> This has nothing to do for fair use. It prevents from errors where
> >>> you occasionally screwed the counter.
> >>
> >> In the code you always either increment or decrement it. How many
> >> lifes you will need to get it 'screwed'?
> > 
> > Counters are not always screwed by correct calls of inc/dec but sometime
> > it might be an indirrect errors due to addressing bugs and such. IOW
> > if we can check counters for saturation -- we should. If you really
> > think it has such big penalty we could put it under #ifdef.
> 
> Это будет еще хуже с ifdef. Все равно придется проверять, что ref не
> вернул -1, что выглядит супер стремно.

We could wrap it with inline but thnking more I guess we can simply drop
this all saturation idea for now. Let forget.
 
> >>> What we really
> >>> need is a general kind of kref_t and it *must* include saturation
> >>> check somewhere inside and panic on overflow.
> >>
> >> This is not really applicable in Tarantool. At least not everywhere.
> >> Struct tuple, for instance, is referenced, but it uses a very special
> >> reference counter to save every single byte, because tuple count is
> >> millions and billions.
> > 
> > And I don't propose it for bigrefs but IIRC we've a number of places
> > where plain reference counters are used?
> 
> Таких мест точно не много, но да, есть структуры, которые рефаются
> обычными интами. Все из-за того, что рефы очень не нравились Косте,
> и их супер мало применяли. Я не найду сходу сейчас.

OK 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 3/4] box/cbox: implement cbox Lua module
  2020-10-29 22:15   ` Vladislav Shpilevoy
  2020-10-30 12:51     ` Cyrill Gorcunov
@ 2020-11-12 22:54     ` Vladislav Shpilevoy
  2020-11-13 18:30       ` Cyrill Gorcunov
  1 sibling, 1 reply; 22+ messages in thread
From: Vladislav Shpilevoy @ 2020-11-12 22:54 UTC (permalink / raw)
  To: Cyrill Gorcunov, tml

On 29.10.2020 23:15, Vladislav Shpilevoy wrote:
> Thanks for the patch!
> 
> The module leaks somewhere. It never unloads the shared file. I
> tried this:
> 
> 	cbox = require('cbox')
> 	f = cbox.func.load('function1.test_push')
> 	f()
> 	f = nil
> 	cbox.func.unload('function1.test_push')
> 	collectgarbage('collect')
> 
> Then I use 'lsof -p' to see what files are loaded, and I see
> function1.dylib here. When I do all the same using _func space
> and box.func.call, the file is unloaded correctly.
> 
> (function1.dylib I took from test/box/.)

This bug still is not fixed. You said

	I'll try to repeat.

But the module is not unloaded on the same test above in v10.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Tarantool-patches] [PATCH v8 3/4] box/cbox: implement cbox Lua module
  2020-11-12 22:54     ` Vladislav Shpilevoy
@ 2020-11-13 18:30       ` Cyrill Gorcunov
  0 siblings, 0 replies; 22+ messages in thread
From: Cyrill Gorcunov @ 2020-11-13 18:30 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tml

On Thu, Nov 12, 2020 at 11:54:00PM +0100, Vladislav Shpilevoy wrote:
> 
> This bug still is not fixed. You said
> 
> 	I'll try to repeat.
> 
> But the module is not unloaded on the same test above in v10.

There have been two issues with refs to the library -- one with
error in symbol resolving procedure, and the second -- this one.

The first issue now patches (the patch is floating around) but
this one I tried to repeat and it didn't trigger (I suspect there
was some mistake on my side) and then I managed to forget about
it. Right now I tried again and indeed it triggers. Thanks for
notifiction!

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-11-13 18:30 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-14 13:35 [Tarantool-patches] [PATCH v8 0/4] box/cbox: implement cfunc Lua module Cyrill Gorcunov
2020-10-14 13:35 ` [Tarantool-patches] [PATCH v8 1/4] box/func: factor out c function entry structure Cyrill Gorcunov
2020-10-29 22:15   ` Vladislav Shpilevoy
2020-10-30  9:51     ` Cyrill Gorcunov
2020-10-31  0:13       ` Vladislav Shpilevoy
2020-10-31 15:27         ` Cyrill Gorcunov
2020-10-14 13:35 ` [Tarantool-patches] [PATCH v8 2/4] module_cache: move module handling into own subsystem Cyrill Gorcunov
2020-10-29 22:15   ` Vladislav Shpilevoy
2020-10-30 10:15     ` Cyrill Gorcunov
2020-10-31  0:15       ` Vladislav Shpilevoy
2020-10-31 15:29         ` Cyrill Gorcunov
2020-10-14 13:35 ` [Tarantool-patches] [PATCH v8 3/4] box/cbox: implement cbox Lua module Cyrill Gorcunov
2020-10-29 22:15   ` Vladislav Shpilevoy
2020-10-30 12:51     ` Cyrill Gorcunov
2020-10-31  0:21       ` Vladislav Shpilevoy
2020-10-31 21:59         ` Cyrill Gorcunov
2020-11-01  8:26           ` Cyrill Gorcunov
2020-11-02 22:25           ` Vladislav Shpilevoy
2020-11-03  7:26             ` Cyrill Gorcunov
2020-11-12 22:54     ` Vladislav Shpilevoy
2020-11-13 18:30       ` Cyrill Gorcunov
2020-10-14 13:35 ` [Tarantool-patches] [PATCH v8 4/4] test: box/cfunc -- add simple module test Cyrill Gorcunov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox