Tarantool development patches archive
 help / color / mirror / Atom feed
From: Cyrill Gorcunov via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: tml <tarantool-patches@dev.tarantool.org>
Cc: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: [Tarantool-patches] [PATCH v14 10/12] module_cache: provide module_load/unload calls
Date: Wed,  3 Feb 2021 01:12:05 +0300	[thread overview]
Message-ID: <20210202221207.383101-11-gorcunov@gmail.com> (raw)
In-Reply-To: <20210202221207.383101-1-gorcunov@gmail.com>

These calls provide an ability to load and unload modules
taking into account cache invalidation. In particular if
module file is changed on disk then all previous copies
become invalid and any attempt to resolve a function symbol
in such copies will fail. Still any existing instances will
be able to run until explicitly unloaded. This is done for
sake of new upcoming `cmod` module (will be implemented
in next patch).

Need to mention some moments here. Currently we already have
`box.schema.func` interface which is strange designed but can't
be dropped off without breaking backward compatibility. Very
inconvenient moment here is lazy symbols binding, iow we can
create a function but until we execute it the module won't be
read and parsed.

Moreover one can execute the function then overwrite the module
on disk and define an another function from the same module and
run it. This new function will be read from previously cached copy
and one have to run module reload procedure explicitly to update
the module.

Module update on its own is a very fragile procedure: if
there are running functions which are yielding then old
module won't be reload until execution is complete even
if explicit reload is called. And in case if function
return signature is changed (say occasionally) this will
lead to unpredicted results.

In summary: new module_load/unload API with cache
invalidation is incompatible with old `box.schema.func`
thus to keep backward compatibility (until we deprecate
old API) we keep modules in two separate caches: one
for module_load/unload and one for `box.schema.func`.
Later we will keep only new API and get rid of old
code.

Part-of #4642

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 src/box/module_cache.c | 179 +++++++++++++++++++++++++++++++++++------
 src/box/module_cache.h |  27 +++++++
 2 files changed, 181 insertions(+), 25 deletions(-)

diff --git a/src/box/module_cache.c b/src/box/module_cache.c
index f9ed6e7d8..aba55bbe3 100644
--- a/src/box/module_cache.c
+++ b/src/box/module_cache.c
@@ -19,11 +19,33 @@
 #include "error.h"
 #include "lua/utils.h"
 #include "libeio/eio.h"
+#include "trivia/util.h"
 
 #include "module_cache.h"
 
-/** Modules name to descriptor hash. */
+/**
+ * Modules names to descriptor hashes. The first one
+ * for modules created with old `box.schema.func`
+ * interface.
+ *
+ * Here is an important moment for backward compatibility.
+ * The `box.schema.func` operations always use cache and
+ * if a module is updated on a storage device or even
+ * no longer present, then lazy symbol resolving is done
+ * via previously loaded copy. To update modules one have
+ * to reload them manually.
+ *
+ * In turn new API implies to use module_load/unload explicit
+ * interface, and when module is re-loaded from cache then
+ * we make a cache validation to be sure the copy on storage
+ * is up to date.
+ *
+ * Due to all this we have to keep two hash tables. Probably
+ * we should deprecate explicit reload at all and require
+ * manual load/unload instead. But later.
+ */
 static struct mh_strnptr_t *box_schema_hash = NULL;
+static struct mh_strnptr_t *mod_hash = NULL;
 
 /**
  * Parsed symbol and package names.
@@ -52,7 +74,7 @@ struct func_name {
 static inline struct mh_strnptr_t *
 hash_tbl(bool is_box_schema)
 {
-	return is_box_schema ? box_schema_hash : NULL;
+	return is_box_schema ? box_schema_hash : mod_hash;
 }
 
 /***
@@ -289,13 +311,9 @@ module_unref(struct module *module)
  * for cases of a function reload.
  */
 static struct module *
-module_load(struct mh_strnptr_t *h, const char *package,
-	    const char *package_end)
+module_new(const char *path, struct mh_strnptr_t *h,
+	   const char *package, const char *package_end)
 {
-	char path[PATH_MAX];
-	if (module_find(package, package_end, path, sizeof(path)) != 0)
-		return NULL;
-
 	int package_len = package_end - package;
 	struct module *module = malloc(sizeof(*module) + package_len + 1);
 	if (module == NULL) {
@@ -334,8 +352,8 @@ module_load(struct mh_strnptr_t *h, const char *package,
 		goto error;
 	}
 
-	struct stat st;
-	if (stat(path, &st) < 0) {
+	struct stat *st = &module->st;
+	if (stat(path, st) < 0) {
 		diag_set(SystemError, "failed to stat() module %s", path);
 		goto error;
 	}
@@ -347,7 +365,7 @@ module_load(struct mh_strnptr_t *h, const char *package,
 	}
 
 	int dest_fd = open(load_name, O_WRONLY | O_CREAT | O_TRUNC,
-			   st.st_mode & (S_IRWXU | S_IRWXG | S_IRWXO));
+			   st->st_mode & (S_IRWXU | S_IRWXG | S_IRWXO));
 	if (dest_fd < 0) {
 		diag_set(SystemError, "failed to open file %s for writing ",
 			 load_name);
@@ -355,10 +373,10 @@ module_load(struct mh_strnptr_t *h, const char *package,
 		goto error;
 	}
 
-	off_t ret = eio_sendfile_sync(dest_fd, source_fd, 0, st.st_size);
+	off_t ret = eio_sendfile_sync(dest_fd, source_fd, 0, st->st_size);
 	close(source_fd);
 	close(dest_fd);
-	if (ret != st.st_size) {
+	if (ret != st->st_size) {
 		diag_set(SystemError, "failed to copy DSO %s to %s",
 			 path, load_name);
 		goto error;
@@ -405,6 +423,24 @@ module_sym_load(struct module_sym *mod_sym, bool is_box_schema)
 {
 	assert(mod_sym->addr == NULL);
 
+	/*
+	 * When module is created via new interface and
+	 * cached, it might be updated on disk afterward
+	 * so that symbols might not longer match the loaded
+	 * instance. Thus new resolving on same module requires
+	 * the whole module explicit reload.
+	 */
+	if (!is_box_schema) {
+		struct module *m = mod_sym->module;
+		assert(m != NULL);
+		if (module_is_orphan(m)) {
+			diag_set(ClientError, ER_UNSUPPORTED,
+				 "module updated on disk,",
+				 "needs whole reload");
+			return -1;
+		}
+	}
+
 	struct func_name name;
 	func_split_name(mod_sym->name, &name);
 
@@ -417,7 +453,12 @@ module_sym_load(struct module_sym *mod_sym, bool is_box_schema)
 	struct mh_strnptr_t *h = hash_tbl(is_box_schema);
 	cached = module_cache_find(h, name.package, name.package_end);
 	if (cached == NULL) {
-		module = module_load(h, name.package, name.package_end);
+		char path[PATH_MAX];
+		if (module_find(name.package, name.package_end,
+				path, sizeof(path)) != 0) {
+			return -1;
+		}
+		module = module_new(path, h, name.package, name.package_end);
 		if (module == NULL)
 			return -1;
 		if (module_cache_add(module) != 0) {
@@ -435,7 +476,9 @@ module_sym_load(struct module_sym *mod_sym, bool is_box_schema)
 		return -1;
 	}
 
-	mod_sym->module = module;
+	if (is_box_schema)
+		mod_sym->module = module;
+
 	rlist_add(&module->funcs_list, &mod_sym->item);
 	return 0;
 }
@@ -514,6 +557,72 @@ module_sym_call(struct module_sym *mod_sym, struct port *args,
 	return rc;
 }
 
+struct module *
+module_load(const char *package, const char *package_end)
+{
+	char path[PATH_MAX];
+	if (module_find(package, package_end, path, sizeof(path)) != 0)
+		return NULL;
+
+	struct module *cached, *module;
+	struct mh_strnptr_t *h = hash_tbl(false);
+	cached = module_cache_find(h, package, package_end);
+	if (cached == NULL) {
+		module = module_new(path, h, package, package_end);
+		if (module == NULL)
+			return NULL;
+		if (module_cache_add(module) != 0) {
+			module_unref(module);
+			return NULL;
+		}
+		return module;
+	}
+
+	struct stat st;
+	if (stat(path, &st) != 0) {
+		diag_set(SystemError, "module: stat() module %s", path);
+		return NULL;
+	}
+
+	/*
+	 * When module comes from cache make sure that
+	 * it is not changed on the storage device. The
+	 * test below still can miss update if cpu data
+	 * been manually moved backward and device/inode
+	 * persisted but this is a really rare situation.
+	 */
+	if (cached->st.st_dev == st.st_dev &&
+	    cached->st.st_ino == st.st_ino &&
+	    cached->st.st_size == st.st_size &&
+	    memcmp(&cached->st.st_mtim, &st.st_mtim,
+		   sizeof(st.st_mtim)) == 0) {
+		module_ref(cached);
+		return cached;
+	}
+
+	/*
+	 * Load a new module, update the cache and
+	 * zap old module: every attempt to resolve
+	 * symbols on old instance won't success.
+	 */
+	module = module_new(path, h, package, package_end);
+	if (module == NULL)
+		return NULL;
+	if (module_cache_update(module) != 0) {
+		module_unref(module);
+		return NULL;
+	}
+
+	module_set_orphan(cached);
+	return module;
+}
+
+void
+module_unload(struct module *module)
+{
+	module_unref(module);
+}
+
 int
 module_reload(const char *package, const char *package_end)
 {
@@ -529,7 +638,11 @@ module_reload(const char *package, const char *package_end)
 		return -1;
 	}
 
-	new = module_load(box_schema_hash, package, package_end);
+	char path[PATH_MAX];
+	if (module_find(package, package_end, path, sizeof(path)) != 0)
+		return -1;
+
+	new = module_new(path, box_schema_hash, package, package_end);
 	if (new == NULL)
 		return -1;
 
@@ -606,11 +719,21 @@ module_reload(const char *package, const char *package_end)
 int
 module_init(void)
 {
-	box_schema_hash = mh_strnptr_new();
-	if (box_schema_hash == NULL) {
-		diag_set(OutOfMemory, sizeof(*box_schema_hash),
-			 "malloc", "modules box_schema_hash");
-		return -1;
+	struct mh_strnptr_t **ht[] = {
+		&box_schema_hash,
+		&mod_hash,
+	};
+	for (size_t i = 0; i < lengthof(ht); i++) {
+		*ht[i] = mh_strnptr_new();
+		if (*ht[i] == NULL) {
+			diag_set(OutOfMemory, sizeof(*ht[i]),
+				 "malloc", "modules hash");
+			for (ssize_t j = i - 1; j >= 0; j--) {
+				mh_strnptr_delete(*ht[j]);
+				*ht[j] = NULL;
+			}
+			return -1;
+		}
 	}
 	return 0;
 }
@@ -618,12 +741,18 @@ module_init(void)
 void
 module_free(void)
 {
-	struct mh_strnptr_t *h = box_schema_hash;
-	while (mh_size(h) > 0) {
+	struct mh_strnptr_t **ht[] = {
+		&box_schema_hash,
+		&mod_hash,
+	};
+	for (size_t i = 0; i < lengthof(ht); i++) {
+		struct mh_strnptr_t *h = *ht[i];
+
 		mh_int_t i = mh_first(h);
 		struct module *m = mh_strnptr_node(h, i)->val;
 		module_unref(m);
+
+		mh_strnptr_delete(h);
+		*ht[i] = NULL;
 	}
-	mh_strnptr_delete(box_schema_hash);
-	box_schema_hash = NULL;
 }
diff --git a/src/box/module_cache.h b/src/box/module_cache.h
index 875f2eb3c..ba2cbc7dd 100644
--- a/src/box/module_cache.h
+++ b/src/box/module_cache.h
@@ -6,6 +6,10 @@
 
 #pragma once
 
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 #include "small/rlist.h"
 
 #if defined(__cplusplus)
@@ -48,6 +52,10 @@ struct module {
 	 * Count of active references to the module.
 	 */
 	int64_t refs;
+	/**
+	 * Storage stat for identity check.
+	 */
+	struct stat st;
 	/**
 	 * Module's package name.
 	 */
@@ -112,6 +120,25 @@ int
 module_sym_call(struct module_sym *mod_sym, struct port *args,
 		struct port *ret);
 
+/**
+ * Load new module instance.
+ *
+ * @param package shared library path start.
+ * @param package_end shared library path end.
+ *
+ * @return 0 on succes, -1 otherwise, diag is set.
+ */
+struct module *
+module_load(const char *package, const char *package_end);
+
+/**
+ * Unload module instance.
+ *
+ * @param module instance to unload.
+ */
+void
+module_unload(struct module *module);
+
 /**
  * Reload a module and all associated symbols.
  *
-- 
2.29.2


  parent reply	other threads:[~2021-02-02 22:17 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-02 22:11 [Tarantool-patches] [PATCH v14 00/12] box: implement cmod Lua module Cyrill Gorcunov via Tarantool-patches
2021-02-02 22:11 ` [Tarantool-patches] [PATCH v14 01/12] box/func: factor out c function entry structure Cyrill Gorcunov via Tarantool-patches
2021-02-02 22:11 ` [Tarantool-patches] [PATCH v14 02/12] module_cache: move module handling into own subsystem Cyrill Gorcunov via Tarantool-patches
2021-02-02 22:11 ` [Tarantool-patches] [PATCH v14 03/12] module_cache: direct update a cache value on reload Cyrill Gorcunov via Tarantool-patches
2021-02-02 22:11 ` [Tarantool-patches] [PATCH v14 04/12] module_cache: rename calls to ref in module structure Cyrill Gorcunov via Tarantool-patches
2021-02-02 22:12 ` [Tarantool-patches] [PATCH v14 05/12] module_cache: add comment about weird resolving Cyrill Gorcunov via Tarantool-patches
2021-02-02 22:12 ` [Tarantool-patches] [PATCH v14 06/12] module_cache: module_reload - drop redundant parameter Cyrill Gorcunov via Tarantool-patches
2021-02-02 22:12 ` [Tarantool-patches] [PATCH v14 07/12] module_cache: use references as a main usage counter Cyrill Gorcunov via Tarantool-patches
2021-02-02 22:12 ` [Tarantool-patches] [PATCH v14 08/12] module_cache: make module to carry hash it belongs to Cyrill Gorcunov via Tarantool-patches
2021-02-02 22:12 ` [Tarantool-patches] [PATCH v14 09/12] module_cache: use own hash for box.schema.func requests Cyrill Gorcunov via Tarantool-patches
2021-02-02 22:12 ` Cyrill Gorcunov via Tarantool-patches [this message]
2021-02-02 22:12 ` [Tarantool-patches] [PATCH v14 11/12] box/cmod: implement cmod Lua module Cyrill Gorcunov via Tarantool-patches
2021-02-02 22:12 ` [Tarantool-patches] [PATCH v14 12/12] test: box/cfunc -- add cmod test Cyrill Gorcunov via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210202221207.383101-11-gorcunov@gmail.com \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=gorcunov@gmail.com \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v14 10/12] module_cache: provide module_load/unload calls' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox