Hi!

Thank you for the patch, it LGTM.

Best regards,
Sergos 


Friday, 2 October 2020, 17:49 +0300 from Igor Munkin <imun@tarantool.org>:
While a GC hook (i.e. a __gc metamethod) is running, the garbage
collector engine is "stopped": the memory penalty threshold is set to
LJ_MAX_MEM, so the incremental GC step is not triggered. Ergo, yielding
the execution in the finalizer body leaves the platform running with
LuaJIT GC disabled. It is not re-enabled until the yielded fiber gets
the execution back.

This changeset extends the <cord_on_yield> routine with a check whether
a GC hook is active. If the switch-over occurs in the scope of a __gc
metamethod, the platform calls the panic routine and then stops its
execution with EXIT_FAILURE.
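
The check described above can be modelled outside of Tarantool with a
few lines of C. This is only a sketch: the flag values are assumed to
mirror the hook flags in LuaJIT's lj_obj.h, and <fake_global_state> and
<yield_is_forbidden> are hypothetical names introduced for the
illustration, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed values: they are meant to mirror the hook flags from
 * LuaJIT's lj_obj.h and are redefined here so that the sketch
 * builds on its own.
 */
#define HOOK_ACTIVE 0x10
#define HOOK_GC     0x40

/* Hypothetical stand-in for the <hookmask> field of global_State. */
struct fake_global_state {
	uint8_t hookmask;
};

/*
 * Returns non-zero if a fiber switch has to be rejected. The mask
 * expression is the same as the one used in the patch.
 */
static int
yield_is_forbidden(const struct fake_global_state *g)
{
	return (g->hookmask & (HOOK_ACTIVE | HOOK_GC)) != 0;
}
```

Since the expression is a bitwise AND against both flags, it is
non-zero when either of them is set. If only finalizers are supposed to
be caught, testing HOOK_GC alone would be the stricter variant.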

Relates to #4518
Follows up #4727

Signed-off-by: Igor Munkin <imun@tarantool.org>
---

Vlad introduced the internal interface and the local internal background
fiber in the scope of 8443bd9 ("fiber: introduce schedule_task() internal
function") to postpone any yielding finalization (e.g. 3d5b4da ("fio:
close unused descriptors automatically") and f073834 ("swim: use
fiber._internal.schedule_task() for GC")). After this patch is merged we
need to update the docs and give users a correct scenario to detect and
fix yielding finalizers.
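
The idea of postponing a yielding finalization can be sketched in plain
C as follows. This is a simplified model under stated assumptions, not
the Tarantool implementation (which lives in Lua): <task_queue>,
<defer_task>, <run_pending_tasks> and <mark_closed> are hypothetical
names. The finalizer only records the work, and a worker runs it later
in a context where yielding is allowed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task record: a callback plus its argument. */
struct task {
	void (*func)(void *arg);
	void *arg;
};

#define QUEUE_CAPACITY 16

/* Hypothetical fixed-size FIFO drained by a background worker. */
struct task_queue {
	struct task tasks[QUEUE_CAPACITY];
	size_t head;
	size_t count;
};

/*
 * Called from a finalizer: only records the work and never yields.
 * Returns 0 on success, -1 if the queue is full.
 */
static int
defer_task(struct task_queue *q, void (*func)(void *), void *arg)
{
	if (q->count == QUEUE_CAPACITY)
		return -1;
	size_t tail = (q->head + q->count) % QUEUE_CAPACITY;
	q->tasks[tail].func = func;
	q->tasks[tail].arg = arg;
	q->count++;
	return 0;
}

/*
 * Called by the worker outside of any finalizer. Runs all pending
 * tasks in FIFO order and returns how many were executed.
 */
static size_t
run_pending_tasks(struct task_queue *q)
{
	size_t done = 0;
	while (q->count > 0) {
		struct task t = q->tasks[q->head];
		q->head = (q->head + 1) % QUEUE_CAPACITY;
		q->count--;
		t.func(t.arg);
		done++;
	}
	return done;
}

/* Example task: stands in for a yielding operation such as close(). */
static void
mark_closed(void *arg)
{
	*(int *)arg = 1;
}
```

With this shape a finalizer would call defer_task(&q, mark_closed, &flag)
and return immediately; the flag changes only once the worker calls
run_pending_tasks(&q).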

Here are also the benchmark results for the Release build:
* Vanilla (2711797) -> Patched (61072ba) (min, median, mean, max):
| fibers: 10;    iters: 100       0%   0%   0%   2%
| fibers: 10;    iters: 1000     -3%   0%   0%   1%
| fibers: 10;    iters: 10000    -3%   0%  -1%  -2%
| fibers: 10;    iters: 100000    0%   0%   0%   1%
| fibers: 100;   iters: 100      -1%  -2%  -2%  -9%
| fibers: 100;   iters: 1000      0%   0%   0%   5%
| fibers: 100;   iters: 10000     0%   0%   0%   0%
| fibers: 100;   iters: 100000    0%   0%   0%  -3%
| fibers: 1000;  iters: 100       0%   0%   0%   0%
| fibers: 1000;  iters: 1000      0%   0%   0%   0%
| fibers: 1000;  iters: 10000     0%   0%   0%   1%
| fibers: 1000;  iters: 100000    0%   0%   0%   2%
| fibers: 10000; iters: 100       0%  -3%  -1%  -2%
| fibers: 10000; iters: 1000      0%  -2%  -3%  -2%
| fibers: 10000; iters: 10000     0%   0%   0%  -2%
| fibers: 10000; iters: 100000    0%  -3%  -3%  -6%

 src/lua/utils.c | 26 ++++++++++++++-
 test/app-tap/yield-in-gc-finalizer.test.lua | 36 +++++++++++++++++++++
 2 files changed, 61 insertions(+), 1 deletion(-)
 create mode 100755 test/app-tap/yield-in-gc-finalizer.test.lua

diff --git a/src/lua/utils.c b/src/lua/utils.c
index bb2287162..399bec6c6 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1324,7 +1324,8 @@ tarantool_lua_utils_init(struct lua_State *L)
  * the running fiber yields the execution.
  * Since Tarantool fibers don't switch-over the way Lua coroutines
  * do the platform ought to notify JIT engine when one lua_State
- * substitutes another one.
+ * substitutes another one. Furthermore fiber switch is forbidden
+ * when GC hook (i.e. __gc metamethod) is running.
  */
 void cord_on_yield(void)
 {
@@ -1355,4 +1356,27 @@ void cord_on_yield(void)
  * lead to a failure on any next compiler phase.
  */
  lj_trace_abort(g);
+
+ /*
+ * XXX: While running GC hook (i.e. __gc metamethod)
+ * garbage collector is formally "stopped" since the
+ * memory penalty threshold is set to its maximum value,
+ * ergo incremental GC step is not triggered. Thereby,
+ * yielding the execution at this point leads to further
+ * running platform with disabled LuaJIT GC. The fiber
+ * doesn't get the execution back until it's ready, so
+ * in pessimistic scenario LuaJIT OOM might occur
+ * earlier. As a result fiber switch is prohibited when
+ * GC hook is active and the platform is forced to stop.
+ */
+ if (unlikely(g->hookmask & (HOOK_ACTIVE|HOOK_GC))) {
+ struct lua_State *L = fiber()->storage.lua.stack;
+ assert(L != NULL);
+ lua_pushfstring(L, "fiber %d is switched while running GC"
+ " finalizer (i.e. __gc metamethod)",
+ fiber()->fid);
+ if (g->panic)
+ g->panic(L);
+ exit(EXIT_FAILURE);
+ }
 }
diff --git a/test/app-tap/yield-in-gc-finalizer.test.lua b/test/app-tap/yield-in-gc-finalizer.test.lua
new file mode 100755
index 000000000..a7e173721
--- /dev/null
+++ b/test/app-tap/yield-in-gc-finalizer.test.lua
@@ -0,0 +1,36 @@
+#!/usr/bin/env tarantool
+
+if #arg == 0 then
+ local tap = require('tap')
+ local test = tap.test('test')
+
+ test:plan(1)
+
+ -- XXX: Shell argument <test> is necessary to distinguish the
+ -- test case from the test runner.
+ local cmd = string.gsub('<LUABIN> 2>/dev/null <SCRIPT> test', '%<(%w+)>', {
+ LUABIN = arg[-1],
+ SCRIPT = arg[0],
+ })
+ test:isnt(os.execute(cmd), 0, 'fiber.yield is forbidden in __gc')
+
+ os.exit(test:check() and 0 or 1)
+end
+
+
+-- Test body.
+
+local ffi = require('ffi')
+local fiber = require('fiber')
+
+ffi.cdef('struct test { int foo; };')
+
+local test = ffi.metatype('struct test', {
+ __gc = function() fiber.yield() end,
+})
+
+local t = test(9)
+t = nil
+
+-- This call leads to the platform panic.
+collectgarbage('collect')
--
2.25.0