[Tarantool-patches] [PATCH v2 2/2] lua: abort trace recording on fiber yield
Igor Munkin
imun at tarantool.org
Wed Sep 23 22:06:19 MSK 2020
Since Tarantool fibers don't respect Lua coroutine switch mechanism, JIT
machinery stays unnotified when one lua_State substitutes another one.
As a result if trace recording hasn't been aborted prior to fiber
switch, the recording proceeds using the new lua_State and leads to a
failure either on any further compiler phase or while the compiled trace
is executed.
This changeset extends <cord_on_yield> routine aborting trace recording
when the fiber switches to another one. If the switch-over occurs while
mcode is being run the platform finishes its execution with EXIT_FAILURE
code and calls panic routine prior to the exit.
Closes #1700
Fixes #4491
Signed-off-by: Igor Munkin <imun at tarantool.org>
---
Here are also the benchmark results for the Release build:
* Vanilla (8477b6c) -> Patched (f1026a1) (min, median, mean, max):
| fibers: 10; iters: 100 0% 0% 0% 0%
| fibers: 10; iters: 1000 2% 2% 1% 2%
| fibers: 10; iters: 10000 6% 4% 5% 5%
| fibers: 10; iters: 100000 4% 3% 4% 1%
| fibers: 100; iters: 100 1% 0% 1% 10%
| fibers: 100; iters: 1000 2% 2% 0% -7%
| fibers: 100; iters: 10000 2% 1% 2% 1%
| fibers: 100; iters: 100000 2% 2% 2% 5%
| fibers: 1000; iters: 100 1% 0% 0% -2%
| fibers: 1000; iters: 1000 1% 1% 1% 0%
| fibers: 1000; iters: 10000 1% 1% 1% 2%
| fibers: 1000; iters: 100000 1% 1% 1% 1%
| fibers: 10000; iters: 100 0% 0% 0% 1%
| fibers: 10000; iters: 1000 0% 2% 3% 3%
| fibers: 10000; iters: 10000 0% 0% 0% -2%
| fibers: 10000; iters: 100000 0% 1% 0% -4%
src/lua/utils.c | 49 +++++++++++++++++
...-4491-coio-wait-leads-to-segfault.test.lua | 53 +++++++++++++++++++
2 files changed, 102 insertions(+)
create mode 100755 test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua
diff --git a/src/lua/utils.c b/src/lua/utils.c
index 8e98f7607..39fe4e30c 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -29,6 +29,7 @@
* SUCH DAMAGE.
*/
#include "lua/utils.h"
+#include <lj_trace.h>
#include <assert.h>
#include <errno.h>
@@ -1309,10 +1310,57 @@ tarantool_lua_utils_init(struct lua_State *L)
return 0;
}
+/*
+ * XXX: There is already defined <panic> macro in say.h header
+ * (included in diag.h). As a result the call below is misexpanded
+ * and compilation fails with the corresponding error. To avoid
+ * this error the macro is temporary renamed and restored later.
+ * Compilation now fails for the following cases:
+ * * temporary name <_panic> is used in this translation unit
+ * * <panic> is not define, so this hack can be freely dropped
+ */
+#if defined(panic) && !defined(_panic)
+# define _panic panic
+# undef panic
+#else
+# error "Can't redefine <panic> macro"
+#endif
+
/**
* This routine encloses the checks and actions to be done when
* the running fiber yields the execution.
+ * Since Tarantool fibers don't switch-over the way Lua coroutines
+ * do the platform ought to notify JIT engine when one lua_State
+ * substitutes another one.
*/
void cord_on_yield(void)
{
+ struct global_State *g = G(tarantool_L);
+ /*
+ * XXX: Switching fibers while running the trace leads to
+ * code misbehaviour and failures, so stop its execution.
+ */
+ if (unlikely(tvref(g->jit_base))) {
+ /*
+ * XXX: mcode is executed only in scope of Lua
+ * world and one can obtain the corresponding Lua
+ * coroutine from the fiber storage.
+ */
+ struct lua_State *L = fiber()->storage.lua.stack;
+ assert(L != NULL);
+ lua_pushstring(L, "Fiber is switched on the trace");
+ if (g->panic)
+ g->panic(L);
+ exit(EXIT_FAILURE);
+ }
+ /*
+ * Unconditionally abort trace recording whether fibers
+ * switch each other. Otherwise, further compilation may
+ * lead to a failure on any next compiler phase.
+ */
+ lj_trace_abort(g);
}
+
+/* Restore <panic> macro back */
+#define panic _panic
+#undef _panic
diff --git a/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua
new file mode 100755
index 000000000..0dd8dfbee
--- /dev/null
+++ b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua
@@ -0,0 +1,53 @@
+#!/usr/bin/env tarantool
+
+local tap = require('tap')
+
+local test = tap.test('gh-4491-coio-wait-leads-to-segfault')
+
+-- Test file to demonstrate platform failure due to fiber switch
+-- while trace recording, details:
+-- https://github.com/tarantool/tarantool/issues/4491
+
+local fiber = require('fiber')
+local ffi = require('ffi')
+ffi.cdef('int coio_wait(int fd, int event, double timeout);')
+
+local cfg = {
+ hotloop = arg[1] or 1,
+ fibers = arg[1] or 2,
+ timeout = { put = 1, get = 1 },
+}
+
+test:plan(cfg.fibers + 1)
+
+local args = {
+ fd = 1 , -- STDIN file descriptor
+ event = 0x1 , -- COIO_READ event
+ timeout = 0.05, -- Timeout value
+}
+
+local function run(iterations, channel)
+ for _ = 1, iterations do
+ ffi.C.coio_wait(args.fd, args.event, args.timeout)
+ end
+ channel:put(true, cfg.timeout.put)
+end
+
+local channels = { }
+
+jit.opt.start('3', string.format('hotloop=%d', cfg.hotloop))
+
+for _ = 1, cfg.fibers do
+ channels[_] = fiber.channel(1)
+ fiber.new(run, cfg.hotloop + 1, channels[_])
+end
+
+-- Finalize the existing fibers
+for _ = 1, cfg.fibers do
+ test:ok(channels[_]:get(cfg.timeout.get),
+ string.format('fiber #%d successfully finished', _))
+end
+
+test:ok(true, 'trace is not recorded due to fiber switch underneath coio_wait')
+
+os.exit(test:check() and 0 or 1)
--
2.25.0
More information about the Tarantool-patches
mailing list