[Tarantool-patches] [PATCH v2 2/2] lua: abort trace recording on fiber yield

Igor Munkin imun at tarantool.org
Wed Sep 23 22:06:19 MSK 2020


Since Tarantool fibers don't respect Lua coroutine switch mechanism, JIT
machinery stays unnotified when one lua_State substitutes another one.
As a result if trace recording hasn't been aborted prior to fiber
switch, the recording proceeds using the new lua_State and leads to a
failure either on any further compiler phase or while the compiled trace
is executed.

This changeset extends <cord_on_yield> routine aborting trace recording
when the fiber switches to another one. If the switch-over occurs while
mcode is being run the platform finishes its execution with EXIT_FAILURE
code and calls panic routine prior to the exit.

Closes #1700
Fixes #4491

Signed-off-by: Igor Munkin <imun at tarantool.org>
---

Here are also the benchmark results for the Release build:
* Vanilla (8477b6c) -> Patched (f1026a1) (min, median, mean, max):
| fibers: 10; iters: 100	0%	0%	0%	0%
| fibers: 10; iters: 1000	2%	2%	1%	2%
| fibers: 10; iters: 10000	6%	4%	5%	5%
| fibers: 10; iters: 100000	4%	3%	4%	1%
| fibers: 100; iters: 100	1%	0%	1%	10%
| fibers: 100; iters: 1000	2%	2%	0%	-7%
| fibers: 100; iters: 10000	2%	1%	2%	1%
| fibers: 100; iters: 100000	2%	2%	2%	5%
| fibers: 1000; iters: 100	1%	0%	0%	-2%
| fibers: 1000; iters: 1000	1%	1%	1%	0%
| fibers: 1000; iters: 10000	1%	1%	1%	2%
| fibers: 1000; iters: 100000	1%	1%	1%	1%
| fibers: 10000; iters: 100	0%	0%	0%	1%
| fibers: 10000; iters: 1000	0%	2%	3%	3%
| fibers: 10000; iters: 10000	0%	0%	0%	-2%
| fibers: 10000; iters: 100000	0%	1%	0%	-4%

 src/lua/utils.c                               | 49 +++++++++++++++++
 ...-4491-coio-wait-leads-to-segfault.test.lua | 53 +++++++++++++++++++
 2 files changed, 102 insertions(+)
 create mode 100755 test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua

diff --git a/src/lua/utils.c b/src/lua/utils.c
index 8e98f7607..39fe4e30c 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -29,6 +29,7 @@
  * SUCH DAMAGE.
  */
 #include "lua/utils.h"
+#include <lj_trace.h>
 
 #include <assert.h>
 #include <errno.h>
@@ -1309,10 +1310,57 @@ tarantool_lua_utils_init(struct lua_State *L)
 	return 0;
 }
 
+/*
+ * XXX: There is already defined <panic> macro in say.h header
+ * (included in diag.h). As a result the call below is misexpanded
+ * and compilation fails with the corresponding error. To avoid
+ * this error the macro is temporary renamed and restored later.
+ * Compilation now fails for the following cases:
+ * * temporary name <_panic> is used in this translation unit
+ * * <panic> is not define, so this hack can be freely dropped
+ */
+#if defined(panic) && !defined(_panic)
+#  define _panic panic
+#  undef panic
+#else
+#  error "Can't redefine <panic> macro"
+#endif
+
 /**
  * This routine encloses the checks and actions to be done when
  * the running fiber yields the execution.
+ * Since Tarantool fibers don't switch-over the way Lua coroutines
+ * do the platform ought to notify JIT engine when one lua_State
+ * substitutes another one.
  */
 void cord_on_yield(void)
 {
+	struct global_State *g = G(tarantool_L);
+	/*
+	 * XXX: Switching fibers while running the trace leads to
+	 * code misbehaviour and failures, so stop its execution.
+	 */
+	if (unlikely(tvref(g->jit_base))) {
+		/*
+		 * XXX: mcode is executed only in scope of Lua
+		 * world and one can obtain the corresponding Lua
+		 * coroutine from the fiber storage.
+		 */
+		struct lua_State *L = fiber()->storage.lua.stack;
+		assert(L != NULL);
+		lua_pushstring(L, "Fiber is switched on the trace");
+		if (g->panic)
+			g->panic(L);
+		exit(EXIT_FAILURE);
+	}
+	/*
+	 * Unconditionally abort trace recording whether fibers
+	 * switch each other. Otherwise, further compilation may
+	 * lead to a failure on any next compiler phase.
+	 */
+	lj_trace_abort(g);
 }
+
+/* Restore <panic> macro back */
+#define panic _panic
+#undef _panic
diff --git a/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua
new file mode 100755
index 000000000..0dd8dfbee
--- /dev/null
+++ b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua
@@ -0,0 +1,53 @@
+#!/usr/bin/env tarantool
+
+local tap = require('tap')
+
+local test = tap.test('gh-4491-coio-wait-leads-to-segfault')
+
+-- Test file to demonstrate platform failure due to fiber switch
+-- while trace recording, details:
+--     https://github.com/tarantool/tarantool/issues/4491
+
+local fiber = require('fiber')
+local ffi = require('ffi')
+ffi.cdef('int coio_wait(int fd, int event, double timeout);')
+
+local cfg = {
+  hotloop = arg[1] or 1,
+  fibers = arg[1] or 2,
+  timeout = { put = 1, get = 1 },
+}
+
+test:plan(cfg.fibers + 1)
+
+local args = {
+  fd      = 1   , -- STDIN file descriptor
+  event   = 0x1 , -- COIO_READ event
+  timeout = 0.05, -- Timeout value
+}
+
+local function run(iterations, channel)
+  for _ = 1, iterations do
+    ffi.C.coio_wait(args.fd, args.event, args.timeout)
+  end
+  channel:put(true, cfg.timeout.put)
+end
+
+local channels = { }
+
+jit.opt.start('3', string.format('hotloop=%d', cfg.hotloop))
+
+for _ = 1, cfg.fibers do
+  channels[_] = fiber.channel(1)
+  fiber.new(run, cfg.hotloop + 1, channels[_])
+end
+
+-- Finalize the existing fibers
+for _ = 1, cfg.fibers do
+  test:ok(channels[_]:get(cfg.timeout.get),
+          string.format('fiber #%d successfully finished', _))
+end
+
+test:ok(true, 'trace is not recorded due to fiber switch underneath coio_wait')
+
+os.exit(test:check() and 0 or 1)
-- 
2.25.0



More information about the Tarantool-patches mailing list