From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng1.m.smailru.net (smtpng1.m.smailru.net [94.100.181.251]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id CFFC4469719 for ; Wed, 23 Sep 2020 22:16:52 +0300 (MSK) From: Igor Munkin Date: Wed, 23 Sep 2020 22:06:19 +0300 Message-Id: <6b7def49c9d2252425d3944cc4274c9a155ae9e9.1600862684.git.imun@tarantool.org> In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [Tarantool-patches] [PATCH v2 2/2] lua: abort trace recording on fiber yield List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy , Sergey Ostanevich Cc: tarantool-patches@dev.tarantool.org Since Tarantool fibers don't respect Lua coroutine switch mechanism, JIT machinery stays unnotified when one lua_State substitutes another one. As a result if trace recording hasn't been aborted prior to fiber switch, the recording proceeds using the new lua_State and leads to a failure either on any further compiler phase or while the compiled trace is executed. This changeset extends routine aborting trace recording when the fiber switches to another one. If the switch-over occurs while mcode is being run the platform finishes its execution with EXIT_FAILURE code and calls panic routine prior to the exit. Closes #1700 Fixes #4491 Signed-off-by: Igor Munkin --- Here are also the benchmark results for the Release build: * Vanilla (8477b6c) -> Patched (f1026a1) (min, median, mean, max): | fibers: 10; iters: 100 0% 0% 0% 0% | fibers: 10; iters: 1000 2% 2% 1% 2% | fibers: 10; iters: 10000 6% 4% 5% 5% | fibers: 10; iters: 100000 4% 3% 4% 1% | fibers: 100; iters: 100 1% 0% 1% 10% | fibers: 100; iters: 1000 2% 2% 0% -7% | fibers: 100; iters: 10000 2% 1% 2% 1% | fibers: 100; iters: 100000 2% 2% 2% 5% | fibers: 1000; iters: 100 1% 0% 0% -2% | fibers: 1000; iters: 1000 1% 1% 1% 0% | fibers: 1000; iters: 10000 1% 1% 1% 2% | fibers: 1000; iters: 100000 1% 1% 1% 1% | fibers: 10000; iters: 100 0% 0% 0% 1% | fibers: 10000; iters: 1000 0% 2% 3% 3% | fibers: 10000; iters: 10000 0% 0% 0% -2% | fibers: 10000; iters: 100000 0% 1% 0% -4% src/lua/utils.c | 49 +++++++++++++++++ ...-4491-coio-wait-leads-to-segfault.test.lua | 53 +++++++++++++++++++ 2 files changed, 102 insertions(+) create mode 100755 test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua diff --git a/src/lua/utils.c b/src/lua/utils.c index 8e98f7607..39fe4e30c 100644 --- a/src/lua/utils.c +++ b/src/lua/utils.c @@ -29,6 +29,7 @@ * SUCH DAMAGE. */ #include "lua/utils.h" +#include #include #include @@ -1309,10 +1310,57 @@ tarantool_lua_utils_init(struct lua_State *L) return 0; } +/* + * XXX: There is already defined macro in say.h header + * (included in diag.h). As a result the call below is misexpanded + * and compilation fails with the corresponding error. To avoid + * this error the macro is temporary renamed and restored later. + * Compilation now fails for the following cases: + * * temporary name <_panic> is used in this translation unit + * * is not define, so this hack can be freely dropped + */ +#if defined(panic) && !defined(_panic) +# define _panic panic +# undef panic +#else +# error "Can't redefine macro" +#endif + /** * This routine encloses the checks and actions to be done when * the running fiber yields the execution. + * Since Tarantool fibers don't switch-over the way Lua coroutines + * do the platform ought to notify JIT engine when one lua_State + * substitutes another one. */ void cord_on_yield(void) { + struct global_State *g = G(tarantool_L); + /* + * XXX: Switching fibers while running the trace leads to + * code misbehaviour and failures, so stop its execution. + */ + if (unlikely(tvref(g->jit_base))) { + /* + * XXX: mcode is executed only in scope of Lua + * world and one can obtain the corresponding Lua + * coroutine from the fiber storage. + */ + struct lua_State *L = fiber()->storage.lua.stack; + assert(L != NULL); + lua_pushstring(L, "Fiber is switched on the trace"); + if (g->panic) + g->panic(L); + exit(EXIT_FAILURE); + } + /* + * Unconditionally abort trace recording whether fibers + * switch each other. Otherwise, further compilation may + * lead to a failure on any next compiler phase. + */ + lj_trace_abort(g); } + +/* Restore macro back */ +#define panic _panic +#undef _panic diff --git a/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua new file mode 100755 index 000000000..0dd8dfbee --- /dev/null +++ b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua @@ -0,0 +1,53 @@ +#!/usr/bin/env tarantool + +local tap = require('tap') + +local test = tap.test('gh-4491-coio-wait-leads-to-segfault') + +-- Test file to demonstrate platform failure due to fiber switch +-- while trace recording, details: +-- https://github.com/tarantool/tarantool/issues/4491 + +local fiber = require('fiber') +local ffi = require('ffi') +ffi.cdef('int coio_wait(int fd, int event, double timeout);') + +local cfg = { + hotloop = arg[1] or 1, + fibers = arg[1] or 2, + timeout = { put = 1, get = 1 }, +} + +test:plan(cfg.fibers + 1) + +local args = { + fd = 1 , -- STDIN file descriptor + event = 0x1 , -- COIO_READ event + timeout = 0.05, -- Timeout value +} + +local function run(iterations, channel) + for _ = 1, iterations do + ffi.C.coio_wait(args.fd, args.event, args.timeout) + end + channel:put(true, cfg.timeout.put) +end + +local channels = { } + +jit.opt.start('3', string.format('hotloop=%d', cfg.hotloop)) + +for _ = 1, cfg.fibers do + channels[_] = fiber.channel(1) + fiber.new(run, cfg.hotloop + 1, channels[_]) +end + +-- Finalize the existing fibers +for _ = 1, cfg.fibers do + test:ok(channels[_]:get(cfg.timeout.get), + string.format('fiber #%d successfully finished', _)) +end + +test:ok(true, 'trace is not recorded due to fiber switch underneath coio_wait') + +os.exit(test:check() and 0 or 1) -- 2.25.0