From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng2.m.smailru.net (smtpng2.m.smailru.net [94.100.179.3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id B762C4696C3 for ; Tue, 31 Mar 2020 01:50:49 +0300 (MSK) From: Igor Munkin Date: Tue, 31 Mar 2020 01:44:31 +0300 Message-Id: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Sergey Ostanevich , Vladislav Shpilevoy Cc: tarantool-patches@dev.tarantool.org Since Tarantool fibers doesn't respect Lua coroutine switch mechanism, JIT machinery stays unnotified when one lua_State substitues another one. As a result if trace recording hasn't been aborted prior to fiber switch, the recording proceeds using the new lua_State and leads to failure either on any compiler phase or on execution the compiled trace. This changeset adds a mandatory on_yield trigger aborting trace recording when fiber switches to another one. Fixes #4491 Closes #1700 Signed-off-by: Igor Munkin --- This patch is based on the changes introduced within [1]. Branch: https://github.com/tarantool/tarantool/tree/imun/gh-1700-abort-recording-on-fiber-switch Issue: * https://github.com/tarantool/tarantool/issues/1700 * https://github.com/tarantool/tarantool/issues/4491 [1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-March/015212.html src/lua/fiber.c | 39 +++++++++++++- ...-4491-coio-wait-leads-to-segfault.test.lua | 53 +++++++++++++++++++ 2 files changed, 90 insertions(+), 2 deletions(-) create mode 100755 test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua diff --git a/src/lua/fiber.c b/src/lua/fiber.c index 45bc03787..c79aa7c2b 100644 --- a/src/lua/fiber.c +++ b/src/lua/fiber.c @@ -38,6 +38,7 @@ #include #include #include +#include void luaL_testcancel(struct lua_State *L) @@ -454,6 +455,25 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap) return result; } +static int +fiber_on_yield(struct trigger *trigger, void *event) +{ + (void) trigger; + (void) event; + + /* + * XXX: According to LuaJIT API reference luaJIT_setmode + * function returns 0 on failure and 1 on success. Since + * this call is aimed to abort the trace recording on + * fiber_yield call and prevent already compiled traces + * execution, the mode parameter is an invalid value (-1) + * so JIT mode change is expected to fail and 0 is + * returned. Otherwise non-zero return value signals + * trigger_run routine about this trigger failure. + */ + return luaJIT_setmode(tarantool_L, 0, -1); +} + /** * Utility function for fiber.create and fiber.new */ @@ -467,10 +487,18 @@ fiber_create(struct lua_State *L) struct fiber *f = fiber_new("lua", lua_fiber_run_f); if (f == NULL) { - luaL_unref(L, LUA_REGISTRYINDEX, coro_ref); - luaT_error(L); + /* diagnostics is set in fiber_new. */ + goto error; } + struct trigger *t = malloc(sizeof(*t)); + if (t == NULL) { + diag_set(OutOfMemory, sizeof(*t), "malloc", "t"); + goto error; + } + trigger_create(t, fiber_on_yield, NULL, (trigger_f0) free); + trigger_add(&f->on_yield, t); + /* Move the arguments to the new coro */ lua_xmove(L, child_L, lua_gettop(L)); /* XXX: 'fiber' is leaked if this throws a Lua error. */ @@ -483,6 +511,13 @@ fiber_create(struct lua_State *L) lua_pushinteger(child_L, coro_ref); f->storage.lua.stack = child_L; return f; + +error: + /* Release the anchored coroutine. */ + luaL_unref(L, LUA_REGISTRYINDEX, coro_ref); + luaT_error(L); + unreachable(); + return NULL; } /** diff --git a/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua new file mode 100755 index 000000000..0dd8dfbee --- /dev/null +++ b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua @@ -0,0 +1,53 @@ +#!/usr/bin/env tarantool + +local tap = require('tap') + +local test = tap.test('gh-4491-coio-wait-leads-to-segfault') + +-- Test file to demonstrate platform failure due to fiber switch +-- while trace recording, details: +-- https://github.com/tarantool/tarantool/issues/4491 + +local fiber = require('fiber') +local ffi = require('ffi') +ffi.cdef('int coio_wait(int fd, int event, double timeout);') + +local cfg = { + hotloop = arg[1] or 1, + fibers = arg[1] or 2, + timeout = { put = 1, get = 1 }, +} + +test:plan(cfg.fibers + 1) + +local args = { + fd = 1 , -- STDIN file descriptor + event = 0x1 , -- COIO_READ event + timeout = 0.05, -- Timeout value +} + +local function run(iterations, channel) + for _ = 1, iterations do + ffi.C.coio_wait(args.fd, args.event, args.timeout) + end + channel:put(true, cfg.timeout.put) +end + +local channels = { } + +jit.opt.start('3', string.format('hotloop=%d', cfg.hotloop)) + +for _ = 1, cfg.fibers do + channels[_] = fiber.channel(1) + fiber.new(run, cfg.hotloop + 1, channels[_]) +end + +-- Finalize the existing fibers +for _ = 1, cfg.fibers do + test:ok(channels[_]:get(cfg.timeout.get), + string.format('fiber #%d successfully finished', _)) +end + +test:ok(true, 'trace is not recorded due to fiber switch underneath coio_wait') + +os.exit(test:check() and 0 or 1) -- 2.25.0