From: "sergos@tarantool.org" <sergos@tarantool.org> To: Igor Munkin <imun@tarantool.org> Cc: tarantool-patches@dev.tarantool.org, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH v2 2/2] lua: abort trace recording on fiber yield Date: Thu, 24 Sep 2020 16:00:50 +0300 [thread overview] Message-ID: <9D468A12-77EE-4876-BD91-BC9D840E0D08@tarantool.org> (raw) In-Reply-To: <6b7def49c9d2252425d3944cc4274c9a155ae9e9.1600862684.git.imun@tarantool.org> Hi! Thanks for the patch, see my 2 comments below. > On 23 Sep 2020, at 22:06, Igor Munkin <imun@tarantool.org> wrote: > > Since Tarantool fibers don't respect Lua coroutine switch mechanism, JIT > machinery stays unnotified when one lua_State substitutes another one. > As a result if trace recording hasn't been aborted prior to fiber > switch, the recording proceeds using the new lua_State and leads to a > failure either on any further compiler phase or while the compiled trace > is executed. > > This changeset extends <cord_on_yield> routine aborting trace recording > when the fiber switches to another one. If the switch-over occurs while > mcode is being run the platform finishes its execution with EXIT_FAILURE > code and calls panic routine prior to the exit. > > Closes #1700 > Fixes #4491 > > Signed-off-by: Igor Munkin <imun@tarantool.org> > --- > > Here are also the benchmark results for the Release build: > * Vanilla (8477b6c) -> Patched (f1026a1) (min, median, mean, max): > | fibers: 10; iters: 100 0% 0% 0% 0% > | fibers: 10; iters: 1000 2% 2% 1% 2% > | fibers: 10; iters: 10000 6% 4% 5% 5% > | fibers: 10; iters: 100000 4% 3% 4% 1% > | fibers: 100; iters: 100 1% 0% 1% 10% > | fibers: 100; iters: 1000 2% 2% 0% -7% > | fibers: 100; iters: 10000 2% 1% 2% 1% > | fibers: 100; iters: 100000 2% 2% 2% 5% > | fibers: 1000; iters: 100 1% 0% 0% -2% > | fibers: 1000; iters: 1000 1% 1% 1% 0% > | fibers: 1000; iters: 10000 1% 1% 1% 2% > | fibers: 1000; iters: 100000 1% 1% 1% 1% > | fibers: 10000; iters: 100 0% 0% 0% 1% > | fibers: 10000; iters: 1000 0% 2% 3% 3% > | fibers: 10000; iters: 10000 0% 0% 0% -2% > | fibers: 10000; iters: 100000 0% 1% 0% -4% > > src/lua/utils.c | 49 +++++++++++++++++ > ...-4491-coio-wait-leads-to-segfault.test.lua | 53 +++++++++++++++++++ > 2 files changed, 102 insertions(+) > create mode 100755 test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua > > diff --git a/src/lua/utils.c b/src/lua/utils.c > index 8e98f7607..39fe4e30c 100644 > --- a/src/lua/utils.c > +++ b/src/lua/utils.c > @@ -29,6 +29,7 @@ > * SUCH DAMAGE. > */ > #include "lua/utils.h" > +#include <lj_trace.h> > > #include <assert.h> > #include <errno.h> > @@ -1309,10 +1310,57 @@ tarantool_lua_utils_init(struct lua_State *L) > return 0; > } > > +/* > + * XXX: There is already defined <panic> macro in say.h header > + * (included in diag.h). As a result the call below is misexpanded > + * and compilation fails with the corresponding error. To avoid > + * this error the macro is temporary renamed and restored later. > + * Compilation now fails for the following cases: > + * * temporary name <_panic> is used in this translation unit I would propose some more sophisitcated name to avoid possible overlap with ‘hidden’ _something convention. Since it is Tarantool sources, a reference to the gh will be just fine, like panic_gh1700. > + * * <panic> is not define, so this hack can be freely dropped > + */ > +#if defined(panic) && !defined(_panic) > +# define _panic panic > +# undef panic > +#else > +# error "Can't redefine <panic> macro" > +#endif > + > /** > * This routine encloses the checks and actions to be done when > * the running fiber yields the execution. > + * Since Tarantool fibers don't switch-over the way Lua coroutines > + * do the platform ought to notify JIT engine when one lua_State > + * substitutes another one. > */ > void cord_on_yield(void) > { > + struct global_State *g = G(tarantool_L); > + /* > + * XXX: Switching fibers while running the trace leads to > + * code misbehaviour and failures, so stop its execution. > + */ > + if (unlikely(tvref(g->jit_base))) { > + /* > + * XXX: mcode is executed only in scope of Lua > + * world and one can obtain the corresponding Lua > + * coroutine from the fiber storage. > + */ > + struct lua_State *L = fiber()->storage.lua.stack; > + assert(L != NULL); > + lua_pushstring(L, "Fiber is switched on the trace”); To me it is very much ’Too long WAL write’ style. Everything said, nothing is clear. Could you please elaborate on the ‘FFI code’ and ‘led to a yield’ and probably the fiber number. > + if (g->panic) > + g->panic(L); > + exit(EXIT_FAILURE); > + } > + /* > + * Unconditionally abort trace recording whether fibers > + * switch each other. Otherwise, further compilation may > + * lead to a failure on any next compiler phase. > + */ > + lj_trace_abort(g); > } > + > +/* Restore <panic> macro back */ > +#define panic _panic > +#undef _panic > diff --git a/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua > new file mode 100755 > index 000000000..0dd8dfbee > --- /dev/null > +++ b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua > @@ -0,0 +1,53 @@ > +#!/usr/bin/env tarantool > + > +local tap = require('tap') > + > +local test = tap.test('gh-4491-coio-wait-leads-to-segfault') > + > +-- Test file to demonstrate platform failure due to fiber switch > +-- while trace recording, details: > +-- https://github.com/tarantool/tarantool/issues/4491 > + > +local fiber = require('fiber') > +local ffi = require('ffi') > +ffi.cdef('int coio_wait(int fd, int event, double timeout);') > + > +local cfg = { > + hotloop = arg[1] or 1, > + fibers = arg[1] or 2, > + timeout = { put = 1, get = 1 }, > +} > + > +test:plan(cfg.fibers + 1) > + > +local args = { > + fd = 1 , -- STDIN file descriptor > + event = 0x1 , -- COIO_READ event > + timeout = 0.05, -- Timeout value > +} > + > +local function run(iterations, channel) > + for _ = 1, iterations do > + ffi.C.coio_wait(args.fd, args.event, args.timeout) > + end > + channel:put(true, cfg.timeout.put) > +end > + > +local channels = { } > + > +jit.opt.start('3', string.format('hotloop=%d', cfg.hotloop)) > + > +for _ = 1, cfg.fibers do > + channels[_] = fiber.channel(1) > + fiber.new(run, cfg.hotloop + 1, channels[_]) > +end > + > +-- Finalize the existing fibers > +for _ = 1, cfg.fibers do > + test:ok(channels[_]:get(cfg.timeout.get), > + string.format('fiber #%d successfully finished', _)) > +end > + > +test:ok(true, 'trace is not recorded due to fiber switch underneath coio_wait') > + > +os.exit(test:check() and 0 or 1) > -- > 2.25.0 >
next prev parent reply other threads:[~2020-09-24 13:00 UTC|newest] Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-09-23 19:06 [Tarantool-patches] [PATCH v2 0/2] Prevent JIT engine breakage on fibers switch-over Igor Munkin 2020-09-23 19:06 ` [Tarantool-patches] [PATCH v2 1/2] fiber: introduce a callback for " Igor Munkin 2020-09-24 12:54 ` sergos 2020-09-28 13:06 ` Igor Munkin 2020-09-29 9:15 ` Sergey Ostanevich 2020-09-29 10:05 ` Igor Munkin 2020-09-29 22:41 ` Vladislav Shpilevoy 2020-09-30 9:30 ` Igor Munkin 2020-09-30 22:00 ` Vladislav Shpilevoy 2020-09-23 19:06 ` [Tarantool-patches] [PATCH v2 2/2] lua: abort trace recording on fiber yield Igor Munkin 2020-09-24 13:00 ` sergos [this message] 2020-09-28 13:07 ` Igor Munkin 2020-09-28 15:36 ` Igor Munkin 2020-09-28 16:37 ` Igor Munkin 2020-09-28 17:45 ` Igor Munkin 2020-09-29 9:24 ` Sergey Ostanevich 2020-09-29 10:06 ` Igor Munkin 2020-09-29 22:41 ` Vladislav Shpilevoy 2020-09-30 6:27 ` Igor Munkin 2020-09-30 21:59 ` Vladislav Shpilevoy 2020-10-01 6:14 ` Igor Munkin 2020-09-24 13:15 ` [Tarantool-patches] [PATCH v2 0/2] Prevent JIT engine breakage on fibers switch-over sergos 2020-09-28 13:06 ` Igor Munkin 2020-09-29 9:14 ` Sergey Ostanevich 2020-10-01 21:25 ` Vladislav Shpilevoy 2020-10-01 21:29 ` Igor Munkin 2020-10-01 22:17 ` Igor Munkin 2020-10-02 12:43 ` Kirill Yukhin 2020-10-02 12:44 ` Igor Munkin
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=9D468A12-77EE-4876-BD91-BC9D840E0D08@tarantool.org \ --to=sergos@tarantool.org \ --cc=imun@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH v2 2/2] lua: abort trace recording on fiber yield' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox