From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp33.i.mail.ru (smtp33.i.mail.ru [94.100.177.93])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id A9A36469719
	for ; Thu, 24 Sep 2020 16:00:52 +0300 (MSK)
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.1\))
From: "sergos@tarantool.org"
In-Reply-To: <6b7def49c9d2252425d3944cc4274c9a155ae9e9.1600862684.git.imun@tarantool.org>
Date: Thu, 24 Sep 2020 16:00:50 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <9D468A12-77EE-4876-BD91-BC9D840E0D08@tarantool.org>
References: <6b7def49c9d2252425d3944cc4274c9a155ae9e9.1600862684.git.imun@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v2 2/2] lua: abort trace recording on fiber yield
List-Id: Tarantool development patches
To: Igor Munkin
Cc: tarantool-patches@dev.tarantool.org, Vladislav Shpilevoy

Hi!

Thanks for the patch, see my 2 comments below.

> On 23 Sep 2020, at 22:06, Igor Munkin wrote:
>
> Since Tarantool fibers don't respect Lua coroutine switch mechanism, JIT
> machinery stays unnotified when one lua_State substitutes another one.
> As a result if trace recording hasn't been aborted prior to fiber
> switch, the recording proceeds using the new lua_State and leads to a
> failure either on any further compiler phase or while the compiled trace
> is executed.
>
> This changeset extends routine aborting trace recording
> when the fiber switches to another one. If the switch-over occurs while
> mcode is being run the platform finishes its execution with EXIT_FAILURE
> code and calls panic routine prior to the exit.
>
> Closes #1700
> Fixes #4491
>
> Signed-off-by: Igor Munkin
> ---
>
> Here are also the benchmark results for the Release build:
> * Vanilla (8477b6c) -> Patched (f1026a1) (min, median, mean, max):
> | fibers: 10; iters: 100         0%   0%   0%   0%
> | fibers: 10; iters: 1000        2%   2%   1%   2%
> | fibers: 10; iters: 10000       6%   4%   5%   5%
> | fibers: 10; iters: 100000      4%   3%   4%   1%
> | fibers: 100; iters: 100        1%   0%   1%  10%
> | fibers: 100; iters: 1000       2%   2%   0%  -7%
> | fibers: 100; iters: 10000      2%   1%   2%   1%
> | fibers: 100; iters: 100000     2%   2%   2%   5%
> | fibers: 1000; iters: 100       1%   0%   0%  -2%
> | fibers: 1000; iters: 1000      1%   1%   1%   0%
> | fibers: 1000; iters: 10000     1%   1%   1%   2%
> | fibers: 1000; iters: 100000    1%   1%   1%   1%
> | fibers: 10000; iters: 100      0%   0%   0%   1%
> | fibers: 10000; iters: 1000     0%   2%   3%   3%
> | fibers: 10000; iters: 10000    0%   0%   0%  -2%
> | fibers: 10000; iters: 100000   0%   1%   0%  -4%
>
>  src/lua/utils.c                               | 49 +++++++++++++++++
>  ...-4491-coio-wait-leads-to-segfault.test.lua | 53 +++++++++++++++++++
>  2 files changed, 102 insertions(+)
>  create mode 100755 test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua
>
> diff --git a/src/lua/utils.c b/src/lua/utils.c
> index 8e98f7607..39fe4e30c 100644
> --- a/src/lua/utils.c
> +++ b/src/lua/utils.c
> @@ -29,6 +29,7 @@
>   * SUCH DAMAGE.
>   */
>  #include "lua/utils.h"
> +#include
>
>  #include
>  #include
> @@ -1309,10 +1310,57 @@ tarantool_lua_utils_init(struct lua_State *L)
>  	return 0;
>  }
>
> +/*
> + * XXX: There is already defined macro in say.h header
> + * (included in diag.h). As a result the call below is misexpanded
> + * and compilation fails with the corresponding error. To avoid
> + * this error the macro is temporary renamed and restored later.
> + * Compilation now fails for the following cases:
> + * * temporary name <_panic> is used in this translation unit

I would propose a more sophisticated name to avoid a possible overlap
with the ‘hidden’ _something convention.
Since these are Tarantool sources, a reference to the GitHub issue will
do just fine, e.g. panic_gh1700.

> + * * is not define, so this hack can be freely dropped
> + */
> +#if defined(panic) && !defined(_panic)
> +# define _panic panic
> +# undef panic
> +#else
> +# error "Can't redefine macro"
> +#endif
> +
>  /**
>   * This routine encloses the checks and actions to be done when
>   * the running fiber yields the execution.
> + * Since Tarantool fibers don't switch-over the way Lua coroutines
> + * do the platform ought to notify JIT engine when one lua_State
> + * substitutes another one.
>   */
>  void cord_on_yield(void)
>  {
> +	struct global_State *g = G(tarantool_L);
> +	/*
> +	 * XXX: Switching fibers while running the trace leads to
> +	 * code misbehaviour and failures, so stop its execution.
> +	 */
> +	if (unlikely(tvref(g->jit_base))) {
> +		/*
> +		 * XXX: mcode is executed only in scope of Lua
> +		 * world and one can obtain the corresponding Lua
> +		 * coroutine from the fiber storage.
> +		 */
> +		struct lua_State *L = fiber()->storage.lua.stack;
> +		assert(L != NULL);
> +		lua_pushstring(L, "Fiber is switched on the trace");

To me this is very much in the ‘Too long WAL write’ style: everything
is said, nothing is clear. Could you please elaborate on the ‘FFI code’
and ‘led to a yield’, and probably include the fiber number?

> +		if (g->panic)
> +			g->panic(L);
> +		exit(EXIT_FAILURE);
> +	}
> +	/*
> +	 * Unconditionally abort trace recording whether fibers
> +	 * switch each other. Otherwise, further compilation may
> +	 * lead to a failure on any next compiler phase.
> +	 */
> +	lj_trace_abort(g);
>  }
> +
> +/* Restore macro back */
> +#define panic _panic
> +#undef _panic
> diff --git a/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua
> new file mode 100755
> index 000000000..0dd8dfbee
> --- /dev/null
> +++ b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua
> @@ -0,0 +1,53 @@
> +#!/usr/bin/env tarantool
> +
> +local tap = require('tap')
> +
> +local test = tap.test('gh-4491-coio-wait-leads-to-segfault')
> +
> +-- Test file to demonstrate platform failure due to fiber switch
> +-- while trace recording, details:
> +-- https://github.com/tarantool/tarantool/issues/4491
> +
> +local fiber = require('fiber')
> +local ffi = require('ffi')
> +ffi.cdef('int coio_wait(int fd, int event, double timeout);')
> +
> +local cfg = {
> +  hotloop = arg[1] or 1,
> +  fibers = arg[1] or 2,
> +  timeout = { put = 1, get = 1 },
> +}
> +
> +test:plan(cfg.fibers + 1)
> +
> +local args = {
> +  fd      = 1   , -- STDIN file descriptor
> +  event   = 0x1 , -- COIO_READ event
> +  timeout = 0.05, -- Timeout value
> +}
> +
> +local function run(iterations, channel)
> +  for _ = 1, iterations do
> +    ffi.C.coio_wait(args.fd, args.event, args.timeout)
> +  end
> +  channel:put(true, cfg.timeout.put)
> +end
> +
> +local channels = { }
> +
> +jit.opt.start('3', string.format('hotloop=%d', cfg.hotloop))
> +
> +for _ = 1, cfg.fibers do
> +  channels[_] = fiber.channel(1)
> +  fiber.new(run, cfg.hotloop + 1, channels[_])
> +end
> +
> +-- Finalize the existing fibers
> +for _ = 1, cfg.fibers do
> +  test:ok(channels[_]:get(cfg.timeout.get),
> +          string.format('fiber #%d successfully finished', _))
> +end
> +
> +test:ok(true, 'trace is not recorded due to fiber switch underneath coio_wait')
> +
> +os.exit(test:check() and 0 or 1)
> --
> 2.25.0
>
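
To illustrate the comment about the panic message, here is a minimal
sketch of how a more descriptive text could be built. The helper name
format_yield_panic_msg, the exact wording, and passing the fiber id as a
uint64_t are assumptions made for this example only; none of them come
from the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): formats a panic
 * message that names the fiber and hints at the likely cause,
 * i.e. an FFI call that led to a yield while compiled code was
 * running.
 *
 * Returns true if the whole message fits into the buffer and
 * false if it was truncated. The buffer is NUL-terminated in
 * both cases.
 */
static bool
format_yield_panic_msg(char *buf, size_t size, uint64_t fid)
{
	assert(buf != NULL && size > 0);
	int rc = snprintf(buf, size,
			  "fiber %" PRIu64 " is switched while running "
			  "the compiled code (most likely an FFI call "
			  "led to a yield)", fid);
	return rc >= 0 && (size_t)rc < size;
}
```

The resulting buffer could then be handed to lua_pushstring() in place
of the fixed string, with the fiber id taken from wherever the running
fiber exposes it. Formatting into a caller-provided buffer avoids
allocating memory on a path that ends in exit().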