From: "sergos@tarantool.org" <sergos@tarantool.org>
To: Igor Munkin <imun@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org,
Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v2 2/2] lua: abort trace recording on fiber yield
Date: Thu, 24 Sep 2020 16:00:50 +0300 [thread overview]
Message-ID: <9D468A12-77EE-4876-BD91-BC9D840E0D08@tarantool.org> (raw)
In-Reply-To: <6b7def49c9d2252425d3944cc4274c9a155ae9e9.1600862684.git.imun@tarantool.org>
Hi!
Thanks for the patch, see my 2 comments below.
> On 23 Sep 2020, at 22:06, Igor Munkin <imun@tarantool.org> wrote:
>
> Since Tarantool fibers don't respect Lua coroutine switch mechanism, JIT
> machinery stays unnotified when one lua_State substitutes another one.
> As a result if trace recording hasn't been aborted prior to fiber
> switch, the recording proceeds using the new lua_State and leads to a
> failure either on any further compiler phase or while the compiled trace
> is executed.
>
> This changeset extends <cord_on_yield> routine aborting trace recording
> when the fiber switches to another one. If the switch-over occurs while
> mcode is being run the platform finishes its execution with EXIT_FAILURE
> code and calls panic routine prior to the exit.
>
> Closes #1700
> Fixes #4491
>
> Signed-off-by: Igor Munkin <imun@tarantool.org>
> ---
>
> Here are also the benchmark results for the Release build:
> * Vanilla (8477b6c) -> Patched (f1026a1) (min, median, mean, max):
> | fibers: 10; iters: 100 0% 0% 0% 0%
> | fibers: 10; iters: 1000 2% 2% 1% 2%
> | fibers: 10; iters: 10000 6% 4% 5% 5%
> | fibers: 10; iters: 100000 4% 3% 4% 1%
> | fibers: 100; iters: 100 1% 0% 1% 10%
> | fibers: 100; iters: 1000 2% 2% 0% -7%
> | fibers: 100; iters: 10000 2% 1% 2% 1%
> | fibers: 100; iters: 100000 2% 2% 2% 5%
> | fibers: 1000; iters: 100 1% 0% 0% -2%
> | fibers: 1000; iters: 1000 1% 1% 1% 0%
> | fibers: 1000; iters: 10000 1% 1% 1% 2%
> | fibers: 1000; iters: 100000 1% 1% 1% 1%
> | fibers: 10000; iters: 100 0% 0% 0% 1%
> | fibers: 10000; iters: 1000 0% 2% 3% 3%
> | fibers: 10000; iters: 10000 0% 0% 0% -2%
> | fibers: 10000; iters: 100000 0% 1% 0% -4%
>
> src/lua/utils.c | 49 +++++++++++++++++
> ...-4491-coio-wait-leads-to-segfault.test.lua | 53 +++++++++++++++++++
> 2 files changed, 102 insertions(+)
> create mode 100755 test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua
>
> diff --git a/src/lua/utils.c b/src/lua/utils.c
> index 8e98f7607..39fe4e30c 100644
> --- a/src/lua/utils.c
> +++ b/src/lua/utils.c
> @@ -29,6 +29,7 @@
> * SUCH DAMAGE.
> */
> #include "lua/utils.h"
> +#include <lj_trace.h>
>
> #include <assert.h>
> #include <errno.h>
> @@ -1309,10 +1310,57 @@ tarantool_lua_utils_init(struct lua_State *L)
> return 0;
> }
>
> +/*
> + * XXX: There is already defined <panic> macro in say.h header
> + * (included in diag.h). As a result the call below is misexpanded
> + * and compilation fails with the corresponding error. To avoid
> + * this error the macro is temporary renamed and restored later.
> + * Compilation now fails for the following cases:
> + * * temporary name <_panic> is used in this translation unit
I would propose some more sophisitcated name to avoid possible overlap
with ‘hidden’ _something convention. Since it is Tarantool sources, a
reference to the gh will be just fine, like panic_gh1700.
> + * * <panic> is not define, so this hack can be freely dropped
> + */
> +#if defined(panic) && !defined(_panic)
> +# define _panic panic
> +# undef panic
> +#else
> +# error "Can't redefine <panic> macro"
> +#endif
> +
> /**
> * This routine encloses the checks and actions to be done when
> * the running fiber yields the execution.
> + * Since Tarantool fibers don't switch-over the way Lua coroutines
> + * do the platform ought to notify JIT engine when one lua_State
> + * substitutes another one.
> */
> void cord_on_yield(void)
> {
> + struct global_State *g = G(tarantool_L);
> + /*
> + * XXX: Switching fibers while running the trace leads to
> + * code misbehaviour and failures, so stop its execution.
> + */
> + if (unlikely(tvref(g->jit_base))) {
> + /*
> + * XXX: mcode is executed only in scope of Lua
> + * world and one can obtain the corresponding Lua
> + * coroutine from the fiber storage.
> + */
> + struct lua_State *L = fiber()->storage.lua.stack;
> + assert(L != NULL);
> + lua_pushstring(L, "Fiber is switched on the trace”);
To me it is very much ’Too long WAL write’ style. Everything said,
nothing is clear. Could you please elaborate on the ‘FFI code’ and
‘led to a yield’ and probably the fiber number.
> + if (g->panic)
> + g->panic(L);
> + exit(EXIT_FAILURE);
> + }
> + /*
> + * Unconditionally abort trace recording whether fibers
> + * switch each other. Otherwise, further compilation may
> + * lead to a failure on any next compiler phase.
> + */
> + lj_trace_abort(g);
> }
> +
> +/* Restore <panic> macro back */
> +#define panic _panic
> +#undef _panic
> diff --git a/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua
> new file mode 100755
> index 000000000..0dd8dfbee
> --- /dev/null
> +++ b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua
> @@ -0,0 +1,53 @@
> +#!/usr/bin/env tarantool
> +
> +local tap = require('tap')
> +
> +local test = tap.test('gh-4491-coio-wait-leads-to-segfault')
> +
> +-- Test file to demonstrate platform failure due to fiber switch
> +-- while trace recording, details:
> +-- https://github.com/tarantool/tarantool/issues/4491
> +
> +local fiber = require('fiber')
> +local ffi = require('ffi')
> +ffi.cdef('int coio_wait(int fd, int event, double timeout);')
> +
> +local cfg = {
> + hotloop = arg[1] or 1,
> + fibers = arg[1] or 2,
> + timeout = { put = 1, get = 1 },
> +}
> +
> +test:plan(cfg.fibers + 1)
> +
> +local args = {
> + fd = 1 , -- STDIN file descriptor
> + event = 0x1 , -- COIO_READ event
> + timeout = 0.05, -- Timeout value
> +}
> +
> +local function run(iterations, channel)
> + for _ = 1, iterations do
> + ffi.C.coio_wait(args.fd, args.event, args.timeout)
> + end
> + channel:put(true, cfg.timeout.put)
> +end
> +
> +local channels = { }
> +
> +jit.opt.start('3', string.format('hotloop=%d', cfg.hotloop))
> +
> +for _ = 1, cfg.fibers do
> + channels[_] = fiber.channel(1)
> + fiber.new(run, cfg.hotloop + 1, channels[_])
> +end
> +
> +-- Finalize the existing fibers
> +for _ = 1, cfg.fibers do
> + test:ok(channels[_]:get(cfg.timeout.get),
> + string.format('fiber #%d successfully finished', _))
> +end
> +
> +test:ok(true, 'trace is not recorded due to fiber switch underneath coio_wait')
> +
> +os.exit(test:check() and 0 or 1)
> --
> 2.25.0
>
next prev parent reply other threads:[~2020-09-24 13:00 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-23 19:06 [Tarantool-patches] [PATCH v2 0/2] Prevent JIT engine breakage on fibers switch-over Igor Munkin
2020-09-23 19:06 ` [Tarantool-patches] [PATCH v2 1/2] fiber: introduce a callback for " Igor Munkin
2020-09-24 12:54 ` sergos
2020-09-28 13:06 ` Igor Munkin
2020-09-29 9:15 ` Sergey Ostanevich
2020-09-29 10:05 ` Igor Munkin
2020-09-29 22:41 ` Vladislav Shpilevoy
2020-09-30 9:30 ` Igor Munkin
2020-09-30 22:00 ` Vladislav Shpilevoy
2020-09-23 19:06 ` [Tarantool-patches] [PATCH v2 2/2] lua: abort trace recording on fiber yield Igor Munkin
2020-09-24 13:00 ` sergos [this message]
2020-09-28 13:07 ` Igor Munkin
2020-09-28 15:36 ` Igor Munkin
2020-09-28 16:37 ` Igor Munkin
2020-09-28 17:45 ` Igor Munkin
2020-09-29 9:24 ` Sergey Ostanevich
2020-09-29 10:06 ` Igor Munkin
2020-09-29 22:41 ` Vladislav Shpilevoy
2020-09-30 6:27 ` Igor Munkin
2020-09-30 21:59 ` Vladislav Shpilevoy
2020-10-01 6:14 ` Igor Munkin
2020-09-24 13:15 ` [Tarantool-patches] [PATCH v2 0/2] Prevent JIT engine breakage on fibers switch-over sergos
2020-09-28 13:06 ` Igor Munkin
2020-09-29 9:14 ` Sergey Ostanevich
2020-10-01 21:25 ` Vladislav Shpilevoy
2020-10-01 21:29 ` Igor Munkin
2020-10-01 22:17 ` Igor Munkin
2020-10-02 12:43 ` Kirill Yukhin
2020-10-02 12:44 ` Igor Munkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9D468A12-77EE-4876-BD91-BC9D840E0D08@tarantool.org \
--to=sergos@tarantool.org \
--cc=imun@tarantool.org \
--cc=tarantool-patches@dev.tarantool.org \
--cc=v.shpilevoy@tarantool.org \
--subject='Re: [Tarantool-patches] [PATCH v2 2/2] lua: abort trace recording on fiber yield' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox