Tarantool development patches archive
 help / color / mirror / Atom feed
From: "sergos@tarantool.org" <sergos@tarantool.org>
To: Igor Munkin <imun@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org,
	Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v2 2/2] lua: abort trace recording on fiber yield
Date: Thu, 24 Sep 2020 16:00:50 +0300	[thread overview]
Message-ID: <9D468A12-77EE-4876-BD91-BC9D840E0D08@tarantool.org> (raw)
In-Reply-To: <6b7def49c9d2252425d3944cc4274c9a155ae9e9.1600862684.git.imun@tarantool.org>

Hi!

Thanks for the patch, see my 2 comments below.


> On 23 Sep 2020, at 22:06, Igor Munkin <imun@tarantool.org> wrote:
> 
> Since Tarantool fibers don't respect Lua coroutine switch mechanism, JIT
> machinery stays unnotified when one lua_State substitutes another one.
> As a result if trace recording hasn't been aborted prior to fiber
> switch, the recording proceeds using the new lua_State and leads to a
> failure either on any further compiler phase or while the compiled trace
> is executed.
> 
> This changeset extends <cord_on_yield> routine aborting trace recording
> when the fiber switches to another one. If the switch-over occurs while
> mcode is being run the platform finishes its execution with EXIT_FAILURE
> code and calls panic routine prior to the exit.
> 
> Closes #1700
> Fixes #4491
> 
> Signed-off-by: Igor Munkin <imun@tarantool.org>
> ---
> 
> Here are also the benchmark results for the Release build:
> * Vanilla (8477b6c) -> Patched (f1026a1) (min, median, mean, max):
> | fibers: 10; iters: 100	0%	0%	0%	0%
> | fibers: 10; iters: 1000	2%	2%	1%	2%
> | fibers: 10; iters: 10000	6%	4%	5%	5%
> | fibers: 10; iters: 100000	4%	3%	4%	1%
> | fibers: 100; iters: 100	1%	0%	1%	10%
> | fibers: 100; iters: 1000	2%	2%	0%	-7%
> | fibers: 100; iters: 10000	2%	1%	2%	1%
> | fibers: 100; iters: 100000	2%	2%	2%	5%
> | fibers: 1000; iters: 100	1%	0%	0%	-2%
> | fibers: 1000; iters: 1000	1%	1%	1%	0%
> | fibers: 1000; iters: 10000	1%	1%	1%	2%
> | fibers: 1000; iters: 100000	1%	1%	1%	1%
> | fibers: 10000; iters: 100	0%	0%	0%	1%
> | fibers: 10000; iters: 1000	0%	2%	3%	3%
> | fibers: 10000; iters: 10000	0%	0%	0%	-2%
> | fibers: 10000; iters: 100000	0%	1%	0%	-4%
> 
> src/lua/utils.c                               | 49 +++++++++++++++++
> ...-4491-coio-wait-leads-to-segfault.test.lua | 53 +++++++++++++++++++
> 2 files changed, 102 insertions(+)
> create mode 100755 test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua
> 
> diff --git a/src/lua/utils.c b/src/lua/utils.c
> index 8e98f7607..39fe4e30c 100644
> --- a/src/lua/utils.c
> +++ b/src/lua/utils.c
> @@ -29,6 +29,7 @@
>  * SUCH DAMAGE.
>  */
> #include "lua/utils.h"
> +#include <lj_trace.h>
> 
> #include <assert.h>
> #include <errno.h>
> @@ -1309,10 +1310,57 @@ tarantool_lua_utils_init(struct lua_State *L)
> 	return 0;
> }
> 
> +/*
> + * XXX: There is already defined <panic> macro in say.h header
> + * (included in diag.h). As a result the call below is misexpanded
> + * and compilation fails with the corresponding error. To avoid
> + * this error the macro is temporary renamed and restored later.
> + * Compilation now fails for the following cases:
> + * * temporary name <_panic> is used in this translation unit

I would propose some more sophisitcated name to avoid possible overlap 
with ‘hidden’ _something convention. Since it is Tarantool sources, a
reference to the gh will be just fine, like panic_gh1700.

> + * * <panic> is not define, so this hack can be freely dropped
> + */
> +#if defined(panic) && !defined(_panic)
> +#  define _panic panic
> +#  undef panic
> +#else
> +#  error "Can't redefine <panic> macro"
> +#endif
> +
> /**
>  * This routine encloses the checks and actions to be done when
>  * the running fiber yields the execution.
> + * Since Tarantool fibers don't switch-over the way Lua coroutines
> + * do the platform ought to notify JIT engine when one lua_State
> + * substitutes another one.
>  */
> void cord_on_yield(void)
> {
> +	struct global_State *g = G(tarantool_L);
> +	/*
> +	 * XXX: Switching fibers while running the trace leads to
> +	 * code misbehaviour and failures, so stop its execution.
> +	 */
> +	if (unlikely(tvref(g->jit_base))) {
> +		/*
> +		 * XXX: mcode is executed only in scope of Lua
> +		 * world and one can obtain the corresponding Lua
> +		 * coroutine from the fiber storage.
> +		 */
> +		struct lua_State *L = fiber()->storage.lua.stack;
> +		assert(L != NULL);
> +		lua_pushstring(L, "Fiber is switched on the trace”);

To me it is very much ’Too long WAL write’ style. Everything said,
nothing is clear. Could you please elaborate on the ‘FFI code’ and 
‘led to a yield’ and probably the fiber number.

> +		if (g->panic)
> +			g->panic(L);
> +		exit(EXIT_FAILURE);
> +	}
> +	/*
> +	 * Unconditionally abort trace recording whether fibers
> +	 * switch each other. Otherwise, further compilation may
> +	 * lead to a failure on any next compiler phase.
> +	 */
> +	lj_trace_abort(g);
> }
> +
> +/* Restore <panic> macro back */
> +#define panic _panic
> +#undef _panic
> diff --git a/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua
> new file mode 100755
> index 000000000..0dd8dfbee
> --- /dev/null
> +++ b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua
> @@ -0,0 +1,53 @@
> +#!/usr/bin/env tarantool
> +
> +local tap = require('tap')
> +
> +local test = tap.test('gh-4491-coio-wait-leads-to-segfault')
> +
> +-- Test file to demonstrate platform failure due to fiber switch
> +-- while trace recording, details:
> +--     https://github.com/tarantool/tarantool/issues/4491
> +
> +local fiber = require('fiber')
> +local ffi = require('ffi')
> +ffi.cdef('int coio_wait(int fd, int event, double timeout);')
> +
> +local cfg = {
> +  hotloop = arg[1] or 1,
> +  fibers = arg[1] or 2,
> +  timeout = { put = 1, get = 1 },
> +}
> +
> +test:plan(cfg.fibers + 1)
> +
> +local args = {
> +  fd      = 1   , -- STDIN file descriptor
> +  event   = 0x1 , -- COIO_READ event
> +  timeout = 0.05, -- Timeout value
> +}
> +
> +local function run(iterations, channel)
> +  for _ = 1, iterations do
> +    ffi.C.coio_wait(args.fd, args.event, args.timeout)
> +  end
> +  channel:put(true, cfg.timeout.put)
> +end
> +
> +local channels = { }
> +
> +jit.opt.start('3', string.format('hotloop=%d', cfg.hotloop))
> +
> +for _ = 1, cfg.fibers do
> +  channels[_] = fiber.channel(1)
> +  fiber.new(run, cfg.hotloop + 1, channels[_])
> +end
> +
> +-- Finalize the existing fibers
> +for _ = 1, cfg.fibers do
> +  test:ok(channels[_]:get(cfg.timeout.get),
> +          string.format('fiber #%d successfully finished', _))
> +end
> +
> +test:ok(true, 'trace is not recorded due to fiber switch underneath coio_wait')
> +
> +os.exit(test:check() and 0 or 1)
> -- 
> 2.25.0
> 

  reply	other threads:[~2020-09-24 13:00 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-23 19:06 [Tarantool-patches] [PATCH v2 0/2] Prevent JIT engine breakage on fibers switch-over Igor Munkin
2020-09-23 19:06 ` [Tarantool-patches] [PATCH v2 1/2] fiber: introduce a callback for " Igor Munkin
2020-09-24 12:54   ` sergos
2020-09-28 13:06     ` Igor Munkin
2020-09-29  9:15       ` Sergey Ostanevich
2020-09-29 10:05         ` Igor Munkin
2020-09-29 22:41   ` Vladislav Shpilevoy
2020-09-30  9:30     ` Igor Munkin
2020-09-30 22:00       ` Vladislav Shpilevoy
2020-09-23 19:06 ` [Tarantool-patches] [PATCH v2 2/2] lua: abort trace recording on fiber yield Igor Munkin
2020-09-24 13:00   ` sergos [this message]
2020-09-28 13:07     ` Igor Munkin
2020-09-28 15:36       ` Igor Munkin
2020-09-28 16:37         ` Igor Munkin
2020-09-28 17:45           ` Igor Munkin
2020-09-29  9:24             ` Sergey Ostanevich
2020-09-29 10:06               ` Igor Munkin
2020-09-29 22:41   ` Vladislav Shpilevoy
2020-09-30  6:27     ` Igor Munkin
2020-09-30 21:59       ` Vladislav Shpilevoy
2020-10-01  6:14         ` Igor Munkin
2020-09-24 13:15 ` [Tarantool-patches] [PATCH v2 0/2] Prevent JIT engine breakage on fibers switch-over sergos
2020-09-28 13:06   ` Igor Munkin
2020-09-29  9:14     ` Sergey Ostanevich
2020-10-01 21:25 ` Vladislav Shpilevoy
2020-10-01 21:29   ` Igor Munkin
2020-10-01 22:17 ` Igor Munkin
2020-10-02 12:43 ` Kirill Yukhin
2020-10-02 12:44   ` Igor Munkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9D468A12-77EE-4876-BD91-BC9D840E0D08@tarantool.org \
    --to=sergos@tarantool.org \
    --cc=imun@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v2 2/2] lua: abort trace recording on fiber yield' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox