Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
@ 2020-03-30 22:44 Igor Munkin
  2020-03-31 16:58 ` Konstantin Osipov
  2020-03-31 23:57 ` Vladislav Shpilevoy
  0 siblings, 2 replies; 12+ messages in thread
From: Igor Munkin @ 2020-03-30 22:44 UTC (permalink / raw)
  To: Sergey Ostanevich, Vladislav Shpilevoy; +Cc: tarantool-patches

Since Tarantool fibers doesn't respect Lua coroutine switch mechanism,
JIT machinery stays unnotified when one lua_State substitues another
one. As a result if trace recording hasn't been aborted prior to fiber
switch, the recording proceeds using the new lua_State and leads to
failure either on any compiler phase or on execution the compiled trace.

This changeset adds a mandatory on_yield trigger aborting trace
recording when fiber switches to another one.

Fixes #4491
Closes #1700

Signed-off-by: Igor Munkin <imun@tarantool.org>
---

This patch is based on the changes introduced within [1].

Branch: https://github.com/tarantool/tarantool/tree/imun/gh-1700-abort-recording-on-fiber-switch
Issue:
* https://github.com/tarantool/tarantool/issues/1700
* https://github.com/tarantool/tarantool/issues/4491

[1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-March/015212.html

 src/lua/fiber.c                               | 39 +++++++++++++-
 ...-4491-coio-wait-leads-to-segfault.test.lua | 53 +++++++++++++++++++
 2 files changed, 90 insertions(+), 2 deletions(-)
 create mode 100755 test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua

diff --git a/src/lua/fiber.c b/src/lua/fiber.c
index 45bc03787..c79aa7c2b 100644
--- a/src/lua/fiber.c
+++ b/src/lua/fiber.c
@@ -38,6 +38,7 @@
 #include <lua.h>
 #include <lauxlib.h>
 #include <lualib.h>
+#include <luajit.h>
 
 void
 luaL_testcancel(struct lua_State *L)
@@ -454,6 +455,25 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap)
 	return result;
 }
 
+static int
+fiber_on_yield(struct trigger *trigger, void *event)
+{
+	(void) trigger;
+	(void) event;
+
+	/*
+	 * XXX: According to LuaJIT API reference luaJIT_setmode
+	 * function returns 0 on failure and 1 on success. Since
+	 * this call is aimed to abort the trace recording on
+	 * fiber_yield call and prevent already compiled traces
+	 * execution, the mode parameter is an invalid value (-1)
+	 * so JIT mode change is expected to fail and 0 is
+	 * returned. Otherwise non-zero return value signals
+	 * trigger_run routine about this trigger failure.
+	 */
+	return luaJIT_setmode(tarantool_L, 0, -1);
+}
+
 /**
  * Utility function for fiber.create and fiber.new
  */
@@ -467,10 +487,18 @@ fiber_create(struct lua_State *L)
 
 	struct fiber *f = fiber_new("lua", lua_fiber_run_f);
 	if (f == NULL) {
-		luaL_unref(L, LUA_REGISTRYINDEX, coro_ref);
-		luaT_error(L);
+		/* diagnostics is set in fiber_new. */
+		goto error;
 	}
 
+	struct trigger *t = malloc(sizeof(*t));
+	if (t == NULL) {
+		diag_set(OutOfMemory, sizeof(*t), "malloc", "t");
+		goto error;
+	}
+	trigger_create(t, fiber_on_yield, NULL, (trigger_f0) free);
+	trigger_add(&f->on_yield, t);
+
 	/* Move the arguments to the new coro */
 	lua_xmove(L, child_L, lua_gettop(L));
 	/* XXX: 'fiber' is leaked if this throws a Lua error. */
@@ -483,6 +511,13 @@ fiber_create(struct lua_State *L)
 	lua_pushinteger(child_L, coro_ref);
 	f->storage.lua.stack = child_L;
 	return f;
+
+error:
+	/* Release the anchored coroutine. */
+	luaL_unref(L, LUA_REGISTRYINDEX, coro_ref);
+	luaT_error(L);
+	unreachable();
+	return NULL;
 }
 
 /**
diff --git a/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua
new file mode 100755
index 000000000..0dd8dfbee
--- /dev/null
+++ b/test/app-tap/gh-4491-coio-wait-leads-to-segfault.test.lua
@@ -0,0 +1,53 @@
+#!/usr/bin/env tarantool
+
+local tap = require('tap')
+
+local test = tap.test('gh-4491-coio-wait-leads-to-segfault')
+
+-- Test file to demonstrate platform failure due to fiber switch
+-- while trace recording, details:
+--     https://github.com/tarantool/tarantool/issues/4491
+
+local fiber = require('fiber')
+local ffi = require('ffi')
+ffi.cdef('int coio_wait(int fd, int event, double timeout);')
+
+local cfg = {
+  hotloop = arg[1] or 1,
+  fibers = arg[1] or 2,
+  timeout = { put = 1, get = 1 },
+}
+
+test:plan(cfg.fibers + 1)
+
+local args = {
+  fd      = 1   , -- STDIN file descriptor
+  event   = 0x1 , -- COIO_READ event
+  timeout = 0.05, -- Timeout value
+}
+
+local function run(iterations, channel)
+  for _ = 1, iterations do
+    ffi.C.coio_wait(args.fd, args.event, args.timeout)
+  end
+  channel:put(true, cfg.timeout.put)
+end
+
+local channels = { }
+
+jit.opt.start('3', string.format('hotloop=%d', cfg.hotloop))
+
+for _ = 1, cfg.fibers do
+  channels[_] = fiber.channel(1)
+  fiber.new(run, cfg.hotloop + 1, channels[_])
+end
+
+-- Finalize the existing fibers
+for _ = 1, cfg.fibers do
+  test:ok(channels[_]:get(cfg.timeout.get),
+          string.format('fiber #%d successfully finished', _))
+end
+
+test:ok(true, 'trace is not recorded due to fiber switch underneath coio_wait')
+
+os.exit(test:check() and 0 or 1)
-- 
2.25.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
  2020-03-30 22:44 [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield Igor Munkin
@ 2020-03-31 16:58 ` Konstantin Osipov
  2020-03-31 23:57 ` Vladislav Shpilevoy
  1 sibling, 0 replies; 12+ messages in thread
From: Konstantin Osipov @ 2020-03-31 16:58 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches, Vladislav Shpilevoy

* Igor Munkin <imun@tarantool.org> [20/03/31 02:06]:
> Since Tarantool fibers doesn't respect Lua coroutine switch mechanism,
> JIT machinery stays unnotified when one lua_State substitues another
> one. As a result if trace recording hasn't been aborted prior to fiber
> switch, the recording proceeds using the new lua_State and leads to
> failure either on any compiler phase or on execution the compiled trace.
> 
> This changeset adds a mandatory on_yield trigger aborting trace
> recording when fiber switches to another one.
> 
> Fixes #4491
> Closes #1700

This is simply awesome.

-- 
Konstantin Osipov, Moscow, Russia

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
  2020-03-30 22:44 [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield Igor Munkin
  2020-03-31 16:58 ` Konstantin Osipov
@ 2020-03-31 23:57 ` Vladislav Shpilevoy
  2020-07-07 22:24   ` Igor Munkin
  1 sibling, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2020-03-31 23:57 UTC (permalink / raw)
  To: Igor Munkin, Sergey Ostanevich; +Cc: tarantool-patches

Hi! Thanks for the patch!

See 7 comments below.

> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
> index 45bc03787..c79aa7c2b 100644
> --- a/src/lua/fiber.c
> +++ b/src/lua/fiber.c
> @@ -454,6 +455,25 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap)
>  	return result;
>  }
>  
> +static int
> +fiber_on_yield(struct trigger *trigger, void *event)
> +{
> +	(void) trigger;
> +	(void) event;
> +
> +	/*
> +	 * XXX: According to LuaJIT API reference luaJIT_setmode

1. What is 'XXX'?

> +	 * function returns 0 on failure and 1 on success. Since
> +	 * this call is aimed to abort the trace recording on
> +	 * fiber_yield call and prevent already compiled traces
> +	 * execution, the mode parameter is an invalid value (-1)
> +	 * so JIT mode change is expected to fail and 0 is
> +	 * returned. Otherwise non-zero return value signals
> +	 * trigger_run routine about this trigger failure.

2. So correct me if I am wrong - we deliberately fail this
function assuming that its call stops current trace collection/execution,
right? And we pass -1 because there is no other way to stop
trace collection/execution except this function.

Why can't we call lj_trace_abort() directly?

And most importantly, how does it affect perf? New trigger
is +1 virtual call on each yield of every Lua fiber and +1
execution of non-trival function luaJIT_setmode(). I think
it is better to write a micro bench checking how many yields
can we do per time unit before and after this patch. From Lua
fibers.

> +	 */
> +	return luaJIT_setmode(tarantool_L, 0, -1);

3. In case that function ever returns 1, that breaks our
error return convention, that error means -1 (it will work
in triggers, since they do '!= 0', but anyway). Also when
there is an error, diag is expected to be set. Better do
diag_set(LuajitError) here when this fails. Or even panic()
since it is not supposed to return 1 when the mode is bad.
Or assume it is always 0 (add assert()), and return const 0
from the trigger.

> +}
> +
>  /**
>   * Utility function for fiber.create and fiber.new
>   */
> @@ -467,10 +487,18 @@ fiber_create(struct lua_State *L)
>  
>  	struct fiber *f = fiber_new("lua", lua_fiber_run_f);
>  	if (f == NULL) {
> -		luaL_unref(L, LUA_REGISTRYINDEX, coro_ref);
> -		luaT_error(L);
> +		/* diagnostics is set in fiber_new. */
> +		goto error;
>  	}
>  
> +	struct trigger *t = malloc(sizeof(*t));
> +	if (t == NULL) {
> +		diag_set(OutOfMemory, sizeof(*t), "malloc", "t");
> +		goto error;
> +	}
> +	trigger_create(t, fiber_on_yield, NULL, (trigger_f0) free);
> +	trigger_add(&f->on_yield, t);

4. Yeah, now I understand why did you ask about when triggers are
deleted. In this particular case there is a better option. You can
allocate the trigger right on the stack, in lua_fiber_run_f(). This
is the trampoline for all Lua fibers, regardless how they are
created: via fiber.new(), or fiber.create(). You can add the trigger
before

    result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET);

And remove afterwards. No need to use the heap then, nor move some
error handling code.

5. Also there is a question what will happen, if Lua is accessed from
a fiber, created not in Lua? That is basically all pooled fibers,
which serve IProto requests. See struct fiber_pool and its single
usage place. These fibers are started from C, and may end up executing
Lua code, if, for example, an IPROTO_CALL arrived.

The same question can be applied to fibers, created from C API:

    API_EXPORT struct fiber *
    fiber_new(const char *name, fiber_func f);

These fibers won't have the on_yield trigger, but can access Lua.
For example, they can run Lua on_replace triggers in a space by making
an insertion into it. Also they can access Lua API directly. By
calling things like luaT_call() (see module.h).

You may consider adding Jit collection/tracing termination to the
scheduler fiber. It is just one fiber, and from what I know, every
fiber after yield switches current thread to the scheduler fiber. This
also maybe less expensive. We probably won't even notice.

6. There is also a question about how does it affect collection of
valid traces? I don't know how Jit does it, so ask just in case.
Won't this make traces never collected? Because every fiber yields
soon or late.

7. Does this patch allow to yield in ffi?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
  2020-03-31 23:57 ` Vladislav Shpilevoy
@ 2020-07-07 22:24   ` Igor Munkin
  2020-07-10 10:26     ` sergos
  2020-07-11 20:28     ` Vladislav Shpilevoy
  0 siblings, 2 replies; 12+ messages in thread
From: Igor Munkin @ 2020-07-07 22:24 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Vlad,

Thanks for your review!

On 01.04.20, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
> 
> See 7 comments below.
> 
> > diff --git a/src/lua/fiber.c b/src/lua/fiber.c
> > index 45bc03787..c79aa7c2b 100644
> > --- a/src/lua/fiber.c
> > +++ b/src/lua/fiber.c
> > @@ -454,6 +455,25 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap)
> >  	return result;
> >  }
> >  
> > +static int
> > +fiber_on_yield(struct trigger *trigger, void *event)
> > +{
> > +	(void) trigger;
> > +	(void) event;
> > +
> > +	/*
> > +	 * XXX: According to LuaJIT API reference luaJIT_setmode
> 
> 1. What is 'XXX'?

AFAIK this is just a legacy keyword coming from Sun code convention[1]:
| 10.5.4 Special Comments
|
| Use XXX in a comment to flag something that is bogus but works. Use
| FIXME to flag something that is bogus and broken.

I guess its meaning is kinda lost nowadays and often is considered close
to FIXME now. Some people suggest to avoid XXX since it's ambiguous but:
* I see it's already used in Tarantool sources:
| imun@tunix : ~/tarantool (master=) $ grep -rnF 'XXX: ' src | wc -l |
10
* This is the right case to use it: such <luaJIT_setmode> usage is bogus
  but works for the issue purposes.
* This keyword is highlighted by the text editor (e.g. vim).

> 
> > +	 * function returns 0 on failure and 1 on success. Since
> > +	 * this call is aimed to abort the trace recording on
> > +	 * fiber_yield call and prevent already compiled traces
> > +	 * execution, the mode parameter is an invalid value (-1)
> > +	 * so JIT mode change is expected to fail and 0 is
> > +	 * returned. Otherwise non-zero return value signals
> > +	 * trigger_run routine about this trigger failure.
> 
> 2. So correct me if I am wrong - we deliberately fail this
> function assuming that its call stops current trace collection/execution,
> right? And we pass -1 because there is no other way to stop

Yes, with one nit: the call either aborts current trace collection or
leads to the platform panic (we have no option here) -- similar to the
way we handle FFI sandwiches.

> trace collection/execution except this function.

Fortunately <luaJIT_setmode> aborts trace compilation unconditionally
and then changes JIT behaviour considering the given mode value. We pass
-1 since there is no special value to abort the trace compilation. This
value changes nothing, so it's just a hack to abort trace compilation
via exported LuaJIT C API.

> 
> Why can't we call lj_trace_abort() directly?

It's the internal API. Its usage complicates a switch between various
LuaJIT implementations (we faced several challenges when tried to build
Tarantool with uJIT). There is a public API to be used here (though in a
bit hacky way).

> 
> And most importantly, how does it affect perf? New trigger
> is +1 virtual call on each yield of every Lua fiber and +1
> execution of non-trival function luaJIT_setmode(). I think
> it is better to write a micro bench checking how many yields
> can we do per time unit before and after this patch. From Lua
> fibers.

I made some benchmarks[2] and here are the numbers for the bleeding
master:
* Vanilla (mean, seconds):
| Total runs: #15
| #1700 bench: fibers: 10; iterations: 100 => 0.0015582
| #1700 bench: fibers: 10; iterations: 1000 => 0.004238
| #1700 bench: fibers: 10; iterations: 10000 => 0.031612
| #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
| #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
| #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
| #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
| #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
| #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
| #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
| #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
| #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
| #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
| #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
| #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
| #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
* Patched [Lua fiber trigger] (mean, seconds):
| Total runs: #15
| #1700 bench: fibers: 10; iterations: 100 => 0.00157866666666667
| #1700 bench: fibers: 10; iterations: 1000 => 0.004395
| #1700 bench: fibers: 10; iterations: 10000 => 0.0328212666666667
| #1700 bench: fibers: 10; iterations: 100000 => 0.317387866666667
| #1700 bench: fibers: 100; iterations: 100 => 0.00404193333333333
| #1700 bench: fibers: 100; iterations: 1000 => 0.0284888
| #1700 bench: fibers: 100; iterations: 10000 => 0.270978066666667
| #1700 bench: fibers: 100; iterations: 100000 => 2.70369026666667
| #1700 bench: fibers: 1000; iterations: 100 => 0.0304198666666667
| #1700 bench: fibers: 1000; iterations: 1000 => 0.283783733333333
| #1700 bench: fibers: 1000; iterations: 10000 => 2.8139128
| #1700 bench: fibers: 1000; iterations: 100000 => 28.1274792666667
| #1700 bench: fibers: 10000; iterations: 100 => 0.6049388
| #1700 bench: fibers: 10000; iterations: 1000 => 5.9653538
| #1700 bench: fibers: 10000; iterations: 10000 => 59.6273188
| #1700 bench: fibers: 10000; iterations: 100000 => 596.194249466667
* Relative measurements (vanilla -> patched):
| #1700 bench: fibers: 10; iterations: 100 => 1.31%
| #1700 bench: fibers: 10; iterations: 1000 => 3.7%
| #1700 bench: fibers: 10; iterations: 10000 => 3.82%
| #1700 bench: fibers: 10; iterations: 100000 => 5.24%
| #1700 bench: fibers: 100; iterations: 100 => 2.99%
| #1700 bench: fibers: 100; iterations: 1000 => 5.19%
| #1700 bench: fibers: 100; iterations: 10000 => 5%
| #1700 bench: fibers: 100; iterations: 100000 => 5.24%
| #1700 bench: fibers: 1000; iterations: 100 => 6.06%
| #1700 bench: fibers: 1000; iterations: 1000 => 6.38%
| #1700 bench: fibers: 1000; iterations: 10000 => 6.94%
| #1700 bench: fibers: 1000; iterations: 100000 => 6.48%
| #1700 bench: fibers: 10000; iterations: 100 => 0.29%
| #1700 bench: fibers: 10000; iterations: 1000 => 0.03%
| #1700 bench: fibers: 10000; iterations: 10000 => -0.24%
| #1700 bench: fibers: 10000; iterations: 100000 => 0.73%

> 
> > +	 */
> > +	return luaJIT_setmode(tarantool_L, 0, -1);
> 
> 3. In case that function ever returns 1, that breaks our
> error return convention, that error means -1 (it will work
> in triggers, since they do '!= 0', but anyway). Also when
> there is an error, diag is expected to be set. Better do
> diag_set(LuajitError) here when this fails. Or even panic()
> since it is not supposed to return 1 when the mode is bad.
> Or assume it is always 0 (add assert()), and return const 0
> from the trigger.

I guess assert with return 0 is enough for this case.

> 
> > +}
> > +
> >  /**
> >   * Utility function for fiber.create and fiber.new
> >   */
> > @@ -467,10 +487,18 @@ fiber_create(struct lua_State *L)
> >  
> >  	struct fiber *f = fiber_new("lua", lua_fiber_run_f);
> >  	if (f == NULL) {
> > -		luaL_unref(L, LUA_REGISTRYINDEX, coro_ref);
> > -		luaT_error(L);
> > +		/* diagnostics is set in fiber_new. */
> > +		goto error;
> >  	}
> >  
> > +	struct trigger *t = malloc(sizeof(*t));
> > +	if (t == NULL) {
> > +		diag_set(OutOfMemory, sizeof(*t), "malloc", "t");
> > +		goto error;
> > +	}
> > +	trigger_create(t, fiber_on_yield, NULL, (trigger_f0) free);
> > +	trigger_add(&f->on_yield, t);
> 
> 4. Yeah, now I understand why did you ask about when triggers are
> deleted. In this particular case there is a better option. You can
> allocate the trigger right on the stack, in lua_fiber_run_f(). This
> is the trampoline for all Lua fibers, regardless how they are
> created: via fiber.new(), or fiber.create(). You can add the trigger
> before
> 
>     result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET);
> 
> And remove afterwards. No need to use the heap then, nor move some
> error handling code.

Nice, thanks! I adjusted the patch, considering this point:

================================================================================

diff --git a/src/lua/fiber.c b/src/lua/fiber.c
index c79aa7c2b..f0bcc1ab7 100644
--- a/src/lua/fiber.c
+++ b/src/lua/fiber.c
@@ -437,11 +437,15 @@ static int
 lua_fiber_run_f(MAYBE_UNUSED va_list ap)
 {
 	int result;
+	struct trigger t;
 	struct fiber *f = fiber();
 	struct lua_State *L = f->storage.lua.stack;
 	int coro_ref = lua_tointeger(L, -1);
 	lua_pop(L, 1);
+	trigger_create(&t, fiber_on_yield, NULL, NULL);
+	trigger_add(&f->on_yield, &t);
 	result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET);
+	trigger_clear(&t);
 	/*
 	 * If fiber is not joinable
 	 * We can unref child stack here,
@@ -487,17 +491,9 @@ fiber_create(struct lua_State *L)
 
 	struct fiber *f = fiber_new("lua", lua_fiber_run_f);
 	if (f == NULL) {
-		/* diagnostics is set in fiber_new. */
-		goto error;
-	}
-
-	struct trigger *t = malloc(sizeof(*t));
-	if (t == NULL) {
-		diag_set(OutOfMemory, sizeof(*t), "malloc", "t");
-		goto error;
+		luaL_unref(L, LUA_REGISTRYINDEX, coro_ref);
+		luaT_error(L);
 	}
-	trigger_create(t, fiber_on_yield, NULL, (trigger_f0) free);
-	trigger_add(&f->on_yield, t);
 
 	/* Move the arguments to the new coro */
 	lua_xmove(L, child_L, lua_gettop(L));
@@ -511,13 +507,6 @@ fiber_create(struct lua_State *L)
 	lua_pushinteger(child_L, coro_ref);
 	f->storage.lua.stack = child_L;
 	return f;
-
-error:
-	/* Release the anchored coroutine. */
-	luaL_unref(L, LUA_REGISTRYINDEX, coro_ref);
-	luaT_error(L);
-	unreachable();
-	return NULL;
 }
 
 /**

================================================================================

> 
> 5. Also there is a question what will happen, if Lua is accessed from
> a fiber, created not in Lua? That is basically all pooled fibers,
> which serve IProto requests. See struct fiber_pool and its single
> usage place. These fibers are started from C, and may end up executing
> Lua code, if, for example, an IPROTO_CALL arrived.
> 
> The same question can be applied to fibers, created from C API:
> 
>     API_EXPORT struct fiber *
>     fiber_new(const char *name, fiber_func f);
> 
> These fibers won't have the on_yield trigger, but can access Lua.
> For example, they can run Lua on_replace triggers in a space by making
> an insertion into it. Also they can access Lua API directly. By
> calling things like luaT_call() (see module.h).

Well, strictly saying this means we need such trigger on each
created/pooled fiber, right?

> 
> You may consider adding Jit collection/tracing termination to the
> scheduler fiber. It is just one fiber, and from what I know, every
> fiber after yield switches current thread to the scheduler fiber. This
> also maybe less expensive. We probably won't even notice.

I've tried to implement such approach but it doesn't work. As you
mentioned on_yield triggers are not executed for scheduler fiber.
Nevertheless, here is the patch (I hope I've not missed something):

================================================================================

diff --git a/src/lua/init.c b/src/lua/init.c
index a0b2fc775..33ddb894d 100644
--- a/src/lua/init.c
+++ b/src/lua/init.c
@@ -79,6 +79,7 @@ struct ibuf *tarantool_lua_ibuf = &tarantool_lua_ibuf_body;
  */
 struct fiber *script_fiber;
 bool start_loop = true;
+static struct trigger lua_on_yield_trigger;
 
 /* contents of src/lua/ files */
 extern char strict_lua[],
@@ -432,9 +433,22 @@ luaopen_tarantool(lua_State *L)
 	return 1;
 }
 
+static int
+tarantool_lua_on_yield(struct trigger *trigger, void *event)
+{
+	(void) trigger;
+	(void) event;
+	int status = luaJIT_setmode(tarantool_L, 0, -1);
+	assert(status == 0);
+	return 0;
+}
+
 void
 tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
 {
+	trigger_create(&lua_on_yield_trigger, tarantool_lua_on_yield, NULL, NULL);
+	trigger_add(&cord()->sched.on_yield, &lua_on_yield_trigger);
+
 	lua_State *L = luaL_newstate();
 	if (L == NULL) {
 		panic("failed to initialize Lua");

================================================================================

However, I've implemented your idea in another way: added an auxiliary
function field to cord structure to be called on <fiber_yield> or
<fiber_call>:

================================================================================

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 5389ce467..c2b339f00 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -408,6 +408,7 @@ fiber_call_impl(struct fiber *callee)
 void
 fiber_call(struct fiber *callee)
 {
+	struct cord *cord = cord();
 	struct fiber *caller = fiber();
 	assert(! (caller->flags & FIBER_IS_READY));
 	assert(rlist_empty(&callee->state));
@@ -416,6 +417,8 @@ fiber_call(struct fiber *callee)
 	/** By convention, these triggers must not throw. */
 	if (! rlist_empty(&caller->on_yield))
 		trigger_run(&caller->on_yield, NULL);
+	if (cord->lua_on_yield)
+		cord->lua_on_yield();
 	clock_set_on_csw(caller);
 	callee->caller = caller;
 	callee->flags |= FIBER_IS_READY;
@@ -645,6 +648,8 @@ fiber_yield(void)
 	/** By convention, these triggers must not throw. */
 	if (! rlist_empty(&caller->on_yield))
 		trigger_run(&caller->on_yield, NULL);
+	if (cord->lua_on_yield)
+		cord->lua_on_yield();
 	clock_set_on_csw(caller);
 
 	assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
@@ -1365,6 +1370,7 @@ cord_create(struct cord *cord, const char *name)
 	slab_cache_set_thread(&cord()->slabc);
 
 	cord->id = pthread_self();
+	cord->lua_on_yield = NULL;
 	cord->on_exit = NULL;
 	slab_cache_create(&cord->slabc, &runtime);
 	mempool_create(&cord->fiber_mempool, &cord->slabc,
diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
index cd9346a55..97e2ebb02 100644
--- a/src/lib/core/fiber.h
+++ b/src/lib/core/fiber.h
@@ -534,6 +534,7 @@ struct cord_on_exit;
  * model.
  */
 struct cord {
+	void (*lua_on_yield)(void);
 	/** The fiber that is currently being executed. */
 	struct fiber *fiber;
 	struct ev_loop *loop;
diff --git a/src/lua/init.c b/src/lua/init.c
index a0b2fc775..65e4b5dd0 100644
--- a/src/lua/init.c
+++ b/src/lua/init.c
@@ -522,6 +522,7 @@ tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
 	lua_atpanic(L, tarantool_panic_handler);
 	/* clear possible left-overs of init */
 	lua_settop(L, 0);
+	cord()->lua_on_yield = tarantool_lua_on_yield;
 	tarantool_L = L;
 }
 
diff --git a/src/lua/utils.c b/src/lua/utils.c
index 0b05d7257..8645d49d9 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -37,6 +37,8 @@
 #include <diag.h>
 #include <fiber.h>
 
+#include <luajit.h>
+
 #include "serializer_opts.h"
 
 int luaL_nil_ref = LUA_REFNIL;
@@ -1276,3 +1278,9 @@ tarantool_lua_utils_init(struct lua_State *L)
 	assert(CTID_UUID != 0);
 	return 0;
 }
+
+void
+tarantool_lua_on_yield(void)
+{
+	(void)luaJIT_setmode(tarantool_L, 0, -1);
+}
diff --git a/src/lua/utils.h b/src/lua/utils.h
index b10754e4a..71dc67fa0 100644
--- a/src/lua/utils.h
+++ b/src/lua/utils.h
@@ -683,6 +683,9 @@ void luaL_iterator_delete(struct luaL_iterator *it);
 int
 tarantool_lua_utils_init(struct lua_State *L);
 
+void
+tarantool_lua_on_yield(void);
+
 #if defined(__cplusplus)
 } /* extern "C" */
 #endif /* defined(__cplusplus) */

================================================================================

And here are the results:
* Vanilla (mean, seconds):
| Total runs: #15
| #1700 bench: fibers: 10; iterations: 100 => 0.0015582
| #1700 bench: fibers: 10; iterations: 1000 => 0.004238
| #1700 bench: fibers: 10; iterations: 10000 => 0.031612
| #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
| #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
| #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
| #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
| #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
| #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
| #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
| #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
| #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
| #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
| #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
| #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
| #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
* Patched [cord callback] (mean, seconds):
| Total runs: #15
| #1700 bench: fibers: 10; iterations: 100 => 0.0015236
| #1700 bench: fibers: 10; iterations: 1000 => 0.00450766666666667
| #1700 bench: fibers: 10; iterations: 10000 => 0.0341899333333333
| #1700 bench: fibers: 10; iterations: 100000 => 0.329420333333333
| #1700 bench: fibers: 100; iterations: 100 => 0.00403553333333333
| #1700 bench: fibers: 100; iterations: 1000 => 0.0291887333333333
| #1700 bench: fibers: 100; iterations: 10000 => 0.2776654
| #1700 bench: fibers: 100; iterations: 100000 => 2.76978893333333
| #1700 bench: fibers: 1000; iterations: 100 => 0.0309739333333333
| #1700 bench: fibers: 1000; iterations: 1000 => 0.289414733333333
| #1700 bench: fibers: 1000; iterations: 10000 => 2.8631376
| #1700 bench: fibers: 1000; iterations: 100000 => 28.5713692666667
| #1700 bench: fibers: 10000; iterations: 100 => 0.628503733333333
| #1700 bench: fibers: 10000; iterations: 1000 => 6.20881053333333
| #1700 bench: fibers: 10000; iterations: 10000 => 62.0973958
| #1700 bench: fibers: 10000; iterations: 100000 => 619.493991933333
* Relative measurements (vanilla -> patched):
| #1700 bench: fibers: 10; iterations: 100 => -2.22%
| #1700 bench: fibers: 10; iterations: 1000 => 6.36%
| #1700 bench: fibers: 10; iterations: 10000 => 8.15%
| #1700 bench: fibers: 10; iterations: 100000 => 9.23%
| #1700 bench: fibers: 100; iterations: 100 => 2.83%
| #1700 bench: fibers: 100; iterations: 1000 => 7.78%
| #1700 bench: fibers: 100; iterations: 10000 => 7.6%
| #1700 bench: fibers: 100; iterations: 100000 => 7.81%
| #1700 bench: fibers: 1000; iterations: 100 => 8%
| #1700 bench: fibers: 1000; iterations: 1000 => 8.49%
| #1700 bench: fibers: 1000; iterations: 10000 => 8.82%
| #1700 bench: fibers: 1000; iterations: 100000 => 8.16%
| #1700 bench: fibers: 10000; iterations: 100 => 4.2%
| #1700 bench: fibers: 10000; iterations: 1000 => 4.12%
| #1700 bench: fibers: 10000; iterations: 10000 => 3.89%
| #1700 bench: fibers: 10000; iterations: 100000 => 4.66%

Looks like this way is slower than the one implemented via triggers.

> 
> 6. There is also a question about how does it affect collection of
> valid traces? I don't know how Jit does it, so ask just in case.
> Won't this make traces never collected? Because every fiber yields
> soon or late.

All traces already aborts recording (or finalize it preparing for
further stitching) on each <fiber.yield> / <fiber.sleep> call, since
they are implemented via Lua C API. The patch just handle the case when
fibers are switched underneath an FFI call (see #4491[3]).

> 
> 7. Does this patch allow to yield in ffi?

Yep.

[1]: https://www.oracle.com/technetwork/java/javase/documentation/codeconventions-137265.html
[2]: https://gist.github.com/igormunkin/7e0cf48005bd003ffbdf30181eedb40e
[3]: https://github.com/tarantool/tarantool/issues/4491

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
  2020-07-07 22:24   ` Igor Munkin
@ 2020-07-10 10:26     ` sergos
  2020-09-21 19:23       ` Igor Munkin
  2020-07-11 20:28     ` Vladislav Shpilevoy
  1 sibling, 1 reply; 12+ messages in thread
From: sergos @ 2020-07-10 10:26 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches, Vladislav Shpilevoy

Hi!

Thanks for the patch and investigation!


> On 8 Jul 2020, at 01:24, Igor Munkin <imun@tarantool.org> wrote:
> 
> Vlad,
> 
> Thanks for your review!
> 
> On 01.04.20, Vladislav Shpilevoy wrote:
>> Hi! Thanks for the patch!
>> 
>> See 7 comments below.
>> 
>>> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
>>> index 45bc03787..c79aa7c2b 100644
>>> --- a/src/lua/fiber.c
>>> +++ b/src/lua/fiber.c
>>> @@ -454,6 +455,25 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap)
>>> 	return result;
>>> }
>>> 
>>> +static int
>>> +fiber_on_yield(struct trigger *trigger, void *event)
>>> +{
>>> +	(void) trigger;
>>> +	(void) event;
>>> +
>>> +	/*
>>> +	 * XXX: According to LuaJIT API reference luaJIT_setmode
>> 
>> 1. What is 'XXX'?
> 
> AFAIK this is just a legacy keyword coming from Sun code convention[1]:
> | 10.5.4 Special Comments
> |
> | Use XXX in a comment to flag something that is bogus but works. Use
> | FIXME to flag something that is bogus and broken.
> 
> I guess its meaning is kinda lost nowadays and often is considered close
> to FIXME now. Some people suggest to avoid XXX since it's ambiguous but:
> * I see it's already used in Tarantool sources:
> | imun@tunix : ~/tarantool (master=) $ grep -rnF 'XXX: ' src | wc -l |
> 10
> * This is the right case to use it: such <luaJIT_setmode> usage is bogus
>  but works for the issue purposes.
> * This keyword is highlighted by the text editor (e.g. vim).
> 
>> 
>>> +	 * function returns 0 on failure and 1 on success. Since
>>> +	 * this call is aimed to abort the trace recording on
>>> +	 * fiber_yield call and prevent already compiled traces
>>> +	 * execution, the mode parameter is an invalid value (-1)
>>> +	 * so JIT mode change is expected to fail and 0 is
>>> +	 * returned. Otherwise non-zero return value signals
>>> +	 * trigger_run routine about this trigger failure.
>> 
>> 2. So correct me if I am wrong - we deliberately fail this
>> function assuming that its call stops current trace collection/execution,
>> right? And we pass -1 because there is no other way to stop
> 
> Yes, with one nit: the call either aborts current trace collection or
> leads to the platform panic (we have no option here) -- similar to the
> way we handle FFI sandwiches.
> 
>> trace collection/execution except this function.
> 
> Fortunately <luaJIT_setmode> aborts trace compilation unconditionally
> and then changes JIT behaviour considering the given mode value. We pass
> -1 since there is no special value to abort the trace compilation. This
> value changes nothing, so it's just a hack to abort trace compilation
> via exported LuaJIT C API.
> 
>> 
>> Why can't we call lj_trace_abort() directly?
> 
> It's the internal API. Its usage complicates a switch between various
> LuaJIT implementations (we faced several challenges when tried to build
> Tarantool with uJIT). There is a public API to be used here (though in a
> bit hacky way).

This hacky way looks fragile, since luaJIT_setmode() may change its behaviour
in the future and cause some unpredictable result. We have to mention it 
somewhere as a warninig for future LuaJIT updates from upstream. For example,
introduce a comment inside luaJIT_setmode() that will conflict with plain
patch.

> 
>> 
>> And most importantly, how does it affect perf? New trigger
>> is +1 virtual call on each yield of every Lua fiber and +1
>> execution of non-trival function luaJIT_setmode(). I think
>> it is better to write a micro bench checking how many yields
>> can we do per time unit before and after this patch. From Lua
>> fibers.
> 
> I made some benchmarks[2] and here are the numbers for the bleeding
> master:
> * Vanilla (mean, seconds):
> | Total runs: #15
> | #1700 bench: fibers: 10; iterations: 100 => 0.0015582
> | #1700 bench: fibers: 10; iterations: 1000 => 0.004238
> | #1700 bench: fibers: 10; iterations: 10000 => 0.031612
> | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
> | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
> | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
> | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
> | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
> | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
> | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
> | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
> | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
> | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
> | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
> | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
> | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
> * Patched [Lua fiber trigger] (mean, seconds):
> | Total runs: #15
> | #1700 bench: fibers: 10; iterations: 100 => 0.00157866666666667
> | #1700 bench: fibers: 10; iterations: 1000 => 0.004395
> | #1700 bench: fibers: 10; iterations: 10000 => 0.0328212666666667
> | #1700 bench: fibers: 10; iterations: 100000 => 0.317387866666667
> | #1700 bench: fibers: 100; iterations: 100 => 0.00404193333333333
> | #1700 bench: fibers: 100; iterations: 1000 => 0.0284888
> | #1700 bench: fibers: 100; iterations: 10000 => 0.270978066666667
> | #1700 bench: fibers: 100; iterations: 100000 => 2.70369026666667
> | #1700 bench: fibers: 1000; iterations: 100 => 0.0304198666666667
> | #1700 bench: fibers: 1000; iterations: 1000 => 0.283783733333333
> | #1700 bench: fibers: 1000; iterations: 10000 => 2.8139128
> | #1700 bench: fibers: 1000; iterations: 100000 => 28.1274792666667
> | #1700 bench: fibers: 10000; iterations: 100 => 0.6049388
> | #1700 bench: fibers: 10000; iterations: 1000 => 5.9653538
> | #1700 bench: fibers: 10000; iterations: 10000 => 59.6273188
> | #1700 bench: fibers: 10000; iterations: 100000 => 596.194249466667
> * Relative measurements (vanilla -> patched):
> | #1700 bench: fibers: 10; iterations: 100 => 1.31%
> | #1700 bench: fibers: 10; iterations: 1000 => 3.7%
> | #1700 bench: fibers: 10; iterations: 10000 => 3.82%
> | #1700 bench: fibers: 10; iterations: 100000 => 5.24%
> | #1700 bench: fibers: 100; iterations: 100 => 2.99%
> | #1700 bench: fibers: 100; iterations: 1000 => 5.19%
> | #1700 bench: fibers: 100; iterations: 10000 => 5%
> | #1700 bench: fibers: 100; iterations: 100000 => 5.24%
> | #1700 bench: fibers: 1000; iterations: 100 => 6.06%
> | #1700 bench: fibers: 1000; iterations: 1000 => 6.38%
> | #1700 bench: fibers: 1000; iterations: 10000 => 6.94%
> | #1700 bench: fibers: 1000; iterations: 100000 => 6.48%
> | #1700 bench: fibers: 10000; iterations: 100 => 0.29%
> | #1700 bench: fibers: 10000; iterations: 1000 => 0.03%
> | #1700 bench: fibers: 10000; iterations: 10000 => -0.24%
> | #1700 bench: fibers: 10000; iterations: 100000 => 0.73%
> 
>> 
>>> +	 */
>>> +	return luaJIT_setmode(tarantool_L, 0, -1);
>> 
>> 3. In case that function ever returns 1, that breaks our
>> error return convention, that error means -1 (it will work
>> in triggers, since they do '!= 0', but anyway). Also when
>> there is an error, diag is expected to be set. Better do
>> diag_set(LuajitError) here when this fails. Or even panic()
>> since it is not supposed to return 1 when the mode is bad.
>> Or assume it is always 0 (add assert()), and return const 0
>> from the trigger.
> 
> I guess assert with return 0 is enough for this case.
> 
>> 
>>> +}
>>> +
>>> /**
>>>  * Utility function for fiber.create and fiber.new
>>>  */
>>> @@ -467,10 +487,18 @@ fiber_create(struct lua_State *L)
>>> 
>>> 	struct fiber *f = fiber_new("lua", lua_fiber_run_f);
>>> 	if (f == NULL) {
>>> -		luaL_unref(L, LUA_REGISTRYINDEX, coro_ref);
>>> -		luaT_error(L);
>>> +		/* diagnostics is set in fiber_new. */
>>> +		goto error;
>>> 	}
>>> 
>>> +	struct trigger *t = malloc(sizeof(*t));
>>> +	if (t == NULL) {
>>> +		diag_set(OutOfMemory, sizeof(*t), "malloc", "t");
>>> +		goto error;
>>> +	}
>>> +	trigger_create(t, fiber_on_yield, NULL, (trigger_f0) free);
>>> +	trigger_add(&f->on_yield, t);
>> 
>> 4. Yeah, now I understand why did you ask about when triggers are
>> deleted. In this particular case there is a better option. You can
>> allocate the trigger right on the stack, in lua_fiber_run_f(). This
>> is the trampoline for all Lua fibers, regardless how they are
>> created: via fiber.new(), or fiber.create(). You can add the trigger
>> before
>> 
>>    result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET);
>> 
>> And remove afterwards. No need to use the heap then, nor move some
>> error handling code.
> 
> Nice, thanks! I adjusted the patch, considering this point:
> 
> ================================================================================
> 
> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
> index c79aa7c2b..f0bcc1ab7 100644
> --- a/src/lua/fiber.c
> +++ b/src/lua/fiber.c
> @@ -437,11 +437,15 @@ static int
> lua_fiber_run_f(MAYBE_UNUSED va_list ap)
> {
> 	int result;
> +	struct trigger t;
> 	struct fiber *f = fiber();
> 	struct lua_State *L = f->storage.lua.stack;
> 	int coro_ref = lua_tointeger(L, -1);
> 	lua_pop(L, 1);
> +	trigger_create(&t, fiber_on_yield, NULL, NULL);
> +	trigger_add(&f->on_yield, &t);
> 	result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET);
> +	trigger_clear(&t);
> 	/*
> 	 * If fiber is not joinable
> 	 * We can unref child stack here,
> @@ -487,17 +491,9 @@ fiber_create(struct lua_State *L)
> 
> 	struct fiber *f = fiber_new("lua", lua_fiber_run_f);
> 	if (f == NULL) {
> -		/* diagnostics is set in fiber_new. */
> -		goto error;
> -	}
> -
> -	struct trigger *t = malloc(sizeof(*t));
> -	if (t == NULL) {
> -		diag_set(OutOfMemory, sizeof(*t), "malloc", "t");
> -		goto error;
> +		luaL_unref(L, LUA_REGISTRYINDEX, coro_ref);
> +		luaT_error(L);
> 	}
> -	trigger_create(t, fiber_on_yield, NULL, (trigger_f0) free);
> -	trigger_add(&f->on_yield, t);
> 
> 	/* Move the arguments to the new coro */
> 	lua_xmove(L, child_L, lua_gettop(L));
> @@ -511,13 +507,6 @@ fiber_create(struct lua_State *L)
> 	lua_pushinteger(child_L, coro_ref);
> 	f->storage.lua.stack = child_L;
> 	return f;
> -
> -error:
> -	/* Release the anchored coroutine. */
> -	luaL_unref(L, LUA_REGISTRYINDEX, coro_ref);
> -	luaT_error(L);
> -	unreachable();
> -	return NULL;
> }
> 
> /**
> 
> ================================================================================
> 
>> 
>> 5. Also there is a question what will happen, if Lua is accessed from
>> a fiber, created not in Lua? That is basically all pooled fibers,
>> which serve IProto requests. See struct fiber_pool and its single
>> usage place. These fibers are started from C, and may end up executing
>> Lua code, if, for example, an IPROTO_CALL arrived.
>> 
>> The same question can be applied to fibers, created from C API:
>> 
>>    API_EXPORT struct fiber *
>>    fiber_new(const char *name, fiber_func f);
>> 
>> These fibers won't have the on_yield trigger, but can access Lua.
>> For example, they can run Lua on_replace triggers in a space by making
>> an insertion into it. Also they can access Lua API directly. By
>> calling things like luaT_call() (see module.h).
> 
> Well, strictly saying this means we need such trigger on each
> created/pooled fiber, right?
> 
>> 
>> You may consider adding Jit collection/tracing termination to the
>> scheduler fiber. It is just one fiber, and from what I know, every
>> fiber after yield switches current thread to the scheduler fiber. This
>> also maybe less expensive. We probably won't even notice.
> 
> I've tried to implement such approach but it doesn't work. As you
> mentioned on_yield triggers are not executed for scheduler fiber.
> Nevertheless, here is the patch (I hope I've not missed something):
> 
> ================================================================================
> 
> diff --git a/src/lua/init.c b/src/lua/init.c
> index a0b2fc775..33ddb894d 100644
> --- a/src/lua/init.c
> +++ b/src/lua/init.c
> @@ -79,6 +79,7 @@ struct ibuf *tarantool_lua_ibuf = &tarantool_lua_ibuf_body;
>  */
> struct fiber *script_fiber;
> bool start_loop = true;
> +static struct trigger lua_on_yield_trigger;
> 
> /* contents of src/lua/ files */
> extern char strict_lua[],
> @@ -432,9 +433,22 @@ luaopen_tarantool(lua_State *L)
> 	return 1;
> }
> 
> +static int
> +tarantool_lua_on_yield(struct trigger *trigger, void *event)
> +{
> +	(void) trigger;
> +	(void) event;
> +	int status = luaJIT_setmode(tarantool_L, 0, -1);
> +	assert(status == 0);
> +	return 0;
> +}
> +
> void
> tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
> {
> +	trigger_create(&lua_on_yield_trigger, tarantool_lua_on_yield, NULL, NULL);
> +	trigger_add(&cord()->sched.on_yield, &lua_on_yield_trigger);
> +
> 	lua_State *L = luaL_newstate();
> 	if (L == NULL) {
> 		panic("failed to initialize Lua");
> 
> ================================================================================
> 
> However, I've implemented your idea in another way: added an auxiliary
> function field to cord structure to be called on <fiber_yield> or
> <fiber_call>:
> 
> ================================================================================
> 
> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> index 5389ce467..c2b339f00 100644
> --- a/src/lib/core/fiber.c
> +++ b/src/lib/core/fiber.c
> @@ -408,6 +408,7 @@ fiber_call_impl(struct fiber *callee)
> void
> fiber_call(struct fiber *callee)
> {
> +	struct cord *cord = cord();
> 	struct fiber *caller = fiber();
> 	assert(! (caller->flags & FIBER_IS_READY));
> 	assert(rlist_empty(&callee->state));
> @@ -416,6 +417,8 @@ fiber_call(struct fiber *callee)
> 	/** By convention, these triggers must not throw. */
> 	if (! rlist_empty(&caller->on_yield))
> 		trigger_run(&caller->on_yield, NULL);
> +	if (cord->lua_on_yield)
> +		cord->lua_on_yield();
> 	clock_set_on_csw(caller);
> 	callee->caller = caller;
> 	callee->flags |= FIBER_IS_READY;
> @@ -645,6 +648,8 @@ fiber_yield(void)
> 	/** By convention, these triggers must not throw. */
> 	if (! rlist_empty(&caller->on_yield))
> 		trigger_run(&caller->on_yield, NULL);
> +	if (cord->lua_on_yield)
> +		cord->lua_on_yield();
> 	clock_set_on_csw(caller);
> 
> 	assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
> @@ -1365,6 +1370,7 @@ cord_create(struct cord *cord, const char *name)
> 	slab_cache_set_thread(&cord()->slabc);
> 
> 	cord->id = pthread_self();
> +	cord->lua_on_yield = NULL;
> 	cord->on_exit = NULL;
> 	slab_cache_create(&cord->slabc, &runtime);
> 	mempool_create(&cord->fiber_mempool, &cord->slabc,
> diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
> index cd9346a55..97e2ebb02 100644
> --- a/src/lib/core/fiber.h
> +++ b/src/lib/core/fiber.h
> @@ -534,6 +534,7 @@ struct cord_on_exit;
>  * model.
>  */
> struct cord {
> +	void (*lua_on_yield)(void);
> 	/** The fiber that is currently being executed. */
> 	struct fiber *fiber;
> 	struct ev_loop *loop;
> diff --git a/src/lua/init.c b/src/lua/init.c
> index a0b2fc775..65e4b5dd0 100644
> --- a/src/lua/init.c
> +++ b/src/lua/init.c
> @@ -522,6 +522,7 @@ tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
> 	lua_atpanic(L, tarantool_panic_handler);
> 	/* clear possible left-overs of init */
> 	lua_settop(L, 0);
> +	cord()->lua_on_yield = tarantool_lua_on_yield;
> 	tarantool_L = L;
> }
> 
> diff --git a/src/lua/utils.c b/src/lua/utils.c
> index 0b05d7257..8645d49d9 100644
> --- a/src/lua/utils.c
> +++ b/src/lua/utils.c
> @@ -37,6 +37,8 @@
> #include <diag.h>
> #include <fiber.h>
> 
> +#include <luajit.h>
> +
> #include "serializer_opts.h"
> 
> int luaL_nil_ref = LUA_REFNIL;
> @@ -1276,3 +1278,9 @@ tarantool_lua_utils_init(struct lua_State *L)
> 	assert(CTID_UUID != 0);
> 	return 0;
> }
> +
> +void
> +tarantool_lua_on_yield(void)
> +{
> +	(void)luaJIT_setmode(tarantool_L, 0, -1);
> +}
> diff --git a/src/lua/utils.h b/src/lua/utils.h
> index b10754e4a..71dc67fa0 100644
> --- a/src/lua/utils.h
> +++ b/src/lua/utils.h
> @@ -683,6 +683,9 @@ void luaL_iterator_delete(struct luaL_iterator *it);
> int
> tarantool_lua_utils_init(struct lua_State *L);
> 
> +void
> +tarantool_lua_on_yield(void);
> +
> #if defined(__cplusplus)
> } /* extern "C" */
> #endif /* defined(__cplusplus) */
> 
> ================================================================================
> 
> And here are the results:
> * Vanilla (mean, seconds):
> | Total runs: #15
> | #1700 bench: fibers: 10; iterations: 100 => 0.0015582
> | #1700 bench: fibers: 10; iterations: 1000 => 0.004238
> | #1700 bench: fibers: 10; iterations: 10000 => 0.031612
> | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
> | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
> | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
> | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
> | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
> | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
> | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
> | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
> | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
> | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
> | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
> | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
> | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
> * Patched [cord callback] (mean, seconds):
> | Total runs: #15
> | #1700 bench: fibers: 10; iterations: 100 => 0.0015236
> | #1700 bench: fibers: 10; iterations: 1000 => 0.00450766666666667
> | #1700 bench: fibers: 10; iterations: 10000 => 0.0341899333333333
> | #1700 bench: fibers: 10; iterations: 100000 => 0.329420333333333
> | #1700 bench: fibers: 100; iterations: 100 => 0.00403553333333333
> | #1700 bench: fibers: 100; iterations: 1000 => 0.0291887333333333
> | #1700 bench: fibers: 100; iterations: 10000 => 0.2776654
> | #1700 bench: fibers: 100; iterations: 100000 => 2.76978893333333
> | #1700 bench: fibers: 1000; iterations: 100 => 0.0309739333333333
> | #1700 bench: fibers: 1000; iterations: 1000 => 0.289414733333333
> | #1700 bench: fibers: 1000; iterations: 10000 => 2.8631376
> | #1700 bench: fibers: 1000; iterations: 100000 => 28.5713692666667
> | #1700 bench: fibers: 10000; iterations: 100 => 0.628503733333333
> | #1700 bench: fibers: 10000; iterations: 1000 => 6.20881053333333
> | #1700 bench: fibers: 10000; iterations: 10000 => 62.0973958
> | #1700 bench: fibers: 10000; iterations: 100000 => 619.493991933333
> * Relative measurements (vanilla -> patched):
> | #1700 bench: fibers: 10; iterations: 100 => -2.22%
> | #1700 bench: fibers: 10; iterations: 1000 => 6.36%
> | #1700 bench: fibers: 10; iterations: 10000 => 8.15%
> | #1700 bench: fibers: 10; iterations: 100000 => 9.23%
> | #1700 bench: fibers: 100; iterations: 100 => 2.83%
> | #1700 bench: fibers: 100; iterations: 1000 => 7.78%
> | #1700 bench: fibers: 100; iterations: 10000 => 7.6%
> | #1700 bench: fibers: 100; iterations: 100000 => 7.81%
> | #1700 bench: fibers: 1000; iterations: 100 => 8%
> | #1700 bench: fibers: 1000; iterations: 1000 => 8.49%
> | #1700 bench: fibers: 1000; iterations: 10000 => 8.82%
> | #1700 bench: fibers: 1000; iterations: 100000 => 8.16%
> | #1700 bench: fibers: 10000; iterations: 100 => 4.2%
> | #1700 bench: fibers: 10000; iterations: 1000 => 4.12%
> | #1700 bench: fibers: 10000; iterations: 10000 => 3.89%
> | #1700 bench: fibers: 10000; iterations: 100000 => 4.66%
> 
> Looks like this way is slower than the one implemented via triggers.

But does it catch more cases, as Vlad supposed? Do you have an extra
test for it?

Also, I would like to see the impact on some ‘real’ test - such as box
insertion/select or so?


Regards,
Sergos

> 
>> 
>> 6. There is also a question about how does it affect collection of
>> valid traces? I don't know how Jit does it, so ask just in case.
>> Won't this make traces never collected? Because every fiber yields
>> soon or late.
> 
> All traces already aborts recording (or finalize it preparing for
> further stitching) on each <fiber.yield> / <fiber.sleep> call, since
> they are implemented via Lua C API. The patch just handle the case when
> fibers are switched underneath an FFI call (see #4491[3]).
> 
>> 
>> 7. Does this patch allow to yield in ffi?
> 
> Yep.
> 
> [1]: https://www.oracle.com/technetwork/java/javase/documentation/codeconventions-137265.html
> [2]: https://gist.github.com/igormunkin/7e0cf48005bd003ffbdf30181eedb40e
> [3]: https://github.com/tarantool/tarantool/issues/4491
> 
> -- 
> Best regards,
> IM

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
  2020-07-07 22:24   ` Igor Munkin
  2020-07-10 10:26     ` sergos
@ 2020-07-11 20:28     ` Vladislav Shpilevoy
  2020-09-07 20:35       ` Igor Munkin
  1 sibling, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2020-07-11 20:28 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Hi! The branch seems to be extremely outdated. Could you
please rebase it?

On 08/07/2020 00:24, Igor Munkin wrote:
> Vlad,
> 
> Thanks for your review!
> 
> On 01.04.20, Vladislav Shpilevoy wrote:
>> Hi! Thanks for the patch!
>>
>> See 7 comments below.
>>
>>> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
>>> index 45bc03787..c79aa7c2b 100644
>>> --- a/src/lua/fiber.c
>>> +++ b/src/lua/fiber.c
>>> @@ -454,6 +455,25 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap)
>>>  	return result;
>>>  }
>>>  
>>> +static int
>>> +fiber_on_yield(struct trigger *trigger, void *event)
>>> +{
>>> +	(void) trigger;
>>> +	(void) event;
>>> +
>>> +	/*
>>> +	 * XXX: According to LuaJIT API reference luaJIT_setmode
>>
>> 1. What is 'XXX'?
> 
> AFAIK this is just a legacy keyword coming from Sun code convention[1]:
> | 10.5.4 Special Comments
> |
> | Use XXX in a comment to flag something that is bogus but works. Use
> | FIXME to flag something that is bogus and broken.
> 
> I guess its meaning is kinda lost nowadays and often is considered close
> to FIXME now. Some people suggest to avoid XXX since it's ambiguous but:
> * I see it's already used in Tarantool sources:
> | imun@tunix : ~/tarantool (master=) $ grep -rnF 'XXX: ' src | wc -l |
> 10
> * This is the right case to use it: such <luaJIT_setmode> usage is bogus
>   but works for the issue purposes.
> * This keyword is highlighted by the text editor (e.g. vim).

Nice, didn't know what it is. Will use it from now.

>> And most importantly, how does it affect perf? New trigger
>> is +1 virtual call on each yield of every Lua fiber and +1
>> execution of non-trival function luaJIT_setmode(). I think
>> it is better to write a micro bench checking how many yields
>> can we do per time unit before and after this patch. From Lua
>> fibers.
> 
> I made some benchmarks[2] and here are the numbers for the bleeding
> master:
> * Vanilla (mean, seconds):
> | Total runs: #15
> | #1700 bench: fibers: 10; iterations: 100 => 0.0015582
> | #1700 bench: fibers: 10; iterations: 1000 => 0.004238
> | #1700 bench: fibers: 10; iterations: 10000 => 0.031612
> | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
> | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
> | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
> | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
> | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
> | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
> | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
> | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
> | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
> | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
> | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
> | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
> | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
> * Patched [Lua fiber trigger] (mean, seconds):
> | Total runs: #15
> | #1700 bench: fibers: 10; iterations: 100 => 0.00157866666666667
> | #1700 bench: fibers: 10; iterations: 1000 => 0.004395
> | #1700 bench: fibers: 10; iterations: 10000 => 0.0328212666666667
> | #1700 bench: fibers: 10; iterations: 100000 => 0.317387866666667
> | #1700 bench: fibers: 100; iterations: 100 => 0.00404193333333333
> | #1700 bench: fibers: 100; iterations: 1000 => 0.0284888
> | #1700 bench: fibers: 100; iterations: 10000 => 0.270978066666667
> | #1700 bench: fibers: 100; iterations: 100000 => 2.70369026666667
> | #1700 bench: fibers: 1000; iterations: 100 => 0.0304198666666667
> | #1700 bench: fibers: 1000; iterations: 1000 => 0.283783733333333
> | #1700 bench: fibers: 1000; iterations: 10000 => 2.8139128
> | #1700 bench: fibers: 1000; iterations: 100000 => 28.1274792666667
> | #1700 bench: fibers: 10000; iterations: 100 => 0.6049388
> | #1700 bench: fibers: 10000; iterations: 1000 => 5.9653538
> | #1700 bench: fibers: 10000; iterations: 10000 => 59.6273188
> | #1700 bench: fibers: 10000; iterations: 100000 => 596.194249466667
> * Relative measurements (vanilla -> patched):
> | #1700 bench: fibers: 10; iterations: 100 => 1.31%
> | #1700 bench: fibers: 10; iterations: 1000 => 3.7%
> | #1700 bench: fibers: 10; iterations: 10000 => 3.82%
> | #1700 bench: fibers: 10; iterations: 100000 => 5.24%
> | #1700 bench: fibers: 100; iterations: 100 => 2.99%
> | #1700 bench: fibers: 100; iterations: 1000 => 5.19%
> | #1700 bench: fibers: 100; iterations: 10000 => 5%
> | #1700 bench: fibers: 100; iterations: 100000 => 5.24%
> | #1700 bench: fibers: 1000; iterations: 100 => 6.06%
> | #1700 bench: fibers: 1000; iterations: 1000 => 6.38%
> | #1700 bench: fibers: 1000; iterations: 10000 => 6.94%
> | #1700 bench: fibers: 1000; iterations: 100000 => 6.48%
> | #1700 bench: fibers: 10000; iterations: 100 => 0.29%
> | #1700 bench: fibers: 10000; iterations: 1000 => 0.03%
> | #1700 bench: fibers: 10000; iterations: 10000 => -0.24%
> | #1700 bench: fibers: 10000; iterations: 100000 => 0.73%

So it easily can be -6%. That is disappointing. We just recently
discussed how we should monitor perf, and here it is.

>> 5. Also there is a question what will happen, if Lua is accessed from
>> a fiber, created not in Lua? That is basically all pooled fibers,
>> which serve IProto requests. See struct fiber_pool and its single
>> usage place. These fibers are started from C, and may end up executing
>> Lua code, if, for example, an IPROTO_CALL arrived.
>>
>> The same question can be applied to fibers, created from C API:
>>
>>     API_EXPORT struct fiber *
>>     fiber_new(const char *name, fiber_func f);
>>
>> These fibers won't have the on_yield trigger, but can access Lua.
>> For example, they can run Lua on_replace triggers in a space by making
>> an insertion into it. Also they can access Lua API directly. By
>> calling things like luaT_call() (see module.h).
> 
> Well, strictly saying this means we need such trigger on each
> created/pooled fiber, right?

Yes. On each fiber ever created in the main thread. Even applier fiber
can start executing on_replace triggers on a space in Lua.

>> You may consider adding Jit collection/tracing termination to the
>> scheduler fiber. It is just one fiber, and from what I know, every
>> fiber after yield switches current thread to the scheduler fiber. This
>> also maybe less expensive. We probably won't even notice.
> 
> I've tried to implement such approach but it doesn't work. As you
> mentioned on_yield triggers are not executed for scheduler fiber.
> Nevertheless, here is the patch (I hope I've not missed something):
> 
> ================================================================================
> 
> diff --git a/src/lua/init.c b/src/lua/init.c
> index a0b2fc775..33ddb894d 100644
> --- a/src/lua/init.c
> +++ b/src/lua/init.c
> @@ -79,6 +79,7 @@ struct ibuf *tarantool_lua_ibuf = &tarantool_lua_ibuf_body;
>   */
>  struct fiber *script_fiber;
>  bool start_loop = true;
> +static struct trigger lua_on_yield_trigger;
>  
>  /* contents of src/lua/ files */
>  extern char strict_lua[],
> @@ -432,9 +433,22 @@ luaopen_tarantool(lua_State *L)
>  	return 1;
>  }
>  
> +static int
> +tarantool_lua_on_yield(struct trigger *trigger, void *event)
> +{
> +	(void) trigger;
> +	(void) event;
> +	int status = luaJIT_setmode(tarantool_L, 0, -1);
> +	assert(status == 0);
> +	return 0;
> +}
> +
>  void
>  tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
>  {
> +	trigger_create(&lua_on_yield_trigger, tarantool_lua_on_yield, NULL, NULL);
> +	trigger_add(&cord()->sched.on_yield, &lua_on_yield_trigger);
> +
>  	lua_State *L = luaL_newstate();
>  	if (L == NULL) {
>  		panic("failed to initialize Lua");
> 
> ================================================================================

I never said sched fiber calls on_yield triggers. Because indeed, they could never
be installed from anywhere, and there was no need to call them. But I didn't know
that either.

Anyway, I am not sure anymore that fibers return to scheduler on each yield. I see
that coro_transfer() in fiber_yield() gives control to some other fiber, which
can be either next in this loop iteration, or the scheduler if it is the end of
the event loop iteration. So it won't work.

> However, I've implemented your idea in another way: added an auxiliary
> function field to cord structure to be called on <fiber_yield> or
> <fiber_call>:
> > And here are the results:
> * Vanilla (mean, seconds):
> | Total runs: #15
> | #1700 bench: fibers: 10; iterations: 100 => 0.0015582
> | #1700 bench: fibers: 10; iterations: 1000 => 0.004238
> | #1700 bench: fibers: 10; iterations: 10000 => 0.031612
> | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
> | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
> | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
> | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
> | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
> | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
> | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
> | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
> | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
> | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
> | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
> | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
> | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
> * Patched [cord callback] (mean, seconds):
> | Total runs: #15
> | #1700 bench: fibers: 10; iterations: 100 => 0.0015236
> | #1700 bench: fibers: 10; iterations: 1000 => 0.00450766666666667
> | #1700 bench: fibers: 10; iterations: 10000 => 0.0341899333333333
> | #1700 bench: fibers: 10; iterations: 100000 => 0.329420333333333
> | #1700 bench: fibers: 100; iterations: 100 => 0.00403553333333333
> | #1700 bench: fibers: 100; iterations: 1000 => 0.0291887333333333
> | #1700 bench: fibers: 100; iterations: 10000 => 0.2776654
> | #1700 bench: fibers: 100; iterations: 100000 => 2.76978893333333
> | #1700 bench: fibers: 1000; iterations: 100 => 0.0309739333333333
> | #1700 bench: fibers: 1000; iterations: 1000 => 0.289414733333333
> | #1700 bench: fibers: 1000; iterations: 10000 => 2.8631376
> | #1700 bench: fibers: 1000; iterations: 100000 => 28.5713692666667
> | #1700 bench: fibers: 10000; iterations: 100 => 0.628503733333333
> | #1700 bench: fibers: 10000; iterations: 1000 => 6.20881053333333
> | #1700 bench: fibers: 10000; iterations: 10000 => 62.0973958
> | #1700 bench: fibers: 10000; iterations: 100000 => 619.493991933333
> * Relative measurements (vanilla -> patched):
> | #1700 bench: fibers: 10; iterations: 100 => -2.22%
> | #1700 bench: fibers: 10; iterations: 1000 => 6.36%
> | #1700 bench: fibers: 10; iterations: 10000 => 8.15%
> | #1700 bench: fibers: 10; iterations: 100000 => 9.23%
> | #1700 bench: fibers: 100; iterations: 100 => 2.83%
> | #1700 bench: fibers: 100; iterations: 1000 => 7.78%
> | #1700 bench: fibers: 100; iterations: 10000 => 7.6%
> | #1700 bench: fibers: 100; iterations: 100000 => 7.81%
> | #1700 bench: fibers: 1000; iterations: 100 => 8%
> | #1700 bench: fibers: 1000; iterations: 1000 => 8.49%
> | #1700 bench: fibers: 1000; iterations: 10000 => 8.82%
> | #1700 bench: fibers: 1000; iterations: 100000 => 8.16%
> | #1700 bench: fibers: 10000; iterations: 100 => 4.2%
> | #1700 bench: fibers: 10000; iterations: 1000 => 4.12%
> | #1700 bench: fibers: 10000; iterations: 10000 => 3.89%
> | #1700 bench: fibers: 10000; iterations: 100000 => 4.66%
> 
> Looks like this way is slower than the one implemented via triggers.

I don't understand why. This is very suspicious. It does not look right.
Number of calls is the same. Also you removed one lookup in the rlist of
on_yield triggers. Perhaps you did the bench in a different build mode?
Or did you run something heavy in background, browser with music/video
maybe?

Also the reason may be that when you installed the trigger, it worked
for Lua fibers only. But when you did it correct, for all fibers, we got
the real perf degradation value. Try to create a global int64 counter
and increment it on each call. Then compare its final value in the previous
solution with on_yield triggers vs the current solution. It would allow to
tell whether this new function is called more often in the latest solution.

Try to make the function extern instead of virtual. May look ugly,
but -9% worst degradation is uglier. I hope the extern hack will help.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
  2020-07-11 20:28     ` Vladislav Shpilevoy
@ 2020-09-07 20:35       ` Igor Munkin
  2020-09-17 14:21         ` Vladislav Shpilevoy
  0 siblings, 1 reply; 12+ messages in thread
From: Igor Munkin @ 2020-09-07 20:35 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Vlad,

Sorry for such long reply, I just tried to put it all together.

On 11.07.20, Vladislav Shpilevoy wrote:
> Hi! The branch seems to be extremely outdated. Could you
> please rebase it?

I pushed another implementation on the branch. It's a PoC, since I have
several linkage issues in tests and I temporarily disabled unit tests.

> 
> On 08/07/2020 00:24, Igor Munkin wrote:
> > Vlad,
> > 
> > Thanks for your review!
> > 
> > On 01.04.20, Vladislav Shpilevoy wrote:
> >> Hi! Thanks for the patch!
> >>
> >> See 7 comments below.
> >>
> >>> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
> >>> index 45bc03787..c79aa7c2b 100644
> >>> --- a/src/lua/fiber.c
> >>> +++ b/src/lua/fiber.c
> >>> @@ -454,6 +455,25 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap)

<snipped>

> 
> >> And most importantly, how does it affect perf? New trigger
> >> is +1 virtual call on each yield of every Lua fiber and +1
> >> execution of non-trival function luaJIT_setmode(). I think
> >> it is better to write a micro bench checking how many yields
> >> can we do per time unit before and after this patch. From Lua
> >> fibers.
> > 
> > I made some benchmarks[2] and here are the numbers for the bleeding
> > master:
> > * Vanilla (mean, seconds):
> > | Total runs: #15
> > | #1700 bench: fibers: 10; iterations: 100 => 0.0015582
> > | #1700 bench: fibers: 10; iterations: 1000 => 0.004238
> > | #1700 bench: fibers: 10; iterations: 10000 => 0.031612
> > | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
> > | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
> > | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
> > | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
> > | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
> > | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
> > | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
> > | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
> > | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
> > | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
> > | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
> > | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
> > | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
> > * Patched [Lua fiber trigger] (mean, seconds):
> > | Total runs: #15
> > | #1700 bench: fibers: 10; iterations: 100 => 0.00157866666666667
> > | #1700 bench: fibers: 10; iterations: 1000 => 0.004395
> > | #1700 bench: fibers: 10; iterations: 10000 => 0.0328212666666667
> > | #1700 bench: fibers: 10; iterations: 100000 => 0.317387866666667
> > | #1700 bench: fibers: 100; iterations: 100 => 0.00404193333333333
> > | #1700 bench: fibers: 100; iterations: 1000 => 0.0284888
> > | #1700 bench: fibers: 100; iterations: 10000 => 0.270978066666667
> > | #1700 bench: fibers: 100; iterations: 100000 => 2.70369026666667
> > | #1700 bench: fibers: 1000; iterations: 100 => 0.0304198666666667
> > | #1700 bench: fibers: 1000; iterations: 1000 => 0.283783733333333
> > | #1700 bench: fibers: 1000; iterations: 10000 => 2.8139128
> > | #1700 bench: fibers: 1000; iterations: 100000 => 28.1274792666667
> > | #1700 bench: fibers: 10000; iterations: 100 => 0.6049388
> > | #1700 bench: fibers: 10000; iterations: 1000 => 5.9653538
> > | #1700 bench: fibers: 10000; iterations: 10000 => 59.6273188
> > | #1700 bench: fibers: 10000; iterations: 100000 => 596.194249466667
> > * Relative measurements (vanilla -> patched):
> > | #1700 bench: fibers: 10; iterations: 100 => 1.31%
> > | #1700 bench: fibers: 10; iterations: 1000 => 3.7%
> > | #1700 bench: fibers: 10; iterations: 10000 => 3.82%
> > | #1700 bench: fibers: 10; iterations: 100000 => 5.24%
> > | #1700 bench: fibers: 100; iterations: 100 => 2.99%
> > | #1700 bench: fibers: 100; iterations: 1000 => 5.19%
> > | #1700 bench: fibers: 100; iterations: 10000 => 5%
> > | #1700 bench: fibers: 100; iterations: 100000 => 5.24%
> > | #1700 bench: fibers: 1000; iterations: 100 => 6.06%
> > | #1700 bench: fibers: 1000; iterations: 1000 => 6.38%
> > | #1700 bench: fibers: 1000; iterations: 10000 => 6.94%
> > | #1700 bench: fibers: 1000; iterations: 100000 => 6.48%
> > | #1700 bench: fibers: 10000; iterations: 100 => 0.29%
> > | #1700 bench: fibers: 10000; iterations: 1000 => 0.03%
> > | #1700 bench: fibers: 10000; iterations: 10000 => -0.24%
> > | #1700 bench: fibers: 10000; iterations: 100000 => 0.73%
> 
> So it easily can be -6%. That is disappointing. We just recently
> discussed how we should monitor perf, and here it is.

I updated these benchmarks[1] and re-run them for Release build on dev4 host
and results make me frustrating. Here is also the patch, since the remote
branch is updated.

================================================================================

diff --git a/src/lua/fiber.c b/src/lua/fiber.c
index 45bc037..0efc31f 100644
--- a/src/lua/fiber.c
+++ b/src/lua/fiber.c
@@ -38,6 +38,7 @@
 #include <lua.h>
 #include <lauxlib.h>
 #include <lualib.h>
+#include <luajit.h>
 
 void
 luaL_testcancel(struct lua_State *L)
@@ -433,14 +434,26 @@ lbox_fiber_info(struct lua_State *L)
 }
 
 static int
+fiber_on_yield(struct trigger *trigger, void *event)
+{
+       (void) trigger;
+       (void) event;
+       luaJIT_setmode(tarantool_L, 0, -1);
+}
+
+static int
 lua_fiber_run_f(MAYBE_UNUSED va_list ap)
 {
        int result;
+       struct trigger t;
        struct fiber *f = fiber();
        struct lua_State *L = f->storage.lua.stack;
        int coro_ref = lua_tointeger(L, -1);
        lua_pop(L, 1);
+       trigger_create(&t, fiber_on_yield, NULL, NULL);
+       trigger_add(&f->on_yield, &t);
        result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET);
+       trigger_clear(&t);
        /*
         * If fiber is not joinable
         * We can unref child stack here,

================================================================================

* Vanilla -> Patched [Lua fiber trigger] (min, median, mean, max):
| Total runs: #15
| fibers: 10; iters: 100	0%	2%	0%	1%
| fibers: 10; iters: 1000	4%	9%	6%	0%
| fibers: 10; iters: 10000	4%	13%	7%	11%
| fibers: 10; iters: 100000	4%	5%	4%	2%
| fibers: 100; iters: 100	3%	3%	3%	-2%
| fibers: 100; iters: 1000	3%	3%	3%	2%
| fibers: 100; iters: 10000	5%	4%	4%	2%
| fibers: 100; iters: 100000	4%	4%	3%	-1%
| fibers: 1000; iters: 100	9%	17%	15%	13%
| fibers: 1000; iters: 1000	10%	12%	12%	15%
| fibers: 1000; iters: 10000	8%	9%	9%	9%
| fibers: 1000; iters: 100000	7%	8%	8%	5%
| fibers: 10000; iters: 100	4%	5%	5%	3%
| fibers: 10000; iters: 1000	5%	5%	4%	3%
| fibers: 10000; iters: 10000	4%	4%	3%	2%
| fibers: 10000; iters: 100000	5%	5%	4%	5%

As we discussed offline, here are also the numbers for the alternative
patch implementing a noop trigger.

================================================================================

diff --git a/src/lua/fiber.c b/src/lua/fiber.c
index 45bc037..959206f 100644
--- a/src/lua/fiber.c
+++ b/src/lua/fiber.c
@@ -432,15 +432,26 @@ lbox_fiber_info(struct lua_State *L)
        return 1;
 }
 
+static int
+fiber_on_yield(struct trigger *trigger, void *event)
+{
+       (void) trigger;
+       (void) event;
+}
+
 static int
 lua_fiber_run_f(MAYBE_UNUSED va_list ap)
 {
        int result;
+       struct trigger t;
        struct fiber *f = fiber();
        struct lua_State *L = f->storage.lua.stack;
        int coro_ref = lua_tointeger(L, -1);
        lua_pop(L, 1);
+       trigger_create(&t, fiber_on_yield, NULL, NULL);
+       trigger_add(&f->on_yield, &t);
        result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET);
+       trigger_clear(&t);
        /*
         * If fiber is not joinable
         * We can unref child stack here,

================================================================================

* Vanilla -> Patched [Lua noop trigger] (min, median, mean, max):
| fibers: 10; iters: 100	0%	0%	0%	0%
| fibers: 10; iters: 1000	2%	10%	4%	-1%
| fibers: 10; iters: 10000	2%	1%	0%	0%
| fibers: 10; iters: 100000	0%	0%	0%	2%
| fibers: 100; iters: 100	1%	0%	1%	-1%
| fibers: 100; iters: 1000	0%	1%	0%	-1%
| fibers: 100; iters: 10000	1%	1%	0%	-3%
| fibers: 100; iters: 100000	1%	2%	1%	-2%
| fibers: 1000; iters: 100	1%	3%	2%	1%
| fibers: 1000; iters: 1000	2%	3%	3%	2%
| fibers: 1000; iters: 10000	1%	2%	2%	3%
| fibers: 1000; iters: 100000	2%	2%	2%	1%
| fibers: 10000; iters: 100	1%	2%	3%	3%
| fibers: 10000; iters: 1000	3%	2%	2%	0%
| fibers: 10000; iters: 10000	1%	2%	1%	0%
| fibers: 10000; iters: 100000	2%	3%	2%	2%

This is freaking unbelievable. Nobody can affect those runs, but numbers
show unacceptable performance loss. I guess there is no way to measure
the approaches more precisely, so I just throw this patch away.

> 
> >> 5. Also there is a question what will happen, if Lua is accessed
> >> from a fiber, created not in Lua? That is basically all pooled
> >> fibers, which serve IProto requests. See struct fiber_pool and its
> >> single usage place. These fibers are started from C, and may end up
> >> executing Lua code, if, for example, an IPROTO_CALL arrived.
> >>
> >> The same question can be applied to fibers, created from C API:
> >>
> >>     API_EXPORT struct fiber * fiber_new(const char *name,
> >>     fiber_func f);
> >>
> >> These fibers won't have the on_yield trigger, but can access Lua.
> >> For example, they can run Lua on_replace triggers in a space by
> >> making an insertion into it. Also they can access Lua API directly.
> >> By calling things like luaT_call() (see module.h).
> > 
> > Well, strictly saying this means we need such trigger on each
> > created/pooled fiber, right?
> 
> Yes. On each fiber ever created in the main thread. Even applier fiber
> can start executing on_replace triggers on a space in Lua.

Well, since there is a public API for fiber creation and we can't
obligue user to set such trigger, this way even doesn't solve the issue.

Let's stop on the approach with extern/virtual handler then.

> 
> >> You may consider adding Jit collection/tracing termination to the
> >> scheduler fiber. It is just one fiber, and from what I know, every
> >> fiber after yield switches current thread to the scheduler fiber. This
> >> also maybe less expensive. We probably won't even notice.
> > 
> > I've tried to implement such approach but it doesn't work. As you
> > mentioned on_yield triggers are not executed for scheduler fiber.
> > Nevertheless, here is the patch (I hope I've not missed something):
> > 
> > ================================================================================
> > 
> > diff --git a/src/lua/init.c b/src/lua/init.c
> > index a0b2fc775..33ddb894d 100644
> > --- a/src/lua/init.c
> > +++ b/src/lua/init.c
> > @@ -79,6 +79,7 @@ struct ibuf *tarantool_lua_ibuf = &tarantool_lua_ibuf_body;
> >   */
> >  struct fiber *script_fiber;
> >  bool start_loop = true;
> > +static struct trigger lua_on_yield_trigger;
> >  
> >  /* contents of src/lua/ files */
> >  extern char strict_lua[],
> > @@ -432,9 +433,22 @@ luaopen_tarantool(lua_State *L)
> >  	return 1;
> >  }
> >  
> > +static int
> > +tarantool_lua_on_yield(struct trigger *trigger, void *event)
> > +{
> > +	(void) trigger;
> > +	(void) event;
> > +	int status = luaJIT_setmode(tarantool_L, 0, -1);
> > +	assert(status == 0);
> > +	return 0;
> > +}
> > +
> >  void
> >  tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
> >  {
> > +	trigger_create(&lua_on_yield_trigger, tarantool_lua_on_yield, NULL, NULL);
> > +	trigger_add(&cord()->sched.on_yield, &lua_on_yield_trigger);
> > +
> >  	lua_State *L = luaL_newstate();
> >  	if (L == NULL) {
> >  		panic("failed to initialize Lua");
> > 
> > ================================================================================
> 
> I never said sched fiber calls on_yield triggers. Because indeed, they could never
> be installed from anywhere, and there was no need to call them. But I didn't know
> that either.
> 
> Anyway, I am not sure anymore that fibers return to scheduler on each yield. I see
> that coro_transfer() in fiber_yield() gives control to some other fiber, which
> can be either next in this loop iteration, or the scheduler if it is the end of
> the event loop iteration. So it won't work.
> 
> > However, I've implemented your idea in another way: added an auxiliary
> > function field to cord structure to be called on <fiber_yield> or
> > <fiber_call>:
> > > And here are the results:
> > * Vanilla (mean, seconds):
> > | Total runs: #15
> > | #1700 bench: fibers: 10; iterations: 100 => 0.0015582
> > | #1700 bench: fibers: 10; iterations: 1000 => 0.004238
> > | #1700 bench: fibers: 10; iterations: 10000 => 0.031612
> > | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
> > | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
> > | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
> > | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
> > | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
> > | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
> > | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
> > | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
> > | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
> > | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
> > | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
> > | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
> > | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
> > * Patched [cord callback] (mean, seconds):
> > | Total runs: #15
> > | #1700 bench: fibers: 10; iterations: 100 => 0.0015236
> > | #1700 bench: fibers: 10; iterations: 1000 => 0.00450766666666667
> > | #1700 bench: fibers: 10; iterations: 10000 => 0.0341899333333333
> > | #1700 bench: fibers: 10; iterations: 100000 => 0.329420333333333
> > | #1700 bench: fibers: 100; iterations: 100 => 0.00403553333333333
> > | #1700 bench: fibers: 100; iterations: 1000 => 0.0291887333333333
> > | #1700 bench: fibers: 100; iterations: 10000 => 0.2776654
> > | #1700 bench: fibers: 100; iterations: 100000 => 2.76978893333333
> > | #1700 bench: fibers: 1000; iterations: 100 => 0.0309739333333333
> > | #1700 bench: fibers: 1000; iterations: 1000 => 0.289414733333333
> > | #1700 bench: fibers: 1000; iterations: 10000 => 2.8631376
> > | #1700 bench: fibers: 1000; iterations: 100000 => 28.5713692666667
> > | #1700 bench: fibers: 10000; iterations: 100 => 0.628503733333333
> > | #1700 bench: fibers: 10000; iterations: 1000 => 6.20881053333333
> > | #1700 bench: fibers: 10000; iterations: 10000 => 62.0973958
> > | #1700 bench: fibers: 10000; iterations: 100000 => 619.493991933333
> > * Relative measurements (vanilla -> patched):
> > | #1700 bench: fibers: 10; iterations: 100 => -2.22%
> > | #1700 bench: fibers: 10; iterations: 1000 => 6.36%
> > | #1700 bench: fibers: 10; iterations: 10000 => 8.15%
> > | #1700 bench: fibers: 10; iterations: 100000 => 9.23%
> > | #1700 bench: fibers: 100; iterations: 100 => 2.83%
> > | #1700 bench: fibers: 100; iterations: 1000 => 7.78%
> > | #1700 bench: fibers: 100; iterations: 10000 => 7.6%
> > | #1700 bench: fibers: 100; iterations: 100000 => 7.81%
> > | #1700 bench: fibers: 1000; iterations: 100 => 8%
> > | #1700 bench: fibers: 1000; iterations: 1000 => 8.49%
> > | #1700 bench: fibers: 1000; iterations: 10000 => 8.82%
> > | #1700 bench: fibers: 1000; iterations: 100000 => 8.16%
> > | #1700 bench: fibers: 10000; iterations: 100 => 4.2%
> > | #1700 bench: fibers: 10000; iterations: 1000 => 4.12%
> > | #1700 bench: fibers: 10000; iterations: 10000 => 3.89%
> > | #1700 bench: fibers: 10000; iterations: 100000 => 4.66%
> > 
> > Looks like this way is slower than the one implemented via triggers.
> 
> I don't understand why. This is very suspicious. It does not look right.
> Number of calls is the same. Also you removed one lookup in the rlist of
> on_yield triggers. Perhaps you did the bench in a different build mode?
> Or did you run something heavy in background, browser with music/video
> maybe?

The build mode was the same and the only heavy thing in background was
heavy metal, but it was streamed from another device.

I re-run benchmarks for Release build mode on dev4 host. Here is also
the patch.

================================================================================

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3c..d9a0721 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -408,6 +408,7 @@ fiber_call_impl(struct fiber *callee)
 void
 fiber_call(struct fiber *callee)
 {
+       struct cord *cord = cord();
        struct fiber *caller = fiber();
        assert(! (caller->flags & FIBER_IS_READY));
        assert(rlist_empty(&callee->state));
@@ -416,6 +417,8 @@ fiber_call(struct fiber *callee)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+       if (cord->lua_on_yield)
+               cord->lua_on_yield();
        clock_set_on_csw(caller);
        callee->caller = caller;
        callee->flags |= FIBER_IS_READY;
@@ -645,6 +648,8 @@ fiber_yield(void)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+       if (cord->lua_on_yield)
+               cord->lua_on_yield();
        clock_set_on_csw(caller);
 
        assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
@@ -1365,6 +1370,7 @@ cord_create(struct cord *cord, const char *name)
        slab_cache_set_thread(&cord()->slabc);
 
        cord->id = pthread_self();
+       cord->lua_on_yield = NULL;
        cord->on_exit = NULL;
        slab_cache_create(&cord->slabc, &runtime);
        mempool_create(&cord->fiber_mempool, &cord->slabc,
diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
index 16ee9f4..4cbf557 100644
--- a/src/lib/core/fiber.h
+++ b/src/lib/core/fiber.h
@@ -540,6 +540,7 @@ struct cord_on_exit;
  * model.
  */
 struct cord {
+       void (*lua_on_yield)(void);
        /** The fiber that is currently being executed. */
        struct fiber *fiber;
        struct ev_loop *loop;
diff --git a/src/lua/init.c b/src/lua/init.c
index a0b2fc7..65e4b5d 100644
--- a/src/lua/init.c
+++ b/src/lua/init.c
@@ -522,6 +522,7 @@ tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
        lua_atpanic(L, tarantool_panic_handler);
        /* clear possible left-overs of init */
        lua_settop(L, 0);
+       cord()->lua_on_yield = tarantool_lua_on_yield;
        tarantool_L = L;
 }
 
diff --git a/src/lua/utils.c b/src/lua/utils.c
index af114b0..c1885b8 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -37,6 +37,8 @@
 #include <diag.h>
 #include <fiber.h>
 
+#include <luajit.h>
+
 #include "serializer_opts.h"
 
 int luaL_nil_ref = LUA_REFNIL;
@@ -1308,3 +1310,9 @@ tarantool_lua_utils_init(struct lua_State *L)
        luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
        return 0;
 }
+
+void
+tarantool_lua_on_yield(void)
+{
+       (void)luaJIT_setmode(tarantool_L, 0, -1);
+}
diff --git a/src/lua/utils.h b/src/lua/utils.h
index 7e02a05..cf8fe19 100644
--- a/src/lua/utils.h
+++ b/src/lua/utils.h
@@ -665,6 +665,9 @@ void luaL_iterator_delete(struct luaL_iterator *it);
 int
 tarantool_lua_utils_init(struct lua_State *L);
 
+void
+tarantool_lua_on_yield(void);
+
 #if defined(__cplusplus)
 } /* extern "C" */
 #endif /* defined(__cplusplus) */

================================================================================

* Vanilla -> Patched [virtual cord callback] (min, median, mean, max):
| fibers: 10; iters: 100	2%	3%	1%	1%
| fibers: 10; iters: 1000	12%	17%	10%	3%
| fibers: 10; iters: 10000	6%	11%	8%	7%
| fibers: 10; iters: 100000	5%	6%	7%	10%
| fibers: 100; iters: 100	7%	5%	7%	3%
| fibers: 100; iters: 1000	10%	13%	12%	13%
| fibers: 100; iters: 10000	12%	12%	11%	8%
| fibers: 100; iters: 100000	12%	12%	12%	11%
| fibers: 1000; iters: 100	14%	14%	13%	8%
| fibers: 1000; iters: 1000	16%	16%	16%	13%
| fibers: 1000; iters: 10000	16%	16%	16%	14%
| fibers: 1000; iters: 100000	16%	16%	16%	14%
| fibers: 10000; iters: 100	3%	3%	3%	3%
| fibers: 10000; iters: 1000	2%	2%	2%	7%
| fibers: 10000; iters: 10000	2%	2%	2%	1%
| fibers: 10000; iters: 100000	2%	3%	2%	0%

As we discussed offline, here are also the numbers for the alternative
patch implementing a noop virtual callback.

================================================================================

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3c..d9a0721 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -408,6 +408,7 @@ fiber_call_impl(struct fiber *callee)
 void
 fiber_call(struct fiber *callee)
 {
+       struct cord *cord = cord();
        struct fiber *caller = fiber();
        assert(! (caller->flags & FIBER_IS_READY));
        assert(rlist_empty(&callee->state));
@@ -416,6 +417,8 @@ fiber_call(struct fiber *callee)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+       if (cord->lua_on_yield)
+               cord->lua_on_yield();
        clock_set_on_csw(caller);
        callee->caller = caller;
        callee->flags |= FIBER_IS_READY;
@@ -645,6 +648,8 @@ fiber_yield(void)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+       if (cord->lua_on_yield)
+               cord->lua_on_yield();
        clock_set_on_csw(caller);
 
        assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
@@ -1365,6 +1370,7 @@ cord_create(struct cord *cord, const char *name)
        slab_cache_set_thread(&cord()->slabc);
 
        cord->id = pthread_self();
+       cord->lua_on_yield = NULL;
        cord->on_exit = NULL;
        slab_cache_create(&cord->slabc, &runtime);
        mempool_create(&cord->fiber_mempool, &cord->slabc,
diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
index fffbe8f..51f2e9c 100644
--- a/src/lib/core/fiber.h
+++ b/src/lib/core/fiber.h
@@ -534,6 +534,7 @@ struct cord_on_exit;
  * model.
  */
 struct cord {
+       void (*lua_on_yield)(void);
        /** The fiber that is currently being executed. */
        struct fiber *fiber;
        struct ev_loop *loop;
diff --git a/src/lua/init.c b/src/lua/init.c
index a0b2fc7..65e4b5d 100644
--- a/src/lua/init.c
+++ b/src/lua/init.c
@@ -522,6 +522,7 @@ tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
        lua_atpanic(L, tarantool_panic_handler);
        /* clear possible left-overs of init */
        lua_settop(L, 0);
+       cord()->lua_on_yield = tarantool_lua_on_yield;
        tarantool_L = L;
 }
 
diff --git a/src/lua/utils.c b/src/lua/utils.c
index 0b05d72..1035b1e 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1308,3 +1308,7 @@ tarantool_lua_utils_init(struct lua_State *L)
        luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
        return 0;
 }
+
+void tarantool_lua_on_yield(void)
+{
+}
diff --git a/src/lua/utils.h b/src/lua/utils.h
index b10754e..71dc67f 100644
--- a/src/lua/utils.h
+++ b/src/lua/utils.h
@@ -683,6 +683,9 @@ void luaL_iterator_delete(struct luaL_iterator *it);
 int
 tarantool_lua_utils_init(struct lua_State *L);
 
+void
+tarantool_lua_on_yield(void);
+
 #if defined(__cplusplus)
 } /* extern "C" */
 #endif /* defined(__cplusplus) */

================================================================================

* Vanilla -> Patched [virtual cord noop callback] (min, median, mean, max):
| fibers: 10; iters: 100	3%	3%	2%	1%
| fibers: 10; iters: 1000	-6%	7%	2%	1%
| fibers: 10; iters: 10000	2%	1%	0%	4%
| fibers: 10; iters: 100000	2%	3%	2%	3%
| fibers: 100; iters: 100	4%	6%	5%	2%
| fibers: 100; iters: 1000	8%	9%	9%	7%
| fibers: 100; iters: 10000	10%	10%	8%	6%
| fibers: 100; iters: 100000	10%	9%	10%	8%
| fibers: 1000; iters: 100	13%	13%	12%	8%
| fibers: 1000; iters: 1000	14%	15%	15%	16%
| fibers: 1000; iters: 10000	14%	15%	15%	16%
| fibers: 1000; iters: 100000	14%	15%	14%	12%
| fibers: 10000; iters: 100	0%	1%	2%	3%
| fibers: 10000; iters: 1000	1%	1%	1%	0%
| fibers: 10000; iters: 10000	1%	2%	2%	4%
| fibers: 10000; iters: 100000	3%	4%	3%	4%

Well... The numbers are horrible. Even noop ones. I drop this patch too.

> 
> Also the reason may be that when you installed the trigger, it worked
> for Lua fibers only. But when you did it correct, for all fibers, we got
> the real perf degradation value. Try to create a global int64 counter
> and increment it on each call. Then compare its final value in the previous
> solution with on_yield triggers vs the current solution. It would allow to
> tell whether this new function is called more often in the latest solution.

Precisely. The extern/virtual function is called a bit more often:
* Patched [Lua fiber trigger] (on_yield handler calls):
| fibers: 10; iters: 100	1001
| fibers: 10; iters: 1000	10001
| fibers: 10; iters: 10000	100001
| fibers: 10; iters: 100000	1000001
| fibers: 100; iters: 100	10001
| fibers: 100; iters: 1000	100001
| fibers: 100; iters: 10000	1000001
| fibers: 100; iters: 100000	10000001
| fibers: 1000; iters: 100	100001
| fibers: 1000; iters: 1000	1000001
| fibers: 1000; iters: 10000	10000001
| fibers: 1000; iters: 100000	100000001
| fibers: 10000; iters: 100	1000001
| fibers: 10000; iters: 1000	10000001
| fibers: 10000; iters: 10000	100000001
| fibers: 10000; iters: 100000	1000000001
* Patched [virtual cord callback] (on_yield handler calls):
| fibers: 10; iters: 100	1117
| fibers: 10; iters: 1000	11017
| fibers: 10; iters: 10000	110017
| fibers: 10; iters: 100000	1100017
| fibers: 100; iters: 100	10207
| fibers: 100; iters: 1000	101107
| fibers: 100; iters: 10000	1010107
| fibers: 100; iters: 100000	10100107
| fibers: 1000; iters: 100	101107
| fibers: 1000; iters: 1000	1002007
| fibers: 1000; iters: 10000	10011007
| fibers: 1000; iters: 100000	100101007
| fibers: 10000; iters: 100	1010107
| fibers: 10000; iters: 1000	10011007
| fibers: 10000; iters: 10000	100020007
| fibers: 10000; iters: 100000	1000110007

> 
> Try to make the function extern instead of virtual. May look ugly,
> but -9% worst degradation is uglier. I hope the extern hack will help.

I made this patch (on the branch). Omitting the issues with unit tests
linkage, the results are definitely prettier than ones obtained with
virtual cord callback. The diff is below.

================================================================================

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3c..ed6104c 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -46,6 +46,8 @@
 #if ENABLE_FIBER_TOP
 #include <x86intrin.h> /* __rdtscp() */
 
+extern void lua_on_yield(void);
+
 static inline void
 clock_stat_add_delta(struct clock_stat *stat, uint64_t clock_delta)
 {
@@ -416,6 +418,10 @@ fiber_call(struct fiber *callee)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+
+       if (cord_is_main())
+               lua_on_yield();
+
        clock_set_on_csw(caller);
        callee->caller = caller;
        callee->flags |= FIBER_IS_READY;
@@ -645,6 +651,10 @@ fiber_yield(void)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+
+       if (cord_is_main())
+               lua_on_yield();
+
        clock_set_on_csw(caller);
 
        assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
diff --git a/src/lua/utils.c b/src/lua/utils.c
index 0b05d72..68f7fb5 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1308,3 +1308,10 @@ tarantool_lua_utils_init(struct lua_State *L)
        luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
        return 0;
 }
+
+#include <luajit.h>
+
+void lua_on_yield(void)
+{
+       luaJIT_setmode(tarantool_L, 0, -1);
+}

================================================================================

* Vanilla -> Patched [extern callback] (min, median, mean, max):
| fibers: 10; iters: 100	1%	1%	0%	0%
| fibers: 10; iters: 1000	12%	8%	6%	0%
| fibers: 10; iters: 10000	6%	9%	7%	4%
| fibers: 10; iters: 100000	2%	3%	3%	5%
| fibers: 100; iters: 100	1%	1%	0%	-5%
| fibers: 100; iters: 1000	2%	3%	3%	3%
| fibers: 100; iters: 10000	3%	3%	2%	-1%
| fibers: 100; iters: 100000	3%	3%	4%	3%
| fibers: 1000; iters: 100	1%	1%	1%	-1%
| fibers: 1000; iters: 1000	1%	2%	1%	1%
| fibers: 1000; iters: 10000	2%	2%	2%	3%
| fibers: 1000; iters: 100000	1%	1%	1%	0%
| fibers: 10000; iters: 100	0%	0%	0%	0%
| fibers: 10000; iters: 1000	0%	2%	1%	2%
| fibers: 10000; iters: 10000	0%	0%	0%	3%
| fibers: 10000; iters: 100000	1%	2%	0%	0%

As we discussed offline, here are also the numbers for the alternative
patch implementing a noop extern callback.

================================================================================

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3c..ed6104c 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -46,6 +46,8 @@
 #if ENABLE_FIBER_TOP
 #include <x86intrin.h> /* __rdtscp() */
 
+extern void lua_on_yield(void);
+
 static inline void
 clock_stat_add_delta(struct clock_stat *stat, uint64_t clock_delta)
 {
@@ -416,6 +418,10 @@ fiber_call(struct fiber *callee)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+
+       if (cord_is_main())
+               lua_on_yield();
+
        clock_set_on_csw(caller);
        callee->caller = caller;
        callee->flags |= FIBER_IS_READY;
@@ -645,6 +651,10 @@ fiber_yield(void)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+
+       if (cord_is_main())
+               lua_on_yield();
+
        clock_set_on_csw(caller);
 
        assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
diff --git a/src/lua/utils.c b/src/lua/utils.c
index 0b05d72..7d0962f 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1308,3 +1308,7 @@ tarantool_lua_utils_init(struct lua_State *L)
        luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
        return 0;
 }
+
+void lua_on_yield(void)
+{
+}

================================================================================

* Vanilla -> Patched [extern noop callback] (min, median, mean, max):
| fibers: 10; iters: 100	0%	2%	0%	0%
| fibers: 10; iters: 1000	1%	3%	1%	-1%
| fibers: 10; iters: 10000	-1%	0%	-1%	-3%
| fibers: 10; iters: 100000	-2%	0%	-1%	0%
| fibers: 100; iters: 100	0%	-1%	0%	-4%
| fibers: 100; iters: 1000	0%	1%	0%	0%
| fibers: 100; iters: 10000	0%	0%	0%	-3%
| fibers: 100; iters: 100000	0%	1%	0%	-2%
| fibers: 1000; iters: 100	0%	0%	-1%	-3%
| fibers: 1000; iters: 1000	0%	0%	0%	1%
| fibers: 1000; iters: 10000	0%	0%	0%	0%
| fibers: 1000; iters: 100000	0%	0%	0%	-1%
| fibers: 10000; iters: 100	0%	0%	0%	2%
| fibers: 10000; iters: 1000	0%	-1%	0%	2%
| fibers: 10000; iters: 10000	-1%	-1%	0%	0%
| fibers: 10000; iters: 100000	-1%	0%	-1%	-3%

And here is a final one. I personally don't like it (considering my
comments in the previous reply), but *for now* it can be a solution.
These are the changes I pushed to the remote branch[2] and the diff (I
stripped unit test related part) with the benchmark results are below.

================================================================================

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3ce1..ed6104c8d 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -46,6 +46,8 @@
 #if ENABLE_FIBER_TOP
 #include <x86intrin.h> /* __rdtscp() */
 
+extern void lua_on_yield(void);
+
 static inline void
 clock_stat_add_delta(struct clock_stat *stat, uint64_t clock_delta)
 {
@@ -416,6 +418,10 @@ fiber_call(struct fiber *callee)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+
+       if (cord_is_main())
+               lua_on_yield();
+
        clock_set_on_csw(caller);
        callee->caller = caller;
        callee->flags |= FIBER_IS_READY;
@@ -645,6 +651,10 @@ fiber_yield(void)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+
+       if (cord_is_main())
+               lua_on_yield();
+
        clock_set_on_csw(caller);
 
        assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
diff --git a/src/lua/utils.c b/src/lua/utils.c
index af114b0a2..49e3c2bf0 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1308,3 +1308,9 @@ tarantool_lua_utils_init(struct lua_State *L)
        luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
        return 0;
 }
+
+#include "lj_trace.h"
+void lua_on_yield(void)
+{
+       lj_trace_abort(G(tarantool_L));
+}

================================================================================

* Vanilla -> Patched [extern macro callback] (min, median, mean, max):
| fibers: 10; iters: 100	1%	1%	0%	0%
| fibers: 10; iters: 1000	0%	4%	0%	-1%
| fibers: 10; iters: 10000	0%	5%	2%	6%
| fibers: 10; iters: 100000	0%	0%	0%	0%
| fibers: 100; iters: 100	0%	-4%	-3%	-6%
| fibers: 100; iters: 1000	0%	3%	1%	0%
| fibers: 100; iters: 10000	0%	0%	0%	-2%
| fibers: 100; iters: 100000	0%	1%	0%	-2%
| fibers: 1000; iters: 100	0%	0%	0%	-4%
| fibers: 1000; iters: 1000	0%	0%	0%	-1%
| fibers: 1000; iters: 10000	0%	0%	0%	0%
| fibers: 1000; iters: 100000	0%	0%	0%	-1%
| fibers: 10000; iters: 100	-1%	1%	1%	2%
| fibers: 10000; iters: 1000	-1%	0%	0%	2%
| fibers: 10000; iters: 10000	0%	0%	0%	0%
| fibers: 10000; iters: 100000	0%	0%	0%	0%

There was also an alternative idea by Sergos: introduce a special
parameter to enable such feature by demand.

[1]: https://gist.github.com/igormunkin/7e0cf48005bd003ffbdf30181eedb40e
[2]: https://github.com/tarantool/tarantool/commit/466ce26

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
  2020-09-07 20:35       ` Igor Munkin
@ 2020-09-17 14:21         ` Vladislav Shpilevoy
  2020-09-19 15:29           ` Igor Munkin
  0 siblings, 1 reply; 12+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-17 14:21 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Hi! Thanks for the investigation!

See 5 comments below.

> diff --git a/src/lua/utils.c b/src/lua/utils.c
> index 0b05d72..7d0962f 100644
> --- a/src/lua/utils.c
> +++ b/src/lua/utils.c
> @@ -1308,3 +1308,7 @@ tarantool_lua_utils_init(struct lua_State *L)
>         luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
>         return 0;
>  }
> +
> +void lua_on_yield(void)
> +{
> +}
> 
> ================================================================================
> 
> * Vanilla -> Patched [extern noop callback] (min, median, mean, max):
> | fibers: 10; iters: 100	0%	2%	0%	0%
> | fibers: 10; iters: 1000	1%	3%	1%	-1%
> | fibers: 10; iters: 10000	-1%	0%	-1%	-3%
> | fibers: 10; iters: 100000	-2%	0%	-1%	0%
> | fibers: 100; iters: 100	0%	-1%	0%	-4%
> | fibers: 100; iters: 1000	0%	1%	0%	0%
> | fibers: 100; iters: 10000	0%	0%	0%	-3%
> | fibers: 100; iters: 100000	0%	1%	0%	-2%
> | fibers: 1000; iters: 100	0%	0%	-1%	-3%
> | fibers: 1000; iters: 1000	0%	0%	0%	1%
> | fibers: 1000; iters: 10000	0%	0%	0%	0%
> | fibers: 1000; iters: 100000	0%	0%	0%	-1%
> | fibers: 10000; iters: 100	0%	0%	0%	2%
> | fibers: 10000; iters: 1000	0%	-1%	0%	2%
> | fibers: 10000; iters: 10000	-1%	-1%	0%	0%
> | fibers: 10000; iters: 100000	-1%	0%	-1%	-3%
> 
> And here is a final one. I personally don't like it (considering my
> comments in the previous reply), but *for now* it can be a solution.

1. I couldn't find - why don't you like it? It seems to be the fastest
solution, not affecting the microbench at all, and definitely not
affecting any more complex scenarios.

> ================================================================================
> 
> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> index 483ae3ce1..ed6104c8d 100644
> --- a/src/lib/core/fiber.c
> +++ b/src/lib/core/fiber.c
> @@ -46,6 +46,8 @@
>  #if ENABLE_FIBER_TOP
>  #include <x86intrin.h> /* __rdtscp() */
>  
> +extern void lua_on_yield(void);
> +
>  static inline void
>  clock_stat_add_delta(struct clock_stat *stat, uint64_t clock_delta)
>  {
> @@ -416,6 +418,10 @@ fiber_call(struct fiber *callee)
>         /** By convention, these triggers must not throw. */
>         if (! rlist_empty(&caller->on_yield))
>                 trigger_run(&caller->on_yield, NULL);
> +
> +       if (cord_is_main())
> +               lua_on_yield();

2. Why not inside fiber_call_impl? I thought we need to call
the abort on each coro_transfer().

> +
>         clock_set_on_csw(caller);
>         callee->caller = caller;
>         callee->flags |= FIBER_IS_READY;
> @@ -645,6 +651,10 @@ fiber_yield(void)
>         /** By convention, these triggers must not throw. */
>         if (! rlist_empty(&caller->on_yield))
>                 trigger_run(&caller->on_yield, NULL);
> +
> +       if (cord_is_main())
> +               lua_on_yield();
> +
>         clock_set_on_csw(caller);
>  
>         assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
> diff --git a/src/lua/utils.c b/src/lua/utils.c
> index af114b0a2..49e3c2bf0 100644
> --- a/src/lua/utils.c
> +++ b/src/lua/utils.c
> @@ -1308,3 +1308,9 @@ tarantool_lua_utils_init(struct lua_State *L)
>         luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
>         return 0;
>  }
> +
> +#include "lj_trace.h"

3. Why is the header included here, and not in the beginning?

4. It is worth adding a comment.

> +void lua_on_yield(void)
> +{
> +       lj_trace_abort(G(tarantool_L));
> +}
> 
> ================================================================================
> 
> * Vanilla -> Patched [extern macro callback] (min, median, mean, max):
> | fibers: 10; iters: 100	1%	1%	0%	0%
> | fibers: 10; iters: 1000	0%	4%	0%	-1%
> | fibers: 10; iters: 10000	0%	5%	2%	6%
> | fibers: 10; iters: 100000	0%	0%	0%	0%
> | fibers: 100; iters: 100	0%	-4%	-3%	-6%
> | fibers: 100; iters: 1000	0%	3%	1%	0%
> | fibers: 100; iters: 10000	0%	0%	0%	-2%
> | fibers: 100; iters: 100000	0%	1%	0%	-2%
> | fibers: 1000; iters: 100	0%	0%	0%	-4%
> | fibers: 1000; iters: 1000	0%	0%	0%	-1%
> | fibers: 1000; iters: 10000	0%	0%	0%	0%
> | fibers: 1000; iters: 100000	0%	0%	0%	-1%
> | fibers: 10000; iters: 100	-1%	1%	1%	2%
> | fibers: 10000; iters: 1000	-1%	0%	0%	2%
> | fibers: 10000; iters: 10000	0%	0%	0%	0%
> | fibers: 10000; iters: 100000	0%	0%	0%	0%
> 
> There was also an alternative idea by Sergos: introduce a special
> parameter to enable such feature by demand.

5. I am not sure it is so necessary - from your bench it looks the overhead
is almost 0, not counting the rare noise about +-1%.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
  2020-09-17 14:21         ` Vladislav Shpilevoy
@ 2020-09-19 15:29           ` Igor Munkin
  2020-09-21 20:31             ` Vladislav Shpilevoy
  0 siblings, 1 reply; 12+ messages in thread
From: Igor Munkin @ 2020-09-19 15:29 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

Vlad,

Thanks for your reply! This is a PoC (so the patch was not polished).
I'll address all your comments it in a separate series but let's finish
the discussion here before it.

On 17.09.20, Vladislav Shpilevoy wrote:
> Hi! Thanks for the investigation!
> 
> See 5 comments below.
> 
> > diff --git a/src/lua/utils.c b/src/lua/utils.c
> > index 0b05d72..7d0962f 100644
> > --- a/src/lua/utils.c
> > +++ b/src/lua/utils.c
> > @@ -1308,3 +1308,7 @@ tarantool_lua_utils_init(struct lua_State *L)
> >         luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
> >         return 0;
> >  }
> > +
> > +void lua_on_yield(void)
> > +{
> > +}
> > 
> > ================================================================================
> > 
> > * Vanilla -> Patched [extern noop callback] (min, median, mean, max):
> > | fibers: 10; iters: 100	0%	2%	0%	0%
> > | fibers: 10; iters: 1000	1%	3%	1%	-1%
> > | fibers: 10; iters: 10000	-1%	0%	-1%	-3%
> > | fibers: 10; iters: 100000	-2%	0%	-1%	0%
> > | fibers: 100; iters: 100	0%	-1%	0%	-4%
> > | fibers: 100; iters: 1000	0%	1%	0%	0%
> > | fibers: 100; iters: 10000	0%	0%	0%	-3%
> > | fibers: 100; iters: 100000	0%	1%	0%	-2%
> > | fibers: 1000; iters: 100	0%	0%	-1%	-3%
> > | fibers: 1000; iters: 1000	0%	0%	0%	1%
> > | fibers: 1000; iters: 10000	0%	0%	0%	0%
> > | fibers: 1000; iters: 100000	0%	0%	0%	-1%
> > | fibers: 10000; iters: 100	0%	0%	0%	2%
> > | fibers: 10000; iters: 1000	0%	-1%	0%	2%
> > | fibers: 10000; iters: 10000	-1%	-1%	0%	0%
> > | fibers: 10000; iters: 100000	-1%	0%	-1%	-3%
> > 
> > And here is a final one. I personally don't like it (considering my
> > comments in the previous reply), but *for now* it can be a solution.
> 
> 1. I couldn't find - why don't you like it? It seems to be the fastest
> solution, not affecting the microbench at all, and definitely not
> affecting any more complex scenarios.

Here is the cite from the first one reply[1]:
| > Why can't we call lj_trace_abort() directly?
|
| It's the internal API. Its usage complicates a switch between various
| LuaJIT implementations (we faced several challenges when tried to build
| Tarantool with uJIT). There is a public API to be used here (though in a
| bit hacky way).
|
| > 
| > And most importantly, how does it affect perf? New trigger
| > is +1 virtual call on each yield of every Lua fiber and +1
| > execution of non-trival function luaJIT_setmode(). I think
| > it is better to write a micro bench checking how many yields
| > can we do per time unit before and after this patch. From Lua
| > fibers.

You're right it's the fastest solution considering the microbench
results, but we pay a code maintenance trade-off for it. *This* is the
fact I don't like.

I tested a noop workload (there is only yield in Lua bench) so for
non-synthetic workload the numbers might differ less (or might not).
Since we're using our bundled LuaJIT fork, I guess we can postpone these
thoughts for now and simply fix the issue. But I definitely would like
to return to it. Hope we'll have a nicer solution in future.

> 
> > ================================================================================
> > 
> > diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> > index 483ae3ce1..ed6104c8d 100644
> > --- a/src/lib/core/fiber.c
> > +++ b/src/lib/core/fiber.c
> > @@ -46,6 +46,8 @@
> >  #if ENABLE_FIBER_TOP
> >  #include <x86intrin.h> /* __rdtscp() */
> >  
> > +extern void lua_on_yield(void);
> > +
> >  static inline void
> >  clock_stat_add_delta(struct clock_stat *stat, uint64_t clock_delta)
> >  {
> > @@ -416,6 +418,10 @@ fiber_call(struct fiber *callee)
> >         /** By convention, these triggers must not throw. */
> >         if (! rlist_empty(&caller->on_yield))
> >                 trigger_run(&caller->on_yield, NULL);
> > +
> > +       if (cord_is_main())
> > +               lua_on_yield();
> 
> 2. Why not inside fiber_call_impl? I thought we need to call
> the abort on each coro_transfer().

<fiber_schedule_list> switches schedule fiber to the ready one. AFAICS,
scheduler doesn't enter Lua world, so no abort is necessary here.

> 
> > +
> >         clock_set_on_csw(caller);
> >         callee->caller = caller;
> >         callee->flags |= FIBER_IS_READY;
> > @@ -645,6 +651,10 @@ fiber_yield(void)
> >         /** By convention, these triggers must not throw. */
> >         if (! rlist_empty(&caller->on_yield))
> >                 trigger_run(&caller->on_yield, NULL);
> > +
> > +       if (cord_is_main())
> > +               lua_on_yield();
> > +
> >         clock_set_on_csw(caller);
> >  
> >         assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
> > diff --git a/src/lua/utils.c b/src/lua/utils.c
> > index af114b0a2..49e3c2bf0 100644
> > --- a/src/lua/utils.c
> > +++ b/src/lua/utils.c
> > @@ -1308,3 +1308,9 @@ tarantool_lua_utils_init(struct lua_State *L)
> >         luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
> >         return 0;
> >  }
> > +
> > +#include "lj_trace.h"
> 
> 3. Why is the header included here, and not in the beginning?
> 
> 4. It is worth adding a comment.

This is a draft, I'll fix your comments in the v2 series.

> 
> > +void lua_on_yield(void)
> > +{
> > +       lj_trace_abort(G(tarantool_L));

I forgot to add the check whether yield occurs on the running trace.
Here are the corresponding *draft* changes:

================================================================================

diff --git a/src/lua/utils.c b/src/lua/utils.c
index 49e3c2bf0..8d72abdb9 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1310,7 +1310,18 @@ tarantool_lua_utils_init(struct lua_State *L)
 }
 
 #include "lj_trace.h"
+#include "lj_err.h"
 void lua_on_yield(void)
 {
-	lj_trace_abort(G(tarantool_L));
+	struct global_State *g = G(tarantool_L);
+	/* Forbid Lua world re-entrancy while running the trace */
+	if (unlikely(tvref(g->jit_base))) {
+		struct lua_State *L = fiber()->storage.lua.stack;
+		setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
+		if (g->panic)
+#undef panic
+			g->panic(L);
+		exit(EXIT_FAILURE);
+	}
+	lj_trace_abort(g);
 }

================================================================================

Here are benchmark results for the new implementation:
* Vanilla -> Patched [extern macro callback] (min, median, mean, max):
| fibers: 10; iters: 100	0%	1%	0%	0%
| fibers: 10; iters: 1000	2%	-2%	-3%	-5%
| fibers: 10; iters: 10000	-3%	-2%	-2%	-2%
| fibers: 10; iters: 100000	-1%	-1%	-2%	-4%
| fibers: 100; iters: 100	0%	0%	-2%	-9%
| fibers: 100; iters: 1000	0%	1%	1%	0%
| fibers: 100; iters: 10000	0%	0%	0%	0%
| fibers: 100; iters: 100000	0%	0%	0%	-1%
| fibers: 1000; iters: 100	0%	1%	0%	-3%
| fibers: 1000; iters: 1000	0%	1%	1%	4%
| fibers: 1000; iters: 10000	0%	0%	0%	2%
| fibers: 1000; iters: 100000	0%	0%	0%	1%
| fibers: 10000; iters: 100	0%	0%	0%	3%
| fibers: 10000; iters: 1000	0%	0%	0%	0%
| fibers: 10000; iters: 10000	0%	0%	0%	4%
| fibers: 10000; iters: 100000	0%	2%	1%	0%

> > +}
> > 
> > ================================================================================
> > 
> > * Vanilla -> Patched [extern macro callback] (min, median, mean, max):
> > | fibers: 10; iters: 100	1%	1%	0%	0%
> > | fibers: 10; iters: 1000	0%	4%	0%	-1%
> > | fibers: 10; iters: 10000	0%	5%	2%	6%
> > | fibers: 10; iters: 100000	0%	0%	0%	0%
> > | fibers: 100; iters: 100	0%	-4%	-3%	-6%
> > | fibers: 100; iters: 1000	0%	3%	1%	0%
> > | fibers: 100; iters: 10000	0%	0%	0%	-2%
> > | fibers: 100; iters: 100000	0%	1%	0%	-2%
> > | fibers: 1000; iters: 100	0%	0%	0%	-4%
> > | fibers: 1000; iters: 1000	0%	0%	0%	-1%
> > | fibers: 1000; iters: 10000	0%	0%	0%	0%
> > | fibers: 1000; iters: 100000	0%	0%	0%	-1%
> > | fibers: 10000; iters: 100	-1%	1%	1%	2%
> > | fibers: 10000; iters: 1000	-1%	0%	0%	2%
> > | fibers: 10000; iters: 10000	0%	0%	0%	0%
> > | fibers: 10000; iters: 100000	0%	0%	0%	0%
> > 
> > There was also an alternative idea by Sergos: introduce a special
> > parameter to enable such feature by demand.
> 
> 5. I am not sure it is so necessary - from your bench it looks the overhead
> is almost 0, not counting the rare noise about +-1%.

I agree, but Sergos proposed this way when results were not so good.

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
  2020-07-10 10:26     ` sergos
@ 2020-09-21 19:23       ` Igor Munkin
  2020-09-21 20:14         ` Sergey Ostanevich
  0 siblings, 1 reply; 12+ messages in thread
From: Igor Munkin @ 2020-09-21 19:23 UTC (permalink / raw)
  To: Sergey Ostanevich; +Cc: tarantool-patches, Vladislav Shpilevoy

Sergos,

Thanks for your review! Please, consider my comments below.

On 10.07.20, sergos@tarantool.org wrote:
> Hi!
> 
> Thanks for the patch and investigation!
> 
> 
> > On 8 Jul 2020, at 01:24, Igor Munkin <imun@tarantool.org> wrote:
> > 
> > Vlad,
> > 
> > Thanks for your review!
> > 
> > On 01.04.20, Vladislav Shpilevoy wrote:
> >> Hi! Thanks for the patch!
> >> 
> >> See 7 comments below.
> >> 

<snipped>

> > 
> >> 
> >> Why can't we call lj_trace_abort() directly?
> > 
> > It's the internal API. Its usage complicates a switch between various
> > LuaJIT implementations (we faced several challenges when tried to build
> > Tarantool with uJIT). There is a public API to be used here (though in a
> > bit hacky way).
> 
> This hacky way looks fragile, since luaJIT_setmode() may change its behaviour
> in the future and cause some unpredictable result. We have to mention it 
> somewhere as a warninig for future LuaJIT updates from upstream. For example,
> introduce a comment inside luaJIT_setmode() that will conflict with plain
> patch.

We discussed this in the nearby thread[1] with Vlad and finally came to
the solution with <lj_trace_abort>. I dropped several comments regarding
the rationale for the fix in v2 version.

> 

<snipped>

> > 
> > Looks like this way is slower than the one implemented via triggers.
> 
> But does it catch more cases, as Vlad supposed? Do you have an extra
> test for it?

I provided several benchmarks results in the nearby thread[2]. For the
chosen solution (via internal macro) it has almost no performance
degradation (omitting the noise).

> 
> Also, I would like to see the impact on some ‘real’ test - such as box
> insertion/select or so?

I tried yours benchmark[3] and got the following numbers:
* Vanilla (insert per second):
| min (15 runs):	809387.28574453
| median (15 runs):	822854.30884267
| mean (15 runs):	821996.668288715
| max (15 runs):	837764.83604149
* Patched (insert per second):
| min (15 runs):	816005.94236505
| median (15 runs):	829281.27443029
| mean (15 runs):	828522.48986598
| max (15 runs):	839318.90025576

Em... It looks like a performance improvement, doesn't it? It seems like
a compiler side-effect (e.g. invalid traces blacklisting), but I didn't
make a deep investigation for this.

> 
> 
> Regards,
> Sergos
> 

<snipped>

> 

[1]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-September/019306.html
[2]: https://lists.tarantool.org/pipermail/tarantool-patches/2020-September/019521.html
[3]: https://gist.github.com/sergos/feb397ed4d5a5f739ee501f768da31e6

-- 
Best regards,
IM

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
  2020-09-21 19:23       ` Igor Munkin
@ 2020-09-21 20:14         ` Sergey Ostanevich
  0 siblings, 0 replies; 12+ messages in thread
From: Sergey Ostanevich @ 2020-09-21 20:14 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches, Vladislav Shpilevoy

[-- Attachment #1: Type: text/plain, Size: 2950 bytes --]


Hi, Igor!

Thanks for detailed report, the results are LGTM.
As for the measured speedup - I tend to ignore it, since the min-max spread is way bigger.

Regards,
Sergos 
Monday, 21 September 2020, 22:33 +0300 from Igor Munkin  <imun@tarantool.org>:
>Sergos,
>
>Thanks for your review! Please, consider my comments below.
>
>On 10.07.20,  sergos@tarantool.org wrote:
>> Hi!
>> 
>> Thanks for the patch and investigation!
>> 
>> 
>> > On 8 Jul 2020, at 01:24, Igor Munkin < imun@tarantool.org > wrote:
>> > 
>> > Vlad,
>> > 
>> > Thanks for your review!
>> > 
>> > On 01.04.20, Vladislav Shpilevoy wrote:
>> >> Hi! Thanks for the patch!
>> >> 
>> >> See 7 comments below.
>> >> 
>
><snipped>
>
>> > 
>> >> 
>> >> Why can't we call lj_trace_abort() directly?
>> > 
>> > It's the internal API. Its usage complicates a switch between various
>> > LuaJIT implementations (we faced several challenges when tried to build
>> > Tarantool with uJIT). There is a public API to be used here (though in a
>> > bit hacky way).
>> 
>> This hacky way looks fragile, since luaJIT_setmode() may change its behaviour
>> in the future and cause some unpredictable result. We have to mention it 
>> somewhere as a warninig for future LuaJIT updates from upstream. For example,
>> introduce a comment inside luaJIT_setmode() that will conflict with plain
>> patch.
>
>We discussed this in the nearby thread[1] with Vlad and finally came to
>the solution with <lj_trace_abort>. I dropped several comments regarding
>the rationale for the fix in v2 version.
>
>> 
>
><snipped>
>
>> > 
>> > Looks like this way is slower than the one implemented via triggers.
>> 
>> But does it catch more cases, as Vlad supposed? Do you have an extra
>> test for it?
>
>I provided several benchmarks results in the nearby thread[2]. For the
>chosen solution (via internal macro) it has almost no performance
>degradation (omitting the noise).
>
>> 
>> Also, I would like to see the impact on some ‘real’ test - such as box
>> insertion/select or so?
>
>I tried yours benchmark[3] and got the following numbers:
>* Vanilla (insert per second):
>| min (15 runs):	 809387.28574453
>| median (15 runs):	 822854.30884267
>| mean (15 runs):	 821996.668288715
>| max (15 runs):	 837764.83604149
>* Patched (insert per second):
>| min (15 runs):	 816005.94236505
>| median (15 runs):	 829281.27443029
>| mean (15 runs):	 828522.48986598
>| max (15 runs):	 839318.90025576
>
>Em... It looks like a performance improvement, doesn't it? It seems like
>a compiler side-effect (e.g. invalid traces blacklisting), but I didn't
>make a deep investigation for this.
>
>> 
>> 
>> Regards,
>> Sergos
>> 
>
><snipped>
>
>> 
>
>[1]:  https://lists.tarantool.org/pipermail/tarantool-patches/2020-September/019306.html
>[2]:  https://lists.tarantool.org/pipermail/tarantool-patches/2020-September/019521.html
>[3]:  https://gist.github.com/sergos/feb397ed4d5a5f739ee501f768da31e6
>
>-- 
>Best regards,
>IM

[-- Attachment #2: Type: text/html, Size: 8831 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
  2020-09-19 15:29           ` Igor Munkin
@ 2020-09-21 20:31             ` Vladislav Shpilevoy
  0 siblings, 0 replies; 12+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-21 20:31 UTC (permalink / raw)
  To: Igor Munkin; +Cc: tarantool-patches

Hi! Thanks for the answers!

All good to proceed further.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-09-21 20:31 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-30 22:44 [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield Igor Munkin
2020-03-31 16:58 ` Konstantin Osipov
2020-03-31 23:57 ` Vladislav Shpilevoy
2020-07-07 22:24   ` Igor Munkin
2020-07-10 10:26     ` sergos
2020-09-21 19:23       ` Igor Munkin
2020-09-21 20:14         ` Sergey Ostanevich
2020-07-11 20:28     ` Vladislav Shpilevoy
2020-09-07 20:35       ` Igor Munkin
2020-09-17 14:21         ` Vladislav Shpilevoy
2020-09-19 15:29           ` Igor Munkin
2020-09-21 20:31             ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox