[Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield

Igor Munkin imun at tarantool.org
Mon Sep 7 23:35:02 MSK 2020


Vlad,

Sorry for such long reply, I just tried to put it all together.

On 11.07.20, Vladislav Shpilevoy wrote:
> Hi! The branch seems to be extremely outdated. Could you
> please rebase it?

I pushed another implementation on the branch. It's a PoC, since I have
several linkage issues in tests and I temporarily disabled unit tests.

> 
> On 08/07/2020 00:24, Igor Munkin wrote:
> > Vlad,
> > 
> > Thanks for your review!
> > 
> > On 01.04.20, Vladislav Shpilevoy wrote:
> >> Hi! Thanks for the patch!
> >>
> >> See 7 comments below.
> >>
> >>> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
> >>> index 45bc03787..c79aa7c2b 100644
> >>> --- a/src/lua/fiber.c
> >>> +++ b/src/lua/fiber.c
> >>> @@ -454,6 +455,25 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap)

<snipped>

> 
> >> And most importantly, how does it affect perf? New trigger
> >> is +1 virtual call on each yield of every Lua fiber and +1
> >> execution of non-trival function luaJIT_setmode(). I think
> >> it is better to write a micro bench checking how many yields
> >> can we do per time unit before and after this patch. From Lua
> >> fibers.
> > 
> > I made some benchmarks[2] and here are the numbers for the bleeding
> > master:
> > * Vanilla (mean, seconds):
> > | Total runs: #15
> > | #1700 bench: fibers: 10; iterations: 100 => 0.0015582
> > | #1700 bench: fibers: 10; iterations: 1000 => 0.004238
> > | #1700 bench: fibers: 10; iterations: 10000 => 0.031612
> > | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
> > | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
> > | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
> > | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
> > | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
> > | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
> > | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
> > | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
> > | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
> > | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
> > | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
> > | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
> > | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
> > * Patched [Lua fiber trigger] (mean, seconds):
> > | Total runs: #15
> > | #1700 bench: fibers: 10; iterations: 100 => 0.00157866666666667
> > | #1700 bench: fibers: 10; iterations: 1000 => 0.004395
> > | #1700 bench: fibers: 10; iterations: 10000 => 0.0328212666666667
> > | #1700 bench: fibers: 10; iterations: 100000 => 0.317387866666667
> > | #1700 bench: fibers: 100; iterations: 100 => 0.00404193333333333
> > | #1700 bench: fibers: 100; iterations: 1000 => 0.0284888
> > | #1700 bench: fibers: 100; iterations: 10000 => 0.270978066666667
> > | #1700 bench: fibers: 100; iterations: 100000 => 2.70369026666667
> > | #1700 bench: fibers: 1000; iterations: 100 => 0.0304198666666667
> > | #1700 bench: fibers: 1000; iterations: 1000 => 0.283783733333333
> > | #1700 bench: fibers: 1000; iterations: 10000 => 2.8139128
> > | #1700 bench: fibers: 1000; iterations: 100000 => 28.1274792666667
> > | #1700 bench: fibers: 10000; iterations: 100 => 0.6049388
> > | #1700 bench: fibers: 10000; iterations: 1000 => 5.9653538
> > | #1700 bench: fibers: 10000; iterations: 10000 => 59.6273188
> > | #1700 bench: fibers: 10000; iterations: 100000 => 596.194249466667
> > * Relative measurements (vanilla -> patched):
> > | #1700 bench: fibers: 10; iterations: 100 => 1.31%
> > | #1700 bench: fibers: 10; iterations: 1000 => 3.7%
> > | #1700 bench: fibers: 10; iterations: 10000 => 3.82%
> > | #1700 bench: fibers: 10; iterations: 100000 => 5.24%
> > | #1700 bench: fibers: 100; iterations: 100 => 2.99%
> > | #1700 bench: fibers: 100; iterations: 1000 => 5.19%
> > | #1700 bench: fibers: 100; iterations: 10000 => 5%
> > | #1700 bench: fibers: 100; iterations: 100000 => 5.24%
> > | #1700 bench: fibers: 1000; iterations: 100 => 6.06%
> > | #1700 bench: fibers: 1000; iterations: 1000 => 6.38%
> > | #1700 bench: fibers: 1000; iterations: 10000 => 6.94%
> > | #1700 bench: fibers: 1000; iterations: 100000 => 6.48%
> > | #1700 bench: fibers: 10000; iterations: 100 => 0.29%
> > | #1700 bench: fibers: 10000; iterations: 1000 => 0.03%
> > | #1700 bench: fibers: 10000; iterations: 10000 => -0.24%
> > | #1700 bench: fibers: 10000; iterations: 100000 => 0.73%
> 
> So it easily can be -6%. That is disappointing. We just recently
> discussed how we should monitor perf, and here it is.

I updated these benchmarks[1] and re-run them for Release build on dev4 host
and results make me frustrating. Here is also the patch, since the remote
branch is updated.

================================================================================

diff --git a/src/lua/fiber.c b/src/lua/fiber.c
index 45bc037..0efc31f 100644
--- a/src/lua/fiber.c
+++ b/src/lua/fiber.c
@@ -38,6 +38,7 @@
 #include <lua.h>
 #include <lauxlib.h>
 #include <lualib.h>
+#include <luajit.h>
 
 void
 luaL_testcancel(struct lua_State *L)
@@ -433,14 +434,26 @@ lbox_fiber_info(struct lua_State *L)
 }
 
 static int
+fiber_on_yield(struct trigger *trigger, void *event)
+{
+       (void) trigger;
+       (void) event;
+       luaJIT_setmode(tarantool_L, 0, -1);
+}
+
+static int
 lua_fiber_run_f(MAYBE_UNUSED va_list ap)
 {
        int result;
+       struct trigger t;
        struct fiber *f = fiber();
        struct lua_State *L = f->storage.lua.stack;
        int coro_ref = lua_tointeger(L, -1);
        lua_pop(L, 1);
+       trigger_create(&t, fiber_on_yield, NULL, NULL);
+       trigger_add(&f->on_yield, &t);
        result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET);
+       trigger_clear(&t);
        /*
         * If fiber is not joinable
         * We can unref child stack here,

================================================================================

* Vanilla -> Patched [Lua fiber trigger] (min, median, mean, max):
| Total runs: #15
| fibers: 10; iters: 100	0%	2%	0%	1%
| fibers: 10; iters: 1000	4%	9%	6%	0%
| fibers: 10; iters: 10000	4%	13%	7%	11%
| fibers: 10; iters: 100000	4%	5%	4%	2%
| fibers: 100; iters: 100	3%	3%	3%	-2%
| fibers: 100; iters: 1000	3%	3%	3%	2%
| fibers: 100; iters: 10000	5%	4%	4%	2%
| fibers: 100; iters: 100000	4%	4%	3%	-1%
| fibers: 1000; iters: 100	9%	17%	15%	13%
| fibers: 1000; iters: 1000	10%	12%	12%	15%
| fibers: 1000; iters: 10000	8%	9%	9%	9%
| fibers: 1000; iters: 100000	7%	8%	8%	5%
| fibers: 10000; iters: 100	4%	5%	5%	3%
| fibers: 10000; iters: 1000	5%	5%	4%	3%
| fibers: 10000; iters: 10000	4%	4%	3%	2%
| fibers: 10000; iters: 100000	5%	5%	4%	5%

As we discussed offline, here are also the numbers for the alternative
patch implementing a noop trigger.

================================================================================

diff --git a/src/lua/fiber.c b/src/lua/fiber.c
index 45bc037..959206f 100644
--- a/src/lua/fiber.c
+++ b/src/lua/fiber.c
@@ -432,15 +432,26 @@ lbox_fiber_info(struct lua_State *L)
        return 1;
 }
 
+static int
+fiber_on_yield(struct trigger *trigger, void *event)
+{
+       (void) trigger;
+       (void) event;
+}
+
 static int
 lua_fiber_run_f(MAYBE_UNUSED va_list ap)
 {
        int result;
+       struct trigger t;
        struct fiber *f = fiber();
        struct lua_State *L = f->storage.lua.stack;
        int coro_ref = lua_tointeger(L, -1);
        lua_pop(L, 1);
+       trigger_create(&t, fiber_on_yield, NULL, NULL);
+       trigger_add(&f->on_yield, &t);
        result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET);
+       trigger_clear(&t);
        /*
         * If fiber is not joinable
         * We can unref child stack here,

================================================================================

* Vanilla -> Patched [Lua noop trigger] (min, median, mean, max):
| fibers: 10; iters: 100	0%	0%	0%	0%
| fibers: 10; iters: 1000	2%	10%	4%	-1%
| fibers: 10; iters: 10000	2%	1%	0%	0%
| fibers: 10; iters: 100000	0%	0%	0%	2%
| fibers: 100; iters: 100	1%	0%	1%	-1%
| fibers: 100; iters: 1000	0%	1%	0%	-1%
| fibers: 100; iters: 10000	1%	1%	0%	-3%
| fibers: 100; iters: 100000	1%	2%	1%	-2%
| fibers: 1000; iters: 100	1%	3%	2%	1%
| fibers: 1000; iters: 1000	2%	3%	3%	2%
| fibers: 1000; iters: 10000	1%	2%	2%	3%
| fibers: 1000; iters: 100000	2%	2%	2%	1%
| fibers: 10000; iters: 100	1%	2%	3%	3%
| fibers: 10000; iters: 1000	3%	2%	2%	0%
| fibers: 10000; iters: 10000	1%	2%	1%	0%
| fibers: 10000; iters: 100000	2%	3%	2%	2%

This is freaking unbelievable. Nobody can affect those runs, but numbers
show unacceptable performance loss. I guess there is no way to measure
the approaches more precisely, so I just throw this patch away.

> 
> >> 5. Also there is a question what will happen, if Lua is accessed
> >> from a fiber, created not in Lua? That is basically all pooled
> >> fibers, which serve IProto requests. See struct fiber_pool and its
> >> single usage place. These fibers are started from C, and may end up
> >> executing Lua code, if, for example, an IPROTO_CALL arrived.
> >>
> >> The same question can be applied to fibers, created from C API:
> >>
> >>     API_EXPORT struct fiber * fiber_new(const char *name,
> >>     fiber_func f);
> >>
> >> These fibers won't have the on_yield trigger, but can access Lua.
> >> For example, they can run Lua on_replace triggers in a space by
> >> making an insertion into it. Also they can access Lua API directly.
> >> By calling things like luaT_call() (see module.h).
> > 
> > Well, strictly saying this means we need such trigger on each
> > created/pooled fiber, right?
> 
> Yes. On each fiber ever created in the main thread. Even applier fiber
> can start executing on_replace triggers on a space in Lua.

Well, since there is a public API for fiber creation and we can't
obligue user to set such trigger, this way even doesn't solve the issue.

Let's stop on the approach with extern/virtual handler then.

> 
> >> You may consider adding Jit collection/tracing termination to the
> >> scheduler fiber. It is just one fiber, and from what I know, every
> >> fiber after yield switches current thread to the scheduler fiber. This
> >> also maybe less expensive. We probably won't even notice.
> > 
> > I've tried to implement such approach but it doesn't work. As you
> > mentioned on_yield triggers are not executed for scheduler fiber.
> > Nevertheless, here is the patch (I hope I've not missed something):
> > 
> > ================================================================================
> > 
> > diff --git a/src/lua/init.c b/src/lua/init.c
> > index a0b2fc775..33ddb894d 100644
> > --- a/src/lua/init.c
> > +++ b/src/lua/init.c
> > @@ -79,6 +79,7 @@ struct ibuf *tarantool_lua_ibuf = &tarantool_lua_ibuf_body;
> >   */
> >  struct fiber *script_fiber;
> >  bool start_loop = true;
> > +static struct trigger lua_on_yield_trigger;
> >  
> >  /* contents of src/lua/ files */
> >  extern char strict_lua[],
> > @@ -432,9 +433,22 @@ luaopen_tarantool(lua_State *L)
> >  	return 1;
> >  }
> >  
> > +static int
> > +tarantool_lua_on_yield(struct trigger *trigger, void *event)
> > +{
> > +	(void) trigger;
> > +	(void) event;
> > +	int status = luaJIT_setmode(tarantool_L, 0, -1);
> > +	assert(status == 0);
> > +	return 0;
> > +}
> > +
> >  void
> >  tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
> >  {
> > +	trigger_create(&lua_on_yield_trigger, tarantool_lua_on_yield, NULL, NULL);
> > +	trigger_add(&cord()->sched.on_yield, &lua_on_yield_trigger);
> > +
> >  	lua_State *L = luaL_newstate();
> >  	if (L == NULL) {
> >  		panic("failed to initialize Lua");
> > 
> > ================================================================================
> 
> I never said sched fiber calls on_yield triggers. Because indeed, they could never
> be installed from anywhere, and there was no need to call them. But I didn't know
> that either.
> 
> Anyway, I am not sure anymore that fibers return to scheduler on each yield. I see
> that coro_transfer() in fiber_yield() gives control to some other fiber, which
> can be either next in this loop iteration, or the scheduler if it is the end of
> the event loop iteration. So it won't work.
> 
> > However, I've implemented your idea in another way: added an auxiliary
> > function field to cord structure to be called on <fiber_yield> or
> > <fiber_call>:
> > > And here are the results:
> > * Vanilla (mean, seconds):
> > | Total runs: #15
> > | #1700 bench: fibers: 10; iterations: 100 => 0.0015582
> > | #1700 bench: fibers: 10; iterations: 1000 => 0.004238
> > | #1700 bench: fibers: 10; iterations: 10000 => 0.031612
> > | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
> > | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
> > | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
> > | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
> > | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
> > | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
> > | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
> > | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
> > | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
> > | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
> > | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
> > | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
> > | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
> > * Patched [cord callback] (mean, seconds):
> > | Total runs: #15
> > | #1700 bench: fibers: 10; iterations: 100 => 0.0015236
> > | #1700 bench: fibers: 10; iterations: 1000 => 0.00450766666666667
> > | #1700 bench: fibers: 10; iterations: 10000 => 0.0341899333333333
> > | #1700 bench: fibers: 10; iterations: 100000 => 0.329420333333333
> > | #1700 bench: fibers: 100; iterations: 100 => 0.00403553333333333
> > | #1700 bench: fibers: 100; iterations: 1000 => 0.0291887333333333
> > | #1700 bench: fibers: 100; iterations: 10000 => 0.2776654
> > | #1700 bench: fibers: 100; iterations: 100000 => 2.76978893333333
> > | #1700 bench: fibers: 1000; iterations: 100 => 0.0309739333333333
> > | #1700 bench: fibers: 1000; iterations: 1000 => 0.289414733333333
> > | #1700 bench: fibers: 1000; iterations: 10000 => 2.8631376
> > | #1700 bench: fibers: 1000; iterations: 100000 => 28.5713692666667
> > | #1700 bench: fibers: 10000; iterations: 100 => 0.628503733333333
> > | #1700 bench: fibers: 10000; iterations: 1000 => 6.20881053333333
> > | #1700 bench: fibers: 10000; iterations: 10000 => 62.0973958
> > | #1700 bench: fibers: 10000; iterations: 100000 => 619.493991933333
> > * Relative measurements (vanilla -> patched):
> > | #1700 bench: fibers: 10; iterations: 100 => -2.22%
> > | #1700 bench: fibers: 10; iterations: 1000 => 6.36%
> > | #1700 bench: fibers: 10; iterations: 10000 => 8.15%
> > | #1700 bench: fibers: 10; iterations: 100000 => 9.23%
> > | #1700 bench: fibers: 100; iterations: 100 => 2.83%
> > | #1700 bench: fibers: 100; iterations: 1000 => 7.78%
> > | #1700 bench: fibers: 100; iterations: 10000 => 7.6%
> > | #1700 bench: fibers: 100; iterations: 100000 => 7.81%
> > | #1700 bench: fibers: 1000; iterations: 100 => 8%
> > | #1700 bench: fibers: 1000; iterations: 1000 => 8.49%
> > | #1700 bench: fibers: 1000; iterations: 10000 => 8.82%
> > | #1700 bench: fibers: 1000; iterations: 100000 => 8.16%
> > | #1700 bench: fibers: 10000; iterations: 100 => 4.2%
> > | #1700 bench: fibers: 10000; iterations: 1000 => 4.12%
> > | #1700 bench: fibers: 10000; iterations: 10000 => 3.89%
> > | #1700 bench: fibers: 10000; iterations: 100000 => 4.66%
> > 
> > Looks like this way is slower than the one implemented via triggers.
> 
> I don't understand why. This is very suspicious. It does not look right.
> Number of calls is the same. Also you removed one lookup in the rlist of
> on_yield triggers. Perhaps you did the bench in a different build mode?
> Or did you run something heavy in background, browser with music/video
> maybe?

The build mode was the same and the only heavy thing in background was
heavy metal, but it was streamed from another device.

I re-run benchmarks for Release build mode on dev4 host. Here is also
the patch.

================================================================================

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3c..d9a0721 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -408,6 +408,7 @@ fiber_call_impl(struct fiber *callee)
 void
 fiber_call(struct fiber *callee)
 {
+       struct cord *cord = cord();
        struct fiber *caller = fiber();
        assert(! (caller->flags & FIBER_IS_READY));
        assert(rlist_empty(&callee->state));
@@ -416,6 +417,8 @@ fiber_call(struct fiber *callee)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+       if (cord->lua_on_yield)
+               cord->lua_on_yield();
        clock_set_on_csw(caller);
        callee->caller = caller;
        callee->flags |= FIBER_IS_READY;
@@ -645,6 +648,8 @@ fiber_yield(void)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+       if (cord->lua_on_yield)
+               cord->lua_on_yield();
        clock_set_on_csw(caller);
 
        assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
@@ -1365,6 +1370,7 @@ cord_create(struct cord *cord, const char *name)
        slab_cache_set_thread(&cord()->slabc);
 
        cord->id = pthread_self();
+       cord->lua_on_yield = NULL;
        cord->on_exit = NULL;
        slab_cache_create(&cord->slabc, &runtime);
        mempool_create(&cord->fiber_mempool, &cord->slabc,
diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
index 16ee9f4..4cbf557 100644
--- a/src/lib/core/fiber.h
+++ b/src/lib/core/fiber.h
@@ -540,6 +540,7 @@ struct cord_on_exit;
  * model.
  */
 struct cord {
+       void (*lua_on_yield)(void);
        /** The fiber that is currently being executed. */
        struct fiber *fiber;
        struct ev_loop *loop;
diff --git a/src/lua/init.c b/src/lua/init.c
index a0b2fc7..65e4b5d 100644
--- a/src/lua/init.c
+++ b/src/lua/init.c
@@ -522,6 +522,7 @@ tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
        lua_atpanic(L, tarantool_panic_handler);
        /* clear possible left-overs of init */
        lua_settop(L, 0);
+       cord()->lua_on_yield = tarantool_lua_on_yield;
        tarantool_L = L;
 }
 
diff --git a/src/lua/utils.c b/src/lua/utils.c
index af114b0..c1885b8 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -37,6 +37,8 @@
 #include <diag.h>
 #include <fiber.h>
 
+#include <luajit.h>
+
 #include "serializer_opts.h"
 
 int luaL_nil_ref = LUA_REFNIL;
@@ -1308,3 +1310,9 @@ tarantool_lua_utils_init(struct lua_State *L)
        luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
        return 0;
 }
+
+void
+tarantool_lua_on_yield(void)
+{
+       (void)luaJIT_setmode(tarantool_L, 0, -1);
+}
diff --git a/src/lua/utils.h b/src/lua/utils.h
index 7e02a05..cf8fe19 100644
--- a/src/lua/utils.h
+++ b/src/lua/utils.h
@@ -665,6 +665,9 @@ void luaL_iterator_delete(struct luaL_iterator *it);
 int
 tarantool_lua_utils_init(struct lua_State *L);
 
+void
+tarantool_lua_on_yield(void);
+
 #if defined(__cplusplus)
 } /* extern "C" */
 #endif /* defined(__cplusplus) */

================================================================================

* Vanilla -> Patched [virtual cord callback] (min, median, mean, max):
| fibers: 10; iters: 100	2%	3%	1%	1%
| fibers: 10; iters: 1000	12%	17%	10%	3%
| fibers: 10; iters: 10000	6%	11%	8%	7%
| fibers: 10; iters: 100000	5%	6%	7%	10%
| fibers: 100; iters: 100	7%	5%	7%	3%
| fibers: 100; iters: 1000	10%	13%	12%	13%
| fibers: 100; iters: 10000	12%	12%	11%	8%
| fibers: 100; iters: 100000	12%	12%	12%	11%
| fibers: 1000; iters: 100	14%	14%	13%	8%
| fibers: 1000; iters: 1000	16%	16%	16%	13%
| fibers: 1000; iters: 10000	16%	16%	16%	14%
| fibers: 1000; iters: 100000	16%	16%	16%	14%
| fibers: 10000; iters: 100	3%	3%	3%	3%
| fibers: 10000; iters: 1000	2%	2%	2%	7%
| fibers: 10000; iters: 10000	2%	2%	2%	1%
| fibers: 10000; iters: 100000	2%	3%	2%	0%

As we discussed offline, here are also the numbers for the alternative
patch implementing a noop virtual callback.

================================================================================

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3c..d9a0721 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -408,6 +408,7 @@ fiber_call_impl(struct fiber *callee)
 void
 fiber_call(struct fiber *callee)
 {
+       struct cord *cord = cord();
        struct fiber *caller = fiber();
        assert(! (caller->flags & FIBER_IS_READY));
        assert(rlist_empty(&callee->state));
@@ -416,6 +417,8 @@ fiber_call(struct fiber *callee)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+       if (cord->lua_on_yield)
+               cord->lua_on_yield();
        clock_set_on_csw(caller);
        callee->caller = caller;
        callee->flags |= FIBER_IS_READY;
@@ -645,6 +648,8 @@ fiber_yield(void)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+       if (cord->lua_on_yield)
+               cord->lua_on_yield();
        clock_set_on_csw(caller);
 
        assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
@@ -1365,6 +1370,7 @@ cord_create(struct cord *cord, const char *name)
        slab_cache_set_thread(&cord()->slabc);
 
        cord->id = pthread_self();
+       cord->lua_on_yield = NULL;
        cord->on_exit = NULL;
        slab_cache_create(&cord->slabc, &runtime);
        mempool_create(&cord->fiber_mempool, &cord->slabc,
diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
index fffbe8f..51f2e9c 100644
--- a/src/lib/core/fiber.h
+++ b/src/lib/core/fiber.h
@@ -534,6 +534,7 @@ struct cord_on_exit;
  * model.
  */
 struct cord {
+       void (*lua_on_yield)(void);
        /** The fiber that is currently being executed. */
        struct fiber *fiber;
        struct ev_loop *loop;
diff --git a/src/lua/init.c b/src/lua/init.c
index a0b2fc7..65e4b5d 100644
--- a/src/lua/init.c
+++ b/src/lua/init.c
@@ -522,6 +522,7 @@ tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
        lua_atpanic(L, tarantool_panic_handler);
        /* clear possible left-overs of init */
        lua_settop(L, 0);
+       cord()->lua_on_yield = tarantool_lua_on_yield;
        tarantool_L = L;
 }
 
diff --git a/src/lua/utils.c b/src/lua/utils.c
index 0b05d72..1035b1e 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1308,3 +1308,7 @@ tarantool_lua_utils_init(struct lua_State *L)
        luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
        return 0;
 }
+
+void tarantool_lua_on_yield(void)
+{
+}
diff --git a/src/lua/utils.h b/src/lua/utils.h
index b10754e..71dc67f 100644
--- a/src/lua/utils.h
+++ b/src/lua/utils.h
@@ -683,6 +683,9 @@ void luaL_iterator_delete(struct luaL_iterator *it);
 int
 tarantool_lua_utils_init(struct lua_State *L);
 
+void
+tarantool_lua_on_yield(void);
+
 #if defined(__cplusplus)
 } /* extern "C" */
 #endif /* defined(__cplusplus) */

================================================================================

* Vanilla -> Patched [virtual cord noop callback] (min, median, mean, max):
| fibers: 10; iters: 100	3%	3%	2%	1%
| fibers: 10; iters: 1000	-6%	7%	2%	1%
| fibers: 10; iters: 10000	2%	1%	0%	4%
| fibers: 10; iters: 100000	2%	3%	2%	3%
| fibers: 100; iters: 100	4%	6%	5%	2%
| fibers: 100; iters: 1000	8%	9%	9%	7%
| fibers: 100; iters: 10000	10%	10%	8%	6%
| fibers: 100; iters: 100000	10%	9%	10%	8%
| fibers: 1000; iters: 100	13%	13%	12%	8%
| fibers: 1000; iters: 1000	14%	15%	15%	16%
| fibers: 1000; iters: 10000	14%	15%	15%	16%
| fibers: 1000; iters: 100000	14%	15%	14%	12%
| fibers: 10000; iters: 100	0%	1%	2%	3%
| fibers: 10000; iters: 1000	1%	1%	1%	0%
| fibers: 10000; iters: 10000	1%	2%	2%	4%
| fibers: 10000; iters: 100000	3%	4%	3%	4%

Well... The numbers are horrible. Even noop ones. I drop this patch too.

> 
> Also the reason may be that when you installed the trigger, it worked
> for Lua fibers only. But when you did it correct, for all fibers, we got
> the real perf degradation value. Try to create a global int64 counter
> and increment it on each call. Then compare its final value in the previous
> solution with on_yield triggers vs the current solution. It would allow to
> tell whether this new function is called more often in the latest solution.

Precisely. The extern/virtual function is called a bit more often:
* Patched [Lua fiber trigger] (on_yield handler calls):
| fibers: 10; iters: 100	1001
| fibers: 10; iters: 1000	10001
| fibers: 10; iters: 10000	100001
| fibers: 10; iters: 100000	1000001
| fibers: 100; iters: 100	10001
| fibers: 100; iters: 1000	100001
| fibers: 100; iters: 10000	1000001
| fibers: 100; iters: 100000	10000001
| fibers: 1000; iters: 100	100001
| fibers: 1000; iters: 1000	1000001
| fibers: 1000; iters: 10000	10000001
| fibers: 1000; iters: 100000	100000001
| fibers: 10000; iters: 100	1000001
| fibers: 10000; iters: 1000	10000001
| fibers: 10000; iters: 10000	100000001
| fibers: 10000; iters: 100000	1000000001
* Patched [virtual cord callback] (on_yield handler calls):
| fibers: 10; iters: 100	1117
| fibers: 10; iters: 1000	11017
| fibers: 10; iters: 10000	110017
| fibers: 10; iters: 100000	1100017
| fibers: 100; iters: 100	10207
| fibers: 100; iters: 1000	101107
| fibers: 100; iters: 10000	1010107
| fibers: 100; iters: 100000	10100107
| fibers: 1000; iters: 100	101107
| fibers: 1000; iters: 1000	1002007
| fibers: 1000; iters: 10000	10011007
| fibers: 1000; iters: 100000	100101007
| fibers: 10000; iters: 100	1010107
| fibers: 10000; iters: 1000	10011007
| fibers: 10000; iters: 10000	100020007
| fibers: 10000; iters: 100000	1000110007

> 
> Try to make the function extern instead of virtual. May look ugly,
> but -9% worst degradation is uglier. I hope the extern hack will help.

I made this patch (on the branch). Omitting the issues with unit tests
linkage, the results are definitely prettier than ones obtained with
virtual cord callback. The diff is below.

================================================================================

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3c..ed6104c 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -46,6 +46,8 @@
 #if ENABLE_FIBER_TOP
 #include <x86intrin.h> /* __rdtscp() */
 
+extern void lua_on_yield(void);
+
 static inline void
 clock_stat_add_delta(struct clock_stat *stat, uint64_t clock_delta)
 {
@@ -416,6 +418,10 @@ fiber_call(struct fiber *callee)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+
+       if (cord_is_main())
+               lua_on_yield();
+
        clock_set_on_csw(caller);
        callee->caller = caller;
        callee->flags |= FIBER_IS_READY;
@@ -645,6 +651,10 @@ fiber_yield(void)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+
+       if (cord_is_main())
+               lua_on_yield();
+
        clock_set_on_csw(caller);
 
        assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
diff --git a/src/lua/utils.c b/src/lua/utils.c
index 0b05d72..68f7fb5 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1308,3 +1308,10 @@ tarantool_lua_utils_init(struct lua_State *L)
        luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
        return 0;
 }
+
+#include <luajit.h>
+
+void lua_on_yield(void)
+{
+       luaJIT_setmode(tarantool_L, 0, -1);
+}

================================================================================

* Vanilla -> Patched [extern callback] (min, median, mean, max):
| fibers: 10; iters: 100	1%	1%	0%	0%
| fibers: 10; iters: 1000	12%	8%	6%	0%
| fibers: 10; iters: 10000	6%	9%	7%	4%
| fibers: 10; iters: 100000	2%	3%	3%	5%
| fibers: 100; iters: 100	1%	1%	0%	-5%
| fibers: 100; iters: 1000	2%	3%	3%	3%
| fibers: 100; iters: 10000	3%	3%	2%	-1%
| fibers: 100; iters: 100000	3%	3%	4%	3%
| fibers: 1000; iters: 100	1%	1%	1%	-1%
| fibers: 1000; iters: 1000	1%	2%	1%	1%
| fibers: 1000; iters: 10000	2%	2%	2%	3%
| fibers: 1000; iters: 100000	1%	1%	1%	0%
| fibers: 10000; iters: 100	0%	0%	0%	0%
| fibers: 10000; iters: 1000	0%	2%	1%	2%
| fibers: 10000; iters: 10000	0%	0%	0%	3%
| fibers: 10000; iters: 100000	1%	2%	0%	0%

As we discussed offline, here are also the numbers for the alternative
patch implementing a noop extern callback.

================================================================================

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3c..ed6104c 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -46,6 +46,8 @@
 #if ENABLE_FIBER_TOP
 #include <x86intrin.h> /* __rdtscp() */
 
+extern void lua_on_yield(void);
+
 static inline void
 clock_stat_add_delta(struct clock_stat *stat, uint64_t clock_delta)
 {
@@ -416,6 +418,10 @@ fiber_call(struct fiber *callee)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+
+       if (cord_is_main())
+               lua_on_yield();
+
        clock_set_on_csw(caller);
        callee->caller = caller;
        callee->flags |= FIBER_IS_READY;
@@ -645,6 +651,10 @@ fiber_yield(void)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+
+       if (cord_is_main())
+               lua_on_yield();
+
        clock_set_on_csw(caller);
 
        assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
diff --git a/src/lua/utils.c b/src/lua/utils.c
index 0b05d72..7d0962f 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1308,3 +1308,7 @@ tarantool_lua_utils_init(struct lua_State *L)
        luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
        return 0;
 }
+
+void lua_on_yield(void)
+{
+}

================================================================================

* Vanilla -> Patched [extern noop callback] (min, median, mean, max):
| fibers: 10; iters: 100	0%	2%	0%	0%
| fibers: 10; iters: 1000	1%	3%	1%	-1%
| fibers: 10; iters: 10000	-1%	0%	-1%	-3%
| fibers: 10; iters: 100000	-2%	0%	-1%	0%
| fibers: 100; iters: 100	0%	-1%	0%	-4%
| fibers: 100; iters: 1000	0%	1%	0%	0%
| fibers: 100; iters: 10000	0%	0%	0%	-3%
| fibers: 100; iters: 100000	0%	1%	0%	-2%
| fibers: 1000; iters: 100	0%	0%	-1%	-3%
| fibers: 1000; iters: 1000	0%	0%	0%	1%
| fibers: 1000; iters: 10000	0%	0%	0%	0%
| fibers: 1000; iters: 100000	0%	0%	0%	-1%
| fibers: 10000; iters: 100	0%	0%	0%	2%
| fibers: 10000; iters: 1000	0%	-1%	0%	2%
| fibers: 10000; iters: 10000	-1%	-1%	0%	0%
| fibers: 10000; iters: 100000	-1%	0%	-1%	-3%

And here is a final one. I personally don't like it (considering my
comments in the previous reply), but *for now* it can be a solution.
These are the changes I pushed to the remote branch[2] and the diff (I
stripped unit test related part) with the benchmark results are below.

================================================================================

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3ce1..ed6104c8d 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -46,6 +46,8 @@
 #if ENABLE_FIBER_TOP
 #include <x86intrin.h> /* __rdtscp() */
 
+extern void lua_on_yield(void);
+
 static inline void
 clock_stat_add_delta(struct clock_stat *stat, uint64_t clock_delta)
 {
@@ -416,6 +418,10 @@ fiber_call(struct fiber *callee)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+
+       if (cord_is_main())
+               lua_on_yield();
+
        clock_set_on_csw(caller);
        callee->caller = caller;
        callee->flags |= FIBER_IS_READY;
@@ -645,6 +651,10 @@ fiber_yield(void)
        /** By convention, these triggers must not throw. */
        if (! rlist_empty(&caller->on_yield))
                trigger_run(&caller->on_yield, NULL);
+
+       if (cord_is_main())
+               lua_on_yield();
+
        clock_set_on_csw(caller);
 
        assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
diff --git a/src/lua/utils.c b/src/lua/utils.c
index af114b0a2..49e3c2bf0 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1308,3 +1308,9 @@ tarantool_lua_utils_init(struct lua_State *L)
        luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
        return 0;
 }
+
+#include "lj_trace.h"
+void lua_on_yield(void)
+{
+       lj_trace_abort(G(tarantool_L));
+}

================================================================================

* Vanilla -> Patched [extern macro callback] (min, median, mean, max):
| fibers: 10; iters: 100	1%	1%	0%	0%
| fibers: 10; iters: 1000	0%	4%	0%	-1%
| fibers: 10; iters: 10000	0%	5%	2%	6%
| fibers: 10; iters: 100000	0%	0%	0%	0%
| fibers: 100; iters: 100	0%	-4%	-3%	-6%
| fibers: 100; iters: 1000	0%	3%	1%	0%
| fibers: 100; iters: 10000	0%	0%	0%	-2%
| fibers: 100; iters: 100000	0%	1%	0%	-2%
| fibers: 1000; iters: 100	0%	0%	0%	-4%
| fibers: 1000; iters: 1000	0%	0%	0%	-1%
| fibers: 1000; iters: 10000	0%	0%	0%	0%
| fibers: 1000; iters: 100000	0%	0%	0%	-1%
| fibers: 10000; iters: 100	-1%	1%	1%	2%
| fibers: 10000; iters: 1000	-1%	0%	0%	2%
| fibers: 10000; iters: 10000	0%	0%	0%	0%
| fibers: 10000; iters: 100000	0%	0%	0%	0%

There was also an alternative idea by Sergos: introduce a special
parameter to enable such feature by demand.

[1]: https://gist.github.com/igormunkin/7e0cf48005bd003ffbdf30181eedb40e
[2]: https://github.com/tarantool/tarantool/commit/466ce26

-- 
Best regards,
IM


More information about the Tarantool-patches mailing list