[Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
Igor Munkin
imun at tarantool.org
Mon Sep 7 23:35:02 MSK 2020
Vlad,
Sorry for such long reply, I just tried to put it all together.
On 11.07.20, Vladislav Shpilevoy wrote:
> Hi! The branch seems to be extremely outdated. Could you
> please rebase it?
I pushed another implementation on the branch. It's a PoC, since I have
several linkage issues in tests and I temporarily disabled unit tests.
>
> On 08/07/2020 00:24, Igor Munkin wrote:
> > Vlad,
> >
> > Thanks for your review!
> >
> > On 01.04.20, Vladislav Shpilevoy wrote:
> >> Hi! Thanks for the patch!
> >>
> >> See 7 comments below.
> >>
> >>> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
> >>> index 45bc03787..c79aa7c2b 100644
> >>> --- a/src/lua/fiber.c
> >>> +++ b/src/lua/fiber.c
> >>> @@ -454,6 +455,25 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap)
<snipped>
>
> >> And most importantly, how does it affect perf? New trigger
> >> is +1 virtual call on each yield of every Lua fiber and +1
> >> execution of non-trival function luaJIT_setmode(). I think
> >> it is better to write a micro bench checking how many yields
> >> can we do per time unit before and after this patch. From Lua
> >> fibers.
> >
> > I made some benchmarks[2] and here are the numbers for the bleeding
> > master:
> > * Vanilla (mean, seconds):
> > | Total runs: #15
> > | #1700 bench: fibers: 10; iterations: 100 => 0.0015582
> > | #1700 bench: fibers: 10; iterations: 1000 => 0.004238
> > | #1700 bench: fibers: 10; iterations: 10000 => 0.031612
> > | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
> > | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
> > | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
> > | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
> > | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
> > | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
> > | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
> > | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
> > | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
> > | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
> > | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
> > | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
> > | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
> > * Patched [Lua fiber trigger] (mean, seconds):
> > | Total runs: #15
> > | #1700 bench: fibers: 10; iterations: 100 => 0.00157866666666667
> > | #1700 bench: fibers: 10; iterations: 1000 => 0.004395
> > | #1700 bench: fibers: 10; iterations: 10000 => 0.0328212666666667
> > | #1700 bench: fibers: 10; iterations: 100000 => 0.317387866666667
> > | #1700 bench: fibers: 100; iterations: 100 => 0.00404193333333333
> > | #1700 bench: fibers: 100; iterations: 1000 => 0.0284888
> > | #1700 bench: fibers: 100; iterations: 10000 => 0.270978066666667
> > | #1700 bench: fibers: 100; iterations: 100000 => 2.70369026666667
> > | #1700 bench: fibers: 1000; iterations: 100 => 0.0304198666666667
> > | #1700 bench: fibers: 1000; iterations: 1000 => 0.283783733333333
> > | #1700 bench: fibers: 1000; iterations: 10000 => 2.8139128
> > | #1700 bench: fibers: 1000; iterations: 100000 => 28.1274792666667
> > | #1700 bench: fibers: 10000; iterations: 100 => 0.6049388
> > | #1700 bench: fibers: 10000; iterations: 1000 => 5.9653538
> > | #1700 bench: fibers: 10000; iterations: 10000 => 59.6273188
> > | #1700 bench: fibers: 10000; iterations: 100000 => 596.194249466667
> > * Relative measurements (vanilla -> patched):
> > | #1700 bench: fibers: 10; iterations: 100 => 1.31%
> > | #1700 bench: fibers: 10; iterations: 1000 => 3.7%
> > | #1700 bench: fibers: 10; iterations: 10000 => 3.82%
> > | #1700 bench: fibers: 10; iterations: 100000 => 5.24%
> > | #1700 bench: fibers: 100; iterations: 100 => 2.99%
> > | #1700 bench: fibers: 100; iterations: 1000 => 5.19%
> > | #1700 bench: fibers: 100; iterations: 10000 => 5%
> > | #1700 bench: fibers: 100; iterations: 100000 => 5.24%
> > | #1700 bench: fibers: 1000; iterations: 100 => 6.06%
> > | #1700 bench: fibers: 1000; iterations: 1000 => 6.38%
> > | #1700 bench: fibers: 1000; iterations: 10000 => 6.94%
> > | #1700 bench: fibers: 1000; iterations: 100000 => 6.48%
> > | #1700 bench: fibers: 10000; iterations: 100 => 0.29%
> > | #1700 bench: fibers: 10000; iterations: 1000 => 0.03%
> > | #1700 bench: fibers: 10000; iterations: 10000 => -0.24%
> > | #1700 bench: fibers: 10000; iterations: 100000 => 0.73%
>
> So it easily can be -6%. That is disappointing. We just recently
> discussed how we should monitor perf, and here it is.
I updated these benchmarks[1] and re-run them for Release build on dev4 host
and results make me frustrating. Here is also the patch, since the remote
branch is updated.
================================================================================
diff --git a/src/lua/fiber.c b/src/lua/fiber.c
index 45bc037..0efc31f 100644
--- a/src/lua/fiber.c
+++ b/src/lua/fiber.c
@@ -38,6 +38,7 @@
#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>
+#include <luajit.h>
void
luaL_testcancel(struct lua_State *L)
@@ -433,14 +434,26 @@ lbox_fiber_info(struct lua_State *L)
}
static int
+fiber_on_yield(struct trigger *trigger, void *event)
+{
+ (void) trigger;
+ (void) event;
+ luaJIT_setmode(tarantool_L, 0, -1);
+}
+
+static int
lua_fiber_run_f(MAYBE_UNUSED va_list ap)
{
int result;
+ struct trigger t;
struct fiber *f = fiber();
struct lua_State *L = f->storage.lua.stack;
int coro_ref = lua_tointeger(L, -1);
lua_pop(L, 1);
+ trigger_create(&t, fiber_on_yield, NULL, NULL);
+ trigger_add(&f->on_yield, &t);
result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET);
+ trigger_clear(&t);
/*
* If fiber is not joinable
* We can unref child stack here,
================================================================================
* Vanilla -> Patched [Lua fiber trigger] (min, median, mean, max):
| Total runs: #15
| fibers: 10; iters: 100 0% 2% 0% 1%
| fibers: 10; iters: 1000 4% 9% 6% 0%
| fibers: 10; iters: 10000 4% 13% 7% 11%
| fibers: 10; iters: 100000 4% 5% 4% 2%
| fibers: 100; iters: 100 3% 3% 3% -2%
| fibers: 100; iters: 1000 3% 3% 3% 2%
| fibers: 100; iters: 10000 5% 4% 4% 2%
| fibers: 100; iters: 100000 4% 4% 3% -1%
| fibers: 1000; iters: 100 9% 17% 15% 13%
| fibers: 1000; iters: 1000 10% 12% 12% 15%
| fibers: 1000; iters: 10000 8% 9% 9% 9%
| fibers: 1000; iters: 100000 7% 8% 8% 5%
| fibers: 10000; iters: 100 4% 5% 5% 3%
| fibers: 10000; iters: 1000 5% 5% 4% 3%
| fibers: 10000; iters: 10000 4% 4% 3% 2%
| fibers: 10000; iters: 100000 5% 5% 4% 5%
As we discussed offline, here are also the numbers for the alternative
patch implementing a noop trigger.
================================================================================
diff --git a/src/lua/fiber.c b/src/lua/fiber.c
index 45bc037..959206f 100644
--- a/src/lua/fiber.c
+++ b/src/lua/fiber.c
@@ -432,15 +432,26 @@ lbox_fiber_info(struct lua_State *L)
return 1;
}
+static int
+fiber_on_yield(struct trigger *trigger, void *event)
+{
+ (void) trigger;
+ (void) event;
+}
+
static int
lua_fiber_run_f(MAYBE_UNUSED va_list ap)
{
int result;
+ struct trigger t;
struct fiber *f = fiber();
struct lua_State *L = f->storage.lua.stack;
int coro_ref = lua_tointeger(L, -1);
lua_pop(L, 1);
+ trigger_create(&t, fiber_on_yield, NULL, NULL);
+ trigger_add(&f->on_yield, &t);
result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET);
+ trigger_clear(&t);
/*
* If fiber is not joinable
* We can unref child stack here,
================================================================================
* Vanilla -> Patched [Lua noop trigger] (min, median, mean, max):
| fibers: 10; iters: 100 0% 0% 0% 0%
| fibers: 10; iters: 1000 2% 10% 4% -1%
| fibers: 10; iters: 10000 2% 1% 0% 0%
| fibers: 10; iters: 100000 0% 0% 0% 2%
| fibers: 100; iters: 100 1% 0% 1% -1%
| fibers: 100; iters: 1000 0% 1% 0% -1%
| fibers: 100; iters: 10000 1% 1% 0% -3%
| fibers: 100; iters: 100000 1% 2% 1% -2%
| fibers: 1000; iters: 100 1% 3% 2% 1%
| fibers: 1000; iters: 1000 2% 3% 3% 2%
| fibers: 1000; iters: 10000 1% 2% 2% 3%
| fibers: 1000; iters: 100000 2% 2% 2% 1%
| fibers: 10000; iters: 100 1% 2% 3% 3%
| fibers: 10000; iters: 1000 3% 2% 2% 0%
| fibers: 10000; iters: 10000 1% 2% 1% 0%
| fibers: 10000; iters: 100000 2% 3% 2% 2%
This is freaking unbelievable. Nobody can affect those runs, but numbers
show unacceptable performance loss. I guess there is no way to measure
the approaches more precisely, so I just throw this patch away.
>
> >> 5. Also there is a question what will happen, if Lua is accessed
> >> from a fiber, created not in Lua? That is basically all pooled
> >> fibers, which serve IProto requests. See struct fiber_pool and its
> >> single usage place. These fibers are started from C, and may end up
> >> executing Lua code, if, for example, an IPROTO_CALL arrived.
> >>
> >> The same question can be applied to fibers, created from C API:
> >>
> >> API_EXPORT struct fiber * fiber_new(const char *name,
> >> fiber_func f);
> >>
> >> These fibers won't have the on_yield trigger, but can access Lua.
> >> For example, they can run Lua on_replace triggers in a space by
> >> making an insertion into it. Also they can access Lua API directly.
> >> By calling things like luaT_call() (see module.h).
> >
> > Well, strictly saying this means we need such trigger on each
> > created/pooled fiber, right?
>
> Yes. On each fiber ever created in the main thread. Even applier fiber
> can start executing on_replace triggers on a space in Lua.
Well, since there is a public API for fiber creation and we can't
obligue user to set such trigger, this way even doesn't solve the issue.
Let's stop on the approach with extern/virtual handler then.
>
> >> You may consider adding Jit collection/tracing termination to the
> >> scheduler fiber. It is just one fiber, and from what I know, every
> >> fiber after yield switches current thread to the scheduler fiber. This
> >> also maybe less expensive. We probably won't even notice.
> >
> > I've tried to implement such approach but it doesn't work. As you
> > mentioned on_yield triggers are not executed for scheduler fiber.
> > Nevertheless, here is the patch (I hope I've not missed something):
> >
> > ================================================================================
> >
> > diff --git a/src/lua/init.c b/src/lua/init.c
> > index a0b2fc775..33ddb894d 100644
> > --- a/src/lua/init.c
> > +++ b/src/lua/init.c
> > @@ -79,6 +79,7 @@ struct ibuf *tarantool_lua_ibuf = &tarantool_lua_ibuf_body;
> > */
> > struct fiber *script_fiber;
> > bool start_loop = true;
> > +static struct trigger lua_on_yield_trigger;
> >
> > /* contents of src/lua/ files */
> > extern char strict_lua[],
> > @@ -432,9 +433,22 @@ luaopen_tarantool(lua_State *L)
> > return 1;
> > }
> >
> > +static int
> > +tarantool_lua_on_yield(struct trigger *trigger, void *event)
> > +{
> > + (void) trigger;
> > + (void) event;
> > + int status = luaJIT_setmode(tarantool_L, 0, -1);
> > + assert(status == 0);
> > + return 0;
> > +}
> > +
> > void
> > tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
> > {
> > + trigger_create(&lua_on_yield_trigger, tarantool_lua_on_yield, NULL, NULL);
> > + trigger_add(&cord()->sched.on_yield, &lua_on_yield_trigger);
> > +
> > lua_State *L = luaL_newstate();
> > if (L == NULL) {
> > panic("failed to initialize Lua");
> >
> > ================================================================================
>
> I never said sched fiber calls on_yield triggers. Because indeed, they could never
> be installed from anywhere, and there was no need to call them. But I didn't know
> that either.
>
> Anyway, I am not sure anymore that fibers return to scheduler on each yield. I see
> that coro_transfer() in fiber_yield() gives control to some other fiber, which
> can be either next in this loop iteration, or the scheduler if it is the end of
> the event loop iteration. So it won't work.
>
> > However, I've implemented your idea in another way: added an auxiliary
> > function field to cord structure to be called on <fiber_yield> or
> > <fiber_call>:
> > > And here are the results:
> > * Vanilla (mean, seconds):
> > | Total runs: #15
> > | #1700 bench: fibers: 10; iterations: 100 => 0.0015582
> > | #1700 bench: fibers: 10; iterations: 1000 => 0.004238
> > | #1700 bench: fibers: 10; iterations: 10000 => 0.031612
> > | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
> > | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
> > | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
> > | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
> > | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
> > | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
> > | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
> > | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
> > | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
> > | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
> > | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
> > | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
> > | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
> > * Patched [cord callback] (mean, seconds):
> > | Total runs: #15
> > | #1700 bench: fibers: 10; iterations: 100 => 0.0015236
> > | #1700 bench: fibers: 10; iterations: 1000 => 0.00450766666666667
> > | #1700 bench: fibers: 10; iterations: 10000 => 0.0341899333333333
> > | #1700 bench: fibers: 10; iterations: 100000 => 0.329420333333333
> > | #1700 bench: fibers: 100; iterations: 100 => 0.00403553333333333
> > | #1700 bench: fibers: 100; iterations: 1000 => 0.0291887333333333
> > | #1700 bench: fibers: 100; iterations: 10000 => 0.2776654
> > | #1700 bench: fibers: 100; iterations: 100000 => 2.76978893333333
> > | #1700 bench: fibers: 1000; iterations: 100 => 0.0309739333333333
> > | #1700 bench: fibers: 1000; iterations: 1000 => 0.289414733333333
> > | #1700 bench: fibers: 1000; iterations: 10000 => 2.8631376
> > | #1700 bench: fibers: 1000; iterations: 100000 => 28.5713692666667
> > | #1700 bench: fibers: 10000; iterations: 100 => 0.628503733333333
> > | #1700 bench: fibers: 10000; iterations: 1000 => 6.20881053333333
> > | #1700 bench: fibers: 10000; iterations: 10000 => 62.0973958
> > | #1700 bench: fibers: 10000; iterations: 100000 => 619.493991933333
> > * Relative measurements (vanilla -> patched):
> > | #1700 bench: fibers: 10; iterations: 100 => -2.22%
> > | #1700 bench: fibers: 10; iterations: 1000 => 6.36%
> > | #1700 bench: fibers: 10; iterations: 10000 => 8.15%
> > | #1700 bench: fibers: 10; iterations: 100000 => 9.23%
> > | #1700 bench: fibers: 100; iterations: 100 => 2.83%
> > | #1700 bench: fibers: 100; iterations: 1000 => 7.78%
> > | #1700 bench: fibers: 100; iterations: 10000 => 7.6%
> > | #1700 bench: fibers: 100; iterations: 100000 => 7.81%
> > | #1700 bench: fibers: 1000; iterations: 100 => 8%
> > | #1700 bench: fibers: 1000; iterations: 1000 => 8.49%
> > | #1700 bench: fibers: 1000; iterations: 10000 => 8.82%
> > | #1700 bench: fibers: 1000; iterations: 100000 => 8.16%
> > | #1700 bench: fibers: 10000; iterations: 100 => 4.2%
> > | #1700 bench: fibers: 10000; iterations: 1000 => 4.12%
> > | #1700 bench: fibers: 10000; iterations: 10000 => 3.89%
> > | #1700 bench: fibers: 10000; iterations: 100000 => 4.66%
> >
> > Looks like this way is slower than the one implemented via triggers.
>
> I don't understand why. This is very suspicious. It does not look right.
> Number of calls is the same. Also you removed one lookup in the rlist of
> on_yield triggers. Perhaps you did the bench in a different build mode?
> Or did you run something heavy in background, browser with music/video
> maybe?
The build mode was the same and the only heavy thing in background was
heavy metal, but it was streamed from another device.
I re-run benchmarks for Release build mode on dev4 host. Here is also
the patch.
================================================================================
diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3c..d9a0721 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -408,6 +408,7 @@ fiber_call_impl(struct fiber *callee)
void
fiber_call(struct fiber *callee)
{
+ struct cord *cord = cord();
struct fiber *caller = fiber();
assert(! (caller->flags & FIBER_IS_READY));
assert(rlist_empty(&callee->state));
@@ -416,6 +417,8 @@ fiber_call(struct fiber *callee)
/** By convention, these triggers must not throw. */
if (! rlist_empty(&caller->on_yield))
trigger_run(&caller->on_yield, NULL);
+ if (cord->lua_on_yield)
+ cord->lua_on_yield();
clock_set_on_csw(caller);
callee->caller = caller;
callee->flags |= FIBER_IS_READY;
@@ -645,6 +648,8 @@ fiber_yield(void)
/** By convention, these triggers must not throw. */
if (! rlist_empty(&caller->on_yield))
trigger_run(&caller->on_yield, NULL);
+ if (cord->lua_on_yield)
+ cord->lua_on_yield();
clock_set_on_csw(caller);
assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
@@ -1365,6 +1370,7 @@ cord_create(struct cord *cord, const char *name)
slab_cache_set_thread(&cord()->slabc);
cord->id = pthread_self();
+ cord->lua_on_yield = NULL;
cord->on_exit = NULL;
slab_cache_create(&cord->slabc, &runtime);
mempool_create(&cord->fiber_mempool, &cord->slabc,
diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
index 16ee9f4..4cbf557 100644
--- a/src/lib/core/fiber.h
+++ b/src/lib/core/fiber.h
@@ -540,6 +540,7 @@ struct cord_on_exit;
* model.
*/
struct cord {
+ void (*lua_on_yield)(void);
/** The fiber that is currently being executed. */
struct fiber *fiber;
struct ev_loop *loop;
diff --git a/src/lua/init.c b/src/lua/init.c
index a0b2fc7..65e4b5d 100644
--- a/src/lua/init.c
+++ b/src/lua/init.c
@@ -522,6 +522,7 @@ tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
lua_atpanic(L, tarantool_panic_handler);
/* clear possible left-overs of init */
lua_settop(L, 0);
+ cord()->lua_on_yield = tarantool_lua_on_yield;
tarantool_L = L;
}
diff --git a/src/lua/utils.c b/src/lua/utils.c
index af114b0..c1885b8 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -37,6 +37,8 @@
#include <diag.h>
#include <fiber.h>
+#include <luajit.h>
+
#include "serializer_opts.h"
int luaL_nil_ref = LUA_REFNIL;
@@ -1308,3 +1310,9 @@ tarantool_lua_utils_init(struct lua_State *L)
luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
return 0;
}
+
+void
+tarantool_lua_on_yield(void)
+{
+ (void)luaJIT_setmode(tarantool_L, 0, -1);
+}
diff --git a/src/lua/utils.h b/src/lua/utils.h
index 7e02a05..cf8fe19 100644
--- a/src/lua/utils.h
+++ b/src/lua/utils.h
@@ -665,6 +665,9 @@ void luaL_iterator_delete(struct luaL_iterator *it);
int
tarantool_lua_utils_init(struct lua_State *L);
+void
+tarantool_lua_on_yield(void);
+
#if defined(__cplusplus)
} /* extern "C" */
#endif /* defined(__cplusplus) */
================================================================================
* Vanilla -> Patched [virtual cord callback] (min, median, mean, max):
| fibers: 10; iters: 100 2% 3% 1% 1%
| fibers: 10; iters: 1000 12% 17% 10% 3%
| fibers: 10; iters: 10000 6% 11% 8% 7%
| fibers: 10; iters: 100000 5% 6% 7% 10%
| fibers: 100; iters: 100 7% 5% 7% 3%
| fibers: 100; iters: 1000 10% 13% 12% 13%
| fibers: 100; iters: 10000 12% 12% 11% 8%
| fibers: 100; iters: 100000 12% 12% 12% 11%
| fibers: 1000; iters: 100 14% 14% 13% 8%
| fibers: 1000; iters: 1000 16% 16% 16% 13%
| fibers: 1000; iters: 10000 16% 16% 16% 14%
| fibers: 1000; iters: 100000 16% 16% 16% 14%
| fibers: 10000; iters: 100 3% 3% 3% 3%
| fibers: 10000; iters: 1000 2% 2% 2% 7%
| fibers: 10000; iters: 10000 2% 2% 2% 1%
| fibers: 10000; iters: 100000 2% 3% 2% 0%
As we discussed offline, here are also the numbers for the alternative
patch implementing a noop virtual callback.
================================================================================
diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3c..d9a0721 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -408,6 +408,7 @@ fiber_call_impl(struct fiber *callee)
void
fiber_call(struct fiber *callee)
{
+ struct cord *cord = cord();
struct fiber *caller = fiber();
assert(! (caller->flags & FIBER_IS_READY));
assert(rlist_empty(&callee->state));
@@ -416,6 +417,8 @@ fiber_call(struct fiber *callee)
/** By convention, these triggers must not throw. */
if (! rlist_empty(&caller->on_yield))
trigger_run(&caller->on_yield, NULL);
+ if (cord->lua_on_yield)
+ cord->lua_on_yield();
clock_set_on_csw(caller);
callee->caller = caller;
callee->flags |= FIBER_IS_READY;
@@ -645,6 +648,8 @@ fiber_yield(void)
/** By convention, these triggers must not throw. */
if (! rlist_empty(&caller->on_yield))
trigger_run(&caller->on_yield, NULL);
+ if (cord->lua_on_yield)
+ cord->lua_on_yield();
clock_set_on_csw(caller);
assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
@@ -1365,6 +1370,7 @@ cord_create(struct cord *cord, const char *name)
slab_cache_set_thread(&cord()->slabc);
cord->id = pthread_self();
+ cord->lua_on_yield = NULL;
cord->on_exit = NULL;
slab_cache_create(&cord->slabc, &runtime);
mempool_create(&cord->fiber_mempool, &cord->slabc,
diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
index fffbe8f..51f2e9c 100644
--- a/src/lib/core/fiber.h
+++ b/src/lib/core/fiber.h
@@ -534,6 +534,7 @@ struct cord_on_exit;
* model.
*/
struct cord {
+ void (*lua_on_yield)(void);
/** The fiber that is currently being executed. */
struct fiber *fiber;
struct ev_loop *loop;
diff --git a/src/lua/init.c b/src/lua/init.c
index a0b2fc7..65e4b5d 100644
--- a/src/lua/init.c
+++ b/src/lua/init.c
@@ -522,6 +522,7 @@ tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
lua_atpanic(L, tarantool_panic_handler);
/* clear possible left-overs of init */
lua_settop(L, 0);
+ cord()->lua_on_yield = tarantool_lua_on_yield;
tarantool_L = L;
}
diff --git a/src/lua/utils.c b/src/lua/utils.c
index 0b05d72..1035b1e 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1308,3 +1308,7 @@ tarantool_lua_utils_init(struct lua_State *L)
luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
return 0;
}
+
+void tarantool_lua_on_yield(void)
+{
+}
diff --git a/src/lua/utils.h b/src/lua/utils.h
index b10754e..71dc67f 100644
--- a/src/lua/utils.h
+++ b/src/lua/utils.h
@@ -683,6 +683,9 @@ void luaL_iterator_delete(struct luaL_iterator *it);
int
tarantool_lua_utils_init(struct lua_State *L);
+void
+tarantool_lua_on_yield(void);
+
#if defined(__cplusplus)
} /* extern "C" */
#endif /* defined(__cplusplus) */
================================================================================
* Vanilla -> Patched [virtual cord noop callback] (min, median, mean, max):
| fibers: 10; iters: 100 3% 3% 2% 1%
| fibers: 10; iters: 1000 -6% 7% 2% 1%
| fibers: 10; iters: 10000 2% 1% 0% 4%
| fibers: 10; iters: 100000 2% 3% 2% 3%
| fibers: 100; iters: 100 4% 6% 5% 2%
| fibers: 100; iters: 1000 8% 9% 9% 7%
| fibers: 100; iters: 10000 10% 10% 8% 6%
| fibers: 100; iters: 100000 10% 9% 10% 8%
| fibers: 1000; iters: 100 13% 13% 12% 8%
| fibers: 1000; iters: 1000 14% 15% 15% 16%
| fibers: 1000; iters: 10000 14% 15% 15% 16%
| fibers: 1000; iters: 100000 14% 15% 14% 12%
| fibers: 10000; iters: 100 0% 1% 2% 3%
| fibers: 10000; iters: 1000 1% 1% 1% 0%
| fibers: 10000; iters: 10000 1% 2% 2% 4%
| fibers: 10000; iters: 100000 3% 4% 3% 4%
Well... The numbers are horrible. Even noop ones. I drop this patch too.
>
> Also the reason may be that when you installed the trigger, it worked
> for Lua fibers only. But when you did it correct, for all fibers, we got
> the real perf degradation value. Try to create a global int64 counter
> and increment it on each call. Then compare its final value in the previous
> solution with on_yield triggers vs the current solution. It would allow to
> tell whether this new function is called more often in the latest solution.
Precisely. The extern/virtual function is called a bit more often:
* Patched [Lua fiber trigger] (on_yield handler calls):
| fibers: 10; iters: 100 1001
| fibers: 10; iters: 1000 10001
| fibers: 10; iters: 10000 100001
| fibers: 10; iters: 100000 1000001
| fibers: 100; iters: 100 10001
| fibers: 100; iters: 1000 100001
| fibers: 100; iters: 10000 1000001
| fibers: 100; iters: 100000 10000001
| fibers: 1000; iters: 100 100001
| fibers: 1000; iters: 1000 1000001
| fibers: 1000; iters: 10000 10000001
| fibers: 1000; iters: 100000 100000001
| fibers: 10000; iters: 100 1000001
| fibers: 10000; iters: 1000 10000001
| fibers: 10000; iters: 10000 100000001
| fibers: 10000; iters: 100000 1000000001
* Patched [virtual cord callback] (on_yield handler calls):
| fibers: 10; iters: 100 1117
| fibers: 10; iters: 1000 11017
| fibers: 10; iters: 10000 110017
| fibers: 10; iters: 100000 1100017
| fibers: 100; iters: 100 10207
| fibers: 100; iters: 1000 101107
| fibers: 100; iters: 10000 1010107
| fibers: 100; iters: 100000 10100107
| fibers: 1000; iters: 100 101107
| fibers: 1000; iters: 1000 1002007
| fibers: 1000; iters: 10000 10011007
| fibers: 1000; iters: 100000 100101007
| fibers: 10000; iters: 100 1010107
| fibers: 10000; iters: 1000 10011007
| fibers: 10000; iters: 10000 100020007
| fibers: 10000; iters: 100000 1000110007
>
> Try to make the function extern instead of virtual. May look ugly,
> but -9% worst degradation is uglier. I hope the extern hack will help.
I made this patch (on the branch). Omitting the issues with unit tests
linkage, the results are definitely prettier than ones obtained with
virtual cord callback. The diff is below.
================================================================================
diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3c..ed6104c 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -46,6 +46,8 @@
#if ENABLE_FIBER_TOP
#include <x86intrin.h> /* __rdtscp() */
+extern void lua_on_yield(void);
+
static inline void
clock_stat_add_delta(struct clock_stat *stat, uint64_t clock_delta)
{
@@ -416,6 +418,10 @@ fiber_call(struct fiber *callee)
/** By convention, these triggers must not throw. */
if (! rlist_empty(&caller->on_yield))
trigger_run(&caller->on_yield, NULL);
+
+ if (cord_is_main())
+ lua_on_yield();
+
clock_set_on_csw(caller);
callee->caller = caller;
callee->flags |= FIBER_IS_READY;
@@ -645,6 +651,10 @@ fiber_yield(void)
/** By convention, these triggers must not throw. */
if (! rlist_empty(&caller->on_yield))
trigger_run(&caller->on_yield, NULL);
+
+ if (cord_is_main())
+ lua_on_yield();
+
clock_set_on_csw(caller);
assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
diff --git a/src/lua/utils.c b/src/lua/utils.c
index 0b05d72..68f7fb5 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1308,3 +1308,10 @@ tarantool_lua_utils_init(struct lua_State *L)
luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
return 0;
}
+
+#include <luajit.h>
+
+void lua_on_yield(void)
+{
+ luaJIT_setmode(tarantool_L, 0, -1);
+}
================================================================================
* Vanilla -> Patched [extern callback] (min, median, mean, max):
| fibers: 10; iters: 100 1% 1% 0% 0%
| fibers: 10; iters: 1000 12% 8% 6% 0%
| fibers: 10; iters: 10000 6% 9% 7% 4%
| fibers: 10; iters: 100000 2% 3% 3% 5%
| fibers: 100; iters: 100 1% 1% 0% -5%
| fibers: 100; iters: 1000 2% 3% 3% 3%
| fibers: 100; iters: 10000 3% 3% 2% -1%
| fibers: 100; iters: 100000 3% 3% 4% 3%
| fibers: 1000; iters: 100 1% 1% 1% -1%
| fibers: 1000; iters: 1000 1% 2% 1% 1%
| fibers: 1000; iters: 10000 2% 2% 2% 3%
| fibers: 1000; iters: 100000 1% 1% 1% 0%
| fibers: 10000; iters: 100 0% 0% 0% 0%
| fibers: 10000; iters: 1000 0% 2% 1% 2%
| fibers: 10000; iters: 10000 0% 0% 0% 3%
| fibers: 10000; iters: 100000 1% 2% 0% 0%
As we discussed offline, here are also the numbers for the alternative
patch implementing a noop extern callback.
================================================================================
diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3c..ed6104c 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -46,6 +46,8 @@
#if ENABLE_FIBER_TOP
#include <x86intrin.h> /* __rdtscp() */
+extern void lua_on_yield(void);
+
static inline void
clock_stat_add_delta(struct clock_stat *stat, uint64_t clock_delta)
{
@@ -416,6 +418,10 @@ fiber_call(struct fiber *callee)
/** By convention, these triggers must not throw. */
if (! rlist_empty(&caller->on_yield))
trigger_run(&caller->on_yield, NULL);
+
+ if (cord_is_main())
+ lua_on_yield();
+
clock_set_on_csw(caller);
callee->caller = caller;
callee->flags |= FIBER_IS_READY;
@@ -645,6 +651,10 @@ fiber_yield(void)
/** By convention, these triggers must not throw. */
if (! rlist_empty(&caller->on_yield))
trigger_run(&caller->on_yield, NULL);
+
+ if (cord_is_main())
+ lua_on_yield();
+
clock_set_on_csw(caller);
assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
diff --git a/src/lua/utils.c b/src/lua/utils.c
index 0b05d72..7d0962f 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1308,3 +1308,7 @@ tarantool_lua_utils_init(struct lua_State *L)
luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
return 0;
}
+
+void lua_on_yield(void)
+{
+}
================================================================================
* Vanilla -> Patched [extern noop callback] (min, median, mean, max):
| fibers: 10; iters: 100 0% 2% 0% 0%
| fibers: 10; iters: 1000 1% 3% 1% -1%
| fibers: 10; iters: 10000 -1% 0% -1% -3%
| fibers: 10; iters: 100000 -2% 0% -1% 0%
| fibers: 100; iters: 100 0% -1% 0% -4%
| fibers: 100; iters: 1000 0% 1% 0% 0%
| fibers: 100; iters: 10000 0% 0% 0% -3%
| fibers: 100; iters: 100000 0% 1% 0% -2%
| fibers: 1000; iters: 100 0% 0% -1% -3%
| fibers: 1000; iters: 1000 0% 0% 0% 1%
| fibers: 1000; iters: 10000 0% 0% 0% 0%
| fibers: 1000; iters: 100000 0% 0% 0% -1%
| fibers: 10000; iters: 100 0% 0% 0% 2%
| fibers: 10000; iters: 1000 0% -1% 0% 2%
| fibers: 10000; iters: 10000 -1% -1% 0% 0%
| fibers: 10000; iters: 100000 -1% 0% -1% -3%
And here is a final one. I personally don't like it (considering my
comments in the previous reply), but *for now* it can be a solution.
These are the changes I pushed to the remote branch[2] and the diff (I
stripped unit test related part) with the benchmark results are below.
================================================================================
diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 483ae3ce1..ed6104c8d 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -46,6 +46,8 @@
#if ENABLE_FIBER_TOP
#include <x86intrin.h> /* __rdtscp() */
+extern void lua_on_yield(void);
+
static inline void
clock_stat_add_delta(struct clock_stat *stat, uint64_t clock_delta)
{
@@ -416,6 +418,10 @@ fiber_call(struct fiber *callee)
/** By convention, these triggers must not throw. */
if (! rlist_empty(&caller->on_yield))
trigger_run(&caller->on_yield, NULL);
+
+ if (cord_is_main())
+ lua_on_yield();
+
clock_set_on_csw(caller);
callee->caller = caller;
callee->flags |= FIBER_IS_READY;
@@ -645,6 +651,10 @@ fiber_yield(void)
/** By convention, these triggers must not throw. */
if (! rlist_empty(&caller->on_yield))
trigger_run(&caller->on_yield, NULL);
+
+ if (cord_is_main())
+ lua_on_yield();
+
clock_set_on_csw(caller);
assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
diff --git a/src/lua/utils.c b/src/lua/utils.c
index af114b0a2..49e3c2bf0 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1308,3 +1308,9 @@ tarantool_lua_utils_init(struct lua_State *L)
luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
return 0;
}
+
+#include "lj_trace.h"
+void lua_on_yield(void)
+{
+ lj_trace_abort(G(tarantool_L));
+}
================================================================================
* Vanilla -> Patched [extern macro callback] (min, median, mean, max):
| fibers: 10; iters: 100 1% 1% 0% 0%
| fibers: 10; iters: 1000 0% 4% 0% -1%
| fibers: 10; iters: 10000 0% 5% 2% 6%
| fibers: 10; iters: 100000 0% 0% 0% 0%
| fibers: 100; iters: 100 0% -4% -3% -6%
| fibers: 100; iters: 1000 0% 3% 1% 0%
| fibers: 100; iters: 10000 0% 0% 0% -2%
| fibers: 100; iters: 100000 0% 1% 0% -2%
| fibers: 1000; iters: 100 0% 0% 0% -4%
| fibers: 1000; iters: 1000 0% 0% 0% -1%
| fibers: 1000; iters: 10000 0% 0% 0% 0%
| fibers: 1000; iters: 100000 0% 0% 0% -1%
| fibers: 10000; iters: 100 -1% 1% 1% 2%
| fibers: 10000; iters: 1000 -1% 0% 0% 2%
| fibers: 10000; iters: 10000 0% 0% 0% 0%
| fibers: 10000; iters: 100000 0% 0% 0% 0%
There was also an alternative idea by Sergos: introduce a special
parameter to enable such feature by demand.
[1]: https://gist.github.com/igormunkin/7e0cf48005bd003ffbdf30181eedb40e
[2]: https://github.com/tarantool/tarantool/commit/466ce26
--
Best regards,
IM
More information about the Tarantool-patches
mailing list