From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng1.m.smailru.net (smtpng1.m.smailru.net [94.100.181.251]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 07306469719 for ; Mon, 7 Sep 2020 23:45:32 +0300 (MSK) Date: Mon, 7 Sep 2020 23:35:02 +0300 From: Igor Munkin Message-ID: <20200907203502.GG18920@tarantool.org> References: <20200707222436.GG5559@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Subject: Re: [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tarantool-patches@dev.tarantool.org Vlad, Sorry for such long reply, I just tried to put it all together. On 11.07.20, Vladislav Shpilevoy wrote: > Hi! The branch seems to be extremely outdated. Could you > please rebase it? I pushed another implementation on the branch. It's a PoC, since I have several linkage issues in tests and I temporarily disabled unit tests. > > On 08/07/2020 00:24, Igor Munkin wrote: > > Vlad, > > > > Thanks for your review! > > > > On 01.04.20, Vladislav Shpilevoy wrote: > >> Hi! Thanks for the patch! > >> > >> See 7 comments below. > >> > >>> diff --git a/src/lua/fiber.c b/src/lua/fiber.c > >>> index 45bc03787..c79aa7c2b 100644 > >>> --- a/src/lua/fiber.c > >>> +++ b/src/lua/fiber.c > >>> @@ -454,6 +455,25 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap) > > >> And most importantly, how does it affect perf? New trigger > >> is +1 virtual call on each yield of every Lua fiber and +1 > >> execution of non-trival function luaJIT_setmode(). I think > >> it is better to write a micro bench checking how many yields > >> can we do per time unit before and after this patch. From Lua > >> fibers. > > > > I made some benchmarks[2] and here are the numbers for the bleeding > > master: > > * Vanilla (mean, seconds): > > | Total runs: #15 > > | #1700 bench: fibers: 10; iterations: 100 => 0.0015582 > > | #1700 bench: fibers: 10; iterations: 1000 => 0.004238 > > | #1700 bench: fibers: 10; iterations: 10000 => 0.031612 > > | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667 > > | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667 > > | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816 > > | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667 > > | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333 > > | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333 > > | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667 > > | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333 > > | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667 > > | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667 > > | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148 > > | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667 > > | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667 > > * Patched [Lua fiber trigger] (mean, seconds): > > | Total runs: #15 > > | #1700 bench: fibers: 10; iterations: 100 => 0.00157866666666667 > > | #1700 bench: fibers: 10; iterations: 1000 => 0.004395 > > | #1700 bench: fibers: 10; iterations: 10000 => 0.0328212666666667 > > | #1700 bench: fibers: 10; iterations: 100000 => 0.317387866666667 > > | #1700 bench: fibers: 100; iterations: 100 => 0.00404193333333333 > > | #1700 bench: fibers: 100; iterations: 1000 => 0.0284888 > > | #1700 bench: fibers: 100; iterations: 10000 => 0.270978066666667 > > | #1700 bench: fibers: 100; iterations: 100000 => 2.70369026666667 > > | #1700 bench: fibers: 1000; iterations: 100 => 0.0304198666666667 > > | #1700 bench: fibers: 1000; iterations: 1000 => 0.283783733333333 > > | #1700 bench: fibers: 1000; iterations: 10000 => 2.8139128 > > | #1700 bench: fibers: 1000; iterations: 100000 => 28.1274792666667 > > | #1700 bench: fibers: 10000; iterations: 100 => 0.6049388 > > | #1700 bench: fibers: 10000; iterations: 1000 => 5.9653538 > > | #1700 bench: fibers: 10000; iterations: 10000 => 59.6273188 > > | #1700 bench: fibers: 10000; iterations: 100000 => 596.194249466667 > > * Relative measurements (vanilla -> patched): > > | #1700 bench: fibers: 10; iterations: 100 => 1.31% > > | #1700 bench: fibers: 10; iterations: 1000 => 3.7% > > | #1700 bench: fibers: 10; iterations: 10000 => 3.82% > > | #1700 bench: fibers: 10; iterations: 100000 => 5.24% > > | #1700 bench: fibers: 100; iterations: 100 => 2.99% > > | #1700 bench: fibers: 100; iterations: 1000 => 5.19% > > | #1700 bench: fibers: 100; iterations: 10000 => 5% > > | #1700 bench: fibers: 100; iterations: 100000 => 5.24% > > | #1700 bench: fibers: 1000; iterations: 100 => 6.06% > > | #1700 bench: fibers: 1000; iterations: 1000 => 6.38% > > | #1700 bench: fibers: 1000; iterations: 10000 => 6.94% > > | #1700 bench: fibers: 1000; iterations: 100000 => 6.48% > > | #1700 bench: fibers: 10000; iterations: 100 => 0.29% > > | #1700 bench: fibers: 10000; iterations: 1000 => 0.03% > > | #1700 bench: fibers: 10000; iterations: 10000 => -0.24% > > | #1700 bench: fibers: 10000; iterations: 100000 => 0.73% > > So it easily can be -6%. That is disappointing. We just recently > discussed how we should monitor perf, and here it is. I updated these benchmarks[1] and re-run them for Release build on dev4 host and results make me frustrating. Here is also the patch, since the remote branch is updated. ================================================================================ diff --git a/src/lua/fiber.c b/src/lua/fiber.c index 45bc037..0efc31f 100644 --- a/src/lua/fiber.c +++ b/src/lua/fiber.c @@ -38,6 +38,7 @@ #include #include #include +#include void luaL_testcancel(struct lua_State *L) @@ -433,14 +434,26 @@ lbox_fiber_info(struct lua_State *L) } static int +fiber_on_yield(struct trigger *trigger, void *event) +{ + (void) trigger; + (void) event; + luaJIT_setmode(tarantool_L, 0, -1); +} + +static int lua_fiber_run_f(MAYBE_UNUSED va_list ap) { int result; + struct trigger t; struct fiber *f = fiber(); struct lua_State *L = f->storage.lua.stack; int coro_ref = lua_tointeger(L, -1); lua_pop(L, 1); + trigger_create(&t, fiber_on_yield, NULL, NULL); + trigger_add(&f->on_yield, &t); result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET); + trigger_clear(&t); /* * If fiber is not joinable * We can unref child stack here, ================================================================================ * Vanilla -> Patched [Lua fiber trigger] (min, median, mean, max): | Total runs: #15 | fibers: 10; iters: 100 0% 2% 0% 1% | fibers: 10; iters: 1000 4% 9% 6% 0% | fibers: 10; iters: 10000 4% 13% 7% 11% | fibers: 10; iters: 100000 4% 5% 4% 2% | fibers: 100; iters: 100 3% 3% 3% -2% | fibers: 100; iters: 1000 3% 3% 3% 2% | fibers: 100; iters: 10000 5% 4% 4% 2% | fibers: 100; iters: 100000 4% 4% 3% -1% | fibers: 1000; iters: 100 9% 17% 15% 13% | fibers: 1000; iters: 1000 10% 12% 12% 15% | fibers: 1000; iters: 10000 8% 9% 9% 9% | fibers: 1000; iters: 100000 7% 8% 8% 5% | fibers: 10000; iters: 100 4% 5% 5% 3% | fibers: 10000; iters: 1000 5% 5% 4% 3% | fibers: 10000; iters: 10000 4% 4% 3% 2% | fibers: 10000; iters: 100000 5% 5% 4% 5% As we discussed offline, here are also the numbers for the alternative patch implementing a noop trigger. ================================================================================ diff --git a/src/lua/fiber.c b/src/lua/fiber.c index 45bc037..959206f 100644 --- a/src/lua/fiber.c +++ b/src/lua/fiber.c @@ -432,15 +432,26 @@ lbox_fiber_info(struct lua_State *L) return 1; } +static int +fiber_on_yield(struct trigger *trigger, void *event) +{ + (void) trigger; + (void) event; +} + static int lua_fiber_run_f(MAYBE_UNUSED va_list ap) { int result; + struct trigger t; struct fiber *f = fiber(); struct lua_State *L = f->storage.lua.stack; int coro_ref = lua_tointeger(L, -1); lua_pop(L, 1); + trigger_create(&t, fiber_on_yield, NULL, NULL); + trigger_add(&f->on_yield, &t); result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET); + trigger_clear(&t); /* * If fiber is not joinable * We can unref child stack here, ================================================================================ * Vanilla -> Patched [Lua noop trigger] (min, median, mean, max): | fibers: 10; iters: 100 0% 0% 0% 0% | fibers: 10; iters: 1000 2% 10% 4% -1% | fibers: 10; iters: 10000 2% 1% 0% 0% | fibers: 10; iters: 100000 0% 0% 0% 2% | fibers: 100; iters: 100 1% 0% 1% -1% | fibers: 100; iters: 1000 0% 1% 0% -1% | fibers: 100; iters: 10000 1% 1% 0% -3% | fibers: 100; iters: 100000 1% 2% 1% -2% | fibers: 1000; iters: 100 1% 3% 2% 1% | fibers: 1000; iters: 1000 2% 3% 3% 2% | fibers: 1000; iters: 10000 1% 2% 2% 3% | fibers: 1000; iters: 100000 2% 2% 2% 1% | fibers: 10000; iters: 100 1% 2% 3% 3% | fibers: 10000; iters: 1000 3% 2% 2% 0% | fibers: 10000; iters: 10000 1% 2% 1% 0% | fibers: 10000; iters: 100000 2% 3% 2% 2% This is freaking unbelievable. Nobody can affect those runs, but numbers show unacceptable performance loss. I guess there is no way to measure the approaches more precisely, so I just throw this patch away. > > >> 5. Also there is a question what will happen, if Lua is accessed > >> from a fiber, created not in Lua? That is basically all pooled > >> fibers, which serve IProto requests. See struct fiber_pool and its > >> single usage place. These fibers are started from C, and may end up > >> executing Lua code, if, for example, an IPROTO_CALL arrived. > >> > >> The same question can be applied to fibers, created from C API: > >> > >> API_EXPORT struct fiber * fiber_new(const char *name, > >> fiber_func f); > >> > >> These fibers won't have the on_yield trigger, but can access Lua. > >> For example, they can run Lua on_replace triggers in a space by > >> making an insertion into it. Also they can access Lua API directly. > >> By calling things like luaT_call() (see module.h). > > > > Well, strictly saying this means we need such trigger on each > > created/pooled fiber, right? > > Yes. On each fiber ever created in the main thread. Even applier fiber > can start executing on_replace triggers on a space in Lua. Well, since there is a public API for fiber creation and we can't obligue user to set such trigger, this way even doesn't solve the issue. Let's stop on the approach with extern/virtual handler then. > > >> You may consider adding Jit collection/tracing termination to the > >> scheduler fiber. It is just one fiber, and from what I know, every > >> fiber after yield switches current thread to the scheduler fiber. This > >> also maybe less expensive. We probably won't even notice. > > > > I've tried to implement such approach but it doesn't work. As you > > mentioned on_yield triggers are not executed for scheduler fiber. > > Nevertheless, here is the patch (I hope I've not missed something): > > > > ================================================================================ > > > > diff --git a/src/lua/init.c b/src/lua/init.c > > index a0b2fc775..33ddb894d 100644 > > --- a/src/lua/init.c > > +++ b/src/lua/init.c > > @@ -79,6 +79,7 @@ struct ibuf *tarantool_lua_ibuf = &tarantool_lua_ibuf_body; > > */ > > struct fiber *script_fiber; > > bool start_loop = true; > > +static struct trigger lua_on_yield_trigger; > > > > /* contents of src/lua/ files */ > > extern char strict_lua[], > > @@ -432,9 +433,22 @@ luaopen_tarantool(lua_State *L) > > return 1; > > } > > > > +static int > > +tarantool_lua_on_yield(struct trigger *trigger, void *event) > > +{ > > + (void) trigger; > > + (void) event; > > + int status = luaJIT_setmode(tarantool_L, 0, -1); > > + assert(status == 0); > > + return 0; > > +} > > + > > void > > tarantool_lua_init(const char *tarantool_bin, int argc, char **argv) > > { > > + trigger_create(&lua_on_yield_trigger, tarantool_lua_on_yield, NULL, NULL); > > + trigger_add(&cord()->sched.on_yield, &lua_on_yield_trigger); > > + > > lua_State *L = luaL_newstate(); > > if (L == NULL) { > > panic("failed to initialize Lua"); > > > > ================================================================================ > > I never said sched fiber calls on_yield triggers. Because indeed, they could never > be installed from anywhere, and there was no need to call them. But I didn't know > that either. > > Anyway, I am not sure anymore that fibers return to scheduler on each yield. I see > that coro_transfer() in fiber_yield() gives control to some other fiber, which > can be either next in this loop iteration, or the scheduler if it is the end of > the event loop iteration. So it won't work. > > > However, I've implemented your idea in another way: added an auxiliary > > function field to cord structure to be called on or > > : > > > And here are the results: > > * Vanilla (mean, seconds): > > | Total runs: #15 > > | #1700 bench: fibers: 10; iterations: 100 => 0.0015582 > > | #1700 bench: fibers: 10; iterations: 1000 => 0.004238 > > | #1700 bench: fibers: 10; iterations: 10000 => 0.031612 > > | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667 > > | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667 > > | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816 > > | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667 > > | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333 > > | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333 > > | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667 > > | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333 > > | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667 > > | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667 > > | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148 > > | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667 > > | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667 > > * Patched [cord callback] (mean, seconds): > > | Total runs: #15 > > | #1700 bench: fibers: 10; iterations: 100 => 0.0015236 > > | #1700 bench: fibers: 10; iterations: 1000 => 0.00450766666666667 > > | #1700 bench: fibers: 10; iterations: 10000 => 0.0341899333333333 > > | #1700 bench: fibers: 10; iterations: 100000 => 0.329420333333333 > > | #1700 bench: fibers: 100; iterations: 100 => 0.00403553333333333 > > | #1700 bench: fibers: 100; iterations: 1000 => 0.0291887333333333 > > | #1700 bench: fibers: 100; iterations: 10000 => 0.2776654 > > | #1700 bench: fibers: 100; iterations: 100000 => 2.76978893333333 > > | #1700 bench: fibers: 1000; iterations: 100 => 0.0309739333333333 > > | #1700 bench: fibers: 1000; iterations: 1000 => 0.289414733333333 > > | #1700 bench: fibers: 1000; iterations: 10000 => 2.8631376 > > | #1700 bench: fibers: 1000; iterations: 100000 => 28.5713692666667 > > | #1700 bench: fibers: 10000; iterations: 100 => 0.628503733333333 > > | #1700 bench: fibers: 10000; iterations: 1000 => 6.20881053333333 > > | #1700 bench: fibers: 10000; iterations: 10000 => 62.0973958 > > | #1700 bench: fibers: 10000; iterations: 100000 => 619.493991933333 > > * Relative measurements (vanilla -> patched): > > | #1700 bench: fibers: 10; iterations: 100 => -2.22% > > | #1700 bench: fibers: 10; iterations: 1000 => 6.36% > > | #1700 bench: fibers: 10; iterations: 10000 => 8.15% > > | #1700 bench: fibers: 10; iterations: 100000 => 9.23% > > | #1700 bench: fibers: 100; iterations: 100 => 2.83% > > | #1700 bench: fibers: 100; iterations: 1000 => 7.78% > > | #1700 bench: fibers: 100; iterations: 10000 => 7.6% > > | #1700 bench: fibers: 100; iterations: 100000 => 7.81% > > | #1700 bench: fibers: 1000; iterations: 100 => 8% > > | #1700 bench: fibers: 1000; iterations: 1000 => 8.49% > > | #1700 bench: fibers: 1000; iterations: 10000 => 8.82% > > | #1700 bench: fibers: 1000; iterations: 100000 => 8.16% > > | #1700 bench: fibers: 10000; iterations: 100 => 4.2% > > | #1700 bench: fibers: 10000; iterations: 1000 => 4.12% > > | #1700 bench: fibers: 10000; iterations: 10000 => 3.89% > > | #1700 bench: fibers: 10000; iterations: 100000 => 4.66% > > > > Looks like this way is slower than the one implemented via triggers. > > I don't understand why. This is very suspicious. It does not look right. > Number of calls is the same. Also you removed one lookup in the rlist of > on_yield triggers. Perhaps you did the bench in a different build mode? > Or did you run something heavy in background, browser with music/video > maybe? The build mode was the same and the only heavy thing in background was heavy metal, but it was streamed from another device. I re-run benchmarks for Release build mode on dev4 host. Here is also the patch. ================================================================================ diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index 483ae3c..d9a0721 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -408,6 +408,7 @@ fiber_call_impl(struct fiber *callee) void fiber_call(struct fiber *callee) { + struct cord *cord = cord(); struct fiber *caller = fiber(); assert(! (caller->flags & FIBER_IS_READY)); assert(rlist_empty(&callee->state)); @@ -416,6 +417,8 @@ fiber_call(struct fiber *callee) /** By convention, these triggers must not throw. */ if (! rlist_empty(&caller->on_yield)) trigger_run(&caller->on_yield, NULL); + if (cord->lua_on_yield) + cord->lua_on_yield(); clock_set_on_csw(caller); callee->caller = caller; callee->flags |= FIBER_IS_READY; @@ -645,6 +648,8 @@ fiber_yield(void) /** By convention, these triggers must not throw. */ if (! rlist_empty(&caller->on_yield)) trigger_run(&caller->on_yield, NULL); + if (cord->lua_on_yield) + cord->lua_on_yield(); clock_set_on_csw(caller); assert(callee->flags & FIBER_IS_READY || callee == &cord->sched); @@ -1365,6 +1370,7 @@ cord_create(struct cord *cord, const char *name) slab_cache_set_thread(&cord()->slabc); cord->id = pthread_self(); + cord->lua_on_yield = NULL; cord->on_exit = NULL; slab_cache_create(&cord->slabc, &runtime); mempool_create(&cord->fiber_mempool, &cord->slabc, diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h index 16ee9f4..4cbf557 100644 --- a/src/lib/core/fiber.h +++ b/src/lib/core/fiber.h @@ -540,6 +540,7 @@ struct cord_on_exit; * model. */ struct cord { + void (*lua_on_yield)(void); /** The fiber that is currently being executed. */ struct fiber *fiber; struct ev_loop *loop; diff --git a/src/lua/init.c b/src/lua/init.c index a0b2fc7..65e4b5d 100644 --- a/src/lua/init.c +++ b/src/lua/init.c @@ -522,6 +522,7 @@ tarantool_lua_init(const char *tarantool_bin, int argc, char **argv) lua_atpanic(L, tarantool_panic_handler); /* clear possible left-overs of init */ lua_settop(L, 0); + cord()->lua_on_yield = tarantool_lua_on_yield; tarantool_L = L; } diff --git a/src/lua/utils.c b/src/lua/utils.c index af114b0..c1885b8 100644 --- a/src/lua/utils.c +++ b/src/lua/utils.c @@ -37,6 +37,8 @@ #include #include +#include + #include "serializer_opts.h" int luaL_nil_ref = LUA_REFNIL; @@ -1308,3 +1310,9 @@ tarantool_lua_utils_init(struct lua_State *L) luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX); return 0; } + +void +tarantool_lua_on_yield(void) +{ + (void)luaJIT_setmode(tarantool_L, 0, -1); +} diff --git a/src/lua/utils.h b/src/lua/utils.h index 7e02a05..cf8fe19 100644 --- a/src/lua/utils.h +++ b/src/lua/utils.h @@ -665,6 +665,9 @@ void luaL_iterator_delete(struct luaL_iterator *it); int tarantool_lua_utils_init(struct lua_State *L); +void +tarantool_lua_on_yield(void); + #if defined(__cplusplus) } /* extern "C" */ #endif /* defined(__cplusplus) */ ================================================================================ * Vanilla -> Patched [virtual cord callback] (min, median, mean, max): | fibers: 10; iters: 100 2% 3% 1% 1% | fibers: 10; iters: 1000 12% 17% 10% 3% | fibers: 10; iters: 10000 6% 11% 8% 7% | fibers: 10; iters: 100000 5% 6% 7% 10% | fibers: 100; iters: 100 7% 5% 7% 3% | fibers: 100; iters: 1000 10% 13% 12% 13% | fibers: 100; iters: 10000 12% 12% 11% 8% | fibers: 100; iters: 100000 12% 12% 12% 11% | fibers: 1000; iters: 100 14% 14% 13% 8% | fibers: 1000; iters: 1000 16% 16% 16% 13% | fibers: 1000; iters: 10000 16% 16% 16% 14% | fibers: 1000; iters: 100000 16% 16% 16% 14% | fibers: 10000; iters: 100 3% 3% 3% 3% | fibers: 10000; iters: 1000 2% 2% 2% 7% | fibers: 10000; iters: 10000 2% 2% 2% 1% | fibers: 10000; iters: 100000 2% 3% 2% 0% As we discussed offline, here are also the numbers for the alternative patch implementing a noop virtual callback. ================================================================================ diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index 483ae3c..d9a0721 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -408,6 +408,7 @@ fiber_call_impl(struct fiber *callee) void fiber_call(struct fiber *callee) { + struct cord *cord = cord(); struct fiber *caller = fiber(); assert(! (caller->flags & FIBER_IS_READY)); assert(rlist_empty(&callee->state)); @@ -416,6 +417,8 @@ fiber_call(struct fiber *callee) /** By convention, these triggers must not throw. */ if (! rlist_empty(&caller->on_yield)) trigger_run(&caller->on_yield, NULL); + if (cord->lua_on_yield) + cord->lua_on_yield(); clock_set_on_csw(caller); callee->caller = caller; callee->flags |= FIBER_IS_READY; @@ -645,6 +648,8 @@ fiber_yield(void) /** By convention, these triggers must not throw. */ if (! rlist_empty(&caller->on_yield)) trigger_run(&caller->on_yield, NULL); + if (cord->lua_on_yield) + cord->lua_on_yield(); clock_set_on_csw(caller); assert(callee->flags & FIBER_IS_READY || callee == &cord->sched); @@ -1365,6 +1370,7 @@ cord_create(struct cord *cord, const char *name) slab_cache_set_thread(&cord()->slabc); cord->id = pthread_self(); + cord->lua_on_yield = NULL; cord->on_exit = NULL; slab_cache_create(&cord->slabc, &runtime); mempool_create(&cord->fiber_mempool, &cord->slabc, diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h index fffbe8f..51f2e9c 100644 --- a/src/lib/core/fiber.h +++ b/src/lib/core/fiber.h @@ -534,6 +534,7 @@ struct cord_on_exit; * model. */ struct cord { + void (*lua_on_yield)(void); /** The fiber that is currently being executed. */ struct fiber *fiber; struct ev_loop *loop; diff --git a/src/lua/init.c b/src/lua/init.c index a0b2fc7..65e4b5d 100644 --- a/src/lua/init.c +++ b/src/lua/init.c @@ -522,6 +522,7 @@ tarantool_lua_init(const char *tarantool_bin, int argc, char **argv) lua_atpanic(L, tarantool_panic_handler); /* clear possible left-overs of init */ lua_settop(L, 0); + cord()->lua_on_yield = tarantool_lua_on_yield; tarantool_L = L; } diff --git a/src/lua/utils.c b/src/lua/utils.c index 0b05d72..1035b1e 100644 --- a/src/lua/utils.c +++ b/src/lua/utils.c @@ -1308,3 +1308,7 @@ tarantool_lua_utils_init(struct lua_State *L) luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX); return 0; } + +void tarantool_lua_on_yield(void) +{ +} diff --git a/src/lua/utils.h b/src/lua/utils.h index b10754e..71dc67f 100644 --- a/src/lua/utils.h +++ b/src/lua/utils.h @@ -683,6 +683,9 @@ void luaL_iterator_delete(struct luaL_iterator *it); int tarantool_lua_utils_init(struct lua_State *L); +void +tarantool_lua_on_yield(void); + #if defined(__cplusplus) } /* extern "C" */ #endif /* defined(__cplusplus) */ ================================================================================ * Vanilla -> Patched [virtual cord noop callback] (min, median, mean, max): | fibers: 10; iters: 100 3% 3% 2% 1% | fibers: 10; iters: 1000 -6% 7% 2% 1% | fibers: 10; iters: 10000 2% 1% 0% 4% | fibers: 10; iters: 100000 2% 3% 2% 3% | fibers: 100; iters: 100 4% 6% 5% 2% | fibers: 100; iters: 1000 8% 9% 9% 7% | fibers: 100; iters: 10000 10% 10% 8% 6% | fibers: 100; iters: 100000 10% 9% 10% 8% | fibers: 1000; iters: 100 13% 13% 12% 8% | fibers: 1000; iters: 1000 14% 15% 15% 16% | fibers: 1000; iters: 10000 14% 15% 15% 16% | fibers: 1000; iters: 100000 14% 15% 14% 12% | fibers: 10000; iters: 100 0% 1% 2% 3% | fibers: 10000; iters: 1000 1% 1% 1% 0% | fibers: 10000; iters: 10000 1% 2% 2% 4% | fibers: 10000; iters: 100000 3% 4% 3% 4% Well... The numbers are horrible. Even noop ones. I drop this patch too. > > Also the reason may be that when you installed the trigger, it worked > for Lua fibers only. But when you did it correct, for all fibers, we got > the real perf degradation value. Try to create a global int64 counter > and increment it on each call. Then compare its final value in the previous > solution with on_yield triggers vs the current solution. It would allow to > tell whether this new function is called more often in the latest solution. Precisely. The extern/virtual function is called a bit more often: * Patched [Lua fiber trigger] (on_yield handler calls): | fibers: 10; iters: 100 1001 | fibers: 10; iters: 1000 10001 | fibers: 10; iters: 10000 100001 | fibers: 10; iters: 100000 1000001 | fibers: 100; iters: 100 10001 | fibers: 100; iters: 1000 100001 | fibers: 100; iters: 10000 1000001 | fibers: 100; iters: 100000 10000001 | fibers: 1000; iters: 100 100001 | fibers: 1000; iters: 1000 1000001 | fibers: 1000; iters: 10000 10000001 | fibers: 1000; iters: 100000 100000001 | fibers: 10000; iters: 100 1000001 | fibers: 10000; iters: 1000 10000001 | fibers: 10000; iters: 10000 100000001 | fibers: 10000; iters: 100000 1000000001 * Patched [virtual cord callback] (on_yield handler calls): | fibers: 10; iters: 100 1117 | fibers: 10; iters: 1000 11017 | fibers: 10; iters: 10000 110017 | fibers: 10; iters: 100000 1100017 | fibers: 100; iters: 100 10207 | fibers: 100; iters: 1000 101107 | fibers: 100; iters: 10000 1010107 | fibers: 100; iters: 100000 10100107 | fibers: 1000; iters: 100 101107 | fibers: 1000; iters: 1000 1002007 | fibers: 1000; iters: 10000 10011007 | fibers: 1000; iters: 100000 100101007 | fibers: 10000; iters: 100 1010107 | fibers: 10000; iters: 1000 10011007 | fibers: 10000; iters: 10000 100020007 | fibers: 10000; iters: 100000 1000110007 > > Try to make the function extern instead of virtual. May look ugly, > but -9% worst degradation is uglier. I hope the extern hack will help. I made this patch (on the branch). Omitting the issues with unit tests linkage, the results are definitely prettier than ones obtained with virtual cord callback. The diff is below. ================================================================================ diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index 483ae3c..ed6104c 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -46,6 +46,8 @@ #if ENABLE_FIBER_TOP #include /* __rdtscp() */ +extern void lua_on_yield(void); + static inline void clock_stat_add_delta(struct clock_stat *stat, uint64_t clock_delta) { @@ -416,6 +418,10 @@ fiber_call(struct fiber *callee) /** By convention, these triggers must not throw. */ if (! rlist_empty(&caller->on_yield)) trigger_run(&caller->on_yield, NULL); + + if (cord_is_main()) + lua_on_yield(); + clock_set_on_csw(caller); callee->caller = caller; callee->flags |= FIBER_IS_READY; @@ -645,6 +651,10 @@ fiber_yield(void) /** By convention, these triggers must not throw. */ if (! rlist_empty(&caller->on_yield)) trigger_run(&caller->on_yield, NULL); + + if (cord_is_main()) + lua_on_yield(); + clock_set_on_csw(caller); assert(callee->flags & FIBER_IS_READY || callee == &cord->sched); diff --git a/src/lua/utils.c b/src/lua/utils.c index 0b05d72..68f7fb5 100644 --- a/src/lua/utils.c +++ b/src/lua/utils.c @@ -1308,3 +1308,10 @@ tarantool_lua_utils_init(struct lua_State *L) luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX); return 0; } + +#include + +void lua_on_yield(void) +{ + luaJIT_setmode(tarantool_L, 0, -1); +} ================================================================================ * Vanilla -> Patched [extern callback] (min, median, mean, max): | fibers: 10; iters: 100 1% 1% 0% 0% | fibers: 10; iters: 1000 12% 8% 6% 0% | fibers: 10; iters: 10000 6% 9% 7% 4% | fibers: 10; iters: 100000 2% 3% 3% 5% | fibers: 100; iters: 100 1% 1% 0% -5% | fibers: 100; iters: 1000 2% 3% 3% 3% | fibers: 100; iters: 10000 3% 3% 2% -1% | fibers: 100; iters: 100000 3% 3% 4% 3% | fibers: 1000; iters: 100 1% 1% 1% -1% | fibers: 1000; iters: 1000 1% 2% 1% 1% | fibers: 1000; iters: 10000 2% 2% 2% 3% | fibers: 1000; iters: 100000 1% 1% 1% 0% | fibers: 10000; iters: 100 0% 0% 0% 0% | fibers: 10000; iters: 1000 0% 2% 1% 2% | fibers: 10000; iters: 10000 0% 0% 0% 3% | fibers: 10000; iters: 100000 1% 2% 0% 0% As we discussed offline, here are also the numbers for the alternative patch implementing a noop extern callback. ================================================================================ diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index 483ae3c..ed6104c 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -46,6 +46,8 @@ #if ENABLE_FIBER_TOP #include /* __rdtscp() */ +extern void lua_on_yield(void); + static inline void clock_stat_add_delta(struct clock_stat *stat, uint64_t clock_delta) { @@ -416,6 +418,10 @@ fiber_call(struct fiber *callee) /** By convention, these triggers must not throw. */ if (! rlist_empty(&caller->on_yield)) trigger_run(&caller->on_yield, NULL); + + if (cord_is_main()) + lua_on_yield(); + clock_set_on_csw(caller); callee->caller = caller; callee->flags |= FIBER_IS_READY; @@ -645,6 +651,10 @@ fiber_yield(void) /** By convention, these triggers must not throw. */ if (! rlist_empty(&caller->on_yield)) trigger_run(&caller->on_yield, NULL); + + if (cord_is_main()) + lua_on_yield(); + clock_set_on_csw(caller); assert(callee->flags & FIBER_IS_READY || callee == &cord->sched); diff --git a/src/lua/utils.c b/src/lua/utils.c index 0b05d72..7d0962f 100644 --- a/src/lua/utils.c +++ b/src/lua/utils.c @@ -1308,3 +1308,7 @@ tarantool_lua_utils_init(struct lua_State *L) luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX); return 0; } + +void lua_on_yield(void) +{ +} ================================================================================ * Vanilla -> Patched [extern noop callback] (min, median, mean, max): | fibers: 10; iters: 100 0% 2% 0% 0% | fibers: 10; iters: 1000 1% 3% 1% -1% | fibers: 10; iters: 10000 -1% 0% -1% -3% | fibers: 10; iters: 100000 -2% 0% -1% 0% | fibers: 100; iters: 100 0% -1% 0% -4% | fibers: 100; iters: 1000 0% 1% 0% 0% | fibers: 100; iters: 10000 0% 0% 0% -3% | fibers: 100; iters: 100000 0% 1% 0% -2% | fibers: 1000; iters: 100 0% 0% -1% -3% | fibers: 1000; iters: 1000 0% 0% 0% 1% | fibers: 1000; iters: 10000 0% 0% 0% 0% | fibers: 1000; iters: 100000 0% 0% 0% -1% | fibers: 10000; iters: 100 0% 0% 0% 2% | fibers: 10000; iters: 1000 0% -1% 0% 2% | fibers: 10000; iters: 10000 -1% -1% 0% 0% | fibers: 10000; iters: 100000 -1% 0% -1% -3% And here is a final one. I personally don't like it (considering my comments in the previous reply), but *for now* it can be a solution. These are the changes I pushed to the remote branch[2] and the diff (I stripped unit test related part) with the benchmark results are below. ================================================================================ diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index 483ae3ce1..ed6104c8d 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -46,6 +46,8 @@ #if ENABLE_FIBER_TOP #include /* __rdtscp() */ +extern void lua_on_yield(void); + static inline void clock_stat_add_delta(struct clock_stat *stat, uint64_t clock_delta) { @@ -416,6 +418,10 @@ fiber_call(struct fiber *callee) /** By convention, these triggers must not throw. */ if (! rlist_empty(&caller->on_yield)) trigger_run(&caller->on_yield, NULL); + + if (cord_is_main()) + lua_on_yield(); + clock_set_on_csw(caller); callee->caller = caller; callee->flags |= FIBER_IS_READY; @@ -645,6 +651,10 @@ fiber_yield(void) /** By convention, these triggers must not throw. */ if (! rlist_empty(&caller->on_yield)) trigger_run(&caller->on_yield, NULL); + + if (cord_is_main()) + lua_on_yield(); + clock_set_on_csw(caller); assert(callee->flags & FIBER_IS_READY || callee == &cord->sched); diff --git a/src/lua/utils.c b/src/lua/utils.c index af114b0a2..49e3c2bf0 100644 --- a/src/lua/utils.c +++ b/src/lua/utils.c @@ -1308,3 +1308,9 @@ tarantool_lua_utils_init(struct lua_State *L) luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX); return 0; } + +#include "lj_trace.h" +void lua_on_yield(void) +{ + lj_trace_abort(G(tarantool_L)); +} ================================================================================ * Vanilla -> Patched [extern macro callback] (min, median, mean, max): | fibers: 10; iters: 100 1% 1% 0% 0% | fibers: 10; iters: 1000 0% 4% 0% -1% | fibers: 10; iters: 10000 0% 5% 2% 6% | fibers: 10; iters: 100000 0% 0% 0% 0% | fibers: 100; iters: 100 0% -4% -3% -6% | fibers: 100; iters: 1000 0% 3% 1% 0% | fibers: 100; iters: 10000 0% 0% 0% -2% | fibers: 100; iters: 100000 0% 1% 0% -2% | fibers: 1000; iters: 100 0% 0% 0% -4% | fibers: 1000; iters: 1000 0% 0% 0% -1% | fibers: 1000; iters: 10000 0% 0% 0% 0% | fibers: 1000; iters: 100000 0% 0% 0% -1% | fibers: 10000; iters: 100 -1% 1% 1% 2% | fibers: 10000; iters: 1000 -1% 0% 0% 2% | fibers: 10000; iters: 10000 0% 0% 0% 0% | fibers: 10000; iters: 100000 0% 0% 0% 0% There was also an alternative idea by Sergos: introduce a special parameter to enable such feature by demand. [1]: https://gist.github.com/igormunkin/7e0cf48005bd003ffbdf30181eedb40e [2]: https://github.com/tarantool/tarantool/commit/466ce26 -- Best regards, IM