From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtpng2.m.smailru.net (smtpng2.m.smailru.net [94.100.179.3])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id 287D3469719;
	Sat, 19 Sep 2020 18:40:31 +0300 (MSK)
Date: Sat, 19 Sep 2020 18:29:58 +0300
From: Igor Munkin
Message-ID: <20200919152958.GN18920@tarantool.org>
References: <20200707222436.GG5559@tarantool.org>
 <20200907203502.GG18920@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Subject: Re: [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
List-Id: Tarantool development patches
To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org

Vlad,

Thanks for your reply! This is a PoC (so the patch was not polished).
I'll address all your comments in a separate series, but let's finish
the discussion here first.

On 17.09.20, Vladislav Shpilevoy wrote:
> Hi! Thanks for the investigation!
> 
> See 5 comments below.
> 
> > diff --git a/src/lua/utils.c b/src/lua/utils.c
> > index 0b05d72..7d0962f 100644
> > --- a/src/lua/utils.c
> > +++ b/src/lua/utils.c
> > @@ -1308,3 +1308,7 @@ tarantool_lua_utils_init(struct lua_State *L)
> >  	luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
> >  	return 0;
> >  }
> > +
> > +void lua_on_yield(void)
> > +{
> > +}
> > 
> > ================================================================================
> > 
> > * Vanilla -> Patched [extern noop callback] (min, median, mean, max):
> > | fibers: 10; iters: 100         0%   2%   0%   0%
> > | fibers: 10; iters: 1000        1%   3%   1%  -1%
> > | fibers: 10; iters: 10000      -1%   0%  -1%  -3%
> > | fibers: 10; iters: 100000     -2%   0%  -1%   0%
> > | fibers: 100; iters: 100        0%  -1%   0%  -4%
> > | fibers: 100; iters: 1000       0%   1%   0%   0%
> > | fibers: 100; iters: 10000      0%   0%   0%  -3%
> > | fibers: 100; iters: 100000     0%   1%   0%  -2%
> > | fibers: 1000; iters: 100       0%   0%  -1%  -3%
> > | fibers: 1000; iters: 1000      0%   0%   0%   1%
> > | fibers: 1000; iters: 10000     0%   0%   0%   0%
> > | fibers: 1000; iters: 100000    0%   0%   0%  -1%
> > | fibers: 10000; iters: 100      0%   0%   0%   2%
> > | fibers: 10000; iters: 1000     0%  -1%   0%   2%
> > | fibers: 10000; iters: 10000   -1%  -1%   0%   0%
> > | fibers: 10000; iters: 100000  -1%   0%  -1%  -3%
> > 
> > And here is a final one. I personally don't like it (considering my
> > comments in the previous reply), but *for now* it can be a solution.
> 
> 1. I couldn't find - why don't you like it? It seems to be the fastest
> solution, not affecting the microbench at all, and definitely not
> affecting any more complex scenarios.

Here is the quote from the first reply[1]:
| > Why can't we call lj_trace_abort() directly?
| 
| It's the internal API. Its usage complicates a switch between various
| LuaJIT implementations (we faced several challenges when tried to build
| Tarantool with uJIT). There is a public API to be used here (though in a
| bit hacky way).
| 
| > 
| > And most importantly, how does it affect perf? New trigger
| > is +1 virtual call on each yield of every Lua fiber and +1
| > execution of non-trival function luaJIT_setmode(). I think
| > it is better to write a micro bench checking how many yields
| > can we do per time unit before and after this patch. From Lua
| > fibers.

You're right, it's the fastest solution according to the microbench
results, but we pay for it in code maintenance. *This* is what I don't
like. I tested a noop workload (the Lua bench does nothing but yield),
so for a non-synthetic workload the numbers might differ less (or might
not).

Since we're using our bundled LuaJIT fork, I guess we can postpone these
thoughts for now and simply fix the issue. But I definitely would like
to return to it. Hope we'll have a nicer solution in the future.

> 
> > ================================================================================
> > 
> > diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> > index 483ae3ce1..ed6104c8d 100644
> > --- a/src/lib/core/fiber.c
> > +++ b/src/lib/core/fiber.c
> > @@ -46,6 +46,8 @@
> >  #if ENABLE_FIBER_TOP
> >  #include /* __rdtscp() */
> > 
> > +extern void lua_on_yield(void);
> > +
> >  static inline void
> >  clock_stat_add_delta(struct clock_stat *stat, uint64_t clock_delta)
> >  {
> > @@ -416,6 +418,10 @@ fiber_call(struct fiber *callee)
> >  	/** By convention, these triggers must not throw. */
> >  	if (! rlist_empty(&caller->on_yield))
> >  		trigger_run(&caller->on_yield, NULL);
> > +
> > +	if (cord_is_main())
> > +		lua_on_yield();
> 
> 2. Why not inside fiber_call_impl? I thought we need to call
> the abort on each coro_transfer().

switches schedule fiber to the ready one. AFAICS, the scheduler doesn't
enter the Lua world, so no abort is necessary here.

> 
> > +
> >  	clock_set_on_csw(caller);
> >  	callee->caller = caller;
> >  	callee->flags |= FIBER_IS_READY;
> > @@ -645,6 +651,10 @@ fiber_yield(void)
> >  	/** By convention, these triggers must not throw. */
> >  	if (! rlist_empty(&caller->on_yield))
> >  		trigger_run(&caller->on_yield, NULL);
> > +
> > +	if (cord_is_main())
> > +		lua_on_yield();
> > +
> >  	clock_set_on_csw(caller);
> > 
> >  	assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
> > diff --git a/src/lua/utils.c b/src/lua/utils.c
> > index af114b0a2..49e3c2bf0 100644
> > --- a/src/lua/utils.c
> > +++ b/src/lua/utils.c
> > @@ -1308,3 +1308,9 @@ tarantool_lua_utils_init(struct lua_State *L)
> >  	luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
> >  	return 0;
> >  }
> > +
> > +#include "lj_trace.h"
> 
> 3. Why is the header included here, and not in the beginning?
> 
> 4. It is worth adding a comment.

This is a draft; I'll address your comments in the v2 series.

> 
> > +void lua_on_yield(void)
> > +{
> > +	lj_trace_abort(G(tarantool_L));

I forgot to add a check for whether the yield occurs while a trace is
running. Here are the corresponding *draft* changes:

================================================================================

diff --git a/src/lua/utils.c b/src/lua/utils.c
index 49e3c2bf0..8d72abdb9 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1310,7 +1310,18 @@ tarantool_lua_utils_init(struct lua_State *L)
 }
 
 #include "lj_trace.h"
+#include "lj_err.h"
 void lua_on_yield(void)
 {
-	lj_trace_abort(G(tarantool_L));
+	struct global_State *g = G(tarantool_L);
+	/* Forbid Lua world re-entrancy while running the trace */
+	if (unlikely(tvref(g->jit_base))) {
+		struct lua_State *L = fiber()->storage.lua.stack;
+		setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
+		if (g->panic)
+#undef panic
+			g->panic(L);
+		exit(EXIT_FAILURE);
+	}
+	lj_trace_abort(g);
 }

================================================================================

Here are benchmark results for the new implementation:

* Vanilla -> Patched [extern macro callback] (min, median, mean, max):
| fibers: 10; iters: 100         0%   1%   0%   0%
| fibers: 10; iters: 1000        2%  -2%  -3%  -5%
| fibers: 10; iters: 10000      -3%  -2%  -2%  -2%
| fibers: 10; iters: 100000     -1%  -1%  -2%  -4%
| fibers: 100; iters: 100        0%   0%  -2%  -9%
| fibers: 100; iters: 1000       0%   1%   1%   0%
| fibers: 100; iters: 10000      0%   0%   0%   0%
| fibers: 100; iters: 100000     0%   0%   0%  -1%
| fibers: 1000; iters: 100       0%   1%   0%  -3%
| fibers: 1000; iters: 1000      0%   1%   1%   4%
| fibers: 1000; iters: 10000     0%   0%   0%   2%
| fibers: 1000; iters: 100000    0%   0%   0%   1%
| fibers: 10000; iters: 100      0%   0%   0%   3%
| fibers: 10000; iters: 1000     0%   0%   0%   0%
| fibers: 10000; iters: 10000    0%   0%   0%   4%
| fibers: 10000; iters: 100000   0%   2%   1%   0%

> > +}
> > 
> > ================================================================================
> > 
> > * Vanilla -> Patched [extern macro callback] (min, median, mean, max):
> > | fibers: 10; iters: 100         1%   1%   0%   0%
> > | fibers: 10; iters: 1000        0%   4%   0%  -1%
> > | fibers: 10; iters: 10000       0%   5%   2%   6%
> > | fibers: 10; iters: 100000      0%   0%   0%   0%
> > | fibers: 100; iters: 100        0%  -4%  -3%  -6%
> > | fibers: 100; iters: 1000       0%   3%   1%   0%
> > | fibers: 100; iters: 10000      0%   0%   0%  -2%
> > | fibers: 100; iters: 100000     0%   1%   0%  -2%
> > | fibers: 1000; iters: 100       0%   0%   0%  -4%
> > | fibers: 1000; iters: 1000      0%   0%   0%  -1%
> > | fibers: 1000; iters: 10000     0%   0%   0%   0%
> > | fibers: 1000; iters: 100000    0%   0%   0%  -1%
> > | fibers: 10000; iters: 100     -1%   1%   1%   2%
> > | fibers: 10000; iters: 1000    -1%   0%   0%   2%
> > | fibers: 10000; iters: 10000    0%   0%   0%   0%
> > | fibers: 10000; iters: 100000   0%   0%   0%   0%
> > 
> > There was also an alternative idea by Sergos: introduce a special
> > parameter to enable such feature by demand.
> 
> 5. I am not sure it is so necessary - from your bench it looks the overhead
> is almost 0, not counting the rare noise about +-1%.

I agree, but Sergos proposed this when the results were not so good.

-- 
Best regards,
IM
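
For readers who want to reason about the order of checks in the draft
lua_on_yield() above without building against LuaJIT internals, here is a
minimal sketch using mock state. Everything in it is hypothetical:
struct mock_global, the enums and mock_on_yield() are illustration-only
names, not LuaJIT or Tarantool API. The only thing it mirrors is the order
in the draft: a yield while compiled trace code is running is treated as
fatal, otherwise an in-progress trace recording (if any) is aborted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the recorder state; not a LuaJIT type. */
enum mock_jit_state { MOCK_JIT_IDLE, MOCK_JIT_RECORDING };

/* Hypothetical stand-in for the parts of the global state the draft reads. */
struct mock_global {
	const void *jit_base;      /* non-NULL while compiled trace code runs */
	enum mock_jit_state state; /* whether a trace is being recorded */
	int aborted;               /* how many recordings were aborted */
};

/* What the hook decided to do on a given yield. */
enum yield_action { YIELD_NOOP, YIELD_ABORT_RECORDING, YIELD_PANIC };

static enum yield_action
mock_on_yield(struct mock_global *g)
{
	assert(g != NULL);
	/* Yielding from inside a running trace: the draft panics here. */
	if (g->jit_base != NULL)
		return YIELD_PANIC;
	/* Otherwise drop the recording, if there is one in progress. */
	if (g->state == MOCK_JIT_RECORDING) {
		g->state = MOCK_JIT_IDLE;
		g->aborted++;
		return YIELD_ABORT_RECORDING;
	}
	/* Interpreter-only yield: nothing to do. */
	return YIELD_NOOP;
}
```

The sketch returns an action instead of calling exit(), so the three cases
can be exercised in a unit test; the draft itself terminates the process in
the first case.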