From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp43.i.mail.ru (smtp43.i.mail.ru [94.100.177.103])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested) by dev.tarantool.org (Postfix) with
 ESMTPS id D14FA445320 for ; Sat, 11 Jul 2020 23:28:34 +0300 (MSK)
References: <20200707222436.GG5559@tarantool.org>
From: Vladislav Shpilevoy
Date: Sat, 11 Jul 2020 22:28:31 +0200
MIME-Version: 1.0
In-Reply-To: <20200707222436.GG5559@tarantool.org>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
List-Id: Tarantool development patches
To: Igor Munkin
Cc: tarantool-patches@dev.tarantool.org

Hi! The branch seems to be extremely outdated. Could you please rebase it?

On 08/07/2020 00:24, Igor Munkin wrote:
> Vlad,
>
> Thanks for your review!
>
> On 01.04.20, Vladislav Shpilevoy wrote:
>> Hi! Thanks for the patch!
>>
>> See 7 comments below.
>>
>>> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
>>> index 45bc03787..c79aa7c2b 100644
>>> --- a/src/lua/fiber.c
>>> +++ b/src/lua/fiber.c
>>> @@ -454,6 +455,25 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap)
>>>  	return result;
>>>  }
>>>
>>> +static int
>>> +fiber_on_yield(struct trigger *trigger, void *event)
>>> +{
>>> +	(void) trigger;
>>> +	(void) event;
>>> +
>>> +	/*
>>> +	 * XXX: According to LuaJIT API reference luaJIT_setmode
>>
>> 1. What is 'XXX'?
>
> AFAIK this is just a legacy keyword coming from the Sun code convention [1]:
> | 10.5.4 Special Comments
> |
> | Use XXX in a comment to flag something that is bogus but works. Use
> | FIXME to flag something that is bogus and broken.
>
> I guess its meaning is kinda lost nowadays and is often considered
> close to FIXME now.
> Some people suggest avoiding XXX since it's ambiguous, but:
> * It is already used in the Tarantool sources:
>   | imun@tunix : ~/tarantool (master=) $ grep -rnF 'XXX: ' src | wc -l
>   | 10
> * This is the right case to use it: such usage is bogus
>   but works for the issue's purposes.
> * The keyword is highlighted by text editors (e.g. vim).

Nice, I didn't know that. I will use it from now on.

>> And most importantly, how does it affect perf? The new trigger
>> is +1 virtual call on each yield of every Lua fiber and +1
>> execution of the non-trivial function luaJIT_setmode(). I think
>> it is better to write a micro benchmark checking how many yields
>> we can do per time unit before and after this patch. From Lua
>> fibers.

> I made some benchmarks [2] and here are the numbers for the bleeding
> master:
> * Vanilla (mean, seconds):
> | Total runs: #15
> | #1700 bench: fibers: 10; iterations: 100 => 0.0015582
> | #1700 bench: fibers: 10; iterations: 1000 => 0.004238
> | #1700 bench: fibers: 10; iterations: 10000 => 0.031612
> | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
> | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
> | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
> | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
> | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
> | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
> | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
> | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
> | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
> | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
> | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
> | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
> | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
> * Patched [Lua fiber trigger] (mean, seconds):
> | Total runs: #15
> | #1700 bench: fibers: 10; iterations: 100 => 0.00157866666666667
> | #1700 bench: fibers: 10; iterations: 1000 => 0.004395
> | #1700 bench: fibers: 10; iterations: 10000 => 0.0328212666666667
> | #1700 bench: fibers: 10; iterations: 100000 => 0.317387866666667
> | #1700 bench: fibers: 100; iterations: 100 => 0.00404193333333333
> | #1700 bench: fibers: 100; iterations: 1000 => 0.0284888
> | #1700 bench: fibers: 100; iterations: 10000 => 0.270978066666667
> | #1700 bench: fibers: 100; iterations: 100000 => 2.70369026666667
> | #1700 bench: fibers: 1000; iterations: 100 => 0.0304198666666667
> | #1700 bench: fibers: 1000; iterations: 1000 => 0.283783733333333
> | #1700 bench: fibers: 1000; iterations: 10000 => 2.8139128
> | #1700 bench: fibers: 1000; iterations: 100000 => 28.1274792666667
> | #1700 bench: fibers: 10000; iterations: 100 => 0.6049388
> | #1700 bench: fibers: 10000; iterations: 1000 => 5.9653538
> | #1700 bench: fibers: 10000; iterations: 10000 => 59.6273188
> | #1700 bench: fibers: 10000; iterations: 100000 => 596.194249466667
> * Relative measurements (vanilla -> patched):
> | #1700 bench: fibers: 10; iterations: 100 => 1.31%
> | #1700 bench: fibers: 10; iterations: 1000 => 3.7%
> | #1700 bench: fibers: 10; iterations: 10000 => 3.82%
> | #1700 bench: fibers: 10; iterations: 100000 => 5.24%
> | #1700 bench: fibers: 100; iterations: 100 => 2.99%
> | #1700 bench: fibers: 100; iterations: 1000 => 5.19%
> | #1700 bench: fibers: 100; iterations: 10000 => 5%
> | #1700 bench: fibers: 100; iterations: 100000 => 5.24%
> | #1700 bench: fibers: 1000; iterations: 100 => 6.06%
> | #1700 bench: fibers: 1000; iterations: 1000 => 6.38%
> | #1700 bench: fibers: 1000; iterations: 10000 => 6.94%
> | #1700 bench: fibers: 1000; iterations: 100000 => 6.48%
> | #1700 bench: fibers: 10000; iterations: 100 => 0.29%
> | #1700 bench: fibers: 10000; iterations: 1000 => 0.03%
> | #1700 bench: fibers: 10000; iterations: 10000 => -0.24%
> | #1700 bench: fibers: 10000; iterations: 100000 => 0.73%
So it can easily be -6%. That is disappointing. We just recently
discussed how we should monitor perf, and here it is.

>> 5. Also there is a question of what will happen if Lua is accessed
>> from a fiber created not in Lua. That is basically all pooled fibers,
>> which serve IProto requests. See struct fiber_pool and its single
>> usage place. These fibers are started from C, and may end up
>> executing Lua code if, for example, an IPROTO_CALL arrived.
>>
>> The same question applies to fibers created from the C API:
>>
>>     API_EXPORT struct fiber *
>>     fiber_new(const char *name, fiber_func f);
>>
>> These fibers won't have the on_yield trigger, but can access Lua.
>> For example, they can run Lua on_replace triggers in a space by
>> making an insertion into it. Also they can access the Lua API
>> directly by calling things like luaT_call() (see module.h).

> Well, strictly speaking this means we need such a trigger on each
> created/pooled fiber, right?

Yes. On each fiber ever created in the main thread. Even the applier
fiber can start executing on_replace triggers on a space in Lua.

>> You may consider adding JIT collection/tracing termination to the
>> scheduler fiber. It is just one fiber, and from what I know, every
>> fiber after yield switches the current thread to the scheduler
>> fiber. This may also be less expensive. We probably won't even
>> notice.

> I've tried to implement such an approach, but it doesn't work. As you
> mentioned, on_yield triggers are not executed for the scheduler fiber.
> Nevertheless, here is the patch (I hope I've not missed something):
>
> ================================================================================
>
> diff --git a/src/lua/init.c b/src/lua/init.c
> index a0b2fc775..33ddb894d 100644
> --- a/src/lua/init.c
> +++ b/src/lua/init.c
> @@ -79,6 +79,7 @@ struct ibuf *tarantool_lua_ibuf = &tarantool_lua_ibuf_body;
>   */
>  struct fiber *script_fiber;
>  bool start_loop = true;
> +static struct trigger lua_on_yield_trigger;
>
>  /* contents of src/lua/ files */
>  extern char strict_lua[],
> @@ -432,9 +433,22 @@ luaopen_tarantool(lua_State *L)
>  	return 1;
>  }
>
> +static int
> +tarantool_lua_on_yield(struct trigger *trigger, void *event)
> +{
> +	(void) trigger;
> +	(void) event;
> +	int status = luaJIT_setmode(tarantool_L, 0, -1);
> +	assert(status == 0);
> +	return 0;
> +}
> +
>  void
>  tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
>  {
> +	trigger_create(&lua_on_yield_trigger, tarantool_lua_on_yield, NULL, NULL);
> +	trigger_add(&cord()->sched.on_yield, &lua_on_yield_trigger);
> +
>  	lua_State *L = luaL_newstate();
>  	if (L == NULL) {
>  		panic("failed to initialize Lua");
>
> ================================================================================

I never said the sched fiber calls on_yield triggers. Because indeed,
they could never be installed from anywhere, and there was no need to
call them. But I didn't know that either.

Anyway, I am not sure anymore that fibers return to the scheduler on
each yield. I see that coro_transfer() in fiber_yield() gives control
to some other fiber, which can be either the next one in this loop
iteration, or the scheduler if it is the end of the event loop
iteration. So it won't work.
> However, I've implemented your idea in another way: added an auxiliary
> function field to the cord structure to be called on or :
>
> And here are the results:
> * Vanilla (mean, seconds):
> | Total runs: #15
> | #1700 bench: fibers: 10; iterations: 100 => 0.0015582
> | #1700 bench: fibers: 10; iterations: 1000 => 0.004238
> | #1700 bench: fibers: 10; iterations: 10000 => 0.031612
> | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
> | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
> | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
> | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
> | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
> | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
> | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
> | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
> | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
> | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
> | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
> | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
> | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
> * Patched [cord callback] (mean, seconds):
> | Total runs: #15
> | #1700 bench: fibers: 10; iterations: 100 => 0.0015236
> | #1700 bench: fibers: 10; iterations: 1000 => 0.00450766666666667
> | #1700 bench: fibers: 10; iterations: 10000 => 0.0341899333333333
> | #1700 bench: fibers: 10; iterations: 100000 => 0.329420333333333
> | #1700 bench: fibers: 100; iterations: 100 => 0.00403553333333333
> | #1700 bench: fibers: 100; iterations: 1000 => 0.0291887333333333
> | #1700 bench: fibers: 100; iterations: 10000 => 0.2776654
> | #1700 bench: fibers: 100; iterations: 100000 => 2.76978893333333
> | #1700 bench: fibers: 1000; iterations: 100 => 0.0309739333333333
> | #1700 bench: fibers: 1000; iterations: 1000 => 0.289414733333333
> | #1700 bench: fibers: 1000; iterations: 10000 => 2.8631376
> | #1700 bench: fibers: 1000; iterations: 100000 => 28.5713692666667
> | #1700 bench: fibers: 10000; iterations: 100 => 0.628503733333333
> | #1700 bench: fibers: 10000; iterations: 1000 => 6.20881053333333
> | #1700 bench: fibers: 10000; iterations: 10000 => 62.0973958
> | #1700 bench: fibers: 10000; iterations: 100000 => 619.493991933333
> * Relative measurements (vanilla -> patched):
> | #1700 bench: fibers: 10; iterations: 100 => -2.22%
> | #1700 bench: fibers: 10; iterations: 1000 => 6.36%
> | #1700 bench: fibers: 10; iterations: 10000 => 8.15%
> | #1700 bench: fibers: 10; iterations: 100000 => 9.23%
> | #1700 bench: fibers: 100; iterations: 100 => 2.83%
> | #1700 bench: fibers: 100; iterations: 1000 => 7.78%
> | #1700 bench: fibers: 100; iterations: 10000 => 7.6%
> | #1700 bench: fibers: 100; iterations: 100000 => 7.81%
> | #1700 bench: fibers: 1000; iterations: 100 => 8%
> | #1700 bench: fibers: 1000; iterations: 1000 => 8.49%
> | #1700 bench: fibers: 1000; iterations: 10000 => 8.82%
> | #1700 bench: fibers: 1000; iterations: 100000 => 8.16%
> | #1700 bench: fibers: 10000; iterations: 100 => 4.2%
> | #1700 bench: fibers: 10000; iterations: 1000 => 4.12%
> | #1700 bench: fibers: 10000; iterations: 10000 => 3.89%
> | #1700 bench: fibers: 10000; iterations: 100000 => 4.66%
>
> Looks like this way is slower than the one implemented via triggers.

I don't understand why. This is very suspicious. It does not look
right. The number of calls is the same, and you even removed one lookup
in the rlist of on_yield triggers. Perhaps you did the bench in a
different build mode? Or did you run something heavy in the background,
a browser with music/video maybe?

Also the reason may be that when you installed the trigger, it worked
for Lua fibers only. But when you did it correctly, for all fibers, we
got the real perf degradation value. Try to create a global int64
counter and increment it on each call.
Then compare its final value in the previous solution with on_yield
triggers vs the current solution. It would allow us to tell whether
this new function is called more often in the latest solution.

Try to make the function extern instead of virtual. It may look ugly,
but a -9% worst-case degradation is uglier. I hope the extern hack will
help.