From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng3.m.smailru.net (smtpng3.m.smailru.net [94.100.177.149]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 5A040445320 for ; Wed, 8 Jul 2020 01:34:52 +0300 (MSK) Date: Wed, 8 Jul 2020 01:24:36 +0300 From: Igor Munkin Message-ID: <20200707222436.GG5559@tarantool.org> References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Subject: Re: [Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tarantool-patches@dev.tarantool.org Vlad, Thanks for your review! On 01.04.20, Vladislav Shpilevoy wrote: > Hi! Thanks for the patch! > > See 7 comments below. > > > diff --git a/src/lua/fiber.c b/src/lua/fiber.c > > index 45bc03787..c79aa7c2b 100644 > > --- a/src/lua/fiber.c > > +++ b/src/lua/fiber.c > > @@ -454,6 +455,25 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap) > > return result; > > } > > > > +static int > > +fiber_on_yield(struct trigger *trigger, void *event) > > +{ > > + (void) trigger; > > + (void) event; > > + > > + /* > > + * XXX: According to LuaJIT API reference luaJIT_setmode > > 1. What is 'XXX'? AFAIK this is just a legacy keyword coming from Sun code convention[1]: | 10.5.4 Special Comments | | Use XXX in a comment to flag something that is bogus but works. Use | FIXME to flag something that is bogus and broken. I guess its meaning is kinda lost nowadays and often is considered close to FIXME now. Some people suggest to avoid XXX since it's ambiguous but: * I see it's already used in Tarantool sources: | imun@tunix : ~/tarantool (master=) $ grep -rnF 'XXX: ' src | wc -l | 10 * This is the right case to use it: such usage is bogus but works for the issue purposes. * This keyword is highlighted by the text editor (e.g. vim). > > > + * function returns 0 on failure and 1 on success. Since > > + * this call is aimed to abort the trace recording on > > + * fiber_yield call and prevent already compiled traces > > + * execution, the mode parameter is an invalid value (-1) > > + * so JIT mode change is expected to fail and 0 is > > + * returned. Otherwise non-zero return value signals > > + * trigger_run routine about this trigger failure. > > 2. So correct me if I am wrong - we deliberately fail this > function assuming that its call stops current trace collection/execution, > right? And we pass -1 because there is no other way to stop Yes, with one nit: the call either aborts current trace collection or leads to the platform panic (we have no option here) -- similar to the way we handle FFI sandwiches. > trace collection/execution except this function. Fortunately aborts trace compilation unconditionally and then changes JIT behaviour considering the given mode value. We pass -1 since there is no special value to abort the trace compilation. This value changes nothing, so it's just a hack to abort trace compilation via exported LuaJIT C API. > > Why can't we call lj_trace_abort() directly? It's the internal API. Its usage complicates a switch between various LuaJIT implementations (we faced several challenges when tried to build Tarantool with uJIT). There is a public API to be used here (though in a bit hacky way). > > And most importantly, how does it affect perf? New trigger > is +1 virtual call on each yield of every Lua fiber and +1 > execution of non-trival function luaJIT_setmode(). I think > it is better to write a micro bench checking how many yields > can we do per time unit before and after this patch. From Lua > fibers. I made some benchmarks[2] and here are the numbers for the bleeding master: * Vanilla (mean, seconds): | Total runs: #15 | #1700 bench: fibers: 10; iterations: 100 => 0.0015582 | #1700 bench: fibers: 10; iterations: 1000 => 0.004238 | #1700 bench: fibers: 10; iterations: 10000 => 0.031612 | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667 | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667 | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816 | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667 | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333 | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333 | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667 | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333 | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667 | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667 | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148 | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667 | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667 * Patched [Lua fiber trigger] (mean, seconds): | Total runs: #15 | #1700 bench: fibers: 10; iterations: 100 => 0.00157866666666667 | #1700 bench: fibers: 10; iterations: 1000 => 0.004395 | #1700 bench: fibers: 10; iterations: 10000 => 0.0328212666666667 | #1700 bench: fibers: 10; iterations: 100000 => 0.317387866666667 | #1700 bench: fibers: 100; iterations: 100 => 0.00404193333333333 | #1700 bench: fibers: 100; iterations: 1000 => 0.0284888 | #1700 bench: fibers: 100; iterations: 10000 => 0.270978066666667 | #1700 bench: fibers: 100; iterations: 100000 => 2.70369026666667 | #1700 bench: fibers: 1000; iterations: 100 => 0.0304198666666667 | #1700 bench: fibers: 1000; iterations: 1000 => 0.283783733333333 | #1700 bench: fibers: 1000; iterations: 10000 => 2.8139128 | #1700 bench: fibers: 1000; iterations: 100000 => 28.1274792666667 | #1700 bench: fibers: 10000; iterations: 100 => 0.6049388 | #1700 bench: fibers: 10000; iterations: 1000 => 5.9653538 | #1700 bench: fibers: 10000; iterations: 10000 => 59.6273188 | #1700 bench: fibers: 10000; iterations: 100000 => 596.194249466667 * Relative measurements (vanilla -> patched): | #1700 bench: fibers: 10; iterations: 100 => 1.31% | #1700 bench: fibers: 10; iterations: 1000 => 3.7% | #1700 bench: fibers: 10; iterations: 10000 => 3.82% | #1700 bench: fibers: 10; iterations: 100000 => 5.24% | #1700 bench: fibers: 100; iterations: 100 => 2.99% | #1700 bench: fibers: 100; iterations: 1000 => 5.19% | #1700 bench: fibers: 100; iterations: 10000 => 5% | #1700 bench: fibers: 100; iterations: 100000 => 5.24% | #1700 bench: fibers: 1000; iterations: 100 => 6.06% | #1700 bench: fibers: 1000; iterations: 1000 => 6.38% | #1700 bench: fibers: 1000; iterations: 10000 => 6.94% | #1700 bench: fibers: 1000; iterations: 100000 => 6.48% | #1700 bench: fibers: 10000; iterations: 100 => 0.29% | #1700 bench: fibers: 10000; iterations: 1000 => 0.03% | #1700 bench: fibers: 10000; iterations: 10000 => -0.24% | #1700 bench: fibers: 10000; iterations: 100000 => 0.73% > > > + */ > > + return luaJIT_setmode(tarantool_L, 0, -1); > > 3. In case that function ever returns 1, that breaks our > error return convention, that error means -1 (it will work > in triggers, since they do '!= 0', but anyway). Also when > there is an error, diag is expected to be set. Better do > diag_set(LuajitError) here when this fails. Or even panic() > since it is not supposed to return 1 when the mode is bad. > Or assume it is always 0 (add assert()), and return const 0 > from the trigger. I guess assert with return 0 is enough for this case. > > > +} > > + > > /** > > * Utility function for fiber.create and fiber.new > > */ > > @@ -467,10 +487,18 @@ fiber_create(struct lua_State *L) > > > > struct fiber *f = fiber_new("lua", lua_fiber_run_f); > > if (f == NULL) { > > - luaL_unref(L, LUA_REGISTRYINDEX, coro_ref); > > - luaT_error(L); > > + /* diagnostics is set in fiber_new. */ > > + goto error; > > } > > > > + struct trigger *t = malloc(sizeof(*t)); > > + if (t == NULL) { > > + diag_set(OutOfMemory, sizeof(*t), "malloc", "t"); > > + goto error; > > + } > > + trigger_create(t, fiber_on_yield, NULL, (trigger_f0) free); > > + trigger_add(&f->on_yield, t); > > 4. Yeah, now I understand why did you ask about when triggers are > deleted. In this particular case there is a better option. You can > allocate the trigger right on the stack, in lua_fiber_run_f(). This > is the trampoline for all Lua fibers, regardless how they are > created: via fiber.new(), or fiber.create(). You can add the trigger > before > > result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET); > > And remove afterwards. No need to use the heap then, nor move some > error handling code. Nice, thanks! I adjusted the patch, considering this point: ================================================================================ diff --git a/src/lua/fiber.c b/src/lua/fiber.c index c79aa7c2b..f0bcc1ab7 100644 --- a/src/lua/fiber.c +++ b/src/lua/fiber.c @@ -437,11 +437,15 @@ static int lua_fiber_run_f(MAYBE_UNUSED va_list ap) { int result; + struct trigger t; struct fiber *f = fiber(); struct lua_State *L = f->storage.lua.stack; int coro_ref = lua_tointeger(L, -1); lua_pop(L, 1); + trigger_create(&t, fiber_on_yield, NULL, NULL); + trigger_add(&f->on_yield, &t); result = luaT_call(L, lua_gettop(L) - 1, LUA_MULTRET); + trigger_clear(&t); /* * If fiber is not joinable * We can unref child stack here, @@ -487,17 +491,9 @@ fiber_create(struct lua_State *L) struct fiber *f = fiber_new("lua", lua_fiber_run_f); if (f == NULL) { - /* diagnostics is set in fiber_new. */ - goto error; - } - - struct trigger *t = malloc(sizeof(*t)); - if (t == NULL) { - diag_set(OutOfMemory, sizeof(*t), "malloc", "t"); - goto error; + luaL_unref(L, LUA_REGISTRYINDEX, coro_ref); + luaT_error(L); } - trigger_create(t, fiber_on_yield, NULL, (trigger_f0) free); - trigger_add(&f->on_yield, t); /* Move the arguments to the new coro */ lua_xmove(L, child_L, lua_gettop(L)); @@ -511,13 +507,6 @@ fiber_create(struct lua_State *L) lua_pushinteger(child_L, coro_ref); f->storage.lua.stack = child_L; return f; - -error: - /* Release the anchored coroutine. */ - luaL_unref(L, LUA_REGISTRYINDEX, coro_ref); - luaT_error(L); - unreachable(); - return NULL; } /** ================================================================================ > > 5. Also there is a question what will happen, if Lua is accessed from > a fiber, created not in Lua? That is basically all pooled fibers, > which serve IProto requests. See struct fiber_pool and its single > usage place. These fibers are started from C, and may end up executing > Lua code, if, for example, an IPROTO_CALL arrived. > > The same question can be applied to fibers, created from C API: > > API_EXPORT struct fiber * > fiber_new(const char *name, fiber_func f); > > These fibers won't have the on_yield trigger, but can access Lua. > For example, they can run Lua on_replace triggers in a space by making > an insertion into it. Also they can access Lua API directly. By > calling things like luaT_call() (see module.h). Well, strictly saying this means we need such trigger on each created/pooled fiber, right? > > You may consider adding Jit collection/tracing termination to the > scheduler fiber. It is just one fiber, and from what I know, every > fiber after yield switches current thread to the scheduler fiber. This > also maybe less expensive. We probably won't even notice. I've tried to implement such approach but it doesn't work. As you mentioned on_yield triggers are not executed for scheduler fiber. Nevertheless, here is the patch (I hope I've not missed something): ================================================================================ diff --git a/src/lua/init.c b/src/lua/init.c index a0b2fc775..33ddb894d 100644 --- a/src/lua/init.c +++ b/src/lua/init.c @@ -79,6 +79,7 @@ struct ibuf *tarantool_lua_ibuf = &tarantool_lua_ibuf_body; */ struct fiber *script_fiber; bool start_loop = true; +static struct trigger lua_on_yield_trigger; /* contents of src/lua/ files */ extern char strict_lua[], @@ -432,9 +433,22 @@ luaopen_tarantool(lua_State *L) return 1; } +static int +tarantool_lua_on_yield(struct trigger *trigger, void *event) +{ + (void) trigger; + (void) event; + int status = luaJIT_setmode(tarantool_L, 0, -1); + assert(status == 0); + return 0; +} + void tarantool_lua_init(const char *tarantool_bin, int argc, char **argv) { + trigger_create(&lua_on_yield_trigger, tarantool_lua_on_yield, NULL, NULL); + trigger_add(&cord()->sched.on_yield, &lua_on_yield_trigger); + lua_State *L = luaL_newstate(); if (L == NULL) { panic("failed to initialize Lua"); ================================================================================ However, I've implemented your idea in another way: added an auxiliary function field to cord structure to be called on or : ================================================================================ diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index 5389ce467..c2b339f00 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -408,6 +408,7 @@ fiber_call_impl(struct fiber *callee) void fiber_call(struct fiber *callee) { + struct cord *cord = cord(); struct fiber *caller = fiber(); assert(! (caller->flags & FIBER_IS_READY)); assert(rlist_empty(&callee->state)); @@ -416,6 +417,8 @@ fiber_call(struct fiber *callee) /** By convention, these triggers must not throw. */ if (! rlist_empty(&caller->on_yield)) trigger_run(&caller->on_yield, NULL); + if (cord->lua_on_yield) + cord->lua_on_yield(); clock_set_on_csw(caller); callee->caller = caller; callee->flags |= FIBER_IS_READY; @@ -645,6 +648,8 @@ fiber_yield(void) /** By convention, these triggers must not throw. */ if (! rlist_empty(&caller->on_yield)) trigger_run(&caller->on_yield, NULL); + if (cord->lua_on_yield) + cord->lua_on_yield(); clock_set_on_csw(caller); assert(callee->flags & FIBER_IS_READY || callee == &cord->sched); @@ -1365,6 +1370,7 @@ cord_create(struct cord *cord, const char *name) slab_cache_set_thread(&cord()->slabc); cord->id = pthread_self(); + cord->lua_on_yield = NULL; cord->on_exit = NULL; slab_cache_create(&cord->slabc, &runtime); mempool_create(&cord->fiber_mempool, &cord->slabc, diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h index cd9346a55..97e2ebb02 100644 --- a/src/lib/core/fiber.h +++ b/src/lib/core/fiber.h @@ -534,6 +534,7 @@ struct cord_on_exit; * model. */ struct cord { + void (*lua_on_yield)(void); /** The fiber that is currently being executed. */ struct fiber *fiber; struct ev_loop *loop; diff --git a/src/lua/init.c b/src/lua/init.c index a0b2fc775..65e4b5dd0 100644 --- a/src/lua/init.c +++ b/src/lua/init.c @@ -522,6 +522,7 @@ tarantool_lua_init(const char *tarantool_bin, int argc, char **argv) lua_atpanic(L, tarantool_panic_handler); /* clear possible left-overs of init */ lua_settop(L, 0); + cord()->lua_on_yield = tarantool_lua_on_yield; tarantool_L = L; } diff --git a/src/lua/utils.c b/src/lua/utils.c index 0b05d7257..8645d49d9 100644 --- a/src/lua/utils.c +++ b/src/lua/utils.c @@ -37,6 +37,8 @@ #include #include +#include + #include "serializer_opts.h" int luaL_nil_ref = LUA_REFNIL; @@ -1276,3 +1278,9 @@ tarantool_lua_utils_init(struct lua_State *L) assert(CTID_UUID != 0); return 0; } + +void +tarantool_lua_on_yield(void) +{ + (void)luaJIT_setmode(tarantool_L, 0, -1); +} diff --git a/src/lua/utils.h b/src/lua/utils.h index b10754e4a..71dc67fa0 100644 --- a/src/lua/utils.h +++ b/src/lua/utils.h @@ -683,6 +683,9 @@ void luaL_iterator_delete(struct luaL_iterator *it); int tarantool_lua_utils_init(struct lua_State *L); +void +tarantool_lua_on_yield(void); + #if defined(__cplusplus) } /* extern "C" */ #endif /* defined(__cplusplus) */ ================================================================================ And here are the results: * Vanilla (mean, seconds): | Total runs: #15 | #1700 bench: fibers: 10; iterations: 100 => 0.0015582 | #1700 bench: fibers: 10; iterations: 1000 => 0.004238 | #1700 bench: fibers: 10; iterations: 10000 => 0.031612 | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667 | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667 | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816 | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667 | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333 | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333 | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667 | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333 | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667 | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667 | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148 | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667 | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667 * Patched [cord callback] (mean, seconds): | Total runs: #15 | #1700 bench: fibers: 10; iterations: 100 => 0.0015236 | #1700 bench: fibers: 10; iterations: 1000 => 0.00450766666666667 | #1700 bench: fibers: 10; iterations: 10000 => 0.0341899333333333 | #1700 bench: fibers: 10; iterations: 100000 => 0.329420333333333 | #1700 bench: fibers: 100; iterations: 100 => 0.00403553333333333 | #1700 bench: fibers: 100; iterations: 1000 => 0.0291887333333333 | #1700 bench: fibers: 100; iterations: 10000 => 0.2776654 | #1700 bench: fibers: 100; iterations: 100000 => 2.76978893333333 | #1700 bench: fibers: 1000; iterations: 100 => 0.0309739333333333 | #1700 bench: fibers: 1000; iterations: 1000 => 0.289414733333333 | #1700 bench: fibers: 1000; iterations: 10000 => 2.8631376 | #1700 bench: fibers: 1000; iterations: 100000 => 28.5713692666667 | #1700 bench: fibers: 10000; iterations: 100 => 0.628503733333333 | #1700 bench: fibers: 10000; iterations: 1000 => 6.20881053333333 | #1700 bench: fibers: 10000; iterations: 10000 => 62.0973958 | #1700 bench: fibers: 10000; iterations: 100000 => 619.493991933333 * Relative measurements (vanilla -> patched): | #1700 bench: fibers: 10; iterations: 100 => -2.22% | #1700 bench: fibers: 10; iterations: 1000 => 6.36% | #1700 bench: fibers: 10; iterations: 10000 => 8.15% | #1700 bench: fibers: 10; iterations: 100000 => 9.23% | #1700 bench: fibers: 100; iterations: 100 => 2.83% | #1700 bench: fibers: 100; iterations: 1000 => 7.78% | #1700 bench: fibers: 100; iterations: 10000 => 7.6% | #1700 bench: fibers: 100; iterations: 100000 => 7.81% | #1700 bench: fibers: 1000; iterations: 100 => 8% | #1700 bench: fibers: 1000; iterations: 1000 => 8.49% | #1700 bench: fibers: 1000; iterations: 10000 => 8.82% | #1700 bench: fibers: 1000; iterations: 100000 => 8.16% | #1700 bench: fibers: 10000; iterations: 100 => 4.2% | #1700 bench: fibers: 10000; iterations: 1000 => 4.12% | #1700 bench: fibers: 10000; iterations: 10000 => 3.89% | #1700 bench: fibers: 10000; iterations: 100000 => 4.66% Looks like this way is slower than the one implemented via triggers. > > 6. There is also a question about how does it affect collection of > valid traces? I don't know how Jit does it, so ask just in case. > Won't this make traces never collected? Because every fiber yields > soon or late. All traces already aborts recording (or finalize it preparing for further stitching) on each / call, since they are implemented via Lua C API. The patch just handle the case when fibers are switched underneath an FFI call (see #4491[3]). > > 7. Does this patch allow to yield in ffi? Yep. [1]: https://www.oracle.com/technetwork/java/javase/documentation/codeconventions-137265.html [2]: https://gist.github.com/igormunkin/7e0cf48005bd003ffbdf30181eedb40e [3]: https://github.com/tarantool/tarantool/issues/4491 -- Best regards, IM