[Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
Igor Munkin
imun at tarantool.org
Sat Sep 19 18:29:58 MSK 2020
Vlad,
Thanks for your reply! This is a PoC (so the patch was not polished).
I'll address all your comments in a separate series, but let's finish
the discussion here first.
On 17.09.20, Vladislav Shpilevoy wrote:
> Hi! Thanks for the investigation!
>
> See 5 comments below.
>
> > diff --git a/src/lua/utils.c b/src/lua/utils.c
> > index 0b05d72..7d0962f 100644
> > --- a/src/lua/utils.c
> > +++ b/src/lua/utils.c
> > @@ -1308,3 +1308,7 @@ tarantool_lua_utils_init(struct lua_State *L)
> > luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
> > return 0;
> > }
> > +
> > +void lua_on_yield(void)
> > +{
> > +}
> >
> > ================================================================================
> >
> > * Vanilla -> Patched [extern noop callback] (min, median, mean, max):
> > | fibers:    10; iters:    100    0%   2%   0%   0%
> > | fibers:    10; iters:   1000    1%   3%   1%  -1%
> > | fibers:    10; iters:  10000   -1%   0%  -1%  -3%
> > | fibers:    10; iters: 100000   -2%   0%  -1%   0%
> > | fibers:   100; iters:    100    0%  -1%   0%  -4%
> > | fibers:   100; iters:   1000    0%   1%   0%   0%
> > | fibers:   100; iters:  10000    0%   0%   0%  -3%
> > | fibers:   100; iters: 100000    0%   1%   0%  -2%
> > | fibers:  1000; iters:    100    0%   0%  -1%  -3%
> > | fibers:  1000; iters:   1000    0%   0%   0%   1%
> > | fibers:  1000; iters:  10000    0%   0%   0%   0%
> > | fibers:  1000; iters: 100000    0%   0%   0%  -1%
> > | fibers: 10000; iters:    100    0%   0%   0%   2%
> > | fibers: 10000; iters:   1000    0%  -1%   0%   2%
> > | fibers: 10000; iters:  10000   -1%  -1%   0%   0%
> > | fibers: 10000; iters: 100000   -1%   0%  -1%  -3%
> >
> > And here is a final one. I personally don't like it (considering my
> > comments in the previous reply), but *for now* it can be a solution.
>
> 1. I couldn't find - why don't you like it? It seems to be the fastest
> solution, not affecting the microbench at all, and definitely not
> affecting any more complex scenarios.
Here is the quote from the first reply[1]:
| > Why can't we call lj_trace_abort() directly?
|
| It's the internal API. Its usage complicates a switch between various
| LuaJIT implementations (we faced several challenges when tried to build
| Tarantool with uJIT). There is a public API to be used here (though in a
| bit hacky way).
|
| >
| > And most importantly, how does it affect perf? New trigger
| > is +1 virtual call on each yield of every Lua fiber and +1
| > execution of non-trivial function luaJIT_setmode(). I think
| > it is better to write a micro bench checking how many yields
| > can we do per time unit before and after this patch. From Lua
| > fibers.
You're right: judging by the microbench results, it's the fastest
solution, but we pay for it with extra code maintenance, since it
depends on the LuaJIT internal API. *This* is what I don't like.
I tested a noop workload (the Lua bench does nothing but yield), so for
a non-synthetic workload the numbers might differ less (or might not).
Since we're using our bundled LuaJIT fork, I guess we can postpone these
thoughts for now and simply fix the issue. But I would definitely like
to return to it. I hope we'll have a nicer solution in the future.
>
> > ================================================================================
> >
> > diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> > index 483ae3ce1..ed6104c8d 100644
> > --- a/src/lib/core/fiber.c
> > +++ b/src/lib/core/fiber.c
> > @@ -46,6 +46,8 @@
> > #if ENABLE_FIBER_TOP
> > #include <x86intrin.h> /* __rdtscp() */
> >
> > +extern void lua_on_yield(void);
> > +
> > static inline void
> > clock_stat_add_delta(struct clock_stat *stat, uint64_t clock_delta)
> > {
> > @@ -416,6 +418,10 @@ fiber_call(struct fiber *callee)
> > /** By convention, these triggers must not throw. */
> > if (! rlist_empty(&caller->on_yield))
> > trigger_run(&caller->on_yield, NULL);
> > +
> > + if (cord_is_main())
> > + lua_on_yield();
>
> 2. Why not inside fiber_call_impl? I thought we need to call
> the abort on each coro_transfer().
<fiber_schedule_list> switches from the scheduler fiber to a ready one.
AFAICS, the scheduler doesn't enter the Lua world, so no abort is
necessary there.
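To illustrate the placement argument, here is a minimal self-contained
model in C. It is only a sketch: every name in it (model_fiber,
model_fiber_switch, model_sched_resume and so on) is hypothetical and
not part of Tarantool, and the cord_is_main() check from the patch is
left out. It shows the hook firing on the fiber_call()/fiber_yield()
style paths while the scheduler path goes straight to the low-level
switch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; these are not Tarantool's real types or functions. */
struct model_fiber {
	bool in_lua;	/* true if the fiber may have Lua frames on its stack */
};

static int hook_calls;	/* number of times the yield hook fired */
static int transfers;	/* total number of context switches */

/* Stand-in for lua_on_yield(); here it only counts invocations. */
void
model_lua_on_yield(void)
{
	hook_calls++;
}

/* Lowest-level switch shared by every path (models coro_transfer()). */
void
model_transfer(struct model_fiber *from, struct model_fiber *to)
{
	(void)from;
	(void)to;
	transfers++;
}

/* Models fiber_call()/fiber_yield(): the caller may leave Lua, so hook. */
void
model_fiber_switch(struct model_fiber *caller, struct model_fiber *callee)
{
	model_lua_on_yield();
	model_transfer(caller, callee);
}

/* Models the scheduler resuming a ready fiber: no Lua frames, no hook. */
void
model_sched_resume(struct model_fiber *sched, struct model_fiber *ready)
{
	assert(!sched->in_lua);
	model_transfer(sched, ready);
}
```

In this model, a worker yielding to the scheduler and then being resumed
by it produces two transfers but only one hook invocation.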
>
> > +
> > clock_set_on_csw(caller);
> > callee->caller = caller;
> > callee->flags |= FIBER_IS_READY;
> > @@ -645,6 +651,10 @@ fiber_yield(void)
> > /** By convention, these triggers must not throw. */
> > if (! rlist_empty(&caller->on_yield))
> > trigger_run(&caller->on_yield, NULL);
> > +
> > + if (cord_is_main())
> > + lua_on_yield();
> > +
> > clock_set_on_csw(caller);
> >
> > assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
> > diff --git a/src/lua/utils.c b/src/lua/utils.c
> > index af114b0a2..49e3c2bf0 100644
> > --- a/src/lua/utils.c
> > +++ b/src/lua/utils.c
> > @@ -1308,3 +1308,9 @@ tarantool_lua_utils_init(struct lua_State *L)
> > luaT_newthread_ref = luaL_ref(L, LUA_REGISTRYINDEX);
> > return 0;
> > }
> > +
> > +#include "lj_trace.h"
>
> 3. Why is the header included here, and not in the beginning?
>
> 4. It is worth adding a comment.
This is a draft; I'll address your comments in the v2 series.
>
> > +void lua_on_yield(void)
> > +{
> > + lj_trace_abort(G(tarantool_L));
I forgot to add a check for whether the yield occurs while a trace is
running. Here are the corresponding *draft* changes:
================================================================================
diff --git a/src/lua/utils.c b/src/lua/utils.c
index 49e3c2bf0..8d72abdb9 100644
--- a/src/lua/utils.c
+++ b/src/lua/utils.c
@@ -1310,7 +1310,18 @@ tarantool_lua_utils_init(struct lua_State *L)
 }
 #include "lj_trace.h"
+#include "lj_err.h"
 void lua_on_yield(void)
 {
-	lj_trace_abort(G(tarantool_L));
+	struct global_State *g = G(tarantool_L);
+	/* Forbid Lua world re-entrancy while running the trace */
+	if (unlikely(tvref(g->jit_base))) {
+		struct lua_State *L = fiber()->storage.lua.stack;
+		setstrV(L, L->top++, lj_err_str(L, LJ_ERR_JITCALL));
+		if (g->panic)
+#undef panic
+			g->panic(L);
+		exit(EXIT_FAILURE);
+	}
+	lj_trace_abort(g);
 }
================================================================================
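The decision made by the draft can be modeled without any LuaJIT
internals. The sketch below is an assumption-laden illustration, not
the patch itself: struct model_jit, enum yield_verdict and
model_on_yield() are hypothetical names, the jit_base field merely
mirrors the check on g->jit_base, and clearing the recording flag
stands in for lj_trace_abort(). Instead of calling the panic handler
and exiting, the model returns a verdict so the logic can be tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the JIT state the draft inspects. */
struct model_jit {
	const void *jit_base;	/* non-NULL while a compiled trace is running */
	bool recording;		/* true while a trace is being recorded */
};

enum yield_verdict {
	YIELD_OK,	/* recording (if any) aborted, the switch may proceed */
	YIELD_FATAL	/* yield from a running trace; the draft panics here */
};

/*
 * Models lua_on_yield(): refuse to switch fibers while a trace is
 * running, otherwise unconditionally abort trace recording.
 */
enum yield_verdict
model_on_yield(struct model_jit *jit)
{
	if (jit->jit_base != NULL)
		return YIELD_FATAL;
	jit->recording = false;
	return YIELD_OK;
}
```

Note the order: the running-trace check comes first, so the recording
state is left untouched on the fatal path.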
Here are benchmark results for the new implementation:
* Vanilla -> Patched [extern macro callback] (min, median, mean, max):
| fibers:    10; iters:    100    0%   1%   0%   0%
| fibers:    10; iters:   1000    2%  -2%  -3%  -5%
| fibers:    10; iters:  10000   -3%  -2%  -2%  -2%
| fibers:    10; iters: 100000   -1%  -1%  -2%  -4%
| fibers:   100; iters:    100    0%   0%  -2%  -9%
| fibers:   100; iters:   1000    0%   1%   1%   0%
| fibers:   100; iters:  10000    0%   0%   0%   0%
| fibers:   100; iters: 100000    0%   0%   0%  -1%
| fibers:  1000; iters:    100    0%   1%   0%  -3%
| fibers:  1000; iters:   1000    0%   1%   1%   4%
| fibers:  1000; iters:  10000    0%   0%   0%   2%
| fibers:  1000; iters: 100000    0%   0%   0%   1%
| fibers: 10000; iters:    100    0%   0%   0%   3%
| fibers: 10000; iters:   1000    0%   0%   0%   0%
| fibers: 10000; iters:  10000    0%   0%   0%   4%
| fibers: 10000; iters: 100000    0%   2%   1%   0%
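For reference, here is one way such a (min, median, mean, max)
comparison could be computed. This is a sketch under assumptions: the
exact formula and sign convention used for the tables are not stated
in this thread, so the code assumes each percentage is the relative
change (patched - vanilla) / vanilla of the corresponding statistic,
rounded to the nearest integer. The names bench_stats, bench_summarize
and bench_delta_percent are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Summary statistics over the samples of one benchmark configuration. */
struct bench_stats {
	double min, median, mean, max;
};

static int
cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a;
	double y = *(const double *)b;
	return (x > y) - (x < y);
}

/* Sorts the samples in place and returns min, median, mean and max. */
struct bench_stats
bench_summarize(double *samples, size_t n)
{
	struct bench_stats s;
	double sum = 0;
	assert(n > 0);
	qsort(samples, n, sizeof(*samples), cmp_double);
	for (size_t i = 0; i < n; i++)
		sum += samples[i];
	s.min = samples[0];
	s.max = samples[n - 1];
	s.mean = sum / (double)n;
	s.median = n % 2 != 0 ? samples[n / 2] :
		   (samples[n / 2 - 1] + samples[n / 2]) / 2;
	return s;
}

/* Assumed formula: relative change in percent, rounded to nearest. */
int
bench_delta_percent(double vanilla, double patched)
{
	double d = (patched - vanilla) / vanilla * 100.0;
	return (int)(d < 0 ? d - 0.5 : d + 0.5);
}
```

Each table cell would then be bench_delta_percent() applied to the same
statistic of the vanilla and the patched runs.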
> > +}
> >
> > ================================================================================
> >
> > * Vanilla -> Patched [extern macro callback] (min, median, mean, max):
> > | fibers:    10; iters:    100    1%   1%   0%   0%
> > | fibers:    10; iters:   1000    0%   4%   0%  -1%
> > | fibers:    10; iters:  10000    0%   5%   2%   6%
> > | fibers:    10; iters: 100000    0%   0%   0%   0%
> > | fibers:   100; iters:    100    0%  -4%  -3%  -6%
> > | fibers:   100; iters:   1000    0%   3%   1%   0%
> > | fibers:   100; iters:  10000    0%   0%   0%  -2%
> > | fibers:   100; iters: 100000    0%   1%   0%  -2%
> > | fibers:  1000; iters:    100    0%   0%   0%  -4%
> > | fibers:  1000; iters:   1000    0%   0%   0%  -1%
> > | fibers:  1000; iters:  10000    0%   0%   0%   0%
> > | fibers:  1000; iters: 100000    0%   0%   0%  -1%
> > | fibers: 10000; iters:    100   -1%   1%   1%   2%
> > | fibers: 10000; iters:   1000   -1%   0%   0%   2%
> > | fibers: 10000; iters:  10000    0%   0%   0%   0%
> > | fibers: 10000; iters: 100000    0%   0%   0%   0%
> >
> > There was also an alternative idea by Sergos: introduce a special
> > parameter to enable such a feature on demand.
>
> 5. I am not sure it is so necessary - from your bench it looks like the
> overhead is almost 0, not counting the rare noise of about +-1%.
I agree, but Sergos proposed this when the results were not so good.
--
Best regards,
IM