[Tarantool-patches] [PATCH] fiber: abort trace recording on fiber yield
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Sat Jul 11 23:28:31 MSK 2020
Hi! The branch seems to be extremely outdated. Could you
please rebase it?
On 08/07/2020 00:24, Igor Munkin wrote:
> Vlad,
>
> Thanks for your review!
>
> On 01.04.20, Vladislav Shpilevoy wrote:
>> Hi! Thanks for the patch!
>>
>> See 7 comments below.
>>
>>> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
>>> index 45bc03787..c79aa7c2b 100644
>>> --- a/src/lua/fiber.c
>>> +++ b/src/lua/fiber.c
>>> @@ -454,6 +455,25 @@ lua_fiber_run_f(MAYBE_UNUSED va_list ap)
>>> return result;
>>> }
>>>
>>> +static int
>>> +fiber_on_yield(struct trigger *trigger, void *event)
>>> +{
>>> + (void) trigger;
>>> + (void) event;
>>> +
>>> + /*
>>> + * XXX: According to LuaJIT API reference luaJIT_setmode
>>
>> 1. What is 'XXX'?
>
> AFAIK this is just a legacy keyword coming from Sun code convention[1]:
> | 10.5.4 Special Comments
> |
> | Use XXX in a comment to flag something that is bogus but works. Use
> | FIXME to flag something that is bogus and broken.
>
> I guess its meaning is kinda lost nowadays and often is considered close
> to FIXME now. Some people suggest to avoid XXX since it's ambiguous but:
> * I see it's already used in Tarantool sources:
> | imun at tunix : ~/tarantool (master=) $ grep -rnF 'XXX: ' src | wc -l |
> 10
> * This is the right case to use it: such <luaJIT_setmode> usage is bogus
> but works for the issue purposes.
> * This keyword is highlighted by the text editor (e.g. vim).
Nice, didn't know what it is. Will use it from now.
>> And most importantly, how does it affect perf? New trigger
>> is +1 virtual call on each yield of every Lua fiber and +1
>> execution of non-trival function luaJIT_setmode(). I think
>> it is better to write a micro bench checking how many yields
>> can we do per time unit before and after this patch. From Lua
>> fibers.
>
> I made some benchmarks[2] and here are the numbers for the bleeding
> master:
> * Vanilla (mean, seconds):
> | Total runs: #15
> | #1700 bench: fibers: 10; iterations: 100 => 0.0015582
> | #1700 bench: fibers: 10; iterations: 1000 => 0.004238
> | #1700 bench: fibers: 10; iterations: 10000 => 0.031612
> | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
> | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
> | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
> | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
> | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
> | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
> | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
> | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
> | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
> | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
> | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
> | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
> | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
> * Patched [Lua fiber trigger] (mean, seconds):
> | Total runs: #15
> | #1700 bench: fibers: 10; iterations: 100 => 0.00157866666666667
> | #1700 bench: fibers: 10; iterations: 1000 => 0.004395
> | #1700 bench: fibers: 10; iterations: 10000 => 0.0328212666666667
> | #1700 bench: fibers: 10; iterations: 100000 => 0.317387866666667
> | #1700 bench: fibers: 100; iterations: 100 => 0.00404193333333333
> | #1700 bench: fibers: 100; iterations: 1000 => 0.0284888
> | #1700 bench: fibers: 100; iterations: 10000 => 0.270978066666667
> | #1700 bench: fibers: 100; iterations: 100000 => 2.70369026666667
> | #1700 bench: fibers: 1000; iterations: 100 => 0.0304198666666667
> | #1700 bench: fibers: 1000; iterations: 1000 => 0.283783733333333
> | #1700 bench: fibers: 1000; iterations: 10000 => 2.8139128
> | #1700 bench: fibers: 1000; iterations: 100000 => 28.1274792666667
> | #1700 bench: fibers: 10000; iterations: 100 => 0.6049388
> | #1700 bench: fibers: 10000; iterations: 1000 => 5.9653538
> | #1700 bench: fibers: 10000; iterations: 10000 => 59.6273188
> | #1700 bench: fibers: 10000; iterations: 100000 => 596.194249466667
> * Relative measurements (vanilla -> patched):
> | #1700 bench: fibers: 10; iterations: 100 => 1.31%
> | #1700 bench: fibers: 10; iterations: 1000 => 3.7%
> | #1700 bench: fibers: 10; iterations: 10000 => 3.82%
> | #1700 bench: fibers: 10; iterations: 100000 => 5.24%
> | #1700 bench: fibers: 100; iterations: 100 => 2.99%
> | #1700 bench: fibers: 100; iterations: 1000 => 5.19%
> | #1700 bench: fibers: 100; iterations: 10000 => 5%
> | #1700 bench: fibers: 100; iterations: 100000 => 5.24%
> | #1700 bench: fibers: 1000; iterations: 100 => 6.06%
> | #1700 bench: fibers: 1000; iterations: 1000 => 6.38%
> | #1700 bench: fibers: 1000; iterations: 10000 => 6.94%
> | #1700 bench: fibers: 1000; iterations: 100000 => 6.48%
> | #1700 bench: fibers: 10000; iterations: 100 => 0.29%
> | #1700 bench: fibers: 10000; iterations: 1000 => 0.03%
> | #1700 bench: fibers: 10000; iterations: 10000 => -0.24%
> | #1700 bench: fibers: 10000; iterations: 100000 => 0.73%
So it easily can be -6%. That is disappointing. We just recently
discussed how we should monitor perf, and here it is.
>> 5. Also there is a question what will happen, if Lua is accessed from
>> a fiber, created not in Lua? That is basically all pooled fibers,
>> which serve IProto requests. See struct fiber_pool and its single
>> usage place. These fibers are started from C, and may end up executing
>> Lua code, if, for example, an IPROTO_CALL arrived.
>>
>> The same question can be applied to fibers, created from C API:
>>
>> API_EXPORT struct fiber *
>> fiber_new(const char *name, fiber_func f);
>>
>> These fibers won't have the on_yield trigger, but can access Lua.
>> For example, they can run Lua on_replace triggers in a space by making
>> an insertion into it. Also they can access Lua API directly. By
>> calling things like luaT_call() (see module.h).
>
> Well, strictly saying this means we need such trigger on each
> created/pooled fiber, right?
Yes. On each fiber ever created in the main thread. Even applier fiber
can start executing on_replace triggers on a space in Lua.
>> You may consider adding Jit collection/tracing termination to the
>> scheduler fiber. It is just one fiber, and from what I know, every
>> fiber after yield switches current thread to the scheduler fiber. This
>> also maybe less expensive. We probably won't even notice.
>
> I've tried to implement such approach but it doesn't work. As you
> mentioned on_yield triggers are not executed for scheduler fiber.
> Nevertheless, here is the patch (I hope I've not missed something):
>
> ================================================================================
>
> diff --git a/src/lua/init.c b/src/lua/init.c
> index a0b2fc775..33ddb894d 100644
> --- a/src/lua/init.c
> +++ b/src/lua/init.c
> @@ -79,6 +79,7 @@ struct ibuf *tarantool_lua_ibuf = &tarantool_lua_ibuf_body;
> */
> struct fiber *script_fiber;
> bool start_loop = true;
> +static struct trigger lua_on_yield_trigger;
>
> /* contents of src/lua/ files */
> extern char strict_lua[],
> @@ -432,9 +433,22 @@ luaopen_tarantool(lua_State *L)
> return 1;
> }
>
> +static int
> +tarantool_lua_on_yield(struct trigger *trigger, void *event)
> +{
> + (void) trigger;
> + (void) event;
> + int status = luaJIT_setmode(tarantool_L, 0, -1);
> + assert(status == 0);
> + return 0;
> +}
> +
> void
> tarantool_lua_init(const char *tarantool_bin, int argc, char **argv)
> {
> + trigger_create(&lua_on_yield_trigger, tarantool_lua_on_yield, NULL, NULL);
> + trigger_add(&cord()->sched.on_yield, &lua_on_yield_trigger);
> +
> lua_State *L = luaL_newstate();
> if (L == NULL) {
> panic("failed to initialize Lua");
>
> ================================================================================
I never said sched fiber calls on_yield triggers. Because indeed, they could never
be installed from anywhere, and there was no need to call them. But I didn't know
that either.
Anyway, I am not sure anymore that fibers return to scheduler on each yield. I see
that coro_transfer() in fiber_yield() gives control to some other fiber, which
can be either next in this loop iteration, or the scheduler if it is the end of
the event loop iteration. So it won't work.
> However, I've implemented your idea in another way: added an auxiliary
> function field to cord structure to be called on <fiber_yield> or
> <fiber_call>:
> > And here are the results:
> * Vanilla (mean, seconds):
> | Total runs: #15
> | #1700 bench: fibers: 10; iterations: 100 => 0.0015582
> | #1700 bench: fibers: 10; iterations: 1000 => 0.004238
> | #1700 bench: fibers: 10; iterations: 10000 => 0.031612
> | #1700 bench: fibers: 10; iterations: 100000 => 0.301573066666667
> | #1700 bench: fibers: 100; iterations: 100 => 0.00392426666666667
> | #1700 bench: fibers: 100; iterations: 1000 => 0.0270816
> | #1700 bench: fibers: 100; iterations: 10000 => 0.258050666666667
> | #1700 bench: fibers: 100; iterations: 100000 => 2.56898493333333
> | #1700 bench: fibers: 1000; iterations: 100 => 0.0286791333333333
> | #1700 bench: fibers: 1000; iterations: 1000 => 0.266762866666667
> | #1700 bench: fibers: 1000; iterations: 10000 => 2.63106633333333
> | #1700 bench: fibers: 1000; iterations: 100000 => 26.4156422666667
> | #1700 bench: fibers: 10000; iterations: 100 => 0.603156666666667
> | #1700 bench: fibers: 10000; iterations: 1000 => 5.9630148
> | #1700 bench: fibers: 10000; iterations: 10000 => 59.7718396666667
> | #1700 bench: fibers: 10000; iterations: 100000 => 591.859124866667
> * Patched [cord callback] (mean, seconds):
> | Total runs: #15
> | #1700 bench: fibers: 10; iterations: 100 => 0.0015236
> | #1700 bench: fibers: 10; iterations: 1000 => 0.00450766666666667
> | #1700 bench: fibers: 10; iterations: 10000 => 0.0341899333333333
> | #1700 bench: fibers: 10; iterations: 100000 => 0.329420333333333
> | #1700 bench: fibers: 100; iterations: 100 => 0.00403553333333333
> | #1700 bench: fibers: 100; iterations: 1000 => 0.0291887333333333
> | #1700 bench: fibers: 100; iterations: 10000 => 0.2776654
> | #1700 bench: fibers: 100; iterations: 100000 => 2.76978893333333
> | #1700 bench: fibers: 1000; iterations: 100 => 0.0309739333333333
> | #1700 bench: fibers: 1000; iterations: 1000 => 0.289414733333333
> | #1700 bench: fibers: 1000; iterations: 10000 => 2.8631376
> | #1700 bench: fibers: 1000; iterations: 100000 => 28.5713692666667
> | #1700 bench: fibers: 10000; iterations: 100 => 0.628503733333333
> | #1700 bench: fibers: 10000; iterations: 1000 => 6.20881053333333
> | #1700 bench: fibers: 10000; iterations: 10000 => 62.0973958
> | #1700 bench: fibers: 10000; iterations: 100000 => 619.493991933333
> * Relative measurements (vanilla -> patched):
> | #1700 bench: fibers: 10; iterations: 100 => -2.22%
> | #1700 bench: fibers: 10; iterations: 1000 => 6.36%
> | #1700 bench: fibers: 10; iterations: 10000 => 8.15%
> | #1700 bench: fibers: 10; iterations: 100000 => 9.23%
> | #1700 bench: fibers: 100; iterations: 100 => 2.83%
> | #1700 bench: fibers: 100; iterations: 1000 => 7.78%
> | #1700 bench: fibers: 100; iterations: 10000 => 7.6%
> | #1700 bench: fibers: 100; iterations: 100000 => 7.81%
> | #1700 bench: fibers: 1000; iterations: 100 => 8%
> | #1700 bench: fibers: 1000; iterations: 1000 => 8.49%
> | #1700 bench: fibers: 1000; iterations: 10000 => 8.82%
> | #1700 bench: fibers: 1000; iterations: 100000 => 8.16%
> | #1700 bench: fibers: 10000; iterations: 100 => 4.2%
> | #1700 bench: fibers: 10000; iterations: 1000 => 4.12%
> | #1700 bench: fibers: 10000; iterations: 10000 => 3.89%
> | #1700 bench: fibers: 10000; iterations: 100000 => 4.66%
>
> Looks like this way is slower than the one implemented via triggers.
I don't understand why. This is very suspicious. It does not look right.
Number of calls is the same. Also you removed one lookup in the rlist of
on_yield triggers. Perhaps you did the bench in a different build mode?
Or did you run something heavy in background, browser with music/video
maybe?
Also the reason may be that when you installed the trigger, it worked
for Lua fibers only. But when you did it correct, for all fibers, we got
the real perf degradation value. Try to create a global int64 counter
and increment it on each call. Then compare its final value in the previous
solution with on_yield triggers vs the current solution. It would allow to
tell whether this new function is called more often in the latest solution.
Try to make the function extern instead of virtual. May look ugly,
but -9% worst degradation is uglier. I hope the extern hack will help.
More information about the Tarantool-patches
mailing list