[Tarantool-patches] [tarantool-patches] [PATCH v2 2/2] lua: add fiber.top() listing fiber cpu consumption
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Sat Oct 26 21:00:32 MSK 2019
Hi! Thanks for the fixes!
>>> 115:
>>> cpu average (%): 5.4571279061075
>>> cpu instant (%): 5.9653973440576
>>> 120:
>>> cpu average (%): 21.944382148464
>>> cpu instant (%): 23.849021825646
>>> 116:
>>> cpu average (%): 8.6603872318158
>>> cpu instant (%): 9.6812031335093
>>> 119:
>>> cpu average (%): 21.933168871944
>>> cpu instant (%): 20.007540530351
>>> cpu misses: 0
>>> 118:
>>> cpu average (%): 19.342901995963
>>> cpu instant (%): 16.932679820703
>>> 117:
>>> cpu average (%): 11.549674814981
>>> cpu instant (%): 13.128901177161
>>> ...
>>> ```
>>
>> 2. This is super cool!
>>
>> Could we give names to predefined fibers? For
>> example, make an alias top.scheduler = top[1].
>>
>> So as in the console output I would see:
>>
>> ---
>> - scheduler:
>> cpu average (%): 98.230148883023
>> cpu instant (%): 100
>>
>> Instead of '1'. Because personally I didn't
>> even know that 1 is always the scheduler, and
>> I would be confused to see a regular fiber
>> consumed 99% of time in a console.
>
> Hmm, what about this?
>
> --- a/src/lua/fiber.c
> +++ b/src/lua/fiber.c
> @@ -325,13 +325,18 @@ lbox_fiber_top_entry(struct fiber *f, void *cb_ctx)
> {
> struct lua_State *L = (struct lua_State *) cb_ctx;
>
> - lua_pushinteger(L, f->fid);
> + /* Index system fibers by name instead of id */
> + if (f->fid <= FIBER_ID_MAX_RESERVED) {
> + lua_pushstring(L, f->name);
> + } else {
> + lua_pushinteger(L, f->fid);
> + }
> lua_newtable(L);
>
>
What about always using a name, when it is not empty? When a
system is running, it might be hard to find to which fiber
an ID belongs. Or 'name:id' like in logs (or whatever format
we use in the logs).
For example I started a vshard recovery fiber, it somewhy
consumes 99% CPU, but I don't know its ID. And I would need
to use vshard.storage.internal.recovery_fiber:id() to find it.
Up to you, and probably to Kirill since this is public API.
>>> #include "memory.h"
>>> #include "trigger.h"
>>> @@ -82,6 +85,34 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
>>> err;\
>>> })
>>>
>>> +#if ENABLE_FIBER_TOP
>>> +/**
>>> + * An action performed each time a context switch happens.
>>> + * Used to count each fiber's processing time.
>>> + * This is a macro rather than a function, since it is used
>>> + * in scheduler too.
>>
>> 4. Hm. Not sure if I understand. Ok, it is used in the scheduler,
>> so why does it prevent making it a function? Caller is always
>> struct fiber *, so you can make it a function taking fiber *
>> parameter, can't you?
>
> You cannot call functions from scheduler cos it doesn’t have a
> stack AFAIU. I can make it an inline function if you want me to.
>
Does not have a stack? Then how does it work at all? You use
stack in your macros - you declare a couple of variables. Also
intermediate arithmetic results might be saved on the stack.
Talking of an 'inline' function - if that will work, then it
would look better IMO. At least it won't raise questions why
it is not a function.
>> 5. Am I right, that we need to update CPU consumption each
>> time a context switch happens?
>
> Yep.
>
>> If so, then by logic you
>> would need to call clock_set_on_csw() on each fiber->csw++,
>> correct?
>
> Right. I also call clock_set_on_csw() on each event loop iteration
> end.
>
>>
>> Currently that happens in 2 places: fiber_call_impl() and
>> fiber_yield(). In both these places you already call
>> clock_set_on_csw():
>>
>> - You always call clock_set_on_csw() right before
>> fiber_call_impl();
>>
>> - You call clock_set_on_csw() in fiber_yield().
>>
>> I would then move fiber->cws++ logic into your macros too.
>> So all the context switch and CPU time tracking logic
>> would be in one place.
>>
>> The only problem is that your macros takes caller, but csw++
>> is done for callee. Probably we could increment it for the
>> caller instead, I don't anything depends on that.
>>
>> Also it would lead to increment of the scheduler's csw
>> counter. Not sure whether it is ok.
>
> Loop iteration end isn’t a context switch, but I still update
> clocks here.
Yes, you update, but not via clock_set_on_csw(). Am I wrong?
The only fiber which gets updated in the end of the loop via
clock_set_on_csw() is the scheduler. For other fibers you
manually update clock_acc, clock_delta_last, clock_delta in
loop_on_iteration_end().
> I don’t want to increment csw here as a side
> effect, so maybe leave csw increment as it is?
>
If you are talking about the scheduler's csw counter, then
I don't think it matters. 'Csw' is just statistics. It is
not used for anything. Besides, why a switch to the scheduler
does not count? End of the loop = switch to the scheduler, so
it means its csw should be incremented, shouldn't it?
>>> ev_idle_init(&cord->idle_event, fiber_schedule_idle);
>>> +
>>> +#if ENABLE_FIBER_TOP
>>> +/* fiber.top <http://fiber.top>() currently works only for the main thread. */
>>> +if (cord_is_main()) {
>>> +fiber_top_init();
>>> +fiber_top_enable();
>>
>> 7. I think, we need to enable top by default only
>> when we are sure that it does not affect performance.
>
> Ok, disabled it by default. User can turn it on with
> fiber.top_enable(). This also resolved the issue I had
> with your swim test.
>
> Looking at perf difference, maybe we can turn it on by default?
IMO, 10-16% looks quite significant taking into account,
that 99% of users won't use it at all, and others only for
debug of rare exceptional cases. For which they can call
enable(), get top(), and disable() back. I think, we should
ask Kirill. I am against default on.
More information about the Tarantool-patches
mailing list