From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtpng2.m.smailru.net (smtpng2.m.smailru.net [94.100.179.3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 26CBC43D679 for ; Sat, 26 Oct 2019 20:55:09 +0300 (MSK) References: <5b3c1fae-1c49-4b13-3a0e-0774c4add3fb@tarantool.org> From: Vladislav Shpilevoy Message-ID: <4d729ef4-6b21-fa22-b379-dd670ac6fcc5@tarantool.org> Date: Sat, 26 Oct 2019 20:00:32 +0200 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [Tarantool-patches] [tarantool-patches] [PATCH v2 2/2] lua: add fiber.top() listing fiber cpu consumption List-Id: Tarantool development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Serge Petrenko Cc: tarantool-patches@freelists.org, tarantool-patches@dev.tarantool.org Hi! Thanks for the fixes! >>>      115: >>>        cpu average (%): 5.4571279061075 >>>        cpu instant (%): 5.9653973440576 >>>      120: >>>        cpu average (%): 21.944382148464 >>>        cpu instant (%): 23.849021825646 >>>      116: >>>        cpu average (%): 8.6603872318158 >>>        cpu instant (%): 9.6812031335093 >>>      119: >>>        cpu average (%): 21.933168871944 >>>        cpu instant (%): 20.007540530351 >>>      cpu misses: 0 >>>      118: >>>        cpu average (%): 19.342901995963 >>>        cpu instant (%): 16.932679820703 >>>      117: >>>        cpu average (%): 11.549674814981 >>>        cpu instant (%): 13.128901177161 >>>    ... >>>    ``` >> >> 2. This is super cool! >> >> Could we give names to predefined fibers? For >> example, make an alias top.scheduler = top[1]. >> >> So as in the console output I would see: >> >>    --- >>    - scheduler: >>        cpu average (%): 98.230148883023 >>        cpu instant (%): 100 >> >> Instead of '1'. Because personally I didn't >> even know that 1 is always the scheduler, and >> I would be confused to see a regular fiber >> consumed 99% of time in a console. > > Hmm, what about this? > > --- a/src/lua/fiber.c > +++ b/src/lua/fiber.c > @@ -325,13 +325,18 @@ lbox_fiber_top_entry(struct fiber *f, void *cb_ctx) >  { >         struct lua_State *L = (struct lua_State *) cb_ctx; >   > -       lua_pushinteger(L, f->fid); > +       /* Index system fibers by name instead of id */ > +       if (f->fid <= FIBER_ID_MAX_RESERVED) { > +               lua_pushstring(L, f->name); > +       } else { > +               lua_pushinteger(L, f->fid); > +       } >         lua_newtable(L); > > What about always using a name, when it is not empty? When a system is running, it might be hard to find to which fiber an ID belongs. Or 'name:id' like in logs (or whatever format we use in the logs). For example I started a vshard recovery fiber, it somewhy consumes 99% CPU, but I don't know its ID. And I would need to use vshard.storage.internal.recovery_fiber:id() to find it. Up to you, and probably to Kirill since this is public API. >>> #include "memory.h" >>> #include "trigger.h" >>> @@ -82,6 +85,34 @@ static int (*fiber_invoke)(fiber_func f, va_list ap); >>> err;\ >>> }) >>> >>> +#if ENABLE_FIBER_TOP >>> +/** >>> + * An action performed each time a context switch happens. >>> + * Used to count each fiber's processing time. >>> + * This is a macro rather than a function, since it is used >>> + * in scheduler too. >> >> 4. Hm. Not sure if I understand. Ok, it is used in the scheduler, >> so why does it prevent making it a function? Caller is always >> struct fiber *, so you can make it a function taking fiber * >> parameter, can't you? > > You cannot call functions from scheduler cos it doesn’t have a > stack AFAIU. I can make it an inline function if you want me to. > Does not have a stack? Then how does it work at all? You use stack in your macros - you declare a couple of variables. Also intermediate arithmetic results might be saved on the stack. Talking of an 'inline' function - if that will work, then it would look better IMO. At least it won't raise questions why it is not a function. >> 5. Am I right, that we need to update CPU consumption each >> time a context switch happens? > > Yep. > >> If so, then by logic you >> would need to call clock_set_on_csw() on each fiber->csw++, >> correct? > > Right. I also call clock_set_on_csw() on each event loop iteration > end. > >> >> Currently that happens in 2 places: fiber_call_impl() and >> fiber_yield(). In both these places you already call >> clock_set_on_csw(): >> >>    - You always call clock_set_on_csw() right before >>      fiber_call_impl(); >> >>    - You call clock_set_on_csw() in fiber_yield(). >> >> I would then move fiber->cws++ logic into your macros too. >> So all the context switch and CPU time tracking logic >> would be in one place. >> >> The only problem is that your macros takes caller, but csw++ >> is done for callee. Probably we could increment it for the >> caller instead, I don't anything depends on that. >> >> Also it would lead to increment of the scheduler's csw >> counter. Not sure whether it is ok. > > Loop iteration end isn’t a context switch, but I still update > clocks here. Yes, you update, but not via clock_set_on_csw(). Am I wrong? The only fiber which gets updated in the end of the loop via clock_set_on_csw() is the scheduler. For other fibers you manually update clock_acc, clock_delta_last, clock_delta in loop_on_iteration_end(). > I don’t want to increment csw here as a side > effect, so maybe leave csw increment as it is? > If you are talking about the scheduler's csw counter, then I don't think it matters. 'Csw' is just statistics. It is not used for anything. Besides, why a switch to the scheduler does not count? End of the loop = switch to the scheduler, so it means its csw should be incremented, shouldn't it? >>> ev_idle_init(&cord->idle_event, fiber_schedule_idle); >>> + >>> +#if ENABLE_FIBER_TOP >>> +/* fiber.top () currently works only for the main thread. */ >>> +if (cord_is_main()) { >>> +fiber_top_init(); >>> +fiber_top_enable(); >> >> 7. I think, we need to enable top by default only >> when we are sure that it does not affect performance. > > Ok, disabled it by default. User can turn it on with > fiber.top_enable(). This also resolved the issue I had > with your swim test. > > Looking at perf difference, maybe we can turn it on by default? IMO, 10-16% looks quite significant taking into account, that 99% of users won't use it at all, and others only for debug of rare exceptional cases. For which they can call enable(), get top(), and disable() back. I think, we should ask Kirill. I am against default on.