From: Serge Petrenko <sergepetrenko@tarantool.org> To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Cc: tarantool-patches@freelists.org, tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [tarantool-patches] [PATCH v2 2/2] lua: add fiber.top() listing fiber cpu consumption Date: Mon, 28 Oct 2019 19:16:27 +0300 [thread overview] Message-ID: <0E4E89B7-CA95-48D2-A719-F19DF6541CD0@tarantool.org> (raw) In-Reply-To: <4d729ef4-6b21-fa22-b379-dd670ac6fcc5@tarantool.org> Hi! Thanks for your answer! I’ve updated the branch, please take a look at the incremental diff below. > 26 окт. 2019 г., в 21:00, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> написал(а): > > Hi! Thanks for the fixes! > >>>> 115: >>>> cpu average (%): 5.4571279061075 >>>> cpu instant (%): 5.9653973440576 >>>> 120: >>>> cpu average (%): 21.944382148464 >>>> cpu instant (%): 23.849021825646 >>>> 116: >>>> cpu average (%): 8.6603872318158 >>>> cpu instant (%): 9.6812031335093 >>>> 119: >>>> cpu average (%): 21.933168871944 >>>> cpu instant (%): 20.007540530351 >>>> cpu misses: 0 >>>> 118: >>>> cpu average (%): 19.342901995963 >>>> cpu instant (%): 16.932679820703 >>>> 117: >>>> cpu average (%): 11.549674814981 >>>> cpu instant (%): 13.128901177161 >>>> ... >>>> ``` >>> >>> 2. This is super cool! >>> >>> Could we give names to predefined fibers? For >>> example, make an alias top.scheduler = top[1]. >>> >>> So as in the console output I would see: >>> >>> --- >>> - scheduler: >>> cpu average (%): 98.230148883023 >>> cpu instant (%): 100 >>> >>> Instead of '1'. Because personally I didn't >>> even know that 1 is always the scheduler, and >>> I would be confused to see a regular fiber >>> consumed 99% of time in a console. >> >> Hmm, what about this? >> >> --- a/src/lua/fiber.c >> +++ b/src/lua/fiber.c >> @@ -325,13 +325,18 @@ lbox_fiber_top_entry(struct fiber *f, void *cb_ctx) >> { >> struct lua_State *L = (struct lua_State *) cb_ctx; >> >> - lua_pushinteger(L, f->fid); >> + /* Index system fibers by name instead of id */ >> + if (f->fid <= FIBER_ID_MAX_RESERVED) { >> + lua_pushstring(L, f->name); >> + } else { >> + lua_pushinteger(L, f->fid); >> + } >> lua_newtable(L); >> >> > > What about always using a name, when it is not empty? When a > system is running, it might be hard to find to which fiber > an ID belongs. Or 'name:id' like in logs (or whatever format > we use in the logs). > > For example I started a vshard recovery fiber, it somewhy > consumes 99% CPU, but I don't know its ID. And I would need > to use vshard.storage.internal.recovery_fiber:id() to find it. > > Up to you, and probably to Kirill since this is public API. > Let’s stick with `id/name` just like in the logs then. >>>> #include "memory.h" >>>> #include "trigger.h" >>>> @@ -82,6 +85,34 @@ static int (*fiber_invoke)(fiber_func f, va_list ap); >>>> err;\ >>>> }) >>>> >>>> +#if ENABLE_FIBER_TOP >>>> +/** >>>> + * An action performed each time a context switch happens. >>>> + * Used to count each fiber's processing time. >>>> + * This is a macro rather than a function, since it is used >>>> + * in scheduler too. >>> >>> 4. Hm. Not sure if I understand. Ok, it is used in the scheduler, >>> so why does it prevent making it a function? Caller is always >>> struct fiber *, so you can make it a function taking fiber * >>> parameter, can't you? >> >> You cannot call functions from scheduler cos it doesn’t have a >> stack AFAIU. I can make it an inline function if you want me to. >> > > Does not have a stack? Then how does it work at all? You use > stack in your macros - you declare a couple of variables. Also > intermediate arithmetic results might be saved on the stack. > Yes, you’re right, thanks. It was a strange belief of mine. > Talking of an 'inline' function - if that will work, then it > would look better IMO. At least it won't raise questions why > it is not a function. Will do then. > >>> 5. Am I right, that we need to update CPU consumption each >>> time a context switch happens? >> >> Yep. >> >>> If so, then by logic you >>> would need to call clock_set_on_csw() on each fiber->csw++, >>> correct? >> >> Right. I also call clock_set_on_csw() on each event loop iteration >> end. >> >>> >>> Currently that happens in 2 places: fiber_call_impl() and >>> fiber_yield(). In both these places you already call >>> clock_set_on_csw(): >>> >>> - You always call clock_set_on_csw() right before >>> fiber_call_impl(); >>> >>> - You call clock_set_on_csw() in fiber_yield(). >>> >>> I would then move fiber->cws++ logic into your macros too. >>> So all the context switch and CPU time tracking logic >>> would be in one place. >>> >>> The only problem is that your macros takes caller, but csw++ >>> is done for callee. Probably we could increment it for the >>> caller instead, I don't anything depends on that. >>> >>> Also it would lead to increment of the scheduler's csw >>> counter. Not sure whether it is ok. >> >> Loop iteration end isn’t a context switch, but I still update >> clocks here. > > Yes, you update, but not via clock_set_on_csw(). Am I wrong? > The only fiber which gets updated in the end of the loop via > clock_set_on_csw() is the scheduler. For other fibers you > manually update clock_acc, clock_delta_last, clock_delta in > loop_on_iteration_end(). Yep, I was talking about the sched. > >> I don’t want to increment csw here as a side >> effect, so maybe leave csw increment as it is? >> > > If you are talking about the scheduler's csw counter, then > I don't think it matters. 'Csw' is just statistics. It is > not used for anything. Besides, why a switch to the scheduler > does not count? End of the loop = switch to the scheduler, so > it means its csw should be incremented, shouldn't it? Ok > >>>> ev_idle_init(&cord->idle_event, fiber_schedule_idle); >>>> + >>>> +#if ENABLE_FIBER_TOP >>>> +/* fiber.top <http://fiber.top>() currently works only for the main thread. */ >>>> +if (cord_is_main()) { >>>> +fiber_top_init(); >>>> +fiber_top_enable(); >>> >>> 7. I think, we need to enable top by default only >>> when we are sure that it does not affect performance. >> >> Ok, disabled it by default. User can turn it on with >> fiber.top_enable(). This also resolved the issue I had >> with your swim test. >> >> Looking at perf difference, maybe we can turn it on by default? > > IMO, 10-16% looks quite significant taking into account, > that 99% of users won't use it at all, and others only for > debug of rare exceptional cases. For which they can call > enable(), get top(), and disable() back. I think, we should > ask Kirill. I am against default on. Ok diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index e3fe77e21..4c646b278 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -87,31 +87,32 @@ static int (*fiber_invoke)(fiber_func f, va_list ap); }) #if ENABLE_FIBER_TOP +static bool fiber_top_enabled = false; + /** * An action performed each time a context switch happens. * Used to count each fiber's processing time. - * This is a macro rather than a function, since it is used - * in scheduler too. */ -#define clock_set_on_csw(caller) \ -({ \ - if (fiber_top_enabled) { \ - uint64_t clock; \ - uint32_t cpu_id; \ - clock = __rdtscp(&cpu_id); \ - \ - if (cpu_id == cord()->cpu_id_last) { \ - (caller)->clock_delta += clock - cord()->clock_last; \ - cord()->clock_delta += clock - cord()->clock_last; \ - } else { \ - cord()->cpu_id_last = cpu_id; \ - cord()->cpu_miss_count++; \ - } \ - cord()->clock_last = clock; \ - } \ -}) +static inline void +clock_set_on_csw(struct fiber *caller) +{ + caller->csw++; + if (!fiber_top_enabled) + return; -static bool fiber_top_enabled = false; + uint64_t clock; + uint32_t cpu_id; + clock = __rdtscp(&cpu_id); + + if (cpu_id == cord()->cpu_id_last) { + caller->clock_delta += clock - cord()->clock_last; + cord()->clock_delta += clock - cord()->clock_last; + } else { + cord()->cpu_id_last = cpu_id; + cord()->cpu_miss_count++; + } + cord()->clock_last = clock; +} #else #define clock_set_on_csw(caller) ; @@ -262,7 +263,6 @@ fiber_call_impl(struct fiber *callee) cord->fiber = callee; callee->flags &= ~FIBER_IS_READY; - callee->csw++; ASAN_START_SWITCH_FIBER(asan_state, 1, callee->stack, callee->stack_size); @@ -515,7 +515,6 @@ fiber_yield(void) assert(callee->flags & FIBER_IS_READY || callee == &cord->sched); assert(! (callee->flags & FIBER_IS_DEAD)); cord->fiber = callee; - callee->csw++; callee->flags &= ~FIBER_IS_READY; ASAN_START_SWITCH_FIBER(asan_state, (caller->flags & FIBER_IS_DEAD) == 0, diff --git a/src/lua/fiber.c b/src/lua/fiber.c index c38bd886f..a30290bfa 100644 --- a/src/lua/fiber.c +++ b/src/lua/fiber.c @@ -324,13 +324,11 @@ static int lbox_fiber_top_entry(struct fiber *f, void *cb_ctx) { struct lua_State *L = (struct lua_State *) cb_ctx; + char name_buf[64]; + + snprintf(name_buf, sizeof(name_buf), "%u/%s", f->fid, f->name); + lua_pushstring(L, name_buf); - /* Index system fibers by name instead of id */ - if (f->fid <= FIBER_ID_MAX_RESERVED) { - lua_pushstring(L, f->name); - } else { - lua_pushinteger(L, f->fid); - } lua_newtable(L); lua_pushliteral(L, "average"); diff --git a/test/app/fiber.result b/test/app/fiber.result index e45cd7639..f9904d465 100644 --- a/test/app/fiber.result +++ b/test/app/fiber.result @@ -1478,7 +1478,7 @@ type(a) ... -- scheduler is present in fiber.top() -- and is indexed by name -a["sched"] ~= nil +a["1/sched"] ~= nil --- - true ... @@ -1518,7 +1518,11 @@ sum_avg > 99 and sum_avg < 101 or sum_avg tbl = nil --- ... -f = fiber.new(function() for i = 1,1000 do end fiber.yield() tbl = fiber.top()[fiber.self().id()] end) +f = fiber.new(function()\ + for i = 1,1000 do end\ + fiber.yield()\ + tbl = fiber.top()[fiber.self().id()..'/'..fiber.self().name()]\ +end) --- ... while f:status() ~= 'dead' do fiber.sleep(0.01) end diff --git a/test/app/fiber.test.lua b/test/app/fiber.test.lua index e30aba77e..7f320e318 100644 --- a/test/app/fiber.test.lua +++ b/test/app/fiber.test.lua @@ -638,7 +638,7 @@ a = fiber.top() type(a) -- scheduler is present in fiber.top() -- and is indexed by name -a["sched"] ~= nil +a["1/sched"] ~= nil type(a["cpu misses"]) == 'number' sum_inst = 0 sum_avg = 0 @@ -658,7 +658,11 @@ sum_inst -- not exact due to accumulated integer division errors sum_avg > 99 and sum_avg < 101 or sum_avg tbl = nil -f = fiber.new(function() for i = 1,1000 do end fiber.yield() tbl = fiber.top()[fiber.self().id()] end) +f = fiber.new(function()\ + for i = 1,1000 do end\ + fiber.yield()\ + tbl = fiber.top()[fiber.self().id()..'/'..fiber.self().name()]\ +end) while f:status() ~= 'dead' do fiber.sleep(0.01) end tbl["average"] > 0 tbl["instant"] > 0
next prev parent reply other threads:[~2019-10-28 16:16 UTC|newest] Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top [not found] <cover.1570546695.git.sergepetrenko@tarantool.org> [not found] ` <eb924da60bc66cbd93d7920f53d6c332d15ffab8.1570546695.git.sergepetrenko@tarantool.org> 2019-10-22 21:18 ` Vladislav Shpilevoy 2019-10-25 15:13 ` Serge Petrenko 2019-10-25 16:19 ` Serge Petrenko 2019-10-26 18:00 ` Vladislav Shpilevoy 2019-10-28 16:16 ` Serge Petrenko [this message] 2019-10-28 23:00 ` Vladislav Shpilevoy 2019-11-01 14:09 ` Serge Petrenko [not found] ` <52e7822bbcd802528d448c15ce9e9fbe4479c73a.1570546695.git.sergepetrenko@tarantool.org> 2019-10-22 21:20 ` [Tarantool-patches] [tarantool-patches] [PATCH v2 1/2] test: modify swim_run_test to break event loop Vladislav Shpilevoy 2019-10-25 15:15 ` Serge Petrenko
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=0E4E89B7-CA95-48D2-A719-F19DF6541CD0@tarantool.org \ --to=sergepetrenko@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --cc=tarantool-patches@freelists.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [tarantool-patches] [PATCH v2 2/2] lua: add fiber.top() listing fiber cpu consumption' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox