Hi! Thank you for review! I addressed your comments and answered inline. Incremental diff is below. > 23 окт. 2019 г., в 0:18, Vladislav Shpilevoy написал(а): > > Thanks for the patch! > > Sorry, the version in the email is a bit outdated. I > pasted below the actual version from the branch. > > See 9 comments below. > >> lua: add fiber.top() listing fiber cpu consumption >> >> Implement a new function in Lua fiber library: top(). It returns a table >> of alive fibers (including the scheduler). Each table entry has two >> fields: average cpu consumption, which is calculated with exponential >> moving average over event loop iterations, and current cpu consumption, >> which shows fiber's cpu usage over the last event loop iteration. >> The patch relies on CPU timestamp counter to measure each fiber's time >> share. >> >> Closes #2694 >> >> @TarantoolBot document >> Title: fiber: new function `fiber.top()` >> >> `fiber.top()` returns a table of all alive fibers and lists their cpu >> consumption. Let's take a look at the example: >> ``` >> tarantool> fiber.top() >> --- >> - 1: >> cpu average (%): 10.779696493982 >> cpu instant (%): 10.435256168573 > > 1. Have you considered making these option names > one word? 'cpu average (%)' -> 'average'. The > same for instant. My motivation is that it could > simplify wrappers which are going to use the top > to show it in a GUI or something. You actually use > it in the tests. > > It would not hard readability IMO. It is obvious > that top is in percents, and about CPU time. No problem. > >> 115: >> cpu average (%): 5.4571279061075 >> cpu instant (%): 5.9653973440576 >> 120: >> cpu average (%): 21.944382148464 >> cpu instant (%): 23.849021825646 >> 116: >> cpu average (%): 8.6603872318158 >> cpu instant (%): 9.6812031335093 >> 119: >> cpu average (%): 21.933168871944 >> cpu instant (%): 20.007540530351 >> cpu misses: 0 >> 118: >> cpu average (%): 19.342901995963 >> cpu instant (%): 16.932679820703 >> 117: >> cpu average (%): 11.549674814981 >> cpu instant (%): 13.128901177161 >> ... >> ``` > > 2. This is super cool! > > Could we give names to predefined fibers? For > example, make an alias top.scheduler = top[1]. > > So as in the console output I would see: > > --- > - scheduler: > cpu average (%): 98.230148883023 > cpu instant (%): 100 > > Instead of '1'. Because personally I didn't > even know that 1 is always the scheduler, and > I would be confused to see a regular fiber > consumed 99% of time in a console. Hmm, what about this? --- a/src/lua/fiber.c +++ b/src/lua/fiber.c @@ -325,13 +325,18 @@ lbox_fiber_top_entry(struct fiber *f, void *cb_ctx) { struct lua_State *L = (struct lua_State *) cb_ctx; - lua_pushinteger(L, f->fid); + /* Index system fibers by name instead of id */ + if (f->fid <= FIBER_ID_MAX_RESERVED) { + lua_pushstring(L, f->name); + } else { + lua_pushinteger(L, f->fid); + } lua_newtable(L); > >> In the table above keys are fiber id's (and a single 'cpu misses' key >> which indicates the amount of times tx thread was rescheduled on a >> different cpu core. More on that later). >> The two metrics available for each fiber are: >> 1) cpu instant (per cent), >> which indicates the share of time fiber was executing during the >> previous event loop iteration >> 2) cpu average (per cent), which is calculated as an exponential moving >> average of `cpu instant` values over all previous event loop iterations. >> >> More info on `cpu misses` field returned by `fiber.top()`: >> `cpu misses` indicates the amount of times tx thread detected it was >> rescheduled on a different cpu core during the last event loop >> iteration. >> fiber.top() uses cpu timestamp counter to measure each fiber's execution >> time. However, each cpu core may have its own counter value (you can >> only rely on counter deltas if both measurements were taken on the same >> core, otherwise the delta may even get negative). >> When tx thread is rescheduled to a different cpu core, tarantool just >> assumes cpu delta was zero for the lust measurement. This loweres >> precision of our computations, so the bigger `cpu misses` value the >> lower the precision of fiber.top() results. >> >> Fiber.top() doesn't work on arm architecture at the moment. > > 2. Have you done benchmarks how much top slows down fiber > switching? It should not be hard to measure. Just switching. > Lots of fibers which only yield. And measure loop iterations > per second. Won't work? Or average/median duration of one > loop iteration. Hi! I tested context switches per second. The results are in the issue comments: https://github.com/tarantool/tarantool/issues/2694#issuecomment-546381304 Performance degradation is between 10 and 16 per cent. >> >> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c >> index b813c1739..bea49eb41 100644 >> --- a/src/lib/core/fiber.c >> +++ b/src/lib/core/fiber.c >> @@ -37,11 +37,14 @@ >> #include >> #include >> #include >> - >> #include "assoc.h" > > 3. Sorry, looks like stray diff. Could you > please drop it? > Thanks for noticing! Fixed. >> #include "memory.h" >> #include "trigger.h" >> @@ -82,6 +85,34 @@ static int (*fiber_invoke)(fiber_func f, va_list ap); >> err; \ >> }) >> >> +#if ENABLE_FIBER_TOP >> +/** >> + * An action performed each time a context switch happens. >> + * Used to count each fiber's processing time. >> + * This is a macro rather than a function, since it is used >> + * in scheduler too. > > 4. Hm. Not sure if I understand. Ok, it is used in the scheduler, > so why does it prevent making it a function? Caller is always > struct fiber *, so you can make it a function taking fiber * > parameter, can't you? You cannot call functions from scheduler cos it doesn’t have a stack AFAIU. I can make it an inline function if you want me to. > >> + */ >> +#define clock_set_on_csw(caller) \ >> +({ \ >> + uint64_t clock; \ >> + uint32_t cpu_id; \ >> + clock = __rdtscp(&cpu_id); \ >> + \ >> + if (cpu_id == cord()->cpu_id_last) { \ >> + (caller)->clock_delta += clock - cord()->clock_last; \ >> + cord()->clock_delta += clock - cord()->clock_last; \ >> + } else { \ >> + cord()->cpu_id_last = cpu_id; \ >> + cord()->cpu_miss_count++; \ >> + } \ >> + cord()->clock_last = clock; \ >> +})> @@ -584,6 +617,7 @@ fiber_schedule_list(struct rlist *list) >> } >> last->caller = fiber(); >> assert(fiber() == &cord()->sched); >> + clock_set_on_csw(fiber()); > > 5. Am I right, that we need to update CPU consumption each > time a context switch happens? Yep. > If so, then by logic you > would need to call clock_set_on_csw() on each fiber->csw++, > correct? Right. I also call clock_set_on_csw() on each event loop iteration end. > > Currently that happens in 2 places: fiber_call_impl() and > fiber_yield(). In both these places you already call > clock_set_on_csw(): > > - You always call clock_set_on_csw() right before > fiber_call_impl(); > > - You call clock_set_on_csw() in fiber_yield(). > > I would then move fiber->cws++ logic into your macros too. > So all the context switch and CPU time tracking logic > would be in one place. > > The only problem is that your macros takes caller, but csw++ > is done for callee. Probably we could increment it for the > caller instead, I don't anything depends on that. > > Also it would lead to increment of the scheduler's csw > counter. Not sure whether it is ok. Loop iteration end isn’t a context switch, but I still update clocks here. I don’t want to increment csw here as a side effect, so maybe leave csw increment as it is? > > Like that (I didn't test it): > > ============================================================ > > diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c > index bea49eb41..e9aac99d0 100644 > --- a/src/lib/core/fiber.c > +++ b/src/lib/core/fiber.c > @@ -105,6 +105,7 @@ static int (*fiber_invoke)(fiber_func f, va_list ap); > cord()->cpu_id_last = cpu_id; \ > cord()->cpu_miss_count++; \ > } \ > + caller->csw++; \ > cord()->clock_last = clock; \ > }) > > @@ -258,7 +259,7 @@ fiber_call_impl(struct fiber *callee) > cord->fiber = callee; > > callee->flags &= ~FIBER_IS_READY; > - callee->csw++; > + clock_set_on_csw(caller); > ASAN_START_SWITCH_FIBER(asan_state, 1, > callee->stack, > callee->stack_size); > @@ -277,7 +278,6 @@ fiber_call(struct fiber *callee) > /** By convention, these triggers must not throw. */ > if (! rlist_empty(&caller->on_yield)) > trigger_run(&caller->on_yield, NULL); > - clock_set_on_csw(caller); > callee->caller = caller; > callee->flags |= FIBER_IS_READY; > caller->flags |= FIBER_IS_READY; > @@ -511,7 +511,6 @@ fiber_yield(void) > assert(callee->flags & FIBER_IS_READY || callee == &cord->sched); > assert(! (callee->flags & FIBER_IS_DEAD)); > cord->fiber = callee; > - callee->csw++; > callee->flags &= ~FIBER_IS_READY; > ASAN_START_SWITCH_FIBER(asan_state, > (caller->flags & FIBER_IS_DEAD) == 0, > @@ -617,7 +616,6 @@ fiber_schedule_list(struct rlist *list) > } > last->caller = fiber(); > assert(fiber() == &cord()->sched); > - clock_set_on_csw(fiber()); > fiber_call_impl(first); > } > > ============================================================ > >> fiber_call_impl(first); >> } >> >> @@ -1044,6 +1082,107 @@ fiber_destroy_all(struct cord *cord)> +bool >> +fiber_top_is_enabled() >> +{ >> + return fiber_top_enabled; >> +} >> + >> +inline void >> +fiber_top_enable() > > 6. Please, add 'static' modifier. Looks like > this function is used inside this file only. > The same for fiber_top_disable(). It’s used in lua fiber module. > >> +{ >> + if (!fiber_top_enabled) { >> + ev_prepare_start(cord()->loop, &cord()->prepare_event); >> + ev_check_start(cord()->loop, &cord()->check_event); >> + fiber_top_enabled = true; >> + >> + cord()->clock_acc = 0; >> + cord()->cpu_miss_count_last = 0; >> + cord()->clock_delta_last = 0; >> + } >> +} >> @@ -1077,6 +1216,14 @@ cord_create(struct cord *cord, const char *name) >> ev_async_init(&cord->wakeup_event, fiber_schedule_wakeup); >> >> ev_idle_init(&cord->idle_event, fiber_schedule_idle); >> + >> +#if ENABLE_FIBER_TOP >> + /* fiber.top() currently works only for the main thread. */ >> + if (cord_is_main()) { >> + fiber_top_init(); >> + fiber_top_enable(); > > 7. I think, we need to enable top by default only > when we are sure that it does not affect performance. Ok, disabled it by default. User can turn it on with fiber.top_enable(). This also resolved the issue I had with your swim test. Looking at perf difference, maybe we can turn it on by default? > >> + } >> +#endif /* ENABLE_FIBER_TOP */ >> cord_set_name(name); >> >> #if ENABLE_ASAN >> diff --git a/src/lua/fiber.c b/src/lua/fiber.c >> index 336be60a2..2adff1536 100644 >> --- a/src/lua/fiber.c >> +++ b/src/lua/fiber.c >> @@ -319,6 +319,61 @@ lbox_fiber_statof_nobt(struct fiber *f, void *cb_ctx) >> + >> +static int >> +lbox_fiber_top(struct lua_State *L) >> +{ >> + if (!fiber_top_is_enabled()) { >> + luaL_error(L, "fiber.top() is disabled. enable it with" > > 8. 'enable' is a first word in the sentence. So probably > it is 'Enable’. Yep. Done. > >> + " fiber.top_enable() first"); >> + } >> + lua_newtable(L); >> + lua_pushliteral(L, "cpu misses"); >> + lua_pushnumber(L, cord()->cpu_miss_count_last); >> + lua_settable(L, -3); >> + >> + lbox_fiber_top_entry(&cord()->sched, L); >> + fiber_stat(lbox_fiber_top_entry, L); >> + >> + return 1; >> +} >> diff --git a/test/app/fiber.result b/test/app/fiber.result >> index 3c6115a33..196fae3b7 100644 >> --- a/test/app/fiber.result >> +++ b/test/app/fiber.result >> @@ -1462,6 +1462,84 @@ fiber.join(fiber.self()) >> --- >> - error: the fiber is not joinable >> ... >> +sum = 0 >> +--- >> +... >> +-- gh-2694 fiber.top() >> +a = fiber.top() >> +--- >> +... >> +-- scheduler is present in fiber.top() >> +a[1] ~= nil >> +--- >> +- true >> +... >> +type(a["cpu misses"]) == 'number' >> +--- >> +- true >> +... >> +sum_inst = 0 >> +--- >> +... >> +sum_avg = 0 >> +--- >> +... >> +test_run:cmd('setopt delimiter ";"') > > 9. I would consider usage of '\' instead of a > delimiter change for such a short piece of > multiline code. But up to you. Just in case > you didn't know about '\', because I didn't > know either until recently. > I didn’t know about ‘\’ as well. Thanks. Here’s the incremental diff: diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index bea49eb41..883fbac36 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -37,6 +37,7 @@ #include #include #include + #include "assoc.h" #include "memory.h" #include "trigger.h" @@ -1221,7 +1222,6 @@ cord_create(struct cord *cord, const char *name) /* fiber.top() currently works only for the main thread. */ if (cord_is_main()) { fiber_top_init(); - fiber_top_enable(); } #endif /* ENABLE_FIBER_TOP */ cord_set_name(name); diff --git a/src/lua/fiber.c b/src/lua/fiber.c index 62f3ccce4..c38bd886f 100644 --- a/src/lua/fiber.c +++ b/src/lua/fiber.c @@ -325,13 +325,18 @@ lbox_fiber_top_entry(struct fiber *f, void *cb_ctx) { struct lua_State *L = (struct lua_State *) cb_ctx; - lua_pushinteger(L, f->fid); + /* Index system fibers by name instead of id */ + if (f->fid <= FIBER_ID_MAX_RESERVED) { + lua_pushstring(L, f->name); + } else { + lua_pushinteger(L, f->fid); + } lua_newtable(L); - lua_pushliteral(L, "cpu average (%)"); + lua_pushliteral(L, "average"); lua_pushnumber(L, f->clock_acc / (double)cord()->clock_acc * 100); lua_settable(L, -3); - lua_pushliteral(L, "cpu instant (%)"); + lua_pushliteral(L, "instant"); lua_pushnumber(L, f->clock_delta_last / (double)cord()->clock_delta_last * 100); lua_settable(L, -3); lua_settable(L, -3); @@ -343,7 +348,7 @@ static int lbox_fiber_top(struct lua_State *L) { if (!fiber_top_is_enabled()) { - luaL_error(L, "fiber.top() is disabled. enable it with" + luaL_error(L, "fiber.top() is disabled. Enable it with" " fiber.top_enable() first"); } lua_newtable(L); diff --git a/test/app/fiber.result b/test/app/fiber.result index 196fae3b7..e45cd7639 100644 --- a/test/app/fiber.result +++ b/test/app/fiber.result @@ -1466,11 +1466,19 @@ sum = 0 --- ... -- gh-2694 fiber.top() +fiber.top_enable() +--- +... a = fiber.top() --- ... +type(a) +--- +- table +... -- scheduler is present in fiber.top() -a[1] ~= nil +-- and is indexed by name +a["sched"] ~= nil --- - true ... @@ -1484,21 +1492,19 @@ sum_inst = 0 sum_avg = 0 --- ... -test_run:cmd('setopt delimiter ";"') ---- -- true -... -for k, v in pairs(a) do - if type(v) == 'table' then - sum_inst = sum_inst + v["cpu instant (%)"] - sum_avg = sum_avg + v["cpu average (%)"] - end -end; +-- update table to make sure +-- a full event loop iteration +-- has ended +a = fiber.top() --- ... -test_run:cmd('setopt delimiter ""'); +for k, v in pairs(a) do\ + if type(v) == 'table' then\ + sum_inst = sum_inst + v["instant"]\ + sum_avg = sum_avg + v["average"]\ + end\ +end --- -- true ... sum_inst --- @@ -1518,11 +1524,11 @@ f = fiber.new(function() for i = 1,1000 do end fiber.yield() tbl = fiber.top()[f while f:status() ~= 'dead' do fiber.sleep(0.01) end --- ... -tbl["cpu average (%)"] > 0 +tbl["average"] > 0 --- - true ... -tbl["cpu instant (%)"] > 0 +tbl["instant"] > 0 --- - true ... @@ -1531,14 +1537,7 @@ fiber.top_disable() ... fiber.top() --- -- error: fiber.top() is disabled. enable it with fiber.top_enable() first -... -fiber.top_enable() ---- -... -type(fiber.top()) ---- -- table +- error: fiber.top() is disabled. Enable it with fiber.top_enable() first ... -- cleanup test_run:cmd("clear filter") diff --git a/test/app/fiber.test.lua b/test/app/fiber.test.lua index 15507c2c7..e30aba77e 100644 --- a/test/app/fiber.test.lua +++ b/test/app/fiber.test.lua @@ -632,33 +632,39 @@ fiber.join(fiber.self()) sum = 0 -- gh-2694 fiber.top() +fiber.top_enable() + a = fiber.top() +type(a) -- scheduler is present in fiber.top() -a[1] ~= nil +-- and is indexed by name +a["sched"] ~= nil type(a["cpu misses"]) == 'number' sum_inst = 0 sum_avg = 0 -test_run:cmd('setopt delimiter ";"') -for k, v in pairs(a) do - if type(v) == 'table' then - sum_inst = sum_inst + v["cpu instant (%)"] - sum_avg = sum_avg + v["cpu average (%)"] - end -end; -test_run:cmd('setopt delimiter ""'); + +-- update table to make sure +-- a full event loop iteration +-- has ended +a = fiber.top() +for k, v in pairs(a) do\ + if type(v) == 'table' then\ + sum_inst = sum_inst + v["instant"]\ + sum_avg = sum_avg + v["average"]\ + end\ +end + sum_inst -- not exact due to accumulated integer division errors sum_avg > 99 and sum_avg < 101 or sum_avg tbl = nil f = fiber.new(function() for i = 1,1000 do end fiber.yield() tbl = fiber.top()[fiber.self().id()] end) while f:status() ~= 'dead' do fiber.sleep(0.01) end -tbl["cpu average (%)"] > 0 -tbl["cpu instant (%)"] > 0 +tbl["average"] > 0 +tbl["instant"] > 0 fiber.top_disable() fiber.top() -fiber.top_enable() -type(fiber.top()) -- cleanup test_run:cmd("clear filter") -- Serge Petrenko sergepetrenko@tarantool.org