Tarantool development patches archive
 help / color / mirror / Atom feed
* Re: [Tarantool-patches] [tarantool-patches] [PATCH v2 2/2] lua: add fiber.top() listing fiber cpu consumption
       [not found] ` <eb924da60bc66cbd93d7920f53d6c332d15ffab8.1570546695.git.sergepetrenko@tarantool.org>
@ 2019-10-22 21:18   ` Vladislav Shpilevoy
  2019-10-25 15:13     ` Serge Petrenko
  0 siblings, 1 reply; 9+ messages in thread
From: Vladislav Shpilevoy @ 2019-10-22 21:18 UTC (permalink / raw)
  To: tarantool-patches, Serge Petrenko, georgy, tarantool-patches

Thanks for the patch!

Sorry, the version in the email is a bit outdated. I
pasted below the actual version from the branch.

See 9 comments below.

>     lua: add fiber.top() listing fiber cpu consumption
>     
>     Implement a new function in Lua fiber library: top(). It returns a table
>     of alive fibers (including the scheduler). Each table entry has two
>     fields: average cpu consumption, which is calculated with exponential
>     moving average over event loop iterations, and current cpu consumption,
>     which shows fiber's cpu usage over the last event loop iteration.
>     The patch relies on CPU timestamp counter to measure each fiber's time
>     share.
>     
>     Closes #2694
>     
>     @TarantoolBot document
>     Title: fiber: new function `fiber.top()`
>     
>     `fiber.top()` returns a table of all alive fibers and lists their cpu
>     consumption. Let's take a look at the example:
>     ```
>     tarantool> fiber.top()
>     ---
>     - 1:
>         cpu average (%): 10.779696493982
>         cpu instant (%): 10.435256168573

1. Have you considered making these option names
one word? 'cpu average (%)' -> 'average'. The
same for instant. My motivation is that it could
simplify wrappers which are going to use the top
to show it in a GUI or something. You actually use
it in the tests.

It would not hard readability IMO. It is obvious
that top is in percents, and about CPU time.

>       115:
>         cpu average (%): 5.4571279061075
>         cpu instant (%): 5.9653973440576
>       120:
>         cpu average (%): 21.944382148464
>         cpu instant (%): 23.849021825646
>       116:
>         cpu average (%): 8.6603872318158
>         cpu instant (%): 9.6812031335093
>       119:
>         cpu average (%): 21.933168871944
>         cpu instant (%): 20.007540530351
>       cpu misses: 0
>       118:
>         cpu average (%): 19.342901995963
>         cpu instant (%): 16.932679820703
>       117:
>         cpu average (%): 11.549674814981
>         cpu instant (%): 13.128901177161
>     ...
>     ```

2. This is super cool!

Could we give names to predefined fibers? For
example, make an alias top.scheduler = top[1].

So as in the console output I would see:

    ---
    - scheduler:
        cpu average (%): 98.230148883023
        cpu instant (%): 100

Instead of '1'. Because personally I didn't
even know that 1 is always the scheduler, and
I would be confused to see a regular fiber
consumed 99% of time in a console.

>     In the table above keys are fiber id's (and a single 'cpu misses' key
>     which indicates the amount of times tx thread was rescheduled on a
>     different cpu core. More on that later).
>     The two metrics available for each fiber are:
>     1) cpu instant (per cent),
>     which indicates the share of time fiber was executing during the
>     previous event loop iteration
>     2) cpu average (per cent), which is calculated as an exponential moving
>     average of `cpu instant` values over all previous event loop iterations.
>     
>     More info on `cpu misses` field returned by `fiber.top()`:
>     `cpu misses` indicates the amount of times tx thread detected it was
>     rescheduled on a different cpu core during the last event loop
>     iteration.
>     fiber.top() uses cpu timestamp counter to measure each fiber's execution
>     time. However, each cpu core may have its own counter value (you can
>     only rely on counter deltas if both measurements were taken on the same
>     core, otherwise the delta may even get negative).
>     When tx thread is rescheduled to a different cpu core, tarantool just
>     assumes cpu delta was zero for the lust measurement. This loweres
>     precision of our computations, so the bigger `cpu misses` value the
>     lower the precision of fiber.top() results.
>     
>     Fiber.top() doesn't work on arm architecture at the moment.

2. Have you done benchmarks how much top slows down fiber
switching? It should not be hard to measure. Just switching.
Lots of fibers which only yield. And measure loop iterations
per second. Won't work? Or average/median duration of one
loop iteration.

> 
> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> index b813c1739..bea49eb41 100644
> --- a/src/lib/core/fiber.c
> +++ b/src/lib/core/fiber.c
> @@ -37,11 +37,14 @@
>  #include <stdlib.h>
>  #include <string.h>
>  #include <pmatomic.h>
> -
>  #include "assoc.h"

3. Sorry, looks like stray diff. Could you
please drop it?

>  #include "memory.h"
>  #include "trigger.h"
> @@ -82,6 +85,34 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
>  	err;								\
>  })
>  
> +#if ENABLE_FIBER_TOP
> +/**
> + * An action performed each time a context switch happens.
> + * Used to count each fiber's processing time.
> + * This is a macro rather than a function, since it is used
> + * in scheduler too.

4. Hm. Not sure if I understand. Ok, it is used in the scheduler,
so why does it prevent making it a function? Caller is always
struct fiber *, so you can make it a function taking fiber *
parameter, can't you?

> + */
> +#define clock_set_on_csw(caller)					\
> +({									\
> +	uint64_t clock;							\
> +	uint32_t cpu_id;		 				\
> +	clock = __rdtscp(&cpu_id);					\
> +									\
> +	if (cpu_id == cord()->cpu_id_last) {				\
> +		(caller)->clock_delta += clock - cord()->clock_last;	\
> +		cord()->clock_delta += clock - cord()->clock_last;	\
> +	} else {							\
> +		cord()->cpu_id_last = cpu_id;				\
> +		cord()->cpu_miss_count++;				\
> +	}								\
> +	cord()->clock_last = clock;					\
> +})> @@ -584,6 +617,7 @@ fiber_schedule_list(struct rlist *list)
>  	}
>  	last->caller = fiber();
>  	assert(fiber() == &cord()->sched);
> +	clock_set_on_csw(fiber());

5. Am I right, that we need to update CPU consumption each
time a context switch happens? If so, then by logic you
would need to call clock_set_on_csw() on each fiber->csw++,
correct?

Currently that happens in 2 places: fiber_call_impl() and
fiber_yield(). In both these places you already call
clock_set_on_csw():

    - You always call clock_set_on_csw() right before
      fiber_call_impl();

    - You call clock_set_on_csw() in fiber_yield().

I would then move fiber->cws++ logic into your macros too.
So all the context switch and CPU time tracking logic
would be in one place.

The only problem is that your macros takes caller, but csw++
is done for callee. Probably we could increment it for the
caller instead, I don't anything depends on that.

Also it would lead to increment of the scheduler's csw
counter. Not sure whether it is ok.

Like that (I didn't test it):

============================================================

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index bea49eb41..e9aac99d0 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -105,6 +105,7 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
 		cord()->cpu_id_last = cpu_id;				\
 		cord()->cpu_miss_count++;				\
 	}								\
+	caller->csw++;							\
 	cord()->clock_last = clock;					\
 })
 
@@ -258,7 +259,7 @@ fiber_call_impl(struct fiber *callee)
 	cord->fiber = callee;
 
 	callee->flags &= ~FIBER_IS_READY;
-	callee->csw++;
+	clock_set_on_csw(caller);
 	ASAN_START_SWITCH_FIBER(asan_state, 1,
 				callee->stack,
 				callee->stack_size);
@@ -277,7 +278,6 @@ fiber_call(struct fiber *callee)
 	/** By convention, these triggers must not throw. */
 	if (! rlist_empty(&caller->on_yield))
 		trigger_run(&caller->on_yield, NULL);
-	clock_set_on_csw(caller);
 	callee->caller = caller;
 	callee->flags |= FIBER_IS_READY;
 	caller->flags |= FIBER_IS_READY;
@@ -511,7 +511,6 @@ fiber_yield(void)
 	assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
 	assert(! (callee->flags & FIBER_IS_DEAD));
 	cord->fiber = callee;
-	callee->csw++;
 	callee->flags &= ~FIBER_IS_READY;
 	ASAN_START_SWITCH_FIBER(asan_state,
 				(caller->flags & FIBER_IS_DEAD) == 0,
@@ -617,7 +616,6 @@ fiber_schedule_list(struct rlist *list)
 	}
 	last->caller = fiber();
 	assert(fiber() == &cord()->sched);
-	clock_set_on_csw(fiber());
 	fiber_call_impl(first);
 }
 
============================================================

>  	fiber_call_impl(first);
>  }
>  
> @@ -1044,6 +1082,107 @@ fiber_destroy_all(struct cord *cord)> +bool
> +fiber_top_is_enabled()
> +{
> +	return fiber_top_enabled;
> +}
> +
> +inline void
> +fiber_top_enable()

6. Please, add 'static' modifier. Looks like
this function is used inside this file only.
The same for fiber_top_disable().

> +{
> +	if (!fiber_top_enabled) {
> +		ev_prepare_start(cord()->loop, &cord()->prepare_event);
> +		ev_check_start(cord()->loop, &cord()->check_event);
> +		fiber_top_enabled = true;
> +
> +		cord()->clock_acc = 0;
> +		cord()->cpu_miss_count_last = 0;
> +		cord()->clock_delta_last = 0;
> +	}
> +}
> @@ -1077,6 +1216,14 @@ cord_create(struct cord *cord, const char *name)
>  	ev_async_init(&cord->wakeup_event, fiber_schedule_wakeup);
>  
>  	ev_idle_init(&cord->idle_event, fiber_schedule_idle);
> +
> +#if ENABLE_FIBER_TOP
> +	/* fiber.top() currently works only for the main thread. */
> +	if (cord_is_main()) {
> +		fiber_top_init();
> +		fiber_top_enable();

7. I think, we need to enable top by default only
when we are sure that it does not affect performance.

> +	}
> +#endif /* ENABLE_FIBER_TOP */
>  	cord_set_name(name);
>  
>  #if ENABLE_ASAN
> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
> index 336be60a2..2adff1536 100644
> --- a/src/lua/fiber.c
> +++ b/src/lua/fiber.c
> @@ -319,6 +319,61 @@ lbox_fiber_statof_nobt(struct fiber *f, void *cb_ctx)
> +
> +static int
> +lbox_fiber_top(struct lua_State *L)
> +{
> +	if (!fiber_top_is_enabled()) {
> +		luaL_error(L, "fiber.top() is disabled. enable it with"

8. 'enable' is a first word in the sentence. So probably
it is 'Enable'.

> +			      " fiber.top_enable() first");
> +	}
> +	lua_newtable(L);
> +	lua_pushliteral(L, "cpu misses");
> +	lua_pushnumber(L, cord()->cpu_miss_count_last);
> +	lua_settable(L, -3);
> +
> +	lbox_fiber_top_entry(&cord()->sched, L);
> +	fiber_stat(lbox_fiber_top_entry, L);
> +
> +	return 1;
> +}
> diff --git a/test/app/fiber.result b/test/app/fiber.result
> index 3c6115a33..196fae3b7 100644
> --- a/test/app/fiber.result
> +++ b/test/app/fiber.result
> @@ -1462,6 +1462,84 @@ fiber.join(fiber.self())
>  ---
>  - error: the fiber is not joinable
>  ...
> +sum = 0
> +---
> +...
> +-- gh-2694 fiber.top()
> +a = fiber.top()
> +---
> +...
> +-- scheduler is present in fiber.top()
> +a[1] ~= nil
> +---
> +- true
> +...
> +type(a["cpu misses"]) == 'number'
> +---
> +- true
> +...
> +sum_inst = 0
> +---
> +...
> +sum_avg = 0
> +---
> +...
> +test_run:cmd('setopt delimiter ";"')

9. I would consider usage of '\' instead of a
delimiter change for such a short piece of
multiline code. But up to you. Just in case
you didn't know about '\', because I didn't
know either until recently.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [tarantool-patches] [PATCH v2 1/2] test: modify swim_run_test to break event loop
       [not found] ` <52e7822bbcd802528d448c15ce9e9fbe4479c73a.1570546695.git.sergepetrenko@tarantool.org>
@ 2019-10-22 21:20   ` Vladislav Shpilevoy
  2019-10-25 15:15     ` Serge Petrenko
  0 siblings, 1 reply; 9+ messages in thread
From: Vladislav Shpilevoy @ 2019-10-22 21:20 UTC (permalink / raw)
  To: tarantool-patches, Serge Petrenko, georgy, tarantool-patches

Hi! Thanks for the patch!

On 08/10/2019 17:03, Serge Petrenko wrote:
> Add a wrapper function around swim test passed to swim_run_test which
> breaks from default cord event loop. Otherwise we will hang indefinitely
> in ev_run in case there is a started watcher.

But why? The loop runs only until there are fibers to run.
By design of these tests, after they are done, there should
be not a single fiber nor a watcher, and the loop is finished
without a break.

This is why the tests pass without this patch. So what is
a purpose of this commit? It does not change anything except
that it may mask some errors. Because if the SWIM unit tests
would hang without a break, then something is wrong. I caught
several bugs in SWIM via this.

If fiber.top() somehow interferes, then probably you could
turn it off for this test? I just don't want to miss possible
SWIM bugs.

> 
> Found during work on #2694
> ---
>  test/unit/swim_test_utils.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [tarantool-patches] [PATCH v2 2/2] lua: add fiber.top() listing fiber cpu consumption
  2019-10-22 21:18   ` [Tarantool-patches] [tarantool-patches] [PATCH v2 2/2] lua: add fiber.top() listing fiber cpu consumption Vladislav Shpilevoy
@ 2019-10-25 15:13     ` Serge Petrenko
  2019-10-25 16:19       ` Serge Petrenko
  2019-10-26 18:00       ` Vladislav Shpilevoy
  0 siblings, 2 replies; 9+ messages in thread
From: Serge Petrenko @ 2019-10-25 15:13 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches, tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 18490 bytes --]

Hi! Thank you for review!
I addressed your comments and answered inline.
Incremental diff is below.

> 23 окт. 2019 г., в 0:18, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> написал(а):
> 
> Thanks for the patch!
> 
> Sorry, the version in the email is a bit outdated. I
> pasted below the actual version from the branch.
> 
> See 9 comments below.
> 
>>    lua: add fiber.top() listing fiber cpu consumption
>> 
>>    Implement a new function in Lua fiber library: top(). It returns a table
>>    of alive fibers (including the scheduler). Each table entry has two
>>    fields: average cpu consumption, which is calculated with exponential
>>    moving average over event loop iterations, and current cpu consumption,
>>    which shows fiber's cpu usage over the last event loop iteration.
>>    The patch relies on CPU timestamp counter to measure each fiber's time
>>    share.
>> 
>>    Closes #2694
>> 
>>    @TarantoolBot document
>>    Title: fiber: new function `fiber.top()`
>> 
>>    `fiber.top()` returns a table of all alive fibers and lists their cpu
>>    consumption. Let's take a look at the example:
>>    ```
>>    tarantool> fiber.top()
>>    ---
>>    - 1:
>>        cpu average (%): 10.779696493982
>>        cpu instant (%): 10.435256168573
> 
> 1. Have you considered making these option names
> one word? 'cpu average (%)' -> 'average'. The
> same for instant. My motivation is that it could
> simplify wrappers which are going to use the top
> to show it in a GUI or something. You actually use
> it in the tests.
> 
> It would not hard readability IMO. It is obvious
> that top is in percents, and about CPU time.

No problem.

> 
>>      115:
>>        cpu average (%): 5.4571279061075
>>        cpu instant (%): 5.9653973440576
>>      120:
>>        cpu average (%): 21.944382148464
>>        cpu instant (%): 23.849021825646
>>      116:
>>        cpu average (%): 8.6603872318158
>>        cpu instant (%): 9.6812031335093
>>      119:
>>        cpu average (%): 21.933168871944
>>        cpu instant (%): 20.007540530351
>>      cpu misses: 0
>>      118:
>>        cpu average (%): 19.342901995963
>>        cpu instant (%): 16.932679820703
>>      117:
>>        cpu average (%): 11.549674814981
>>        cpu instant (%): 13.128901177161
>>    ...
>>    ```
> 
> 2. This is super cool!
> 
> Could we give names to predefined fibers? For
> example, make an alias top.scheduler = top[1].
> 
> So as in the console output I would see:
> 
>    ---
>    - scheduler:
>        cpu average (%): 98.230148883023
>        cpu instant (%): 100
> 
> Instead of '1'. Because personally I didn't
> even know that 1 is always the scheduler, and
> I would be confused to see a regular fiber
> consumed 99% of time in a console.

Hmm, what about this?

--- a/src/lua/fiber.c
+++ b/src/lua/fiber.c
@@ -325,13 +325,18 @@ lbox_fiber_top_entry(struct fiber *f, void *cb_ctx)
 {
        struct lua_State *L = (struct lua_State *) cb_ctx;
 
-       lua_pushinteger(L, f->fid);
+       /* Index system fibers by name instead of id */
+       if (f->fid <= FIBER_ID_MAX_RESERVED) {
+               lua_pushstring(L, f->name);
+       } else {
+               lua_pushinteger(L, f->fid);
+       }
        lua_newtable(L);


> 
>>    In the table above keys are fiber id's (and a single 'cpu misses' key
>>    which indicates the amount of times tx thread was rescheduled on a
>>    different cpu core. More on that later).
>>    The two metrics available for each fiber are:
>>    1) cpu instant (per cent),
>>    which indicates the share of time fiber was executing during the
>>    previous event loop iteration
>>    2) cpu average (per cent), which is calculated as an exponential moving
>>    average of `cpu instant` values over all previous event loop iterations.
>> 
>>    More info on `cpu misses` field returned by `fiber.top()`:
>>    `cpu misses` indicates the amount of times tx thread detected it was
>>    rescheduled on a different cpu core during the last event loop
>>    iteration.
>>    fiber.top() uses cpu timestamp counter to measure each fiber's execution
>>    time. However, each cpu core may have its own counter value (you can
>>    only rely on counter deltas if both measurements were taken on the same
>>    core, otherwise the delta may even get negative).
>>    When tx thread is rescheduled to a different cpu core, tarantool just
>>    assumes cpu delta was zero for the lust measurement. This loweres
>>    precision of our computations, so the bigger `cpu misses` value the
>>    lower the precision of fiber.top() results.
>> 
>>    Fiber.top() doesn't work on arm architecture at the moment.
> 
> 2. Have you done benchmarks how much top slows down fiber
> switching? It should not be hard to measure. Just switching.
> Lots of fibers which only yield. And measure loop iterations
> per second. Won't work? Or average/median duration of one
> loop iteration.

Hi! I tested context switches per second.
The results are in the issue comments:
https://github.com/tarantool/tarantool/issues/2694#issuecomment-546381304 <https://github.com/tarantool/tarantool/issues/2694#issuecomment-546381304>
Performance degradation is between 10 and 16 per cent.

>> 
>> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
>> index b813c1739..bea49eb41 100644
>> --- a/src/lib/core/fiber.c
>> +++ b/src/lib/core/fiber.c
>> @@ -37,11 +37,14 @@
>> #include <stdlib.h>
>> #include <string.h>
>> #include <pmatomic.h>
>> -
>> #include "assoc.h"
> 
> 3. Sorry, looks like stray diff. Could you
> please drop it?
> 

Thanks for noticing! Fixed.

>> #include "memory.h"
>> #include "trigger.h"
>> @@ -82,6 +85,34 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
>> 	err;								\
>> })
>> 
>> +#if ENABLE_FIBER_TOP
>> +/**
>> + * An action performed each time a context switch happens.
>> + * Used to count each fiber's processing time.
>> + * This is a macro rather than a function, since it is used
>> + * in scheduler too.
> 
> 4. Hm. Not sure if I understand. Ok, it is used in the scheduler,
> so why does it prevent making it a function? Caller is always
> struct fiber *, so you can make it a function taking fiber *
> parameter, can't you?

You cannot call functions from scheduler cos it doesn’t have a
stack AFAIU. I can make it an inline function if you want me to.

> 
>> + */
>> +#define clock_set_on_csw(caller)					\
>> +({									\
>> +	uint64_t clock;							\
>> +	uint32_t cpu_id;		 				\
>> +	clock = __rdtscp(&cpu_id);					\
>> +									\
>> +	if (cpu_id == cord()->cpu_id_last) {				\
>> +		(caller)->clock_delta += clock - cord()->clock_last;	\
>> +		cord()->clock_delta += clock - cord()->clock_last;	\
>> +	} else {							\
>> +		cord()->cpu_id_last = cpu_id;				\
>> +		cord()->cpu_miss_count++;				\
>> +	}								\
>> +	cord()->clock_last = clock;					\
>> +})> @@ -584,6 +617,7 @@ fiber_schedule_list(struct rlist *list)
>> 	}
>> 	last->caller = fiber();
>> 	assert(fiber() == &cord()->sched);
>> +	clock_set_on_csw(fiber());
> 
> 5. Am I right, that we need to update CPU consumption each
> time a context switch happens?

Yep.

> If so, then by logic you
> would need to call clock_set_on_csw() on each fiber->csw++,
> correct?

Right. I also call clock_set_on_csw() on each event loop iteration
end.

> 
> Currently that happens in 2 places: fiber_call_impl() and
> fiber_yield(). In both these places you already call
> clock_set_on_csw():
> 
>    - You always call clock_set_on_csw() right before
>      fiber_call_impl();
> 
>    - You call clock_set_on_csw() in fiber_yield().
> 
> I would then move fiber->cws++ logic into your macros too.
> So all the context switch and CPU time tracking logic
> would be in one place.
> 
> The only problem is that your macros takes caller, but csw++
> is done for callee. Probably we could increment it for the
> caller instead, I don't anything depends on that.
> 
> Also it would lead to increment of the scheduler's csw
> counter. Not sure whether it is ok.

Loop iteration end isn’t a context switch, but I still update
clocks here. I don’t want to increment csw here as a side
effect, so maybe leave csw increment as it is?

> 
> Like that (I didn't test it):
> 
> ============================================================
> 
> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> index bea49eb41..e9aac99d0 100644
> --- a/src/lib/core/fiber.c
> +++ b/src/lib/core/fiber.c
> @@ -105,6 +105,7 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
> 		cord()->cpu_id_last = cpu_id;				\
> 		cord()->cpu_miss_count++;				\
> 	}								\
> +	caller->csw++;							\
> 	cord()->clock_last = clock;					\
> })
> 
> @@ -258,7 +259,7 @@ fiber_call_impl(struct fiber *callee)
> 	cord->fiber = callee;
> 
> 	callee->flags &= ~FIBER_IS_READY;
> -	callee->csw++;
> +	clock_set_on_csw(caller);
> 	ASAN_START_SWITCH_FIBER(asan_state, 1,
> 				callee->stack,
> 				callee->stack_size);
> @@ -277,7 +278,6 @@ fiber_call(struct fiber *callee)
> 	/** By convention, these triggers must not throw. */
> 	if (! rlist_empty(&caller->on_yield))
> 		trigger_run(&caller->on_yield, NULL);
> -	clock_set_on_csw(caller);
> 	callee->caller = caller;
> 	callee->flags |= FIBER_IS_READY;
> 	caller->flags |= FIBER_IS_READY;
> @@ -511,7 +511,6 @@ fiber_yield(void)
> 	assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
> 	assert(! (callee->flags & FIBER_IS_DEAD));
> 	cord->fiber = callee;
> -	callee->csw++;
> 	callee->flags &= ~FIBER_IS_READY;
> 	ASAN_START_SWITCH_FIBER(asan_state,
> 				(caller->flags & FIBER_IS_DEAD) == 0,
> @@ -617,7 +616,6 @@ fiber_schedule_list(struct rlist *list)
> 	}
> 	last->caller = fiber();
> 	assert(fiber() == &cord()->sched);
> -	clock_set_on_csw(fiber());
> 	fiber_call_impl(first);
> }
> 
> ============================================================
> 
>> 	fiber_call_impl(first);
>> }
>> 
>> @@ -1044,6 +1082,107 @@ fiber_destroy_all(struct cord *cord)> +bool
>> +fiber_top_is_enabled()
>> +{
>> +	return fiber_top_enabled;
>> +}
>> +
>> +inline void
>> +fiber_top_enable()
> 
> 6. Please, add 'static' modifier. Looks like
> this function is used inside this file only.
> The same for fiber_top_disable().

It’s used in lua fiber module.

> 
>> +{
>> +	if (!fiber_top_enabled) {
>> +		ev_prepare_start(cord()->loop, &cord()->prepare_event);
>> +		ev_check_start(cord()->loop, &cord()->check_event);
>> +		fiber_top_enabled = true;
>> +
>> +		cord()->clock_acc = 0;
>> +		cord()->cpu_miss_count_last = 0;
>> +		cord()->clock_delta_last = 0;
>> +	}
>> +}
>> @@ -1077,6 +1216,14 @@ cord_create(struct cord *cord, const char *name)
>> 	ev_async_init(&cord->wakeup_event, fiber_schedule_wakeup);
>> 
>> 	ev_idle_init(&cord->idle_event, fiber_schedule_idle);
>> +
>> +#if ENABLE_FIBER_TOP
>> +	/* fiber.top() currently works only for the main thread. */
>> +	if (cord_is_main()) {
>> +		fiber_top_init();
>> +		fiber_top_enable();
> 
> 7. I think, we need to enable top by default only
> when we are sure that it does not affect performance.

Ok, disabled it by default. User can turn it on with
fiber.top_enable(). This also resolved the issue I had
with your swim test.

Looking at perf difference, maybe we can turn it on by default?

> 
>> +	}
>> +#endif /* ENABLE_FIBER_TOP */
>> 	cord_set_name(name);
>> 
>> #if ENABLE_ASAN
>> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
>> index 336be60a2..2adff1536 100644
>> --- a/src/lua/fiber.c
>> +++ b/src/lua/fiber.c
>> @@ -319,6 +319,61 @@ lbox_fiber_statof_nobt(struct fiber *f, void *cb_ctx)
>> +
>> +static int
>> +lbox_fiber_top(struct lua_State *L)
>> +{
>> +	if (!fiber_top_is_enabled()) {
>> +		luaL_error(L, "fiber.top() is disabled. enable it with"
> 
> 8. 'enable' is a first word in the sentence. So probably
> it is 'Enable’.

Yep. Done.

> 
>> +			      " fiber.top_enable() first");
>> +	}
>> +	lua_newtable(L);
>> +	lua_pushliteral(L, "cpu misses");
>> +	lua_pushnumber(L, cord()->cpu_miss_count_last);
>> +	lua_settable(L, -3);
>> +
>> +	lbox_fiber_top_entry(&cord()->sched, L);
>> +	fiber_stat(lbox_fiber_top_entry, L);
>> +
>> +	return 1;
>> +}
>> diff --git a/test/app/fiber.result b/test/app/fiber.result
>> index 3c6115a33..196fae3b7 100644
>> --- a/test/app/fiber.result
>> +++ b/test/app/fiber.result
>> @@ -1462,6 +1462,84 @@ fiber.join(fiber.self())
>> ---
>> - error: the fiber is not joinable
>> ...
>> +sum = 0
>> +---
>> +...
>> +-- gh-2694 fiber.top()
>> +a = fiber.top()
>> +---
>> +...
>> +-- scheduler is present in fiber.top()
>> +a[1] ~= nil
>> +---
>> +- true
>> +...
>> +type(a["cpu misses"]) == 'number'
>> +---
>> +- true
>> +...
>> +sum_inst = 0
>> +---
>> +...
>> +sum_avg = 0
>> +---
>> +...
>> +test_run:cmd('setopt delimiter ";"')
> 
> 9. I would consider usage of '\' instead of a
> delimiter change for such a short piece of
> multiline code. But up to you. Just in case
> you didn't know about '\', because I didn't
> know either until recently.
> 


I didn’t know about ‘\’ as well. Thanks.


Here’s the incremental diff:

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index bea49eb41..883fbac36 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -37,6 +37,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <pmatomic.h>
+
 #include "assoc.h"
 #include "memory.h"
 #include "trigger.h"
@@ -1221,7 +1222,6 @@ cord_create(struct cord *cord, const char *name)
 	/* fiber.top() currently works only for the main thread. */
 	if (cord_is_main()) {
 		fiber_top_init();
-		fiber_top_enable();
 	}
 #endif /* ENABLE_FIBER_TOP */
 	cord_set_name(name);
diff --git a/src/lua/fiber.c b/src/lua/fiber.c
index 62f3ccce4..c38bd886f 100644
--- a/src/lua/fiber.c
+++ b/src/lua/fiber.c
@@ -325,13 +325,18 @@ lbox_fiber_top_entry(struct fiber *f, void *cb_ctx)
 {
 	struct lua_State *L = (struct lua_State *) cb_ctx;
 
-	lua_pushinteger(L, f->fid);
+	/* Index system fibers by name instead of id */
+	if (f->fid <= FIBER_ID_MAX_RESERVED) {
+		lua_pushstring(L, f->name);
+	} else {
+		lua_pushinteger(L, f->fid);
+	}
 	lua_newtable(L);
 
-	lua_pushliteral(L, "cpu average (%)");
+	lua_pushliteral(L, "average");
 	lua_pushnumber(L, f->clock_acc / (double)cord()->clock_acc * 100);
 	lua_settable(L, -3);
-	lua_pushliteral(L, "cpu instant (%)");
+	lua_pushliteral(L, "instant");
 	lua_pushnumber(L, f->clock_delta_last / (double)cord()->clock_delta_last * 100);
 	lua_settable(L, -3);
 	lua_settable(L, -3);
@@ -343,7 +348,7 @@ static int
 lbox_fiber_top(struct lua_State *L)
 {
 	if (!fiber_top_is_enabled()) {
-		luaL_error(L, "fiber.top() is disabled. enable it with"
+		luaL_error(L, "fiber.top() is disabled. Enable it with"
 			      " fiber.top_enable() first");
 	}
 	lua_newtable(L);
diff --git a/test/app/fiber.result b/test/app/fiber.result
index 196fae3b7..e45cd7639 100644
--- a/test/app/fiber.result
+++ b/test/app/fiber.result
@@ -1466,11 +1466,19 @@ sum = 0
 ---
 ...
 -- gh-2694 fiber.top()
+fiber.top_enable()
+---
+...
 a = fiber.top()
 ---
 ...
+type(a)
+---
+- table
+...
 -- scheduler is present in fiber.top()
-a[1] ~= nil
+-- and is indexed by name
+a["sched"] ~= nil
 ---
 - true
 ...
@@ -1484,21 +1492,19 @@ sum_inst = 0
 sum_avg = 0
 ---
 ...
-test_run:cmd('setopt delimiter ";"')
----
-- true
-...
-for k, v in pairs(a) do
-    if type(v) == 'table' then
-        sum_inst = sum_inst + v["cpu instant (%)"]
-        sum_avg = sum_avg + v["cpu average (%)"]
-    end
-end;
+-- update table to make sure
+-- a full event loop iteration
+-- has ended
+a = fiber.top()
 ---
 ...
-test_run:cmd('setopt delimiter ""');
+for k, v in pairs(a) do\
+    if type(v) == 'table' then\
+        sum_inst = sum_inst + v["instant"]\
+        sum_avg = sum_avg + v["average"]\
+    end\
+end
 ---
-- true
 ...
 sum_inst
 ---
@@ -1518,11 +1524,11 @@ f = fiber.new(function() for i = 1,1000 do end fiber.yield() tbl = fiber.top()[f
 while f:status() ~= 'dead' do fiber.sleep(0.01) end
 ---
 ...
-tbl["cpu average (%)"] > 0
+tbl["average"] > 0
 ---
 - true
 ...
-tbl["cpu instant (%)"] > 0
+tbl["instant"] > 0
 ---
 - true
 ...
@@ -1531,14 +1537,7 @@ fiber.top_disable()
 ...
 fiber.top()
 ---
-- error: fiber.top() is disabled. enable it with fiber.top_enable() first
-...
-fiber.top_enable()
----
-...
-type(fiber.top())
----
-- table
+- error: fiber.top() is disabled. Enable it with fiber.top_enable() first
 ...
 -- cleanup
 test_run:cmd("clear filter")
diff --git a/test/app/fiber.test.lua b/test/app/fiber.test.lua
index 15507c2c7..e30aba77e 100644
--- a/test/app/fiber.test.lua
+++ b/test/app/fiber.test.lua
@@ -632,33 +632,39 @@ fiber.join(fiber.self())
 sum = 0
 
 -- gh-2694 fiber.top()
+fiber.top_enable()
+
 a = fiber.top()
+type(a)
 -- scheduler is present in fiber.top()
-a[1] ~= nil
+-- and is indexed by name
+a["sched"] ~= nil
 type(a["cpu misses"]) == 'number'
 sum_inst = 0
 sum_avg = 0
-test_run:cmd('setopt delimiter ";"')
-for k, v in pairs(a) do
-    if type(v) == 'table' then
-        sum_inst = sum_inst + v["cpu instant (%)"]
-        sum_avg = sum_avg + v["cpu average (%)"]
-    end
-end;
-test_run:cmd('setopt delimiter ""');
+
+-- update table to make sure
+-- a full event loop iteration
+-- has ended
+a = fiber.top()
+for k, v in pairs(a) do\
+    if type(v) == 'table' then\
+        sum_inst = sum_inst + v["instant"]\
+        sum_avg = sum_avg + v["average"]\
+    end\
+end
+
 sum_inst
 -- not exact due to accumulated integer division errors
 sum_avg > 99 and sum_avg < 101 or sum_avg
 tbl = nil
 f = fiber.new(function() for i = 1,1000 do end fiber.yield() tbl = fiber.top()[fiber.self().id()] end)
 while f:status() ~= 'dead' do fiber.sleep(0.01) end
-tbl["cpu average (%)"] > 0
-tbl["cpu instant (%)"] > 0
+tbl["average"] > 0
+tbl["instant"] > 0
 
 fiber.top_disable()
 fiber.top()
-fiber.top_enable()
-type(fiber.top())
 
 -- cleanup
 test_run:cmd("clear filter")

--
Serge Petrenko
sergepetrenko@tarantool.org


[-- Attachment #2: Type: text/html, Size: 45977 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [tarantool-patches] [PATCH v2 1/2] test: modify swim_run_test to break event loop
  2019-10-22 21:20   ` [Tarantool-patches] [tarantool-patches] [PATCH v2 1/2] test: modify swim_run_test to break event loop Vladislav Shpilevoy
@ 2019-10-25 15:15     ` Serge Petrenko
  0 siblings, 0 replies; 9+ messages in thread
From: Serge Petrenko @ 2019-10-25 15:15 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches, tarantool-patches

Hi! Thank you for the explanation.
I threw this patch away


> 23 окт. 2019 г., в 0:20, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> написал(а):
> 
> Hi! Thanks for the patch!
> 
> On 08/10/2019 17:03, Serge Petrenko wrote:
>> Add a wrapper function around swim test passed to swim_run_test which
>> breaks from default cord event loop. Otherwise we will hang indefinitely
>> in ev_run in case there is a started watcher.
> 
> But why? The loop runs only until there are fibers to run.
> By design of these tests, after they are done, there should
> be not a single fiber nor a watcher, and the loop is finished
> without a break.
> 
> This is why the tests pass without this patch. So what is
> a purpose of this commit? It does not change anything except
> that it may mask some errors. Because if the SWIM unit tests
> would hang without a break, then something is wrong. I caught
> several bugs in SWIM via this.
> 
> If fiber.top() somehow interferes, then probably you could
> turn it off for this test? I just don't want to miss possible
> SWIM bugs.
> 


>> 
>> Found during work on #2694
>> ---
>> test/unit/swim_test_utils.c | 14 +++++++++++++-
>> 1 file changed, 13 insertions(+), 1 deletion(-)
>> 
> 


--
Serge Petrenko
sergepetrenko@tarantool.org

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [tarantool-patches] [PATCH v2 2/2] lua: add fiber.top() listing fiber cpu consumption
  2019-10-25 15:13     ` Serge Petrenko
@ 2019-10-25 16:19       ` Serge Petrenko
  2019-10-26 18:00       ` Vladislav Shpilevoy
  1 sibling, 0 replies; 9+ messages in thread
From: Serge Petrenko @ 2019-10-25 16:19 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches, tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 22115 bytes --]

Hi!
Found a test failure on our CI.
The fix is below. Besides, it makes sense to turn this code off while
fiber.top <http://fiber.top/>() is turned off. The changes are on the branch.

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 883fbac36..09afa068e 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -93,22 +93,25 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
  * This is a macro rather than a function, since it is used
  * in scheduler too.
  */
-#define clock_set_on_csw(caller)					\
-({									\
-	uint64_t clock;							\
-	uint32_t cpu_id;		 				\
-	clock = __rdtscp(&cpu_id);					\
-									\
-	if (cpu_id == cord()->cpu_id_last) {				\
-		(caller)->clock_delta += clock - cord()->clock_last;	\
-		cord()->clock_delta += clock - cord()->clock_last;	\
-	} else {							\
-		cord()->cpu_id_last = cpu_id;				\
-		cord()->cpu_miss_count++;				\
-	}								\
-	cord()->clock_last = clock;					\
+#define clock_set_on_csw(caller)					     \
+({									     \
+	if (fiber_top_enabled) {					     \
+		uint64_t clock;						     \
+		uint32_t cpu_id;		 			     \
+		clock = __rdtscp(&cpu_id);				     \
+									     \
+		if (cpu_id == cord()->cpu_id_last) {			     \
+			(caller)->clock_delta += clock - cord()->clock_last; \
+			cord()->clock_delta += clock - cord()->clock_last;   \
+		} else {						     \
+			cord()->cpu_id_last = cpu_id;			     \
+			cord()->cpu_miss_count++;			     \
+		}							     \
+		cord()->clock_last = clock;				     \
+	}								     \
 })
 
+static bool fiber_top_enabled = false;
 
 #else
 #define clock_set_on_csw(caller) ;
@@ -1151,8 +1154,6 @@ fiber_top_init()
 	ev_check_init(&cord()->check_event, loop_on_iteration_start);
 }
 
-static bool fiber_top_enabled = false;
-
 bool
 fiber_top_is_enabled()
 {


--
Serge Petrenko
sergepetrenko@tarantool.org




> 25 окт. 2019 г., в 18:13, Serge Petrenko <sergepetrenko@tarantool.org> написал(а):
> 
> Hi! Thank you for review!
> I addressed your comments and answered inline.
> Incremental diff is below.
> 
>> 23 окт. 2019 г., в 0:18, Vladislav Shpilevoy <v.shpilevoy@tarantool.org <mailto:v.shpilevoy@tarantool.org>> написал(а):
>> 
>> Thanks for the patch!
>> 
>> Sorry, the version in the email is a bit outdated. I
>> pasted below the actual version from the branch.
>> 
>> See 9 comments below.
>> 
>>>    lua: add fiber.top <http://fiber.top/>() listing fiber cpu consumption
>>> 
>>>    Implement a new function in Lua fiber library: top(). It returns a table
>>>    of alive fibers (including the scheduler). Each table entry has two
>>>    fields: average cpu consumption, which is calculated with exponential
>>>    moving average over event loop iterations, and current cpu consumption,
>>>    which shows fiber's cpu usage over the last event loop iteration.
>>>    The patch relies on CPU timestamp counter to measure each fiber's time
>>>    share.
>>> 
>>>    Closes #2694
>>> 
>>>    @TarantoolBot document
>>>    Title: fiber: new function `fiber.top <http://fiber.top/>()`
>>> 
>>>    `fiber.top <http://fiber.top/>()` returns a table of all alive fibers and lists their cpu
>>>    consumption. Let's take a look at the example:
>>>    ```
>>>    tarantool> fiber.top <http://fiber.top/>()
>>>    ---
>>>    - 1:
>>>        cpu average (%): 10.779696493982
>>>        cpu instant (%): 10.435256168573
>> 
>> 1. Have you considered making these option names
>> one word? 'cpu average (%)' -> 'average'. The
>> same for instant. My motivation is that it could
>> simplify wrappers which are going to use the top
>> to show it in a GUI or something. You actually use
>> it in the tests.
>> 
>> It would not hard readability IMO. It is obvious
>> that top is in percents, and about CPU time.
> 
> No problem.
> 
>> 
>>>      115:
>>>        cpu average (%): 5.4571279061075
>>>        cpu instant (%): 5.9653973440576
>>>      120:
>>>        cpu average (%): 21.944382148464
>>>        cpu instant (%): 23.849021825646
>>>      116:
>>>        cpu average (%): 8.6603872318158
>>>        cpu instant (%): 9.6812031335093
>>>      119:
>>>        cpu average (%): 21.933168871944
>>>        cpu instant (%): 20.007540530351
>>>      cpu misses: 0
>>>      118:
>>>        cpu average (%): 19.342901995963
>>>        cpu instant (%): 16.932679820703
>>>      117:
>>>        cpu average (%): 11.549674814981
>>>        cpu instant (%): 13.128901177161
>>>    ...
>>>    ```
>> 
>> 2. This is super cool!
>> 
>> Could we give names to predefined fibers? For
>> example, make an alias top.scheduler = top[1].
>> 
>> So as in the console output I would see:
>> 
>>    ---
>>    - scheduler:
>>        cpu average (%): 98.230148883023
>>        cpu instant (%): 100
>> 
>> Instead of '1'. Because personally I didn't
>> even know that 1 is always the scheduler, and
>> I would be confused to see a regular fiber
>> consumed 99% of time in a console.
> 
> Hmm, what about this?
> 
> --- a/src/lua/fiber.c
> +++ b/src/lua/fiber.c
> @@ -325,13 +325,18 @@ lbox_fiber_top_entry(struct fiber *f, void *cb_ctx)
>  {
>         struct lua_State *L = (struct lua_State *) cb_ctx;
>  
> -       lua_pushinteger(L, f->fid);
> +       /* Index system fibers by name instead of id */
> +       if (f->fid <= FIBER_ID_MAX_RESERVED) {
> +               lua_pushstring(L, f->name);
> +       } else {
> +               lua_pushinteger(L, f->fid);
> +       }
>         lua_newtable(L);
> 
> 
>> 
>>>    In the table above keys are fiber id's (and a single 'cpu misses' key
>>>    which indicates the amount of times tx thread was rescheduled on a
>>>    different cpu core. More on that later).
>>>    The two metrics available for each fiber are:
>>>    1) cpu instant (per cent),
>>>    which indicates the share of time fiber was executing during the
>>>    previous event loop iteration
>>>    2) cpu average (per cent), which is calculated as an exponential moving
>>>    average of `cpu instant` values over all previous event loop iterations.
>>> 
>>>    More info on `cpu misses` field returned by `fiber.top <http://fiber.top/>()`:
>>>    `cpu misses` indicates the amount of times tx thread detected it was
>>>    rescheduled on a different cpu core during the last event loop
>>>    iteration.
>>>    fiber.top <http://fiber.top/>() uses cpu timestamp counter to measure each fiber's execution
>>>    time. However, each cpu core may have its own counter value (you can
>>>    only rely on counter deltas if both measurements were taken on the same
>>>    core, otherwise the delta may even get negative).
>>>    When tx thread is rescheduled to a different cpu core, tarantool just
>>>    assumes cpu delta was zero for the lust measurement. This loweres
>>>    precision of our computations, so the bigger `cpu misses` value the
>>>    lower the precision of fiber.top <http://fiber.top/>() results.
>>> 
>>>    Fiber.top <http://fiber.top/>() doesn't work on arm architecture at the moment.
>> 
>> 2. Have you done benchmarks how much top slows down fiber
>> switching? It should not be hard to measure. Just switching.
>> Lots of fibers which only yield. And measure loop iterations
>> per second. Won't work? Or average/median duration of one
>> loop iteration.
> 
> Hi! I tested context switches per second.
> The results are in the issue comments:
> https://github.com/tarantool/tarantool/issues/2694#issuecomment-546381304 <https://github.com/tarantool/tarantool/issues/2694#issuecomment-546381304>
> Performance degradation is between 10 and 16 per cent.
> 
>>> 
>>> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
>>> index b813c1739..bea49eb41 100644
>>> --- a/src/lib/core/fiber.c
>>> +++ b/src/lib/core/fiber.c
>>> @@ -37,11 +37,14 @@
>>> #include <stdlib.h>
>>> #include <string.h>
>>> #include <pmatomic.h>
>>> -
>>> #include "assoc.h"
>> 
>> 3. Sorry, looks like stray diff. Could you
>> please drop it?
>> 
> 
> Thanks for noticing! Fixed.
> 
>>> #include "memory.h"
>>> #include "trigger.h"
>>> @@ -82,6 +85,34 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
>>> 	err;								\
>>> })
>>> 
>>> +#if ENABLE_FIBER_TOP
>>> +/**
>>> + * An action performed each time a context switch happens.
>>> + * Used to count each fiber's processing time.
>>> + * This is a macro rather than a function, since it is used
>>> + * in scheduler too.
>> 
>> 4. Hm. Not sure if I understand. Ok, it is used in the scheduler,
>> so why does it prevent making it a function? Caller is always
>> struct fiber *, so you can make it a function taking fiber *
>> parameter, can't you?
> 
> You cannot call functions from scheduler cos it doesn’t have a
> stack AFAIU. I can make it an inline function if you want me to.
> 
>> 
>>> + */
>>> +#define clock_set_on_csw(caller)					\
>>> +({									\
>>> +	uint64_t clock;							\
>>> +	uint32_t cpu_id;		 				\
>>> +	clock = __rdtscp(&cpu_id);					\
>>> +									\
>>> +	if (cpu_id == cord()->cpu_id_last) {				\
>>> +		(caller)->clock_delta += clock - cord()->clock_last;	\
>>> +		cord()->clock_delta += clock - cord()->clock_last;	\
>>> +	} else {							\
>>> +		cord()->cpu_id_last = cpu_id;				\
>>> +		cord()->cpu_miss_count++;				\
>>> +	}								\
>>> +	cord()->clock_last = clock;					\
>>> +})> @@ -584,6 +617,7 @@ fiber_schedule_list(struct rlist *list)
>>> 	}
>>> 	last->caller = fiber();
>>> 	assert(fiber() == &cord()->sched);
>>> +	clock_set_on_csw(fiber());
>> 
>> 5. Am I right, that we need to update CPU consumption each
>> time a context switch happens? 
> 
> Yep.
> 
>> If so, then by logic you
>> would need to call clock_set_on_csw() on each fiber->csw++,
>> correct?
> 
> Right. I also call clock_set_on_csw() on each event loop iteration
> end.
> 
>> 
>> Currently that happens in 2 places: fiber_call_impl() and
>> fiber_yield(). In both these places you already call
>> clock_set_on_csw():
>> 
>>    - You always call clock_set_on_csw() right before
>>      fiber_call_impl();
>> 
>>    - You call clock_set_on_csw() in fiber_yield().
>> 
>> I would then move fiber->cws++ logic into your macros too.
>> So all the context switch and CPU time tracking logic
>> would be in one place.
>> 
>> The only problem is that your macros takes caller, but csw++
>> is done for callee. Probably we could increment it for the
>> caller instead, I don't anything depends on that.
>> 
>> Also it would lead to increment of the scheduler's csw
>> counter. Not sure whether it is ok.
> 
> Loop iteration end isn’t a context switch, but I still update
> clocks here. I don’t want to increment csw here as a side
> effect, so maybe leave csw increment as it is?
> 
>> 
>> Like that (I didn't test it):
>> 
>> ============================================================
>> 
>> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
>> index bea49eb41..e9aac99d0 100644
>> --- a/src/lib/core/fiber.c
>> +++ b/src/lib/core/fiber.c
>> @@ -105,6 +105,7 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
>> 		cord()->cpu_id_last = cpu_id;				\
>> 		cord()->cpu_miss_count++;				\
>> 	}								\
>> +	caller->csw++;							\
>> 	cord()->clock_last = clock;					\
>> })
>> 
>> @@ -258,7 +259,7 @@ fiber_call_impl(struct fiber *callee)
>> 	cord->fiber = callee;
>> 
>> 	callee->flags &= ~FIBER_IS_READY;
>> -	callee->csw++;
>> +	clock_set_on_csw(caller);
>> 	ASAN_START_SWITCH_FIBER(asan_state, 1,
>> 				callee->stack,
>> 				callee->stack_size);
>> @@ -277,7 +278,6 @@ fiber_call(struct fiber *callee)
>> 	/** By convention, these triggers must not throw. */
>> 	if (! rlist_empty(&caller->on_yield))
>> 		trigger_run(&caller->on_yield, NULL);
>> -	clock_set_on_csw(caller);
>> 	callee->caller = caller;
>> 	callee->flags |= FIBER_IS_READY;
>> 	caller->flags |= FIBER_IS_READY;
>> @@ -511,7 +511,6 @@ fiber_yield(void)
>> 	assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
>> 	assert(! (callee->flags & FIBER_IS_DEAD));
>> 	cord->fiber = callee;
>> -	callee->csw++;
>> 	callee->flags &= ~FIBER_IS_READY;
>> 	ASAN_START_SWITCH_FIBER(asan_state,
>> 				(caller->flags & FIBER_IS_DEAD) == 0,
>> @@ -617,7 +616,6 @@ fiber_schedule_list(struct rlist *list)
>> 	}
>> 	last->caller = fiber();
>> 	assert(fiber() == &cord()->sched);
>> -	clock_set_on_csw(fiber());
>> 	fiber_call_impl(first);
>> }
>> 
>> ============================================================
>> 
>>> 	fiber_call_impl(first);
>>> }
>>> 
>>> @@ -1044,6 +1082,107 @@ fiber_destroy_all(struct cord *cord)> +bool
>>> +fiber_top_is_enabled()
>>> +{
>>> +	return fiber_top_enabled;
>>> +}
>>> +
>>> +inline void
>>> +fiber_top_enable()
>> 
>> 6. Please, add 'static' modifier. Looks like
>> this function is used inside this file only.
>> The same for fiber_top_disable().
> 
> It’s used in lua fiber module.
> 
>> 
>>> +{
>>> +	if (!fiber_top_enabled) {
>>> +		ev_prepare_start(cord()->loop, &cord()->prepare_event);
>>> +		ev_check_start(cord()->loop, &cord()->check_event);
>>> +		fiber_top_enabled = true;
>>> +
>>> +		cord()->clock_acc = 0;
>>> +		cord()->cpu_miss_count_last = 0;
>>> +		cord()->clock_delta_last = 0;
>>> +	}
>>> +}
>>> @@ -1077,6 +1216,14 @@ cord_create(struct cord *cord, const char *name)
>>> 	ev_async_init(&cord->wakeup_event, fiber_schedule_wakeup);
>>> 
>>> 	ev_idle_init(&cord->idle_event, fiber_schedule_idle);
>>> +
>>> +#if ENABLE_FIBER_TOP
>>> +	/* fiber.top <http://fiber.top/>() currently works only for the main thread. */
>>> +	if (cord_is_main()) {
>>> +		fiber_top_init();
>>> +		fiber_top_enable();
>> 
>> 7. I think, we need to enable top by default only
>> when we are sure that it does not affect performance.
> 
> Ok, disabled it by default. User can turn it on with
> fiber.top_enable(). This also resolved the issue I had
> with your swim test.
> 
> Looking at perf difference, maybe we can turn it on by default?
> 
>> 
>>> +	}
>>> +#endif /* ENABLE_FIBER_TOP */
>>> 	cord_set_name(name);
>>> 
>>> #if ENABLE_ASAN
>>> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
>>> index 336be60a2..2adff1536 100644
>>> --- a/src/lua/fiber.c
>>> +++ b/src/lua/fiber.c
>>> @@ -319,6 +319,61 @@ lbox_fiber_statof_nobt(struct fiber *f, void *cb_ctx)
>>> +
>>> +static int
>>> +lbox_fiber_top(struct lua_State *L)
>>> +{
>>> +	if (!fiber_top_is_enabled()) {
>>> +		luaL_error(L, "fiber.top <http://fiber.top/>() is disabled. enable it with"
>> 
>> 8. 'enable' is a first word in the sentence. So probably
>> it is 'Enable’.
> 
> Yep. Done.
> 
>> 
>>> +			      " fiber.top_enable() first");
>>> +	}
>>> +	lua_newtable(L);
>>> +	lua_pushliteral(L, "cpu misses");
>>> +	lua_pushnumber(L, cord()->cpu_miss_count_last);
>>> +	lua_settable(L, -3);
>>> +
>>> +	lbox_fiber_top_entry(&cord()->sched, L);
>>> +	fiber_stat(lbox_fiber_top_entry, L);
>>> +
>>> +	return 1;
>>> +}
>>> diff --git a/test/app/fiber.result b/test/app/fiber.result
>>> index 3c6115a33..196fae3b7 100644
>>> --- a/test/app/fiber.result
>>> +++ b/test/app/fiber.result
>>> @@ -1462,6 +1462,84 @@ fiber.join(fiber.self())
>>> ---
>>> - error: the fiber is not joinable
>>> ...
>>> +sum = 0
>>> +---
>>> +...
>>> +-- gh-2694 fiber.top <http://fiber.top/>()
>>> +a = fiber.top <http://fiber.top/>()
>>> +---
>>> +...
>>> +-- scheduler is present in fiber.top <http://fiber.top/>()
>>> +a[1] ~= nil
>>> +---
>>> +- true
>>> +...
>>> +type(a["cpu misses"]) == 'number'
>>> +---
>>> +- true
>>> +...
>>> +sum_inst = 0
>>> +---
>>> +...
>>> +sum_avg = 0
>>> +---
>>> +...
>>> +test_run:cmd('setopt delimiter ";"')
>> 
>> 9. I would consider usage of '\' instead of a
>> delimiter change for such a short piece of
>> multiline code. But up to you. Just in case
>> you didn't know about '\', because I didn't
>> know either until recently.
>> 
> 
> 
> I didn’t know about ‘\’ as well. Thanks.
> 
> 
> Here’s the incremental diff:
> 
> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> index bea49eb41..883fbac36 100644
> --- a/src/lib/core/fiber.c
> +++ b/src/lib/core/fiber.c
> @@ -37,6 +37,7 @@
>  #include <stdlib.h>
>  #include <string.h>
>  #include <pmatomic.h>
> +
>  #include "assoc.h"
>  #include "memory.h"
>  #include "trigger.h"
> @@ -1221,7 +1222,6 @@ cord_create(struct cord *cord, const char *name)
>  	/* fiber.top <http://fiber.top/>() currently works only for the main thread. */
>  	if (cord_is_main()) {
>  		fiber_top_init();
> -		fiber_top_enable();
>  	}
>  #endif /* ENABLE_FIBER_TOP */
>  	cord_set_name(name);
> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
> index 62f3ccce4..c38bd886f 100644
> --- a/src/lua/fiber.c
> +++ b/src/lua/fiber.c
> @@ -325,13 +325,18 @@ lbox_fiber_top_entry(struct fiber *f, void *cb_ctx)
>  {
>  	struct lua_State *L = (struct lua_State *) cb_ctx;
>  
> -	lua_pushinteger(L, f->fid);
> +	/* Index system fibers by name instead of id */
> +	if (f->fid <= FIBER_ID_MAX_RESERVED) {
> +		lua_pushstring(L, f->name);
> +	} else {
> +		lua_pushinteger(L, f->fid);
> +	}
>  	lua_newtable(L);
>  
> -	lua_pushliteral(L, "cpu average (%)");
> +	lua_pushliteral(L, "average");
>  	lua_pushnumber(L, f->clock_acc / (double)cord()->clock_acc * 100);
>  	lua_settable(L, -3);
> -	lua_pushliteral(L, "cpu instant (%)");
> +	lua_pushliteral(L, "instant");
>  	lua_pushnumber(L, f->clock_delta_last / (double)cord()->clock_delta_last * 100);
>  	lua_settable(L, -3);
>  	lua_settable(L, -3);
> @@ -343,7 +348,7 @@ static int
>  lbox_fiber_top(struct lua_State *L)
>  {
>  	if (!fiber_top_is_enabled()) {
> -		luaL_error(L, "fiber.top <http://fiber.top/>() is disabled. enable it with"
> +		luaL_error(L, "fiber.top <http://fiber.top/>() is disabled. Enable it with"
>  			      " fiber.top_enable() first");
>  	}
>  	lua_newtable(L);
> diff --git a/test/app/fiber.result b/test/app/fiber.result
> index 196fae3b7..e45cd7639 100644
> --- a/test/app/fiber.result
> +++ b/test/app/fiber.result
> @@ -1466,11 +1466,19 @@ sum = 0
>  ---
>  ...
>  -- gh-2694 fiber.top <http://fiber.top/>()
> +fiber.top_enable()
> +---
> +...
>  a = fiber.top <http://fiber.top/>()
>  ---
>  ...
> +type(a)
> +---
> +- table
> +...
>  -- scheduler is present in fiber.top <http://fiber.top/>()
> -a[1] ~= nil
> +-- and is indexed by name
> +a["sched"] ~= nil
>  ---
>  - true
>  ...
> @@ -1484,21 +1492,19 @@ sum_inst = 0
>  sum_avg = 0
>  ---
>  ...
> -test_run:cmd('setopt delimiter ";"')
> ----
> -- true
> -...
> -for k, v in pairs(a) do
> -    if type(v) == 'table' then
> -        sum_inst = sum_inst + v["cpu instant (%)"]
> -        sum_avg = sum_avg + v["cpu average (%)"]
> -    end
> -end;
> +-- update table to make sure
> +-- a full event loop iteration
> +-- has ended
> +a = fiber.top <http://fiber.top/>()
>  ---
>  ...
> -test_run:cmd('setopt delimiter ""');
> +for k, v in pairs(a) do\
> +    if type(v) == 'table' then\
> +        sum_inst = sum_inst + v["instant"]\
> +        sum_avg = sum_avg + v["average"]\
> +    end\
> +end
>  ---
> -- true
>  ...
>  sum_inst
>  ---
> @@ -1518,11 +1524,11 @@ f = fiber.new(function() for i = 1,1000 do end fiber.yield() tbl = fiber.top <http://fiber.top/>()[f
>  while f:status() ~= 'dead' do fiber.sleep(0.01) end
>  ---
>  ...
> -tbl["cpu average (%)"] > 0
> +tbl["average"] > 0
>  ---
>  - true
>  ...
> -tbl["cpu instant (%)"] > 0
> +tbl["instant"] > 0
>  ---
>  - true
>  ...
> @@ -1531,14 +1537,7 @@ fiber.top_disable()
>  ...
>  fiber.top <http://fiber.top/>()
>  ---
> -- error: fiber.top <http://fiber.top/>() is disabled. enable it with fiber.top_enable() first
> -...
> -fiber.top_enable()
> ----
> -...
> -type(fiber.top <http://fiber.top/>())
> ----
> -- table
> +- error: fiber.top <http://fiber.top/>() is disabled. Enable it with fiber.top_enable() first
>  ...
>  -- cleanup
>  test_run:cmd("clear filter")
> diff --git a/test/app/fiber.test.lua b/test/app/fiber.test.lua
> index 15507c2c7..e30aba77e 100644
> --- a/test/app/fiber.test.lua
> +++ b/test/app/fiber.test.lua
> @@ -632,33 +632,39 @@ fiber.join(fiber.self())
>  sum = 0
>  
>  -- gh-2694 fiber.top <http://fiber.top/>()
> +fiber.top_enable()
> +
>  a = fiber.top <http://fiber.top/>()
> +type(a)
>  -- scheduler is present in fiber.top <http://fiber.top/>()
> -a[1] ~= nil
> +-- and is indexed by name
> +a["sched"] ~= nil
>  type(a["cpu misses"]) == 'number'
>  sum_inst = 0
>  sum_avg = 0
> -test_run:cmd('setopt delimiter ";"')
> -for k, v in pairs(a) do
> -    if type(v) == 'table' then
> -        sum_inst = sum_inst + v["cpu instant (%)"]
> -        sum_avg = sum_avg + v["cpu average (%)"]
> -    end
> -end;
> -test_run:cmd('setopt delimiter ""');
> +
> +-- update table to make sure
> +-- a full event loop iteration
> +-- has ended
> +a = fiber.top <http://fiber.top/>()
> +for k, v in pairs(a) do\
> +    if type(v) == 'table' then\
> +        sum_inst = sum_inst + v["instant"]\
> +        sum_avg = sum_avg + v["average"]\
> +    end\
> +end
> +
>  sum_inst
>  -- not exact due to accumulated integer division errors
>  sum_avg > 99 and sum_avg < 101 or sum_avg
>  tbl = nil
>  f = fiber.new(function() for i = 1,1000 do end fiber.yield() tbl = fiber.top <http://fiber.top/>()[fiber.self().id()] end)
>  while f:status() ~= 'dead' do fiber.sleep(0.01) end
> -tbl["cpu average (%)"] > 0
> -tbl["cpu instant (%)"] > 0
> +tbl["average"] > 0
> +tbl["instant"] > 0
>  
>  fiber.top_disable()
>  fiber.top <http://fiber.top/>()
> -fiber.top_enable()
> -type(fiber.top <http://fiber.top/>())
>  
>  -- cleanup
>  test_run:cmd("clear filter")
> 
> --
> Serge Petrenko
> sergepetrenko@tarantool.org <mailto:sergepetrenko@tarantool.org>

[-- Attachment #2: Type: text/html, Size: 56381 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [tarantool-patches] [PATCH v2 2/2] lua: add fiber.top() listing fiber cpu consumption
  2019-10-25 15:13     ` Serge Petrenko
  2019-10-25 16:19       ` Serge Petrenko
@ 2019-10-26 18:00       ` Vladislav Shpilevoy
  2019-10-28 16:16         ` Serge Petrenko
  1 sibling, 1 reply; 9+ messages in thread
From: Vladislav Shpilevoy @ 2019-10-26 18:00 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, tarantool-patches

Hi! Thanks for the fixes!

>>>      115:
>>>        cpu average (%): 5.4571279061075
>>>        cpu instant (%): 5.9653973440576
>>>      120:
>>>        cpu average (%): 21.944382148464
>>>        cpu instant (%): 23.849021825646
>>>      116:
>>>        cpu average (%): 8.6603872318158
>>>        cpu instant (%): 9.6812031335093
>>>      119:
>>>        cpu average (%): 21.933168871944
>>>        cpu instant (%): 20.007540530351
>>>      cpu misses: 0
>>>      118:
>>>        cpu average (%): 19.342901995963
>>>        cpu instant (%): 16.932679820703
>>>      117:
>>>        cpu average (%): 11.549674814981
>>>        cpu instant (%): 13.128901177161
>>>    ...
>>>    ```
>>
>> 2. This is super cool!
>>
>> Could we give names to predefined fibers? For
>> example, make an alias top.scheduler = top[1].
>>
>> So as in the console output I would see:
>>
>>    ---
>>    - scheduler:
>>        cpu average (%): 98.230148883023
>>        cpu instant (%): 100
>>
>> Instead of '1'. Because personally I didn't
>> even know that 1 is always the scheduler, and
>> I would be confused to see a regular fiber
>> consumed 99% of time in a console.
> 
> Hmm, what about this?
> 
> --- a/src/lua/fiber.c
> +++ b/src/lua/fiber.c
> @@ -325,13 +325,18 @@ lbox_fiber_top_entry(struct fiber *f, void *cb_ctx)
>  {
>         struct lua_State *L = (struct lua_State *) cb_ctx;
>  
> -       lua_pushinteger(L, f->fid);
> +       /* Index system fibers by name instead of id */
> +       if (f->fid <= FIBER_ID_MAX_RESERVED) {
> +               lua_pushstring(L, f->name);
> +       } else {
> +               lua_pushinteger(L, f->fid);
> +       }
>         lua_newtable(L);
> 
> 

What about always using a name, when it is not empty? When a
system is running, it might be hard to find to which fiber
an ID belongs. Or 'name:id' like in logs (or whatever format
we use in the logs).

For example I started a vshard recovery fiber, it somewhy
consumes 99% CPU, but I don't know its ID. And I would need
to use vshard.storage.internal.recovery_fiber:id() to find it.

Up to you, and probably to Kirill since this is public API.

>>> #include "memory.h"
>>> #include "trigger.h"
>>> @@ -82,6 +85,34 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
>>> err;\
>>> })
>>>
>>> +#if ENABLE_FIBER_TOP
>>> +/**
>>> + * An action performed each time a context switch happens.
>>> + * Used to count each fiber's processing time.
>>> + * This is a macro rather than a function, since it is used
>>> + * in scheduler too.
>>
>> 4. Hm. Not sure if I understand. Ok, it is used in the scheduler,
>> so why does it prevent making it a function? Caller is always
>> struct fiber *, so you can make it a function taking fiber *
>> parameter, can't you?
> 
> You cannot call functions from scheduler cos it doesn’t have a
> stack AFAIU. I can make it an inline function if you want me to.
> 

Does not have a stack? Then how does it work at all? You use
stack in your macros - you declare a couple of variables. Also
intermediate arithmetic results might be saved on the stack.

Talking of an 'inline' function - if that will work, then it
would look better IMO. At least it won't raise questions why
it is not a function.

>> 5. Am I right, that we need to update CPU consumption each
>> time a context switch happens? 
> 
> Yep.
> 
>> If so, then by logic you
>> would need to call clock_set_on_csw() on each fiber->csw++,
>> correct?
> 
> Right. I also call clock_set_on_csw() on each event loop iteration
> end.
> 
>>
>> Currently that happens in 2 places: fiber_call_impl() and
>> fiber_yield(). In both these places you already call
>> clock_set_on_csw():
>>
>>    - You always call clock_set_on_csw() right before
>>      fiber_call_impl();
>>
>>    - You call clock_set_on_csw() in fiber_yield().
>>
>> I would then move fiber->cws++ logic into your macros too.
>> So all the context switch and CPU time tracking logic
>> would be in one place.
>>
>> The only problem is that your macros takes caller, but csw++
>> is done for callee. Probably we could increment it for the
>> caller instead, I don't anything depends on that.
>>
>> Also it would lead to increment of the scheduler's csw
>> counter. Not sure whether it is ok.
> 
> Loop iteration end isn’t a context switch, but I still update
> clocks here.

Yes, you update, but not via clock_set_on_csw(). Am I wrong?
The only fiber which gets updated in the end of the loop via
clock_set_on_csw() is the scheduler. For other fibers you
manually update clock_acc, clock_delta_last, clock_delta in
loop_on_iteration_end().

> I don’t want to increment csw here as a side
> effect, so maybe leave csw increment as it is?
> 

If you are talking about the scheduler's csw counter, then
I don't think it matters. 'Csw' is just statistics. It is
not used for anything. Besides, why a switch to the scheduler
does not count? End of the loop = switch to the scheduler, so
it means its csw should be incremented, shouldn't it?

>>> ev_idle_init(&cord->idle_event, fiber_schedule_idle);
>>> +
>>> +#if ENABLE_FIBER_TOP
>>> +/* fiber.top <http://fiber.top>() currently works only for the main thread. */
>>> +if (cord_is_main()) {
>>> +fiber_top_init();
>>> +fiber_top_enable();
>>
>> 7. I think, we need to enable top by default only
>> when we are sure that it does not affect performance.
> 
> Ok, disabled it by default. User can turn it on with
> fiber.top_enable(). This also resolved the issue I had
> with your swim test.
> 
> Looking at perf difference, maybe we can turn it on by default?

IMO, 10-16% looks quite significant taking into account,
that 99% of users won't use it at all, and others only for
debug of rare exceptional cases. For which they can call
enable(), get top(), and disable() back. I think, we should
ask Kirill. I am against default on.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [tarantool-patches] [PATCH v2 2/2] lua: add fiber.top() listing fiber cpu consumption
  2019-10-26 18:00       ` Vladislav Shpilevoy
@ 2019-10-28 16:16         ` Serge Petrenko
  2019-10-28 23:00           ` Vladislav Shpilevoy
  0 siblings, 1 reply; 9+ messages in thread
From: Serge Petrenko @ 2019-10-28 16:16 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches, tarantool-patches

Hi! Thanks for your answer!

I’ve updated the branch, please take a look at the incremental diff below.

> 26 окт. 2019 г., в 21:00, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> написал(а):
> 
> Hi! Thanks for the fixes!
> 
>>>>      115:
>>>>        cpu average (%): 5.4571279061075
>>>>        cpu instant (%): 5.9653973440576
>>>>      120:
>>>>        cpu average (%): 21.944382148464
>>>>        cpu instant (%): 23.849021825646
>>>>      116:
>>>>        cpu average (%): 8.6603872318158
>>>>        cpu instant (%): 9.6812031335093
>>>>      119:
>>>>        cpu average (%): 21.933168871944
>>>>        cpu instant (%): 20.007540530351
>>>>      cpu misses: 0
>>>>      118:
>>>>        cpu average (%): 19.342901995963
>>>>        cpu instant (%): 16.932679820703
>>>>      117:
>>>>        cpu average (%): 11.549674814981
>>>>        cpu instant (%): 13.128901177161
>>>>    ...
>>>>    ```
>>> 
>>> 2. This is super cool!
>>> 
>>> Could we give names to predefined fibers? For
>>> example, make an alias top.scheduler = top[1].
>>> 
>>> So as in the console output I would see:
>>> 
>>>    ---
>>>    - scheduler:
>>>        cpu average (%): 98.230148883023
>>>        cpu instant (%): 100
>>> 
>>> Instead of '1'. Because personally I didn't
>>> even know that 1 is always the scheduler, and
>>> I would be confused to see a regular fiber
>>> consumed 99% of time in a console.
>> 
>> Hmm, what about this?
>> 
>> --- a/src/lua/fiber.c
>> +++ b/src/lua/fiber.c
>> @@ -325,13 +325,18 @@ lbox_fiber_top_entry(struct fiber *f, void *cb_ctx)
>>  {
>>         struct lua_State *L = (struct lua_State *) cb_ctx;
>>  
>> -       lua_pushinteger(L, f->fid);
>> +       /* Index system fibers by name instead of id */
>> +       if (f->fid <= FIBER_ID_MAX_RESERVED) {
>> +               lua_pushstring(L, f->name);
>> +       } else {
>> +               lua_pushinteger(L, f->fid);
>> +       }
>>         lua_newtable(L);
>> 
>> 
> 
> What about always using a name, when it is not empty? When a
> system is running, it might be hard to find to which fiber
> an ID belongs. Or 'name:id' like in logs (or whatever format
> we use in the logs).
> 
> For example I started a vshard recovery fiber, it somewhy
> consumes 99% CPU, but I don't know its ID. And I would need
> to use vshard.storage.internal.recovery_fiber:id() to find it.
> 
> Up to you, and probably to Kirill since this is public API.
> 

Let’s stick with `id/name` just like in the logs then.

>>>> #include "memory.h"
>>>> #include "trigger.h"
>>>> @@ -82,6 +85,34 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
>>>> err;\
>>>> })
>>>> 
>>>> +#if ENABLE_FIBER_TOP
>>>> +/**
>>>> + * An action performed each time a context switch happens.
>>>> + * Used to count each fiber's processing time.
>>>> + * This is a macro rather than a function, since it is used
>>>> + * in scheduler too.
>>> 
>>> 4. Hm. Not sure if I understand. Ok, it is used in the scheduler,
>>> so why does it prevent making it a function? Caller is always
>>> struct fiber *, so you can make it a function taking fiber *
>>> parameter, can't you?
>> 
>> You cannot call functions from scheduler cos it doesn’t have a
>> stack AFAIU. I can make it an inline function if you want me to.
>> 
> 
> Does not have a stack? Then how does it work at all? You use
> stack in your macros - you declare a couple of variables. Also
> intermediate arithmetic results might be saved on the stack.
> 

Yes, you’re right, thanks. It was a strange belief of mine.

> Talking of an 'inline' function - if that will work, then it
> would look better IMO. At least it won't raise questions why
> it is not a function.

Will do then.

> 
>>> 5. Am I right, that we need to update CPU consumption each
>>> time a context switch happens? 
>> 
>> Yep.
>> 
>>> If so, then by logic you
>>> would need to call clock_set_on_csw() on each fiber->csw++,
>>> correct?
>> 
>> Right. I also call clock_set_on_csw() on each event loop iteration
>> end.
>> 
>>> 
>>> Currently that happens in 2 places: fiber_call_impl() and
>>> fiber_yield(). In both these places you already call
>>> clock_set_on_csw():
>>> 
>>>    - You always call clock_set_on_csw() right before
>>>      fiber_call_impl();
>>> 
>>>    - You call clock_set_on_csw() in fiber_yield().
>>> 
>>> I would then move fiber->cws++ logic into your macros too.
>>> So all the context switch and CPU time tracking logic
>>> would be in one place.
>>> 
>>> The only problem is that your macros takes caller, but csw++
>>> is done for callee. Probably we could increment it for the
>>> caller instead, I don't anything depends on that.
>>> 
>>> Also it would lead to increment of the scheduler's csw
>>> counter. Not sure whether it is ok.
>> 
>> Loop iteration end isn’t a context switch, but I still update
>> clocks here.
> 
> Yes, you update, but not via clock_set_on_csw(). Am I wrong?
> The only fiber which gets updated in the end of the loop via
> clock_set_on_csw() is the scheduler. For other fibers you
> manually update clock_acc, clock_delta_last, clock_delta in
> loop_on_iteration_end().

Yep, I was talking about the sched.

> 
>> I don’t want to increment csw here as a side
>> effect, so maybe leave csw increment as it is?
>> 
> 
> If you are talking about the scheduler's csw counter, then
> I don't think it matters. 'Csw' is just statistics. It is
> not used for anything. Besides, why a switch to the scheduler
> does not count? End of the loop = switch to the scheduler, so
> it means its csw should be incremented, shouldn't it?

Ok

> 
>>>> ev_idle_init(&cord->idle_event, fiber_schedule_idle);
>>>> +
>>>> +#if ENABLE_FIBER_TOP
>>>> +/* fiber.top <http://fiber.top>() currently works only for the main thread. */
>>>> +if (cord_is_main()) {
>>>> +fiber_top_init();
>>>> +fiber_top_enable();
>>> 
>>> 7. I think, we need to enable top by default only
>>> when we are sure that it does not affect performance.
>> 
>> Ok, disabled it by default. User can turn it on with
>> fiber.top_enable(). This also resolved the issue I had
>> with your swim test.
>> 
>> Looking at perf difference, maybe we can turn it on by default?
> 
> IMO, 10-16% looks quite significant taking into account,
> that 99% of users won't use it at all, and others only for
> debug of rare exceptional cases. For which they can call
> enable(), get top(), and disable() back. I think, we should
> ask Kirill. I am against default on.

Ok

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index e3fe77e21..4c646b278 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -87,31 +87,32 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
 })
 
 #if ENABLE_FIBER_TOP
+static bool fiber_top_enabled = false;
+
 /**
  * An action performed each time a context switch happens.
  * Used to count each fiber's processing time.
- * This is a macro rather than a function, since it is used
- * in scheduler too.
  */
-#define clock_set_on_csw(caller)					     \
-({									     \
-	if (fiber_top_enabled) {					     \
-		uint64_t clock;						     \
-		uint32_t cpu_id;		 			     \
-		clock = __rdtscp(&cpu_id);				     \
-									     \
-		if (cpu_id == cord()->cpu_id_last) {			     \
-			(caller)->clock_delta += clock - cord()->clock_last; \
-			cord()->clock_delta += clock - cord()->clock_last;   \
-		} else {						     \
-			cord()->cpu_id_last = cpu_id;			     \
-			cord()->cpu_miss_count++;			     \
-		}							     \
-		cord()->clock_last = clock;				     \
-	}								     \
-})
+static inline void
+clock_set_on_csw(struct fiber *caller)
+{
+	caller->csw++;
+	if (!fiber_top_enabled)
+		return;
 
-static bool fiber_top_enabled = false;
+	uint64_t clock;
+	uint32_t cpu_id;
+	clock = __rdtscp(&cpu_id);
+
+	if (cpu_id == cord()->cpu_id_last) {
+		caller->clock_delta += clock - cord()->clock_last;
+		cord()->clock_delta += clock - cord()->clock_last;
+	} else {
+		cord()->cpu_id_last = cpu_id;
+		cord()->cpu_miss_count++;
+	}
+	cord()->clock_last = clock;
+}
 
 #else
 #define clock_set_on_csw(caller) ;
@@ -262,7 +263,6 @@ fiber_call_impl(struct fiber *callee)
 	cord->fiber = callee;
 
 	callee->flags &= ~FIBER_IS_READY;
-	callee->csw++;
 	ASAN_START_SWITCH_FIBER(asan_state, 1,
 				callee->stack,
 				callee->stack_size);
@@ -515,7 +515,6 @@ fiber_yield(void)
 	assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
 	assert(! (callee->flags & FIBER_IS_DEAD));
 	cord->fiber = callee;
-	callee->csw++;
 	callee->flags &= ~FIBER_IS_READY;
 	ASAN_START_SWITCH_FIBER(asan_state,
 				(caller->flags & FIBER_IS_DEAD) == 0,
diff --git a/src/lua/fiber.c b/src/lua/fiber.c
index c38bd886f..a30290bfa 100644
--- a/src/lua/fiber.c
+++ b/src/lua/fiber.c
@@ -324,13 +324,11 @@ static int
 lbox_fiber_top_entry(struct fiber *f, void *cb_ctx)
 {
 	struct lua_State *L = (struct lua_State *) cb_ctx;
+	char name_buf[64];
+
+	snprintf(name_buf, sizeof(name_buf), "%u/%s", f->fid, f->name);
+	lua_pushstring(L, name_buf);
 
-	/* Index system fibers by name instead of id */
-	if (f->fid <= FIBER_ID_MAX_RESERVED) {
-		lua_pushstring(L, f->name);
-	} else {
-		lua_pushinteger(L, f->fid);
-	}
 	lua_newtable(L);
 
 	lua_pushliteral(L, "average");
diff --git a/test/app/fiber.result b/test/app/fiber.result
index e45cd7639..f9904d465 100644
--- a/test/app/fiber.result
+++ b/test/app/fiber.result
@@ -1478,7 +1478,7 @@ type(a)
 ...
 -- scheduler is present in fiber.top()
 -- and is indexed by name
-a["sched"] ~= nil
+a["1/sched"] ~= nil
 ---
 - true
 ...
@@ -1518,7 +1518,11 @@ sum_avg > 99 and sum_avg < 101 or sum_avg
 tbl = nil
 ---
 ...
-f = fiber.new(function() for i = 1,1000 do end fiber.yield() tbl = fiber.top()[fiber.self().id()] end)
+f = fiber.new(function()\
+    for i = 1,1000 do end\
+    fiber.yield()\
+    tbl = fiber.top()[fiber.self().id()..'/'..fiber.self().name()]\
+end)
 ---
 ...
 while f:status() ~= 'dead' do fiber.sleep(0.01) end
diff --git a/test/app/fiber.test.lua b/test/app/fiber.test.lua
index e30aba77e..7f320e318 100644
--- a/test/app/fiber.test.lua
+++ b/test/app/fiber.test.lua
@@ -638,7 +638,7 @@ a = fiber.top()
 type(a)
 -- scheduler is present in fiber.top()
 -- and is indexed by name
-a["sched"] ~= nil
+a["1/sched"] ~= nil
 type(a["cpu misses"]) == 'number'
 sum_inst = 0
 sum_avg = 0
@@ -658,7 +658,11 @@ sum_inst
 -- not exact due to accumulated integer division errors
 sum_avg > 99 and sum_avg < 101 or sum_avg
 tbl = nil
-f = fiber.new(function() for i = 1,1000 do end fiber.yield() tbl = fiber.top()[fiber.self().id()] end)
+f = fiber.new(function()\
+    for i = 1,1000 do end\
+    fiber.yield()\
+    tbl = fiber.top()[fiber.self().id()..'/'..fiber.self().name()]\
+end)
 while f:status() ~= 'dead' do fiber.sleep(0.01) end
 tbl["average"] > 0
 tbl["instant"] > 0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [tarantool-patches] [PATCH v2 2/2] lua: add fiber.top() listing fiber cpu consumption
  2019-10-28 16:16         ` Serge Petrenko
@ 2019-10-28 23:00           ` Vladislav Shpilevoy
  2019-11-01 14:09             ` Serge Petrenko
  0 siblings, 1 reply; 9+ messages in thread
From: Vladislav Shpilevoy @ 2019-10-28 23:00 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, tarantool-patches

Hi! Thanks for the fixes!

I think it is worth mentioning in the doc request,
that fiber.top may affect performance when enabled, so
users should not enable it when possible.

> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> index e3fe77e21..4c646b278 100644
> --- a/src/lib/core/fiber.c
> +++ b/src/lib/core/fiber.c
> @@ -87,31 +87,32 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
>  })
>  
>  #if ENABLE_FIBER_TOP
> +static bool fiber_top_enabled = false;

I just realized that we might want to make it
thread local (__thread). We have fibers in other
threads, and I don't think we want to turn on fiber
top for them accidentally.

On the other hand sometimes access to a thread local
variable might be an expensive thing in my experience.
So probably we might want not to care about other
threads starting to collect top together with the main.
Just so as to keep performance unaffected when top is
disabled.

I would bench thread local fiber_top_enabled vs
global fiber_top_enabled, both with disabled top.
If they are the same, then lets make it thread local.

Note, this has nothing to do with fiber_top_init(),
which you call for the main thread only. Check for
fiber_top_enabled happens regardless of whether the
top was initialized, in all threads.

Other than that the patch LGTM. I think you need to
get a second review from Georgy or Alexander Tu.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Tarantool-patches] [tarantool-patches] [PATCH v2 2/2] lua: add fiber.top() listing fiber cpu consumption
  2019-10-28 23:00           ` Vladislav Shpilevoy
@ 2019-11-01 14:09             ` Serge Petrenko
  0 siblings, 0 replies; 9+ messages in thread
From: Serge Petrenko @ 2019-11-01 14:09 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches, tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 2126 bytes --]

Thanks for reviewing this!

> 29 окт. 2019 г., в 2:00, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> написал(а):
> 
> Hi! Thanks for the fixes!
> 
> I think it is worth mentioning in the doc request,
> that fiber.top may affect performance when enabled, so
> users should not enable it when possible.

I added the warning to the doc request.

> 
>> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
>> index e3fe77e21..4c646b278 100644
>> --- a/src/lib/core/fiber.c
>> +++ b/src/lib/core/fiber.c
>> @@ -87,31 +87,32 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
>> })
>> 
>> #if ENABLE_FIBER_TOP
>> +static bool fiber_top_enabled = false;
> 
> I just realized that we might want to make it
> thread local (__thread). We have fibers in other
> threads, and I don't think we want to turn on fiber
> top for them accidentally.
> 
> On the other hand sometimes access to a thread local
> variable might be an expensive thing in my experience.
> So probably we might want not to care about other
> threads starting to collect top together with the main.
> Just so as to keep performance unaffected when top is
> disabled.
> 
> I would bench thread local fiber_top_enabled vs
> global fiber_top_enabled, both with disabled top.
> If they are the same, then lets make it thread local.
> 

I’ve tested your proposal. The difference is ±2%.
I can post the results if you want me to.
So, let it be `static __thread bool fiber_top_enabled`.

> Note, this has nothing to do with fiber_top_init(),
> which you call for the main thread only. Check for
> fiber_top_enabled happens regardless of whether the
> top was initialized, in all threads.
> 
> Other than that the patch LGTM. I think you need to
> get a second review from Georgy or Alexander Tu.
> 

I’ve resent v3 with the fixes mentioned above and one additional feature
requested by Mons (added «time» entry listing cpu time consumed by each
fiber both to fiber.info <http://fiber.info/>() and fiber.top <http://fiber.top/>())

--
Serge Petrenko
sergepetrenko@tarantool.org




[-- Attachment #2: Type: text/html, Size: 3997 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-11-01 14:09 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <cover.1570546695.git.sergepetrenko@tarantool.org>
     [not found] ` <eb924da60bc66cbd93d7920f53d6c332d15ffab8.1570546695.git.sergepetrenko@tarantool.org>
2019-10-22 21:18   ` [Tarantool-patches] [tarantool-patches] [PATCH v2 2/2] lua: add fiber.top() listing fiber cpu consumption Vladislav Shpilevoy
2019-10-25 15:13     ` Serge Petrenko
2019-10-25 16:19       ` Serge Petrenko
2019-10-26 18:00       ` Vladislav Shpilevoy
2019-10-28 16:16         ` Serge Petrenko
2019-10-28 23:00           ` Vladislav Shpilevoy
2019-11-01 14:09             ` Serge Petrenko
     [not found] ` <52e7822bbcd802528d448c15ce9e9fbe4479c73a.1570546695.git.sergepetrenko@tarantool.org>
2019-10-22 21:20   ` [Tarantool-patches] [tarantool-patches] [PATCH v2 1/2] test: modify swim_run_test to break event loop Vladislav Shpilevoy
2019-10-25 15:15     ` Serge Petrenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox