Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH v3] lua: add fiber.top() listing fiber cpu consumption
@ 2019-11-01 14:05 Serge Petrenko
  2019-11-02 17:12 ` Vladislav Shpilevoy
  2019-11-09  6:53 ` Kirill Yukhin
  0 siblings, 2 replies; 5+ messages in thread
From: Serge Petrenko @ 2019-11-01 14:05 UTC (permalink / raw)
  To: v.shpilevoy, georgy; +Cc: tarantool-patches, tarantool-patches

Implement a new function in Lua fiber library: top(). It returns a table
of alive fibers (including the scheduler). Each table entry has two
fields: average cpu consumption, which is calculated with exponential
moving average over event loop iterations, and current cpu consumption,
which shows fiber's cpu usage over the last event loop iteration.
The patch relies on CPU timestamp counter to measure each fiber's time
share.

Closes #2694

@TarantoolBot document
Title: fiber: new function `fiber.top()`

`fiber.top()` returns a table of all alive fibers and lists their cpu
consumption. Let's take a look at the example:
```
tarantool> fiber.top()
---
- 104/lua:
    instant: 18.433514726042
    time: 0.677505865
    average: 21.98826143184
  103/lua:
    instant: 19.131392015951
    time: 0.689521917
    average: 20.807772656431
  107/lua:
    instant: 18.624600174469
    time: 0.681585168
    average: 17.78194117452
  101/on_shutdown:
    instant: 0
    time: 0
    average: 0
  105/lua:
    instant: 18.562289702156
    time: 0.682085309
    average: 15.513811055476
  106/lua:
    instant: 18.441822789017
    time: 0.677320271
    average: 15.427595583115
  102/interactive:
    instant: 0
    time: 0.000367182
    average: 0
  cpu misses: 0
  1/sched:
    instant: 6.8063805923649
    time: 0.253035056
    average: 8.3479789103691
...

```
In the table above keys are strings containing fiber ids and names
(the only exception is a single 'cpu misses' key which indicates the
number of times tx thread was rescheduled on a different cpu core.
More on that later).
The three metrics available for each fiber are:
1) instant (per cent),
which indicates the share of time fiber was executing during the
previous event loop iteration
2) average (per cent), which is calculated as an exponential moving
average of `instant` values over all previous event loop iterations.
3) time (seconds), which estimates how much cpu time each fiber spent
processing during its lifetime.

More info on `cpu misses` field returned by `fiber.top()`:
`cpu misses` indicates the amount of times tx thread detected it was
rescheduled on a different cpu core during the last event loop
iteration.
fiber.top() uses cpu timestamp counter to measure each fiber's execution
time. However, each cpu core may have its own counter value (you can
only rely on counter deltas if both measurements were taken on the same
core, otherwise the delta may even get negative).
When tx thread is rescheduled to a different cpu core, tarantool just
assumes cpu delta was zero for the latest measurement. This loweres
precision of our computations, so the bigger `cpu misses` value the
lower the precision of fiber.top() results.

Fiber.top() doesn't work on arm architecture at the moment.

Please note, that enabling fiber.top() slows down fiber switching by
about 15 per cent, so it is disabled by default.
To enable it you need to issue `fiber.top_enable()`.
You can disable it back after you finished debugging  using
`fiber.top_disable()`.
"Time" entry is also added to each fibers output in fiber.info()
(it duplicates "time" entry from fiber.top()).
Note, that "time" is only counted while fiber.top is enabled.
---
https://github.com/tarantool/tarantool/issues/2694
https://github.com/tarantool/tarantool/tree/sp/gh-2694-fiber-top

Changes in v3: 
  - disable fiber.top() by default, throw away the 
    erroneous swim test fix from the first patch.
  - improve readability: start indexing fibers by id/name
    instead of plain id. 
  - various review fixes requested by @Gerold103
  - add fiber cpu time accounting: new field in
    fiber.info() for each fiber: "time". Same field
    in fiber.top() for each fiber.

Changes in v2: 
 - fixed unit/swim test hang in the 
   first patch
 - refactoring in second patch


 src/lib/core/fiber.c    | 194 +++++++++++++++++++++++++++++++++++++++-
 src/lib/core/fiber.h    |  70 +++++++++++++++
 src/lua/fiber.c         |  70 +++++++++++++++
 test/app/fiber.result   |  85 ++++++++++++++++++
 test/app/fiber.test.lua |  42 +++++++++
 5 files changed, 459 insertions(+), 2 deletions(-)

diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 93f22ae68..52888cc64 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -42,6 +42,10 @@
 #include "memory.h"
 #include "trigger.h"
 
+#if ENABLE_FIBER_TOP
+#include <x86intrin.h> /* __rdtscp() */
+#endif /* ENABLE_FIBER_TOP */
+
 #include "third_party/valgrind/memcheck.h"
 
 static int (*fiber_invoke)(fiber_func f, va_list ap);
@@ -82,6 +86,38 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
 	err;								\
 })
 
+#if ENABLE_FIBER_TOP
+static __thread bool fiber_top_enabled = false;
+
+/**
+ * An action performed each time a context switch happens.
+ * Used to count each fiber's processing time.
+ */
+static inline void
+clock_set_on_csw(struct fiber *caller)
+{
+	caller->csw++;
+	if (!fiber_top_enabled)
+		return;
+
+	uint64_t clock;
+	uint32_t cpu_id;
+	clock = __rdtscp(&cpu_id);
+
+	if (cpu_id == cord()->cpu_id_last) {
+		caller->clock_delta += clock - cord()->clock_last;
+		cord()->clock_delta += clock - cord()->clock_last;
+	} else {
+		cord()->cpu_id_last = cpu_id;
+		cord()->cpu_miss_count++;
+	}
+	cord()->clock_last = clock;
+}
+
+#else
+#define clock_set_on_csw(caller) ;
+#endif /* ENABLE_FIBER_TOP */
+
 /*
  * Defines a handler to be executed on exit from cord's thread func,
  * accessible via cord()->on_exit (normally NULL). It is used to
@@ -227,7 +263,6 @@ fiber_call_impl(struct fiber *callee)
 	cord->fiber = callee;
 
 	callee->flags &= ~FIBER_IS_READY;
-	callee->csw++;
 	ASAN_START_SWITCH_FIBER(asan_state, 1,
 				callee->stack,
 				callee->stack_size);
@@ -246,6 +281,7 @@ fiber_call(struct fiber *callee)
 	/** By convention, these triggers must not throw. */
 	if (! rlist_empty(&caller->on_yield))
 		trigger_run(&caller->on_yield, NULL);
+	clock_set_on_csw(caller);
 	callee->caller = caller;
 	callee->flags |= FIBER_IS_READY;
 	caller->flags |= FIBER_IS_READY;
@@ -474,11 +510,11 @@ fiber_yield(void)
 	/** By convention, these triggers must not throw. */
 	if (! rlist_empty(&caller->on_yield))
 		trigger_run(&caller->on_yield, NULL);
+	clock_set_on_csw(caller);
 
 	assert(callee->flags & FIBER_IS_READY || callee == &cord->sched);
 	assert(! (callee->flags & FIBER_IS_DEAD));
 	cord->fiber = callee;
-	callee->csw++;
 	callee->flags &= ~FIBER_IS_READY;
 	ASAN_START_SWITCH_FIBER(asan_state,
 				(caller->flags & FIBER_IS_DEAD) == 0,
@@ -584,6 +620,7 @@ fiber_schedule_list(struct rlist *list)
 	}
 	last->caller = fiber();
 	assert(fiber() == &cord()->sched);
+	clock_set_on_csw(fiber());
 	fiber_call_impl(first);
 }
 
@@ -656,6 +693,11 @@ fiber_reset(struct fiber *fiber)
 	rlist_create(&fiber->on_yield);
 	rlist_create(&fiber->on_stop);
 	fiber->flags = FIBER_DEFAULT_FLAGS;
+#if ENABLE_FIBER_TOP
+	fiber->cputime = 0;
+	fiber->clock_acc = 0;
+	fiber->clock_delta = 0;
+#endif /* ENABLE_FIBER_TOP */
 }
 
 /** Destroy an active fiber and prepare it for reuse. */
@@ -1044,6 +1086,147 @@ fiber_destroy_all(struct cord *cord)
 						      struct fiber, link));
 }
 
+#if ENABLE_FIBER_TOP
+static void
+loop_on_iteration_start(ev_loop *loop, ev_check *watcher, int revents)
+{
+	(void) loop;
+	(void) watcher;
+	(void) revents;
+
+	cord()->clock_last = __rdtscp(&cord()->cpu_id_last);
+	cord()->cpu_miss_count = 0;
+
+	/*
+	 * We want to measure thread cpu time here to calculate
+	 * each fiber's cpu time, so don't use libev's ev_now() or
+	 * ev_time() since they use either monotonic or realtime
+	 * system clocks.
+	 */
+	struct timespec ts;
+	if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0) {
+		say_debug("clock_gettime(): failed to get this"
+			  "thread's cpu time.");
+		return;
+	}
+	cord()->cputime_last = (uint64_t) ts.tv_sec * FIBER_TIME_RES +
+					  ts.tv_nsec;
+}
+
+
+/**
+ * Calculate the exponential moving average for the clock deltas
+ * per loop iteration. The coeffitient is 1/16.
+ */
+static inline uint64_t
+clock_diff_accumulate(uint64_t acc, uint64_t delta)
+{
+	if (acc > 0) {
+		return delta / 16 + 15 * acc / 16;
+	} else {
+		return delta;
+	}
+}
+
+static void
+loop_on_iteration_end(ev_loop *loop, ev_prepare *watcher, int revents)
+{
+	(void) loop;
+	(void) watcher;
+	(void) revents;
+	struct fiber *fiber;
+	assert(fiber() == &cord()->sched);
+
+	/*
+	 * Record the scheduler's latest clock change, even though
+	 * it's not a context switch, but an event loop iteration
+	 * end.
+	 */
+	clock_set_on_csw(&cord()->sched);
+
+	cord()->cpu_miss_count_last = cord()->cpu_miss_count;
+	cord()->cpu_miss_count = 0;
+
+	struct timespec ts;
+	uint64_t delta_time;
+	double nsec_per_clock = 0;
+
+	if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0) {
+		say_debug("clock_gettime(): failed to get this"
+			  "thread's cpu time.");
+	} else {
+		delta_time = (uint64_t) ts.tv_sec * FIBER_TIME_RES +
+			     ts.tv_nsec;
+		assert(delta_time > cord()->cputime_last);
+		delta_time -= cord()->cputime_last;
+
+		if (cord()->clock_delta > 0)
+			nsec_per_clock = (double) delta_time / cord()->clock_delta;
+	}
+
+	cord()->clock_acc = clock_diff_accumulate(cord()->clock_acc, cord()->clock_delta);
+	cord()->clock_delta_last = cord()->clock_delta;
+	cord()->clock_delta = 0;
+
+	cord()->sched.clock_acc = clock_diff_accumulate(cord()->sched.clock_acc, cord()->sched.clock_delta);
+	cord()->sched.clock_delta_last = cord()->sched.clock_delta;
+	cord()->sched.cputime += cord()->sched.clock_delta * nsec_per_clock;
+	cord()->sched.clock_delta = 0;
+
+	rlist_foreach_entry(fiber, &cord()->alive, link) {
+		fiber->clock_acc = clock_diff_accumulate(fiber->clock_acc, fiber->clock_delta);
+		fiber->clock_delta_last = fiber->clock_delta;
+		fiber->cputime += fiber->clock_delta * nsec_per_clock;
+		fiber->clock_delta = 0;
+	}
+}
+
+static inline void
+fiber_top_init()
+{
+	ev_prepare_init(&cord()->prepare_event, loop_on_iteration_end);
+	ev_check_init(&cord()->check_event, loop_on_iteration_start);
+}
+
+bool
+fiber_top_is_enabled()
+{
+	return fiber_top_enabled;
+}
+
+inline void
+fiber_top_enable()
+{
+	if (!fiber_top_enabled) {
+		ev_prepare_start(cord()->loop, &cord()->prepare_event);
+		ev_check_start(cord()->loop, &cord()->check_event);
+		fiber_top_enabled = true;
+
+		cord()->clock_acc = 0;
+		cord()->cpu_miss_count_last = 0;
+		cord()->clock_delta_last = 0;
+		struct timespec ts;
+		if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0) {
+			say_debug("clock_gettime(): failed to get this"
+				  "thread's cpu time.");
+			return;
+		}
+		cord()->cputime_last = (uint64_t) ts.tv_sec * FIBER_TIME_RES +
+						  ts.tv_nsec;
+	}
+}
+
+inline void
+fiber_top_disable()
+{
+	if (fiber_top_enabled) {
+		ev_prepare_stop(cord()->loop, &cord()->prepare_event);
+		ev_check_stop(cord()->loop, &cord()->check_event);
+		fiber_top_enabled = false;
+	}
+}
+#endif /* ENABLE_FIBER_TOP */
+
 void
 cord_create(struct cord *cord, const char *name)
 {
@@ -1077,6 +1260,13 @@ cord_create(struct cord *cord, const char *name)
 	ev_async_init(&cord->wakeup_event, fiber_schedule_wakeup);
 
 	ev_idle_init(&cord->idle_event, fiber_schedule_idle);
+
+#if ENABLE_FIBER_TOP
+	/* fiber.top() currently works only for the main thread. */
+	if (cord_is_main()) {
+		fiber_top_init();
+	}
+#endif /* ENABLE_FIBER_TOP */
 	cord_set_name(name);
 
 #if ENABLE_ASAN
diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
index fb168e25e..ab313a050 100644
--- a/src/lib/core/fiber.h
+++ b/src/lib/core/fiber.h
@@ -46,6 +46,21 @@
 
 #include <third_party/coro/coro.h>
 
+/*
+ * Fiber top doesn't work on ARM processors at the moment,
+ * because we haven't chosen an alternative to rdtsc.
+ */
+#ifdef __CC_ARM
+#define ENABLE_FIBER_TOP 0
+#else
+#define ENABLE_FIBER_TOP 1
+#endif
+
+#if ENABLE_FIBER_TOP
+/* A fiber reports used up CPU time with nanosecond resolution. */
+#define FIBER_TIME_RES 1000000000
+#endif /* ENABLE_FIBER_TOP */
+
 #if defined(__cplusplus)
 extern "C" {
 #endif /* defined(__cplusplus) */
@@ -386,6 +401,23 @@ struct fiber {
 	uint32_t fid;
 	/** Fiber flags */
 	uint32_t flags;
+#if ENABLE_FIBER_TOP
+	/**
+	 * Accumulated clock value calculated using exponential
+	 * moving average.
+	 */
+	uint64_t clock_acc;
+	/**
+	 * Total processor time this fiber has spent with
+	 * 1 / FIBER_TIME_RES second precision.
+	 */
+	uint64_t cputime;
+	/**
+	 * Clock delta calculated on previous event loop iteration.
+	 */
+	uint64_t clock_delta_last;
+	uint64_t clock_delta;
+#endif /* ENABLE_FIBER_TOP */
 	/** Link in cord->alive or cord->dead list. */
 	struct rlist link;
 	/** Link in cord->ready list. */
@@ -457,6 +489,23 @@ struct cord {
 	 * reserved.
 	 */
 	uint32_t max_fid;
+#if ENABLE_FIBER_TOP
+	uint64_t clock_acc;
+	uint64_t clock_delta;
+	uint64_t clock_delta_last;
+	uint64_t clock_last;
+	uint32_t cpu_id_last;
+	uint32_t cpu_miss_count;
+	uint32_t cpu_miss_count_last;
+	/**
+	 * This thread's CPU time at the beginning of event loop
+	 * iteration. Used to calculate how much cpu time has
+	 * each loop iteration consumed and update fiber cpu
+	 * times propotionally. The resolution is
+	 * 1 / FIBER_TIME_RES seconds.
+	 */
+	uint64_t cputime_last;
+#endif /* ENABLE_FIBER_TOP */
 	pthread_t id;
 	const struct cord_on_exit *on_exit;
 	/** A helper hash to map id -> fiber. */
@@ -482,6 +531,16 @@ struct cord {
 	 * is no 1 ms delay in case of zero sleep timeout.
 	 */
 	ev_idle idle_event;
+#if ENABLE_FIBER_TOP
+	/** An event triggered on every event loop iteration start. */
+	ev_check check_event;
+	/**
+	 * An event triggered on every event loop iteration end.
+	 * Just like the event above it is used in per-fiber cpu
+	 * time calculations.
+	 */
+	ev_prepare prepare_event;
+#endif /* ENABLE_FIBER_TOP */
 	/** A memory cache for (struct fiber) */
 	struct mempool fiber_mempool;
 	/** A runtime slab cache for general use in this cord. */
@@ -625,6 +684,17 @@ typedef int (*fiber_stat_cb)(struct fiber *f, void *ctx);
 int
 fiber_stat(fiber_stat_cb cb, void *cb_ctx);
 
+#if ENABLE_FIBER_TOP
+bool
+fiber_top_is_enabled();
+
+void
+fiber_top_enable();
+
+void
+fiber_top_disable();
+#endif /* ENABLE_FIBER_TOP */
+
 /** Useful for C unit tests */
 static inline int
 fiber_c_invoke(fiber_func f, va_list ap)
diff --git a/src/lua/fiber.c b/src/lua/fiber.c
index 124908a05..a030e444d 100644
--- a/src/lua/fiber.c
+++ b/src/lua/fiber.c
@@ -276,6 +276,10 @@ lbox_fiber_statof(struct fiber *f, void *cb_ctx, bool backtrace)
 	lua_pushnumber(L, f->csw);
 	lua_settable(L, -3);
 
+	lua_pushliteral(L, "time");
+	lua_pushnumber(L, f->cputime / (double) FIBER_TIME_RES);
+	lua_settable(L, -3);
+
 	lua_pushliteral(L, "memory");
 	lua_newtable(L);
 	lua_pushstring(L, "used");
@@ -319,6 +323,67 @@ lbox_fiber_statof_nobt(struct fiber *f, void *cb_ctx)
 	return lbox_fiber_statof(f, cb_ctx, false);
 }
 
+#if ENABLE_FIBER_TOP
+static int
+lbox_fiber_top_entry(struct fiber *f, void *cb_ctx)
+{
+	struct lua_State *L = (struct lua_State *) cb_ctx;
+	char name_buf[64];
+
+	snprintf(name_buf, sizeof(name_buf), "%u/%s", f->fid, f->name);
+	lua_pushstring(L, name_buf);
+
+	lua_newtable(L);
+
+	lua_pushliteral(L, "average");
+	lua_pushnumber(L, f->clock_acc / (double)cord()->clock_acc * 100);
+	lua_settable(L, -3);
+	lua_pushliteral(L, "instant");
+	lua_pushnumber(L, f->clock_delta_last / (double)cord()->clock_delta_last * 100);
+	lua_settable(L, -3);
+	lua_pushliteral(L, "time");
+	lua_pushnumber(L, f->cputime / (double) FIBER_TIME_RES);
+	lua_settable(L, -3);
+	lua_settable(L, -3);
+
+	return 0;
+}
+
+static int
+lbox_fiber_top(struct lua_State *L)
+{
+	if (!fiber_top_is_enabled()) {
+		luaL_error(L, "fiber.top() is disabled. Enable it with"
+			      " fiber.top_enable() first");
+	}
+	lua_newtable(L);
+	lua_pushliteral(L, "cpu misses");
+	lua_pushnumber(L, cord()->cpu_miss_count_last);
+	lua_settable(L, -3);
+
+	lbox_fiber_top_entry(&cord()->sched, L);
+	fiber_stat(lbox_fiber_top_entry, L);
+
+	return 1;
+}
+
+static int
+lbox_fiber_top_enable(struct lua_State *L)
+{
+	(void) L;
+	fiber_top_enable();
+	return 0;
+}
+
+static int
+lbox_fiber_top_disable(struct lua_State *L)
+{
+	(void) L;
+	fiber_top_disable();
+	return 0;
+}
+#endif /* ENABLE_FIBER_TOP */
+
 /**
  * Return fiber statistics.
  */
@@ -743,6 +808,11 @@ static const struct luaL_Reg lbox_fiber_meta [] = {
 
 static const struct luaL_Reg fiberlib[] = {
 	{"info", lbox_fiber_info},
+#if ENABLE_FIBER_TOP
+	{"top", lbox_fiber_top},
+	{"top_enable", lbox_fiber_top_enable},
+	{"top_disable", lbox_fiber_top_disable},
+#endif /* ENABLE_FIBER_TOP */
 	{"sleep", lbox_fiber_sleep},
 	{"yield", lbox_fiber_yield},
 	{"self", lbox_fiber_self},
diff --git a/test/app/fiber.result b/test/app/fiber.result
index 3c6115a33..3b9e5da9a 100644
--- a/test/app/fiber.result
+++ b/test/app/fiber.result
@@ -1462,6 +1462,91 @@ fiber.join(fiber.self())
 ---
 - error: the fiber is not joinable
 ...
+sum = 0
+---
+...
+-- gh-2694 fiber.top()
+fiber.top_enable()
+---
+...
+a = fiber.top()
+---
+...
+type(a)
+---
+- table
+...
+-- scheduler is present in fiber.top()
+-- and is indexed by name
+a["1/sched"] ~= nil
+---
+- true
+...
+type(a["cpu misses"]) == 'number'
+---
+- true
+...
+sum_inst = 0
+---
+...
+sum_avg = 0
+---
+...
+-- update table to make sure
+-- a full event loop iteration
+-- has ended
+a = fiber.top()
+---
+...
+for k, v in pairs(a) do\
+    if type(v) == 'table' then\
+        sum_inst = sum_inst + v["instant"]\
+        sum_avg = sum_avg + v["average"]\
+    end\
+end
+---
+...
+sum_inst
+---
+- 100
+...
+-- not exact due to accumulated integer division errors
+sum_avg > 99 and sum_avg < 101 or sum_avg
+---
+- true
+...
+tbl = nil
+---
+...
+f = fiber.new(function()\
+    for i = 1,1000 do end\
+    fiber.yield()\
+    tbl = fiber.top()[fiber.self().id()..'/'..fiber.self().name()]\
+end)
+---
+...
+while f:status() ~= 'dead' do fiber.sleep(0.01) end
+---
+...
+tbl["average"] > 0
+---
+- true
+...
+tbl["instant"] > 0
+---
+- true
+...
+tbl["time"] > 0
+---
+- true
+...
+fiber.top_disable()
+---
+...
+fiber.top()
+---
+- error: fiber.top() is disabled. Enable it with fiber.top_enable() first
+...
 -- cleanup
 test_run:cmd("clear filter")
 ---
diff --git a/test/app/fiber.test.lua b/test/app/fiber.test.lua
index c5647b8f2..ce1f55e8d 100644
--- a/test/app/fiber.test.lua
+++ b/test/app/fiber.test.lua
@@ -629,6 +629,48 @@ while f:status() ~= 'dead' do fiber.sleep(0.01) end
 --
 fiber.join(fiber.self())
 
+sum = 0
+
+-- gh-2694 fiber.top()
+fiber.top_enable()
+
+a = fiber.top()
+type(a)
+-- scheduler is present in fiber.top()
+-- and is indexed by name
+a["1/sched"] ~= nil
+type(a["cpu misses"]) == 'number'
+sum_inst = 0
+sum_avg = 0
+
+-- update table to make sure
+-- a full event loop iteration
+-- has ended
+a = fiber.top()
+for k, v in pairs(a) do\
+    if type(v) == 'table' then\
+        sum_inst = sum_inst + v["instant"]\
+        sum_avg = sum_avg + v["average"]\
+    end\
+end
+
+sum_inst
+-- not exact due to accumulated integer division errors
+sum_avg > 99 and sum_avg < 101 or sum_avg
+tbl = nil
+f = fiber.new(function()\
+    for i = 1,1000 do end\
+    fiber.yield()\
+    tbl = fiber.top()[fiber.self().id()..'/'..fiber.self().name()]\
+end)
+while f:status() ~= 'dead' do fiber.sleep(0.01) end
+tbl["average"] > 0
+tbl["instant"] > 0
+tbl["time"] > 0
+
+fiber.top_disable()
+fiber.top()
+
 -- cleanup
 test_run:cmd("clear filter")
 
-- 
2.21.0 (Apple Git-122)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [PATCH v3] lua: add fiber.top() listing fiber cpu consumption
  2019-11-01 14:05 [Tarantool-patches] [PATCH v3] lua: add fiber.top() listing fiber cpu consumption Serge Petrenko
@ 2019-11-02 17:12 ` Vladislav Shpilevoy
  2019-11-05 14:42   ` [Tarantool-patches] [tarantool-patches] " Serge Petrenko
  2019-11-09  6:53 ` Kirill Yukhin
  1 sibling, 1 reply; 5+ messages in thread
From: Vladislav Shpilevoy @ 2019-11-02 17:12 UTC (permalink / raw)
  To: Serge Petrenko, georgy; +Cc: tarantool-patches, tarantool-patches

Hi! Thanks for the patch!

See 3 comments below.

> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> index 93f22ae68..52888cc64 100644
> --- a/src/lib/core/fiber.c
> +++ b/src/lib/core/fiber.c
> @@ -82,6 +86,38 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
> +#if ENABLE_FIBER_TOP
> +static __thread bool fiber_top_enabled = false;
> +
> +/**
> + * An action performed each time a context switch happens.
> + * Used to count each fiber's processing time.
> + */
> +static inline void
> +clock_set_on_csw(struct fiber *caller)
> +{
> +	caller->csw++;
> +	if (!fiber_top_enabled)
> +		return;
> +
> +	uint64_t clock;
> +	uint32_t cpu_id;
> +	clock = __rdtscp(&cpu_id);
> +
> +	if (cpu_id == cord()->cpu_id_last) {
> +		caller->clock_delta += clock - cord()->clock_last;
> +		cord()->clock_delta += clock - cord()->clock_last;
> +	} else {
> +		cord()->cpu_id_last = cpu_id;
> +		cord()->cpu_miss_count++;
> +	}
> +	cord()->clock_last = clock;
> +}
> +
> +#else
> +#define clock_set_on_csw(caller) ;

1. With undefined ENABLE_FIBER_TOP you don't update csw counter.
I would move this #if ENABLE to the clock_set_on_csw() body,
right after csw is incremented.

> +#endif /* ENABLE_FIBER_TOP */
> +
> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
> index 124908a05..a030e444d 100644
> --- a/src/lua/fiber.c
> +++ b/src/lua/fiber.c
> @@ -319,6 +323,67 @@ lbox_fiber_statof_nobt(struct fiber *f, void *cb_ctx)
>  	return lbox_fiber_statof(f, cb_ctx, false);
>  }
>  
> +#if ENABLE_FIBER_TOP
> +static int
> +lbox_fiber_top_entry(struct fiber *f, void *cb_ctx)
> +{
> +	struct lua_State *L = (struct lua_State *) cb_ctx;
> +	char name_buf[64];
> +
> +	snprintf(name_buf, sizeof(name_buf), "%u/%s", f->fid, f->name);
> +	lua_pushstring(L, name_buf);

2. A piece of advice - use tt_sprintf:

    lua_pushstring(L, tt_sprintf("%u/%s", f->fid, f->name));

> +
> +	lua_newtable(L);
> +
> +	lua_pushliteral(L, "average");
> +	lua_pushnumber(L, f->clock_acc / (double)cord()->clock_acc * 100);
> +	lua_settable(L, -3);
> +	lua_pushliteral(L, "instant");
> +	lua_pushnumber(L, f->clock_delta_last / (double)cord()->clock_delta_last * 100);
> +	lua_settable(L, -3);
> +	lua_pushliteral(L, "time");
> +	lua_pushnumber(L, f->cputime / (double) FIBER_TIME_RES);
> +	lua_settable(L, -3);
> +	lua_settable(L, -3);
> +
> +	return 0;
> +}
> diff --git a/test/app/fiber.result b/test/app/fiber.result
> index 3c6115a33..3b9e5da9a 100644
> --- a/test/app/fiber.result
> +++ b/test/app/fiber.result
> @@ -1462,6 +1462,91 @@ fiber.join(fiber.self())
>  ---
>  - error: the fiber is not joinable
>  ...
> +sum = 0
> +---
> +...
> +-- gh-2694 fiber.top()
> +fiber.top_enable()
> +---
> +...
> +a = fiber.top()
> +---
> +...
> +type(a)
> +---
> +- table
> +...
> +-- scheduler is present in fiber.top()
> +-- and is indexed by name
> +a["1/sched"] ~= nil
> +---
> +- true
> +...
> +type(a["cpu misses"]) == 'number'
> +---
> +- true
> +...
> +sum_inst = 0
> +---
> +...
> +sum_avg = 0
> +---
> +...
> +-- update table to make sure
> +-- a full event loop iteration
> +-- has ended
> +a = fiber.top()
> +---
> +...
> +for k, v in pairs(a) do\
> +    if type(v) == 'table' then\

3. This looks hard to use. The fact, that one table contains
records totally different in their structure. I would propose
to return cpu misses and fibers separately:

    fiber.top() =

        cpu_misses = <number>,
        time = [
            '<id>/<name>' = {...},
            '<id>/<name>' = {...},
            '<id>/<name>' = {...},
            ...
        ]

Then you can take fiber.top().time and be sure, that all
records here have the same structure. As far as I remember
we already had similar problems with other statistics, so
it is better to design it know in the most extendible way.
And it will be easier to add new global statistics to the
top in future.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [tarantool-patches] [PATCH v3] lua: add fiber.top() listing fiber cpu consumption
  2019-11-02 17:12 ` Vladislav Shpilevoy
@ 2019-11-05 14:42   ` Serge Petrenko
  2019-11-06 14:02     ` Vladislav Shpilevoy
  0 siblings, 1 reply; 5+ messages in thread
From: Serge Petrenko @ 2019-11-05 14:42 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches, tarantool-patches

[-- Attachment #1: Type: text/plain, Size: 8965 bytes --]

Hi! Thanks for your review!
I pushed the changes on the branch, the diff is below.

--
Serge Petrenko
sergepetrenko@tarantool.org




> 2 нояб. 2019 г., в 20:12, Vladislav Shpilevoy <v.shpilevoy@tarantool.org> написал(а):
> 
> Hi! Thanks for the patch!
> 
> See 3 comments below.
> 
>> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
>> index 93f22ae68..52888cc64 100644
>> --- a/src/lib/core/fiber.c
>> +++ b/src/lib/core/fiber.c
>> @@ -82,6 +86,38 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
>> +#if ENABLE_FIBER_TOP
>> +static __thread bool fiber_top_enabled = false;
>> +
>> +/**
>> + * An action performed each time a context switch happens.
>> + * Used to count each fiber's processing time.
>> + */
>> +static inline void
>> +clock_set_on_csw(struct fiber *caller)
>> +{
>> +	caller->csw++;
>> +	if (!fiber_top_enabled)
>> +		return;
>> +
>> +	uint64_t clock;
>> +	uint32_t cpu_id;
>> +	clock = __rdtscp(&cpu_id);
>> +
>> +	if (cpu_id == cord()->cpu_id_last) {
>> +		caller->clock_delta += clock - cord()->clock_last;
>> +		cord()->clock_delta += clock - cord()->clock_last;
>> +	} else {
>> +		cord()->cpu_id_last = cpu_id;
>> +		cord()->cpu_miss_count++;
>> +	}
>> +	cord()->clock_last = clock;
>> +}
>> +
>> +#else
>> +#define clock_set_on_csw(caller) ;
> 
> 1. With undefined ENABLE_FIBER_TOP you don't update csw counter.
> I would move this #if ENABLE to the clock_set_on_csw() body,
> right after csw is incremented.

Fixed.

> 
>> +#endif /* ENABLE_FIBER_TOP */
>> +
>> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
>> index 124908a05..a030e444d 100644
>> --- a/src/lua/fiber.c
>> +++ b/src/lua/fiber.c
>> @@ -319,6 +323,67 @@ lbox_fiber_statof_nobt(struct fiber *f, void *cb_ctx)
>> 	return lbox_fiber_statof(f, cb_ctx, false);
>> }
>> 
>> +#if ENABLE_FIBER_TOP
>> +static int
>> +lbox_fiber_top_entry(struct fiber *f, void *cb_ctx)
>> +{
>> +	struct lua_State *L = (struct lua_State *) cb_ctx;
>> +	char name_buf[64];
>> +
>> +	snprintf(name_buf, sizeof(name_buf), "%u/%s", f->fid, f->name);
>> +	lua_pushstring(L, name_buf);
> 
> 2. A piece of advice - use tt_sprintf:
> 
>    lua_pushstring(L, tt_sprintf("%u/%s", f->fid, f->name));

Thanks! Changed.

> 
>> +
>> +	lua_newtable(L);
>> +
>> +	lua_pushliteral(L, "average");
>> +	lua_pushnumber(L, f->clock_acc / (double)cord()->clock_acc * 100);
>> +	lua_settable(L, -3);
>> +	lua_pushliteral(L, "instant");
>> +	lua_pushnumber(L, f->clock_delta_last / (double)cord()->clock_delta_last * 100);
>> +	lua_settable(L, -3);
>> +	lua_pushliteral(L, "time");
>> +	lua_pushnumber(L, f->cputime / (double) FIBER_TIME_RES);
>> +	lua_settable(L, -3);
>> +	lua_settable(L, -3);
>> +
>> +	return 0;
>> +}
>> diff --git a/test/app/fiber.result b/test/app/fiber.result
>> index 3c6115a33..3b9e5da9a 100644
>> --- a/test/app/fiber.result
>> +++ b/test/app/fiber.result
>> @@ -1462,6 +1462,91 @@ fiber.join(fiber.self())
>> ---
>> - error: the fiber is not joinable
>> ...
>> +sum = 0
>> +---
>> +...
>> +-- gh-2694 fiber.top()
>> +fiber.top_enable()
>> +---
>> +...
>> +a = fiber.top()
>> +---
>> +...
>> +type(a)
>> +---
>> +- table
>> +...
>> +-- scheduler is present in fiber.top()
>> +-- and is indexed by name
>> +a["1/sched"] ~= nil
>> +---
>> +- true
>> +...
>> +type(a["cpu misses"]) == 'number'
>> +---
>> +- true
>> +...
>> +sum_inst = 0
>> +---
>> +...
>> +sum_avg = 0
>> +---
>> +...
>> +-- update table to make sure
>> +-- a full event loop iteration
>> +-- has ended
>> +a = fiber.top()
>> +---
>> +...
>> +for k, v in pairs(a) do\
>> +    if type(v) == 'table' then\
> 
> 3. This looks hard to use. The fact, that one table contains
> records totally different in their structure. I would propose
> to return cpu misses and fibers separately:
> 
>    fiber.top() =
> 
>        cpu_misses = <number>,
>        time = [
>            '<id>/<name>' = {...},
>            '<id>/<name>' = {...},
>            '<id>/<name>' = {...},
>            ...
>        ]
> 
> Then you can take fiber.top().time and be sure, that all
> records here have the same structure. As far as I remember
> we already had similar problems with other statistics, so
> it is better to design it know in the most extendible way.
> And it will be easier to add new global statistics to the
> top in future.
> 

Good point. I named the subtable `cpu` instead of `time`. It makes more
sense imo. Also renamed `cpu misses` to `cpu_misses` so that it can be
accessed as fiber.top <http://fiber.top/>().cpu_misses


diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 52888cc64..aebaba7f0 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -88,6 +88,7 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
 
 #if ENABLE_FIBER_TOP
 static __thread bool fiber_top_enabled = false;
+#endif /* ENABLE_FIBER_TOP */
 
 /**
  * An action performed each time a context switch happens.
@@ -97,6 +98,8 @@ static inline void
 clock_set_on_csw(struct fiber *caller)
 {
 	caller->csw++;
+
+#if ENABLE_FIBER_TOP
 	if (!fiber_top_enabled)
 		return;
 
@@ -112,12 +115,10 @@ clock_set_on_csw(struct fiber *caller)
 		cord()->cpu_miss_count++;
 	}
 	cord()->clock_last = clock;
-}
-
-#else
-#define clock_set_on_csw(caller) ;
 #endif /* ENABLE_FIBER_TOP */
 
+}
+
 /*
  * Defines a handler to be executed on exit from cord's thread func,
  * accessible via cord()->on_exit (normally NULL). It is used to
diff --git a/src/lua/fiber.c b/src/lua/fiber.c
index a030e444d..8b3b22e55 100644
--- a/src/lua/fiber.c
+++ b/src/lua/fiber.c
@@ -33,6 +33,7 @@
 #include <fiber.h>
 #include "lua/utils.h"
 #include "backtrace.h"
+#include "tt_static.h"
 
 #include <lua.h>
 #include <lauxlib.h>
@@ -328,10 +329,8 @@ static int
 lbox_fiber_top_entry(struct fiber *f, void *cb_ctx)
 {
 	struct lua_State *L = (struct lua_State *) cb_ctx;
-	char name_buf[64];
 
-	snprintf(name_buf, sizeof(name_buf), "%u/%s", f->fid, f->name);
-	lua_pushstring(L, name_buf);
+	lua_pushstring(L, tt_sprintf("%u/%s", f->fid, f->name));
 
 	lua_newtable(L);
 
@@ -357,12 +356,15 @@ lbox_fiber_top(struct lua_State *L)
 			      " fiber.top_enable() first");
 	}
 	lua_newtable(L);
-	lua_pushliteral(L, "cpu misses");
+	lua_pushliteral(L, "cpu_misses");
 	lua_pushnumber(L, cord()->cpu_miss_count_last);
 	lua_settable(L, -3);
 
+	lua_pushliteral(L, "cpu");
+	lua_newtable(L);
 	lbox_fiber_top_entry(&cord()->sched, L);
 	fiber_stat(lbox_fiber_top_entry, L);
+	lua_settable(L, -3);
 
 	return 1;
 }
diff --git a/test/app/fiber.result b/test/app/fiber.result
index 3b9e5da9a..4a094939f 100644
--- a/test/app/fiber.result
+++ b/test/app/fiber.result
@@ -1478,11 +1478,11 @@ type(a)
 ...
 -- scheduler is present in fiber.top()
 -- and is indexed by name
-a["1/sched"] ~= nil
+a.cpu["1/sched"] ~= nil
 ---
 - true
 ...
-type(a["cpu misses"]) == 'number'
+type(a.cpu_misses) == 'number'
 ---
 - true
 ...
@@ -1495,14 +1495,12 @@ sum_avg = 0
 -- update table to make sure
 -- a full event loop iteration
 -- has ended
-a = fiber.top()
+a = fiber.top().cpu
 ---
 ...
 for k, v in pairs(a) do\
-    if type(v) == 'table' then\
-        sum_inst = sum_inst + v["instant"]\
-        sum_avg = sum_avg + v["average"]\
-    end\
+    sum_inst = sum_inst + v["instant"]\
+    sum_avg = sum_avg + v["average"]\
 end
 ---
 ...
@@ -1521,7 +1519,7 @@ tbl = nil
 f = fiber.new(function()\
     for i = 1,1000 do end\
     fiber.yield()\
-    tbl = fiber.top()[fiber.self().id()..'/'..fiber.self().name()]\
+    tbl = fiber.top().cpu[fiber.self().id()..'/'..fiber.self().name()]\
 end)
 ---
 ...
diff --git a/test/app/fiber.test.lua b/test/app/fiber.test.lua
index ce1f55e8d..38b85d554 100644
--- a/test/app/fiber.test.lua
+++ b/test/app/fiber.test.lua
@@ -638,20 +638,18 @@ a = fiber.top()
 type(a)
 -- scheduler is present in fiber.top()
 -- and is indexed by name
-a["1/sched"] ~= nil
-type(a["cpu misses"]) == 'number'
+a.cpu["1/sched"] ~= nil
+type(a.cpu_misses) == 'number'
 sum_inst = 0
 sum_avg = 0
 
 -- update table to make sure
 -- a full event loop iteration
 -- has ended
-a = fiber.top()
+a = fiber.top().cpu
 for k, v in pairs(a) do\
-    if type(v) == 'table' then\
-        sum_inst = sum_inst + v["instant"]\
-        sum_avg = sum_avg + v["average"]\
-    end\
+    sum_inst = sum_inst + v["instant"]\
+    sum_avg = sum_avg + v["average"]\
 end
 
 sum_inst
@@ -661,7 +659,7 @@ tbl = nil
 f = fiber.new(function()\
     for i = 1,1000 do end\
     fiber.yield()\
-    tbl = fiber.top()[fiber.self().id()..'/'..fiber.self().name()]\
+    tbl = fiber.top().cpu[fiber.self().id()..'/'..fiber.self().name()]\
 end)
 while f:status() ~= 'dead' do fiber.sleep(0.01) end
 tbl["average"] > 0


[-- Attachment #2: Type: text/html, Size: 19250 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [tarantool-patches] [PATCH v3] lua: add fiber.top() listing fiber cpu consumption
  2019-11-05 14:42   ` [Tarantool-patches] [tarantool-patches] " Serge Petrenko
@ 2019-11-06 14:02     ` Vladislav Shpilevoy
  0 siblings, 0 replies; 5+ messages in thread
From: Vladislav Shpilevoy @ 2019-11-06 14:02 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches

Hi! Thanks for the fixes! Now LGTM.

On 05/11/2019 17:42, Serge Petrenko wrote:
> Hi! Thanks for your review!
> I pushed the changes on the branch, the diff is below.
> 
> --
> Serge Petrenko
> sergepetrenko@tarantool.org <mailto:sergepetrenko@tarantool.org>
> 
> 
> 
> 
>> 2 нояб. 2019 г., в 20:12, Vladislav Shpilevoy <v.shpilevoy@tarantool.org <mailto:v.shpilevoy@tarantool.org>> написал(а):
>>
>> Hi! Thanks for the patch!
>>
>> See 3 comments below.
>>
>>> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
>>> index 93f22ae68..52888cc64 100644
>>> --- a/src/lib/core/fiber.c
>>> +++ b/src/lib/core/fiber.c
>>> @@ -82,6 +86,38 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
>>> +#if ENABLE_FIBER_TOP
>>> +static __thread bool fiber_top_enabled = false;
>>> +
>>> +/**
>>> + * An action performed each time a context switch happens.
>>> + * Used to count each fiber's processing time.
>>> + */
>>> +static inline void
>>> +clock_set_on_csw(struct fiber *caller)
>>> +{
>>> +caller->csw++;
>>> +if (!fiber_top_enabled)
>>> +return;
>>> +
>>> +uint64_t clock;
>>> +uint32_t cpu_id;
>>> +clock = __rdtscp(&cpu_id);
>>> +
>>> +if (cpu_id == cord()->cpu_id_last) {
>>> +caller->clock_delta += clock - cord()->clock_last;
>>> +cord()->clock_delta += clock - cord()->clock_last;
>>> +} else {
>>> +cord()->cpu_id_last = cpu_id;
>>> +cord()->cpu_miss_count++;
>>> +}
>>> +cord()->clock_last = clock;
>>> +}
>>> +
>>> +#else
>>> +#define clock_set_on_csw(caller) ;
>>
>> 1. With undefined ENABLE_FIBER_TOP you don't update csw counter.
>> I would move this #if ENABLE to the clock_set_on_csw() body,
>> right after csw is incremented.
> 
> Fixed.
> 
>>
>>> +#endif /* ENABLE_FIBER_TOP */
>>> +
>>> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
>>> index 124908a05..a030e444d 100644
>>> --- a/src/lua/fiber.c
>>> +++ b/src/lua/fiber.c
>>> @@ -319,6 +323,67 @@ lbox_fiber_statof_nobt(struct fiber *f, void *cb_ctx)
>>> return lbox_fiber_statof(f, cb_ctx, false);
>>> }
>>>
>>> +#if ENABLE_FIBER_TOP
>>> +static int
>>> +lbox_fiber_top_entry(struct fiber *f, void *cb_ctx)
>>> +{
>>> +struct lua_State *L = (struct lua_State *) cb_ctx;
>>> +char name_buf[64];
>>> +
>>> +snprintf(name_buf, sizeof(name_buf), "%u/%s", f->fid, f->name);
>>> +lua_pushstring(L, name_buf);
>>
>> 2. A piece of advice - use tt_sprintf:
>>
>>    lua_pushstring(L, tt_sprintf("%u/%s", f->fid, f->name));
> 
> Thanks! Changed.
> 
>>
>>> +
>>> +lua_newtable(L);
>>> +
>>> +lua_pushliteral(L, "average");
>>> +lua_pushnumber(L, f->clock_acc / (double)cord()->clock_acc * 100);
>>> +lua_settable(L, -3);
>>> +lua_pushliteral(L, "instant");
>>> +lua_pushnumber(L, f->clock_delta_last / (double)cord()->clock_delta_last * 100);
>>> +lua_settable(L, -3);
>>> +lua_pushliteral(L, "time");
>>> +lua_pushnumber(L, f->cputime / (double) FIBER_TIME_RES);
>>> +lua_settable(L, -3);
>>> +lua_settable(L, -3);
>>> +
>>> +return 0;
>>> +}
>>> diff --git a/test/app/fiber.result b/test/app/fiber.result
>>> index 3c6115a33..3b9e5da9a 100644
>>> --- a/test/app/fiber.result
>>> +++ b/test/app/fiber.result
>>> @@ -1462,6 +1462,91 @@ fiber.join(fiber.self())
>>> ---
>>> - error: the fiber is not joinable
>>> ...
>>> +sum = 0
>>> +---
>>> +...
>>> +-- gh-2694 fiber.top <http://fiber.top>()
>>> +fiber.top_enable()
>>> +---
>>> +...
>>> +a = fiber.top <http://fiber.top>()
>>> +---
>>> +...
>>> +type(a)
>>> +---
>>> +- table
>>> +...
>>> +-- scheduler is present in fiber.top <http://fiber.top>()
>>> +-- and is indexed by name
>>> +a["1/sched"] ~= nil
>>> +---
>>> +- true
>>> +...
>>> +type(a["cpu misses"]) == 'number'
>>> +---
>>> +- true
>>> +...
>>> +sum_inst = 0
>>> +---
>>> +...
>>> +sum_avg = 0
>>> +---
>>> +...
>>> +-- update table to make sure
>>> +-- a full event loop iteration
>>> +-- has ended
>>> +a = fiber.top <http://fiber.top>()
>>> +---
>>> +...
>>> +for k, v in pairs(a) do\
>>> +    if type(v) == 'table' then\
>>
>> 3. This looks hard to use. The fact, that one table contains
>> records totally different in their structure. I would propose
>> to return cpu misses and fibers separately:
>>
>>    fiber.top <http://fiber.top>() =
>>
>>        cpu_misses = <number>,
>>        time = [
>>            '<id>/<name>' = {...},
>>            '<id>/<name>' = {...},
>>            '<id>/<name>' = {...},
>>            ...
>>        ]
>>
>> Then you can take fiber.top <http://fiber.top>().time and be sure, that all
>> records here have the same structure. As far as I remember
>> we already had similar problems with other statistics, so
>> it is better to design it know in the most extendible way.
>> And it will be easier to add new global statistics to the
>> top in future.
>>
> 
> Good point. I named the subtable `cpu` instead of `time`. It makes more
> sense imo. Also renamed `cpu misses` to `cpu_misses` so that it can be
> accessed as fiber.top <http://fiber.top>().cpu_misses
> 
> 
> diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
> index 52888cc64..aebaba7f0 100644
> --- a/src/lib/core/fiber.c
> +++ b/src/lib/core/fiber.c
> @@ -88,6 +88,7 @@ static int (*fiber_invoke)(fiber_func f, va_list ap);
>  
>  #if ENABLE_FIBER_TOP
>  static __thread bool fiber_top_enabled = false;
> +#endif /* ENABLE_FIBER_TOP */
>  
>  /**
>   * An action performed each time a context switch happens.
> @@ -97,6 +98,8 @@ static inline void
>  clock_set_on_csw(struct fiber *caller)
>  {
>  caller->csw++;
> +
> +#if ENABLE_FIBER_TOP
>  if (!fiber_top_enabled)
>  return;
>  
> @@ -112,12 +115,10 @@ clock_set_on_csw(struct fiber *caller)
>  cord()->cpu_miss_count++;
>  }
>  cord()->clock_last = clock;
> -}
> -
> -#else
> -#define clock_set_on_csw(caller) ;
>  #endif /* ENABLE_FIBER_TOP */
>  
> +}
> +
>  /*
>   * Defines a handler to be executed on exit from cord's thread func,
>   * accessible via cord()->on_exit (normally NULL). It is used to
> diff --git a/src/lua/fiber.c b/src/lua/fiber.c
> index a030e444d..8b3b22e55 100644
> --- a/src/lua/fiber.c
> +++ b/src/lua/fiber.c
> @@ -33,6 +33,7 @@
>  #include <fiber.h>
>  #include "lua/utils.h"
>  #include "backtrace.h"
> +#include "tt_static.h"
>  
>  #include <lua.h>
>  #include <lauxlib.h>
> @@ -328,10 +329,8 @@ static int
>  lbox_fiber_top_entry(struct fiber *f, void *cb_ctx)
>  {
>  struct lua_State *L = (struct lua_State *) cb_ctx;
> -char name_buf[64];
>  
> -snprintf(name_buf, sizeof(name_buf), "%u/%s", f->fid, f->name);
> -lua_pushstring(L, name_buf);
> +lua_pushstring(L, tt_sprintf("%u/%s", f->fid, f->name));
>  
>  lua_newtable(L);
>  
> @@ -357,12 +356,15 @@ lbox_fiber_top(struct lua_State *L)
>        " fiber.top_enable() first");
>  }
>  lua_newtable(L);
> -lua_pushliteral(L, "cpu misses");
> +lua_pushliteral(L, "cpu_misses");
>  lua_pushnumber(L, cord()->cpu_miss_count_last);
>  lua_settable(L, -3);
>  
> +lua_pushliteral(L, "cpu");
> +lua_newtable(L);
>  lbox_fiber_top_entry(&cord()->sched, L);
>  fiber_stat(lbox_fiber_top_entry, L);
> +lua_settable(L, -3);
>  
>  return 1;
>  }
> diff --git a/test/app/fiber.result b/test/app/fiber.result
> index 3b9e5da9a..4a094939f 100644
> --- a/test/app/fiber.result
> +++ b/test/app/fiber.result
> @@ -1478,11 +1478,11 @@ type(a)
>  ...
>  -- scheduler is present in fiber.top <http://fiber.top>()
>  -- and is indexed by name
> -a["1/sched"] ~= nil
> +a.cpu["1/sched"] ~= nil
>  ---
>  - true
>  ...
> -type(a["cpu misses"]) == 'number'
> +type(a.cpu_misses) == 'number'
>  ---
>  - true
>  ...
> @@ -1495,14 +1495,12 @@ sum_avg = 0
>  -- update table to make sure
>  -- a full event loop iteration
>  -- has ended
> -a = fiber.top <http://fiber.top>()
> +a = fiber.top <http://fiber.top>().cpu
>  ---
>  ...
>  for k, v in pairs(a) do\
> -    if type(v) == 'table' then\
> -        sum_inst = sum_inst + v["instant"]\
> -        sum_avg = sum_avg + v["average"]\
> -    end\
> +    sum_inst = sum_inst + v["instant"]\
> +    sum_avg = sum_avg + v["average"]\
>  end
>  ---
>  ...
> @@ -1521,7 +1519,7 @@ tbl = nil
>  f = fiber.new(function()\
>      for i = 1,1000 do end\
>      fiber.yield()\
> -    tbl = fiber.top <http://fiber.top>()[fiber.self().id()..'/'..fiber.self().name()]\
> +    tbl = fiber.top <http://fiber.top>().cpu[fiber.self().id()..'/'..fiber.self().name()]\
>  end)
>  ---
>  ...
> diff --git a/test/app/fiber.test.lua b/test/app/fiber.test.lua
> index ce1f55e8d..38b85d554 100644
> --- a/test/app/fiber.test.lua
> +++ b/test/app/fiber.test.lua
> @@ -638,20 +638,18 @@ a = fiber.top <http://fiber.top>()
>  type(a)
>  -- scheduler is present in fiber.top <http://fiber.top>()
>  -- and is indexed by name
> -a["1/sched"] ~= nil
> -type(a["cpu misses"]) == 'number'
> +a.cpu["1/sched"] ~= nil
> +type(a.cpu_misses) == 'number'
>  sum_inst = 0
>  sum_avg = 0
>  
>  -- update table to make sure
>  -- a full event loop iteration
>  -- has ended
> -a = fiber.top <http://fiber.top>()
> +a = fiber.top <http://fiber.top>().cpu
>  for k, v in pairs(a) do\
> -    if type(v) == 'table' then\
> -        sum_inst = sum_inst + v["instant"]\
> -        sum_avg = sum_avg + v["average"]\
> -    end\
> +    sum_inst = sum_inst + v["instant"]\
> +    sum_avg = sum_avg + v["average"]\
>  end
>  
>  sum_inst
> @@ -661,7 +659,7 @@ tbl = nil
>  f = fiber.new(function()\
>      for i = 1,1000 do end\
>      fiber.yield()\
> -    tbl = fiber.top <http://fiber.top>()[fiber.self().id()..'/'..fiber.self().name()]\
> +    tbl = fiber.top <http://fiber.top>().cpu[fiber.self().id()..'/'..fiber.self().name()]\
>  end)
>  while f:status() ~= 'dead' do fiber.sleep(0.01) end
>  tbl["average"] > 0
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Tarantool-patches] [tarantool-patches] [PATCH v3] lua: add fiber.top() listing fiber cpu consumption
  2019-11-01 14:05 [Tarantool-patches] [PATCH v3] lua: add fiber.top() listing fiber cpu consumption Serge Petrenko
  2019-11-02 17:12 ` Vladislav Shpilevoy
@ 2019-11-09  6:53 ` Kirill Yukhin
  1 sibling, 0 replies; 5+ messages in thread
From: Kirill Yukhin @ 2019-11-09  6:53 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, tarantool-patches

Hello,

On 01 ноя 17:05, Serge Petrenko wrote:
> Implement a new function in Lua fiber library: top(). It returns a table
> of alive fibers (including the scheduler). Each table entry has two
> fields: average cpu consumption, which is calculated with exponential
> moving average over event loop iterations, and current cpu consumption,
> which shows fiber's cpu usage over the last event loop iteration.
> The patch relies on CPU timestamp counter to measure each fiber's time
> share.
> 
> Closes #2694
> 
> @TarantoolBot document
> Title: fiber: new function `fiber.top()`
> 
> `fiber.top()` returns a table of all alive fibers and lists their cpu
> consumption. Let's take a look at the example:
> ```
> tarantool> fiber.top()
> ---
> - 104/lua:
>     instant: 18.433514726042
>     time: 0.677505865
>     average: 21.98826143184
>   103/lua:
>     instant: 19.131392015951
>     time: 0.689521917
>     average: 20.807772656431
>   107/lua:
>     instant: 18.624600174469
>     time: 0.681585168
>     average: 17.78194117452
>   101/on_shutdown:
>     instant: 0
>     time: 0
>     average: 0
>   105/lua:
>     instant: 18.562289702156
>     time: 0.682085309
>     average: 15.513811055476
>   106/lua:
>     instant: 18.441822789017
>     time: 0.677320271
>     average: 15.427595583115
>   102/interactive:
>     instant: 0
>     time: 0.000367182
>     average: 0
>   cpu misses: 0
>   1/sched:
>     instant: 6.8063805923649
>     time: 0.253035056
>     average: 8.3479789103691
> ...
> 
> ```
> In the table above keys are strings containing fiber ids and names
> (the only exception is a single 'cpu misses' key which indicates the
> number of times tx thread was rescheduled on a different cpu core.
> More on that later).
> The three metrics available for each fiber are:
> 1) instant (per cent),
> which indicates the share of time fiber was executing during the
> previous event loop iteration
> 2) average (per cent), which is calculated as an exponential moving
> average of `instant` values over all previous event loop iterations.
> 3) time (seconds), which estimates how much cpu time each fiber spent
> processing during its lifetime.
> 
> More info on `cpu misses` field returned by `fiber.top()`:
> `cpu misses` indicates the amount of times tx thread detected it was
> rescheduled on a different cpu core during the last event loop
> iteration.
> fiber.top() uses cpu timestamp counter to measure each fiber's execution
> time. However, each cpu core may have its own counter value (you can
> only rely on counter deltas if both measurements were taken on the same
> core, otherwise the delta may even get negative).
> When tx thread is rescheduled to a different cpu core, tarantool just
> assumes cpu delta was zero for the latest measurement. This loweres
> precision of our computations, so the bigger `cpu misses` value the
> lower the precision of fiber.top() results.
> 
> Fiber.top() doesn't work on arm architecture at the moment.
> 
> Please note, that enabling fiber.top() slows down fiber switching by
> about 15 per cent, so it is disabled by default.
> To enable it you need to issue `fiber.top_enable()`.
> You can disable it back after you finished debugging  using
> `fiber.top_disable()`.
> "Time" entry is also added to each fibers output in fiber.info()
> (it duplicates "time" entry from fiber.top()).
> Note, that "time" is only counted while fiber.top is enabled.
> ---
> https://github.com/tarantool/tarantool/issues/2694
> https://github.com/tarantool/tarantool/tree/sp/gh-2694-fiber-top

I've checked your patch into master.

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-11-09  6:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-01 14:05 [Tarantool-patches] [PATCH v3] lua: add fiber.top() listing fiber cpu consumption Serge Petrenko
2019-11-02 17:12 ` Vladislav Shpilevoy
2019-11-05 14:42   ` [Tarantool-patches] [tarantool-patches] " Serge Petrenko
2019-11-06 14:02     ` Vladislav Shpilevoy
2019-11-09  6:53 ` Kirill Yukhin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox