Hi! Thanks for your reply!
--
Serge Petrenko
sergepetrenko@tarantool.org




20 нояб. 2019 г., в 4:44, Alexander Turenko <alexander.turenko@tarantool.org> написал(а):

It still shows miscompares for me:

$ cat /proc/cpuinfo | grep processor | wc -l
8
$ (cd test && ./test-run.py $(yes fiber.test.lua | head -n 1000))

commit aefc64faaf7bfbb58ac53bf5abea55d1faabcbfe

The miscompare looks so:

[006] Test failed! Result content mismatch:
[006] --- app/fiber.result Tue Nov 19 20:39:21 2019
[006] +++ app/fiber.reject Wed Nov 20 04:33:39 2019
[006] @@ -1525,7 +1525,7 @@
[006]  -- disable the test above.
[006]  sum_avg <= 100 or sum_avg
[006]  ---
[006] -- true
[006] +- 100
[006]  ...
[006]  -- not exact due to accumulated integer division errors
[006]  --sum_avg > 99 and sum_avg <= 100 or sum_avg

Maybe it is some rounding issue.

When I changed the condition to sum_avg <= 101, I got:

[015] Test failed! Result content mismatch:
[015] --- app/fiber.result Wed Nov 20 04:35:14 2019
[015] +++ app/fiber.reject Wed Nov 20 04:35:36 2019
[015] @@ -1512,7 +1512,7 @@
[015]  ...
[015]  sum_inst
[015]  ---
[015] -- 100
[015] +- 45.802853351342
[015]  ...
[015]  -- when a fiber dies, its impact on the thread moving average
[015]  -- persists for a couple of ev loop iterations, but it is no

Sorry, but it does not look okay for me in the sense that it may lead to
problems in testing: we already have enough ones.

Let's provide a test that will work stable or describe why it is not
possible. In the latter case we can extract an unstable test case and
mark it as fragile using suite.ini test-run's option. It will not
strictly close the problem, but will lower its probability.

Maybe we can even retry such tests or allows them to fail:
https://github.com/tarantool/test-run/issues/189

Anyway, if those fixes descrease probability of fails, I don't mind.
Just noted that a further work may be needed.

Well, yes, they do. Just try to run this test before the patch.
Anyway, I guess you’re right that we shouldn’t leave flaky tests if
possible, so I commented the flaky pieces out and added a request to
reenable them once #4625 (List dead fibers in fiber.top() output) is
implemented.

The diff’s below. Now 1000 out of 1000 tests pass with 16 concurrent
jobs, at least on my machine.



WBR, Alexander Turenko.

On Tue, Nov 19, 2019 at 11:57:11PM +0100, Vladislav Shpilevoy wrote:
Hi! Thanks for the fixes!

LGTM.

On 18/11/2019 17:05, Serge Petrenko wrote:
The first patch factors all the clock-related members of struct fiber and struct
cord into appropriate structs and adds methods for their updates and resets.

The second patch fixes exponential moving average calculation so that we do not
experience huge numbers in average load percentage calculations.

The third patch alters fiber.top() test to wait for correct output before
testing it.

Follow-up https://github.com/tarantool/tarantool/issues/2694
Branch https://github.com/tarantool/tarantool/tree/sp/gh-2694-test-fixup

Changes in v3:
- introduce a new patch which refactors all the clock stat
  handling.
- introduce a patch fixing EMA calculation
- review fixes as per review from Vladislav

Changes in v2:
- clean up all fibers clock stats on fiber.top_enable()
- push 0 instead of NaN when cord clock_delta_last is 0
- review fixes as per review from Vladislav

Serge Petrenko (3):
 fiber.top() refactor clock and cpu time calculation
 fiber.top(): alter exponential moving average calculation
 app/fiber: wait till a full event loop iteration ends

src/lib/core/fiber.c    | 175 ++++++++++++++++++++++++----------------
src/lib/core/fiber.h    | 109 +++++++++++++++++--------
src/lua/fiber.c         |  20 +++--
test/app/fiber.result   |  38 ++++++---
test/app/fiber.test.lua |  36 +++++++--
5 files changed, 252 insertions(+), 126 deletions(-)



diff --git a/test/app/fiber.result b/test/app/fiber.result
index b767bbb59..6d9604ad8 100644
--- a/test/app/fiber.result
+++ b/test/app/fiber.result
@@ -1510,25 +1510,21 @@ for k, v in pairs(a) do\
 end
 ---
 ...
-sum_inst
----
-- 100
-...
 -- when a fiber dies, its impact on the thread moving average
 -- persists for a couple of ev loop iterations, but it is no
 -- longer listed in fiber.top(). So sum_avg may way smaller than
--- 100%. See gh-4625 for details and reenable the test below as
+-- 100%. See gh-4625 for details and reenable both tests below as
 -- soon as it is implemented.
 -- In rare cases when a fiber dies on the same event loop
 -- iteration as you issue fiber.top(), sum_inst will also be
--- smaller than 100%, but it is so rare I don't even want to
--- disable the test above.
-sum_avg <= 100 or sum_avg
+-- smaller than 100%.
+-- sum_inst
+sum_avg <= 100.1 or sum_avg
 ---
 - true
 ...
 -- not exact due to accumulated integer division errors
---sum_avg > 99 and sum_avg <= 100 or sum_avg
+--sum_avg > 99 and sum_avg <= 100.1 or sum_avg
 tbl = nil
 ---
 ...
diff --git a/test/app/fiber.test.lua b/test/app/fiber.test.lua
index 7f7350acc..6df210d9c 100644
--- a/test/app/fiber.test.lua
+++ b/test/app/fiber.test.lua
@@ -658,19 +658,18 @@ for k, v in pairs(a) do\
     sum_avg = sum_avg + v["average"]\
 end

 

-sum_inst
 -- when a fiber dies, its impact on the thread moving average
 -- persists for a couple of ev loop iterations, but it is no
 -- longer listed in fiber.top(). So sum_avg may way smaller than
--- 100%. See gh-4625 for details and reenable the test below as
+-- 100%. See gh-4625 for details and reenable both tests below as
 -- soon as it is implemented.
 -- In rare cases when a fiber dies on the same event loop
 -- iteration as you issue fiber.top(), sum_inst will also be
--- smaller than 100%, but it is so rare I don't even want to
--- disable the test above.
-sum_avg <= 100 or sum_avg
+-- smaller than 100%.
+-- sum_inst
+sum_avg <= 100.1 or sum_avg
 -- not exact due to accumulated integer division errors
---sum_avg > 99 and sum_avg <= 100 or sum_avg
+--sum_avg > 99 and sum_avg <= 100.1 or sum_avg
 tbl = nil
 f = fiber.new(function()\
     local fiber_key = fiber.self().id()..'/'..fiber.self().name()\