Hi! Thanks for your reply! -- Serge Petrenko sergepetrenko@tarantool.org > 20 нояб. 2019 г., в 4:44, Alexander Turenko написал(а): > > It still shows miscompares for me: > > $ cat /proc/cpuinfo | grep processor | wc -l > 8 > $ (cd test && ./test-run.py $(yes fiber.test.lua | head -n 1000)) > > commit aefc64faaf7bfbb58ac53bf5abea55d1faabcbfe > > The miscompare looks so: > > [006] Test failed! Result content mismatch: > [006] --- app/fiber.result Tue Nov 19 20:39:21 2019 > [006] +++ app/fiber.reject Wed Nov 20 04:33:39 2019 > [006] @@ -1525,7 +1525,7 @@ > [006] -- disable the test above. > [006] sum_avg <= 100 or sum_avg > [006] --- > [006] -- true > [006] +- 100 > [006] ... > [006] -- not exact due to accumulated integer division errors > [006] --sum_avg > 99 and sum_avg <= 100 or sum_avg > > Maybe it is some rounding issue. > > When I changed the condition to sum_avg <= 101, I got: > > [015] Test failed! Result content mismatch: > [015] --- app/fiber.result Wed Nov 20 04:35:14 2019 > [015] +++ app/fiber.reject Wed Nov 20 04:35:36 2019 > [015] @@ -1512,7 +1512,7 @@ > [015] ... > [015] sum_inst > [015] --- > [015] -- 100 > [015] +- 45.802853351342 > [015] ... > [015] -- when a fiber dies, its impact on the thread moving average > [015] -- persists for a couple of ev loop iterations, but it is no > > Sorry, but it does not look okay for me in the sense that it may lead to > problems in testing: we already have enough ones. > > Let's provide a test that will work stable or describe why it is not > possible. In the latter case we can extract an unstable test case and > mark it as fragile using suite.ini test-run's option. It will not > strictly close the problem, but will lower its probability. > > Maybe we can even retry such tests or allows them to fail: > https://github.com/tarantool/test-run/issues/189 > > Anyway, if those fixes descrease probability of fails, I don't mind. > Just noted that a further work may be needed. Well, yes, they do. Just try to run this test before the patch. Anyway, I guess you’re right that we shouldn’t leave flaky tests if possible, so I commented the flaky pieces out and added a request to reenable them once #4625 (List dead fibers in fiber.top () output) is implemented. The diff’s below. Now 1000 out of 1000 tests pass with 16 concurrent jobs, at least on my machine. > > WBR, Alexander Turenko. > > On Tue, Nov 19, 2019 at 11:57:11PM +0100, Vladislav Shpilevoy wrote: >> Hi! Thanks for the fixes! >> >> LGTM. >> >> On 18/11/2019 17:05, Serge Petrenko wrote: >>> The first patch factors all the clock-related members of struct fiber and struct >>> cord into appropriate structs and adds methods for their updates and resets. >>> >>> The second patch fixes exponential moving average calculation so that we do not >>> experience huge numbers in average load percentage calculations. >>> >>> The third patch alters fiber.top() test to wait for correct output before >>> testing it. >>> >>> Follow-up https://github.com/tarantool/tarantool/issues/2694 >>> Branch https://github.com/tarantool/tarantool/tree/sp/gh-2694-test-fixup >>> >>> Changes in v3: >>> - introduce a new patch which refactors all the clock stat >>> handling. >>> - introduce a patch fixing EMA calculation >>> - review fixes as per review from Vladislav >>> >>> Changes in v2: >>> - clean up all fibers clock stats on fiber.top_enable() >>> - push 0 instead of NaN when cord clock_delta_last is 0 >>> - review fixes as per review from Vladislav >>> >>> Serge Petrenko (3): >>> fiber.top() refactor clock and cpu time calculation >>> fiber.top(): alter exponential moving average calculation >>> app/fiber: wait till a full event loop iteration ends >>> >>> src/lib/core/fiber.c | 175 ++++++++++++++++++++++++---------------- >>> src/lib/core/fiber.h | 109 +++++++++++++++++-------- >>> src/lua/fiber.c | 20 +++-- >>> test/app/fiber.result | 38 ++++++--- >>> test/app/fiber.test.lua | 36 +++++++-- >>> 5 files changed, 252 insertions(+), 126 deletions(-) >>> diff --git a/test/app/fiber.result b/test/app/fiber.result index b767bbb59..6d9604ad8 100644 --- a/test/app/fiber.result +++ b/test/app/fiber.result @@ -1510,25 +1510,21 @@ for k, v in pairs(a) do\ end --- ... -sum_inst ---- -- 100 -... -- when a fiber dies, its impact on the thread moving average -- persists for a couple of ev loop iterations, but it is no -- longer listed in fiber.top(). So sum_avg may way smaller than --- 100%. See gh-4625 for details and reenable the test below as +-- 100%. See gh-4625 for details and reenable both tests below as -- soon as it is implemented. -- In rare cases when a fiber dies on the same event loop -- iteration as you issue fiber.top(), sum_inst will also be --- smaller than 100%, but it is so rare I don't even want to --- disable the test above. -sum_avg <= 100 or sum_avg +-- smaller than 100%. +-- sum_inst +sum_avg <= 100.1 or sum_avg --- - true ... -- not exact due to accumulated integer division errors ---sum_avg > 99 and sum_avg <= 100 or sum_avg +--sum_avg > 99 and sum_avg <= 100.1 or sum_avg tbl = nil --- ... diff --git a/test/app/fiber.test.lua b/test/app/fiber.test.lua index 7f7350acc..6df210d9c 100644 --- a/test/app/fiber.test.lua +++ b/test/app/fiber.test.lua @@ -658,19 +658,18 @@ for k, v in pairs(a) do\ sum_avg = sum_avg + v["average"]\ end -sum_inst -- when a fiber dies, its impact on the thread moving average -- persists for a couple of ev loop iterations, but it is no -- longer listed in fiber.top(). So sum_avg may way smaller than --- 100%. See gh-4625 for details and reenable the test below as +-- 100%. See gh-4625 for details and reenable both tests below as -- soon as it is implemented. -- In rare cases when a fiber dies on the same event loop -- iteration as you issue fiber.top(), sum_inst will also be --- smaller than 100%, but it is so rare I don't even want to --- disable the test above. -sum_avg <= 100 or sum_avg +-- smaller than 100%. +-- sum_inst +sum_avg <= 100.1 or sum_avg -- not exact due to accumulated integer division errors ---sum_avg > 99 and sum_avg <= 100 or sum_avg +--sum_avg > 99 and sum_avg <= 100.1 or sum_avg tbl = nil f = fiber.new(function()\ local fiber_key = fiber.self().id()..'/'..fiber.self().name()\