Date: Wed, 20 Nov 2019 15:19:02 +0300
From: Alexander Turenko
To: Serge Petrenko
Cc: tarantool-patches@dev.tarantool.org, Vladislav Shpilevoy
Message-ID: <20191120121901.zzcegkygtkh62l5m@tkn_work_nb>
Subject: Re: [Tarantool-patches] [PATCH v3 0/3] fiber.top(): minor fixup

On Wed, Nov 20, 2019 at 11:29:26AM +0300, Serge Petrenko wrote:
> Hi! Thanks for your reply!
> --
> Serge Petrenko
> sergepetrenko@tarantool.org
>
> > On 20 Nov 2019, at 4:44, Alexander Turenko wrote:
> >
> > It still shows miscompares for me:
> >
> > $ cat /proc/cpuinfo | grep processor | wc -l
> > 8
> > $ (cd test && ./test-run.py $(yes fiber.test.lua | head -n 1000))
> >
> > commit aefc64faaf7bfbb58ac53bf5abea55d1faabcbfe
> >
> > The miscompare looks like this:
> >
> > [006] Test failed! Result content mismatch:
> > [006] --- app/fiber.result Tue Nov 19 20:39:21 2019
> > [006] +++ app/fiber.reject Wed Nov 20 04:33:39 2019
> > [006] @@ -1525,7 +1525,7 @@
> > [006] -- disable the test above.
> > [006] sum_avg <= 100 or sum_avg
> > [006] ---
> > [006] -- true
> > [006] +- 100
> > [006] ...
> > [006] -- not exact due to accumulated integer division errors
> > [006] --sum_avg > 99 and sum_avg <= 100 or sum_avg
> >
> > Maybe it is some rounding issue.
> >
> > When I changed the condition to sum_avg <= 101, I got:
> >
> > [015] Test failed! Result content mismatch:
> > [015] --- app/fiber.result Wed Nov 20 04:35:14 2019
> > [015] +++ app/fiber.reject Wed Nov 20 04:35:36 2019
> > [015] @@ -1512,7 +1512,7 @@
> > [015] ...
> > [015] sum_inst
> > [015] ---
> > [015] -- 100
> > [015] +- 45.802853351342
> > [015] ...
> > [015] -- when a fiber dies, its impact on the thread moving average
> > [015] -- persists for a couple of ev loop iterations, but it is no
> >
> > Sorry, but it does not look okay to me, in the sense that it may lead to
> > problems in testing: we already have enough of them.
> >
> > Let's provide a test that works stably or describe why that is not
> > possible. In the latter case we can extract the unstable test case and
> > mark it as fragile using test-run's suite.ini option. It will not
> > strictly close the problem, but it will lower the probability of failures.
> >
> > Maybe we can even retry such tests or allow them to fail:
> > https://github.com/tarantool/test-run/issues/189
> >
> > Anyway, if those fixes decrease the probability of failures, I don't mind.
> > I just noted that further work may be needed.
>
> Well, yes, they do. Just try to run this test before the patch.
> Anyway, I guess you’re right that we shouldn’t leave flaky tests if
> possible, so I commented the flaky pieces out and added a request to
> re-enable them once #4625 (List dead fibers in fiber.top () output) is
> implemented.
>
> The diff’s below. Now 1000 out of 1000 tests pass with 16 concurrent
> jobs, at least on my machine.

Verified 79579e399ff08017581ba58895d06bd7da54f67d. There are no more
failures in the fiber.top()-related cases. So, I'm okay now. This is not
a formal LGTM, because I did not review the whole patchset; it is just my
personal feeling.
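This concerns the `sum_avg <= 100` miscompare quoted above, where the reject file shows `100` even though the comparison returned false. One possible explanation is that the accumulated sum lands a few ulps above 100. That explanation assumes two things: the values are IEEE 754 doubles, and the console prints numbers with 14 significant digits. The plain Lua sketch below shows the same effect on small numbers and adds a tolerance-based check. `sum_within_limit` and the `1e-9` tolerance are hypothetical and are not part of fiber.test.lua.

```lua
-- Hypothetical helper (not from fiber.test.lua): sums a list of
-- percentages and compares the total against `limit` with a small
-- tolerance `eps` instead of a strict `<=`.
local function sum_within_limit(values, limit, eps)
    local sum = 0
    for _, v in ipairs(values) do
        sum = sum + v
    end
    return sum <= limit + eps, sum
end

-- With IEEE 754 doubles, 0.1 + 0.2 evaluates to 0.30000000000000004,
-- so a strict comparison fails although the exact sum equals 0.3.
local strict_ok = (0.1 + 0.2 <= 0.3)

-- Formatted with 14 significant digits, the same value looks exactly
-- like the bound, which mirrors `sum_avg <= 100` failing while `100`
-- is shown (assuming the console uses a similar precision).
local shown = string.format('%.14g', 0.1 + 0.2)

-- The tolerance-based check accepts the accumulated rounding error.
local tolerant_ok, total = sum_within_limit({0.1, 0.2}, 0.3, 1e-9)

print(strict_ok, shown, tolerant_ok, total)
```

A tolerance only masks rounding error. It would not explain the `sum_inst` miscompare quoted above (`45.802853351342` instead of `100`), because that gap is far too large to be a rounding effect.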
NB: There are failures that are unrelated to fiber.top():

[012] --- app/fiber.result Wed Nov 20 14:39:10 2019
[012] +++ app/fiber.reject Wed Nov 20 14:40:57 2019
[012] @@ -675,8 +675,8 @@
[012] -- attempt to access local storage of dead fiber raises error
[012] pcall(function(f) return f.storage end, f)
[012] ---
[012] -- false
[012] -- '[string "return pcall(function(f) return f.storage end..."]:1: the fiber is dead'
[012] +- true
[012] +- key: some value
[012] ...
[012] --
[012] -- Test that local storage is garbage collected when fiber is died

I ran 1000 jobs in 16 threads three times and got 5, 8 and 6 failures.
This is not about fiber.top(); I just noted it.