From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp51.i.mail.ru (smtp51.i.mail.ru [94.100.177.111]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 6836A42EF5C for ; Mon, 29 Jun 2020 15:19:00 +0300 (MSK) Date: Mon, 29 Jun 2020 15:18:08 +0300 From: Alexander Turenko Message-ID: <20200629121808.x57t46jhbobagvzd@tkn_work_nb> References: <1bc869413c7d116ba5fcdd2a05c756729fedd7d4.1593084206.git.avtikhon@tarantool.org> <340e8840f089856d52fcda52a925aa2b0f3437b6.1593084206.git.avtikhon@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <340e8840f089856d52fcda52a925aa2b0f3437b6.1593084206.git.avtikhon@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH v2 2/2] test: fix flaky box/net.box_readahead_gh-3958 test List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Alexander V. Tikhonov" Cc: tarantool-patches@dev.tarantool.org Please, polish the commit message. The fix itself is okay. It would be counter-productive to play ping-pong around the problem description, so let's do your best and ask Kirill to push. No need to review the messages with me again. I'm okay after clarification of details pointed below. WBR, Alexaner Turenko. On Thu, Jun 25, 2020 at 02:25:11PM +0300, Alexander V. Tikhonov wrote: > Issue: > > [014] --- box/net.box_readahead_gh-3958.result Mon Jun 15 15:33:23 2020 > [014] +++ box/net.box_readahead_gh-3958.reject Tue Jun 16 02:24:04 2020 > [014] @@ -46,6 +46,7 @@ > [014] ... > [014] test_run:wait_log('default', 'readahead limit is reached', 1024, 0.1) > [014] --- > [014] +- readahead limit is reached > [014] ... > [014] s:drop() > [014] --- > [014] Okay. > [014] Last 15 lines of Tarantool Log file [Instance "box"][/tarantool/test/var/014_box/box.log]: > [014] 2020-06-16 02:24:03.792 [5585] main/121/console/unix/: I> set 'read_only' configuration option to false > [014] 2020-06-16 02:24:03.834 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > [014] 2020-06-16 02:24:03.835 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > [014] 2020-06-16 02:24:03.835 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > [014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > [014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > [014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > [014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > [014] 2020-06-16 02:24:03.837 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > [014] 2020-06-16 02:24:03.837 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > [014] 2020-06-16 02:24:03.837 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > [014] 2020-06-16 02:24:03.951 [5585] main/121/console/unix/: space.h:336 E> ER_NO_SUCH_INDEX_ID: No index #1 is defined in space '_space' > [014] 2020-06-16 02:24:04.180 [5585] main/121/console/unix/: I> set 'readahead' configuration option to 128 > [014] 2020-06-16 02:24:04.183 [5585] main/121/console/unix/: I> set 'readahead' configuration option to 102400 > [014] 2020-06-16 02:24:04.189 [5585] main/453/console/unix/: I> set 'readahead' configuration option to 16320 I think this block is redundant: you already show above that wait_log() found the warning re readahead, while it should not. And provided interpretation below. > > Found that the root cause of the issue, was the previously run test > 'box/net.box_call_blocks_gh-946.test.lua' on the same worker, in this > case the log output mistakenly checked by wait_log/grep_log test_run > function, which finds the grepping string in the log of the previous > test. It is okay. > To avoid of it the tests can be swapped in worker running queue > and in this case both tests pass, check swapped log output: I guess you want to say that when tests are run on different workers or in another order, everything works. Initially I don't get what is 'swapping in worker running queue' and how I can do it. Too much specific terms to express different ordering, IMHO. > > 2020-06-17 10:57:39.881 [69372] main C> entering the event loop > 2020-06-17 10:57:39.896 [69372] main/119/console/unix/: I> set 'readahead' configuration option to 128 > 2020-06-17 10:57:39.898 [69372] main/119/console/unix/: I> set 'readahead' configuration option to 102400 > 2020-06-17 10:57:40.003 [69372] main/156/console/unix/: I> set 'readahead' configuration option to 16320 > 2020-06-17 10:57:40.053 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > 2020-06-17 10:57:40.056 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > 2020-06-17 10:57:40.056 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > 2020-06-17 10:57:40.058 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > 2020-06-17 10:57:40.058 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > 2020-06-17 10:57:40.061 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > 2020-06-17 10:57:40.061 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > 2020-06-17 10:57:40.062 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > 2020-06-17 10:57:40.062 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > 2020-06-17 10:57:40.063 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached > 2020-06-17 10:57:40.067 [69372] main C> got signal 15 - Terminated I don't think that this big log is useful. Proper description what is going on is enough. > > Also found that 'readahead' issue from the first test blocks its > printing to log file due to suppressed. To fix this issue the > default server must be restarted at the very start of the test. I know the context, but I would not understand it from the message if I would not know already. You describe another possible fix with printing a 1 KiB log entry before a test case and then check a last 1 KiB of the log for the 'readahead limit reached' message. You should describe it (maybe even provide a diff) and describe the reason why it does not work: ratelimiting of such messages may lead to disappearing some amount of messages when its pressure is descreased below the limit (nothing triggers printing of suspended messages if such messages are not triggered anymore). Let's experiment with this and describe your observations. > > Closes #5082 > --- > > Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5082-net-box-readahead > Issue: https://github.com/tarantool/tarantool/issues/5082 > index 1e33a84cb..f34240b65 100644 > --- a/test/box/net.box_readahead_gh-3958.test.lua > +++ b/test/box/net.box_readahead_gh-3958.test.lua > @@ -1,4 +1,5 @@ > test_run = require('test_run').new() > +test_run:cmd('restart server default') > net = require('net.box') Okay. But I would leave a short comment here, like 'discard all logs from previous tests and clear log ratelimiting statistics'.