[Tarantool-patches] [PATCH v2 2/2] test: fix flaky box/net.box_readahead_gh-3958 test

Alexander Turenko alexander.turenko at tarantool.org
Mon Jun 29 15:18:08 MSK 2020


Please, polish the commit message. The fix itself is okay.

It would be counter-productive to play ping-pong around the problem
description, so let's do your best and ask Kirill to push. No need to
review the messages with me again.

I'm okay after clarification of details pointed below.

WBR, Alexaner Turenko.

On Thu, Jun 25, 2020 at 02:25:11PM +0300, Alexander V. Tikhonov wrote:
> Issue:
> 
> [014] --- box/net.box_readahead_gh-3958.result Mon Jun 15 15:33:23 2020
> [014] +++ box/net.box_readahead_gh-3958.reject Tue Jun 16 02:24:04 2020
> [014] @@ -46,6 +46,7 @@
> [014]  ...
> [014]  test_run:wait_log('default', 'readahead limit is reached', 1024, 0.1)
> [014]  ---
> [014] +- readahead limit is reached
> [014]  ...
> [014]  s:drop()
> [014]  ---
> [014]

Okay.

> [014] Last 15 lines of Tarantool Log file [Instance "box"][/tarantool/test/var/014_box/box.log]:
> [014] 2020-06-16 02:24:03.792 [5585] main/121/console/unix/: I> set 'read_only' configuration option to false
> [014] 2020-06-16 02:24:03.834 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> [014] 2020-06-16 02:24:03.835 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> [014] 2020-06-16 02:24:03.835 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> [014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> [014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> [014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> [014] 2020-06-16 02:24:03.836 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> [014] 2020-06-16 02:24:03.837 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> [014] 2020-06-16 02:24:03.837 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> [014] 2020-06-16 02:24:03.837 [5585] iproto iproto.cc:606 W> stopping input on connection fd 26, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> [014] 2020-06-16 02:24:03.951 [5585] main/121/console/unix/: space.h:336 E> ER_NO_SUCH_INDEX_ID: No index #1 is defined in space '_space'
> [014] 2020-06-16 02:24:04.180 [5585] main/121/console/unix/: I> set 'readahead' configuration option to 128
> [014] 2020-06-16 02:24:04.183 [5585] main/121/console/unix/: I> set 'readahead' configuration option to 102400
> [014] 2020-06-16 02:24:04.189 [5585] main/453/console/unix/: I> set 'readahead' configuration option to 16320

I think this block is redundant: you already show above that wait_log()
found the warning re readahead, while it should not. And provided
interpretation below.

> 
> Found that the root cause of the issue, was the previously run test
> 'box/net.box_call_blocks_gh-946.test.lua' on the same worker, in this
> case the log output mistakenly checked by wait_log/grep_log test_run
> function, which finds the grepping string in the log of the previous
> test.

It is okay.

> To avoid of it the tests can be swapped in worker running queue
> and in this case both tests pass, check swapped log output:

I guess you want to say that when tests are run on different workers or
in another order, everything works. Initially I don't get what is
'swapping in worker running queue' and how I can do it. Too much
specific terms to express different ordering, IMHO.

> 
> 2020-06-17 10:57:39.881 [69372] main C> entering the event loop
> 2020-06-17 10:57:39.896 [69372] main/119/console/unix/: I> set 'readahead' configuration option to 128
> 2020-06-17 10:57:39.898 [69372] main/119/console/unix/: I> set 'readahead' configuration option to 102400
> 2020-06-17 10:57:40.003 [69372] main/156/console/unix/: I> set 'readahead' configuration option to 16320
> 2020-06-17 10:57:40.053 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> 2020-06-17 10:57:40.056 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> 2020-06-17 10:57:40.056 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> 2020-06-17 10:57:40.058 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> 2020-06-17 10:57:40.058 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> 2020-06-17 10:57:40.061 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> 2020-06-17 10:57:40.061 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> 2020-06-17 10:57:40.062 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> 2020-06-17 10:57:40.062 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> 2020-06-17 10:57:40.063 [69372] iproto iproto.cc:606 W> stopping input on connection fd 33, aka unix/:(socket), peer of unix/:(socket), readahead limit is reached
> 2020-06-17 10:57:40.067 [69372] main C> got signal 15 - Terminated

I don't think that this big log is useful. Proper description what is
going on is enough.

> 
> Also found that 'readahead' issue from the first test blocks its
> printing to log file due to suppressed. To fix this issue the
> default server must be restarted at the very start of the test.

I know the context, but I would not understand it from the message if I
would not know already.

You describe another possible fix with printing a 1 KiB log entry before
a test case and then check a last 1 KiB of the log for the 'readahead
limit reached' message.

You should describe it (maybe even provide a diff) and describe the
reason why it does not work: ratelimiting of such messages may lead to
disappearing some amount of messages when its pressure is descreased
below the limit (nothing triggers printing of suspended messages if such
messages are not triggered anymore). Let's experiment with this and
describe your observations.

> 
> Closes #5082
> ---
> 
> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5082-net-box-readahead
> Issue: https://github.com/tarantool/tarantool/issues/5082

> index 1e33a84cb..f34240b65 100644
> --- a/test/box/net.box_readahead_gh-3958.test.lua
> +++ b/test/box/net.box_readahead_gh-3958.test.lua
> @@ -1,4 +1,5 @@
>  test_run = require('test_run').new()
> +test_run:cmd('restart server default')
>  net = require('net.box')

Okay. But I would leave a short comment here, like 'discard all logs
from previous tests and clear log ratelimiting statistics'.


More information about the Tarantool-patches mailing list