Tarantool development patches archive
From: Alexander Turenko <alexander.turenko@tarantool.org>
To: "Alexander V. Tikhonov" <avtikhon@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v1] test: fix flaky replication/wal_rw_stress.test.lua
Date: Thu, 18 Jun 2020 23:50:46 +0300
Message-ID: <20200618205046.hklilhvpapongixz@tkn_work_nb>
In-Reply-To: <2074a5617eb0da1c16830aab2f64f51f22ecb9bf.1592231572.git.avtikhon@tarantool.org>

TL;DR: Can you verify that the problem we want to detect with the test
can still be detected after the fix?

(More details are below.)

WBR, Alexander Turenko.

> diff --git a/test/replication/wal_rw_stress.test.lua b/test/replication/wal_rw_stress.test.lua
> index 08570b285..48d68c5ac 100644
> --- a/test/replication/wal_rw_stress.test.lua
> +++ b/test/replication/wal_rw_stress.test.lua
> @@ -38,7 +38,7 @@ test_run:cmd("setopt delimiter ''");
>  -- are running in different threads, there shouldn't be any rw errors.
>  test_run:cmd("switch replica")
>  box.cfg{replication = replication}
> -box.info.replication[1].downstream.status ~= 'stopped' or box.info
> +test_run:wait_cond(function() return box.info.replication[1].downstream.status ~= 'stopped' end) or box.info
>  test_run:cmd("switch default")

The comment above says 'there shouldn't be any rw errors'. Your fix
hides a transient 'writev(1), <...>' error, which I guess is a temporary
connectivity problem. But I guess it may also hide the rw error for
which the test case was added (the disk-related one). Or would such an
error keep the relay in the stopped state forever?

I reverted b9db91e1cdcc97c269703420c7b292e0f125f0ec ('xlog: fix
fallocate vs read race') (only src/, not test/), removed the test from
the fragile list, cleaned the repository (to ensure that we'll run with
the new HAVE_POSIX_FALLOCATE value) and ran the test 1000 times in 32
parallel jobs:

$ (cd test && ./test-run.py -j 32 $(yes replication/wal_rw_stress | head -n 1000))
<...>
Statistics:
* pass: 1000

My plan was to reproduce the original issue (#3883) and verify that your
fix does not hide it. However, the plan failed at the first step: the
issue did not reproduce.

Can you check whether #3883 is reproducible for you after reverting the
fix?

Even if the fix hides the original problem, the error message should
differ. I guess we can distinguish connectivity problems from disk rw
problems in the wait_cond() function.
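
A minimal sketch of such a filter, under stated assumptions: the
classify_downstream() helper is hypothetical, it assumes that a stopped
relay exposes its error text in a 'message' field of
box.info.replication[i].downstream, and it treats only messages that
contain 'writev' as transient connectivity problems. The actual text of
the disk rw error from #3883 is not known here, so anything else is
reported as fatal.

```lua
-- Hypothetical helper, not part of test-run or Tarantool.
-- Takes a table shaped like box.info.replication[i].downstream and
-- returns one of:
--   'ok'        - the relay is not stopped
--   'transient' - no relay info yet, or stopped with a 'writev' message
--                 (assumed to be a temporary connectivity problem)
--   'fatal'     - stopped for any other reason
local function classify_downstream(downstream)
    if downstream == nil then
        -- Assumption: no downstream info yet means "keep waiting".
        return 'transient'
    end
    if downstream.status ~= 'stopped' then
        return 'ok'
    end
    -- Assumption: the relay error text is available as 'message'.
    local message = tostring(downstream.message or '')
    -- Plain substring search, no Lua patterns.
    if message:find('writev', 1, true) then
        return 'transient'
    end
    return 'fatal'
end
```

The wait_cond() predicate could then stop waiting on either 'ok' or
'fatal', and the test would print the classification afterwards, so a
real rw error still shows up in the result file instead of being waited
out.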

BTW, I also checked whether #4977 is reproducible on master for me, and
it seems it is not: 1000 test runs passed in 32 parallel jobs.

Maybe it is reproducible only in some specific environment? On FreeBSD
and/or in VirtualBox? I tried it on a Linux laptop (initially I missed
that it occurs on FreeBSD, sorry).

Side note: I suggest using something like the following to wrap long
lines:

 | test_run:wait_cond(function()                                     \
 |     return box.info.replication[1].downstream.status ~= 'stopped' \
 | end) or box.info
