From: Serge Petrenko <sergepetrenko@tarantool.org>
To: "Alexander V. Tikhonov" <avtikhon@tarantool.org>,
Kirill Yukhin <kyukhin@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v1] test: flaky replication/status.test.lua status
Date: Wed, 9 Sep 2020 18:41:11 +0300
Message-ID: <23d93ca8-c632-aab5-7e57-44b6f8c3a93e@tarantool.org>
In-Reply-To: <87a0a297a6c954f37bed6a79f81949a87196a218.1599440334.git.avtikhon@tarantool.org>
07.09.2020 04:00, Alexander V. Tikhonov wrote:
> On heavily loaded hosts, the following 3 issues were found:
>
> line 174:
>
> [026] --- replication/status.result Thu Jun 11 12:07:39 2020
> [026] +++ replication/status.reject Sun Jun 14 03:20:21 2020
> [026] @@ -174,15 +174,17 @@
> [026] ...
> [026] replica.downstream.status == 'follow'
> [026] ---
> [026] -- true
> [026] +- false
> [026] ...
>
> It happened because the replication downstream status check ran too
> early. To give the downstream time to reach the needed 'follow'
> state, the test now waits for it using the
> test_run:wait_downstream() routine.
>
> line 178:
>
> [024] --- replication/status.result Mon Sep 7 00:22:52 2020
> [024] +++ replication/status.reject Mon Sep 7 00:36:01 2020
> [024] @@ -178,11 +178,13 @@
> [024] ...
> [024] replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> [024] ---
> [024] -- true
> [024] +- error: '[string "return replica.downstream.vclock[master_id] =..."]:1: attempt to
> [024] + index field ''vclock'' (a nil value)'
> [024] ...
> [024] replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> [024] ---
> [024] -- true
> [024] +- error: '[string "return replica.downstream.vclock[replica_id] ..."]:1: attempt to
> [024] + index field ''vclock'' (a nil value)'
> [024] ...
> [024] --
> [024] -- Replica
>
> It happened because the replication vclock field did not exist yet at
> the moment of the check. To fix the issue, the test waits for the
> vclock field to become available using the test_run:wait_cond()
> routine. The replication downstream data also has to be read at that
> same moment.
>
> line 224:
>
> [014] --- replication/status.result Fri Jul 3 04:29:56 2020
> [014] +++ replication/status.reject Mon Sep 7 00:17:30 2020
> [014] @@ -224,7 +224,7 @@
> [014] ...
> [014] master.upstream.status == "follow"
> [014] ---
> [014] -- true
> [014] +- false
> [014] ...
> [014] master.upstream.lag < 1
> [014] ---
>
> It happened because the replication upstream status check ran too
> early. To give the upstream time to reach the needed 'follow'
> state, the test now waits for it using the
> test_run:wait_upstream() routine.
>
> Removed the test from the test_run 'fragile' list so that it can run in parallel.
>
> Closes #5110
> ---
>
> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5110-repl-status-174
> Issue: https://github.com/tarantool/tarantool/issues/5110
>
> test/replication/status.result | 16 +++++++++-------
> test/replication/status.test.lua | 13 +++++++++----
> test/replication/suite.ini | 1 -
> 3 files changed, 18 insertions(+), 12 deletions(-)
>
> diff --git a/test/replication/status.result b/test/replication/status.result
> index a86f48774..d5addbc80 100644
> --- a/test/replication/status.result
> +++ b/test/replication/status.result
> @@ -172,15 +172,17 @@ replica.upstream == nil
> ---
> - true
> ...
> -replica.downstream.status == 'follow'
> +test_run:wait_downstream(replica_id, {status == 'follow'})
> ---
> - true
> ...
> -replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> ----
> -- true
> -...
> -replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> +-- wait for the replication vclock
> +test_run:wait_cond(function() \
> + local r = box.info.replication[replica_id].downstream.vclock \
> + return (r ~= nil and box.info.vclock ~= nil and \
> + r[master_id] == box.info.vclock[master_id] and \
> + r[replica_id] == box.info.vclock[replica_id]) \
> + end) or box.info
> ---
> - true
> ...
> @@ -222,7 +224,7 @@ master.uuid == box.space._cluster:get(master_id)[2]
> ---
> - true
> ...
> -master.upstream.status == "follow"
> +test_run:wait_upstream(master_id, {status == 'follow'})
> ---
> - true
> ...
> diff --git a/test/replication/status.test.lua b/test/replication/status.test.lua
> index 090968172..6006ce9cf 100644
> --- a/test/replication/status.test.lua
> +++ b/test/replication/status.test.lua
> @@ -64,9 +64,14 @@ replica.uuid == box.space._cluster:get(replica_id)[2]
> -- replica.lsn == box.info.vclock[replica_id]
> replica.lsn == 0
> replica.upstream == nil
> -replica.downstream.status == 'follow'
> -replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> -replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> +test_run:wait_downstream(replica_id, {status == 'follow'})
> +-- wait for the replication vclock
> +test_run:wait_cond(function() \
> + local r = box.info.replication[replica_id].downstream.vclock \
> + return (r ~= nil and box.info.vclock ~= nil and \
> + r[master_id] == box.info.vclock[master_id] and \
> + r[replica_id] == box.info.vclock[replica_id]) \
> + end) or box.info
>
> --
> -- Replica
> @@ -83,7 +88,7 @@ box.info.vclock[master_id] == 2
> master = box.info.replication[master_id]
> master.id == master_id
> master.uuid == box.space._cluster:get(master_id)[2]
> -master.upstream.status == "follow"
> +test_run:wait_upstream(master_id, {status == 'follow'})
> master.upstream.lag < 1
> master.upstream.idle < 1
> master.upstream.peer:match("unix/")
> diff --git a/test/replication/suite.ini b/test/replication/suite.ini
> index ab9c3dabd..9bba9d125 100644
> --- a/test/replication/suite.ini
> +++ b/test/replication/suite.ini
> @@ -23,4 +23,3 @@ fragile = errinj.test.lua ; gh-3870
> box_set_replication_stress.test.lua ; gh-4992 gh-4986
> gh-4605-empty-password.test.lua ; gh-5030
> anon.test.lua ; gh-5058
> - status.test.lua ; gh-5110
LGTM
--
Serge Petrenko