[Tarantool-patches] [PATCH v1] test: flaky replication/status.test.lua status
Serge Petrenko
sergepetrenko at tarantool.org
Wed Sep 9 18:41:11 MSK 2020
07.09.2020 04:00, Alexander V. Tikhonov wrote:
> On heavily loaded hosts the following 3 issues were found:
>
> line 174:
>
> [026] --- replication/status.result Thu Jun 11 12:07:39 2020
> [026] +++ replication/status.reject Sun Jun 14 03:20:21 2020
> [026] @@ -174,15 +174,17 @@
> [026] ...
> [026] replica.downstream.status == 'follow'
> [026] ---
> [026] -- true
> [026] +- false
> [026] ...
>
> It happened because the replication downstream status check occurred too
> early. To give the downstream time to reach the needed 'follow' state,
> the test needs to wait for it using the test_run:wait_downstream()
> routine.
>
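
To illustrate why a one-shot status check can fail while a polling check passes, here is a minimal sketch in plain Lua. It does not use test_run: `wait_until` and `read_downstream` are hypothetical stand-ins, and the downstream is simulated so that it reports 'follow' only on the third read.

```lua
-- Hypothetical stand-in for a test_run-style wait helper; the name and
-- the attempt limit are assumptions, not the real test_run API.
local function wait_until(cond, max_attempts)
    for attempt = 1, max_attempts do
        if cond() then
            return true, attempt
        end
        -- A real helper would yield or sleep here (fiber.sleep in Tarantool).
    end
    return false, max_attempts
end

-- Simulated downstream: reports 'follow' only from the third read on.
local polls = 0
local function read_downstream()
    polls = polls + 1
    return {status = polls >= 3 and 'follow' or 'stopped'}
end

-- One-shot check: reads the state once and may observe it too early.
local early = read_downstream().status == 'follow'

-- Polling check: keeps re-reading until the state is reached.
polls = 0
local ok, attempts = wait_until(function()
    return read_downstream().status == 'follow'
end, 10)

print(early, ok, attempts)
```

The one-shot check corresponds to the old `replica.downstream.status == 'follow'` line, the polling check to the waiting approach described above.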
> line 178:
>
> [024] --- replication/status.result Mon Sep 7 00:22:52 2020
> [024] +++ replication/status.reject Mon Sep 7 00:36:01 2020
> [024] @@ -178,11 +178,13 @@
> [024] ...
> [024] replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> [024] ---
> [024] -- true
> [024] +- error: '[string "return replica.downstream.vclock[master_id] =..."]:1: attempt to
> [024] + index field ''vclock'' (a nil value)'
> [024] ...
> [024] replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> [024] ---
> [024] -- true
> [024] +- error: '[string "return replica.downstream.vclock[replica_id] ..."]:1: attempt to
> [024] + index field ''vclock'' (a nil value)'
> [024] ...
> [024] --
> [024] -- Replica
>
> It happened because the replication vclock field did not exist yet at the
> moment of the check. To fix the issue, the test has to wait for the vclock
> field to become available using the test_run:wait_cond() routine. The
> replication downstream data also has to be read at the same moment.
>
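
The nil-safe comparison described above can be sketched in plain Lua as follows. The `snapshots` table and `get_info` are hypothetical stand-ins for successive box.info reads. The field layout mirrors what the test accesses, and the values are made up for illustration.

```lua
local master_id, replica_id = 1, 2

-- Simulated box.info-like snapshots (assumed layout, made-up values).
local snapshots = {
    -- 1st read: the downstream exists, but its vclock is not reported yet.
    {vclock = {[1] = 2},
     replication = {[2] = {downstream = {status = 'follow'}}}},
    -- 2nd read: the downstream vclock is reported and matches.
    {vclock = {[1] = 2},
     replication = {[2] = {downstream = {status = 'follow',
                                         vclock = {[1] = 2}}}}},
}
local reads = 0
local function get_info()
    reads = math.min(reads + 1, #snapshots)
    return snapshots[reads]
end

-- Condition suitable for a polling helper: it takes one snapshot per
-- attempt and guards against a missing vclock before indexing it.
local function vclock_synced()
    local info = get_info()
    local r = info.replication[replica_id].downstream.vclock
    return r ~= nil and info.vclock ~= nil
        and r[master_id] == info.vclock[master_id]
        and r[replica_id] == info.vclock[replica_id]
end

local first = vclock_synced()   -- vclock still missing
local second = vclock_synced()  -- vclock present and equal
print(first, second)
```

Because the guard short-circuits on `r ~= nil`, the first attempt returns false instead of raising the "attempt to index field 'vclock' (a nil value)" error shown in the log.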
> line 224:
>
> [014] --- replication/status.result Fri Jul 3 04:29:56 2020
> [014] +++ replication/status.reject Mon Sep 7 00:17:30 2020
> [014] @@ -224,7 +224,7 @@
> [014] ...
> [014] master.upstream.status == "follow"
> [014] ---
> [014] -- true
> [014] +- false
> [014] ...
> [014] master.upstream.lag < 1
> [014] ---
>
> It happened because the replication upstream status check occurred too
> early. To give the upstream time to reach the needed 'follow' state,
> the test needs to wait for it using the test_run:wait_upstream()
> routine.
>
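
One Lua detail may be worth double-checking in the wait_downstream() and wait_upstream() calls in the diff below: they pass `{status == 'follow'}`, which is a comparison inside a table constructor rather than a field assignment. The sketch below only shows what each constructor produces in plain Lua. How the test_run helpers interpret such a table is not verified here, and the assumption is that they expect a field keyed by name.

```lua
-- `status` is not defined as a variable here, so it evaluates to nil.
local positional = {status == 'follow'}  -- stores (nil == 'follow') at index 1
local keyed = {status = 'follow'}        -- stores 'follow' under key 'status'

print(positional[1], positional.status)
print(keyed[1], keyed.status)
```

Only the second form carries the string 'follow' under the `status` key.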
> Removed the test from the test_run 'fragile' list so that it can run in
> parallel.
>
> Closes #5110
> ---
>
> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5110-repl-status-174
> Issue: https://github.com/tarantool/tarantool/issues/5110
>
> test/replication/status.result | 16 +++++++++-------
> test/replication/status.test.lua | 13 +++++++++----
> test/replication/suite.ini | 1 -
> 3 files changed, 18 insertions(+), 12 deletions(-)
>
> diff --git a/test/replication/status.result b/test/replication/status.result
> index a86f48774..d5addbc80 100644
> --- a/test/replication/status.result
> +++ b/test/replication/status.result
> @@ -172,15 +172,17 @@ replica.upstream == nil
> ---
> - true
> ...
> -replica.downstream.status == 'follow'
> +test_run:wait_downstream(replica_id, {status == 'follow'})
> ---
> - true
> ...
> -replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> ----
> -- true
> -...
> -replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> +-- wait for the replication vclock
> +test_run:wait_cond(function() \
> + local r = box.info.replication[replica_id].downstream.vclock \
> + return (r ~= nil and box.info.vclock ~= nil and \
> + r[master_id] == box.info.vclock[master_id] and \
> + r[replica_id] == box.info.vclock[replica_id]) \
> + end) or box.info
> ---
> - true
> ...
> @@ -222,7 +224,7 @@ master.uuid == box.space._cluster:get(master_id)[2]
> ---
> - true
> ...
> -master.upstream.status == "follow"
> +test_run:wait_upstream(master_id, {status == 'follow'})
> ---
> - true
> ...
> diff --git a/test/replication/status.test.lua b/test/replication/status.test.lua
> index 090968172..6006ce9cf 100644
> --- a/test/replication/status.test.lua
> +++ b/test/replication/status.test.lua
> @@ -64,9 +64,14 @@ replica.uuid == box.space._cluster:get(replica_id)[2]
> -- replica.lsn == box.info.vclock[replica_id]
> replica.lsn == 0
> replica.upstream == nil
> -replica.downstream.status == 'follow'
> -replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> -replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> +test_run:wait_downstream(replica_id, {status == 'follow'})
> +-- wait for the replication vclock
> +test_run:wait_cond(function() \
> + local r = box.info.replication[replica_id].downstream.vclock \
> + return (r ~= nil and box.info.vclock ~= nil and \
> + r[master_id] == box.info.vclock[master_id] and \
> + r[replica_id] == box.info.vclock[replica_id]) \
> + end) or box.info
>
> --
> -- Replica
> @@ -83,7 +88,7 @@ box.info.vclock[master_id] == 2
> master = box.info.replication[master_id]
> master.id == master_id
> master.uuid == box.space._cluster:get(master_id)[2]
> -master.upstream.status == "follow"
> +test_run:wait_upstream(master_id, {status == 'follow'})
> master.upstream.lag < 1
> master.upstream.idle < 1
> master.upstream.peer:match("unix/")
> diff --git a/test/replication/suite.ini b/test/replication/suite.ini
> index ab9c3dabd..9bba9d125 100644
> --- a/test/replication/suite.ini
> +++ b/test/replication/suite.ini
> @@ -23,4 +23,3 @@ fragile = errinj.test.lua ; gh-3870
> box_set_replication_stress.test.lua ; gh-4992 gh-4986
> gh-4605-empty-password.test.lua ; gh-5030
> anon.test.lua ; gh-5058
> - status.test.lua ; gh-5110
LGTM
--
Serge Petrenko