[Tarantool-patches] [PATCH v1] test: flaky replication/status.test.lua status

Serge Petrenko sergepetrenko at tarantool.org
Wed Sep 9 18:41:11 MSK 2020


07.09.2020 04:00, Alexander V. Tikhonov wrote:
> On heavily loaded hosts, the following 3 issues were found:
>
> line 174:
>
>   [026] --- replication/status.result	Thu Jun 11 12:07:39 2020
>   [026] +++ replication/status.reject	Sun Jun 14 03:20:21 2020
>   [026] @@ -174,15 +174,17 @@
>   [026]  ...
>   [026]  replica.downstream.status == 'follow'
>   [026]  ---
>   [026] -- true
>   [026] +- false
>   [026]  ...
>
> It happened because the replication downstream status check ran too
> early. To give the downstream time to reach the required 'follow'
> state, the test now waits for it using the test_run:wait_downstream()
> routine.
>
> line 178:
>
> [024] --- replication/status.result	Mon Sep  7 00:22:52 2020
> [024] +++ replication/status.reject	Mon Sep  7 00:36:01 2020
> [024] @@ -178,11 +178,13 @@
> [024]  ...
> [024]  replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> [024]  ---
> [024] -- true
> [024] +- error: '[string "return replica.downstream.vclock[master_id] =..."]:1: attempt to
> [024] +    index field ''vclock'' (a nil value)'
> [024]  ...
> [024]  replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> [024]  ---
> [024] -- true
> [024] +- error: '[string "return replica.downstream.vclock[replica_id] ..."]:1: attempt to
> [024] +    index field ''vclock'' (a nil value)'
> [024]  ...
> [024]  --
> [024]  -- Replica
>
> It happened because the replication vclock field did not exist yet at
> the moment of its check. To fix the issue, the test now waits for the
> vclock field to become available using the test_run:wait_cond()
> routine. The replication downstream data is also read at that same
> moment.
>
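
The wait-until-available idea described above can be sketched in plain Lua. This is only an illustration under stated assumptions: `wait_cond`, `downstream` and `master_vclock` below are hypothetical stand-ins written for this sketch, not the test-run implementation and not the real box.info data. The point it shows is that the condition re-reads the downstream data on every poll and checks for a nil vclock before indexing it.

```lua
-- Hypothetical stand-in for test_run:wait_cond(); NOT the real helper.
-- Polls cond() until it returns a truthy value or attempts run out.
local function wait_cond(cond, max_attempts)
    for attempt = 1, max_attempts do
        if cond() then
            return true, attempt
        end
        -- A real helper would yield or sleep between polls here.
    end
    return false, max_attempts
end

-- Simulated downstream info: vclock stays nil until the third read,
-- mimicking a relay that has not reported its vclock yet.
local reads = 0
local master_vclock = {2, 0}
local function downstream()
    reads = reads + 1
    if reads < 3 then
        return {status = 'follow'}
    end
    return {status = 'follow', vclock = {2, 0}}
end

local ok, attempts = wait_cond(function()
    -- Re-read the data on every poll and guard against a nil vclock.
    local v = downstream().vclock
    return v ~= nil and v[1] == master_vclock[1] and
           v[2] == master_vclock[2]
end, 10)
print(ok, attempts)
```

Indexing a snapshot taken before the loop would keep seeing the stale nil vclock, which is why the read sits inside the condition.
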
> line 224:
>
> [014] --- replication/status.result	Fri Jul  3 04:29:56 2020
> [014] +++ replication/status.reject	Mon Sep  7 00:17:30 2020
> [014] @@ -224,7 +224,7 @@
> [014]  ...
> [014]  master.upstream.status == "follow"
> [014]  ---
> [014] -- true
> [014] +- false
> [014]  ...
> [014]  master.upstream.lag < 1
> [014]  ---
>
> It happened because the replication upstream status check ran too
> early. To give the upstream time to reach the required 'follow'
> state, the test now waits for it using the test_run:wait_upstream()
> routine.
>
> Removed the test from the test-run 'fragile' list so that it can run
> in parallel.
>
> Closes #5110
> ---
>
> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5110-repl-status-174
> Issue: https://github.com/tarantool/tarantool/issues/5110
>
>   test/replication/status.result   | 16 +++++++++-------
>   test/replication/status.test.lua | 13 +++++++++----
>   test/replication/suite.ini       |  1 -
>   3 files changed, 18 insertions(+), 12 deletions(-)
>
> diff --git a/test/replication/status.result b/test/replication/status.result
> index a86f48774..d5addbc80 100644
> --- a/test/replication/status.result
> +++ b/test/replication/status.result
> @@ -172,15 +172,17 @@ replica.upstream == nil
>   ---
>   - true
>   ...
> -replica.downstream.status == 'follow'
> +test_run:wait_downstream(replica_id, {status = 'follow'})
>   ---
>   - true
>   ...
> -replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> ----
> -- true
> -...
> -replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> +-- wait for the replication vclock
> +test_run:wait_cond(function()                    \
> +    local r = box.info.replication[replica_id].downstream.vclock \
> +    return (r ~= nil and box.info.vclock ~= nil and \
> +            r[master_id] == box.info.vclock[master_id] and \
> +            r[replica_id] == box.info.vclock[replica_id]) \
> +    end) or box.info
>   ---
>   - true
>   ...
> @@ -222,7 +224,7 @@ master.uuid == box.space._cluster:get(master_id)[2]
>   ---
>   - true
>   ...
> -master.upstream.status == "follow"
> +test_run:wait_upstream(master_id, {status = 'follow'})
>   ---
>   - true
>   ...
> diff --git a/test/replication/status.test.lua b/test/replication/status.test.lua
> index 090968172..6006ce9cf 100644
> --- a/test/replication/status.test.lua
> +++ b/test/replication/status.test.lua
> @@ -64,9 +64,14 @@ replica.uuid == box.space._cluster:get(replica_id)[2]
>   -- replica.lsn == box.info.vclock[replica_id]
>   replica.lsn == 0
>   replica.upstream == nil
> -replica.downstream.status == 'follow'
> -replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> -replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> +test_run:wait_downstream(replica_id, {status = 'follow'})
> +-- wait for the replication vclock
> +test_run:wait_cond(function()                    \
> +    local r = box.info.replication[replica_id].downstream.vclock \
> +    return (r ~= nil and box.info.vclock ~= nil and \
> +            r[master_id] == box.info.vclock[master_id] and \
> +            r[replica_id] == box.info.vclock[replica_id]) \
> +    end) or box.info
>   
>   --
>   -- Replica
> @@ -83,7 +88,7 @@ box.info.vclock[master_id] == 2
>   master = box.info.replication[master_id]
>   master.id == master_id
>   master.uuid == box.space._cluster:get(master_id)[2]
> -master.upstream.status == "follow"
> +test_run:wait_upstream(master_id, {status = 'follow'})
>   master.upstream.lag < 1
>   master.upstream.idle < 1
>   master.upstream.peer:match("unix/")
> diff --git a/test/replication/suite.ini b/test/replication/suite.ini
> index ab9c3dabd..9bba9d125 100644
> --- a/test/replication/suite.ini
> +++ b/test/replication/suite.ini
> @@ -23,4 +23,3 @@ fragile = errinj.test.lua            ; gh-3870
>             box_set_replication_stress.test.lua     ; gh-4992 gh-4986
>             gh-4605-empty-password.test.lua         ; gh-5030
>             anon.test.lua              ; gh-5058
> -          status.test.lua            ; gh-5110


LGTM

-- 
Serge Petrenko


