[Tarantool-patches] [PATCH v1] test: flaky replication/status.test.lua status

Kirill Yukhin kyukhin at tarantool.org
Fri Sep 11 13:36:44 MSK 2020


Hello,

On 07 Sep 04:00, Alexander V. Tikhonov wrote:
> On heavily loaded hosts, the following 3 issues were found:
> 
> line 174:
> 
>  [026] --- replication/status.result	Thu Jun 11 12:07:39 2020
>  [026] +++ replication/status.reject	Sun Jun 14 03:20:21 2020
>  [026] @@ -174,15 +174,17 @@
>  [026]  ...
>  [026]  replica.downstream.status == 'follow'
>  [026]  ---
>  [026] -- true
>  [026] +- false
>  [026]  ...
> 
> It happened because the replication downstream status check occurred
> too early. To give the replication a chance to reach the needed
> 'follow' state, the test needs to wait for it using the
> test_run:wait_downstream() routine.
> 
> line 178:
> 
> [024] --- replication/status.result	Mon Sep  7 00:22:52 2020
> [024] +++ replication/status.reject	Mon Sep  7 00:36:01 2020
> [024] @@ -178,11 +178,13 @@
> [024]  ...
> [024]  replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> [024]  ---
> [024] -- true
> [024] +- error: '[string "return replica.downstream.vclock[master_id] =..."]:1: attempt to
> [024] +    index field ''vclock'' (a nil value)'
> [024]  ...
> [024]  replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> [024]  ---
> [024] -- true
> [024] +- error: '[string "return replica.downstream.vclock[replica_id] ..."]:1: attempt to
> [024] +    index field ''vclock'' (a nil value)'
> [024]  ...
> [024]  --
> [024]  -- Replica
> 
> It happened because the replication vclock field did not exist at the
> moment of its check. To fix the issue, the test has to wait for the
> vclock field to become available using the test_run:wait_cond()
> routine. Also, the replication downstream data has to be read at the
> same moment.
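
A minimal sketch of the polling idea from the paragraph above, written in
plain Lua so that it runs outside Tarantool. The wait_cond() and
fake_box_info() helpers here are hypothetical stand-ins, not the test-run
or box.info implementations. The point is only that each attempt reads
one snapshot and tolerates a missing vclock field.

```lua
-- Hypothetical stand-in for test_run:wait_cond(): poll `cond` until it
-- returns a truthy value or `timeout` seconds of CPU time pass. Inside
-- Tarantool the loop would yield (e.g. fiber.sleep()) instead of spinning.
local function wait_cond(cond, timeout)
    local deadline = os.clock() + (timeout or 1)
    repeat
        local res = cond()
        if res then
            return res
        end
    until os.clock() >= deadline
    return nil
end

-- Fake box.info: the downstream vclock appears only on the third read,
-- imitating a relay that has not reported its state yet.
local reads = 0
local function fake_box_info()
    reads = reads + 1
    local downstream = {status = 'follow'}
    if reads >= 3 then
        downstream.vclock = {[1] = 10}
    end
    return {
        vclock = {[1] = 10},
        replication = {[2] = {downstream = downstream}},
    }
end

local master_id, replica_id = 1, 2

-- Take one snapshot per attempt so that both sides of the comparison
-- come from the same moment, and do not index vclock while it is nil.
local ok = wait_cond(function()
    local info = fake_box_info()
    local d = info.replication[replica_id].downstream
    return d ~= nil and d.vclock ~= nil and
           d.vclock[master_id] == info.vclock[master_id]
end)

print(ok, reads)
```

A single immediate check would hit the nil vclock on the first read,
which matches the "attempt to index field 'vclock'" error in the diff.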
> 
> line 224:
> 
> [014] --- replication/status.result	Fri Jul  3 04:29:56 2020
> [014] +++ replication/status.reject	Mon Sep  7 00:17:30 2020
> [014] @@ -224,7 +224,7 @@
> [014]  ...
> [014]  master.upstream.status == "follow"
> [014]  ---
> [014] -- true
> [014] +- false
> [014]  ...
> [014]  master.upstream.lag < 1
> [014]  ---
> 
> It happened because the replication upstream status check occurred
> too early. To give the replication a chance to reach the needed
> 'follow' state, the test needs to wait for it using the
> test_run:wait_upstream() routine.
> 
> Removed the test from the test_run tool's 'fragile' list so that it
> can run in parallel.
> 
> Closes #5110

I've checked your patch into 1.10, 2.4, 2.5 and master.

--
Regards, Kirill Yukhin

