[Tarantool-patches] [PATCH v1] test: flaky replication/status.test.lua status
Kirill Yukhin
kyukhin at tarantool.org
Fri Sep 11 13:36:44 MSK 2020
Hello,
On 07 Sep 04:00, Alexander V. Tikhonov wrote:
> On heavily loaded hosts, the following 3 issues were found:
>
> line 174:
>
> [026] --- replication/status.result Thu Jun 11 12:07:39 2020
> [026] +++ replication/status.reject Sun Jun 14 03:20:21 2020
> [026] @@ -174,15 +174,17 @@
> [026] ...
> [026] replica.downstream.status == 'follow'
> [026] ---
> [026] -- true
> [026] +- false
> [026] ...
>
> It happened because the replication downstream status check ran too
> early. To give the downstream time to reach the needed 'follow'
> state, the test needs to wait for it using the
> test_run:wait_downstream() routine.
>
> line 178:
>
> [024] --- replication/status.result Mon Sep 7 00:22:52 2020
> [024] +++ replication/status.reject Mon Sep 7 00:36:01 2020
> [024] @@ -178,11 +178,13 @@
> [024] ...
> [024] replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> [024] ---
> [024] -- true
> [024] +- error: '[string "return replica.downstream.vclock[master_id] =..."]:1: attempt to
> [024] + index field ''vclock'' (a nil value)'
> [024] ...
> [024] replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> [024] ---
> [024] -- true
> [024] +- error: '[string "return replica.downstream.vclock[replica_id] ..."]:1: attempt to
> [024] + index field ''vclock'' (a nil value)'
> [024] ...
> [024] --
> [024] -- Replica
>
> It happened because the replication vclock field did not exist yet at
> the moment of the check. To fix the issue, the test has to wait for the
> vclock field to become available using the test_run:wait_cond()
> routine. The replication downstream data also has to be read at that
> same moment.
>
> line 224:
>
> [014] --- replication/status.result Fri Jul 3 04:29:56 2020
> [014] +++ replication/status.reject Mon Sep 7 00:17:30 2020
> [014] @@ -224,7 +224,7 @@
> [014] ...
> [014] master.upstream.status == "follow"
> [014] ---
> [014] -- true
> [014] +- false
> [014] ...
> [014] master.upstream.lag < 1
> [014] ---
>
> It happened because the replication upstream status check ran too
> early. To give the upstream time to reach the needed 'follow'
> state, the test needs to wait for it using the
> test_run:wait_upstream() routine.
>
> Removed the test from the 'fragile' list of the test_run tool so that
> it can run in parallel.
>
> Closes #5110
I've checked your patch into 1.10, 2.4, 2.5 and master.
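For readers of the archive, the vclock fix described in the quoted message can be illustrated with a minimal plain-Lua sketch. It is not the patch itself: wait_cond() and get_replica_info() below are hypothetical stand-ins (for test_run:wait_cond() and for reading the replica's replication info, respectively), and the polling is by attempt count rather than by timeout so that the sketch runs outside Tarantool. The point it shows is that the replica data is re-read inside the condition, so the check and the read happen at the same moment.

```lua
-- Hypothetical stand-in for test_run:wait_cond(): call `cond` until it
-- returns a truthy value or `max_attempts` polls have been made.
local function wait_cond(cond, max_attempts)
    for attempt = 1, max_attempts do
        local res = cond()
        if res then
            return res, attempt
        end
    end
    return nil, max_attempts
end

-- Hypothetical simulation of a loaded host: the downstream vclock field
-- is absent on the first two reads and appears on the third.
local polls = 0
local function get_replica_info()
    polls = polls + 1
    if polls < 3 then
        return { downstream = { status = 'follow' } }
    end
    return { downstream = { status = 'follow', vclock = { 1 } } }
end

-- Re-read the replica info on every poll and only accept a snapshot
-- that already has downstream.vclock, so the later comparison cannot
-- index a nil field.
local replica, attempts = wait_cond(function()
    local r = get_replica_info()
    if r.downstream ~= nil and r.downstream.vclock ~= nil then
        return r
    end
    return nil
end, 10)

assert(replica ~= nil, 'vclock did not appear in time')
print(replica.downstream.vclock[1], attempts)
```

Reading the data once before the wait and checking the stale table afterwards would reproduce the "attempt to index field 'vclock' (a nil value)" failure shown in the quoted output.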
--
Regards, Kirill Yukhin