[Tarantool-patches] [PATCH v1] test: flaky replication/status.test.lua status

Alexander V. Tikhonov avtikhon at tarantool.org
Mon Sep 7 04:00:11 MSK 2020


On heavily loaded hosts the following 3 issues were found:

line 174:

 [026] --- replication/status.result	Thu Jun 11 12:07:39 2020
 [026] +++ replication/status.reject	Sun Jun 14 03:20:21 2020
 [026] @@ -174,15 +174,17 @@
 [026]  ...
 [026]  replica.downstream.status == 'follow'
 [026]  ---
 [026] -- true
 [026] +- false
 [026]  ...

It happened because the replication downstream status was checked too
early. To give the downstream time to reach the needed 'follow' state,
the test now waits for it using the test_run:wait_downstream()
routine.
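
As an illustration only, the status wait can be thought of as matching a
table of expected fields against the current downstream info on each
poll. The sketch below is plain Lua; matches() is a hypothetical helper
and the sample tables are made up, it is not the test-run implementation.

```lua
-- Hypothetical helper, not the test-run code: returns true when every
-- key/value pair in `expected` is present in `info`.
local function matches(info, expected)
    if info == nil then
        return false
    end
    for key, value in pairs(expected) do
        if info[key] ~= value then
            return false
        end
    end
    return true
end

-- Made-up downstream snapshots: one taken too early, one taken later.
local early = {status = 'stopped'}
local later = {status = 'follow'}

print(matches(early, {status = 'follow'}))  -- false
print(matches(later, {status = 'follow'}))  -- true
```

A direct comparison done at the 'early' moment fails, while a wait that
repeats the match succeeds as soon as the 'later' state is reached.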

line 178:

[024] --- replication/status.result	Mon Sep  7 00:22:52 2020
[024] +++ replication/status.reject	Mon Sep  7 00:36:01 2020
[024] @@ -178,11 +178,13 @@
[024]  ...
[024]  replica.downstream.vclock[master_id] == box.info.vclock[master_id]
[024]  ---
[024] -- true
[024] +- error: '[string "return replica.downstream.vclock[master_id] =..."]:1: attempt to
[024] +    index field ''vclock'' (a nil value)'
[024]  ...
[024]  replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
[024]  ---
[024] -- true
[024] +- error: '[string "return replica.downstream.vclock[replica_id] ..."]:1: attempt to
[024] +    index field ''vclock'' (a nil value)'
[024]  ...
[024]  --
[024]  -- Replica

It happened because the replication vclock field did not exist yet at
the moment of the check. To fix the issue, the test now waits for the
vclock field to become available using the test_run:wait_cond()
routine. The downstream replication data is also read inside the same
check instead of being taken from the previously saved 'replica' value.
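
A minimal plain-Lua sketch of this polling idea follows. wait_cond()
here is a hypothetical stand-in for test_run:wait_cond() (no timeout or
sleeping, only a bounded number of attempts), and the vclock values are
simulated rather than read from box.info.

```lua
-- Hypothetical stand-in for test_run:wait_cond(): polls cond() until it
-- returns a truthy value or the attempts run out.
local function wait_cond(cond, max_attempts)
    for _ = 1, max_attempts do
        local res = cond()
        if res then
            return res
        end
    end
    return false
end

-- Simulated downstream info: the vclock field appears on the third poll.
local polls = 0
local function downstream_vclock()
    polls = polls + 1
    if polls < 3 then
        return nil
    end
    return {1, 0}  -- made-up vclock: master lsn 1, replica lsn 0
end

local master_vclock = {1, 0}
local ok = wait_cond(function()
    -- Re-read the data on every attempt and guard against nil.
    local v = downstream_vclock()
    return v ~= nil and v[1] == master_vclock[1] and
           v[2] == master_vclock[2]
end, 10)

print(ok, polls)  -- true    3
```

The nil guard is what avoids the "attempt to index field 'vclock'" error
shown above: the comparison is evaluated only once the field exists.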

line 224:

[014] --- replication/status.result	Fri Jul  3 04:29:56 2020
[014] +++ replication/status.reject	Mon Sep  7 00:17:30 2020
[014] @@ -224,7 +224,7 @@
[014]  ...
[014]  master.upstream.status == "follow"
[014]  ---
[014] -- true
[014] +- false
[014]  ...
[014]  master.upstream.lag < 1
[014]  ---

It happened because the replication upstream status was checked too
early. To give the upstream time to reach the needed 'follow' state,
the test now waits for it using the test_run:wait_upstream() routine.

Removed the test from the 'fragile' list of the test_run tool so that
it can run in parallel.

Closes #5110
---

Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5110-repl-status-174
Issue: https://github.com/tarantool/tarantool/issues/5110

 test/replication/status.result   | 16 +++++++++-------
 test/replication/status.test.lua | 13 +++++++++----
 test/replication/suite.ini       |  1 -
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/test/replication/status.result b/test/replication/status.result
index a86f48774..d5addbc80 100644
--- a/test/replication/status.result
+++ b/test/replication/status.result
@@ -172,15 +172,17 @@ replica.upstream == nil
 ---
 - true
 ...
-replica.downstream.status == 'follow'
+test_run:wait_downstream(replica_id, {status = 'follow'})
 ---
 - true
 ...
-replica.downstream.vclock[master_id] == box.info.vclock[master_id]
----
-- true
-...
-replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
+-- wait for the replication vclock
+test_run:wait_cond(function()                    \
+    local r = box.info.replication[replica_id].downstream.vclock \
+    return (r ~= nil and box.info.vclock ~= nil and \
+            r[master_id] == box.info.vclock[master_id] and \
+            r[replica_id] == box.info.vclock[replica_id]) \
+    end) or box.info
 ---
 - true
 ...
@@ -222,7 +224,7 @@ master.uuid == box.space._cluster:get(master_id)[2]
 ---
 - true
 ...
-master.upstream.status == "follow"
+test_run:wait_upstream(master_id, {status = 'follow'})
 ---
 - true
 ...
diff --git a/test/replication/status.test.lua b/test/replication/status.test.lua
index 090968172..6006ce9cf 100644
--- a/test/replication/status.test.lua
+++ b/test/replication/status.test.lua
@@ -64,9 +64,14 @@ replica.uuid == box.space._cluster:get(replica_id)[2]
 -- replica.lsn == box.info.vclock[replica_id]
 replica.lsn == 0
 replica.upstream == nil
-replica.downstream.status == 'follow'
-replica.downstream.vclock[master_id] == box.info.vclock[master_id]
-replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
+test_run:wait_downstream(replica_id, {status = 'follow'})
+-- wait for the replication vclock
+test_run:wait_cond(function()                    \
+    local r = box.info.replication[replica_id].downstream.vclock \
+    return (r ~= nil and box.info.vclock ~= nil and \
+            r[master_id] == box.info.vclock[master_id] and \
+            r[replica_id] == box.info.vclock[replica_id]) \
+    end) or box.info
 
 --
 -- Replica
@@ -83,7 +88,7 @@ box.info.vclock[master_id] == 2
 master = box.info.replication[master_id]
 master.id == master_id
 master.uuid == box.space._cluster:get(master_id)[2]
-master.upstream.status == "follow"
+test_run:wait_upstream(master_id, {status = 'follow'})
 master.upstream.lag < 1
 master.upstream.idle < 1
 master.upstream.peer:match("unix/")
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index ab9c3dabd..9bba9d125 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -23,4 +23,3 @@ fragile = errinj.test.lua            ; gh-3870
           box_set_replication_stress.test.lua     ; gh-4992 gh-4986
           gh-4605-empty-password.test.lua         ; gh-5030
           anon.test.lua              ; gh-5058
-          status.test.lua            ; gh-5110
-- 
2.17.1


