* [Tarantool-patches] [PATCH v1] test: flaky replication/status.test.lua status
@ 2020-09-07 1:00 Alexander V. Tikhonov
2020-09-09 15:41 ` Serge Petrenko
2020-09-11 10:36 ` Kirill Yukhin
0 siblings, 2 replies; 3+ messages in thread
From: Alexander V. Tikhonov @ 2020-09-07 1:00 UTC
To: Kirill Yukhin, Serge Petrenko; +Cc: tarantool-patches
On heavily loaded hosts the following 3 issues were found:
line 174:
[026] --- replication/status.result Thu Jun 11 12:07:39 2020
[026] +++ replication/status.reject Sun Jun 14 03:20:21 2020
[026] @@ -174,15 +174,17 @@
[026] ...
[026] replica.downstream.status == 'follow'
[026] ---
[026] -- true
[026] +- false
[026] ...
It happened because the replication downstream status check occurred
too early. To give the downstream time to reach the needed 'follow'
state, the test now waits for it using the test_run:wait_downstream()
routine.
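The idea of waiting for a state instead of checking it once can be
sketched in plain Lua. This is only an illustration: wait_status() and
fake_downstream_status() are hypothetical names, not part of test-run,
and the real test_run:wait_downstream() may work differently inside.

```lua
-- Hypothetical stand-in for a wait helper; NOT the real test-run code.
-- Polls get_status() until it returns `wanted` or max_polls is exhausted.
local function wait_status(get_status, wanted, max_polls)
    for attempt = 1, max_polls do
        if get_status() == wanted then
            return true, attempt
        end
        -- A real test would yield here (for example, sleep in a fiber)
        -- so that replication can make progress between polls.
    end
    return false, max_polls
end

-- Simulated downstream: the status is absent (nil) for the first two
-- polls and becomes 'follow' on the third, as on a slow host.
local polls = 0
local function fake_downstream_status()
    polls = polls + 1
    return polls >= 3 and 'follow' or nil
end

local ok, attempts = wait_status(fake_downstream_status, 'follow', 10)
print(ok, attempts)
```

A single check on the first poll would have failed here, which matches
the failure shown above; the polling version succeeds once the state is
reached.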
line 178:
[024] --- replication/status.result Mon Sep 7 00:22:52 2020
[024] +++ replication/status.reject Mon Sep 7 00:36:01 2020
[024] @@ -178,11 +178,13 @@
[024] ...
[024] replica.downstream.vclock[master_id] == box.info.vclock[master_id]
[024] ---
[024] -- true
[024] +- error: '[string "return replica.downstream.vclock[master_id] =..."]:1: attempt to
[024] + index field ''vclock'' (a nil value)'
[024] ...
[024] replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
[024] ---
[024] -- true
[024] +- error: '[string "return replica.downstream.vclock[replica_id] ..."]:1: attempt to
[024] + index field ''vclock'' (a nil value)'
[024] ...
[024] --
[024] -- Replica
It happened because the replication vclock field did not exist yet at
the moment of the check. To fix the issue, the test waits for the vclock
field to become available using the test_run:wait_cond() routine. The
downstream replication data also has to be read at the same moment as
the check.
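The nil-safe comparison used as the wait condition can be sketched in
plain Lua as follows. The helper name vclock_synced() and the two sample
tables are assumptions made for illustration; they only mimic the shape
of the box.info fields that the test reads.

```lua
-- Hypothetical helper; `info` mimics the box.info fields used by the test.
-- Returns true only when the downstream vclock exists and matches the
-- instance vclock for both ids; never indexes a nil vclock.
local function vclock_synced(info, master_id, replica_id)
    local r = info.replication[replica_id].downstream.vclock
    return r ~= nil and info.vclock ~= nil and
           r[master_id] == info.vclock[master_id] and
           r[replica_id] == info.vclock[replica_id]
end

-- Snapshot taken too early: downstream has no vclock field yet.
local early = {
    vclock = {[1] = 2},
    replication = {[2] = {downstream = {status = 'follow'}}},
}
-- Snapshot taken later: downstream vclock has caught up.
local later = {
    vclock = {[1] = 2},
    replication = {[2] = {downstream = {status = 'follow',
                                        vclock = {[1] = 2}}}},
}

local early_result = vclock_synced(early, 1, 2)
local later_result = vclock_synced(later, 1, 2)
print(early_result, later_result)
```

On the early snapshot the function returns false instead of raising the
"attempt to index field 'vclock'" error, so it is safe to poll it
repeatedly until it becomes true.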
line 224:
[014] --- replication/status.result Fri Jul 3 04:29:56 2020
[014] +++ replication/status.reject Mon Sep 7 00:17:30 2020
[014] @@ -224,7 +224,7 @@
[014] ...
[014] master.upstream.status == "follow"
[014] ---
[014] -- true
[014] +- false
[014] ...
[014] master.upstream.lag < 1
[014] ---
It happened because the replication upstream status check occurred too
early. To give the upstream time to reach the needed 'follow' state,
the test now waits for it using the test_run:wait_upstream() routine.
Removed the test from the 'fragile' list of the test-run tool so that
it can run in parallel.
Closes #5110
---
Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5110-repl-status-174
Issue: https://github.com/tarantool/tarantool/issues/5110
test/replication/status.result | 16 +++++++++-------
test/replication/status.test.lua | 13 +++++++++----
test/replication/suite.ini | 1 -
3 files changed, 18 insertions(+), 12 deletions(-)
diff --git a/test/replication/status.result b/test/replication/status.result
index a86f48774..d5addbc80 100644
--- a/test/replication/status.result
+++ b/test/replication/status.result
@@ -172,15 +172,17 @@ replica.upstream == nil
---
- true
...
-replica.downstream.status == 'follow'
+test_run:wait_downstream(replica_id, {status == 'follow'})
---
- true
...
-replica.downstream.vclock[master_id] == box.info.vclock[master_id]
----
-- true
-...
-replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
+-- wait for the replication vclock
+test_run:wait_cond(function() \
+ local r = box.info.replication[replica_id].downstream.vclock \
+ return (r ~= nil and box.info.vclock ~= nil and \
+ r[master_id] == box.info.vclock[master_id] and \
+ r[replica_id] == box.info.vclock[replica_id]) \
+ end) or box.info
---
- true
...
@@ -222,7 +224,7 @@ master.uuid == box.space._cluster:get(master_id)[2]
---
- true
...
-master.upstream.status == "follow"
+test_run:wait_upstream(master_id, {status == 'follow'})
---
- true
...
diff --git a/test/replication/status.test.lua b/test/replication/status.test.lua
index 090968172..6006ce9cf 100644
--- a/test/replication/status.test.lua
+++ b/test/replication/status.test.lua
@@ -64,9 +64,14 @@ replica.uuid == box.space._cluster:get(replica_id)[2]
-- replica.lsn == box.info.vclock[replica_id]
replica.lsn == 0
replica.upstream == nil
-replica.downstream.status == 'follow'
-replica.downstream.vclock[master_id] == box.info.vclock[master_id]
-replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
+test_run:wait_downstream(replica_id, {status == 'follow'})
+-- wait for the replication vclock
+test_run:wait_cond(function() \
+ local r = box.info.replication[replica_id].downstream.vclock \
+ return (r ~= nil and box.info.vclock ~= nil and \
+ r[master_id] == box.info.vclock[master_id] and \
+ r[replica_id] == box.info.vclock[replica_id]) \
+ end) or box.info
--
-- Replica
@@ -83,7 +88,7 @@ box.info.vclock[master_id] == 2
master = box.info.replication[master_id]
master.id == master_id
master.uuid == box.space._cluster:get(master_id)[2]
-master.upstream.status == "follow"
+test_run:wait_upstream(master_id, {status == 'follow'})
master.upstream.lag < 1
master.upstream.idle < 1
master.upstream.peer:match("unix/")
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index ab9c3dabd..9bba9d125 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -23,4 +23,3 @@ fragile = errinj.test.lua ; gh-3870
box_set_replication_stress.test.lua ; gh-4992 gh-4986
gh-4605-empty-password.test.lua ; gh-5030
anon.test.lua ; gh-5058
- status.test.lua ; gh-5110
--
2.17.1
* Re: [Tarantool-patches] [PATCH v1] test: flaky replication/status.test.lua status
From: Serge Petrenko @ 2020-09-09 15:41 UTC
To: Alexander V. Tikhonov, Kirill Yukhin; +Cc: tarantool-patches
LGTM
--
Serge Petrenko
* Re: [Tarantool-patches] [PATCH v1] test: flaky replication/status.test.lua status
From: Kirill Yukhin @ 2020-09-11 10:36 UTC
To: Alexander V. Tikhonov; +Cc: tarantool-patches
Hello,
I've checked your patch into 1.10, 2.4, 2.5 and master.
--
Regards, Kirill Yukhin