* [Tarantool-patches] [PATCH v1] test: flaky replication/replica_rejoin.test.lua
@ 2020-09-29 13:47 Alexander V. Tikhonov
2020-09-29 13:49 ` [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (1) Alexander V. Tikhonov
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Alexander V. Tikhonov @ 2020-09-29 13:47 UTC (permalink / raw)
To: Vladislav Shpilevoy, Kirill Yukhin; +Cc: tarantool-patches
On heavily loaded hosts the following issue was found:
[151] --- replication/replica_rejoin.result Tue Sep 29 10:57:26 2020
[151] +++ replication/replica_rejoin.reject Tue Sep 29 10:57:48 2020
[151] @@ -230,7 +230,12 @@
[151] return box.info ~= nil and box.info.replication[1] ~= nil
[151] end)
[151] ---
[151] -- true
[151] +- error: "builtin/box/load_cfg.lua:601: Please call box.cfg{} first\nstack traceback:\n\tbuiltin/box/load_cfg.lua:601:
[151] + in function '__index'\n\t[string \"return test_run:wait_cond(function() ...\"]:1:
[151] + in function 'cond'\n\t/tmp/tnt/151_replication/test_run.lua:411: in function </tmp/tnt/151_replication/test_run.lua:404>\n\t[C]:
[151] + in function 'pcall'\n\tbuiltin/box/console.lua:402: in function 'eval'\n\tbuiltin/box/console.lua:708:
[151] + in function 'repl'\n\tbuiltin/box/console.lua:842: in function <builtin/box/console.lua:828>\n\t[C]:
[151] + in function 'pcall'\n\tbuiltin/socket.lua:1081: in function <builtin/socket.lua:1079>"
[151] ...
[151] test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
[151] ---
[151]
It happened because box.cfg{} had not been called yet when the
condition ran, so box.info could not be read. In fact there is no need
for a local check that the replication information is available,
because the wait_upstream() function used below does that itself.
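
The failure mode described above can be illustrated with a minimal plain-Lua sketch. It uses a mocked `box` table and runs without Tarantool; `mock_box`, `removed_cond` and `wait_cond_safe` are hypothetical names introduced only for illustration, and the guarded poll is an assumption about one way to tolerate the error, not the actual test-run or wait_upstream() implementation.

```lua
-- Plain-Lua sketch with a mocked `box`; it does not need Tarantool to run.
-- mock_box, removed_cond and wait_cond_safe are hypothetical names.

local configured = false

-- Before configuration, indexing the mock raises, mimicking the
-- "Please call box.cfg{} first" error seen in the traceback.
local mock_box = setmetatable({}, {
    __index = function(_, key)
        if not configured then
            error("Please call box.cfg{} first")
        end
        if key == 'info' then
            return { replication = { [1] = { upstream = { status = 'loading' } } } }
        end
    end,
})

-- Same shape as the condition removed by the patch.
local function removed_cond()
    return mock_box.info ~= nil and mock_box.info.replication[1] ~= nil
end

-- A guarded poll: an error raised by the condition counts as "not ready yet".
local function wait_cond_safe(cond, max_attempts, on_retry)
    for attempt = 1, max_attempts do
        local ok, res = pcall(cond)
        if ok and res then
            return true, attempt
        end
        if on_retry then
            on_retry(attempt)
        end
    end
    return false, max_attempts
end

-- Calling the condition directly before configuration raises.
ok_before = pcall(removed_cond)

-- Simulate the instance becoming configured after the second attempt.
done, attempts = wait_cond_safe(removed_cond, 5, function(attempt)
    if attempt == 2 then
        configured = true
    end
end)

print(ok_before, done, attempts)
```

In this sketch the unguarded call fails while the guarded poll succeeds once the mock becomes configured, which is why a check that touches `box.info` too early is fragile on a slow host.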
Part of #4985
---
Github: https://github.com/tarantool/tarantool/tree/avtikhon/flaky-checksums
Issue: https://github.com/tarantool/tarantool/issues/4985
test/replication/replica_rejoin.result | 8 --------
test/replication/replica_rejoin.test.lua | 5 -----
2 files changed, 13 deletions(-)
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index f6e74eae1..4d9e83868 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -221,14 +221,6 @@ test_run:cmd("switch replica")
---
- true
...
--- Need to wait for box.info.replication[1] defined, otherwise test-run fails to
--- wait for the upstream status sometimes.
-test_run:wait_cond(function() \
- return box.info ~= nil and box.info.replication[1] ~= nil \
-end)
----
-- true
-...
test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
---
- true
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index 0feea152e..599a52988 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -81,11 +81,6 @@ test_run:wait_cond(function() return #fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.
box.cfg{checkpoint_count = checkpoint_count}
test_run:cmd("start server replica with args='true', wait=False")
test_run:cmd("switch replica")
--- Need to wait for box.info.replication[1] defined, otherwise test-run fails to
--- wait for the upstream status sometimes.
-test_run:wait_cond(function() \
- return box.info ~= nil and box.info.replication[1] ~= nil \
-end)
test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
box.space.test:select()
--
2.25.1
* [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (1)
2020-09-29 13:47 [Tarantool-patches] [PATCH v1] test: flaky replication/replica_rejoin.test.lua Alexander V. Tikhonov
@ 2020-09-29 13:49 ` Alexander V. Tikhonov
2020-10-01 22:10 ` Vladislav Shpilevoy
2020-09-29 13:49 ` [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (2) Alexander V. Tikhonov
2020-10-01 22:10 ` [Tarantool-patches] [PATCH v1] test: flaky replication/replica_rejoin.test.lua Vladislav Shpilevoy
2 siblings, 1 reply; 6+ messages in thread
From: Alexander V. Tikhonov @ 2020-09-29 13:49 UTC (permalink / raw)
To: Kirill Yukhin; +Cc: tarantool-patches
Write the error message to the log output instead of the test result in:
replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
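
The difference between raising the diagnostic state and logging it can be sketched in plain Lua. This is only an illustration: `fake_log`, `check_raising` and `check_logging` are hypothetical names, and the reading that the goal is a result output that does not depend on run-specific state is an assumption, since the commit message does not state the motivation.

```lua
-- Plain-Lua sketch; fake_log and both check_* functions are hypothetical.
-- fake_log stands in for a logger and only collects lines in memory.
local fake_log = { lines = {} }
function fake_log.info(msg)
    table.insert(fake_log.lines, tostring(msg))
end

-- Old style: the failure raises, so the run-specific state becomes the
-- value the caller (and the test output) sees.
local function check_raising(state)
    if state.status ~= 'follow' then
        error(state)
    end
    return true
end

-- New style: the details go to the log and the caller sees a plain boolean.
local function check_logging(state)
    if state.status ~= 'follow' then
        fake_log.info("check failed, status: " .. state.status)
        return false
    end
    return true
end

local state = { status = 'stopped', lag = 0.123 }

raised_ok, raised_value = pcall(check_raising, state)
logged_result = check_logging(state)
logged_lines = #fake_log.lines

print(raised_ok, raised_value == state, logged_result, logged_lines)
```

With the logging variant the visible result is always `true` or `false`, while the variable details remain available in the log for debugging.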
---
Github: https://github.com/tarantool/tarantool/tree/avtikhon/flaky-checksums
.../gh-3160-misc-heartbeats-on-master-changes.result | 5 ++++-
.../gh-3160-misc-heartbeats-on-master-changes.test.lua | 5 ++++-
test/replication/suite.ini | 2 +-
3 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/test/replication/gh-3160-misc-heartbeats-on-master-changes.result b/test/replication/gh-3160-misc-heartbeats-on-master-changes.result
index 9bce55ae1..26c369753 100644
--- a/test/replication/gh-3160-misc-heartbeats-on-master-changes.result
+++ b/test/replication/gh-3160-misc-heartbeats-on-master-changes.result
@@ -34,6 +34,7 @@ end;
---
...
function test_timeout()
+ local log = require('log')
local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
local follows = test_run:wait_cond(function()
@@ -43,7 +44,9 @@ function test_timeout()
for i = 0, 99 do
box.space.test_timeout:replace({1})
if wait_not_follow(replicaA, replicaB) then
- return error(box.info.replication)
+ log.info("test_timeout() failed, box.info.replication:")
+ log.info(box.info.replication)
+ return false
end
end
return true
diff --git a/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua b/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua
index b3d8d2d54..480d4ae6c 100644
--- a/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua
+++ b/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua
@@ -16,6 +16,7 @@ function wait_not_follow(replicaA, replicaB)
end, box.cfg.replication_timeout)
end;
function test_timeout()
+ local log = require('log')
local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
local follows = test_run:wait_cond(function()
@@ -25,7 +26,9 @@ function test_timeout()
for i = 0, 99 do
box.space.test_timeout:replace({1})
if wait_not_follow(replicaA, replicaB) then
- return error(box.info.replication)
+ log.info("test_timeout() failed, box.info.replication:")
+ log.info(box.info.replication)
+ return false
end
end
return true
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index 007f4f64c..d32d76753 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -24,7 +24,7 @@ fragile = {
},
"gh-3160-misc-heartbeats-on-master-changes.test.lua": {
"issues": [ "gh-4940" ],
- "checksums": [ "39b09085bc6398d15324191851d6f556" ]
+ "checksums": [ "39b09085bc6398d15324191851d6f556", "20b7bf9ce51a1a936da3f465db42bd62" ]
},
"skip_conflict_row.test.lua": {
"issues": [ "gh-4958" ]
--
2.25.1
* [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (2)
2020-09-29 13:47 [Tarantool-patches] [PATCH v1] test: flaky replication/replica_rejoin.test.lua Alexander V. Tikhonov
2020-09-29 13:49 ` [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (1) Alexander V. Tikhonov
@ 2020-09-29 13:49 ` Alexander V. Tikhonov
2020-10-01 22:10 ` [Tarantool-patches] [PATCH v1] test: flaky replication/replica_rejoin.test.lua Vladislav Shpilevoy
2 siblings, 0 replies; 6+ messages in thread
From: Alexander V. Tikhonov @ 2020-09-29 13:49 UTC (permalink / raw)
To: Kirill Yukhin; +Cc: tarantool-patches
Write the error message to the log output instead of the test result in:
replication/replica_rejoin.test.lua gh-4985
---
Github: https://github.com/tarantool/tarantool/tree/avtikhon/flaky-checksums
test/replication/replica_rejoin.result | 11 +++++++----
test/replication/replica_rejoin.test.lua | 9 +++++----
2 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index 4d9e83868..dbcde0db2 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -4,6 +4,9 @@ env = require('test_run')
test_run = env.new()
---
...
+log = require('log')
+---
+...
engine = test_run:get_cfg('engine')
---
...
@@ -45,7 +48,7 @@ test_run:cmd("switch replica")
---
- true
...
-box.info.replication[1].upstream.status == 'follow' or box.info
+box.info.replication[1].upstream.status == 'follow' or log(box.info)
---
- true
...
@@ -115,7 +118,7 @@ test_run:cmd("start server replica with args='true'")
---
- true
...
-box.info.replication[2].downstream.vclock ~= nil or box.info
+box.info.replication[2].downstream.vclock ~= nil or log(box.info)
---
- true
...
@@ -123,7 +126,7 @@ test_run:cmd("switch replica")
---
- true
...
-box.info.replication[1].upstream.status == 'follow' or box.info
+box.info.replication[1].upstream.status == 'follow' or log(box.info)
---
- true
...
@@ -162,7 +165,7 @@ box.space.test:select()
...
-- Check that restart works as usual.
test_run:cmd("restart server replica with args='true'")
-box.info.replication[1].upstream.status == 'follow' or box.info
+box.info.replication[1].upstream.status == 'follow' or log(box.info)
---
- true
...
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index 599a52988..3ea588aa6 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -1,5 +1,6 @@
env = require('test_run')
test_run = env.new()
+log = require('log')
engine = test_run:get_cfg('engine')
test_run:cleanup_cluster()
@@ -19,7 +20,7 @@ _ = box.space.test:insert{3}
test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
test_run:cmd("start server replica with args='true'")
test_run:cmd("switch replica")
-box.info.replication[1].upstream.status == 'follow' or box.info
+box.info.replication[1].upstream.status == 'follow' or log(box.info)
box.space.test:select()
test_run:cmd("switch default")
test_run:cmd("stop server replica")
@@ -46,9 +47,9 @@ box.cfg{checkpoint_count = checkpoint_count}
-- Restart the replica. Since xlogs have been removed,
-- it is supposed to rejoin without changing id.
test_run:cmd("start server replica with args='true'")
-box.info.replication[2].downstream.vclock ~= nil or box.info
+box.info.replication[2].downstream.vclock ~= nil or log(box.info)
test_run:cmd("switch replica")
-box.info.replication[1].upstream.status == 'follow' or box.info
+box.info.replication[1].upstream.status == 'follow' or log(box.info)
box.space.test:select()
test_run:cmd("switch default")
@@ -62,7 +63,7 @@ box.space.test:select()
-- Check that restart works as usual.
test_run:cmd("restart server replica with args='true'")
-box.info.replication[1].upstream.status == 'follow' or box.info
+box.info.replication[1].upstream.status == 'follow' or log(box.info)
box.space.test:select()
-- Check that rebootstrap is NOT initiated unless the replica
--
2.25.1
* Re: [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (1)
2020-09-29 13:49 ` [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (1) Alexander V. Tikhonov
@ 2020-10-01 22:10 ` Vladislav Shpilevoy
0 siblings, 0 replies; 6+ messages in thread
From: Vladislav Shpilevoy @ 2020-10-01 22:10 UTC (permalink / raw)
To: Alexander V. Tikhonov, Kirill Yukhin; +Cc: tarantool-patches
Thanks for the patch!
What is (1) in the commit title?
On 29.09.2020 15:49, Alexander V. Tikhonov wrote:
> Set error message to log output in test:
>
> replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
> ---
Why? What is happening in this patch? I don't understand.
* Re: [Tarantool-patches] [PATCH v1] test: flaky replication/replica_rejoin.test.lua
2020-09-29 13:47 [Tarantool-patches] [PATCH v1] test: flaky replication/replica_rejoin.test.lua Alexander V. Tikhonov
2020-09-29 13:49 ` [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (1) Alexander V. Tikhonov
2020-09-29 13:49 ` [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (2) Alexander V. Tikhonov
@ 2020-10-01 22:10 ` Vladislav Shpilevoy
2 siblings, 0 replies; 6+ messages in thread
From: Vladislav Shpilevoy @ 2020-10-01 22:10 UTC (permalink / raw)
To: Alexander V. Tikhonov, Kirill Yukhin; +Cc: tarantool-patches
Hi! Thanks for the patch!
This commit LGTM.
* [Tarantool-patches] [PATCH v1] test: flaky replication/replica_rejoin.test.lua
@ 2020-09-29 12:57 Alexander V. Tikhonov
0 siblings, 0 replies; 6+ messages in thread
From: Alexander V. Tikhonov @ 2020-09-29 12:57 UTC (permalink / raw)
To: Vladislav Shpilevoy, Kirill Yukhin; +Cc: root, tarantool-patches
From: root <root@dev1.tarantool.i>
On heavily loaded hosts the following issue was found:
[151] --- replication/replica_rejoin.result Tue Sep 29 10:57:26 2020
[151] +++ replication/replica_rejoin.reject Tue Sep 29 10:57:48 2020
[151] @@ -230,7 +230,12 @@
[151] return box.info ~= nil and box.info.replication[1] ~= nil
[151] end)
[151] ---
[151] -- true
[151] +- error: "builtin/box/load_cfg.lua:601: Please call box.cfg{} first\nstack traceback:\n\tbuiltin/box/load_cfg.lua:601:
[151] + in function '__index'\n\t[string \"return test_run:wait_cond(function() ...\"]:1:
[151] + in function 'cond'\n\t/tmp/tnt/151_replication/test_run.lua:411: in function </tmp/tnt/151_replication/test_run.lua:404>\n\t[C]:
[151] + in function 'pcall'\n\tbuiltin/box/console.lua:402: in function 'eval'\n\tbuiltin/box/console.lua:708:
[151] + in function 'repl'\n\tbuiltin/box/console.lua:842: in function <builtin/box/console.lua:828>\n\t[C]:
[151] + in function 'pcall'\n\tbuiltin/socket.lua:1081: in function <builtin/socket.lua:1079>"
[151] ...
[151] test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
[151] ---
[151]
It happened because box.cfg{} had not been called yet when the
condition ran, so box.info could not be read. In fact there is no need
for a local check that the replication information is available,
because the wait_upstream() function used below does that itself.
Part of #4985
---
Github: https://github.com/tarantool/tarantool/tree/avtikhon/flaky-checksums
Issue: https://github.com/tarantool/tarantool/issues/4985
test/replication/replica_rejoin.result | 8 --------
test/replication/replica_rejoin.test.lua | 5 -----
2 files changed, 13 deletions(-)
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index f6e74eae1..4d9e83868 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -221,14 +221,6 @@ test_run:cmd("switch replica")
---
- true
...
--- Need to wait for box.info.replication[1] defined, otherwise test-run fails to
--- wait for the upstream status sometimes.
-test_run:wait_cond(function() \
- return box.info ~= nil and box.info.replication[1] ~= nil \
-end)
----
-- true
-...
test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
---
- true
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index 0feea152e..599a52988 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -81,11 +81,6 @@ test_run:wait_cond(function() return #fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.
box.cfg{checkpoint_count = checkpoint_count}
test_run:cmd("start server replica with args='true', wait=False")
test_run:cmd("switch replica")
--- Need to wait for box.info.replication[1] defined, otherwise test-run fails to
--- wait for the upstream status sometimes.
-test_run:wait_cond(function() \
- return box.info ~= nil and box.info.replication[1] ~= nil \
-end)
test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
box.space.test:select()
--
2.25.1