Tarantool development patches archive
* [Tarantool-patches] [PATCH v1] test: flaky replication/replica_rejoin.test.lua
@ 2020-09-29 13:47 Alexander V. Tikhonov
  2020-09-29 13:49 ` [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (1) Alexander V. Tikhonov
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Alexander V. Tikhonov @ 2020-09-29 13:47 UTC (permalink / raw)
  To: Vladislav Shpilevoy, Kirill Yukhin; +Cc: tarantool-patches

On heavily loaded hosts the following issue was found:

  [151] --- replication/replica_rejoin.result     Tue Sep 29 10:57:26 2020
  [151] +++ replication/replica_rejoin.reject     Tue Sep 29 10:57:48 2020
  [151] @@ -230,7 +230,12 @@
  [151]      return box.info ~= nil and box.info.replication[1] ~= nil
  [151]  end)
  [151]  ---
  [151] -- true
  [151] +- error: "builtin/box/load_cfg.lua:601: Please call box.cfg{} first\nstack traceback:\n\tbuiltin/box/load_cfg.lua:601:
  [151] +    in function '__index'\n\t[string \"return test_run:wait_cond(function()         ...\"]:1:
  [151] +    in function 'cond'\n\t/tmp/tnt/151_replication/test_run.lua:411: in function </tmp/tnt/151_replication/test_run.lua:404>\n\t[C]:
  [151] +    in function 'pcall'\n\tbuiltin/box/console.lua:402: in function 'eval'\n\tbuiltin/box/console.lua:708:
  [151] +    in function 'repl'\n\tbuiltin/box/console.lua:842: in function <builtin/box/console.lua:828>\n\t[C]:
  [151] +    in function 'pcall'\n\tbuiltin/socket.lua:1081: in function <builtin/socket.lua:1079>"
  [151]  ...
  [151]  test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
  [151]  ---
  [151]

It happened because box.info was accessed before box.cfg{} had been
called, so it could not provide any information yet. In fact there is
no need for a separate local check that the replication information
is available, because the wait_upstream() call below performs this
check itself.

Part of #4985
---

Github: https://github.com/tarantool/tarantool/tree/avtikhon/flaky-checksums
Issue: https://github.com/tarantool/tarantool/issues/4985
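
For illustration only, a minimal sketch of the failure mode described
above and of a polling helper that tolerates it. It is plain Lua and
does not use test-run or Tarantool: fake_info, ready_after and
wait_cond_safe are hypothetical names invented for this sketch, and the
real wait_cond() and wait_upstream() implementations may differ.

```lua
-- Hypothetical stand-in for box.info: indexing it raises an error
-- until the fake instance counts as configured.
local ready_after = 3   -- the third access is the first one that succeeds
local accesses = 0
local fake_info = setmetatable({}, {
    __index = function(_, key)
        accesses = accesses + 1
        if accesses < ready_after then
            error('Please call box.cfg{} first')
        end
        if key == 'replication' then
            return { [1] = { upstream = { status = 'loading' } } }
        end
        return nil
    end,
})

-- Poll cond() up to max_tries times. An error raised inside cond() is
-- treated as "not ready yet" instead of aborting the wait.
local function wait_cond_safe(cond, max_tries)
    for _ = 1, max_tries do
        local ok, res = pcall(cond)
        if ok and res then
            return true
        end
    end
    return false
end

local ready = wait_cond_safe(function()
    return fake_info.replication[1] ~= nil
end, 10)
print(ready, accesses)
```

An unguarded condition, like the one removed by this patch, would
propagate the first error to the caller instead of retrying.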

 test/replication/replica_rejoin.result   | 8 --------
 test/replication/replica_rejoin.test.lua | 5 -----
 2 files changed, 13 deletions(-)

diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index f6e74eae1..4d9e83868 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -221,14 +221,6 @@ test_run:cmd("switch replica")
 ---
 - true
 ...
--- Need to wait for box.info.replication[1] defined, otherwise test-run fails to
--- wait for the upstream status sometimes.
-test_run:wait_cond(function()                                                   \
-    return box.info ~= nil and box.info.replication[1] ~= nil                   \
-end)
----
-- true
-...
 test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
 ---
 - true
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index 0feea152e..599a52988 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -81,11 +81,6 @@ test_run:wait_cond(function() return #fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.
 box.cfg{checkpoint_count = checkpoint_count}
 test_run:cmd("start server replica with args='true', wait=False")
 test_run:cmd("switch replica")
--- Need to wait for box.info.replication[1] defined, otherwise test-run fails to
--- wait for the upstream status sometimes.
-test_run:wait_cond(function()                                                   \
-    return box.info ~= nil and box.info.replication[1] ~= nil                   \
-end)
 test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
 box.space.test:select()
 
-- 
2.25.1


* [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (1)
  2020-09-29 13:47 [Tarantool-patches] [PATCH v1] test: flaky replication/replica_rejoin.test.lua Alexander V. Tikhonov
@ 2020-09-29 13:49 ` Alexander V. Tikhonov
  2020-10-01 22:10   ` Vladislav Shpilevoy
  2020-09-29 13:49 ` [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (2) Alexander V. Tikhonov
  2020-10-01 22:10 ` [Tarantool-patches] [PATCH v1] test: flaky replication/replica_rejoin.test.lua Vladislav Shpilevoy
  2 siblings, 1 reply; 6+ messages in thread
From: Alexander V. Tikhonov @ 2020-09-29 13:49 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: tarantool-patches

Write the error message to the log instead of raising it in the test:

  replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
---

Github: https://github.com/tarantool/tarantool/tree/avtikhon/flaky-checksums
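
For illustration only, a minimal plain-Lua sketch that contrasts the two
styles touched by this patch: raising the state as an error versus
logging it and returning false. The log table is a hypothetical
stand-in for a logger, and check_raising, check_logging and state are
names invented for this sketch. It does not attempt to state the
motivation of the patch.

```lua
-- Hypothetical logger that only records what it is given.
local messages = {}
local log = {
    info = function(value) messages[#messages + 1] = value end,
}

-- Old style: raise the state, so it travels to the caller as an error.
local function check_raising(is_bad, state)
    if is_bad then
        error(state)
    end
    return true
end

-- New style: record the state in the log and return a plain boolean.
local function check_logging(is_bad, state)
    if is_bad then
        log.info('check failed, state:')
        log.info(state)
        return false
    end
    return true
end

local state = { upstream = { status = 'stopped' } }
local ok, err = pcall(check_raising, true, state)
local result = check_logging(true, state)
print(ok, err == state, result, #messages)
```

With the first style the caller receives the whole state table as the
error value; with the second it receives only false, and the state is
available from the logger.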

 .../gh-3160-misc-heartbeats-on-master-changes.result         | 5 ++++-
 .../gh-3160-misc-heartbeats-on-master-changes.test.lua       | 5 ++++-
 test/replication/suite.ini                                   | 2 +-
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/test/replication/gh-3160-misc-heartbeats-on-master-changes.result b/test/replication/gh-3160-misc-heartbeats-on-master-changes.result
index 9bce55ae1..26c369753 100644
--- a/test/replication/gh-3160-misc-heartbeats-on-master-changes.result
+++ b/test/replication/gh-3160-misc-heartbeats-on-master-changes.result
@@ -34,6 +34,7 @@ end;
 ---
 ...
 function test_timeout()
+    local log = require('log')
     local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
     local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
     local follows = test_run:wait_cond(function()
@@ -43,7 +44,9 @@ function test_timeout()
     for i = 0, 99 do
         box.space.test_timeout:replace({1})
         if wait_not_follow(replicaA, replicaB) then
-            return error(box.info.replication)
+            log.info("test_timeout() failed, box.info.replication:")
+            log.info(box.info.replication)
+            return false
         end
     end
     return true
diff --git a/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua b/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua
index b3d8d2d54..480d4ae6c 100644
--- a/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua
+++ b/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua
@@ -16,6 +16,7 @@ function wait_not_follow(replicaA, replicaB)
     end, box.cfg.replication_timeout)
 end;
 function test_timeout()
+    local log = require('log')
     local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
     local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
     local follows = test_run:wait_cond(function()
@@ -25,7 +26,9 @@ function test_timeout()
     for i = 0, 99 do
         box.space.test_timeout:replace({1})
         if wait_not_follow(replicaA, replicaB) then
-            return error(box.info.replication)
+            log.info("test_timeout() failed, box.info.replication:")
+            log.info(box.info.replication)
+            return false
         end
     end
     return true
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index 007f4f64c..d32d76753 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -24,7 +24,7 @@ fragile = {
         },
         "gh-3160-misc-heartbeats-on-master-changes.test.lua": {
             "issues": [ "gh-4940" ],
-            "checksums": [ "39b09085bc6398d15324191851d6f556" ]
+            "checksums": [ "39b09085bc6398d15324191851d6f556", "20b7bf9ce51a1a936da3f465db42bd62" ]
         },
         "skip_conflict_row.test.lua": {
             "issues": [ "gh-4958" ]
-- 
2.25.1


* [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (2)
  2020-09-29 13:47 [Tarantool-patches] [PATCH v1] test: flaky replication/replica_rejoin.test.lua Alexander V. Tikhonov
  2020-09-29 13:49 ` [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (1) Alexander V. Tikhonov
@ 2020-09-29 13:49 ` Alexander V. Tikhonov
  2020-10-01 22:10 ` [Tarantool-patches] [PATCH v1] test: flaky replication/replica_rejoin.test.lua Vladislav Shpilevoy
  2 siblings, 0 replies; 6+ messages in thread
From: Alexander V. Tikhonov @ 2020-09-29 13:49 UTC (permalink / raw)
  To: Kirill Yukhin; +Cc: tarantool-patches

Write the error message to the log instead of printing it in the test:

  replication/replica_rejoin.test.lua gh-4985
---

Github: https://github.com/tarantool/tarantool/tree/avtikhon/flaky-checksums
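
For illustration only, a minimal plain-Lua sketch of the
"condition or fallback" idiom used in the test lines below: the right
operand of "or" is evaluated only when the left one is false or nil.
The log table here is a hypothetical stand-in with a named info()
function, and check is a name invented for this sketch. It makes no
claim about how the real log module behaves when it is called directly.

```lua
-- Hypothetical logger that records the values passed to it.
local logged = {}
local log = {
    info = function(value) logged[#logged + 1] = value end,
}

-- Returns true when the status matches. Otherwise the info table is
-- logged and the expression yields nil, because log.info() returns
-- no value.
local function check(status, info)
    return status == 'follow' or log.info(info)
end

local good = check('follow', { note = 'not logged' })
local bad = check('stopped', { note = 'logged' })
print(good, bad, #logged)
```

The passing case never touches the logger, so only the failing case
leaves anything behind in the log.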

 test/replication/replica_rejoin.result   | 11 +++++++----
 test/replication/replica_rejoin.test.lua |  9 +++++----
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index 4d9e83868..dbcde0db2 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -4,6 +4,9 @@ env = require('test_run')
 test_run = env.new()
 ---
 ...
+log = require('log')
+---
+...
 engine = test_run:get_cfg('engine')
 ---
 ...
@@ -45,7 +48,7 @@ test_run:cmd("switch replica")
 ---
 - true
 ...
-box.info.replication[1].upstream.status == 'follow' or box.info
+box.info.replication[1].upstream.status == 'follow' or log(box.info)
 ---
 - true
 ...
@@ -115,7 +118,7 @@ test_run:cmd("start server replica with args='true'")
 ---
 - true
 ...
-box.info.replication[2].downstream.vclock ~= nil or box.info
+box.info.replication[2].downstream.vclock ~= nil or log(box.info)
 ---
 - true
 ...
@@ -123,7 +126,7 @@ test_run:cmd("switch replica")
 ---
 - true
 ...
-box.info.replication[1].upstream.status == 'follow' or box.info
+box.info.replication[1].upstream.status == 'follow' or log(box.info)
 ---
 - true
 ...
@@ -162,7 +165,7 @@ box.space.test:select()
 ...
 -- Check that restart works as usual.
 test_run:cmd("restart server replica with args='true'")
-box.info.replication[1].upstream.status == 'follow' or box.info
+box.info.replication[1].upstream.status == 'follow' or log(box.info)
 ---
 - true
 ...
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index 599a52988..3ea588aa6 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -1,5 +1,6 @@
 env = require('test_run')
 test_run = env.new()
+log = require('log')
 engine = test_run:get_cfg('engine')
 
 test_run:cleanup_cluster()
@@ -19,7 +20,7 @@ _ = box.space.test:insert{3}
 test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
 test_run:cmd("start server replica with args='true'")
 test_run:cmd("switch replica")
-box.info.replication[1].upstream.status == 'follow' or box.info
+box.info.replication[1].upstream.status == 'follow' or log(box.info)
 box.space.test:select()
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
@@ -46,9 +47,9 @@ box.cfg{checkpoint_count = checkpoint_count}
 -- Restart the replica. Since xlogs have been removed,
 -- it is supposed to rejoin without changing id.
 test_run:cmd("start server replica with args='true'")
-box.info.replication[2].downstream.vclock ~= nil or box.info
+box.info.replication[2].downstream.vclock ~= nil or log(box.info)
 test_run:cmd("switch replica")
-box.info.replication[1].upstream.status == 'follow' or box.info
+box.info.replication[1].upstream.status == 'follow' or log(box.info)
 box.space.test:select()
 test_run:cmd("switch default")
 
@@ -62,7 +63,7 @@ box.space.test:select()
 
 -- Check that restart works as usual.
 test_run:cmd("restart server replica with args='true'")
-box.info.replication[1].upstream.status == 'follow' or box.info
+box.info.replication[1].upstream.status == 'follow' or log(box.info)
 box.space.test:select()
 
 -- Check that rebootstrap is NOT initiated unless the replica
-- 
2.25.1


* Re: [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (1)
  2020-09-29 13:49 ` [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (1) Alexander V. Tikhonov
@ 2020-10-01 22:10   ` Vladislav Shpilevoy
  0 siblings, 0 replies; 6+ messages in thread
From: Vladislav Shpilevoy @ 2020-10-01 22:10 UTC (permalink / raw)
  To: Alexander V. Tikhonov, Kirill Yukhin; +Cc: tarantool-patches

Thanks for the patch!

What is (1) in the commit title?

On 29.09.2020 15:49, Alexander V. Tikhonov wrote:
> Set error message to log output in test:
> 
>   replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
> ---

Why? What is happening in this patch? I don't understand.


* Re: [Tarantool-patches] [PATCH v1] test: flaky replication/replica_rejoin.test.lua
  2020-09-29 13:47 [Tarantool-patches] [PATCH v1] test: flaky replication/replica_rejoin.test.lua Alexander V. Tikhonov
  2020-09-29 13:49 ` [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (1) Alexander V. Tikhonov
  2020-09-29 13:49 ` [Tarantool-patches] [PATCH v1] test: move error messages for tests into logs (2) Alexander V. Tikhonov
@ 2020-10-01 22:10 ` Vladislav Shpilevoy
  2 siblings, 0 replies; 6+ messages in thread
From: Vladislav Shpilevoy @ 2020-10-01 22:10 UTC (permalink / raw)
  To: Alexander V. Tikhonov, Kirill Yukhin; +Cc: tarantool-patches

Hi! Thanks for the patch!

This commit LGTM.


* [Tarantool-patches] [PATCH v1] test: flaky replication/replica_rejoin.test.lua
@ 2020-09-29 12:57 Alexander V. Tikhonov
  0 siblings, 0 replies; 6+ messages in thread
From: Alexander V. Tikhonov @ 2020-09-29 12:57 UTC (permalink / raw)
  To: Vladislav Shpilevoy, Kirill Yukhin; +Cc: root, tarantool-patches

From: root <root@dev1.tarantool.i>

On heavily loaded hosts the following issue was found:

  [151] --- replication/replica_rejoin.result     Tue Sep 29 10:57:26 2020
  [151] +++ replication/replica_rejoin.reject     Tue Sep 29 10:57:48 2020
  [151] @@ -230,7 +230,12 @@
  [151]      return box.info ~= nil and box.info.replication[1] ~= nil
  [151]  end)
  [151]  ---
  [151] -- true
  [151] +- error: "builtin/box/load_cfg.lua:601: Please call box.cfg{} first\nstack traceback:\n\tbuiltin/box/load_cfg.lua:601:
  [151] +    in function '__index'\n\t[string \"return test_run:wait_cond(function()         ...\"]:1:
  [151] +    in function 'cond'\n\t/tmp/tnt/151_replication/test_run.lua:411: in function </tmp/tnt/151_replication/test_run.lua:404>\n\t[C]:
  [151] +    in function 'pcall'\n\tbuiltin/box/console.lua:402: in function 'eval'\n\tbuiltin/box/console.lua:708:
  [151] +    in function 'repl'\n\tbuiltin/box/console.lua:842: in function <builtin/box/console.lua:828>\n\t[C]:
  [151] +    in function 'pcall'\n\tbuiltin/socket.lua:1081: in function <builtin/socket.lua:1079>"
  [151]  ...
  [151]  test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
  [151]  ---
  [151]

It happened because box.info was accessed before box.cfg{} had been
called, so it could not provide any information yet. In fact there is
no need for a separate local check that the replication information
is available, because the wait_upstream() call below performs this
check itself.

Part of #4985
---

Github: https://github.com/tarantool/tarantool/tree/avtikhon/flaky-checksums
Issue: https://github.com/tarantool/tarantool/issues/4985

 test/replication/replica_rejoin.result   | 8 --------
 test/replication/replica_rejoin.test.lua | 5 -----
 2 files changed, 13 deletions(-)

diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index f6e74eae1..4d9e83868 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -221,14 +221,6 @@ test_run:cmd("switch replica")
 ---
 - true
 ...
--- Need to wait for box.info.replication[1] defined, otherwise test-run fails to
--- wait for the upstream status sometimes.
-test_run:wait_cond(function()                                                   \
-    return box.info ~= nil and box.info.replication[1] ~= nil                   \
-end)
----
-- true
-...
 test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
 ---
 - true
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index 0feea152e..599a52988 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -81,11 +81,6 @@ test_run:wait_cond(function() return #fio.glob(fio.pathjoin(box.cfg.wal_dir, '*.
 box.cfg{checkpoint_count = checkpoint_count}
 test_run:cmd("start server replica with args='true', wait=False")
 test_run:cmd("switch replica")
--- Need to wait for box.info.replication[1] defined, otherwise test-run fails to
--- wait for the upstream status sometimes.
-test_run:wait_cond(function()                                                   \
-    return box.info ~= nil and box.info.replication[1] ~= nil                   \
-end)
 test_run:wait_upstream(1, {message_re = 'Missing %.xlog file', status = 'loading'})
 box.space.test:select()
 
-- 
2.25.1

