Tarantool development patches archive
 help / color / mirror / Atom feed
From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: v.shpilevoy@tarantool.org
Cc: tarantool-patches@dev.tarantool.org
Subject: [Tarantool-patches] [PATCH] replication: fix flaky gh-3160-misc... test
Date: Wed,  8 Dec 2021 13:44:54 +0300	[thread overview]
Message-ID: <20211208104454.20345-1-sergepetrenko@tarantool.org> (raw)

The test fails quite often with one of the following results.

Either

[001] @@ -60,7 +60,7 @@
[001]  ...
[001]  test_timeout()
[001]  ---
[001] -- true
[001] +- false
[001]  ...
[001]  test_run:cmd("switch default")
[001]  ---
[001]

Or

[034]  test_timeout()
[034]  ---
[034] -- true
[034] +- error: 'replicas are not in the follow status'
[034]  ...

Both errors are caused by wait_cond checking saved
`box.info.replication` values instead of actual `box.info.replication`
output. Fix this.

Also, the test's quite long for no reason.

The wait_cond waiting for replication to break lasts a whole
`replication_timeout`, which is excess.

The test's idea, as stated in the commit that has introduced it
(195d4462: Send relay heartbeat if wal changes won't be send) is to
constantly relay some data to the remote peers and make
it so their relays never send heartbeats.

If we wait for a whole replication timeout between inserts, there's a high
chance of peer relays waking up naturally (after replication_timeout passes).

In this case the test tests nothing. Fix the issue by waiting for ~ 1/5
of replication_timeout in between the despatches. This also reduces test
run time from ~4.5 to ~1.5 seconds on my machine.

Remove the test from fragile list, since it shouldn't be flaky anymore.

Closes #4940
---
https://github.com/tarantool/tarantool/issues/4940
https://github.com/tarantool/tarantool/tree/sp/gh-3160-fix

 ...0-misc-heartbeats-on-master-changes.result | 22 ++++++++++-------
 ...misc-heartbeats-on-master-changes.test.lua | 24 ++++++++++++-------
 test/replication/suite.ini                    |  4 ----
 3 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/test/replication/gh-3160-misc-heartbeats-on-master-changes.result b/test/replication/gh-3160-misc-heartbeats-on-master-changes.result
index 86e5ddfa0..101317ebd 100644
--- a/test/replication/gh-3160-misc-heartbeats-on-master-changes.result
+++ b/test/replication/gh-3160-misc-heartbeats-on-master-changes.result
@@ -26,23 +26,27 @@ test_run:cmd("setopt delimiter ';'")
 ---
 - true
 ...
-function wait_not_follow(replicaA, replicaB)
+local function replica(id)
+    return box.info.replication[id].upstream
+end
+
+function wait_not_follow(id_a, id_b)
     return test_run:wait_cond(function()
-        return replicaA.status ~= 'follow' or replicaB.status ~= 'follow'
-    end, box.cfg.replication_timeout)
+        return replica(id_a).status ~= 'follow' or
+               replica(id_b).status ~= 'follow'
+    end, box.cfg.replication_timeout / 5)
 end;
 ---
 ...
 function test_timeout()
-    local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
-    local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
-    local follows = test_run:wait_cond(function()
-        return replicaA.status == 'follow' or replicaB.status == 'follow'
-    end)
+    local id_a = box.info.id % 3 + 1
+    local id_b = id_a % 3 + 1
+    local follows = test_run:wait_upstream(id_a, {status = 'follow'})
+    follows = follows and test_run:wait_upstream(id_b, {status = 'follow'})
     if not follows then error('replicas are not in the follow status') end
     for i = 0, 99 do
         box.space.test_timeout:replace({1})
-        if wait_not_follow(replicaA, replicaB) then
+        if wait_not_follow(id_a, id_b) then
             require('log').error(box.info.replication)
             return false
         end
diff --git a/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua b/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua
index bfc9f854f..21a6f1443 100644
--- a/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua
+++ b/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua
@@ -10,21 +10,27 @@ test_run:cmd("switch autobootstrap3")
 test_run = require('test_run').new()
 _ = box.schema.space.create('test_timeout'):create_index('pk')
 test_run:cmd("setopt delimiter ';'")
-function wait_not_follow(replicaA, replicaB)
+
+local function replica(id)
+    return box.info.replication[id].upstream
+end
+
+function wait_not_follow(id_a, id_b)
     return test_run:wait_cond(function()
-        return replicaA.status ~= 'follow' or replicaB.status ~= 'follow'
-    end, box.cfg.replication_timeout)
+        return replica(id_a).status ~= 'follow' or
+               replica(id_b).status ~= 'follow'
+    end, box.cfg.replication_timeout / 5)
 end;
+
 function test_timeout()
-    local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
-    local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
-    local follows = test_run:wait_cond(function()
-        return replicaA.status == 'follow' or replicaB.status == 'follow'
-    end)
+    local id_a = box.info.id % 3 + 1
+    local id_b = id_a % 3 + 1
+    local follows = test_run:wait_upstream(id_a, {status = 'follow'})
+    follows = follows and test_run:wait_upstream(id_b, {status = 'follow'})
     if not follows then error('replicas are not in the follow status') end
     for i = 0, 99 do
         box.space.test_timeout:replace({1})
-        if wait_not_follow(replicaA, replicaB) then
+        if wait_not_follow(id_a, id_b) then
             require('log').error(box.info.replication)
             return false
         end
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index e20d63a6e..347710f58 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -21,10 +21,6 @@ fragile = {
             "issues": [ "gh-4351" ],
             "checksums": [ "acd88b48b0046ec52346274eeeef0b25", "a645ff7616b5caf0fcd2099022b776bf", "eb3e92564ba71e7b7c458050223f4d57" ]
         },
-        "gh-3160-misc-heartbeats-on-master-changes.test.lua": {
-            "issues": [ "gh-4940" ],
-            "checksums": [ "945521821b8199c59716e969d89d953d", "b4e60f8ec2d4340bc0324f73e2cc8a01", "c7054aec18a7a983c717f1b92dd1434c", "09500c4d118ace1e05b23665ba055bf5", "60d4cbd20d4c646deb9464f82fabffb4" ]
-        },
         "skip_conflict_row.test.lua": {
             "issues": [ "gh-4958" ],
             "checksums": [ "a21f07339237cd9d0b8c74e144284449", "0359b0b1cc80052faf96972959513694", "ef104dfd04afa7c75087de13246e3eb0" ]
-- 
2.30.1 (Apple Git-130)


             reply	other threads:[~2021-12-08 10:45 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-08 10:44 Serge Petrenko via Tarantool-patches [this message]
2021-12-08 11:00 ` Vladimir Davydov via Tarantool-patches
2021-12-08 11:13   ` Serge Petrenko via Tarantool-patches
2021-12-08 11:25     ` Vladimir Davydov via Tarantool-patches
2021-12-08 11:56       ` Serge Petrenko via Tarantool-patches
2021-12-08 23:45 ` Vladislav Shpilevoy via Tarantool-patches
2021-12-09 10:19   ` Serge Petrenko via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211208104454.20345-1-sergepetrenko@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=sergepetrenko@tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH] replication: fix flaky gh-3160-misc... test' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox