From: Serge Petrenko via Tarantool-patches
Reply-To: Serge Petrenko
To: v.shpilevoy@tarantool.org
Cc: tarantool-patches@dev.tarantool.org
Date: Wed, 8 Dec 2021 13:44:54 +0300
Message-Id: <20211208104454.20345-1-sergepetrenko@tarantool.org>
Subject: [Tarantool-patches] [PATCH] replication: fix flaky gh-3160-misc... test

The test often fails with one of the following results. Either

[001] @@ -60,7 +60,7 @@
[001]  ...
[001]  test_timeout()
[001]  ---
[001] -- true
[001] +- false
[001]  ...
[001]  test_run:cmd("switch default")
[001]  ---

or

[034]  test_timeout()
[034]  ---
[034] -- true
[034] +- error: 'replicas are not in the follow status'
[034]  ...

Both errors are caused by wait_cond checking `box.info.replication`
values that were saved once, before the wait, instead of the actual
`box.info.replication` output. The saved upstream tables do not
reflect later status changes. Fix this by re-reading
`box.info.replication` on every check.

Also, the test takes longer than necessary. The wait_cond that waits
for replication to break lasts a whole `replication_timeout`, which is
excessive. The idea of the test, as stated in the commit that
introduced it (195d4462: Send relay heartbeat if wal changes won't be
send), is to constantly relay some data to the remote peers so that
their relays never send heartbeats. If we wait for a whole
replication_timeout between inserts, there is a high chance that the
peer relays wake up on their own (once replication_timeout passes),
in which case the test tests nothing.

Fix the issue by waiting for ~1/5 of replication_timeout between
inserts. This also reduces the test run time from ~4.5 to ~1.5 seconds
on my machine.

Remove the test from the fragile list, since it should no longer be
flaky.
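Below is a minimal sketch of the stale-snapshot failure mode. It is written in plain Lua, so it runs without Tarantool or test-run. The sketch assumes that each read of `box.info.replication` builds a fresh table, so an upstream table saved before the wait never sees later status changes. `info_replication()`, `wait_cond()` and `tick()` are hypothetical stand-ins, not the real box or test-run API. `wait_cond()` polls a fixed number of times instead of using a real timeout.

```lua
-- Hypothetical simulated state: upstream status of peers 2 and 3.
local current_status = {[2] = 'sync', [3] = 'follow'}

-- Hypothetical stand-in for box.info.replication.
-- Assumption: every call returns a brand-new snapshot table.
local function info_replication()
    local t = {}
    for id, status in pairs(current_status) do
        t[id] = {upstream = {status = status}}
    end
    return t
end

-- Fresh read on every call, like replica(id) in the patch.
local function replica(id)
    return info_replication()[id].upstream
end

-- Hypothetical polling helper: checks cond() up to `attempts` times,
-- advancing the simulated world with tick() between checks.
local function wait_cond(cond, attempts, tick)
    for _ = 1, attempts do
        if cond() then return true end
        tick()
    end
    return cond()
end

-- Simulated progress: peer 2 reaches 'follow' after the first tick.
local function tick() current_status[2] = 'follow' end

-- Old approach: the upstream table is saved once, before waiting.
local saved = info_replication()[2].upstream
local stale_ok = wait_cond(function()
    return saved.status == 'follow'
end, 5, tick)

-- New approach: reset the state and re-read on every check.
current_status[2] = 'sync'
local fresh_ok = wait_cond(function()
    return replica(2).status == 'follow'
end, 5, tick)

print(stale_ok, fresh_ok) -- prints "false" and "true", tab-separated
```

The saved table keeps reporting `'sync'` even after the simulated peer has switched to `'follow'`. The fresh read observes the change on its second check.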
Closes #4940
---
https://github.com/tarantool/tarantool/issues/4940
https://github.com/tarantool/tarantool/tree/sp/gh-3160-fix

 ...0-misc-heartbeats-on-master-changes.result | 22 ++++++++++-------
 ...misc-heartbeats-on-master-changes.test.lua | 24 ++++++++++++-------
 test/replication/suite.ini                    |  4 ----
 3 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/test/replication/gh-3160-misc-heartbeats-on-master-changes.result b/test/replication/gh-3160-misc-heartbeats-on-master-changes.result
index 86e5ddfa0..101317ebd 100644
--- a/test/replication/gh-3160-misc-heartbeats-on-master-changes.result
+++ b/test/replication/gh-3160-misc-heartbeats-on-master-changes.result
@@ -26,23 +26,27 @@ test_run:cmd("setopt delimiter ';'")
 ---
 - true
 ...
-function wait_not_follow(replicaA, replicaB)
+local function replica(id)
+    return box.info.replication[id].upstream
+end
+
+function wait_not_follow(id_a, id_b)
     return test_run:wait_cond(function()
-        return replicaA.status ~= 'follow' or replicaB.status ~= 'follow'
-    end, box.cfg.replication_timeout)
+        return replica(id_a).status ~= 'follow' or
+               replica(id_b).status ~= 'follow'
+    end, box.cfg.replication_timeout / 5)
 end;
 ---
 ...
 function test_timeout()
-    local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
-    local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
-    local follows = test_run:wait_cond(function()
-        return replicaA.status == 'follow' or replicaB.status == 'follow'
-    end)
+    local id_a = box.info.id % 3 + 1
+    local id_b = id_a % 3 + 1
+    local follows = test_run:wait_upstream(id_a, {status = 'follow'})
+    follows = follows and test_run:wait_upstream(id_b, {status = 'follow'})
     if not follows then error('replicas are not in the follow status') end
     for i = 0, 99 do
         box.space.test_timeout:replace({1})
-        if wait_not_follow(replicaA, replicaB) then
+        if wait_not_follow(id_a, id_b) then
             require('log').error(box.info.replication)
             return false
         end
diff --git a/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua b/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua
index bfc9f854f..21a6f1443 100644
--- a/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua
+++ b/test/replication/gh-3160-misc-heartbeats-on-master-changes.test.lua
@@ -10,21 +10,27 @@ test_run:cmd("switch autobootstrap3")
 test_run = require('test_run').new()
 _ = box.schema.space.create('test_timeout'):create_index('pk')
 test_run:cmd("setopt delimiter ';'")
-function wait_not_follow(replicaA, replicaB)
+
+local function replica(id)
+    return box.info.replication[id].upstream
+end
+
+function wait_not_follow(id_a, id_b)
     return test_run:wait_cond(function()
-        return replicaA.status ~= 'follow' or replicaB.status ~= 'follow'
-    end, box.cfg.replication_timeout)
+        return replica(id_a).status ~= 'follow' or
+               replica(id_b).status ~= 'follow'
+    end, box.cfg.replication_timeout / 5)
 end;
+
 function test_timeout()
-    local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
-    local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
-    local follows = test_run:wait_cond(function()
-        return replicaA.status == 'follow' or replicaB.status == 'follow'
-    end)
+    local id_a = box.info.id % 3 + 1
+    local id_b = id_a % 3 + 1
+    local follows = test_run:wait_upstream(id_a, {status = 'follow'})
+    follows = follows and test_run:wait_upstream(id_b, {status = 'follow'})
     if not follows then error('replicas are not in the follow status') end
     for i = 0, 99 do
         box.space.test_timeout:replace({1})
-        if wait_not_follow(replicaA, replicaB) then
+        if wait_not_follow(id_a, id_b) then
             require('log').error(box.info.replication)
             return false
         end
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index e20d63a6e..347710f58 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -21,10 +21,6 @@ fragile = {
         "issues": [ "gh-4351" ],
         "checksums": [ "acd88b48b0046ec52346274eeeef0b25", "a645ff7616b5caf0fcd2099022b776bf", "eb3e92564ba71e7b7c458050223f4d57" ]
     },
-    "gh-3160-misc-heartbeats-on-master-changes.test.lua": {
-        "issues": [ "gh-4940" ],
-        "checksums": [ "945521821b8199c59716e969d89d953d", "b4e60f8ec2d4340bc0324f73e2cc8a01", "c7054aec18a7a983c717f1b92dd1434c", "09500c4d118ace1e05b23665ba055bf5", "60d4cbd20d4c646deb9464f82fabffb4" ]
-    },
     "skip_conflict_row.test.lua": {
         "issues": [ "gh-4958" ],
         "checksums": [ "a21f07339237cd9d0b8c74e144284449", "0359b0b1cc80052faf96972959513694", "ef104dfd04afa7c75087de13246e3eb0" ]
-- 
2.30.1 (Apple Git-130)