From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp61.i.mail.ru (smtp61.i.mail.ru [217.69.128.41])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id 390E8469719
	for ; Mon, 7 Sep 2020 04:00:14 +0300 (MSK)
From: "Alexander V. Tikhonov"
Date: Mon, 7 Sep 2020 04:00:11 +0300
Message-Id: <87a0a297a6c954f37bed6a79f81949a87196a218.1599440334.git.avtikhon@tarantool.org>
Subject: [Tarantool-patches] [PATCH v1] test: flaky replication/status.test.lua status
List-Id: Tarantool development patches
To: Kirill Yukhin, Serge Petrenko
Cc: tarantool-patches@dev.tarantool.org

On heavily loaded hosts the following 3 issues were found:

line 174:

[026] --- replication/status.result Thu Jun 11 12:07:39 2020
[026] +++ replication/status.reject Sun Jun 14 03:20:21 2020
[026] @@ -174,15 +174,17 @@
[026]  ...
[026]  replica.downstream.status == 'follow'
[026]  ---
[026] -- true
[026] +- false
[026]  ...

It happened because the replication downstream status check occurred too
early. To let the downstream reach the needed 'follow' state, the test has
to wait for it using the test_run:wait_downstream() routine.

line 178:

[024] --- replication/status.result Mon Sep 7 00:22:52 2020
[024] +++ replication/status.reject Mon Sep 7 00:36:01 2020
[024] @@ -178,11 +178,13 @@
[024]  ...
[024]  replica.downstream.vclock[master_id] == box.info.vclock[master_id]
[024]  ---
[024] -- true
[024] +- error: '[string "return replica.downstream.vclock[master_id] =..."]:1: attempt to
[024] +  index field ''vclock'' (a nil value)'
[024]  ...
[024]  replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
[024]  ---
[024] -- true
[024] +- error: '[string "return replica.downstream.vclock[replica_id] ..."]:1: attempt to
[024] +  index field ''vclock'' (a nil value)'
[024]  ...
[024]  --
[024]  -- Replica

It happened because the replication vclock field did not exist yet at the
moment of the check. To fix the issue, the test has to wait for the vclock
field to become available using the test_run:wait_cond() routine. The
replication downstream data also has to be read at that same moment, so it
is fetched inside the waited condition.

line 224:

[014] --- replication/status.result Fri Jul 3 04:29:56 2020
[014] +++ replication/status.reject Mon Sep 7 00:17:30 2020
[014] @@ -224,7 +224,7 @@
[014]  ...
[014]  master.upstream.status == "follow"
[014]  ---
[014] -- true
[014] +- false
[014]  ...
[014]  master.upstream.lag < 1
[014]  ---

It happened because the replication upstream status check occurred too
early. To let the upstream reach the needed 'follow' state, the test has
to wait for it using the test_run:wait_upstream() routine.

Removed the test from the 'fragile' list of the test_run tool so that it
runs in parallel.

Closes #5110
---

Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5110-repl-status-174
Issue: https://github.com/tarantool/tarantool/issues/5110

 test/replication/status.result   | 16 +++++++++-------
 test/replication/status.test.lua | 13 +++++++++----
 test/replication/suite.ini       |  1 -
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/test/replication/status.result b/test/replication/status.result
index a86f48774..d5addbc80 100644
--- a/test/replication/status.result
+++ b/test/replication/status.result
@@ -172,15 +172,17 @@ replica.upstream == nil
 ---
 - true
 ...
-replica.downstream.status == 'follow'
+test_run:wait_downstream(replica_id, {status == 'follow'})
 ---
 - true
 ...
-replica.downstream.vclock[master_id] == box.info.vclock[master_id]
----
-- true
-...
-replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
+-- wait for the replication vclock
+test_run:wait_cond(function() \
+    local r = box.info.replication[replica_id].downstream.vclock \
+    return (r ~= nil and box.info.vclock ~= nil and \
+            r[master_id] == box.info.vclock[master_id] and \
+            r[replica_id] == box.info.vclock[replica_id]) \
+    end) or box.info
 ---
 - true
 ...
@@ -222,7 +224,7 @@ master.uuid == box.space._cluster:get(master_id)[2]
 ---
 - true
 ...
-master.upstream.status == "follow"
+test_run:wait_upstream(master_id, {status == 'follow'})
 ---
 - true
 ...
diff --git a/test/replication/status.test.lua b/test/replication/status.test.lua
index 090968172..6006ce9cf 100644
--- a/test/replication/status.test.lua
+++ b/test/replication/status.test.lua
@@ -64,9 +64,14 @@ replica.uuid == box.space._cluster:get(replica_id)[2]
 -- replica.lsn == box.info.vclock[replica_id]
 replica.lsn == 0
 replica.upstream == nil
-replica.downstream.status == 'follow'
-replica.downstream.vclock[master_id] == box.info.vclock[master_id]
-replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
+test_run:wait_downstream(replica_id, {status == 'follow'})
+-- wait for the replication vclock
+test_run:wait_cond(function() \
+    local r = box.info.replication[replica_id].downstream.vclock \
+    return (r ~= nil and box.info.vclock ~= nil and \
+            r[master_id] == box.info.vclock[master_id] and \
+            r[replica_id] == box.info.vclock[replica_id]) \
+    end) or box.info
 
 --
 -- Replica
@@ -83,7 +88,7 @@ box.info.vclock[master_id] == 2
 master = box.info.replication[master_id]
 master.id == master_id
 master.uuid == box.space._cluster:get(master_id)[2]
-master.upstream.status == "follow"
+test_run:wait_upstream(master_id, {status == 'follow'})
 master.upstream.lag < 1
 master.upstream.idle < 1
 master.upstream.peer:match("unix/")
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index ab9c3dabd..9bba9d125 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -23,4 +23,3 @@ fragile = errinj.test.lua ; gh-3870
  box_set_replication_stress.test.lua ; gh-4992 gh-4986
  gh-4605-empty-password.test.lua ; gh-5030
  anon.test.lua ; gh-5058
- status.test.lua ; gh-5110
-- 
2.17.1
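
The race described in the commit message (a one-shot check that runs before
the vclock field exists) and the polling remedy can be sketched in plain Lua.
This is only an illustration under stated assumptions: wait_cond() below is a
hypothetical stand-in and not the test-run implementation, and the simulated
info table and the poll_info() helper are invented for the example.

```lua
-- Hypothetical stand-in for test_run:wait_cond(); NOT the test-run code.
-- Polls cond() until it returns a truthy value or `timeout` seconds pass.
local function wait_cond(cond, timeout, delay)
    timeout = timeout or 60
    delay = delay or 0.001
    local deadline = os.clock() + timeout
    while true do
        local res = cond()
        if res then
            return res
        end
        if os.clock() >= deadline then
            return false
        end
        -- Busy-wait for `delay` seconds of CPU time. Inside Tarantool one
        -- would yield instead, e.g. with fiber.sleep(delay).
        local resume_at = os.clock() + delay
        while os.clock() < resume_at do end
    end
end

-- Simulated (invented) replication info: downstream.vclock is missing at
-- first and shows up only on the third read, as on a heavily loaded host.
local polls = 0
local info = { vclock = {1, 0}, downstream = {} }
local function poll_info()
    polls = polls + 1
    if polls == 3 then
        info.downstream.vclock = {1, 0}
    end
    return info
end

-- The data is re-read on every poll and guarded against nil, so an early
-- read does not fail with "attempt to index field 'vclock' (a nil value)".
local synced = wait_cond(function()
    local i = poll_info()
    local r = i.downstream.vclock
    return r ~= nil and r[1] == i.vclock[1] and r[2] == i.vclock[2]
end, 5)
print(synced, polls)
```

Reading the data inside the condition matters here: a value captured once
before the wait would stay stale no matter how long the loop polls.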