From: Serge Petrenko
Date: Wed, 9 Sep 2020 18:41:11 +0300
Subject: Re: [Tarantool-patches] [PATCH v1] test: flaky replication/status.test.lua status
To: "Alexander V. Tikhonov", Kirill Yukhin
Cc: tarantool-patches@dev.tarantool.org
Message-ID: <23d93ca8-c632-aab5-7e57-44b6f8c3a93e@tarantool.org>
In-Reply-To: <87a0a297a6c954f37bed6a79f81949a87196a218.1599440334.git.avtikhon@tarantool.org>

07.09.2020 04:00, Alexander V. Tikhonov wrote:
> On heavily loaded hosts, the following 3 issues were found:
>
> line 174:
>
> [026] --- replication/status.result Thu Jun 11 12:07:39 2020
> [026] +++ replication/status.reject Sun Jun 14 03:20:21 2020
> [026] @@ -174,15 +174,17 @@
> [026]  ...
> [026]  replica.downstream.status == 'follow'
> [026]  ---
> [026] -- true
> [026] +- false
> [026]  ...
>
> It happened because the replication downstream status was checked too
> early, before it had reached the 'follow' state. To give the status
> time to reach 'follow', the test needs to wait for it using the
> test_run:wait_downstream() routine.
>
> line 178:
>
> [024] --- replication/status.result Mon Sep 7 00:22:52 2020
> [024] +++ replication/status.reject Mon Sep 7 00:36:01 2020
> [024] @@ -178,11 +178,13 @@
> [024]  ...
> [024]  replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> [024]  ---
> [024] -- true
> [024] +- error: '[string "return replica.downstream.vclock[master_id] =..."]:1: attempt to
> [024] +  index field ''vclock'' (a nil value)'
> [024]  ...
> [024]  replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> [024]  ---
> [024] -- true
> [024] +- error: '[string "return replica.downstream.vclock[replica_id] ..."]:1: attempt to
> [024] +  index field ''vclock'' (a nil value)'
> [024]  ...
> [024]  --
> [024]  -- Replica
>
> It happened because the downstream vclock field did not exist yet at
> the moment of the check. To fix the issue, the test waits for the
> vclock field to become available using the test_run:wait_cond()
> routine. Also, the downstream data has to be re-read from
> box.info.replication on every check, at the same moment as
> box.info.vclock, instead of being taken from the previously saved
> `replica` table.
>
> line 224:
>
> [014] --- replication/status.result Fri Jul 3 04:29:56 2020
> [014] +++ replication/status.reject Mon Sep 7 00:17:30 2020
> [014] @@ -224,7 +224,7 @@
> [014]  ...
> [014]  master.upstream.status == "follow"
> [014]  ---
> [014] -- true
> [014] +- false
> [014]  ...
> [014]  master.upstream.lag < 1
> [014]  ---
>
> It happened because the replication upstream status was checked too
> early, before it had reached the 'follow' state. To give the status
> time to reach 'follow', the test needs to wait for it using the
> test_run:wait_upstream() routine.
>
> Removed the test from the 'fragile' list of the test-run tool, so that
> it runs in parallel with other tests.
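On the wait_cond() approach described above: as far as I understand, test_run:wait_cond() keeps re-evaluating a predicate until it returns a truthy value or a timeout expires. Here is a minimal plain-Lua sketch of that idea, not test-run's actual implementation. The names `wait_cond_sketch` and `vclocks_match`, the retry count, the injected `sleep` function and the vclock values are all hypothetical; inside Tarantool one would pass `fiber.sleep`. The predicate mirrors the nil-guarded vclock check from the patch.

```lua
-- Hypothetical stand-in for test_run:wait_cond(): call `cond` until it
-- returns a truthy value or `max_tries` attempts are used up.
-- Assumption: in Tarantool `sleep` would be fiber.sleep; plain Lua has no
-- sleep primitive, so it defaults to a no-op here.
local function wait_cond_sketch(cond, max_tries, sleep, delay)
    max_tries = max_tries or 100
    sleep = sleep or function() end
    delay = delay or 0.01
    for _ = 1, max_tries do
        local res = cond()
        if res then
            return res
        end
        sleep(delay)
    end
    return false
end

-- Same nil-guarded comparison as the predicate in the patch, over plain
-- tables: the downstream vclock may be nil until the relay reports it.
local function vclocks_match(downstream_vclock, local_vclock, master_id, replica_id)
    return downstream_vclock ~= nil and local_vclock ~= nil and
        downstream_vclock[master_id] == local_vclock[master_id] and
        downstream_vclock[replica_id] == local_vclock[replica_id]
end

-- Simulated state with made-up values: the downstream vclock only shows
-- up on the third poll, which is the race the patch guards against.
local master_id, replica_id = 1, 2
local local_vclock = {[master_id] = 5, [replica_id] = 3}
local downstream = {}
local polls = 0

local ok = wait_cond_sketch(function()
    polls = polls + 1
    if polls == 3 then
        downstream.vclock = {[master_id] = 5, [replica_id] = 3}
    end
    -- Re-read the state on every poll, as the patch does with box.info.
    return vclocks_match(downstream.vclock, local_vclock, master_id, replica_id)
end)

print(ok, polls)  -- true    3
```

A one-shot comparison on the first poll would have indexed a nil vclock, which is exactly the error quoted for line 178.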
>
> Closes #5110
> ---
>
> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5110-repl-status-174
> Issue: https://github.com/tarantool/tarantool/issues/5110
>
>  test/replication/status.result   | 16 +++++++++-------
>  test/replication/status.test.lua | 13 +++++++++----
>  test/replication/suite.ini       |  1 -
>  3 files changed, 18 insertions(+), 12 deletions(-)
>
> diff --git a/test/replication/status.result b/test/replication/status.result
> index a86f48774..d5addbc80 100644
> --- a/test/replication/status.result
> +++ b/test/replication/status.result
> @@ -172,15 +172,17 @@ replica.upstream == nil
>  ---
>  - true
>  ...
> -replica.downstream.status == 'follow'
> +test_run:wait_downstream(replica_id, {status == 'follow'})
>  ---
>  - true
>  ...
> -replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> ----
> -- true
> -...
> -replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> +-- wait for the replication vclock
> +test_run:wait_cond(function() \
> +    local r = box.info.replication[replica_id].downstream.vclock \
> +    return (r ~= nil and box.info.vclock ~= nil and \
> +            r[master_id] == box.info.vclock[master_id] and \
> +            r[replica_id] == box.info.vclock[replica_id]) \
> +    end) or box.info
>  ---
>  - true
>  ...
> @@ -222,7 +224,7 @@ master.uuid == box.space._cluster:get(master_id)[2]
>  ---
>  - true
>  ...
> -master.upstream.status == "follow"
> +test_run:wait_upstream(master_id, {status == 'follow'})
>  ---
>  - true
>  ...
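A side note on the option tables passed to wait_downstream() and wait_upstream() in the status.result hunks above. In Lua, `{status == 'follow'}` and `{status = 'follow'}` are different table constructors. The first evaluates the comparison `status == 'follow'` and stores the resulting boolean as positional element [1]. The second creates a named field `status`. Below is a minimal plain-Lua check. It assumes no variable named `status` is defined where the expression is evaluated, which is modelled here as a local set to nil.

```lua
-- Assumption: no variable named `status` is in scope in the test, so it
-- evaluates to nil; a local makes that explicit here.
local status = nil

-- `==` is a comparison: nil == 'follow' is false, so this is the
-- array-like table {false} with no `status` key at all.
local with_eq = {status == 'follow'}

-- `=` inside a table constructor defines the named field `status`.
local with_assign = {status = 'follow'}

print(with_eq.status, with_eq[1])          -- nil     false
print(with_assign.status, with_assign[1])  -- follow  nil
```

Suppose test-run looks up a named `status` option, as the parameter name suggests. Then only the `=` form makes these calls actually wait for the 'follow' state. With `==`, the status filter would silently be absent.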
> diff --git a/test/replication/status.test.lua b/test/replication/status.test.lua
> index 090968172..6006ce9cf 100644
> --- a/test/replication/status.test.lua
> +++ b/test/replication/status.test.lua
> @@ -64,9 +64,14 @@ replica.uuid == box.space._cluster:get(replica_id)[2]
>  -- replica.lsn == box.info.vclock[replica_id]
>  replica.lsn == 0
>  replica.upstream == nil
> -replica.downstream.status == 'follow'
> -replica.downstream.vclock[master_id] == box.info.vclock[master_id]
> -replica.downstream.vclock[replica_id] == box.info.vclock[replica_id]
> +test_run:wait_downstream(replica_id, {status == 'follow'})
> +-- wait for the replication vclock
> +test_run:wait_cond(function() \
> +    local r = box.info.replication[replica_id].downstream.vclock \
> +    return (r ~= nil and box.info.vclock ~= nil and \
> +            r[master_id] == box.info.vclock[master_id] and \
> +            r[replica_id] == box.info.vclock[replica_id]) \
> +    end) or box.info
>
>  --
>  -- Replica
> @@ -83,7 +88,7 @@ box.info.vclock[master_id] == 2
>  master = box.info.replication[master_id]
>  master.id == master_id
>  master.uuid == box.space._cluster:get(master_id)[2]
> -master.upstream.status == "follow"
> +test_run:wait_upstream(master_id, {status == 'follow'})
>  master.upstream.lag < 1
>  master.upstream.idle < 1
>  master.upstream.peer:match("unix/")
> diff --git a/test/replication/suite.ini b/test/replication/suite.ini
> index ab9c3dabd..9bba9d125 100644
> --- a/test/replication/suite.ini
> +++ b/test/replication/suite.ini
> @@ -23,4 +23,3 @@ fragile = errinj.test.lua ; gh-3870
>            box_set_replication_stress.test.lua ; gh-4992 gh-4986
>            gh-4605-empty-password.test.lua ; gh-5030
>            anon.test.lua ; gh-5058
> -          status.test.lua ; gh-5110

LGTM

-- 
Serge Petrenko