From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp1.mail.ru (smtp1.mail.ru [94.100.179.111])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id 58E1C46970F
	for ; Thu, 21 Nov 2019 02:01:03 +0300 (MSK)
References: <071be8487c8db4ce13a157db292c20edcc37614e.1574290043.git.i.kosarev@tarantool.org>
From: Vladislav Shpilevoy
Message-ID: <26b7d94a-1564-e9d6-0e22-837c5cbe124b@tarantool.org>
Date: Thu, 21 Nov 2019 00:07:33 +0100
MIME-Version: 1.0
In-Reply-To: <071be8487c8db4ce13a157db292c20edcc37614e.1574290043.git.i.kosarev@tarantool.org>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [Tarantool-patches] [PATCH 2/2] test: stabilize quorum test conditions
List-Id: Tarantool development patches
To: Ilya Kosarev, tarantool-patches@dev.tarantool.org

Hm. So you are not going to fix the flaky error I mentioned in
the previous thread about this commit? Seems like it is also
about 'conditions which need time to be satisfied'.

On 20/11/2019 23:47, Ilya Kosarev wrote:
> There were some pass conditions in quorum test which could take some
> time to be satisfied. Now they are wrapped using test_run:wait_cond to
> make the test stable.
>
> Part of #4586
> ---
>  test/replication/quorum.result   | 30 +++++++++++++++++-------------
>  test/replication/quorum.test.lua | 18 +++++++++---------
>  2 files changed, 26 insertions(+), 22 deletions(-)
>
> diff --git a/test/replication/quorum.result b/test/replication/quorum.result
> index ff5fa0150..12604c8de 100644
> --- a/test/replication/quorum.result
> +++ b/test/replication/quorum.result
> @@ -115,15 +115,15 @@ box.info.status -- running
>  - running
>  ...
>  -- Check that the replica follows all masters.
> -box.info.id == 1 or box.info.replication[1].upstream.status == 'follow'
> +box.info.id == 1 or test_run:wait_cond(function() return box.info.replication[1].upstream.status == 'follow' end, 20)
>  ---
>  - true
>  ...
> -box.info.id == 2 or box.info.replication[2].upstream.status == 'follow'
> +box.info.id == 2 or test_run:wait_cond(function() return box.info.replication[2].upstream.status == 'follow' end, 20)
>  ---
>  - true
>  ...
> -box.info.id == 3 or box.info.replication[3].upstream.status == 'follow'
> +box.info.id == 3 or test_run:wait_cond(function() return box.info.replication[3].upstream.status == 'follow' end, 20)
>  ---
>  - true
>  ...
> @@ -149,6 +149,10 @@ test_run:cmd('stop server quorum1')
>  ---
>  - true
>  ...
> +test_run:wait_cond(function() return box.space.test.index.primary ~= nil end, 20)
> +---
> +- true
> +...
>  for i = 1, 100 do box.space.test:insert{i} end
>  ---
>  ...
> @@ -166,9 +170,9 @@ test_run:cmd('switch quorum1')
>  ---
>  - true
>  ...
> -box.space.test:count() -- 100
> +test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
>  ---
> -- 100
> +- true
>  ...
>  -- Rebootstrap one node of the cluster and check that others follow.
>  -- Note, due to ERRINJ_RELAY_TIMEOUT there is a substantial delay
> @@ -197,9 +201,9 @@ test_run:cmd('switch quorum1')
>  - true
>  ...
>  test_run:cmd('restart server quorum1 with cleanup=1, args="0.1 0.5"')
> -box.space.test:count() -- 100
> +test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
>  ---
> -- 100
> +- true
>  ...
>  -- The rebootstrapped replica will be assigned id = 4,
>  -- because ids 1..3 are busy.
> @@ -207,11 +211,9 @@ test_run:cmd('switch quorum2')
>  ---
>  - true
>  ...
> -fiber = require('fiber')
> ----
> -...
> -while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
> +test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
>  ---
> +- true
>  ...
>  box.info.replication[4].upstream.status
>  ---
> @@ -221,11 +223,13 @@ test_run:cmd('switch quorum3')
>  ---
>  - true
>  ...
> -fiber = require('fiber')
> +test_run:wait_cond(function() return box.info.replication ~= nil end, 20)
>  ---
> +- true
>  ...
> -while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
> +test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
>  ---
> +- true
>  ...
>  box.info.replication[4].upstream.status
>  ---
> diff --git a/test/replication/quorum.test.lua b/test/replication/quorum.test.lua
> index 98febb367..be23200d3 100644
> --- a/test/replication/quorum.test.lua
> +++ b/test/replication/quorum.test.lua
> @@ -47,9 +47,9 @@ box.info.ro -- false
>  box.info.status -- running
>
>  -- Check that the replica follows all masters.
> -box.info.id == 1 or box.info.replication[1].upstream.status == 'follow'
> -box.info.id == 2 or box.info.replication[2].upstream.status == 'follow'
> -box.info.id == 3 or box.info.replication[3].upstream.status == 'follow'
> +box.info.id == 1 or test_run:wait_cond(function() return box.info.replication[1].upstream.status == 'follow' end, 20)
> +box.info.id == 2 or test_run:wait_cond(function() return box.info.replication[2].upstream.status == 'follow' end, 20)
> +box.info.id == 3 or test_run:wait_cond(function() return box.info.replication[3].upstream.status == 'follow' end, 20)
>
>  -- Check that box.cfg() doesn't return until the instance
>  -- catches up with all configured replicas.
> @@ -59,13 +59,14 @@ test_run:cmd('switch quorum2')
>  box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.001)
>  test_run:cmd('stop server quorum1')
>
> +test_run:wait_cond(function() return box.space.test.index.primary ~= nil end, 20)
>  for i = 1, 100 do box.space.test:insert{i} end
>  fiber = require('fiber')
>  fiber.sleep(0.1)
>
>  test_run:cmd('start server quorum1 with args="0.1 0.5"')
>  test_run:cmd('switch quorum1')
> -box.space.test:count() -- 100
> +test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
>
>  -- Rebootstrap one node of the cluster and check that others follow.
>  -- Note, due to ERRINJ_RELAY_TIMEOUT there is a substantial delay
> @@ -81,17 +82,16 @@ box.snapshot()
>  test_run:cmd('switch quorum1')
>  test_run:cmd('restart server quorum1 with cleanup=1, args="0.1 0.5"')
>
> -box.space.test:count() -- 100
> +test_run:wait_cond(function() return box.space.test:count() == 100 end, 20)
>
>  -- The rebootstrapped replica will be assigned id = 4,
>  -- because ids 1..3 are busy.
>  test_run:cmd('switch quorum2')
> -fiber = require('fiber')
> -while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
> +test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
>  box.info.replication[4].upstream.status
>  test_run:cmd('switch quorum3')
> -fiber = require('fiber')
> -while box.info.replication[4].upstream.status ~= 'follow' do fiber.sleep(0.001) end
> +test_run:wait_cond(function() return box.info.replication ~= nil end, 20)
> +test_run:wait_cond(function() return box.info.replication[4].upstream.status == 'follow' end, 20)
>  box.info.replication[4].upstream.status
>
>  -- Cleanup.
>
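
For illustration, the poll-until-true-or-timeout pattern that the quoted
patch switches to can be sketched roughly as below. This is a generic
Python sketch and not the test-run implementation of wait_cond: the
helper name, its `delay` parameter, and the `FakeUpstream` class that
simulates a replica status are all assumptions made up for the example.

```python
import time


def wait_cond(cond, timeout=20.0, delay=0.01):
    """Poll cond() until it is truthy or `timeout` seconds have passed.

    Hypothetical helper for illustration only. Returns True when the
    condition was met and False when the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if cond():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(delay)


class FakeUpstream:
    """Simulated upstream whose status becomes 'follow' after a few polls."""

    def __init__(self, polls_until_follow):
        self.remaining = polls_until_follow

    @property
    def status(self):
        if self.remaining > 0:
            self.remaining -= 1
            return 'connecting'
        return 'follow'


upstream = FakeUpstream(polls_until_follow=3)

# The condition becomes true after a few polls, so the wait succeeds.
ok = wait_cond(lambda: upstream.status == 'follow', timeout=1.0, delay=0.001)

# A condition that never holds ends in a bounded failure, not a hang.
never = wait_cond(lambda: False, timeout=0.05, delay=0.001)
```

Unlike the removed `while ... do fiber.sleep(0.001) end` loops, a bounded
wait of this kind gives up after the timeout, so a condition that is never
satisfied shows up as a failed check rather than a hung test.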