From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Date: Mon, 8 Oct 2018 22:07:59 +0300
From: Alexander Turenko
Subject: Re: [PATCH 3/4] test: increase timeout to check replica status
Message-ID: <20181008190759.2ugfgsfboqhm4mn7@tkn_work_nb>
References: <20181003145057.68820-1-sergw@tarantool.org>
 <20181005090215.6160-1-sergw@tarantool.org>
 <20181005090215.6160-4-sergw@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20181005090215.6160-4-sergw@tarantool.org>
To: Sergei Voronezhskii
Cc: tarantool-patches@freelists.org, Vladimir Davydov, Georgy Kirichenko
List-ID:

On Fri, Oct 05, 2018 at 12:02:14PM +0300, Sergei Voronezhskii wrote:
> The replica status is checked 100 times, each check within
> `replica_timeout`. Refactor the code to get properly upstream.
> Then in loop with little sleep check upstreams status until
> it is not in follow mode. If count of checks is more than 200
> break the loop with error. The value 200 and little sleep 0.001
> choosed suitably to `replica_timeout` and `replica_connect_timeout`.

replica_timeout -> replication_timeout
replica_connect_timeout -> replication_connect_timeout

>
> Part of #2436, #3232
> ---
>  test/replication/misc.result   | 22 ++++++++++++----------
>  test/replication/misc.test.lua | 22 ++++++++++++----------
>  2 files changed, 24 insertions(+), 20 deletions(-)
>
> diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
> index 375c8b58a..cb658f6d0 100644
> --- a/test/replication/misc.test.lua
> +++ b/test/replication/misc.test.lua
> @@ -43,30 +43,32 @@ test_run:create_cluster(SERVERS, "replication", {args="0.1"})
>  test_run:wait_fullmesh(SERVERS)
>  test_run:cmd("switch autobootstrap1")
>  test_run = require('test_run').new()
> -box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
> +box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
>  test_run:cmd("switch autobootstrap2")
>  test_run = require('test_run').new()
> -box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
> +box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
>  test_run:cmd("switch autobootstrap3")
>  test_run = require('test_run').new()
>  fiber=require('fiber')
> -box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
> +box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
>  _ = box.schema.space.create('test_timeout'):create_index('pk')
>  test_run:cmd("setopt delimiter ';'")
>  function test_timeout()
> +    local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
> +    local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream

Does the 'box' code guarantee that box.info.replication[N].upstream will
keep updating the same table? I don't think so. It is better to get these
values inside the loop.

Nit: these lines are too long.
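
To illustrate the concern about caching the upstream tables, here is a
minimal sketch that runs in plain Lua, outside Tarantool. It is untested
against the real test. The names `all_upstreams_follow`, `info` and `state`
are hypothetical, and `info` only emulates box.info under the assumption
that every read of box.info.replication builds a fresh table:

```lua
-- Hypothetical helper (not part of the patch): true only if every peer
-- that has an upstream reports status 'follow' in the given snapshot.
local function all_upstreams_follow(replication)
    for _, peer in pairs(replication) do
        if peer.upstream ~= nil and peer.upstream.status ~= 'follow' then
            return false
        end
    end
    return true
end

-- Stand-in for box.info so the sketch runs outside Tarantool.
-- Assumption: each read of .replication returns a newly built table.
local state = {status = 'follow'}
local info = setmetatable({}, {__index = function(_, key)
    if key ~= 'replication' then return nil end
    return {
        {id = 1},  -- the instance itself: no upstream
        {id = 2, upstream = {status = state.status}},
        {id = 3, upstream = {status = state.status}},
    }
end})

-- Cached before the loop, as in the patch: this table is never refreshed.
local cached = info.replication[2].upstream
state.status = 'stopped'

-- The cached table still holds the old status ...
local stale_view = cached.status
-- ... while a fresh read on every check sees the change.
local fresh_view = all_upstreams_follow(info.replication)
```

In test_timeout() this would mean calling something like
all_upstreams_follow(box.info.replication) on every iteration instead of
keeping replicaA and replicaB around.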
>      for i = 0, 99 do
>          box.space.test_timeout:replace({1})
> -        fiber.sleep(0.005)
> -        local rinfo = box.info.replication
> -        if rinfo[1].upstream and rinfo[1].upstream.status ~= 'follow' or
> -           rinfo[2].upstream and rinfo[2].upstream.status ~= 'follow' or
> -           rinfo[3].upstream and rinfo[3].upstream.status ~= 'follow' then
> -            return error('Replication broken')
> -        end
> +        local n = 200
> +        repeat
> +            fiber.sleep(0.001)
> +            n = n - 1
> +            if n == 0 then return error(box.info.replication) end
> +        until replicaA.status == 'follow' and replicaB.status == 'follow'
>      end
>      return true
>  end ;
>  test_run:cmd("setopt delimiter ''");
> +-- the replica status is checked 100 times, each check within replication_timeout

I don't get the comment 'each check within replication_timeout'. What does
it mean?

Anyway, I think this just broke the test case. It used to check that the
replicas never leave the 'follow' state; now it only waits until they are
in 'follow', so it checks nothing.

I still push for the approach where the QA team works closely with
developers to understand the cases, or at least files issues so that
developers can fix their test cases. I am strongly against random timeout
tweaks to get a test 'working'.

Please discuss the test case with Georgy (the case was introduced in
195d4462).

>  test_timeout()
>
>  -- gh-3247 - Sequence-generated value is not replicated in case
> --
> 2.18.0
>