From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp40.i.mail.ru (smtp40.i.mail.ru [94.100.177.100]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 8C823469719 for ; Thu, 10 Sep 2020 11:13:05 +0300 (MSK) Date: Thu, 10 Sep 2020 11:13:03 +0300 From: "Alexander V. Tikhonov" Message-ID: <20200910081303.GA11605@hpalx> References: <0e6d058ce43dcba3b8268a1a09e71e22e287a7db.1599480747.git.avtikhon@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Subject: Re: [Tarantool-patches] [PATCH v1] test: flaky replication/gh-5195-qsync-* List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tarantool-patches@dev.tarantool.org Hi Vlad, thanks for the review. I've corrected the commit message and variable name. Also commented your review. On Wed, Sep 09, 2020 at 11:12:40PM +0200, Vladislav Shpilevoy wrote: > Thanks for the patch! > > See 3 comments below. > > On 07.09.2020 14:13, Alexander V. Tikhonov wrote: > > On heavy loaded hosts found the following issue: > > > > box.cfg{replication_synchro_quorum = 2} > > | --- > > + | - error: '[string "test_run:wait_cond(function() ..."]:1: attempt to > > + | index field ''vclock'' (a nil value)' > > 1. How is that output possible? box.cfg has nothing to do with wait_cond > nor with vclocks. The diff looks broken, misplaced. Right, the error message from test-run was not informative and correct, but to compare it with output of the new issues it should be saved in commit message as it occurred. Also I've added the really caused issue command wait_cond() from the test to the commit message. > > > | ... > > test_run:wait_cond(function() return f:status() == 'dead' end) > > | --- > > - | - true > > + | - false > > | ... > > ok, err > > | --- > > - | - true > > - | - [2] > > + | - null > > + | - null > > | ... > > box.space.sync:select{} > > | --- > > > > It happened because replication vclock field was not exist at the moment > > of its check. To fix the issue, vclock field had to be waited to be > > available using test_run:wait_cond() routine. > > 2. But in the fix you don't wait for it. You just added one more level > of checking inside the existing wait_cond. Right, but this check avoid of the fail, due to wait_cond() checks vclock existence and denies use it till it will be available. > > > Closes #5230 > > --- > > > > Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5230-fix-qsync-write-5195 > > Issue: https://github.com/tarantool/tarantool/issues/5230 > > > > test/replication/gh-5195-qsync-replica-write.result | 10 ++++++---- > > test/replication/gh-5195-qsync-replica-write.test.lua | 7 +++---- > > 2 files changed, 9 insertions(+), 8 deletions(-) > > > > diff --git a/test/replication/gh-5195-qsync-replica-write.result b/test/replication/gh-5195-qsync-replica-write.result > > index 3999e8f5e..b47359fde 100644 > > --- a/test/replication/gh-5195-qsync-replica-write.result > > +++ b/test/replication/gh-5195-qsync-replica-write.result > > @@ -96,10 +96,12 @@ test_run:wait_downstream(replica_id, {status='follow'}) > > | - true > > | ... > > test_run:wait_cond(function() \ > > - local info = box.info.replication[replica_id] \ > > - local lsn = info.downstream.vclock[replica_id] \ > > - return lsn and lsn >= replica_lsn \ > > -end) \ > > + local lsn = box.info.replication[replica_id].downstream.vclock \ > > 3. It is not lsn, it is vclock now. Also I don't think you need to inline 'info' > variable, it was perfectly fine. Right, the variable name was not correct - fixed. 'info' variable wasn't need to be set due to used only once. > > > + return (lsn and lsn[replica_id] and lsn[replica_id] >= replica_lsn) \ > > + end) or box.info > > + | --- > > + | - true > > + | ...