From: Vladislav Shpilevoy
Date: Wed, 9 Sep 2020 23:12:40 +0200
Subject: Re: [Tarantool-patches] [PATCH v1] test: flaky replication/gh-5195-qsync-*
To: "Alexander V. Tikhonov", Kirill Yukhin, Serge Petrenko
Cc: tarantool-patches@dev.tarantool.org
In-Reply-To: <0e6d058ce43dcba3b8268a1a09e71e22e287a7db.1599480747.git.avtikhon@tarantool.org>

Thanks for the patch!

See 3 comments below.

On 07.09.2020 14:13, Alexander V. Tikhonov wrote:
> On heavy loaded hosts found the following issue:
>
>   box.cfg{replication_synchro_quorum = 2}
>    | ---
>   + | - error: '[string "test_run:wait_cond(function() ..."]:1: attempt to
>   + | index field ''vclock'' (a nil value)'

1. How is that output possible? box.cfg has nothing to do with
wait_cond or with vclocks. The diff looks broken or misplaced.

>    | ...
>   test_run:wait_cond(function() return f:status() == 'dead' end)
>    | ---
>   - | - true
>   + | - false
>    | ...
>   ok, err
>    | ---
>   - | - true
>   - | - [2]
>   + | - null
>   + | - null
>    | ...
>   box.space.sync:select{}
>    | ---
>
> It happened because replication vclock field was not exist at the moment
> of its check. To fix the issue, vclock field had to be waited to be
> available using test_run:wait_cond() routine.

2. But in the fix you don't wait for it. You just added one more
level of checking inside the existing wait_cond.

> Closes #5230
> ---
>
> Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5230-fix-qsync-write-5195
> Issue: https://github.com/tarantool/tarantool/issues/5230
>
>  test/replication/gh-5195-qsync-replica-write.result   | 10 ++++++----
>  test/replication/gh-5195-qsync-replica-write.test.lua |  7 +++----
>  2 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/test/replication/gh-5195-qsync-replica-write.result b/test/replication/gh-5195-qsync-replica-write.result
> index 3999e8f5e..b47359fde 100644
> --- a/test/replication/gh-5195-qsync-replica-write.result
> +++ b/test/replication/gh-5195-qsync-replica-write.result
> @@ -96,10 +96,12 @@ test_run:wait_downstream(replica_id, {status='follow'})
>   | - true
>   | ...
>  test_run:wait_cond(function() \
> -    local info = box.info.replication[replica_id] \
> -    local lsn = info.downstream.vclock[replica_id] \
> -    return lsn and lsn >= replica_lsn \
> -end) \
> +    local lsn = box.info.replication[replica_id].downstream.vclock \

3. It is not an lsn, it is a vclock now. Also I don't think you need
to inline the 'info' variable, it was perfectly fine.

> +    return (lsn and lsn[replica_id] and lsn[replica_id] >= replica_lsn) \
> + end) or box.info
> + | ---
> + | - true
> + | ...
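
For comments 2 and 3, the condition could perhaps look roughly like the
sketch below: it keeps the 'info' variable, calls the vclock a vclock, and
returns false (so the wait is retried) while the field is still missing.
This is only an illustration, not tested against the real test suite.
The 'box' table, the 'wait_cond' function and the 'replica_caught_up'
name are local stand-ins invented here so that the snippet runs in plain
Lua; they are not the real box.info or test_run:wait_cond().

```lua
-- Stand-ins (assumptions): a fake box.info and fixed ids, so the sketch
-- runs outside Tarantool. In the test these come from the real instance.
local replica_id, replica_lsn = 2, 5
local polls = 0
local box = {info = {replication = {[replica_id] = {}}}}

-- Hypothetical condition: nil-safe at every level, 'info' kept as a
-- separate variable, and the vclock named as a vclock rather than an lsn.
local function replica_caught_up()
    local info = box.info.replication[replica_id]
    local vclock = info and info.downstream and info.downstream.vclock
    local lsn = vclock and vclock[replica_id]
    return lsn ~= nil and lsn >= replica_lsn
end

-- Simplified polling loop standing in for test_run:wait_cond().
local function wait_cond(cond, max_polls)
    for _ = 1, max_polls do
        polls = polls + 1
        if cond() then
            return true
        end
        -- Simulation only: the downstream info appears after the third poll.
        if polls == 3 then
            box.info.replication[replica_id].downstream =
                {vclock = {[replica_id] = 5}}
        end
    end
    return false
end

local ok = wait_cond(replica_caught_up, 10)
print(ok, polls)
```

With a condition of this shape the missing vclock is treated as "not ready
yet" instead of raising an error inside the callback, which is what the
commit message says the fix is supposed to do.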