[Tarantool-patches] [PATCH v1] test: flaky replication/gh-5195-qsync-*

Alexander V. Tikhonov avtikhon at tarantool.org
Thu Sep 10 11:13:03 MSK 2020


Hi Vlad, thanks for the review. I've corrected the commit message and
variable name. Also commented your review.

On Wed, Sep 09, 2020 at 11:12:40PM +0200, Vladislav Shpilevoy wrote:
> Thanks for the patch!
> 
> See 3 comments below.
> 
> On 07.09.2020 14:13, Alexander V. Tikhonov wrote:
> > On heavy loaded hosts found the following issue:
> > 
> >    box.cfg{replication_synchro_quorum = 2}
> >     | ---
> >   + | - error: '[string "test_run:wait_cond(function()                ..."]:1: attempt to
> >   + |     index field ''vclock'' (a nil value)'
> 
> 1. How is that output possible? box.cfg has nothing to do with wait_cond
> nor with vclocks. The diff looks broken, misplaced.

Right, the error message from test-run was not informative and correct,
but to compare it with output of the new issues it should be saved in
commit message as it occurred. Also I've added the really caused issue
command wait_cond() from the test to the commit message.
> 
> >     | ...
> >    test_run:wait_cond(function() return f:status() == 'dead' end)
> >     | ---
> >   - | - true
> >   + | - false
> >     | ...
> >    ok, err
> >     | ---
> >   - | - true
> >   - | - [2]
> >   + | - null
> >   + | - null
> >     | ...
> >    box.space.sync:select{}
> >     | ---
> > 
> > It happened because replication vclock field was not exist at the moment
> > of its check. To fix the issue, vclock field had to be waited to be
> > available using test_run:wait_cond() routine.
> 
> 2. But in the fix you don't wait for it. You just added one more level
> of checking inside the existing wait_cond.

Right, but this check avoid of the fail, due to wait_cond() checks vclock
existence and denies use it till it will be available.

> 
> > Closes #5230
> > ---
> > 
> > Github: https://github.com/tarantool/tarantool/tree/avtikhon/gh-5230-fix-qsync-write-5195
> > Issue: https://github.com/tarantool/tarantool/issues/5230
> > 
> >  test/replication/gh-5195-qsync-replica-write.result   | 10 ++++++----
> >  test/replication/gh-5195-qsync-replica-write.test.lua |  7 +++----
> >  2 files changed, 9 insertions(+), 8 deletions(-)
> > 
> > diff --git a/test/replication/gh-5195-qsync-replica-write.result b/test/replication/gh-5195-qsync-replica-write.result
> > index 3999e8f5e..b47359fde 100644
> > --- a/test/replication/gh-5195-qsync-replica-write.result
> > +++ b/test/replication/gh-5195-qsync-replica-write.result
> > @@ -96,10 +96,12 @@ test_run:wait_downstream(replica_id, {status='follow'})
> >   | - true
> >   | ...
> >  test_run:wait_cond(function()                                                   \
> > -        local info = box.info.replication[replica_id]                           \
> > -        local lsn = info.downstream.vclock[replica_id]                          \
> > -        return lsn and lsn >= replica_lsn                                       \
> > -end)                                                                            \
> > +        local lsn = box.info.replication[replica_id].downstream.vclock          \
> 
> 3. It is not lsn, it is vclock now. Also I don't think you need to inline 'info'
> variable, it was perfectly fine.

Right, the variable name was not correct - fixed. 'info' variable wasn't
need to be set due to used only once.

> 
> > +        return (lsn and lsn[replica_id] and lsn[replica_id] >= replica_lsn)     \
> > +    end) or box.info
> > + | ---
> > + | - true
> > + | ...


More information about the Tarantool-patches mailing list