[Tarantool-patches] [PATCH v23 3/3] test: add gh-6036-qsync-order test

Cyrill Gorcunov gorcunov at gmail.com
Wed Oct 20 01:26:46 MSK 2021


On Tue, Oct 19, 2021 at 06:09:50PM +0300, Serge Petrenko wrote:
> > +--
> > +-- The election_replica1 node has no clue that there is a new leader
> > +-- and continue writing data with obsolete term. Since election_replica3
> > +-- is delayed now the INSERT won't proceed yet but get queued.
> > +test_run:switch("election_replica1")
> > + | ---
> > + | - true
> > + | ...
> > +box.space.test:insert{3}
> > + | ---
> > + | - [3]
> > + | ...
> > +
> > +--
> > +-- Finally enable election_replica3 back. Make sure the data from new election_replica2
> > +-- leader get writing while old leader's data ignored.
> > +test_run:switch("election_replica3")
> > + | ---
> > + | - true
> > + | ...
> 
> Hi and thanks for the fixes!
> 
> I have only one comment left.
> 
> Actually you do need to count writes here.
> The wait_cond for ERRINJ_WAL_WRITE_COUNT == write_cnt + 3
> is needed to make sure you receive (and thus try to process)
> insert {3} **before** the replica is re-enabled.
> 
> Otherwise we can't be sure that the test is correct. You may simply
> perform a select before insert{3} has reached the replica.

You know, I spent a few hours trying to make the test pass while waiting
for ERRINJ_WAL_WRITE_COUNT == write_cnt + 3, and finally realized what
seems to be happening: replica1 is no longer a leader, so when this
record reaches our replica3 node we NOPify it and then run

apply_row
  if (request.type == IPROTO_NOP)
    return process_nop()

thus this record never reaches the journal at all, and that is
why waiting for write_cnt + 3 lasts forever. Unless I've missed
something obvious.
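
To make the suspected flow concrete, here is a toy model in C. It is only
a sketch of the hypothesis above, not Tarantool code: every name in it
(model_row, model_replica, model_filter, model_apply_row) is made up for
illustration, and wal_write_count merely stands in for
ERRINJ_WAL_WRITE_COUNT. It also assumes that a NOPified row returns
before any journal write, which is exactly the point that still needs to
be confirmed against the real applier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request types; not Tarantool's real IPROTO values. */
enum model_type { MODEL_INSERT, MODEL_NOP };

struct model_row {
	enum model_type type;
	uint64_t term; /* term of the node that produced the row */
};

struct model_replica {
	uint64_t current_term;    /* highest term this replica has seen */
	uint64_t wal_write_count; /* stand-in for ERRINJ_WAL_WRITE_COUNT */
};

/* Rows produced under an obsolete term are turned into NOPs. */
static void
model_filter(const struct model_replica *r, struct model_row *row)
{
	if (row->term < r->current_term)
		row->type = MODEL_NOP;
}

/*
 * Returns true if the row reached the (modelled) journal.
 * Assumption under test: a NOP returns early, so the write
 * counter is never incremented for it.
 */
static bool
model_apply_row(struct model_replica *r, struct model_row *row)
{
	assert(r != NULL && row != NULL);
	model_filter(r, row);
	if (row->type == MODEL_NOP)
		return false;
	r->wal_write_count++;
	return true;
}
```

In this model a row stamped with an older term leaves wal_write_count
unchanged, so a condition of the form "counter == write_cnt + 3" that
expects the stale insert to be counted would never become true.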

