[Tarantool-patches] [PATCH v23 3/3] test: add gh-6036-qsync-order test

Serge Petrenko sergepetrenko at tarantool.org
Wed Oct 20 09:35:43 MSK 2021



20.10.2021 01:26, Cyrill Gorcunov wrote:
> On Tue, Oct 19, 2021 at 06:09:50PM +0300, Serge Petrenko wrote:
>>> +--
>>> +-- The election_replica1 node has no clue that there is a new leader
>>> +-- and continues writing data with an obsolete term. Since election_replica3
>>> +-- is delayed now, the INSERT won't proceed yet but gets queued.
>>> +test_run:switch("election_replica1")
>>> + | ---
>>> + | - true
>>> + | ...
>>> +box.space.test:insert{3}
>>> + | ---
>>> + | - [3]
>>> + | ...
>>> +
>>> +--
>>> +-- Finally enable election_replica3 back. Make sure the data from the new
>>> +-- election_replica2 leader gets written while the old leader's data is ignored.
>>> +test_run:switch("election_replica3")
>>> + | ---
>>> + | - true
>>> + | ...
>> Hi and thanks for the fixes!
>>
>> I have only one comment left.
>>
>> Actually you do need to count writes here.
>> The wait_cond for ERRINJ_WAL_WRITE_COUNT == write_cnt + 3
>> is needed to make sure you receive (and thus try to process)
>> insert {3} **before** the replica is re-enabled.
>>
>> Otherwise we can't be sure that the test is correct. You may simply
>> perform a select before insert{3} has reached the replica.
> You know, I spent a few hours trying to pass the test waiting for
> ERRINJ_WAL_WRITE_COUNT == write_cnt + 3 and finally realized that
> this seems to be what happens: replica1 is no longer a leader,
> and when this record reaches our replica3 node we NOPify it, then
> we run
>
> apply_row
>    if (request.type == IPROTO_NOP)
>      return process_nop()
>
> thus this record does not even reach the journal at all, and that is
> why waiting for write_cnt + 3 lasts forever. Unless I missed
> something obvious.

Unfortunately, this is not the case. A NOP entry still reaches the WAL.
That's why we need NOP entries: they reside in the WAL but do nothing.
They exist for the sake of the vclock bump. Otherwise we could skip such
entries completely, without nopifying them.

So, even if the entry is nopified, it will enter the WAL sooner or later.

I just realised what the problem is: the entry is waiting on a limbo latch
inside the NOPify procedure. That's why it never reaches the journal
(until we re-enable replica3, at least).
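
To illustrate the two points above (a NOP row is still journaled and bumps
the vclock, but it first has to get past the limbo latch), here is a toy
model. It is not Tarantool code: every name in it (toy_replica,
toy_apply_stale_row, toy_release_latch, the latch_held flag) is made up for
the illustration, and the real applier is far more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all names are hypothetical, not Tarantool's API. */
enum toy_row_type { TOY_ROW_INSERT, TOY_ROW_NOP };

struct toy_replica {
	long vclock;          /* bumped once per journal write */
	long wal_write_count; /* stands in for ERRINJ_WAL_WRITE_COUNT */
	bool latch_held;      /* limbo latch is taken by someone else */
	int pending;          /* rows parked on the latch */
};

/* NOP rows are journaled like any other row: that is the vclock bump. */
void
toy_journal_write(struct toy_replica *r, enum toy_row_type type)
{
	(void)type;
	r->wal_write_count++;
	r->vclock++;
}

/*
 * A row from a stale leader is nopified, but only after the latch is
 * acquired. Returns false if the row got parked instead of written.
 */
bool
toy_apply_stale_row(struct toy_replica *r)
{
	if (r->latch_held) {
		r->pending++;
		return false;
	}
	toy_journal_write(r, TOY_ROW_NOP);
	return true;
}

/* Releasing the latch lets every parked row reach the journal as a NOP. */
void
toy_release_latch(struct toy_replica *r)
{
	assert(r->pending >= 0);
	r->latch_held = false;
	while (r->pending > 0) {
		r->pending--;
		toy_journal_write(r, TOY_ROW_NOP);
	}
}
```

In this model, while latch_held is true the write counter does not move, so
a wait for "write_cnt + 3" would hang. Once the latch is released, the
parked row is written as a NOP and both the counter and the vclock advance.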

I don't know how to wait for this entry's arrival then.
The current test version looks OK to me.

Vlad, do you have any ideas here?

-- 
Serge Petrenko


