[Tarantool-patches] [PATCH v23 3/3] test: add gh-6036-qsync-order test

Serge Petrenko sergepetrenko at tarantool.org
Fri Oct 22 09:36:23 MSK 2021

22.10.2021 01:06, Vladislav Shpilevoy wrote:
>>>> Actually you do need to count writes here.
>>>> The wait_cond for ERRINJ_WAL_WRITE_COUNT == write_cnt + 3
>>>> is needed to make sure you receive (and thus try to process)
>>>> insert {3} **before** the replica is re-enabled.
>>>> Otherwise we can't be sure that the test is correct. You may simply
>>>> perform a select before insert{3} has reached the replica.
>>> You know, I spent a few hours trying to pass the test waiting for
>>> ERRINJ_WAL_WRITE_COUNT == write_cnt + 3 and finally realized that
>>> this seems to be what happens: replica1 is no longer a leader,
>>> and when this record reaches our replica3 node we NOPify it, then
>>> we run
>>> apply_row
>>>     if (request.type == IPROTO_NOP)
>>>       return process_nop()
>>> thus this record does not even reach the journal, and that is
>>> why waiting for write_cnt + 3 lasts forever. Unless I missed
>>> something obvious.
>> Unfortunately, this is not the case. A NOP entry still reaches WAL.
>> That's why we need NOP entries: they reside in WAL but do nothing.
>> That's for the sake of the vclock bump. Otherwise we could skip such entries
>> completely, without nopifying them.
>> So, even if the entry is nopified, it would enter WAL sooner or later.
>> I just realised what the problem is: the entry is waiting on a limbo latch
>> inside the NOPify procedure. That's why it never reaches the journal
>> (until we re-enable replica3, at least).
>> I don't know how to wait for this entry's arrival then.
>> The current test version looks OK to me.
>> Vlad, do you have any ideas here?
> I think it might be worth adding an errinj for the number of blocked
> fibers waiting on the limbo latch. Could even expose that to box.info.qsync,
> seems like useful info. Would help to measure contention.

This might be useful, indeed.
Cyrill, let's implement `box.info.synchro.queue.waiters` then and use it 
in the test.
Or any other suitable name if you guys come up with one.
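
To make the idea concrete, here is a rough sketch of a latch that counts
blocked callers. It is a toy single-threaded model, not the real limbo
latch: `toy_latch`, `toy_latch_lock` and `toy_latch_unlock` are hypothetical
names made up for illustration, and a real fiber would yield where the model
just returns false.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy single-threaded model of a latch that counts blocked callers.
 * All names are hypothetical; this is not Tarantool's latch API.
 */
struct toy_latch {
	/* True while some caller owns the latch. */
	bool locked;
	/* Number of callers parked until the latch is released. */
	int waiters;
};

/*
 * Try to take the latch. Returns true when the caller becomes the
 * owner. Otherwise the caller is counted as a waiter; a real fiber
 * would yield at this point.
 */
static bool
toy_latch_lock(struct toy_latch *l)
{
	if (!l->locked) {
		l->locked = true;
		return true;
	}
	l->waiters++;
	return false;
}

/*
 * Release the latch. If somebody is parked, ownership passes
 * straight to one waiter, so the latch stays locked.
 */
static void
toy_latch_unlock(struct toy_latch *l)
{
	assert(l->locked);
	if (l->waiters > 0)
		l->waiters--;
	else
		l->locked = false;
}
```

With a counter like this exposed, the test could wait for `waiters == 1`
instead of a WAL write count, which would tell us that insert {3} has
arrived and is parked on the latch.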

