[Tarantool-patches] [PATCH v19 3/3] test: add gh-6036-qsync-order test

Serge Petrenko sergepetrenko at tarantool.org
Tue Oct 5 16:55:24 MSK 2021



On 05.10.2021 00:16, Cyrill Gorcunov wrote:
> On Fri, Oct 01, 2021 at 03:30:41PM +0300, Serge Petrenko wrote:
>> Thanks for the test!
>> Please, find a couple of comments below.
>> I think the test won't be flaky anymore once you fix my comments.
> Thanks for comments, Serge!
>
>>> +
>>> +box.once("bootstrap", function()
>>> +    box.schema.user.grant('guest', 'super')
>>> +end)
>> Looks like "election_replica.lua" suits our needs perfectly now.
>> No need to introduce a new instance file.
> yup, I updated the test, thanks!
>
>>> +test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
>>> + | ---
>>> + | - true
>>> + | ...
>>> +
>>
>> You may replace both calls with test_run:wait_lsn('master', 'replica')
>> Even without switching.
> Actually I need the switch, otherwise I get stuck, so I use
>
> test_run:switch("master")
> box.ctl.promote()
> s = box.schema.create_space('test', {is_sync = true})
> _ = s:create_index('pk')
> s:insert{1}
>
> test_run:switch("replica1")
> test_run:wait_lsn('replica1', 'master')
>
> test_run:switch("replica2")
> test_run:wait_lsn('replica2', 'master')
>
>
> which works just fine

Yep, that's what I meant.

>
>>> +
>>> +--
>>> +-- Make replica1 been a leader and start writting data,
>>> +-- the PROMOTE request get queued on replica2 and not
>>> +-- yet processed, same time INSERT won't complete either
>>> +-- waiting for PROMOTE completion first.
>>> +test_run:switch("replica1")
>>> + | ---
>>> + | - true
>>> + | ...
>>> +box.ctl.promote()
>>> + | ---
>>> + | ...
>>> +_ = require('fiber').create(function() box.space.test:insert{2} end)
>>> + | ---
>>> + | ...
>>> +
>>> +--
>> Prior to doing something on master, you should make sure
>> replica2 has received the promote.
>> "wait_lsn" won't work here, because WAL is disabled. You may try
>> test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
> Wait, this moment is dubious. Look, once we issue the promote it gets
> stuck inside the journal write procedure, so the next "insert" won't
> proceed until the promote has finished. I understand that you point
> to a potential race here, because the promote() command may finish
> but be slowed down at the network level and simply reach replica2
> out of order with other calls. I think without additional debug output
> (such as the promote term exposed via box.info, which I did in the
> previous series) we can't be sure about timings, and it seems that I
> have to bring back the box.info patch. I mean currently the command
>
> test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
>
> gets stuck forever because the promote has not finished yet and the
> next 'insert' simply has not been applied.

Ok, I see. I didn't think of that at first.
Look, your `box.info` patch won't help here either.
Since the promote is blocked on its way to the WAL, it isn't applied yet,
so we won't see the term increase.

There is a way to detect a blocked promote: ERRINJ_WAL_WRITE_COUNT.
It's incremented each time you call wal_write, even before the write
is blocked.

So you need to save ERRINJ_WAL_WRITE_COUNT on replica2, then do the
promote on replica1, then return to replica2 and wait until
ERRINJ_WAL_WRITE_COUNT becomes greater than your saved value.
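
Roughly, the sequence could look like the sketch below. It assumes the
counter is read with box.error.injection.get() (available in debug builds
only) and that the WAL on replica2 is already blocked by the earlier part
of the test; the variable name write_cnt is only illustrative.

```lua
-- On replica2: remember the counter before the promote is issued.
test_run:switch("replica2")
write_cnt = box.error.injection.get("ERRINJ_WAL_WRITE_COUNT")

-- On replica1: issue the promote and the insert, as in the test.
test_run:switch("replica1")
box.ctl.promote()
_ = require('fiber').create(function() box.space.test:insert{2} end)

-- Back on replica2: the counter grows as soon as the promote reaches
-- wal_write, even though the write itself is still blocked.
test_run:switch("replica2")
test_run:wait_cond(function() return box.error.injection.get("ERRINJ_WAL_WRITE_COUNT") > write_cnt end)
```

Only after this wait_cond returns is it safe to continue with the
master-side part of the test.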


-- 
Serge Petrenko


