[Tarantool-patches] [PATCH v19 3/3] test: add gh-6036-qsync-order test

Cyrill Gorcunov gorcunov at gmail.com
Tue Oct 5 00:16:41 MSK 2021


On Fri, Oct 01, 2021 at 03:30:41PM +0300, Serge Petrenko wrote:
> 
> Thanks for the test!
> Please, find a couple of comments below.
> I think the test won't be flaky anymore once you fix my comments.

Thanks for the comments, Serge!

> > +
> > +box.once("bootstrap", function()
> > +    box.schema.user.grant('guest', 'super')
> > +end)
> 
> Looks like "election_replica.lua" suits our needs perfectly now.
> No need to introduce a new instance file.

Yup, I updated the test, thanks!

> > +test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
> > + | ---
> > + | - true
> > + | ...
> > +
> 
> 
> You may replace both calls with test_run:wait_lsn('master', 'replica')
> Even without switching.

Actually, I do need the switch, otherwise I get stuck, so I use:

test_run:switch("master")
box.ctl.promote()
s = box.schema.create_space('test', {is_sync = true})
_ = s:create_index('pk')
s:insert{1}

test_run:switch("replica1")
test_run:wait_lsn('replica1', 'master')

test_run:switch("replica2")
test_run:wait_lsn('replica2', 'master')


which works just fine.

> > +
> > +--
> > +-- Make replica1 been a leader and start writting data,
> > +-- the PROMOTE request get queued on replica2 and not
> > +-- yet processed, same time INSERT won't complete either
> > +-- waiting for PROMOTE completion first.
> > +test_run:switch("replica1")
> > + | ---
> > + | - true
> > + | ...
> > +box.ctl.promote()
> > + | ---
> > + | ...
> > +_ = require('fiber').create(function() box.space.test:insert{2} end)
> > + | ---
> > + | ...
> > +
> > +--
> 
> Prior to doing something on master, you should make sure
> replica2 has received the promote.
> "wait_lsn" won't work here, because WAL is disabled. You may try
> test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)

Wait, this point is dubious. Look, once we issue the promote, it gets
stuck inside the journal write procedure, so the next "insert" won't
proceed until the promote has finished. I understand that you are
pointing at a potential race here: even if the promote() command has
finished, it may be slowed down at the network level and simply reach
replica2 out of order relative to the other calls. I think that without
additional debug output (such as the promote term exposed via box.info,
which I did in the previous series) we can't be sure about the timings,
so it seems I have to bring back the box.info patch. I mean, currently
the command

test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)

hangs forever, because the promote has not finished yet and so the next
'insert' simply has not been applied.

