Tarantool development patches archive
 help / color / mirror / Atom feed
From: Sergey Bronnikov <sergeyb@tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 2/3] replication: test clear_synchro_queue function
Date: Tue, 25 Aug 2020 15:49:39 +0300	[thread overview]
Message-ID: <20200825124939.GB47610@pony.bronevichok.ru> (raw)
In-Reply-To: <8c3b7d4c-22e8-108a-b15e-ef3c89510a63@tarantool.org>

Vlad, thanks for review!
Patches updated in a branch and please see my answers inline.

On 00:00 Tue 21 Jul , Vladislav Shpilevoy wrote:
> Thanks for the patch!
> 
> See 8 comments below.
> 
> On 09.07.2020 19:16, sergeyb@tarantool.org wrote:
> > From: Sergey Bronnikov <sergeyb@tarantool.org>
> > 
> > Part of #5055
> > Part of #4849
> > ---
> >  test/replication/qsync_basic.result   | 85 +++++++++++++++++++++++++++
> >  test/replication/qsync_basic.test.lua | 31 ++++++++++
> >  2 files changed, 116 insertions(+)
> > 
> > diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result
> > index ab4be0c7e..464df75a7 100644
> > --- a/test/replication/qsync_basic.result
> > +++ b/test/replication/qsync_basic.result
> > @@ -32,6 +32,14 @@ s2.is_sync
> >   | - false
> >   | ...
> >  
> > +--
> > +-- gh-4849: clear synchro queue with unconfigured box
> > +--
> > +box.ctl.clear_synchro_queue()
> 
> 1. Enough to test that it does not crash here. No need to
> check result.

Well, removed assignment to variable.

> > + | ---
> > + | - -1
> > + | ...
> > +
> >  -- Net.box takes sync into account.
> >  box.schema.user.grant('guest', 'super')
> >   | ---
> > @@ -553,6 +561,82 @@ box.space.sync:select{7}
> >   | - - [7]
> >   | ...
> >  
> > +--
> > +-- gh-4849: clear synchro queue on a replica
> > +--
> > +test_run:switch('default')
> > + | ---
> > + | - true
> > + | ...
> > +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 2}
> 
> 2. If you want this timeout to fail, it is too big. The test will be too
> long. If it is supposed not to fail, then it is way too small. If you want
> it to fail, it should be something like 0.001. If you want it to hold, it
> should be 1000 to be sure. Keep in mind that you can change the timeout on
> fly. That allows to build quite complex reactive tests completely event-based.

set replication_synchro_timeout value to 1000

> > + | ---
> > + | ...
> > +f1 = fiber.create(box.space.sync.replace, box.space.sync, {9})
> 
> 3. You need to extract the exact result value. Use pcall for that, and print
> its result after the fiber is dead. See other examples with fiber.create() in
> qsync test suite. The same for the next test case.

Done.

> > + | ---
> > + | ...
> > +f1:status()
> > + | ---
> > + | - suspended
> > + | ...
> > +test_run:switch('replica')
> > + | ---
> > + | - true
> 
> 4. Better wait until the value is delivered. Otherwise you can switch to
> replica before master finishes WAL write, and the queue will be empty here.
> Try
> 
>     test_run:wait_cond(function() return box.space.sync:get{9} ~= nil end)

Agree, added wait_cond().

> > + | ...
> > +box.ctl.clear_synchro_queue()
> > + | ---
> > + | - 0
> > + | ...
> > +box.space.sync:select{9}
> > + | ---
> > + | - []
> 
> 5. If select returns 9 before queue clean, and returns empty afterwards,
> it means the queue was cleared. So here the queue size is not really needed,
> as you can see.

removed this and next select{} statements

> > + | ...
> > +test_run:switch('default')
> > + | ---
> > + | - true
> > + | ...
> > +box.space.sync:select{9}
> > + | ---
> > + | - []
> > + | ...
> > +f1:status()
> > + | ---
> > + | - dead
> > + | ...
> > +
> > +--
> > +-- gh-4849: clear synchro queue on a master
> 
> 6. Since the previous test the replica's WAL is different from master's.
> I am not sure the replication is still alive.

Test with clear_synchro_queue() on master keeps master alive at the end
of test so I moved it before the same test for the replica. Also added
note for others that cluster may be in a broken state here.

> > +--
> > +test_run:switch('default')
> > + | ---
> > + | - true
> > + | ...
> > +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 2}
> > + | ---
> > + | ...
> > +f1 = fiber.create(box.space.sync.replace, box.space.sync, {10})
> > + | ---
> > + | ...
> > +f1:status()
> > + | ---
> > + | - suspended
> > + | ...
> > +box.ctl.clear_synchro_queue()
> > + | ---
> > + | - -2
> > + | ...
> > +box.space.sync:select{10}
> > + | ---
> > + | - - [10]
> 
> 7. Why is it 10? The quorum is not reached, it should have been rolled back.

Added "box.cfg{replication_synchro_timeout = 0.1}" before clear_synchro_queue()
call and now tx is rolled back. Also I've added checks of current value with
wait_cond() that is more reliable than select{} on a master and replica.
Updated output looks like this:

box.ctl.clear_synchro_queue()
 | ---
 | ...
test_run:switch('replica')
 | ---
 | - true
 | ...
test_run:wait_cond(function() return box.space.sync:get{10} == nil end)
 | ---
 | - true
 | ...
test_run:switch('default')
 | ---
 | - true
 | ...
test_run:wait_cond(function() return f:status() == 'dead' end)
 | ---
 | - true
 | ...
ok, err
 | ---
 | - false
 | - Quorum collection for a synchronous transaction is timed out
 | ...
test_run:wait_cond(function() return box.space.sync:get{10} == nil end)
 | ---
 | - true
 | ...

> > + | ...
> > +test_run:switch('replica')
> > + | ---
> > + | - true
> > + | ...
> > +box.space.sync:select{10}
> > + | ---
> > + | - - [10]
> > + | ...
> > +
> >  -- Cleanup.
> >  test_run:cmd('switch default')
> >   | ---
> > @@ -576,6 +660,7 @@ test_run:cmd('delete server replica')
> >   | ...
> >  box.space.test:drop()
> >   | ---
> > + | - error: A rollback for a synchronous transaction is received
> 
> 8. Why is it changed?

Perhaps it is because in previous version of test I haven't wait fibers 'dead'.
There is no error now:

box.space.test:drop()
 | ---
 | ...
box.space.sync:drop()
 | ---
 | ...

> 
> >   | ...
> >  box.space.sync:drop()
> >   | ---

  reply	other threads:[~2020-08-25 12:49 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1594314820.git.sergeyb@tarantool.org>
2020-07-09 17:16 ` [Tarantool-patches] [PATCH 1/3] replication: print number of txs in limbo before its clear sergeyb
2020-07-09 20:07   ` Vladislav Shpilevoy
2020-07-10 12:55     ` Sergey Bronnikov
2020-07-20 21:59   ` Vladislav Shpilevoy
2020-07-09 17:16 ` [Tarantool-patches] [PATCH 2/3] replication: test clear_synchro_queue function sergeyb
2020-07-20 22:00   ` Vladislav Shpilevoy
2020-08-25 12:49     ` Sergey Bronnikov [this message]
2020-08-26  7:31       ` Serge Petrenko
2020-08-26 14:48         ` Sergey Bronnikov
2020-07-09 17:16 ` [Tarantool-patches] [PATCH 3/3] replication: add test with random leaders promotion and demotion sergeyb
2020-07-09 20:07   ` Vladislav Shpilevoy
2020-07-10  8:05     ` Sergey Bronnikov
2020-07-20 22:01   ` Vladislav Shpilevoy
2020-08-26 14:45     ` Sergey Bronnikov
2020-08-27 10:49       ` Serge Petrenko
2020-08-31 10:05         ` Sergey Bronnikov
2020-09-01 13:03           ` Serge Petrenko
2020-11-12  9:32             ` Sergey Bronnikov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200825124939.GB47610@pony.bronevichok.ru \
    --to=sergeyb@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 2/3] replication: test clear_synchro_queue function' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox