From: Sergey Bronnikov <sergeyb@tarantool.org> To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Cc: tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH 2/3] replication: test clear_synchro_queue function Date: Tue, 25 Aug 2020 15:49:39 +0300 [thread overview] Message-ID: <20200825124939.GB47610@pony.bronevichok.ru> (raw) In-Reply-To: <8c3b7d4c-22e8-108a-b15e-ef3c89510a63@tarantool.org> Vlad, thanks for review! Patches updated in a branch and please see my answers inline. On 00:00 Tue 21 Jul , Vladislav Shpilevoy wrote: > Thanks for the patch! > > See 8 comments below. > > On 09.07.2020 19:16, sergeyb@tarantool.org wrote: > > From: Sergey Bronnikov <sergeyb@tarantool.org> > > > > Part of #5055 > > Part of #4849 > > --- > > test/replication/qsync_basic.result | 85 +++++++++++++++++++++++++++ > > test/replication/qsync_basic.test.lua | 31 ++++++++++ > > 2 files changed, 116 insertions(+) > > > > diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result > > index ab4be0c7e..464df75a7 100644 > > --- a/test/replication/qsync_basic.result > > +++ b/test/replication/qsync_basic.result > > @@ -32,6 +32,14 @@ s2.is_sync > > | - false > > | ... > > > > +-- > > +-- gh-4849: clear synchro queue with unconfigured box > > +-- > > +box.ctl.clear_synchro_queue() > > 1. Enough to test that it does not crash here. No need to > check result. Well, removed assignment to variable. > > + | --- > > + | - -1 > > + | ... > > + > > -- Net.box takes sync into account. > > box.schema.user.grant('guest', 'super') > > | --- > > @@ -553,6 +561,82 @@ box.space.sync:select{7} > > | - - [7] > > | ... > > > > +-- > > +-- gh-4849: clear synchro queue on a replica > > +-- > > +test_run:switch('default') > > + | --- > > + | - true > > + | ... > > +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 2} > > 2. If you want this timeout to fail, it is too big. The test will be too > long. If it is supposed not to fail, then it is way too small. If you want > it to fail, it should be something like 0.001. If you want it to hold, it > should be 1000 to be sure. Keep in mind that you can change the timeout on > fly. That allows to build quite complex reactive tests completely event-based. set replication_synchro_timeout value to 1000 > > + | --- > > + | ... > > +f1 = fiber.create(box.space.sync.replace, box.space.sync, {9}) > > 3. You need to extract the exact result value. Use pcall for that, and print > its result after the fiber is dead. See other examples with fiber.create() in > qsync test suite. The same for the next test case. Done. > > + | --- > > + | ... > > +f1:status() > > + | --- > > + | - suspended > > + | ... > > +test_run:switch('replica') > > + | --- > > + | - true > > 4. Better wait until the value is delivered. Otherwise you can switch to > replica before master finishes WAL write, and the queue will be empty here. > Try > > test_run:wait_cond(function() return box.space.sync:get{9} ~= nil end) Agree, added wait_cond(). > > + | ... > > +box.ctl.clear_synchro_queue() > > + | --- > > + | - 0 > > + | ... > > +box.space.sync:select{9} > > + | --- > > + | - [] > > 5. If select returns 9 before queue clean, and returns empty afterwards, > it means the queue was cleared. So here the queue size is not really needed, > as you can see. removed this and next select{} statements > > + | ... > > +test_run:switch('default') > > + | --- > > + | - true > > + | ... > > +box.space.sync:select{9} > > + | --- > > + | - [] > > + | ... > > +f1:status() > > + | --- > > + | - dead > > + | ... > > + > > +-- > > +-- gh-4849: clear synchro queue on a master > > 6. Since the previous test the replica's WAL is different from master's. > I am not sure the replication is still alive. Test with clear_synchro_queue() on master keeps master alive at the end of test so I moved it before the same test for the replica. Also added note for others that cluster may be in a broken state here. > > +-- > > +test_run:switch('default') > > + | --- > > + | - true > > + | ... > > +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 2} > > + | --- > > + | ... > > +f1 = fiber.create(box.space.sync.replace, box.space.sync, {10}) > > + | --- > > + | ... > > +f1:status() > > + | --- > > + | - suspended > > + | ... > > +box.ctl.clear_synchro_queue() > > + | --- > > + | - -2 > > + | ... > > +box.space.sync:select{10} > > + | --- > > + | - - [10] > > 7. Why is it 10? The quorum is not reached, it should have been rolled back. Added "box.cfg{replication_synchro_timeout = 0.1}" before clear_synchro_queue() call and now tx is rolled back. Also I've added checks of current value with wait_cond() that is more reliable than select{} on a master and replica. Updated output looks like this: box.ctl.clear_synchro_queue() | --- | ... test_run:switch('replica') | --- | - true | ... test_run:wait_cond(function() return box.space.sync:get{10} == nil end) | --- | - true | ... test_run:switch('default') | --- | - true | ... test_run:wait_cond(function() return f:status() == 'dead' end) | --- | - true | ... ok, err | --- | - false | - Quorum collection for a synchronous transaction is timed out | ... test_run:wait_cond(function() return box.space.sync:get{10} == nil end) | --- | - true | ... > > + | ... > > +test_run:switch('replica') > > + | --- > > + | - true > > + | ... > > +box.space.sync:select{10} > > + | --- > > + | - - [10] > > + | ... > > + > > -- Cleanup. > > test_run:cmd('switch default') > > | --- > > @@ -576,6 +660,7 @@ test_run:cmd('delete server replica') > > | ... > > box.space.test:drop() > > | --- > > + | - error: A rollback for a synchronous transaction is received > > 8. Why is it changed? Perhaps it is because in previous version of test I haven't wait fibers 'dead'. There is no error now: box.space.test:drop() | --- | ... box.space.sync:drop() | --- | ... > > > | ... > > box.space.sync:drop() > > | ---
next prev parent reply other threads:[~2020-08-25 12:49 UTC|newest] Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top [not found] <cover.1594314820.git.sergeyb@tarantool.org> 2020-07-09 17:16 ` [Tarantool-patches] [PATCH 1/3] replication: print number of txs in limbo before its clear sergeyb 2020-07-09 20:07 ` Vladislav Shpilevoy 2020-07-10 12:55 ` Sergey Bronnikov 2020-07-20 21:59 ` Vladislav Shpilevoy 2020-07-09 17:16 ` [Tarantool-patches] [PATCH 2/3] replication: test clear_synchro_queue function sergeyb 2020-07-20 22:00 ` Vladislav Shpilevoy 2020-08-25 12:49 ` Sergey Bronnikov [this message] 2020-08-26 7:31 ` Serge Petrenko 2020-08-26 14:48 ` Sergey Bronnikov 2020-07-09 17:16 ` [Tarantool-patches] [PATCH 3/3] replication: add test with random leaders promotion and demotion sergeyb 2020-07-09 20:07 ` Vladislav Shpilevoy 2020-07-10 8:05 ` Sergey Bronnikov 2020-07-20 22:01 ` Vladislav Shpilevoy 2020-08-26 14:45 ` Sergey Bronnikov 2020-08-27 10:49 ` Serge Petrenko 2020-08-31 10:05 ` Sergey Bronnikov 2020-09-01 13:03 ` Serge Petrenko 2020-11-12 9:32 ` Sergey Bronnikov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20200825124939.GB47610@pony.bronevichok.ru \ --to=sergeyb@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH 2/3] replication: test clear_synchro_queue function' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox