Tarantool development patches archive
 help / color / mirror / Atom feed
From: "Serge Petrenko" <sergepetrenko@tarantool.org>
To: "Sergey Bronnikov" <sergeyb@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org,
	"Vladislav Shpilevoy" <v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH 2/3] replication: test clear_synchro_queue function
Date: Wed, 26 Aug 2020 10:31:37 +0300	[thread overview]
Message-ID: <1598427097.413655454@f128.i.mail.ru> (raw)
In-Reply-To: <20200825124939.GB47610@pony.bronevichok.ru>

[-- Attachment #1: Type: text/plain, Size: 6126 bytes --]


Hi, Sergey!
Thanks for the patch and the fixes. New version LGTM. I couldn’t find the other patches in my mailbox though, could you please resend them? If you need my review, of course. 
>Вторник, 25 августа 2020, 15:49 +03:00 от Sergey Bronnikov <sergeyb@tarantool.org>:
> 
>Vlad, thanks for review!
>Patches updated in a branch and please see my answers inline.
>
>On 00:00 Tue 21 Jul , Vladislav Shpilevoy wrote:
>> Thanks for the patch!
>>
>> See 8 comments below.
>>
>> On 09.07.2020 19:16,  sergeyb@tarantool.org wrote:
>> > From: Sergey Bronnikov < sergeyb@tarantool.org >
>> >
>> > Part of #5055
>> > Part of #4849
>> > ---
>> > test/replication/qsync_basic.result | 85 +++++++++++++++++++++++++++
>> > test/replication/qsync_basic.test.lua | 31 ++++++++++
>> > 2 files changed, 116 insertions(+)
>> >
>> > diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result
>> > index ab4be0c7e..464df75a7 100644
>> > --- a/test/replication/qsync_basic.result
>> > +++ b/test/replication/qsync_basic.result
>> > @@ -32,6 +32,14 @@ s2.is_sync
>> > | - false
>> > | ...
>> >
>> > +--
>> > +-- gh-4849: clear synchro queue with unconfigured box
>> > +--
>> > +box.ctl.clear_synchro_queue()
>>
>> 1. Enough to test that it does not crash here. No need to
>> check result.
>
>Well, removed assignment to variable.
>
>> > + | ---
>> > + | - -1
>> > + | ...
>> > +
>> > -- Net.box takes sync into account.
>> > box.schema.user.grant('guest', 'super')
>> > | ---
>> > @@ -553,6 +561,82 @@ box.space.sync:select{7}
>> > | - - [7]
>> > | ...
>> >
>> > +--
>> > +-- gh-4849: clear synchro queue on a replica
>> > +--
>> > +test_run:switch('default')
>> > + | ---
>> > + | - true
>> > + | ...
>> > +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 2}
>>
>> 2. If you want this timeout to fail, it is too big. The test will be too
>> long. If it is supposed not to fail, then it is way too small. If you want
>> it to fail, it should be something like 0.001. If you want it to hold, it
>> should be 1000 to be sure. Keep in mind that you can change the timeout on
>> fly. That allows to build quite complex reactive tests completely event-based.
>
>set replication_synchro_timeout value to 1000
>
>> > + | ---
>> > + | ...
>> > +f1 = fiber.create(box.space.sync.replace, box.space.sync, {9})
>>
>> 3. You need to extract the exact result value. Use pcall for that, and print
>> its result after the fiber is dead. See other examples with fiber.create() in
>> qsync test suite. The same for the next test case.
>
>Done.
>
>> > + | ---
>> > + | ...
>> > +f1:status()
>> > + | ---
>> > + | - suspended
>> > + | ...
>> > +test_run:switch('replica')
>> > + | ---
>> > + | - true
>>
>> 4. Better wait until the value is delivered. Otherwise you can switch to
>> replica before master finishes WAL write, and the queue will be empty here.
>> Try
>>
>> test_run:wait_cond(function() return box.space.sync:get{9} ~= nil end)
>
>Agree, added wait_cond().
>
>> > + | ...
>> > +box.ctl.clear_synchro_queue()
>> > + | ---
>> > + | - 0
>> > + | ...
>> > +box.space.sync:select{9}
>> > + | ---
>> > + | - []
>>
>> 5. If select returns 9 before queue clean, and returns empty afterwards,
>> it means the queue was cleared. So here the queue size is not really needed,
>> as you can see.
>
>removed this and next select{} statements
>
>> > + | ...
>> > +test_run:switch('default')
>> > + | ---
>> > + | - true
>> > + | ...
>> > +box.space.sync:select{9}
>> > + | ---
>> > + | - []
>> > + | ...
>> > +f1:status()
>> > + | ---
>> > + | - dead
>> > + | ...
>> > +
>> > +--
>> > +-- gh-4849: clear synchro queue on a master
>>
>> 6. Since the previous test the replica's WAL is different from master's.
>> I am not sure the replication is still alive.
>
>Test with clear_synchro_queue() on master keeps master alive at the end
>of test so I moved it before the same test for the replica. Also added
>note for others that cluster may be in a broken state here.
>
>> > +--
>> > +test_run:switch('default')
>> > + | ---
>> > + | - true
>> > + | ...
>> > +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 2}
>> > + | ---
>> > + | ...
>> > +f1 = fiber.create(box.space.sync.replace, box.space.sync, {10})
>> > + | ---
>> > + | ...
>> > +f1:status()
>> > + | ---
>> > + | - suspended
>> > + | ...
>> > +box.ctl.clear_synchro_queue()
>> > + | ---
>> > + | - -2
>> > + | ...
>> > +box.space.sync:select{10}
>> > + | ---
>> > + | - - [10]
>>
>> 7. Why is it 10? The quorum is not reached, it should have been rolled back.
>
>Added "box.cfg{replication_synchro_timeout = 0.1}" before clear_synchro_queue()
>call and now tx is rolled back. Also I've added checks of current value with
>wait_cond() that is more reliable than select{} on a master and replica.
>Updated output looks like this:
>
>box.ctl.clear_synchro_queue()
> | ---
> | ...
>test_run:switch('replica')
> | ---
> | - true
> | ...
>test_run:wait_cond(function() return box.space.sync:get{10} == nil end)
> | ---
> | - true
> | ...
>test_run:switch('default')
> | ---
> | - true
> | ...
>test_run:wait_cond(function() return f:status() == 'dead' end)
> | ---
> | - true
> | ...
>ok, err
> | ---
> | - false
> | - Quorum collection for a synchronous transaction is timed out
> | ...
>test_run:wait_cond(function() return box.space.sync:get{10} == nil end)
> | ---
> | - true
> | ...
>
>> > + | ...
>> > +test_run:switch('replica')
>> > + | ---
>> > + | - true
>> > + | ...
>> > +box.space.sync:select{10}
>> > + | ---
>> > + | - - [10]
>> > + | ...
>> > +
>> > -- Cleanup.
>> > test_run:cmd('switch default')
>> > | ---
>> > @@ -576,6 +660,7 @@ test_run:cmd('delete server replica')
>> > | ...
>> > box.space.test:drop()
>> > | ---
>> > + | - error: A rollback for a synchronous transaction is received
>>
>> 8. Why is it changed?
>
>Perhaps it is because in previous version of test I haven't wait fibers 'dead'.
>There is no error now:
>
>box.space.test:drop()
> | ---
> | ...
>box.space.sync:drop()
> | ---
> | ...
>
>>
>> > | ...
>> > box.space.sync:drop()
>> > | ---
 
--
Serge Petrenko

[-- Attachment #2: Type: text/html, Size: 7929 bytes --]

  reply	other threads:[~2020-08-26  7:31 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1594314820.git.sergeyb@tarantool.org>
2020-07-09 17:16 ` [Tarantool-patches] [PATCH 1/3] replication: print number of txs in limbo before its clear sergeyb
2020-07-09 20:07   ` Vladislav Shpilevoy
2020-07-10 12:55     ` Sergey Bronnikov
2020-07-20 21:59   ` Vladislav Shpilevoy
2020-07-09 17:16 ` [Tarantool-patches] [PATCH 2/3] replication: test clear_synchro_queue function sergeyb
2020-07-20 22:00   ` Vladislav Shpilevoy
2020-08-25 12:49     ` Sergey Bronnikov
2020-08-26  7:31       ` Serge Petrenko [this message]
2020-08-26 14:48         ` Sergey Bronnikov
2020-07-09 17:16 ` [Tarantool-patches] [PATCH 3/3] replication: add test with random leaders promotion and demotion sergeyb
2020-07-09 20:07   ` Vladislav Shpilevoy
2020-07-10  8:05     ` Sergey Bronnikov
2020-07-20 22:01   ` Vladislav Shpilevoy
2020-08-26 14:45     ` Sergey Bronnikov
2020-08-27 10:49       ` Serge Petrenko
2020-08-31 10:05         ` Sergey Bronnikov
2020-09-01 13:03           ` Serge Petrenko
2020-11-12  9:32             ` Sergey Bronnikov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1598427097.413655454@f128.i.mail.ru \
    --to=sergepetrenko@tarantool.org \
    --cc=sergeyb@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 2/3] replication: test clear_synchro_queue function' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox