Tarantool development patches archive
 help / color / mirror / Atom feed
From: Serge Petrenko <sergepetrenko@tarantool.org>
To: Sergey Bronnikov <sergeyb@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org,
	Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v1] replication: change space sync mode in a loop
Date: Tue, 1 Sep 2020 15:25:32 +0300	[thread overview]
Message-ID: <6478b29b-ccee-35c5-665a-628c9e50b0b1@tarantool.org> (raw)
In-Reply-To: <20200831110642.GB94343@pony.bronevichok.ru>


31.08.2020 14:06, Sergey Bronnikov пишет:
> Sergey, thanks for review!
> See my comments inline. Test has been updated in a branch.
>
> On 14:27 Thu 27 Aug , Sergey Petrenko wrote:
>> Hi! Thanks for the patch!
>>
>> Please see my comments below.
>>
>> 26.08.2020 18:10, sergeyb@tarantool.org пишет:
>>> From: Sergey Bronnikov <sergeyb@tarantool.org>
>>>
>>> New regression tests covers cases when one can change synchronous mode
>>> of space to asynchronous and vice versa.
>>>
>>> Closes #5055
>>> Part of #5144
>>> ---
>>>
>>> Branch: ligurio/gh-4842-qsync-change-mode
>>> CI: https://gitlab.com/tarantool/tarantool/-/pipelines/182271234
>>>
>>>    test/replication/qsync_sync_mode.result   | 164 ++++++++++++++++++++++
>>>    test/replication/qsync_sync_mode.test.lua |  90 ++++++++++++
>>>    2 files changed, 254 insertions(+)
>>>    create mode 100644 test/replication/qsync_sync_mode.result
>>>    create mode 100644 test/replication/qsync_sync_mode.test.lua
>>>
>>> diff --git a/test/replication/qsync_sync_mode.result b/test/replication/qsync_sync_mode.result
>>> new file mode 100644
>>> index 000000000..f2f95ec0f
>>> --- /dev/null
>>> +++ b/test/replication/qsync_sync_mode.result
>>> @@ -0,0 +1,164 @@
>>> +-- test-run result file version 2
>>> +env = require('test_run')
>>> + | ---
>>> + | ...
>>> +test_run = env.new()
>>> + | ---
>>> + | ...
>>> +engine = test_run:get_cfg('engine')
>>> + | ---
>>> + | ...
>>> +fiber = require('fiber')
>>> + | ---
>>> + | ...
>>> +math = require('math')
>>> + | ---
>>> + | ...
>>> +math.randomseed(os.time())
>>> + | ---
>>> + | ...
>>> +
>>> +orig_synchro_quorum = box.cfg.replication_synchro_quorum
>>> + | ---
>>> + | ...
>>> +orig_synchro_timeout = box.cfg.replication_synchro_timeout
>>> + | ---
>>> + | ...
>>> +
>>> +disable_sync_mode = function()                                                 \
>>> +    local s = box.space._space:get(box.space.sync.id)                          \
>>> +    local new_s = s:update({{'=', 6, {is_sync=false}}})                        \
>>> +    box.space._space:replace(new_s)                                            \
>>> +end
>>> + | ---
>>> + | ...
>>> +
>>> +enable_sync_mode = function()                                                  \
>>> +    local s = box.space._space:get(box.space.sync.id)                          \
>>> +    local new_s = s:update({{'=', 6, {is_sync=true}}})                         \
>>> +    box.space._space:replace(new_s)                                            \
>>> +end
>>> + | ---
>>> + | ...
>>> +
>>> +set_random_sync_mode = function()                                              \
>>> +    if (math.random(1, 10) > 5) then                                           \
>>> +        enable_sync_mode()                                                     \
>>> +    else                                                                       \
>>> +        disable_sync_mode()                                                    \
>>> +    end                                                                        \
>>> +end
>>> + | ---
>>> + | ...
>>> +
>>> +set_random_quorum = function(n)                                                \
>>> +    box.cfg{replication_synchro_quorum=math.random(1, n)}                      \
>>> +end
>>> + | ---
>>> + | ...
>>> +
>>> +box.schema.user.grant('guest', 'replication')
>>> + | ---
>>> + | ...
>>> +
>>> +-- Setup an async cluster with two instances.
>>> +test_run:cmd('create server replica with rpl_master=default,\
>>> +                                         script="replication/replica.lua"')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:cmd('start server replica with wait=True, wait_load=True')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +
>>> +-- Write data to a leader, enable and disable sync mode in background in a
>>> +-- loop. Expected no data loss.
>>> +-- Testcase setup.
>>> +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
>>> + | ---
>>> + | ...
>>> +_ = box.space.sync:create_index('pk')
>>> + | ---
>>> + | ...
>>> +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
>>> + | ---
>>> + | ...
>>> +-- Testcase body.
>>> +for i = 1,10 do                                                                \
>>> +    set_random_sync_mode()                                                     \
>>> +    if pcall(box.space.sync.insert, box.space.sync, {i}) then                  \
>>> +        test_run:switch('replica')                                             \
>>> +        test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
>>> +    end                                                                        \
>>> +    test_run:switch('default')                                                 \
>>> +end
>>> + | ---
>>> + | ...
>>> +-- Testcase cleanup.
>>> +test_run:switch('default')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +box.space.sync:drop()
>>> + | ---
>>> + | ...
>>> +
>>> +-- Write data to a leader, enable and disable sync mode and change quorum value
>>> +-- in background in a loop.
>>> +-- Testcase setup.
>>> +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
>>> + | ---
>>> + | ...
>>> +_ = box.space.sync:create_index('pk')
>>> + | ---
>>> + | ...
>>> +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
>>> + | ---
>>> + | ...
>>> +-- Testcase body.
>>> +for i = 1,10 do                                                                \
>>> +    set_random_sync_mode()                                                     \
>>> +    set_random_quorum(5)                                                       \
>>> +    if pcall(box.space.sync.insert, box.space.sync, {i}) then                  \
>>> +        test_run:switch('replica')                                             \
>>> +        test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
>>> +    end                                                                        \
>>> +    test_run:switch('default')                                                 \
>>> +end
>>> + | ---
>>> + | ...
>>> +-- Testcase cleanup.
>>> +test_run:switch('default')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +box.space.sync:drop()
>>> + | ---
>>> + | ...
>>> +
>>> +-- Teardown.
>>> +test_run:cmd('switch default')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:cmd('stop server replica')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:cmd('delete server replica')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:cleanup_cluster()
>>> + | ---
>>> + | ...
>>> +box.schema.user.revoke('guest', 'replication')
>>> + | ---
>>> + | ...
>>> +box.cfg{                                                                       \
>>> +    replication_synchro_quorum = orig_synchro_quorum,                          \
>>> +    replication_synchro_timeout = orig_synchro_timeout,                        \
>>> +}
>>> + | ---
>>> + | ...
>>> diff --git a/test/replication/qsync_sync_mode.test.lua b/test/replication/qsync_sync_mode.test.lua
>>> new file mode 100644
>>> index 000000000..706261c10
>>> --- /dev/null
>>> +++ b/test/replication/qsync_sync_mode.test.lua
>>> @@ -0,0 +1,90 @@
>>> +env = require('test_run')
>>> +test_run = env.new()
>>> +engine = test_run:get_cfg('engine')
>>> +fiber = require('fiber')
>>> +math = require('math')
>>> +math.randomseed(os.time())
>>> +
>>> +orig_synchro_quorum = box.cfg.replication_synchro_quorum
>>> +orig_synchro_timeout = box.cfg.replication_synchro_timeout
>>> +
>>> +disable_sync_mode = function()                                                 \
>>> +    local s = box.space._space:get(box.space.sync.id)                          \
>>> +    local new_s = s:update({{'=', 6, {is_sync=false}}})                        \
>>> +    box.space._space:replace(new_s)                                            \
>>> +end
>>> +
>>> +enable_sync_mode = function()                                                  \
>>> +    local s = box.space._space:get(box.space.sync.id)                          \
>>> +    local new_s = s:update({{'=', 6, {is_sync=true}}})                         \
>>> +    box.space._space:replace(new_s)                                            \
>>> +end
>>> +
>> Vlad has pushed a patch with `space:alter` a couple of days ago.
>> So now you may say `space:alter{is_sync=true}`, `space:alter{is_sync=false}`
>> It does the same work you do, but looks much simpler.
> used alter in updated patch:
>
> -- Testcase body.
> for i = 1,100 do                                                               \
>      box.space.sync:alter{is_sync=random_boolean()}                             \
>      box.space.sync:insert{i}                                                   \
>      test_run:switch('replica')                                                 \
>      test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end)     \
>      test_run:switch('default')                                                 \
> end
>
>>> +set_random_sync_mode = function()                                              \
>>> +    if (math.random(1, 10) > 5) then                                           \
>>> +        enable_sync_mode()                                                     \
>>> +    else                                                                       \
>>> +        disable_sync_mode()                                                    \
>>> +    end                                                                        \
>>> +end
>>> +
>>> +set_random_quorum = function(n)                                                \
>>> +    box.cfg{replication_synchro_quorum=math.random(1, n)}                      \
>>> +end
>>> +
>>> +box.schema.user.grant('guest', 'replication')
>>> +
>>> +-- Setup an async cluster with two instances.
>>> +test_run:cmd('create server replica with rpl_master=default,\
>>> +                                         script="replication/replica.lua"')
>>> +test_run:cmd('start server replica with wait=True, wait_load=True')
>>> +
>>> +-- Write data to a leader, enable and disable sync mode in background in a
>>> +-- loop. Expected no data loss.
>> But sync mode is not set in background, it's set in the same loop where
>> insertions happen.
> in updated version I set sync mode just before an insert, description in comment
> has been updated
>
>>> +-- Testcase setup.
>>> +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
>>> +_ = box.space.sync:create_index('pk')
>>> +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
>> Is such a tiny timeout intended? I see you wrap the errors in a pcall, but
>> still,
>> it may happen that you get a timeout 10 out of 10 times and then you'll
>> test nothing. We usually use 30 second replication timeouts.
> Fixed by setting it to 1000.
>
>>> +-- Testcase body.
>>> +for i = 1,10 do                                                                \
>>> +    set_random_sync_mode()                                                     \
>>> +    if pcall(box.space.sync.insert, box.space.sync, {i}) then                  \
>>> +        test_run:switch('replica')                                             \
>>> +        test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
>>> +    end                                                                        \
>>> +    test_run:switch('default')                                                 \
>>> +end
>>> +-- Testcase cleanup.
>>> +test_run:switch('default')
>>> +box.space.sync:drop()
>>> +
>>> +-- Write data to a leader, enable and disable sync mode and change quorum value
>>> +-- in background in a loop.
>>> +-- Testcase setup.
>>> +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
>>> +_ = box.space.sync:create_index('pk')
>>> +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
>> Ok, I see why you need a small timeout in this testcase. Maybe set a small
>> timeout
>> only when the insertion is expected to fail(quorum = 3 or 4)?. And use a 30
>> sec timeout
>> otherwise.
> I have rewrote tests to consider to cases:
> - positive, when we have achievable quorum, set mode to sync or async on a master
> and then make an insert. Expected successful writes always.
> - negative, when we change mode to sync or async with tx's in a limbo.
>
> So comment above is not relevant now.

Hi! Thanks for the fixes.

The negative case seems pointless, since the change won't happen until 
the limbo
gets emptied out. This is because alter{is_sync=...} is a DML operation 
which
replaces an entry in _space system space, so it gets stuck on a 
non-empty limbo,
just like any DML op.
I suggest you remove it altogether.

Other than that LGTM.


>
>>> +-- Testcase body.
>>> +for i = 1,10 do                                                                \
>>> +    set_random_sync_mode()                                                     \
>>> +    set_random_quorum(5)                                                       \
>> Math.random() range is inclusive, so it'd be more fair, to choose quorum
>> from 1 to 4.
>> Then there'd be 2 cases of successful write, quorum 1 and 2, and 2 cases of
>> failed write,
>> quorum 3 and 4.
> removed set_random_quorum() in updated tests
>
>>> +    if pcall(box.space.sync.insert, box.space.sync, {i}) then                  \
>>> +        test_run:switch('replica')                                             \
>>> +        test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
>>> +    end                                                                        \
>>> +    test_run:switch('default')                                                 \
>>> +end
>>> +-- Testcase cleanup.
>>> +test_run:switch('default')
>>> +box.space.sync:drop()
>>> +
>>> +-- Teardown.
>>> +test_run:cmd('switch default')
>>> +test_run:cmd('stop server replica')
>>> +test_run:cmd('delete server replica')
>>> +test_run:cleanup_cluster()
>>> +box.schema.user.revoke('guest', 'replication')
>>> +box.cfg{                                                                       \
>>> +    replication_synchro_quorum = orig_synchro_quorum,                          \
>>> +    replication_synchro_timeout = orig_synchro_timeout,                        \
>>> +}



-- 
Serge Petrenko

  reply	other threads:[~2020-09-01 12:25 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-26 15:10 sergeyb
2020-08-27 11:27 ` Sergey Petrenko
2020-08-31 11:06   ` Sergey Bronnikov
2020-09-01 12:25     ` Serge Petrenko [this message]
2020-09-01 12:50       ` Sergey Bronnikov
2020-11-12  9:27         ` Sergey Bronnikov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6478b29b-ccee-35c5-665a-628c9e50b0b1@tarantool.org \
    --to=sergepetrenko@tarantool.org \
    --cc=sergeyb@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH v1] replication: change space sync mode in a loop' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox