[Tarantool-patches] [PATCH v1] replication: change space sync mode in a loop

Sergey Bronnikov sergeyb at tarantool.org
Mon Aug 31 14:06:42 MSK 2020


Sergey, thanks for review!
See my comments inline. Test has been updated in a branch.

On 14:27 Thu 27 Aug , Sergey Petrenko wrote:
> Hi! Thanks for the patch!
> 
> Please see my comments below.
> 
> 26.08.2020 18:10, sergeyb at tarantool.org пишет:
> > From: Sergey Bronnikov <sergeyb at tarantool.org>
> > 
> > New regression tests covers cases when one can change synchronous mode
> > of space to asynchronous and vice versa.
> > 
> > Closes #5055
> > Part of #5144
> > ---
> > 
> > Branch: ligurio/gh-4842-qsync-change-mode
> > CI: https://gitlab.com/tarantool/tarantool/-/pipelines/182271234
> > 
> >   test/replication/qsync_sync_mode.result   | 164 ++++++++++++++++++++++
> >   test/replication/qsync_sync_mode.test.lua |  90 ++++++++++++
> >   2 files changed, 254 insertions(+)
> >   create mode 100644 test/replication/qsync_sync_mode.result
> >   create mode 100644 test/replication/qsync_sync_mode.test.lua
> > 
> > diff --git a/test/replication/qsync_sync_mode.result b/test/replication/qsync_sync_mode.result
> > new file mode 100644
> > index 000000000..f2f95ec0f
> > --- /dev/null
> > +++ b/test/replication/qsync_sync_mode.result
> > @@ -0,0 +1,164 @@
> > +-- test-run result file version 2
> > +env = require('test_run')
> > + | ---
> > + | ...
> > +test_run = env.new()
> > + | ---
> > + | ...
> > +engine = test_run:get_cfg('engine')
> > + | ---
> > + | ...
> > +fiber = require('fiber')
> > + | ---
> > + | ...
> > +math = require('math')
> > + | ---
> > + | ...
> > +math.randomseed(os.time())
> > + | ---
> > + | ...
> > +
> > +orig_synchro_quorum = box.cfg.replication_synchro_quorum
> > + | ---
> > + | ...
> > +orig_synchro_timeout = box.cfg.replication_synchro_timeout
> > + | ---
> > + | ...
> > +
> > +disable_sync_mode = function()                                                 \
> > +    local s = box.space._space:get(box.space.sync.id)                          \
> > +    local new_s = s:update({{'=', 6, {is_sync=false}}})                        \
> > +    box.space._space:replace(new_s)                                            \
> > +end
> > + | ---
> > + | ...
> > +
> > +enable_sync_mode = function()                                                  \
> > +    local s = box.space._space:get(box.space.sync.id)                          \
> > +    local new_s = s:update({{'=', 6, {is_sync=true}}})                         \
> > +    box.space._space:replace(new_s)                                            \
> > +end
> > + | ---
> > + | ...
> > +
> > +set_random_sync_mode = function()                                              \
> > +    if (math.random(1, 10) > 5) then                                           \
> > +        enable_sync_mode()                                                     \
> > +    else                                                                       \
> > +        disable_sync_mode()                                                    \
> > +    end                                                                        \
> > +end
> > + | ---
> > + | ...
> > +
> > +set_random_quorum = function(n)                                                \
> > +    box.cfg{replication_synchro_quorum=math.random(1, n)}                      \
> > +end
> > + | ---
> > + | ...
> > +
> > +box.schema.user.grant('guest', 'replication')
> > + | ---
> > + | ...
> > +
> > +-- Setup an async cluster with two instances.
> > +test_run:cmd('create server replica with rpl_master=default,\
> > +                                         script="replication/replica.lua"')
> > + | ---
> > + | - true
> > + | ...
> > +test_run:cmd('start server replica with wait=True, wait_load=True')
> > + | ---
> > + | - true
> > + | ...
> > +
> > +-- Write data to a leader, enable and disable sync mode in background in a
> > +-- loop. Expected no data loss.
> > +-- Testcase setup.
> > +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > + | ---
> > + | ...
> > +_ = box.space.sync:create_index('pk')
> > + | ---
> > + | ...
> > +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
> > + | ---
> > + | ...
> > +-- Testcase body.
> > +for i = 1,10 do                                                                \
> > +    set_random_sync_mode()                                                     \
> > +    if pcall(box.space.sync.insert, box.space.sync, {i}) then                  \
> > +        test_run:switch('replica')                                             \
> > +        test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
> > +    end                                                                        \
> > +    test_run:switch('default')                                                 \
> > +end
> > + | ---
> > + | ...
> > +-- Testcase cleanup.
> > +test_run:switch('default')
> > + | ---
> > + | - true
> > + | ...
> > +box.space.sync:drop()
> > + | ---
> > + | ...
> > +
> > +-- Write data to a leader, enable and disable sync mode and change quorum value
> > +-- in background in a loop.
> > +-- Testcase setup.
> > +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > + | ---
> > + | ...
> > +_ = box.space.sync:create_index('pk')
> > + | ---
> > + | ...
> > +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
> > + | ---
> > + | ...
> > +-- Testcase body.
> > +for i = 1,10 do                                                                \
> > +    set_random_sync_mode()                                                     \
> > +    set_random_quorum(5)                                                       \
> > +    if pcall(box.space.sync.insert, box.space.sync, {i}) then                  \
> > +        test_run:switch('replica')                                             \
> > +        test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
> > +    end                                                                        \
> > +    test_run:switch('default')                                                 \
> > +end
> > + | ---
> > + | ...
> > +-- Testcase cleanup.
> > +test_run:switch('default')
> > + | ---
> > + | - true
> > + | ...
> > +box.space.sync:drop()
> > + | ---
> > + | ...
> > +
> > +-- Teardown.
> > +test_run:cmd('switch default')
> > + | ---
> > + | - true
> > + | ...
> > +test_run:cmd('stop server replica')
> > + | ---
> > + | - true
> > + | ...
> > +test_run:cmd('delete server replica')
> > + | ---
> > + | - true
> > + | ...
> > +test_run:cleanup_cluster()
> > + | ---
> > + | ...
> > +box.schema.user.revoke('guest', 'replication')
> > + | ---
> > + | ...
> > +box.cfg{                                                                       \
> > +    replication_synchro_quorum = orig_synchro_quorum,                          \
> > +    replication_synchro_timeout = orig_synchro_timeout,                        \
> > +}
> > + | ---
> > + | ...
> > diff --git a/test/replication/qsync_sync_mode.test.lua b/test/replication/qsync_sync_mode.test.lua
> > new file mode 100644
> > index 000000000..706261c10
> > --- /dev/null
> > +++ b/test/replication/qsync_sync_mode.test.lua
> > @@ -0,0 +1,90 @@
> > +env = require('test_run')
> > +test_run = env.new()
> > +engine = test_run:get_cfg('engine')
> > +fiber = require('fiber')
> > +math = require('math')
> > +math.randomseed(os.time())
> > +
> > +orig_synchro_quorum = box.cfg.replication_synchro_quorum
> > +orig_synchro_timeout = box.cfg.replication_synchro_timeout
> > +
> > +disable_sync_mode = function()                                                 \
> > +    local s = box.space._space:get(box.space.sync.id)                          \
> > +    local new_s = s:update({{'=', 6, {is_sync=false}}})                        \
> > +    box.space._space:replace(new_s)                                            \
> > +end
> > +
> > +enable_sync_mode = function()                                                  \
> > +    local s = box.space._space:get(box.space.sync.id)                          \
> > +    local new_s = s:update({{'=', 6, {is_sync=true}}})                         \
> > +    box.space._space:replace(new_s)                                            \
> > +end
> > +
> 
> Vlad has pushed a patch with `space:alter` a couple of days ago.
> So now you may say `space:alter{is_sync=true}`, `space:alter{is_sync=false}`
> It does the same work you do, but looks much simpler.

used alter in updated patch:

-- Testcase body.
for i = 1,100 do                                                               \
    box.space.sync:alter{is_sync=random_boolean()}                             \
    box.space.sync:insert{i}                                                   \
    test_run:switch('replica')                                                 \
    test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end)     \
    test_run:switch('default')                                                 \
end

> > +set_random_sync_mode = function()                                              \
> > +    if (math.random(1, 10) > 5) then                                           \
> > +        enable_sync_mode()                                                     \
> > +    else                                                                       \
> > +        disable_sync_mode()                                                    \
> > +    end                                                                        \
> > +end
> > +
> > +set_random_quorum = function(n)                                                \
> > +    box.cfg{replication_synchro_quorum=math.random(1, n)}                      \
> > +end
> > +
> > +box.schema.user.grant('guest', 'replication')
> > +
> > +-- Setup an async cluster with two instances.
> > +test_run:cmd('create server replica with rpl_master=default,\
> > +                                         script="replication/replica.lua"')
> > +test_run:cmd('start server replica with wait=True, wait_load=True')
> > +
> > +-- Write data to a leader, enable and disable sync mode in background in a
> > +-- loop. Expected no data loss.
> 
> But sync mode is not set in background, it's set in the same loop where
> insertions happen.

in updated version I set sync mode just before an insert, description in comment
has been updated

> > +-- Testcase setup.
> > +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > +_ = box.space.sync:create_index('pk')
> > +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
> 
> Is such a tiny timeout intended? I see you wrap the errors in a pcall, but
> still,
> it may happen that you get a timeout 10 out of 10 times and then you'll
> test nothing. We usually use 30 second replication timeouts.

Fixed by setting it to 1000.

> > +-- Testcase body.
> > +for i = 1,10 do                                                                \
> > +    set_random_sync_mode()                                                     \
> > +    if pcall(box.space.sync.insert, box.space.sync, {i}) then                  \
> > +        test_run:switch('replica')                                             \
> > +        test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
> > +    end                                                                        \
> > +    test_run:switch('default')                                                 \
> > +end
> > +-- Testcase cleanup.
> > +test_run:switch('default')
> > +box.space.sync:drop()
> > +
> > +-- Write data to a leader, enable and disable sync mode and change quorum value
> > +-- in background in a loop.
> > +-- Testcase setup.
> > +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > +_ = box.space.sync:create_index('pk')
> > +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
> 
> Ok, I see why you need a small timeout in this testcase. Maybe set a small
> timeout
> only when the insertion is expected to fail(quorum = 3 or 4)?. And use a 30
> sec timeout
> otherwise.

I have rewrote tests to consider to cases:
- positive, when we have achievable quorum, set mode to sync or async on a master
and then make an insert. Expected successful writes always.
- negative, when we change mode to sync or async with tx's in a limbo.

So comment above is not relevant now.

> > +-- Testcase body.
> > +for i = 1,10 do                                                                \
> > +    set_random_sync_mode()                                                     \
> > +    set_random_quorum(5)                                                       \
> 
> Math.random() range is inclusive, so it'd be more fair, to choose quorum
> from 1 to 4.
> Then there'd be 2 cases of successful write, quorum 1 and 2, and 2 cases of
> failed write,
> quorum 3 and 4.

removed set_random_quorum() in updated tests

> > +    if pcall(box.space.sync.insert, box.space.sync, {i}) then                  \
> > +        test_run:switch('replica')                                             \
> > +        test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
> > +    end                                                                        \
> > +    test_run:switch('default')                                                 \
> > +end
> > +-- Testcase cleanup.
> > +test_run:switch('default')
> > +box.space.sync:drop()
> > +
> > +-- Teardown.
> > +test_run:cmd('switch default')
> > +test_run:cmd('stop server replica')
> > +test_run:cmd('delete server replica')
> > +test_run:cleanup_cluster()
> > +box.schema.user.revoke('guest', 'replication')
> > +box.cfg{                                                                       \
> > +    replication_synchro_quorum = orig_synchro_quorum,                          \
> > +    replication_synchro_timeout = orig_synchro_timeout,                        \
> > +}

-- 
sergeyb@


More information about the Tarantool-patches mailing list