* [Tarantool-patches] [PATCH v1] replication: change space sync mode in a loop
@ 2020-08-26 15:10 sergeyb
2020-08-27 11:27 ` Sergey Petrenko
0 siblings, 1 reply; 6+ messages in thread
From: sergeyb @ 2020-08-26 15:10 UTC (permalink / raw)
To: tarantool-patches, v.shpilevoy, sergepetrenko, gorcunov
From: Sergey Bronnikov <sergeyb@tarantool.org>
New regression tests covers cases when one can change synchronous mode
of space to asynchronous and vice versa.
Closes #5055
Part of #5144
---
Branch: ligurio/gh-4842-qsync-change-mode
CI: https://gitlab.com/tarantool/tarantool/-/pipelines/182271234
test/replication/qsync_sync_mode.result | 164 ++++++++++++++++++++++
test/replication/qsync_sync_mode.test.lua | 90 ++++++++++++
2 files changed, 254 insertions(+)
create mode 100644 test/replication/qsync_sync_mode.result
create mode 100644 test/replication/qsync_sync_mode.test.lua
diff --git a/test/replication/qsync_sync_mode.result b/test/replication/qsync_sync_mode.result
new file mode 100644
index 000000000..f2f95ec0f
--- /dev/null
+++ b/test/replication/qsync_sync_mode.result
@@ -0,0 +1,164 @@
+-- test-run result file version 2
+env = require('test_run')
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+math = require('math')
+ | ---
+ | ...
+math.randomseed(os.time())
+ | ---
+ | ...
+
+orig_synchro_quorum = box.cfg.replication_synchro_quorum
+ | ---
+ | ...
+orig_synchro_timeout = box.cfg.replication_synchro_timeout
+ | ---
+ | ...
+
+disable_sync_mode = function() \
+ local s = box.space._space:get(box.space.sync.id) \
+ local new_s = s:update({{'=', 6, {is_sync=false}}}) \
+ box.space._space:replace(new_s) \
+end
+ | ---
+ | ...
+
+enable_sync_mode = function() \
+ local s = box.space._space:get(box.space.sync.id) \
+ local new_s = s:update({{'=', 6, {is_sync=true}}}) \
+ box.space._space:replace(new_s) \
+end
+ | ---
+ | ...
+
+set_random_sync_mode = function() \
+ if (math.random(1, 10) > 5) then \
+ enable_sync_mode() \
+ else \
+ disable_sync_mode() \
+ end \
+end
+ | ---
+ | ...
+
+set_random_quorum = function(n) \
+ box.cfg{replication_synchro_quorum=math.random(1, n)} \
+end
+ | ---
+ | ...
+
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+
+-- Setup an async cluster with two instances.
+test_run:cmd('create server replica with rpl_master=default,\
+ script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+-- Write data to a leader, enable and disable sync mode in background in a
+-- loop. Expected no data loss.
+-- Testcase setup.
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
+ | ---
+ | ...
+-- Testcase body.
+for i = 1,10 do \
+ set_random_sync_mode() \
+ if pcall(box.space.sync.insert, box.space.sync, {i}) then \
+ test_run:switch('replica') \
+ test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
+ end \
+ test_run:switch('default') \
+end
+ | ---
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- Write data to a leader, enable and disable sync mode and change quorum value
+-- in background in a loop.
+-- Testcase setup.
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
+ | ---
+ | ...
+-- Testcase body.
+for i = 1,10 do \
+ set_random_sync_mode() \
+ set_random_quorum(5) \
+ if pcall(box.space.sync.insert, box.space.sync, {i}) then \
+ test_run:switch('replica') \
+ test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
+ end \
+ test_run:switch('default') \
+end
+ | ---
+ | ...
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- Teardown.
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+test_run:cleanup_cluster()
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
+box.cfg{ \
+ replication_synchro_quorum = orig_synchro_quorum, \
+ replication_synchro_timeout = orig_synchro_timeout, \
+}
+ | ---
+ | ...
diff --git a/test/replication/qsync_sync_mode.test.lua b/test/replication/qsync_sync_mode.test.lua
new file mode 100644
index 000000000..706261c10
--- /dev/null
+++ b/test/replication/qsync_sync_mode.test.lua
@@ -0,0 +1,90 @@
+env = require('test_run')
+test_run = env.new()
+engine = test_run:get_cfg('engine')
+fiber = require('fiber')
+math = require('math')
+math.randomseed(os.time())
+
+orig_synchro_quorum = box.cfg.replication_synchro_quorum
+orig_synchro_timeout = box.cfg.replication_synchro_timeout
+
+disable_sync_mode = function() \
+ local s = box.space._space:get(box.space.sync.id) \
+ local new_s = s:update({{'=', 6, {is_sync=false}}}) \
+ box.space._space:replace(new_s) \
+end
+
+enable_sync_mode = function() \
+ local s = box.space._space:get(box.space.sync.id) \
+ local new_s = s:update({{'=', 6, {is_sync=true}}}) \
+ box.space._space:replace(new_s) \
+end
+
+set_random_sync_mode = function() \
+ if (math.random(1, 10) > 5) then \
+ enable_sync_mode() \
+ else \
+ disable_sync_mode() \
+ end \
+end
+
+set_random_quorum = function(n) \
+ box.cfg{replication_synchro_quorum=math.random(1, n)} \
+end
+
+box.schema.user.grant('guest', 'replication')
+
+-- Setup an async cluster with two instances.
+test_run:cmd('create server replica with rpl_master=default,\
+ script="replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+-- Write data to a leader, enable and disable sync mode in background in a
+-- loop. Expected no data loss.
+-- Testcase setup.
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
+-- Testcase body.
+for i = 1,10 do \
+ set_random_sync_mode() \
+ if pcall(box.space.sync.insert, box.space.sync, {i}) then \
+ test_run:switch('replica') \
+ test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
+ end \
+ test_run:switch('default') \
+end
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- Write data to a leader, enable and disable sync mode and change quorum value
+-- in background in a loop.
+-- Testcase setup.
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
+-- Testcase body.
+for i = 1,10 do \
+ set_random_sync_mode() \
+ set_random_quorum(5) \
+ if pcall(box.space.sync.insert, box.space.sync, {i}) then \
+ test_run:switch('replica') \
+ test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
+ end \
+ test_run:switch('default') \
+end
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- Teardown.
+test_run:cmd('switch default')
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+test_run:cleanup_cluster()
+box.schema.user.revoke('guest', 'replication')
+box.cfg{ \
+ replication_synchro_quorum = orig_synchro_quorum, \
+ replication_synchro_timeout = orig_synchro_timeout, \
+}
--
2.26.2
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH v1] replication: change space sync mode in a loop
2020-08-26 15:10 [Tarantool-patches] [PATCH v1] replication: change space sync mode in a loop sergeyb
@ 2020-08-27 11:27 ` Sergey Petrenko
2020-08-31 11:06 ` Sergey Bronnikov
0 siblings, 1 reply; 6+ messages in thread
From: Sergey Petrenko @ 2020-08-27 11:27 UTC (permalink / raw)
To: sergeyb; +Cc: tarantool-patches, Vladislav Shpilevoy
Hi! Thanks for the patch!
Please see my comments below.
26.08.2020 18:10, sergeyb@tarantool.org пишет:
> From: Sergey Bronnikov <sergeyb@tarantool.org>
>
> New regression tests covers cases when one can change synchronous mode
> of space to asynchronous and vice versa.
>
> Closes #5055
> Part of #5144
> ---
>
> Branch: ligurio/gh-4842-qsync-change-mode
> CI: https://gitlab.com/tarantool/tarantool/-/pipelines/182271234
>
> test/replication/qsync_sync_mode.result | 164 ++++++++++++++++++++++
> test/replication/qsync_sync_mode.test.lua | 90 ++++++++++++
> 2 files changed, 254 insertions(+)
> create mode 100644 test/replication/qsync_sync_mode.result
> create mode 100644 test/replication/qsync_sync_mode.test.lua
>
> diff --git a/test/replication/qsync_sync_mode.result b/test/replication/qsync_sync_mode.result
> new file mode 100644
> index 000000000..f2f95ec0f
> --- /dev/null
> +++ b/test/replication/qsync_sync_mode.result
> @@ -0,0 +1,164 @@
> +-- test-run result file version 2
> +env = require('test_run')
> + | ---
> + | ...
> +test_run = env.new()
> + | ---
> + | ...
> +engine = test_run:get_cfg('engine')
> + | ---
> + | ...
> +fiber = require('fiber')
> + | ---
> + | ...
> +math = require('math')
> + | ---
> + | ...
> +math.randomseed(os.time())
> + | ---
> + | ...
> +
> +orig_synchro_quorum = box.cfg.replication_synchro_quorum
> + | ---
> + | ...
> +orig_synchro_timeout = box.cfg.replication_synchro_timeout
> + | ---
> + | ...
> +
> +disable_sync_mode = function() \
> + local s = box.space._space:get(box.space.sync.id) \
> + local new_s = s:update({{'=', 6, {is_sync=false}}}) \
> + box.space._space:replace(new_s) \
> +end
> + | ---
> + | ...
> +
> +enable_sync_mode = function() \
> + local s = box.space._space:get(box.space.sync.id) \
> + local new_s = s:update({{'=', 6, {is_sync=true}}}) \
> + box.space._space:replace(new_s) \
> +end
> + | ---
> + | ...
> +
> +set_random_sync_mode = function() \
> + if (math.random(1, 10) > 5) then \
> + enable_sync_mode() \
> + else \
> + disable_sync_mode() \
> + end \
> +end
> + | ---
> + | ...
> +
> +set_random_quorum = function(n) \
> + box.cfg{replication_synchro_quorum=math.random(1, n)} \
> +end
> + | ---
> + | ...
> +
> +box.schema.user.grant('guest', 'replication')
> + | ---
> + | ...
> +
> +-- Setup an async cluster with two instances.
> +test_run:cmd('create server replica with rpl_master=default,\
> + script="replication/replica.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica with wait=True, wait_load=True')
> + | ---
> + | - true
> + | ...
> +
> +-- Write data to a leader, enable and disable sync mode in background in a
> +-- loop. Expected no data loss.
> +-- Testcase setup.
> +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> + | ---
> + | ...
> +_ = box.space.sync:create_index('pk')
> + | ---
> + | ...
> +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
> + | ---
> + | ...
> +-- Testcase body.
> +for i = 1,10 do \
> + set_random_sync_mode() \
> + if pcall(box.space.sync.insert, box.space.sync, {i}) then \
> + test_run:switch('replica') \
> + test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
> + end \
> + test_run:switch('default') \
> +end
> + | ---
> + | ...
> +-- Testcase cleanup.
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.space.sync:drop()
> + | ---
> + | ...
> +
> +-- Write data to a leader, enable and disable sync mode and change quorum value
> +-- in background in a loop.
> +-- Testcase setup.
> +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> + | ---
> + | ...
> +_ = box.space.sync:create_index('pk')
> + | ---
> + | ...
> +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
> + | ---
> + | ...
> +-- Testcase body.
> +for i = 1,10 do \
> + set_random_sync_mode() \
> + set_random_quorum(5) \
> + if pcall(box.space.sync.insert, box.space.sync, {i}) then \
> + test_run:switch('replica') \
> + test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
> + end \
> + test_run:switch('default') \
> +end
> + | ---
> + | ...
> +-- Testcase cleanup.
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.space.sync:drop()
> + | ---
> + | ...
> +
> +-- Teardown.
> +test_run:cmd('switch default')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server replica')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server replica')
> + | ---
> + | - true
> + | ...
> +test_run:cleanup_cluster()
> + | ---
> + | ...
> +box.schema.user.revoke('guest', 'replication')
> + | ---
> + | ...
> +box.cfg{ \
> + replication_synchro_quorum = orig_synchro_quorum, \
> + replication_synchro_timeout = orig_synchro_timeout, \
> +}
> + | ---
> + | ...
> diff --git a/test/replication/qsync_sync_mode.test.lua b/test/replication/qsync_sync_mode.test.lua
> new file mode 100644
> index 000000000..706261c10
> --- /dev/null
> +++ b/test/replication/qsync_sync_mode.test.lua
> @@ -0,0 +1,90 @@
> +env = require('test_run')
> +test_run = env.new()
> +engine = test_run:get_cfg('engine')
> +fiber = require('fiber')
> +math = require('math')
> +math.randomseed(os.time())
> +
> +orig_synchro_quorum = box.cfg.replication_synchro_quorum
> +orig_synchro_timeout = box.cfg.replication_synchro_timeout
> +
> +disable_sync_mode = function() \
> + local s = box.space._space:get(box.space.sync.id) \
> + local new_s = s:update({{'=', 6, {is_sync=false}}}) \
> + box.space._space:replace(new_s) \
> +end
> +
> +enable_sync_mode = function() \
> + local s = box.space._space:get(box.space.sync.id) \
> + local new_s = s:update({{'=', 6, {is_sync=true}}}) \
> + box.space._space:replace(new_s) \
> +end
> +
Vlad has pushed a patch with `space:alter` a couple of days ago.
So now you may say `space:alter{is_sync=true}`, `space:alter{is_sync=false}`
It does the same work you do, but looks much simpler.
> +set_random_sync_mode = function() \
> + if (math.random(1, 10) > 5) then \
> + enable_sync_mode() \
> + else \
> + disable_sync_mode() \
> + end \
> +end
> +
> +set_random_quorum = function(n) \
> + box.cfg{replication_synchro_quorum=math.random(1, n)} \
> +end
> +
> +box.schema.user.grant('guest', 'replication')
> +
> +-- Setup an async cluster with two instances.
> +test_run:cmd('create server replica with rpl_master=default,\
> + script="replication/replica.lua"')
> +test_run:cmd('start server replica with wait=True, wait_load=True')
> +
> +-- Write data to a leader, enable and disable sync mode in background in a
> +-- loop. Expected no data loss.
But sync mode is not set in background, it's set in the same loop where
insertions happen.
> +-- Testcase setup.
> +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> +_ = box.space.sync:create_index('pk')
> +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
Is such a tiny timeout intended? I see you wrap the errors in a pcall,
but still,
it may happen that you get a timeout 10 out of 10 times and then you'll
test nothing. We usually use 30 second replication timeouts.
> +-- Testcase body.
> +for i = 1,10 do \
> + set_random_sync_mode() \
> + if pcall(box.space.sync.insert, box.space.sync, {i}) then \
> + test_run:switch('replica') \
> + test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
> + end \
> + test_run:switch('default') \
> +end
> +-- Testcase cleanup.
> +test_run:switch('default')
> +box.space.sync:drop()
> +
> +-- Write data to a leader, enable and disable sync mode and change quorum value
> +-- in background in a loop.
> +-- Testcase setup.
> +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> +_ = box.space.sync:create_index('pk')
> +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
Ok, I see why you need a small timeout in this testcase. Maybe set a
small timeout
only when the insertion is expected to fail(quorum = 3 or 4)?. And use a
30 sec timeout
otherwise.
> +-- Testcase body.
> +for i = 1,10 do \
> + set_random_sync_mode() \
> + set_random_quorum(5) \
Math.random() range is inclusive, so it'd be more fair, to choose quorum
from 1 to 4.
Then there'd be 2 cases of successful write, quorum 1 and 2, and 2 cases
of failed write,
quorum 3 and 4.
> + if pcall(box.space.sync.insert, box.space.sync, {i}) then \
> + test_run:switch('replica') \
> + test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
> + end \
> + test_run:switch('default') \
> +end
> +-- Testcase cleanup.
> +test_run:switch('default')
> +box.space.sync:drop()
> +
> +-- Teardown.
> +test_run:cmd('switch default')
> +test_run:cmd('stop server replica')
> +test_run:cmd('delete server replica')
> +test_run:cleanup_cluster()
> +box.schema.user.revoke('guest', 'replication')
> +box.cfg{ \
> + replication_synchro_quorum = orig_synchro_quorum, \
> + replication_synchro_timeout = orig_synchro_timeout, \
> +}
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH v1] replication: change space sync mode in a loop
2020-08-27 11:27 ` Sergey Petrenko
@ 2020-08-31 11:06 ` Sergey Bronnikov
2020-09-01 12:25 ` Serge Petrenko
0 siblings, 1 reply; 6+ messages in thread
From: Sergey Bronnikov @ 2020-08-31 11:06 UTC (permalink / raw)
To: Sergey Petrenko; +Cc: tarantool-patches, Vladislav Shpilevoy
Sergey, thanks for review!
See my comments inline. Test has been updated in a branch.
On 14:27 Thu 27 Aug , Sergey Petrenko wrote:
> Hi! Thanks for the patch!
>
> Please see my comments below.
>
> 26.08.2020 18:10, sergeyb@tarantool.org пишет:
> > From: Sergey Bronnikov <sergeyb@tarantool.org>
> >
> > New regression tests covers cases when one can change synchronous mode
> > of space to asynchronous and vice versa.
> >
> > Closes #5055
> > Part of #5144
> > ---
> >
> > Branch: ligurio/gh-4842-qsync-change-mode
> > CI: https://gitlab.com/tarantool/tarantool/-/pipelines/182271234
> >
> > test/replication/qsync_sync_mode.result | 164 ++++++++++++++++++++++
> > test/replication/qsync_sync_mode.test.lua | 90 ++++++++++++
> > 2 files changed, 254 insertions(+)
> > create mode 100644 test/replication/qsync_sync_mode.result
> > create mode 100644 test/replication/qsync_sync_mode.test.lua
> >
> > diff --git a/test/replication/qsync_sync_mode.result b/test/replication/qsync_sync_mode.result
> > new file mode 100644
> > index 000000000..f2f95ec0f
> > --- /dev/null
> > +++ b/test/replication/qsync_sync_mode.result
> > @@ -0,0 +1,164 @@
> > +-- test-run result file version 2
> > +env = require('test_run')
> > + | ---
> > + | ...
> > +test_run = env.new()
> > + | ---
> > + | ...
> > +engine = test_run:get_cfg('engine')
> > + | ---
> > + | ...
> > +fiber = require('fiber')
> > + | ---
> > + | ...
> > +math = require('math')
> > + | ---
> > + | ...
> > +math.randomseed(os.time())
> > + | ---
> > + | ...
> > +
> > +orig_synchro_quorum = box.cfg.replication_synchro_quorum
> > + | ---
> > + | ...
> > +orig_synchro_timeout = box.cfg.replication_synchro_timeout
> > + | ---
> > + | ...
> > +
> > +disable_sync_mode = function() \
> > + local s = box.space._space:get(box.space.sync.id) \
> > + local new_s = s:update({{'=', 6, {is_sync=false}}}) \
> > + box.space._space:replace(new_s) \
> > +end
> > + | ---
> > + | ...
> > +
> > +enable_sync_mode = function() \
> > + local s = box.space._space:get(box.space.sync.id) \
> > + local new_s = s:update({{'=', 6, {is_sync=true}}}) \
> > + box.space._space:replace(new_s) \
> > +end
> > + | ---
> > + | ...
> > +
> > +set_random_sync_mode = function() \
> > + if (math.random(1, 10) > 5) then \
> > + enable_sync_mode() \
> > + else \
> > + disable_sync_mode() \
> > + end \
> > +end
> > + | ---
> > + | ...
> > +
> > +set_random_quorum = function(n) \
> > + box.cfg{replication_synchro_quorum=math.random(1, n)} \
> > +end
> > + | ---
> > + | ...
> > +
> > +box.schema.user.grant('guest', 'replication')
> > + | ---
> > + | ...
> > +
> > +-- Setup an async cluster with two instances.
> > +test_run:cmd('create server replica with rpl_master=default,\
> > + script="replication/replica.lua"')
> > + | ---
> > + | - true
> > + | ...
> > +test_run:cmd('start server replica with wait=True, wait_load=True')
> > + | ---
> > + | - true
> > + | ...
> > +
> > +-- Write data to a leader, enable and disable sync mode in background in a
> > +-- loop. Expected no data loss.
> > +-- Testcase setup.
> > +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > + | ---
> > + | ...
> > +_ = box.space.sync:create_index('pk')
> > + | ---
> > + | ...
> > +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
> > + | ---
> > + | ...
> > +-- Testcase body.
> > +for i = 1,10 do \
> > + set_random_sync_mode() \
> > + if pcall(box.space.sync.insert, box.space.sync, {i}) then \
> > + test_run:switch('replica') \
> > + test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
> > + end \
> > + test_run:switch('default') \
> > +end
> > + | ---
> > + | ...
> > +-- Testcase cleanup.
> > +test_run:switch('default')
> > + | ---
> > + | - true
> > + | ...
> > +box.space.sync:drop()
> > + | ---
> > + | ...
> > +
> > +-- Write data to a leader, enable and disable sync mode and change quorum value
> > +-- in background in a loop.
> > +-- Testcase setup.
> > +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > + | ---
> > + | ...
> > +_ = box.space.sync:create_index('pk')
> > + | ---
> > + | ...
> > +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
> > + | ---
> > + | ...
> > +-- Testcase body.
> > +for i = 1,10 do \
> > + set_random_sync_mode() \
> > + set_random_quorum(5) \
> > + if pcall(box.space.sync.insert, box.space.sync, {i}) then \
> > + test_run:switch('replica') \
> > + test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
> > + end \
> > + test_run:switch('default') \
> > +end
> > + | ---
> > + | ...
> > +-- Testcase cleanup.
> > +test_run:switch('default')
> > + | ---
> > + | - true
> > + | ...
> > +box.space.sync:drop()
> > + | ---
> > + | ...
> > +
> > +-- Teardown.
> > +test_run:cmd('switch default')
> > + | ---
> > + | - true
> > + | ...
> > +test_run:cmd('stop server replica')
> > + | ---
> > + | - true
> > + | ...
> > +test_run:cmd('delete server replica')
> > + | ---
> > + | - true
> > + | ...
> > +test_run:cleanup_cluster()
> > + | ---
> > + | ...
> > +box.schema.user.revoke('guest', 'replication')
> > + | ---
> > + | ...
> > +box.cfg{ \
> > + replication_synchro_quorum = orig_synchro_quorum, \
> > + replication_synchro_timeout = orig_synchro_timeout, \
> > +}
> > + | ---
> > + | ...
> > diff --git a/test/replication/qsync_sync_mode.test.lua b/test/replication/qsync_sync_mode.test.lua
> > new file mode 100644
> > index 000000000..706261c10
> > --- /dev/null
> > +++ b/test/replication/qsync_sync_mode.test.lua
> > @@ -0,0 +1,90 @@
> > +env = require('test_run')
> > +test_run = env.new()
> > +engine = test_run:get_cfg('engine')
> > +fiber = require('fiber')
> > +math = require('math')
> > +math.randomseed(os.time())
> > +
> > +orig_synchro_quorum = box.cfg.replication_synchro_quorum
> > +orig_synchro_timeout = box.cfg.replication_synchro_timeout
> > +
> > +disable_sync_mode = function() \
> > + local s = box.space._space:get(box.space.sync.id) \
> > + local new_s = s:update({{'=', 6, {is_sync=false}}}) \
> > + box.space._space:replace(new_s) \
> > +end
> > +
> > +enable_sync_mode = function() \
> > + local s = box.space._space:get(box.space.sync.id) \
> > + local new_s = s:update({{'=', 6, {is_sync=true}}}) \
> > + box.space._space:replace(new_s) \
> > +end
> > +
>
> Vlad has pushed a patch with `space:alter` a couple of days ago.
> So now you may say `space:alter{is_sync=true}`, `space:alter{is_sync=false}`
> It does the same work you do, but looks much simpler.
used alter in updated patch:
-- Testcase body.
for i = 1,100 do \
box.space.sync:alter{is_sync=random_boolean()} \
box.space.sync:insert{i} \
test_run:switch('replica') \
test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
test_run:switch('default') \
end
> > +set_random_sync_mode = function() \
> > + if (math.random(1, 10) > 5) then \
> > + enable_sync_mode() \
> > + else \
> > + disable_sync_mode() \
> > + end \
> > +end
> > +
> > +set_random_quorum = function(n) \
> > + box.cfg{replication_synchro_quorum=math.random(1, n)} \
> > +end
> > +
> > +box.schema.user.grant('guest', 'replication')
> > +
> > +-- Setup an async cluster with two instances.
> > +test_run:cmd('create server replica with rpl_master=default,\
> > + script="replication/replica.lua"')
> > +test_run:cmd('start server replica with wait=True, wait_load=True')
> > +
> > +-- Write data to a leader, enable and disable sync mode in background in a
> > +-- loop. Expected no data loss.
>
> But sync mode is not set in background, it's set in the same loop where
> insertions happen.
in updated version I set sync mode just before an insert, description in comment
has been updated
> > +-- Testcase setup.
> > +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > +_ = box.space.sync:create_index('pk')
> > +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
>
> Is such a tiny timeout intended? I see you wrap the errors in a pcall, but
> still,
> it may happen that you get a timeout 10 out of 10 times and then you'll
> test nothing. We usually use 30 second replication timeouts.
Fixed by setting it to 1000.
> > +-- Testcase body.
> > +for i = 1,10 do \
> > + set_random_sync_mode() \
> > + if pcall(box.space.sync.insert, box.space.sync, {i}) then \
> > + test_run:switch('replica') \
> > + test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
> > + end \
> > + test_run:switch('default') \
> > +end
> > +-- Testcase cleanup.
> > +test_run:switch('default')
> > +box.space.sync:drop()
> > +
> > +-- Write data to a leader, enable and disable sync mode and change quorum value
> > +-- in background in a loop.
> > +-- Testcase setup.
> > +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> > +_ = box.space.sync:create_index('pk')
> > +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
>
> Ok, I see why you need a small timeout in this testcase. Maybe set a small
> timeout
> only when the insertion is expected to fail(quorum = 3 or 4)?. And use a 30
> sec timeout
> otherwise.
I have rewrote tests to consider to cases:
- positive, when we have achievable quorum, set mode to sync or async on a master
and then make an insert. Expected successful writes always.
- negative, when we change mode to sync or async with tx's in a limbo.
So comment above is not relevant now.
> > +-- Testcase body.
> > +for i = 1,10 do \
> > + set_random_sync_mode() \
> > + set_random_quorum(5) \
>
> Math.random() range is inclusive, so it'd be more fair, to choose quorum
> from 1 to 4.
> Then there'd be 2 cases of successful write, quorum 1 and 2, and 2 cases of
> failed write,
> quorum 3 and 4.
removed set_random_quorum() in updated tests
> > + if pcall(box.space.sync.insert, box.space.sync, {i}) then \
> > + test_run:switch('replica') \
> > + test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
> > + end \
> > + test_run:switch('default') \
> > +end
> > +-- Testcase cleanup.
> > +test_run:switch('default')
> > +box.space.sync:drop()
> > +
> > +-- Teardown.
> > +test_run:cmd('switch default')
> > +test_run:cmd('stop server replica')
> > +test_run:cmd('delete server replica')
> > +test_run:cleanup_cluster()
> > +box.schema.user.revoke('guest', 'replication')
> > +box.cfg{ \
> > + replication_synchro_quorum = orig_synchro_quorum, \
> > + replication_synchro_timeout = orig_synchro_timeout, \
> > +}
--
sergeyb@
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH v1] replication: change space sync mode in a loop
2020-08-31 11:06 ` Sergey Bronnikov
@ 2020-09-01 12:25 ` Serge Petrenko
2020-09-01 12:50 ` Sergey Bronnikov
0 siblings, 1 reply; 6+ messages in thread
From: Serge Petrenko @ 2020-09-01 12:25 UTC (permalink / raw)
To: Sergey Bronnikov; +Cc: tarantool-patches, Vladislav Shpilevoy
31.08.2020 14:06, Sergey Bronnikov пишет:
> Sergey, thanks for review!
> See my comments inline. Test has been updated in a branch.
>
> On 14:27 Thu 27 Aug , Sergey Petrenko wrote:
>> Hi! Thanks for the patch!
>>
>> Please see my comments below.
>>
>> 26.08.2020 18:10, sergeyb@tarantool.org пишет:
>>> From: Sergey Bronnikov <sergeyb@tarantool.org>
>>>
>>> New regression tests covers cases when one can change synchronous mode
>>> of space to asynchronous and vice versa.
>>>
>>> Closes #5055
>>> Part of #5144
>>> ---
>>>
>>> Branch: ligurio/gh-4842-qsync-change-mode
>>> CI: https://gitlab.com/tarantool/tarantool/-/pipelines/182271234
>>>
>>> test/replication/qsync_sync_mode.result | 164 ++++++++++++++++++++++
>>> test/replication/qsync_sync_mode.test.lua | 90 ++++++++++++
>>> 2 files changed, 254 insertions(+)
>>> create mode 100644 test/replication/qsync_sync_mode.result
>>> create mode 100644 test/replication/qsync_sync_mode.test.lua
>>>
>>> diff --git a/test/replication/qsync_sync_mode.result b/test/replication/qsync_sync_mode.result
>>> new file mode 100644
>>> index 000000000..f2f95ec0f
>>> --- /dev/null
>>> +++ b/test/replication/qsync_sync_mode.result
>>> @@ -0,0 +1,164 @@
>>> +-- test-run result file version 2
>>> +env = require('test_run')
>>> + | ---
>>> + | ...
>>> +test_run = env.new()
>>> + | ---
>>> + | ...
>>> +engine = test_run:get_cfg('engine')
>>> + | ---
>>> + | ...
>>> +fiber = require('fiber')
>>> + | ---
>>> + | ...
>>> +math = require('math')
>>> + | ---
>>> + | ...
>>> +math.randomseed(os.time())
>>> + | ---
>>> + | ...
>>> +
>>> +orig_synchro_quorum = box.cfg.replication_synchro_quorum
>>> + | ---
>>> + | ...
>>> +orig_synchro_timeout = box.cfg.replication_synchro_timeout
>>> + | ---
>>> + | ...
>>> +
>>> +disable_sync_mode = function() \
>>> + local s = box.space._space:get(box.space.sync.id) \
>>> + local new_s = s:update({{'=', 6, {is_sync=false}}}) \
>>> + box.space._space:replace(new_s) \
>>> +end
>>> + | ---
>>> + | ...
>>> +
>>> +enable_sync_mode = function() \
>>> + local s = box.space._space:get(box.space.sync.id) \
>>> + local new_s = s:update({{'=', 6, {is_sync=true}}}) \
>>> + box.space._space:replace(new_s) \
>>> +end
>>> + | ---
>>> + | ...
>>> +
>>> +set_random_sync_mode = function() \
>>> + if (math.random(1, 10) > 5) then \
>>> + enable_sync_mode() \
>>> + else \
>>> + disable_sync_mode() \
>>> + end \
>>> +end
>>> + | ---
>>> + | ...
>>> +
>>> +set_random_quorum = function(n) \
>>> + box.cfg{replication_synchro_quorum=math.random(1, n)} \
>>> +end
>>> + | ---
>>> + | ...
>>> +
>>> +box.schema.user.grant('guest', 'replication')
>>> + | ---
>>> + | ...
>>> +
>>> +-- Setup an async cluster with two instances.
>>> +test_run:cmd('create server replica with rpl_master=default,\
>>> + script="replication/replica.lua"')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:cmd('start server replica with wait=True, wait_load=True')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +
>>> +-- Write data to a leader, enable and disable sync mode in background in a
>>> +-- loop. Expected no data loss.
>>> +-- Testcase setup.
>>> +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
>>> + | ---
>>> + | ...
>>> +_ = box.space.sync:create_index('pk')
>>> + | ---
>>> + | ...
>>> +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
>>> + | ---
>>> + | ...
>>> +-- Testcase body.
>>> +for i = 1,10 do \
>>> + set_random_sync_mode() \
>>> + if pcall(box.space.sync.insert, box.space.sync, {i}) then \
>>> + test_run:switch('replica') \
>>> + test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
>>> + end \
>>> + test_run:switch('default') \
>>> +end
>>> + | ---
>>> + | ...
>>> +-- Testcase cleanup.
>>> +test_run:switch('default')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +box.space.sync:drop()
>>> + | ---
>>> + | ...
>>> +
>>> +-- Write data to a leader, enable and disable sync mode and change quorum value
>>> +-- in background in a loop.
>>> +-- Testcase setup.
>>> +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
>>> + | ---
>>> + | ...
>>> +_ = box.space.sync:create_index('pk')
>>> + | ---
>>> + | ...
>>> +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
>>> + | ---
>>> + | ...
>>> +-- Testcase body.
>>> +for i = 1,10 do \
>>> + set_random_sync_mode() \
>>> + set_random_quorum(5) \
>>> + if pcall(box.space.sync.insert, box.space.sync, {i}) then \
>>> + test_run:switch('replica') \
>>> + test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
>>> + end \
>>> + test_run:switch('default') \
>>> +end
>>> + | ---
>>> + | ...
>>> +-- Testcase cleanup.
>>> +test_run:switch('default')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +box.space.sync:drop()
>>> + | ---
>>> + | ...
>>> +
>>> +-- Teardown.
>>> +test_run:cmd('switch default')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:cmd('stop server replica')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:cmd('delete server replica')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:cleanup_cluster()
>>> + | ---
>>> + | ...
>>> +box.schema.user.revoke('guest', 'replication')
>>> + | ---
>>> + | ...
>>> +box.cfg{ \
>>> + replication_synchro_quorum = orig_synchro_quorum, \
>>> + replication_synchro_timeout = orig_synchro_timeout, \
>>> +}
>>> + | ---
>>> + | ...
>>> diff --git a/test/replication/qsync_sync_mode.test.lua b/test/replication/qsync_sync_mode.test.lua
>>> new file mode 100644
>>> index 000000000..706261c10
>>> --- /dev/null
>>> +++ b/test/replication/qsync_sync_mode.test.lua
>>> @@ -0,0 +1,90 @@
>>> +env = require('test_run')
>>> +test_run = env.new()
>>> +engine = test_run:get_cfg('engine')
>>> +fiber = require('fiber')
>>> +math = require('math')
>>> +math.randomseed(os.time())
>>> +
>>> +orig_synchro_quorum = box.cfg.replication_synchro_quorum
>>> +orig_synchro_timeout = box.cfg.replication_synchro_timeout
>>> +
>>> +disable_sync_mode = function() \
>>> + local s = box.space._space:get(box.space.sync.id) \
>>> + local new_s = s:update({{'=', 6, {is_sync=false}}}) \
>>> + box.space._space:replace(new_s) \
>>> +end
>>> +
>>> +enable_sync_mode = function() \
>>> + local s = box.space._space:get(box.space.sync.id) \
>>> + local new_s = s:update({{'=', 6, {is_sync=true}}}) \
>>> + box.space._space:replace(new_s) \
>>> +end
>>> +
>> Vlad has pushed a patch with `space:alter` a couple of days ago.
>> So now you may say `space:alter{is_sync=true}`, `space:alter{is_sync=false}`
>> It does the same work you do, but looks much simpler.
> used alter in updated patch:
>
> -- Testcase body.
> for i = 1,100 do \
> box.space.sync:alter{is_sync=random_boolean()} \
> box.space.sync:insert{i} \
> test_run:switch('replica') \
> test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
> test_run:switch('default') \
> end
>
>>> +set_random_sync_mode = function() \
>>> + if (math.random(1, 10) > 5) then \
>>> + enable_sync_mode() \
>>> + else \
>>> + disable_sync_mode() \
>>> + end \
>>> +end
>>> +
>>> +set_random_quorum = function(n) \
>>> + box.cfg{replication_synchro_quorum=math.random(1, n)} \
>>> +end
>>> +
>>> +box.schema.user.grant('guest', 'replication')
>>> +
>>> +-- Setup an async cluster with two instances.
>>> +test_run:cmd('create server replica with rpl_master=default,\
>>> + script="replication/replica.lua"')
>>> +test_run:cmd('start server replica with wait=True, wait_load=True')
>>> +
>>> +-- Write data to a leader, enable and disable sync mode in background in a
>>> +-- loop. Expected no data loss.
>> But sync mode is not set in background, it's set in the same loop where
>> insertions happen.
> in updated version I set sync mode just before an insert, description in comment
> has been updated
>
>>> +-- Testcase setup.
>>> +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
>>> +_ = box.space.sync:create_index('pk')
>>> +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
>> Is such a tiny timeout intended? I see you wrap the errors in a pcall, but
>> still,
>> it may happen that you get a timeout 10 out of 10 times and then you'll
>> test nothing. We usually use 30 second replication timeouts.
> Fixed by setting it to 1000.
>
>>> +-- Testcase body.
>>> +for i = 1,10 do \
>>> + set_random_sync_mode() \
>>> + if pcall(box.space.sync.insert, box.space.sync, {i}) then \
>>> + test_run:switch('replica') \
>>> + test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
>>> + end \
>>> + test_run:switch('default') \
>>> +end
>>> +-- Testcase cleanup.
>>> +test_run:switch('default')
>>> +box.space.sync:drop()
>>> +
>>> +-- Write data to a leader, enable and disable sync mode and change quorum value
>>> +-- in background in a loop.
>>> +-- Testcase setup.
>>> +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
>>> +_ = box.space.sync:create_index('pk')
>>> +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=0.001}
>> Ok, I see why you need a small timeout in this testcase. Maybe set a small
>> timeout
>> only when the insertion is expected to fail(quorum = 3 or 4)?. And use a 30
>> sec timeout
>> otherwise.
> I have rewrote tests to consider to cases:
> - positive, when we have achievable quorum, set mode to sync or async on a master
> and then make an insert. Expected successful writes always.
> - negative, when we change mode to sync or async with tx's in a limbo.
>
> So comment above is not relevant now.
Hi! Thanks for the fixes.
The negative case seems pointless, since the change won't happen until
the limbo
gets emptied out. This is because alter{is_sync=...} is a DML operation
which
replaces an entry in _space system space, so it gets stuck on a
non-empty limbo,
just like any DML op.
I suggest you remove it altogether.
Other than that LGTM.
>
>>> +-- Testcase body.
>>> +for i = 1,10 do \
>>> + set_random_sync_mode() \
>>> + set_random_quorum(5) \
>> Math.random() range is inclusive, so it'd be more fair, to choose quorum
>> from 1 to 4.
>> Then there'd be 2 cases of successful write, quorum 1 and 2, and 2 cases of
>> failed write,
>> quorum 3 and 4.
> removed set_random_quorum() in updated tests
>
>>> + if pcall(box.space.sync.insert, box.space.sync, {i}) then \
>>> + test_run:switch('replica') \
>>> + test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end) \
>>> + end \
>>> + test_run:switch('default') \
>>> +end
>>> +-- Testcase cleanup.
>>> +test_run:switch('default')
>>> +box.space.sync:drop()
>>> +
>>> +-- Teardown.
>>> +test_run:cmd('switch default')
>>> +test_run:cmd('stop server replica')
>>> +test_run:cmd('delete server replica')
>>> +test_run:cleanup_cluster()
>>> +box.schema.user.revoke('guest', 'replication')
>>> +box.cfg{ \
>>> + replication_synchro_quorum = orig_synchro_quorum, \
>>> + replication_synchro_timeout = orig_synchro_timeout, \
>>> +}
--
Serge Petrenko
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH v1] replication: change space sync mode in a loop
2020-09-01 12:25 ` Serge Petrenko
@ 2020-09-01 12:50 ` Sergey Bronnikov
2020-11-12 9:27 ` Sergey Bronnikov
0 siblings, 1 reply; 6+ messages in thread
From: Sergey Bronnikov @ 2020-09-01 12:50 UTC (permalink / raw)
To: Serge Petrenko; +Cc: tarantool-patches, Vladislav Shpilevoy
Hi, thanks for review!
On 01.09.2020 15:25, Serge Petrenko wrote:
>> I have rewrote tests to consider to cases:
>> - positive, when we have achievable quorum, set mode to sync or async
>> on a master
>> and then make an insert. Expected successful writes always.
>> - negative, when we change mode to sync or async with tx's in a limbo.
>>
>> So comment above is not relevant now.
>
> Hi! Thanks for the fixes.
>
> The negative case seems pointless, since the change won't happen until
> the limbo
> gets emptied out. This is because alter{is_sync=...} is a DML
> operation which
> replaces an entry in _space system space, so it gets stuck on a
> non-empty limbo,
> just like any DML op.
Yes, I had doubts regarding this testcase despite I know about such
behavior.
> I suggest you remove it altogether.
Well, negative testcase removed.
>
> Other than that LGTM.
>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Tarantool-patches] [PATCH v1] replication: change space sync mode in a loop
2020-09-01 12:50 ` Sergey Bronnikov
@ 2020-11-12 9:27 ` Sergey Bronnikov
0 siblings, 0 replies; 6+ messages in thread
From: Sergey Bronnikov @ 2020-11-12 9:27 UTC (permalink / raw)
To: Serge Petrenko; +Cc: tarantool-patches, Vladislav Shpilevoy
Hi,
On 15:50 Tue 01 Sep , Sergey Bronnikov wrote:
> Hi, thanks for review!
>
> On 01.09.2020 15:25, Serge Petrenko wrote:
> > > I have rewrote tests to consider to cases:
> > > - positive, when we have achievable quorum, set mode to sync or
> > > async on a master
> > > and then make an insert. Expected successful writes always.
> > > - negative, when we change mode to sync or async with tx's in a limbo.
updated patch in series v2 [1]
1. https://lists.tarantool.org/pipermail/tarantool-patches/2020-November/020636.html
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-11-12 9:27 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-26 15:10 [Tarantool-patches] [PATCH v1] replication: change space sync mode in a loop sergeyb
2020-08-27 11:27 ` Sergey Petrenko
2020-08-31 11:06 ` Sergey Bronnikov
2020-09-01 12:25 ` Serge Petrenko
2020-09-01 12:50 ` Sergey Bronnikov
2020-11-12 9:27 ` Sergey Bronnikov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox