[Tarantool-patches] [PATCH v30 3/3] test: add gh-6036-qsync-order test
Serge Petrenko
sergepetrenko at tarantool.org
Mon Feb 28 11:13:06 MSK 2022
On 24.02.2022 23:18, Cyrill Gorcunov wrote:
> To test that promotion requests are handled only when appropriate
> write to WAL completes, because we update memory data before the
> write finishes.
>
> Part-of #6036
>
> Signed-off-by: Cyrill Gorcunov <gorcunov at gmail.com>
Thanks for the patch and the fixes overall!
The test finally works fine on my machine.
I've experienced some flakiness, but I was able to fix
it with the following diff. Please consider:
======================
diff --git a/test/replication-luatest/gh_6036_qsync_order_test.lua b/test/replication-luatest/gh_6036_qsync_order_test.lua
index 95ed3a517..d71739dcc 100644
--- a/test/replication-luatest/gh_6036_qsync_order_test.lua
+++ b/test/replication-luatest/gh_6036_qsync_order_test.lua
@@ -142,10 +142,19 @@ g.test_qsync_order = function(cg)
     cg.r2:wait_vclock(vclock)
     cg.r3:wait_vclock(vclock)
+    -- Drop connection between r1 and the rest of the cluster.
+    -- Otherwise r1 might become Raft follower before attempting insert{4}.
+    cg.r1:exec(function() box.cfg{replication=""} end)
     cg.r3:exec(function()
         box.error.injection.set('ERRINJ_WAL_DELAY_COUNTDOWN', 2)
         require('fiber').create(function() box.ctl.promote() end)
     end)
+    t.helpers.retrying({}, function()
+        t.assert(cg.r3:exec(function()
+            return box.info.synchro.queue.latched
+        end))
+    end)
+    t.assert(cg.r1:exec(function() return box.info.ro == false end))
     cg.r1:eval("box.space.test:insert{4}")
     cg.r3:exec(function()
         assert(box.info.synchro.queue.latched == true)
=======================
Also please address a couple of style-related comments below:
> ---
> .../gh_6036_qsync_order_test.lua | 157 ++++++++++++++++++
> test/replication-luatest/suite.ini | 1 +
> 2 files changed, 158 insertions(+)
> create mode 100644 test/replication-luatest/gh_6036_qsync_order_test.lua
>
> diff --git a/test/replication-luatest/gh_6036_qsync_order_test.lua b/test/replication-luatest/gh_6036_qsync_order_test.lua
> new file mode 100644
> index 000000000..95ed3a517
> --- /dev/null
> +++ b/test/replication-luatest/gh_6036_qsync_order_test.lua
> @@ -0,0 +1,157 @@
> +local t = require('luatest')
> +local cluster = require('test.luatest_helpers.cluster')
> +local server = require('test.luatest_helpers.server')
> +local fiber = require('fiber')
> +
> +local g = t.group('gh-6036')
> +
> +g.before_each(function(cg)
> + cg.cluster = cluster:new({})
> +
> + local box_cfg = {
> + replication = {
> + server.build_instance_uri('r1'),
> + server.build_instance_uri('r2'),
> + server.build_instance_uri('r3'),
> + },
> + replication_timeout = 0.1,
> + replication_connect_quorum = 1,
> + election_mode = 'manual',
> + election_timeout = 0.1,
> + replication_synchro_quorum = 1,
> + replication_synchro_timeout = 0.1,
> + log_level = 6,
> + }
> +
> + cg.r1 = cg.cluster:build_server({ alias = 'r1', box_cfg = box_cfg })
> + cg.r2 = cg.cluster:build_server({ alias = 'r2', box_cfg = box_cfg })
> + cg.r3 = cg.cluster:build_server({ alias = 'r3', box_cfg = box_cfg })
> +
> + cg.cluster:add_server(cg.r1)
> + cg.cluster:add_server(cg.r2)
> + cg.cluster:add_server(cg.r3)
> + cg.cluster:start()
> +end)
> +
> +g.after_each(function(cg)
> + cg.cluster:drop()
> + cg.cluster.servers = nil
> +end)
> +
> +g.test_qsync_order = function(cg)
> + cg.cluster:wait_fullmesh()
> +
> + --
> + -- Create a synchro space on the r1 node and make
> + -- sure the write processed just fine.
> + cg.r1:exec(function()
> + box.ctl.promote()
> + box.ctl.wait_rw()
> + local s = box.schema.create_space('test', {is_sync = true})
> + s:create_index('pk')
> + s:insert{1}
> + end)
> +
> + local vclock = cg.r1:get_vclock()
> + vclock[0] = nil
> + cg.r2:wait_vclock(vclock)
> + cg.r3:wait_vclock(vclock)
> +
> + t.assert_equals(cg.r1:eval("return box.space.test:select()"), {{1}})
> + t.assert_equals(cg.r2:eval("return box.space.test:select()"), {{1}})
> + t.assert_equals(cg.r3:eval("return box.space.test:select()"), {{1}})
> +
> + local function update_replication(...)
> + return (box.cfg{ replication = { ... } })
> + end
> +
> + --
> + -- Drop connection between r1 and r2.
> + cg.r1:exec(update_replication, {
> + server.build_instance_uri("r1"),
> + server.build_instance_uri("r3"),
> + })
> +
> + --
> + -- Drop connection between r2 and r1.
> + cg.r2:exec(update_replication, {
> + server.build_instance_uri("r2"),
> + server.build_instance_uri("r3"),
> + })
> +
> + --
> + -- Here we have the following scheme
> + --
> + -- r3 (WAL delay)
> + -- / \
> + -- r1 r2
> + --
> +
> + --
> + -- Initiate disk delay in a bit tricky way: the next write will
> + -- fall into forever sleep.
> + cg.r3:eval("box.error.injection.set('ERRINJ_WAL_DELAY', true)")
1. Sometimes you use 'eval' and sometimes you use 'exec', and I don't see
a pattern behind that. Please check every case with 'eval' and replace it
with 'exec' when possible.
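For instance, the injection above could be set roughly like this. It is
only a sketch: it assumes 'exec' runs the passed function on the instance,
the same way the later cg.r3:exec() calls in this test do, and it only
makes sense inside the test function where 'cg' is available.

```lua
-- Same effect as the eval() string above, but the body is real Lua code,
-- so it gets syntax-checked together with the rest of the test file.
cg.r3:exec(function()
    box.error.injection.set('ERRINJ_WAL_DELAY', true)
end)
```

The call that sets the injection back to false later in the test could
take the same shape.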
> +
> + --
> + -- Make r2 the leader and start writing data. The PROMOTE
> + -- request gets queued on r3 and is not yet processed; at the same
> + -- time the INSERT won't complete either, since it waits for the
> + -- PROMOTE to complete first. Note that we enter r3 as well just to
> + -- be sure the PROMOTE has reached it, via the queue state test.
> + cg.r2:exec(function()
> + box.ctl.promote()
> + box.ctl.wait_rw()
> + end)
> + t.helpers.retrying({}, function()
> + assert(cg.r3:exec(function()
> + return box.info.synchro.queue.latched == true
> + end))
2. Here you use a plain 'assert' instead of 't.assert'. Please avoid
plain assertions in luatest tests.
> + end)
> + cg.r2:eval("box.space.test:insert{2}")
3. Like I already mentioned above, could you wrap that into an 'exec'
instead?
> +
> + --
> + -- The r1 node has no clue that there is a new leader and continue
> + -- writing data with obsolete term. Since r3 is delayed now
> + -- the INSERT won't proceed yet but get queued.
> + cg.r1:eval("box.space.test:insert{3}")
> +
> + --
> + -- Finally enable r3 back. Make sure the data from new r2 leader get
> + -- writing while old leader's data ignored.
> + cg.r3:eval("box.error.injection.set('ERRINJ_WAL_DELAY', false)")
> + t.helpers.retrying({}, function()
> + assert(cg.r3:exec(function()
> + return box.space.test:get{2} ~= nil
> + end))
> + end)
> +
> + t.assert_equals(cg.r3:eval("return box.space.test:select()"), {{1},{2}})
> +
4. You group two tests in one function. Let's better extract the test below
into a separate function. For example, g.test_promote_order, or something.
First of all, you may get rid of the 3rd instance in this test (you only
need 2 of them); secondly, right now you enter the test with a dirty config
from the previous test:
r1 <-> r3 <-> r2 (no connection between r1 and r2).
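Something along these lines might do as a starting point. It is a rough,
untested skeleton: the name g.test_promote_order is only a suggestion, the
setup is copied from the beginning of test_qsync_order, and it still uses
all three instances that before_each builds. It relies on before_each and
after_each giving every test function a fresh cluster.

```lua
g.test_promote_order = function(cg)
    -- A fresh cluster is started by before_each, so there is no leftover
    -- replication config from test_qsync_order here.
    cg.cluster:wait_fullmesh()

    -- Make r1 the leader and create the synchronous space, as the first
    -- test does.
    cg.r1:exec(function()
        box.ctl.promote()
        box.ctl.wait_rw()
        local s = box.schema.create_space('test', {is_sync = true})
        s:create_index('pk')
        s:insert{1}
    end)

    local vclock = cg.r1:get_vclock()
    vclock[0] = nil
    cg.r2:wait_vclock(vclock)
    cg.r3:wait_vclock(vclock)

    -- The PROMOTE vs. INSERT ordering steps from the block below would
    -- move here, with the expected select() result adjusted to the data
    -- this test actually inserts.
end
```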
> + --
> + -- Make sure that while we're processing PROMOTE no other records
> + -- get sneaked in via applier code from other replicas. For this
> + -- sake initiate voting and stop inside wal thread just before
> + -- PROMOTE get written. Another replica sends us new record and
> + -- it should be dropped.
> + cg.r1:exec(function()
> + box.ctl.promote()
> + box.ctl.wait_rw()
> + end)
> + vclock = cg.r1:get_vclock()
> + vclock[0] = nil
> + cg.r2:wait_vclock(vclock)
> + cg.r3:wait_vclock(vclock)
> +
> + cg.r3:exec(function()
> + box.error.injection.set('ERRINJ_WAL_DELAY_COUNTDOWN', 2)
> + require('fiber').create(function() box.ctl.promote() end)
> + end)
> + cg.r1:eval("box.space.test:insert{4}")
> + cg.r3:exec(function()
> + assert(box.info.synchro.queue.latched == true)
> + box.error.injection.set('ERRINJ_WAL_DELAY', false)
> + box.ctl.wait_rw()
> + end)
> +
> + t.assert_equals(cg.r3:eval("return box.space.test:select()"), {{1},{2}})
> +end
> diff --git a/test/replication-luatest/suite.ini b/test/replication-luatest/suite.ini
> index 374f1b87a..07ec93a52 100644
> --- a/test/replication-luatest/suite.ini
> +++ b/test/replication-luatest/suite.ini
> @@ -2,3 +2,4 @@
> core = luatest
> description = replication luatests
> is_parallel = True
> +release_disabled = gh_6036_qsync_order_test.lua
--
Serge Petrenko