Tarantool development patches archive
From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Cyrill Gorcunov <gorcunov@gmail.com>,
	tml <tarantool-patches@dev.tarantool.org>
Cc: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH v30 3/3] test: add gh-6036-qsync-order test
Date: Mon, 28 Feb 2022 11:13:06 +0300	[thread overview]
Message-ID: <8c04975a-ec56-e1bd-e4d4-3066e6e00439@tarantool.org> (raw)
In-Reply-To: <20220224201841.412565-4-gorcunov@gmail.com>



24.02.2022 23:18, Cyrill Gorcunov wrote:
> Test that promotion requests are handled only after the corresponding
> write to WAL completes, because we update the in-memory data before
> the write finishes.
>
> Part-of #6036
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

Thanks for the patch and the fixes overall!

The test finally works fine on my machine.
I've experienced some flakiness, but I was able to fix
it with the following diff. Please consider:

======================

diff --git a/test/replication-luatest/gh_6036_qsync_order_test.lua b/test/replication-luatest/gh_6036_qsync_order_test.lua
index 95ed3a517..d71739dcc 100644
--- a/test/replication-luatest/gh_6036_qsync_order_test.lua
+++ b/test/replication-luatest/gh_6036_qsync_order_test.lua
@@ -142,10 +142,19 @@ g.test_qsync_order = function(cg)
      cg.r2:wait_vclock(vclock)
      cg.r3:wait_vclock(vclock)

+    -- Drop connection between r1 and the rest of the cluster.
+    -- Otherwise r1 might become a Raft follower before attempting insert{4}.
+    cg.r1:exec(function() box.cfg{replication=""} end)
      cg.r3:exec(function()
          box.error.injection.set('ERRINJ_WAL_DELAY_COUNTDOWN', 2)
          require('fiber').create(function() box.ctl.promote() end)
      end)
+    t.helpers.retrying({}, function()
+        t.assert(cg.r3:exec(function()
+            return box.info.synchro.queue.latched
+        end))
+    end)
+    t.assert(cg.r1:exec(function() return box.info.ro == false end))
      cg.r1:eval("box.space.test:insert{4}")
      cg.r3:exec(function()
          assert(box.info.synchro.queue.latched == true)

=======================
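For context, the diff above relies on the retrying pattern to wait until the
queue becomes latched. A minimal plain-Lua sketch of what `t.helpers.retrying`
does (an illustration only, not luatest's actual implementation; the `timeout`
and `delay` defaults are assumptions):

```lua
-- Sketch of the retrying pattern: re-run a check until it passes or a
-- deadline expires. Real luatest code would use fiber.sleep() instead
-- of the busy wait below; this version stays stdlib-only.
local function retrying(opts, fn)
    local timeout = opts.timeout or 5
    local delay = opts.delay or 0.1
    local deadline = os.clock() + timeout
    while true do
        local ok, err = pcall(fn)
        if ok then
            return
        end
        if os.clock() >= deadline then
            error(err, 0)  -- give up: rethrow the last failure
        end
        -- Busy wait until the next attempt is due.
        local next_try = os.clock() + delay
        while os.clock() < next_try do end
    end
end

-- Usage: a check that only passes on the third attempt.
local attempts = 0
retrying({timeout = 5}, function()
    attempts = attempts + 1
    assert(attempts >= 3, "not ready yet")
end)
```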

Also please address a couple of style-related comments below:


> ---
>   .../gh_6036_qsync_order_test.lua              | 157 ++++++++++++++++++
>   test/replication-luatest/suite.ini            |   1 +
>   2 files changed, 158 insertions(+)
>   create mode 100644 test/replication-luatest/gh_6036_qsync_order_test.lua
>
> diff --git a/test/replication-luatest/gh_6036_qsync_order_test.lua b/test/replication-luatest/gh_6036_qsync_order_test.lua
> new file mode 100644
> index 000000000..95ed3a517
> --- /dev/null
> +++ b/test/replication-luatest/gh_6036_qsync_order_test.lua
> @@ -0,0 +1,157 @@
> +local t = require('luatest')
> +local cluster = require('test.luatest_helpers.cluster')
> +local server = require('test.luatest_helpers.server')
> +local fiber = require('fiber')
> +
> +local g = t.group('gh-6036')
> +
> +g.before_each(function(cg)
> +    cg.cluster = cluster:new({})
> +
> +    local box_cfg = {
> +        replication = {
> +            server.build_instance_uri('r1'),
> +            server.build_instance_uri('r2'),
> +            server.build_instance_uri('r3'),
> +        },
> +        replication_timeout         = 0.1,
> +        replication_connect_quorum  = 1,
> +        election_mode               = 'manual',
> +        election_timeout            = 0.1,
> +        replication_synchro_quorum  = 1,
> +        replication_synchro_timeout = 0.1,
> +        log_level                   = 6,
> +    }
> +
> +    cg.r1 = cg.cluster:build_server({ alias = 'r1', box_cfg = box_cfg })
> +    cg.r2 = cg.cluster:build_server({ alias = 'r2', box_cfg = box_cfg })
> +    cg.r3 = cg.cluster:build_server({ alias = 'r3', box_cfg = box_cfg })
> +
> +    cg.cluster:add_server(cg.r1)
> +    cg.cluster:add_server(cg.r2)
> +    cg.cluster:add_server(cg.r3)
> +    cg.cluster:start()
> +end)
> +
> +g.after_each(function(cg)
> +    cg.cluster:drop()
> +    cg.cluster.servers = nil
> +end)
> +
> +g.test_qsync_order = function(cg)
> +    cg.cluster:wait_fullmesh()
> +
> +    --
> +    -- Create a synchro space on the r1 node and make
> +    -- sure the write is processed just fine.
> +    cg.r1:exec(function()
> +        box.ctl.promote()
> +        box.ctl.wait_rw()
> +        local s = box.schema.create_space('test', {is_sync = true})
> +        s:create_index('pk')
> +        s:insert{1}
> +    end)
> +
> +    local vclock = cg.r1:get_vclock()
> +    vclock[0] = nil
> +    cg.r2:wait_vclock(vclock)
> +    cg.r3:wait_vclock(vclock)
> +
> +    t.assert_equals(cg.r1:eval("return box.space.test:select()"), {{1}})
> +    t.assert_equals(cg.r2:eval("return box.space.test:select()"), {{1}})
> +    t.assert_equals(cg.r3:eval("return box.space.test:select()"), {{1}})
> +
> +    local function update_replication(...)
> +        return (box.cfg{ replication = { ... } })
> +    end
> +
> +    --
> +    -- Drop connection between r1 and r2.
> +    cg.r1:exec(update_replication, {
> +            server.build_instance_uri("r1"),
> +            server.build_instance_uri("r3"),
> +        })
> +
> +    --
> +    -- Drop connection between r2 and r1.
> +    cg.r2:exec(update_replication, {
> +        server.build_instance_uri("r2"),
> +        server.build_instance_uri("r3"),
> +    })
> +
> +    --
> +    -- Here we have the following scheme
> +    --
> +    --      r3 (WAL delay)
> +    --      /            \
> +    --    r1              r2
> +    --
> +
> +    --
> +    -- Initiate a disk delay in a slightly tricky way: the next
> +    -- write will sleep forever.
> +    cg.r3:eval("box.error.injection.set('ERRINJ_WAL_DELAY', true)")

1. Sometimes you use 'eval' and sometimes you use 'exec', and I don't see
    a pattern behind that. Please check every case with 'eval' and replace
    it with 'exec' when possible.
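The practical difference can be sketched in plain Lua. The point is that
'eval' ships a source string which is only compiled remotely, while 'exec'
ships a real function that is compiled (and lintable) on the test side.
`remote_eval` and `remote_exec` below are local stand-ins, not the luatest
API:

```lua
-- Local stand-ins illustrating eval-vs-exec. A string body is opaque
-- until it is loaded; a function body is already compiled Lua code.
local unpack = table.unpack or unpack        -- Lua 5.2+/5.1 compatibility
local load_string = loadstring or load       -- Lua 5.1/5.2+ compatibility

local function remote_eval(expr)
    -- The string is compiled only here, at "the server".
    return assert(load_string(expr))()
end

local function remote_exec(fn, args)
    -- The function was compiled when the test file was loaded;
    -- we just call it with the given arguments.
    return fn(unpack(args or {}))
end
```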

> +
> +    --
> +    -- Make r2 the leader and start writing data. The PROMOTE
> +    -- request gets queued on r3 and is not yet processed; at the
> +    -- same time the INSERT won't complete either, since it waits
> +    -- for the PROMOTE to finish first. Note that we also check r3's
> +    -- queue state to be sure the PROMOTE has reached it.
> +    cg.r2:exec(function()
> +        box.ctl.promote()
> +        box.ctl.wait_rw()
> +    end)
> +    t.helpers.retrying({}, function()
> +        assert(cg.r3:exec(function()
> +            return box.info.synchro.queue.latched == true
> +        end))

2. Here you use a plain 'assert' instead of 't.assert'. Please avoid
    plain assertions in luatest tests.
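To illustrate why this matters: a bare assert(nil) fails with nothing but
"assertion failed!", while a luatest-style assertion can carry a readable
message and proper reporting. `t_assert` below is a hypothetical stand-in,
not luatest's actual `t.assert`:

```lua
-- Hypothetical stand-in for a luatest-style assertion: on failure it
-- raises an error with a descriptive message, blamed on the caller.
local function t_assert(value, message)
    if not value then
        error(message or ("expected a truthy value, got " .. tostring(value)), 2)
    end
    return value
end

-- Compare the two failure messages.
local _, plain_err = pcall(function() assert(nil) end)
local _, rich_err = pcall(function()
    t_assert(false, "synchro queue is not latched on r3")
end)
-- plain_err says only "assertion failed!"; rich_err names the condition.
```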

> +    end)
> +    cg.r2:eval("box.space.test:insert{2}")

3. As I already mentioned above, could you wrap that into an 'exec'
    instead?

> +
> +    --
> +    -- The r1 node has no clue that there is a new leader and continues
> +    -- writing data with an obsolete term. Since r3 is delayed now,
> +    -- the INSERT won't proceed yet but gets queued.
> +    cg.r1:eval("box.space.test:insert{3}")
> +
> +    --
> +    -- Finally re-enable r3. Make sure the data from the new leader r2
> +    -- gets written while the old leader's data is ignored.
> +    cg.r3:eval("box.error.injection.set('ERRINJ_WAL_DELAY', false)")
> +    t.helpers.retrying({}, function()
> +        assert(cg.r3:exec(function()
> +            return box.space.test:get{2} ~= nil
> +        end))
> +    end)
> +
> +    t.assert_equals(cg.r3:eval("return box.space.test:select()"), {{1},{2}})
> +

4. You group two tests in one function. Let's extract the test below into
    a separate function, for example, g.test_promote_order.

    First of all, you may get rid of the third instance in this test
    (you only need two of them); secondly, you now enter the test with
    a dirty config left over from the previous test:
    r1 <-> r2 <-> r3 (no connection between r1 and r3).

> +    --
> +    -- Make sure that while we're processing a PROMOTE, no other
> +    -- records sneak in via the applier code from other replicas. To
> +    -- that end, initiate voting and stop inside the WAL thread just
> +    -- before the PROMOTE gets written. Another replica then sends us
> +    -- a new record, and it should be dropped.
> +    cg.r1:exec(function()
> +        box.ctl.promote()
> +        box.ctl.wait_rw()
> +    end)
> +    vclock = cg.r1:get_vclock()
> +    vclock[0] = nil
> +    cg.r2:wait_vclock(vclock)
> +    cg.r3:wait_vclock(vclock)
> +
> +    cg.r3:exec(function()
> +        box.error.injection.set('ERRINJ_WAL_DELAY_COUNTDOWN', 2)
> +        require('fiber').create(function() box.ctl.promote() end)
> +    end)
> +    cg.r1:eval("box.space.test:insert{4}")
> +    cg.r3:exec(function()
> +        assert(box.info.synchro.queue.latched == true)
> +        box.error.injection.set('ERRINJ_WAL_DELAY', false)
> +        box.ctl.wait_rw()
> +    end)
> +
> +    t.assert_equals(cg.r3:eval("return box.space.test:select()"), {{1},{2}})
> +end
> diff --git a/test/replication-luatest/suite.ini b/test/replication-luatest/suite.ini
> index 374f1b87a..07ec93a52 100644
> --- a/test/replication-luatest/suite.ini
> +++ b/test/replication-luatest/suite.ini
> @@ -2,3 +2,4 @@
>   core = luatest
>   description = replication luatests
>   is_parallel = True
> +release_disabled = gh_6036_qsync_order_test.lua

-- 
Serge Petrenko

