Tarantool development patches archive
 help / color / mirror / Atom feed
From: Cyrill Gorcunov via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: tml <tarantool-patches@dev.tarantool.org>
Cc: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Subject: [Tarantool-patches] [PATCH v19 3/3] test: add gh-6036-qsync-order test
Date: Thu, 30 Sep 2021 12:44:45 +0300
Message-ID: <20210930094445.316694-4-gorcunov@gmail.com> (raw)
In-Reply-To: <20210930094445.316694-1-gorcunov@gmail.com>

To test that promotion requests are handled only when appropriate
write to WAL completes, because we update memory data before the
write finishes.

Note that without the patch this test fires assertion

> tarantool: src/box/txn_limbo.c:481: txn_limbo_read_rollback: Assertion `e->txn->signature >= 0' failed.

Part-of #6036

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 test/replication/gh-6036-qsync-master.lua     |   1 +
 test/replication/gh-6036-qsync-node.lua       |  35 +++
 test/replication/gh-6036-qsync-order.result   | 207 ++++++++++++++++++
 test/replication/gh-6036-qsync-order.test.lua |  95 ++++++++
 test/replication/gh-6036-qsync-replica1.lua   |   1 +
 test/replication/gh-6036-qsync-replica2.lua   |   1 +
 test/replication/suite.cfg                    |   1 +
 test/replication/suite.ini                    |   2 +-
 8 files changed, 342 insertions(+), 1 deletion(-)
 create mode 120000 test/replication/gh-6036-qsync-master.lua
 create mode 100644 test/replication/gh-6036-qsync-node.lua
 create mode 100644 test/replication/gh-6036-qsync-order.result
 create mode 100644 test/replication/gh-6036-qsync-order.test.lua
 create mode 120000 test/replication/gh-6036-qsync-replica1.lua
 create mode 120000 test/replication/gh-6036-qsync-replica2.lua

diff --git a/test/replication/gh-6036-qsync-master.lua b/test/replication/gh-6036-qsync-master.lua
new file mode 120000
index 000000000..87bdb46ef
--- /dev/null
+++ b/test/replication/gh-6036-qsync-master.lua
@@ -0,0 +1 @@
+gh-6036-qsync-node.lua
\ No newline at end of file
diff --git a/test/replication/gh-6036-qsync-node.lua b/test/replication/gh-6036-qsync-node.lua
new file mode 100644
index 000000000..ba6213255
--- /dev/null
+++ b/test/replication/gh-6036-qsync-node.lua
@@ -0,0 +1,35 @@
+local INSTANCE_ID = string.match(arg[0], "gh%-6036%-qsync%-(.+)%.lua")
+
+local function unix_socket(name)
+    return "unix/:./" .. name .. '.sock';
+end
+
+require('console').listen(os.getenv('ADMIN'))
+
+local box_cfg_common = {
+        listen                      = unix_socket(INSTANCE_ID),
+        replication                 = {
+            unix_socket("master"),
+            unix_socket("replica1"),
+            unix_socket("replica2"),
+        },
+        replication_connect_quorum  = 1,
+        replication_synchro_quorum  = 1,
+        replication_synchro_timeout = 10000,
+}
+
+if INSTANCE_ID == "master" then
+    box_cfg_common['election_mode'] = "manual"
+    box.cfg(box_cfg_common)
+elseif INSTANCE_ID == "replica1" then
+    box_cfg_common['election_mode'] = "manual"
+    box.cfg(box_cfg_common)
+else
+    assert(INSTANCE_ID == "replica2")
+    box_cfg_common['election_mode'] = "manual"
+    box.cfg(box_cfg_common)
+end
+
+box.once("bootstrap", function()
+    box.schema.user.grant('guest', 'super')
+end)
diff --git a/test/replication/gh-6036-qsync-order.result b/test/replication/gh-6036-qsync-order.result
new file mode 100644
index 000000000..378ce917a
--- /dev/null
+++ b/test/replication/gh-6036-qsync-order.result
@@ -0,0 +1,207 @@
+-- test-run result file version 2
+--
+-- gh-6036: verify that terms are locked when we're inside journal
+-- write routine, because parallel appliers may ignore the fact that
+-- the term is updated already but not yet written leading to data
+-- inconsistency.
+--
+test_run = require('test_run').new()
+ | ---
+ | ...
+
+test_run:cmd('create server master with script="replication/gh-6036-qsync-master.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server replica1 with script="replication/gh-6036-qsync-replica1.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server replica2 with script="replication/gh-6036-qsync-replica2.lua"')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('start server master with wait=False')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica1 with wait=False')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica2 with wait=False')
+ | ---
+ | - true
+ | ...
+
+test_run:wait_fullmesh({"master", "replica1", "replica2"})
+ | ---
+ | ...
+
+--
+-- Create a synchro space on the master node and make
+-- sure the write processed just fine.
+test_run:switch("master")
+ | ---
+ | - true
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+s = box.schema.create_space('test', {is_sync = true})
+ | ---
+ | ...
+_ = s:create_index('pk')
+ | ---
+ | ...
+s:insert{1}
+ | ---
+ | - [1]
+ | ...
+test_run:switch("replica1")
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
+ | ---
+ | - true
+ | ...
+test_run:switch("replica2")
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
+ | ---
+ | - true
+ | ...
+
+--
+-- Drop connection between master and replica1.
+test_run:switch("master")
+ | ---
+ | - true
+ | ...
+box.cfg({                                   \
+    replication = {                         \
+        "unix/:./master.sock",              \
+        "unix/:./replica2.sock",            \
+    },                                      \
+})
+ | ---
+ | ...
+--
+-- Drop connection between replica1 and master.
+test_run:switch("replica1")
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
+ | ---
+ | - true
+ | ...
+box.cfg({                                   \
+    replication = {                         \
+        "unix/:./replica1.sock",            \
+        "unix/:./replica2.sock",            \
+    },                                      \
+})
+ | ---
+ | ...
+
+--
+-- Here we have the following scheme
+--
+--              replica2 (will be delayed)
+--              /     \
+--          master    replica1
+
+--
+-- Initiate disk delay and remember the max term seen so far.
+test_run:switch("replica2")
+ | ---
+ | - true
+ | ...
+box.error.injection.set('ERRINJ_WAL_DELAY', true)
+ | ---
+ | - ok
+ | ...
+
+--
+-- Make replica1 been a leader and start writting data,
+-- the PROMOTE request get queued on replica2 and not
+-- yet processed, same time INSERT won't complete either
+-- waiting for PROMOTE completion first.
+test_run:switch("replica1")
+ | ---
+ | - true
+ | ...
+box.ctl.promote()
+ | ---
+ | ...
+_ = require('fiber').create(function() box.space.test:insert{2} end)
+ | ---
+ | ...
+
+--
+-- The master node has no clue that there is a new leader
+-- and continue writting data with obsolete term. Since replica2
+-- is delayed now the INSERT won't proceed yet but get queued.
+test_run:switch("master")
+ | ---
+ | - true
+ | ...
+_ = require('fiber').create(function() box.space.test:insert{3} end)
+ | ---
+ | ...
+
+--
+-- Finally enable replica2 back. Make sure the data from new replica1
+-- leader get writting while old leader's data ignored.
+test_run:switch("replica2")
+ | ---
+ | - true
+ | ...
+box.error.injection.set('ERRINJ_WAL_DELAY', false)
+ | ---
+ | - ok
+ | ...
+test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
+ | ---
+ | - true
+ | ...
+box.space.test:select{}
+ | ---
+ | - - [1]
+ |   - [2]
+ | ...
+
+test_run:switch("default")
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica2')
+ | ---
+ | - true
+ | ...
+
+test_run:cmd('delete server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica1')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica2')
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/gh-6036-qsync-order.test.lua b/test/replication/gh-6036-qsync-order.test.lua
new file mode 100644
index 000000000..ac1dc5d91
--- /dev/null
+++ b/test/replication/gh-6036-qsync-order.test.lua
@@ -0,0 +1,95 @@
+--
+-- gh-6036: verify that terms are locked when we're inside journal
+-- write routine, because parallel appliers may ignore the fact that
+-- the term is updated already but not yet written leading to data
+-- inconsistency.
+--
+test_run = require('test_run').new()
+
+test_run:cmd('create server master with script="replication/gh-6036-qsync-master.lua"')
+test_run:cmd('create server replica1 with script="replication/gh-6036-qsync-replica1.lua"')
+test_run:cmd('create server replica2 with script="replication/gh-6036-qsync-replica2.lua"')
+
+test_run:cmd('start server master with wait=False')
+test_run:cmd('start server replica1 with wait=False')
+test_run:cmd('start server replica2 with wait=False')
+
+test_run:wait_fullmesh({"master", "replica1", "replica2"})
+
+--
+-- Create a synchro space on the master node and make
+-- sure the write processed just fine.
+test_run:switch("master")
+box.ctl.promote()
+s = box.schema.create_space('test', {is_sync = true})
+_ = s:create_index('pk')
+s:insert{1}
+test_run:switch("replica1")
+test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
+test_run:switch("replica2")
+test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
+
+--
+-- Drop connection between master and replica1.
+test_run:switch("master")
+box.cfg({                                   \
+    replication = {                         \
+        "unix/:./master.sock",              \
+        "unix/:./replica2.sock",            \
+    },                                      \
+})
+--
+-- Drop connection between replica1 and master.
+test_run:switch("replica1")
+test_run:wait_cond(function() return box.space.test:get{1} ~= nil end)
+box.cfg({                                   \
+    replication = {                         \
+        "unix/:./replica1.sock",            \
+        "unix/:./replica2.sock",            \
+    },                                      \
+})
+
+--
+-- Here we have the following scheme
+--
+--              replica2 (will be delayed)
+--              /     \
+--          master    replica1
+
+--
+-- Initiate disk delay and remember the max term seen so far.
+test_run:switch("replica2")
+box.error.injection.set('ERRINJ_WAL_DELAY', true)
+
+--
+-- Make replica1 been a leader and start writting data,
+-- the PROMOTE request get queued on replica2 and not
+-- yet processed, same time INSERT won't complete either
+-- waiting for PROMOTE completion first.
+test_run:switch("replica1")
+box.ctl.promote()
+_ = require('fiber').create(function() box.space.test:insert{2} end)
+
+--
+-- The master node has no clue that there is a new leader
+-- and continue writting data with obsolete term. Since replica2
+-- is delayed now the INSERT won't proceed yet but get queued.
+test_run:switch("master")
+_ = require('fiber').create(function() box.space.test:insert{3} end)
+
+--
+-- Finally enable replica2 back. Make sure the data from new replica1
+-- leader get writting while old leader's data ignored.
+test_run:switch("replica2")
+box.error.injection.set('ERRINJ_WAL_DELAY', false)
+test_run:wait_cond(function() return box.space.test:get{2} ~= nil end)
+box.space.test:select{}
+
+test_run:switch("default")
+test_run:cmd('stop server master')
+test_run:cmd('stop server replica1')
+test_run:cmd('stop server replica2')
+
+test_run:cmd('delete server master')
+test_run:cmd('delete server replica1')
+test_run:cmd('delete server replica2')
diff --git a/test/replication/gh-6036-qsync-replica1.lua b/test/replication/gh-6036-qsync-replica1.lua
new file mode 120000
index 000000000..87bdb46ef
--- /dev/null
+++ b/test/replication/gh-6036-qsync-replica1.lua
@@ -0,0 +1 @@
+gh-6036-qsync-node.lua
\ No newline at end of file
diff --git a/test/replication/gh-6036-qsync-replica2.lua b/test/replication/gh-6036-qsync-replica2.lua
new file mode 120000
index 000000000..87bdb46ef
--- /dev/null
+++ b/test/replication/gh-6036-qsync-replica2.lua
@@ -0,0 +1 @@
+gh-6036-qsync-node.lua
\ No newline at end of file
diff --git a/test/replication/suite.cfg b/test/replication/suite.cfg
index 3eee0803c..ed09b2087 100644
--- a/test/replication/suite.cfg
+++ b/test/replication/suite.cfg
@@ -59,6 +59,7 @@
     "gh-6094-rs-uuid-mismatch.test.lua": {},
     "gh-6127-election-join-new.test.lua": {},
     "gh-6035-applier-filter.test.lua": {},
+    "gh-6036-qsync-order.test.lua": {},
     "election-candidate-promote.test.lua": {},
     "*": {
         "memtx": {"engine": "memtx"},
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index 77eb95f49..080e4fbf4 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -3,7 +3,7 @@ core = tarantool
 script =  master.lua
 description = tarantool/box, replication
 disabled = consistent.test.lua
-release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5430-qsync-promote-crash.test.lua gh-5430-cluster-mvcc.test.lua  gh-5506-election-on-off.test.lua gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua anon_register_gap.test.lua gh-5213-qsync-applier-order.test.lua gh-5213-qsync-applier-order-3.test.lua gh-6027-applier-error-show.test.lua gh-6032-promote-wal-write.test.lua gh-6057-qsync-confirm-async-no-wal.test.lua gh-5447-downstream-lag.test.lua gh-4040-invalid-msgpack.test.lua
+release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua qsync_advanced.test.lua qsync_errinj.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua long_row_timeout.test.lua gh-4739-vclock-assert.test.lua gh-4730-applier-rollback.test.lua gh-5140-qsync-casc-rollback.test.lua gh-5144-qsync-dup-confirm.test.lua gh-5167-qsync-rollback-snap.test.lua gh-5430-qsync-promote-crash.test.lua gh-5430-cluster-mvcc.test.lua  gh-5506-election-on-off.test.lua gh-5536-wal-limit.test.lua hang_on_synchro_fail.test.lua anon_register_gap.test.lua gh-5213-qsync-applier-order.test.lua gh-5213-qsync-applier-order-3.test.lua gh-6027-applier-error-show.test.lua gh-6032-promote-wal-write.test.lua gh-6057-qsync-confirm-async-no-wal.test.lua gh-5447-downstream-lag.test.lua gh-4040-invalid-msgpack.test.lua gh-6036-qsync-order.test.lua
 config = suite.cfg
 lua_libs = lua/fast_replica.lua lua/rlimit.lua
 use_unix_sockets = True
-- 
2.31.1


  parent reply	other threads:[~2021-09-30  9:46 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-30  9:44 [Tarantool-patches] [PATCH v19 0/3] qsync: implement packet filtering (part 1) Cyrill Gorcunov via Tarantool-patches
2021-09-30  9:44 ` [Tarantool-patches] [PATCH v19 1/3] latch: add latch_is_locked helper Cyrill Gorcunov via Tarantool-patches
2021-09-30  9:44 ` [Tarantool-patches] [PATCH v19 2/3] qsync: order access to the limbo terms Cyrill Gorcunov via Tarantool-patches
2021-10-01 12:14   ` Serge Petrenko via Tarantool-patches
2021-10-01 12:31     ` Cyrill Gorcunov via Tarantool-patches
2021-10-01 12:37       ` Serge Petrenko via Tarantool-patches
2021-10-04 21:53         ` Cyrill Gorcunov via Tarantool-patches
2021-10-05 13:25           ` Serge Petrenko via Tarantool-patches
2021-10-05 21:52             ` Cyrill Gorcunov via Tarantool-patches
2021-09-30  9:44 ` Cyrill Gorcunov via Tarantool-patches [this message]
2021-10-01 12:30   ` [Tarantool-patches] [PATCH v19 3/3] test: add gh-6036-qsync-order test Serge Petrenko via Tarantool-patches
2021-10-04 21:16     ` Cyrill Gorcunov via Tarantool-patches
2021-10-05 13:55       ` Serge Petrenko via Tarantool-patches
2021-10-05 22:26         ` Cyrill Gorcunov via Tarantool-patches
2021-10-05 22:32           ` Cyrill Gorcunov via Tarantool-patches
2021-10-06  7:06             ` Cyrill Gorcunov via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210930094445.316694-4-gorcunov@gmail.com \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=gorcunov@gmail.com \
    --cc=v.shpilevoy@tarantool.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Tarantool development patches archive

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://lists.tarantool.org/tarantool-patches/0 tarantool-patches/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 tarantool-patches tarantool-patches/ https://lists.tarantool.org/tarantool-patches \
		tarantool-patches@dev.tarantool.org.
	public-inbox-index tarantool-patches

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git