Tarantool development patches archive
* [Tarantool-patches] [PATCH 0/3 v2] Additional qsync tests
@ 2020-11-12  9:10 sergeyb
  2020-11-12  9:10 ` [Tarantool-patches] [PATCH 1/3 v2] replication: test clear_synchro_queue() function sergeyb
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: sergeyb @ 2020-11-12  9:10 UTC (permalink / raw)
  To: tarantool-patches, v.shpilevoy, sergepetrenko

From: Sergey Bronnikov <sergeyb@tarantool.org>

Changes v2:

- removed the negative case in the test with a random leader: no more stuck
  tx's in a limbo before switching a leader. I spent a couple of hours trying
  to get it to work, but failed. Some details: to avoid blocking the loop I
  have to execute the tx in a fiber when the quorum is set to 'broken', but
  neither test_run:eval() nor an insert via a netbox connection in a fiber
  inserted the value, even after the fiber was dead.
- check inserted values in the test with a random leader, as Sergey asked in
  the previous review
- removed the negative case in the test with random sync/async mode; Sergey
  said it is pointless [1]
- added a test that enables/disables sync mode; it has an LGTM from Sergey [1]
- reduced the cluster size in the test with a random leader from 32 to 5
  instances, as Vlad asked in the previous review
- wrapped tests at 80 symbols per line
- adjusted the synchro timeout values in box.cfg to make the tests more reliable
GH branch: ligurio/gh-4842-qsync-testing
GitHub issue: https://github.com/tarantool/tarantool/issues/5055
GitLab CI: https://gitlab.com/tarantool/tarantool/-/pipelines/215091265

1. https://lists.tarantool.org/pipermail/tarantool-patches/2020-September/019227.html

Sergey Bronnikov (3):
  replication: test clear_synchro_queue() function
  replication: add test with random leaders promotion and demotion
  replication: add test with change space sync mode in a loop

 test/replication/qsync.lua                    |  30 ++++
 test/replication/qsync1.lua                   |   1 +
 test/replication/qsync2.lua                   |   1 +
 test/replication/qsync3.lua                   |   1 +
 test/replication/qsync4.lua                   |   1 +
 test/replication/qsync5.lua                   |   1 +
 test/replication/qsync_basic.result           | 136 +++++++++++++++++
 test/replication/qsync_basic.test.lua         |  50 +++++++
 test/replication/qsync_random_leader.result   | 139 ++++++++++++++++++
 test/replication/qsync_random_leader.test.lua |  68 +++++++++
 test/replication/qsync_sync_mode.result       | 114 ++++++++++++++
 test/replication/qsync_sync_mode.test.lua     |  56 +++++++
 12 files changed, 598 insertions(+)
 create mode 100644 test/replication/qsync.lua
 create mode 120000 test/replication/qsync1.lua
 create mode 120000 test/replication/qsync2.lua
 create mode 120000 test/replication/qsync3.lua
 create mode 120000 test/replication/qsync4.lua
 create mode 120000 test/replication/qsync5.lua
 create mode 100644 test/replication/qsync_random_leader.result
 create mode 100644 test/replication/qsync_random_leader.test.lua
 create mode 100644 test/replication/qsync_sync_mode.result
 create mode 100644 test/replication/qsync_sync_mode.test.lua

-- 
2.25.1


* [Tarantool-patches] [PATCH 1/3 v2] replication: test clear_synchro_queue() function
  2020-11-12  9:10 [Tarantool-patches] [PATCH 0/3 v2] Additional qsync tests sergeyb
@ 2020-11-12  9:10 ` sergeyb
  2020-11-13 13:12   ` Serge Petrenko
  2020-11-12  9:10 ` [Tarantool-patches] [PATCH 2/3 v2] replication: add test with random leaders promotion and demotion sergeyb
  2020-11-12  9:11 ` [Tarantool-patches] [PATCH 3/3 v2] replication: add test with change space sync mode in a loop sergeyb
  2 siblings, 1 reply; 14+ messages in thread
From: sergeyb @ 2020-11-12  9:10 UTC (permalink / raw)
  To: tarantool-patches, v.shpilevoy, sergepetrenko

From: Sergey Bronnikov <sergeyb@tarantool.org>

Part of #5055
Part of #4849
---
 test/replication/qsync_basic.result   | 136 ++++++++++++++++++++++++++
 test/replication/qsync_basic.test.lua |  50 ++++++++++
 2 files changed, 186 insertions(+)

diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result
index bd3c3cce1..e848e305a 100644
--- a/test/replication/qsync_basic.result
+++ b/test/replication/qsync_basic.result
@@ -32,6 +32,13 @@ s2.is_sync
  | - false
  | ...
 
+--
+-- gh-4849: clear synchro queue with unconfigured box
+--
+box.ctl.clear_synchro_queue()
+ | ---
+ | ...
+
 -- Net.box takes sync into account.
 box.schema.user.grant('guest', 'super')
  | ---
@@ -637,6 +644,135 @@ box.space.sync:count()
  | - 0
  | ...
 
+--
+-- gh-4849: clear synchro queue on a master
+--
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
+ | ---
+ | ...
+ok, err = nil
+ | ---
+ | ...
+f = fiber.create(function()							\
+    ok, err = pcall(box.space.sync.insert, box.space.sync, {10})		\
+end)
+ | ---
+ | ...
+f:status()
+ | ---
+ | - suspended
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.space.sync:get{10} ~= nil end)
+ | ---
+ | - true
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_timeout = 0.1}
+ | ---
+ | ...
+box.ctl.clear_synchro_queue()
+ | ---
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return box.space.sync:get{10} == nil end)
+ | ---
+ | - true
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:wait_cond(function() return f:status() == 'dead' end)
+ | ---
+ | - true
+ | ...
+ok, err
+ | ---
+ | - false
+ | - Quorum collection for a synchronous transaction is timed out
+ | ...
+test_run:wait_cond(function() return box.space.sync:get{10} == nil end)
+ | ---
+ | - true
+ | ...
+
+--
+-- gh-4849: clear synchro queue on a replica, make sure no crashes
+--
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
+ | ---
+ | ...
+ok, err = nil
+ | ---
+ | ...
+f = fiber.create(function()                                                     \
+    ok, err = pcall(box.space.sync.insert, box.space.sync, {9})                 \
+end)
+ | ---
+ | ...
+f.status()
+ | ---
+ | - running
+ | ...
+test_run:wait_cond(function() return box.space.sync:get{9} ~= nil end)
+ | ---
+ | - true
+ | ...
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout=0.01}
+ | ---
+ | ...
+box.ctl.clear_synchro_queue()
+ | ---
+ | ...
+test_run:wait_cond(function() return box.space.sync:get{9} == nil end)
+ | ---
+ | - true
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.cfg{replication_synchro_timeout=0.01}
+ | ---
+ | ...
+test_run:wait_cond(function() return f:status() == 'dead' end)
+ | ---
+ | - true
+ | ...
+ok, err
+ | ---
+ | - false
+ | - Quorum collection for a synchronous transaction is timed out
+ | ...
+test_run:wait_cond(function() return box.space.sync:get{9} == nil end)
+ | ---
+ | - true
+ | ...
+
+-- Note: cluster may be in a broken state here due to nature of previous test.
+
 -- Cleanup.
 test_run:cmd('switch default')
  | ---
diff --git a/test/replication/qsync_basic.test.lua b/test/replication/qsync_basic.test.lua
index 94235547d..523dfa779 100644
--- a/test/replication/qsync_basic.test.lua
+++ b/test/replication/qsync_basic.test.lua
@@ -13,6 +13,11 @@ s1:select{}
 s2 = box.schema.create_space('test2')
 s2.is_sync
 
+--
+-- gh-4849: clear synchro queue with unconfigured box
+--
+box.ctl.clear_synchro_queue()
+
 -- Net.box takes sync into account.
 box.schema.user.grant('guest', 'super')
 netbox = require('net.box')
@@ -248,6 +253,51 @@ for i = 1, 100 do box.space.sync:delete{i} end
 test_run:cmd('switch replica')
 box.space.sync:count()
 
+--
+-- gh-4849: clear synchro queue on a master
+--
+test_run:switch('default')
+box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
+ok, err = nil
+f = fiber.create(function()							\
+    ok, err = pcall(box.space.sync.insert, box.space.sync, {10})		\
+end)
+f:status()
+test_run:switch('replica')
+test_run:wait_cond(function() return box.space.sync:get{10} ~= nil end)
+test_run:switch('default')
+box.cfg{replication_synchro_timeout = 0.1}
+box.ctl.clear_synchro_queue()
+test_run:switch('replica')
+test_run:wait_cond(function() return box.space.sync:get{10} == nil end)
+test_run:switch('default')
+test_run:wait_cond(function() return f:status() == 'dead' end)
+ok, err
+test_run:wait_cond(function() return box.space.sync:get{10} == nil end)
+
+--
+-- gh-4849: clear synchro queue on a replica, make sure no crashes
+--
+test_run:switch('default')
+box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
+ok, err = nil
+f = fiber.create(function()                                                     \
+    ok, err = pcall(box.space.sync.insert, box.space.sync, {9})                 \
+end)
+f.status()
+test_run:wait_cond(function() return box.space.sync:get{9} ~= nil end)
+test_run:switch('replica')
+box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout=0.01}
+box.ctl.clear_synchro_queue()
+test_run:wait_cond(function() return box.space.sync:get{9} == nil end)
+test_run:switch('default')
+box.cfg{replication_synchro_timeout=0.01}
+test_run:wait_cond(function() return f:status() == 'dead' end)
+ok, err
+test_run:wait_cond(function() return box.space.sync:get{9} == nil end)
+
+-- Note: cluster may be in a broken state here due to nature of previous test.
+
 -- Cleanup.
 test_run:cmd('switch default')
 
-- 
2.25.1


* [Tarantool-patches] [PATCH 2/3 v2] replication: add test with random leaders promotion and demotion
  2020-11-12  9:10 [Tarantool-patches] [PATCH 0/3 v2] Additional qsync tests sergeyb
  2020-11-12  9:10 ` [Tarantool-patches] [PATCH 1/3 v2] replication: test clear_synchro_queue() function sergeyb
@ 2020-11-12  9:10 ` sergeyb
  2020-11-13 15:10   ` Serge Petrenko
  2020-11-12  9:11 ` [Tarantool-patches] [PATCH 3/3 v2] replication: add test with change space sync mode in a loop sergeyb
  2 siblings, 1 reply; 14+ messages in thread
From: sergeyb @ 2020-11-12  9:10 UTC (permalink / raw)
  To: tarantool-patches, v.shpilevoy, sergepetrenko

From: Sergey Bronnikov <sergeyb@tarantool.org>

Part of #5055
Part of #5144
---
 test/replication/qsync.lua                    |  30 ++++
 test/replication/qsync1.lua                   |   1 +
 test/replication/qsync2.lua                   |   1 +
 test/replication/qsync3.lua                   |   1 +
 test/replication/qsync4.lua                   |   1 +
 test/replication/qsync5.lua                   |   1 +
 test/replication/qsync_random_leader.result   | 139 ++++++++++++++++++
 test/replication/qsync_random_leader.test.lua |  68 +++++++++
 8 files changed, 242 insertions(+)
 create mode 100644 test/replication/qsync.lua
 create mode 120000 test/replication/qsync1.lua
 create mode 120000 test/replication/qsync2.lua
 create mode 120000 test/replication/qsync3.lua
 create mode 120000 test/replication/qsync4.lua
 create mode 120000 test/replication/qsync5.lua
 create mode 100644 test/replication/qsync_random_leader.result
 create mode 100644 test/replication/qsync_random_leader.test.lua

diff --git a/test/replication/qsync.lua b/test/replication/qsync.lua
new file mode 100644
index 000000000..b15cc18c9
--- /dev/null
+++ b/test/replication/qsync.lua
@@ -0,0 +1,30 @@
+#!/usr/bin/env tarantool
+
+-- get instance name from filename (qsync1.lua => qsync1)
+local INSTANCE_ID = string.match(arg[0], "%d")
+
+local SOCKET_DIR = require('fio').cwd()
+
+local function instance_uri(instance_id)
+    return SOCKET_DIR..'/qsync'..instance_id..'.sock';
+end
+
+-- start console first
+require('console').listen(os.getenv('ADMIN'))
+
+box.cfg({
+    listen = instance_uri(INSTANCE_ID);
+    replication = {
+        instance_uri(1);
+        instance_uri(2);
+        instance_uri(3);
+        instance_uri(4);
+        instance_uri(5);
+    };
+})
+
+box.once("bootstrap", function()
+    box.cfg{replication_synchro_timeout = 1000, replication_synchro_quorum = 5}
+    box.cfg{read_only = false}
+    box.schema.user.grant("guest", 'replication')
+end)
diff --git a/test/replication/qsync1.lua b/test/replication/qsync1.lua
new file mode 120000
index 000000000..df9f3a883
--- /dev/null
+++ b/test/replication/qsync1.lua
@@ -0,0 +1 @@
+qsync.lua
\ No newline at end of file
diff --git a/test/replication/qsync2.lua b/test/replication/qsync2.lua
new file mode 120000
index 000000000..df9f3a883
--- /dev/null
+++ b/test/replication/qsync2.lua
@@ -0,0 +1 @@
+qsync.lua
\ No newline at end of file
diff --git a/test/replication/qsync3.lua b/test/replication/qsync3.lua
new file mode 120000
index 000000000..df9f3a883
--- /dev/null
+++ b/test/replication/qsync3.lua
@@ -0,0 +1 @@
+qsync.lua
\ No newline at end of file
diff --git a/test/replication/qsync4.lua b/test/replication/qsync4.lua
new file mode 120000
index 000000000..df9f3a883
--- /dev/null
+++ b/test/replication/qsync4.lua
@@ -0,0 +1 @@
+qsync.lua
\ No newline at end of file
diff --git a/test/replication/qsync5.lua b/test/replication/qsync5.lua
new file mode 120000
index 000000000..df9f3a883
--- /dev/null
+++ b/test/replication/qsync5.lua
@@ -0,0 +1 @@
+qsync.lua
\ No newline at end of file
diff --git a/test/replication/qsync_random_leader.result b/test/replication/qsync_random_leader.result
new file mode 100644
index 000000000..2a01ee822
--- /dev/null
+++ b/test/replication/qsync_random_leader.result
@@ -0,0 +1,139 @@
+-- test-run result file version 2
+os = require('os')
+ | ---
+ | ...
+env = require('test_run')
+ | ---
+ | ...
+math = require('math')
+ | ---
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+log = require('log')
+ | ---
+ | ...
+
+orig_synchro_quorum = box.cfg.replication_synchro_quorum
+ | ---
+ | ...
+orig_synchro_timeout = box.cfg.replication_synchro_timeout
+ | ---
+ | ...
+
+NUM_INSTANCES = 5
+ | ---
+ | ...
+SERVERS = {}
+ | ---
+ | ...
+for i=1,NUM_INSTANCES do                                                       \
+    SERVERS[i] = 'qsync' .. i                                                  \
+end;
+ | ---
+ | ...
+SERVERS -- print instance names
+ | ---
+ | - - qsync1
+ |   - qsync2
+ |   - qsync3
+ |   - qsync4
+ |   - qsync5
+ | ...
+
+math.randomseed(os.time())
+ | ---
+ | ...
+random = function(excluded_num, total)                                         \
+    local r = math.random(1, total)                                            \
+    if (r == excluded_num) then                                                \
+        return random(excluded_num, total)                                     \
+    end                                                                        \
+    return r                                                                   \
+end
+ | ---
+ | ...
+
+-- Write value on current leader.
+-- Pick a random replica in a cluster.
+-- Promote replica to leader.
+-- Make sure value is there.
+
+-- Testcase setup.
+test_run:create_cluster(SERVERS)
+ | ---
+ | ...
+test_run:wait_fullmesh(SERVERS)
+ | ---
+ | ...
+test_run:switch('qsync1')
+ | ---
+ | - true
+ | ...
+_ = box.schema.space.create('sync', {is_sync=true, engine = test_run:get_cfg('engine')})
+ | ---
+ | ...
+_ = box.space.sync:create_index('primary')
+ | ---
+ | ...
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+current_leader_id = 1
+ | ---
+ | ...
+test_run:eval(SERVERS[current_leader_id], "box.ctl.clear_synchro_queue()")
+ | ---
+ | - []
+ | ...
+
+-- Testcase body.
+for i=1,300 do                                                                 \
+    test_run:eval(SERVERS[current_leader_id],                                  \
+        string.format("box.space.sync:insert{%d}", i))                         \
+    new_leader_id = random(current_leader_id, #SERVERS)                        \
+    log.info(string.format("current leader id %d, new leader id %d",           \
+                           current_leader_id, new_leader_id))                  \
+    test_run:eval(SERVERS[new_leader_id], "box.ctl.clear_synchro_queue()")     \
+    replica = random(new_leader_id, #SERVERS)                                  \
+    test_run:wait_cond(function() return test_run:eval(SERVERS[replica],       \
+                       string.format("box.space.sync:get{%d}", i)) ~= nil end) \
+    test_run:wait_cond(function() return test_run:eval(SERVERS[current_leader_id], \
+                       string.format("box.space.sync:get{%d}", i)) ~= nil end) \
+    current_leader_id = new_leader_id                                          \
+end
+ | ---
+ | ...
+
+test_run:switch('qsync1')
+ | ---
+ | - true
+ | ...
+box.space.sync:count() -- 300
+ | ---
+ | - 300
+ | ...
+
+-- Teardown.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:eval(SERVERS[current_leader_id], 'box.space.sync:drop()')
+ | ---
+ | - []
+ | ...
+test_run:drop_cluster(SERVERS)
+ | ---
+ | ...
+box.cfg{                                                                       \
+    replication_synchro_quorum = orig_synchro_quorum,                          \
+    replication_synchro_timeout = orig_synchro_timeout,                        \
+}
+ | ---
+ | ...
diff --git a/test/replication/qsync_random_leader.test.lua b/test/replication/qsync_random_leader.test.lua
new file mode 100644
index 000000000..5b5992f8f
--- /dev/null
+++ b/test/replication/qsync_random_leader.test.lua
@@ -0,0 +1,68 @@
+os = require('os')
+env = require('test_run')
+math = require('math')
+fiber = require('fiber')
+test_run = env.new()
+log = require('log')
+
+orig_synchro_quorum = box.cfg.replication_synchro_quorum
+orig_synchro_timeout = box.cfg.replication_synchro_timeout
+
+NUM_INSTANCES = 5
+SERVERS = {}
+for i=1,NUM_INSTANCES do                                                       \
+    SERVERS[i] = 'qsync' .. i                                                  \
+end;
+SERVERS -- print instance names
+
+math.randomseed(os.time())
+random = function(excluded_num, total)                                         \
+    local r = math.random(1, total)                                            \
+    if (r == excluded_num) then                                                \
+        return random(excluded_num, total)                                     \
+    end                                                                        \
+    return r                                                                   \
+end
+
+-- Write value on current leader.
+-- Pick a random replica in a cluster.
+-- Promote replica to leader.
+-- Make sure value is there.
+
+-- Testcase setup.
+test_run:create_cluster(SERVERS)
+test_run:wait_fullmesh(SERVERS)
+test_run:switch('qsync1')
+_ = box.schema.space.create('sync', {is_sync=true, engine = test_run:get_cfg('engine')})
+_ = box.space.sync:create_index('primary')
+test_run:switch('default')
+current_leader_id = 1
+test_run:eval(SERVERS[current_leader_id], "box.ctl.clear_synchro_queue()")
+
+-- Testcase body.
+for i=1,300 do                                                                 \
+    test_run:eval(SERVERS[current_leader_id],                                  \
+        string.format("box.space.sync:insert{%d}", i))                         \
+    new_leader_id = random(current_leader_id, #SERVERS)                        \
+    log.info(string.format("current leader id %d, new leader id %d",           \
+                           current_leader_id, new_leader_id))                  \
+    test_run:eval(SERVERS[new_leader_id], "box.ctl.clear_synchro_queue()")     \
+    replica = random(new_leader_id, #SERVERS)                                  \
+    test_run:wait_cond(function() return test_run:eval(SERVERS[replica],       \
+                       string.format("box.space.sync:get{%d}", i)) ~= nil end) \
+    test_run:wait_cond(function() return test_run:eval(SERVERS[current_leader_id], \
+                       string.format("box.space.sync:get{%d}", i)) ~= nil end) \
+    current_leader_id = new_leader_id                                          \
+end
+
+test_run:switch('qsync1')
+box.space.sync:count() -- 300
+
+-- Teardown.
+test_run:switch('default')
+test_run:eval(SERVERS[current_leader_id], 'box.space.sync:drop()')
+test_run:drop_cluster(SERVERS)
+box.cfg{                                                                       \
+    replication_synchro_quorum = orig_synchro_quorum,                          \
+    replication_synchro_timeout = orig_synchro_timeout,                        \
+}
-- 
2.25.1


* [Tarantool-patches] [PATCH 3/3 v2] replication: add test with change space sync mode in a loop
  2020-11-12  9:10 [Tarantool-patches] [PATCH 0/3 v2] Additional qsync tests sergeyb
  2020-11-12  9:10 ` [Tarantool-patches] [PATCH 1/3 v2] replication: test clear_synchro_queue() function sergeyb
  2020-11-12  9:10 ` [Tarantool-patches] [PATCH 2/3 v2] replication: add test with random leaders promotion and demotion sergeyb
@ 2020-11-12  9:11 ` sergeyb
  2020-11-13 14:05   ` Serge Petrenko
  2 siblings, 1 reply; 14+ messages in thread
From: sergeyb @ 2020-11-12  9:11 UTC (permalink / raw)
  To: tarantool-patches, v.shpilevoy, sergepetrenko

From: Sergey Bronnikov <sergeyb@tarantool.org>

Closes #5055
Part of #5144
---
 test/replication/qsync_sync_mode.result   | 114 ++++++++++++++++++++++
 test/replication/qsync_sync_mode.test.lua |  56 +++++++++++
 2 files changed, 170 insertions(+)
 create mode 100644 test/replication/qsync_sync_mode.result
 create mode 100644 test/replication/qsync_sync_mode.test.lua

diff --git a/test/replication/qsync_sync_mode.result b/test/replication/qsync_sync_mode.result
new file mode 100644
index 000000000..66f6a70b2
--- /dev/null
+++ b/test/replication/qsync_sync_mode.result
@@ -0,0 +1,114 @@
+-- test-run result file version 2
+env = require('test_run')
+ | ---
+ | ...
+test_run = env.new()
+ | ---
+ | ...
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+fiber = require('fiber')
+ | ---
+ | ...
+math = require('math')
+ | ---
+ | ...
+
+orig_synchro_quorum = box.cfg.replication_synchro_quorum
+ | ---
+ | ...
+orig_synchro_timeout = box.cfg.replication_synchro_timeout
+ | ---
+ | ...
+
+math.randomseed(os.time())
+ | ---
+ | ...
+random_boolean = function()                                                    \
+    if (math.random(1, 10) > 5) then                                           \
+        return true                                                            \
+    else                                                                       \
+        return false                                                           \
+    end                                                                        \
+end
+ | ---
+ | ...
+
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+
+-- Setup an cluster with two instances.
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+-- Write data to a leader and enable/disable sync mode in a loop.
+-- Testcase setup.
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=1000}
+ | ---
+ | ...
+
+-- Testcase body.
+for i = 1,100 do                                                               \
+    box.space.sync:alter{is_sync=random_boolean()}                             \
+    box.space.sync:insert{i}                                                   \
+    test_run:switch('replica')                                                 \
+    test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end)     \
+    test_run:switch('default')                                                 \
+    test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end)     \
+end
+ | ---
+ | ...
+box.space.sync:count() -- 100
+ | ---
+ | - 100
+ | ...
+
+-- Testcase cleanup.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+
+-- Teardown.
+test_run:cmd('switch default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+test_run:cleanup_cluster()
+ | ---
+ | ...
+box.schema.user.revoke('guest', 'replication')
+ | ---
+ | ...
+box.cfg{                                                                       \
+    replication_synchro_quorum = orig_synchro_quorum,                          \
+    replication_synchro_timeout = orig_synchro_timeout,                        \
+}
+ | ---
+ | ...
diff --git a/test/replication/qsync_sync_mode.test.lua b/test/replication/qsync_sync_mode.test.lua
new file mode 100644
index 000000000..ea5725169
--- /dev/null
+++ b/test/replication/qsync_sync_mode.test.lua
@@ -0,0 +1,56 @@
+env = require('test_run')
+test_run = env.new()
+engine = test_run:get_cfg('engine')
+fiber = require('fiber')
+math = require('math')
+
+orig_synchro_quorum = box.cfg.replication_synchro_quorum
+orig_synchro_timeout = box.cfg.replication_synchro_timeout
+
+math.randomseed(os.time())
+random_boolean = function()                                                    \
+    if (math.random(1, 10) > 5) then                                           \
+        return true                                                            \
+    else                                                                       \
+        return false                                                           \
+    end                                                                        \
+end
+
+box.schema.user.grant('guest', 'replication')
+
+-- Setup an cluster with two instances.
+test_run:cmd('create server replica with rpl_master=default,\
+                                         script="replication/replica.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+-- Write data to a leader and enable/disable sync mode in a loop.
+-- Testcase setup.
+_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
+_ = box.space.sync:create_index('pk')
+box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=1000}
+
+-- Testcase body.
+for i = 1,100 do                                                               \
+    box.space.sync:alter{is_sync=random_boolean()}                             \
+    box.space.sync:insert{i}                                                   \
+    test_run:switch('replica')                                                 \
+    test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end)     \
+    test_run:switch('default')                                                 \
+    test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end)     \
+end
+box.space.sync:count() -- 100
+
+-- Testcase cleanup.
+test_run:switch('default')
+box.space.sync:drop()
+
+-- Teardown.
+test_run:cmd('switch default')
+test_run:cmd('stop server replica')
+test_run:cmd('delete server replica')
+test_run:cleanup_cluster()
+box.schema.user.revoke('guest', 'replication')
+box.cfg{                                                                       \
+    replication_synchro_quorum = orig_synchro_quorum,                          \
+    replication_synchro_timeout = orig_synchro_timeout,                        \
+}
-- 
2.25.1


* Re: [Tarantool-patches] [PATCH 1/3 v2] replication: test clear_synchro_queue() function
  2020-11-12  9:10 ` [Tarantool-patches] [PATCH 1/3 v2] replication: test clear_synchro_queue() function sergeyb
@ 2020-11-13 13:12   ` Serge Petrenko
  2020-11-16 11:11     ` Sergey Bronnikov
  0 siblings, 1 reply; 14+ messages in thread
From: Serge Petrenko @ 2020-11-13 13:12 UTC (permalink / raw)
  To: sergeyb, tarantool-patches, v.shpilevoy


12.11.2020 12:10, sergeyb@tarantool.org wrote:
> From: Sergey Bronnikov <sergeyb@tarantool.org>
>
> Part of #5055
> Part of #4849

Hi! Thanks for the patch!

Please see 2 comments below.

> ---
>   test/replication/qsync_basic.result   | 136 ++++++++++++++++++++++++++
>   test/replication/qsync_basic.test.lua |  50 ++++++++++
>   2 files changed, 186 insertions(+)
>
> diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result
> index bd3c3cce1..e848e305a 100644
> --- a/test/replication/qsync_basic.result
> +++ b/test/replication/qsync_basic.result
> @@ -32,6 +32,13 @@ s2.is_sync
>    | - false
>    | ...
>   
> +--
> +-- gh-4849: clear synchro queue with unconfigured box
> +--

What do you mean by `unconfigured box`?

box.cfg is called when the default instance (master.lua) is started, before
the first test line.

If you really want to test `box.ctl.clear_synchro_queue` before box.cfg,
you need to do it in `app-tap/cfg.test.lua`. It has all the tests for
`box.ctl` options.

> +box.ctl.clear_synchro_queue()
> + | ---
> + | ...
> +
>   -- Net.box takes sync into account.
>   box.schema.user.grant('guest', 'super')
>    | ---
> @@ -637,6 +644,135 @@ box.space.sync:count()
>    | - 0
>    | ...
>   
> +--
> +-- gh-4849: clear synchro queue on a master
> +--
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
> + | ---
> + | ...
> +ok, err = nil
> + | ---
> + | ...
> +f = fiber.create(function()							\
> +    ok, err = pcall(box.space.sync.insert, box.space.sync, {10})		\
> +end)
> + | ---
> + | ...
> +f:status()
> + | ---
> + | - suspended
> + | ...
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +test_run:wait_cond(function() return box.space.sync:get{10} ~= nil end)
> + | ---
> + | - true
> + | ...
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.cfg{replication_synchro_timeout = 0.1}
> + | ---
> + | ...
> +box.ctl.clear_synchro_queue()
> + | ---
> + | ...
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +test_run:wait_cond(function() return box.space.sync:get{10} == nil end)
> + | ---
> + | - true
> + | ...
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +test_run:wait_cond(function() return f:status() == 'dead' end)
> + | ---
> + | - true
> + | ...
> +ok, err
> + | ---
> + | - false
> + | - Quorum collection for a synchronous transaction is timed out
> + | ...
> +test_run:wait_cond(function() return box.space.sync:get{10} == nil end)
> + | ---
> + | - true
> + | ...
> +
> +--
> +-- gh-4849: clear synchro queue on a replica, make sure no crashes
> +--
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
> + | ---
> + | ...
> +ok, err = nil
> + | ---
> + | ...
> +f = fiber.create(function()                                                     \
> +    ok, err = pcall(box.space.sync.insert, box.space.sync, {9})                 \
> +end)
> + | ---
> + | ...
> +f.status()
> + | ---
> + | - running
> + | ...


typo: should be `f:status()`
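
For context, a minimal plain-Lua sketch of the difference between the dot and
colon call forms. `Fiber` and `current` are hypothetical stand-ins, not
Tarantool's fiber module. The sketch assumes, as the Tarantool documentation
for `fiber.status()` describes, that a call without an argument reports the
current fiber, which would explain the `running` in the result above.

```lua
-- Hypothetical stand-in for "the fiber that is executing right now".
local current = {name = 'current', state = 'running'}

-- Hypothetical stand-in for a fiber class.
local Fiber = {}
Fiber.__index = Fiber

function Fiber.status(self)
    -- Without a receiver, fall back to the current fiber.
    self = self or current
    return self.state
end

-- A fiber that is blocked waiting for a quorum.
local f = setmetatable({name = 'worker', state = 'suspended'}, Fiber)

print(f:status()) -- colon form passes f as self: suspended
print(f.status()) -- dot form passes nothing: running
```

With the dot form the check passes regardless of the state of `f`, so it
does not verify that the inserting fiber is actually blocked.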


> +test_run:wait_cond(function() return box.space.sync:get{9} ~= nil end)
> + | ---
> + | - true
> + | ...
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout=0.01}
> + | ---
> + | ...
> +box.ctl.clear_synchro_queue()
> + | ---
> + | ...
> +test_run:wait_cond(function() return box.space.sync:get{9} == nil end)
> + | ---
> + | - true
> + | ...
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.cfg{replication_synchro_timeout=0.01}
> + | ---
> + | ...
> +test_run:wait_cond(function() return f:status() == 'dead' end)
> + | ---
> + | - true
> + | ...
> +ok, err
> + | ---
> + | - false
> + | - Quorum collection for a synchronous transaction is timed out
> + | ...
> +test_run:wait_cond(function() return box.space.sync:get{9} == nil end)
> + | ---
> + | - true
> + | ...
> +
> +-- Note: cluster may be in a broken state here due to nature of previous test.
> +
>   -- Cleanup.
>   test_run:cmd('switch default')
>    | ---
> diff --git a/test/replication/qsync_basic.test.lua b/test/replication/qsync_basic.test.lua
> index 94235547d..523dfa779 100644
> --- a/test/replication/qsync_basic.test.lua
> +++ b/test/replication/qsync_basic.test.lua
> @@ -13,6 +13,11 @@ s1:select{}
>   s2 = box.schema.create_space('test2')
>   s2.is_sync
>   
> +--
> +-- gh-4849: clear synchro queue with unconfigured box
> +--
> +box.ctl.clear_synchro_queue()
> +
>   -- Net.box takes sync into account.
>   box.schema.user.grant('guest', 'super')
>   netbox = require('net.box')
> @@ -248,6 +253,51 @@ for i = 1, 100 do box.space.sync:delete{i} end
>   test_run:cmd('switch replica')
>   box.space.sync:count()
>   
> +--
> +-- gh-4849: clear synchro queue on a master
> +--
> +test_run:switch('default')
> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
> +ok, err = nil
> +f = fiber.create(function()							\
> +    ok, err = pcall(box.space.sync.insert, box.space.sync, {10})		\
> +end)
> +f:status()
> +test_run:switch('replica')
> +test_run:wait_cond(function() return box.space.sync:get{10} ~= nil end)
> +test_run:switch('default')
> +box.cfg{replication_synchro_timeout = 0.1}
> +box.ctl.clear_synchro_queue()
> +test_run:switch('replica')
> +test_run:wait_cond(function() return box.space.sync:get{10} == nil end)
> +test_run:switch('default')
> +test_run:wait_cond(function() return f:status() == 'dead' end)
> +ok, err
> +test_run:wait_cond(function() return box.space.sync:get{10} == nil end)
> +
> +--
> +-- gh-4849: clear synchro queue on a replica, make sure no crashes
> +--
> +test_run:switch('default')
> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
> +ok, err = nil
> +f = fiber.create(function()                                                     \
> +    ok, err = pcall(box.space.sync.insert, box.space.sync, {9})                 \
> +end)
> +f.status()
> +test_run:wait_cond(function() return box.space.sync:get{9} ~= nil end)
> +test_run:switch('replica')
> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout=0.01}
> +box.ctl.clear_synchro_queue()
> +test_run:wait_cond(function() return box.space.sync:get{9} == nil end)
> +test_run:switch('default')
> +box.cfg{replication_synchro_timeout=0.01}
> +test_run:wait_cond(function() return f:status() == 'dead' end)
> +ok, err
> +test_run:wait_cond(function() return box.space.sync:get{9} == nil end)
> +
> +-- Note: cluster may be in a broken state here due to nature of previous test.
> +
>   -- Cleanup.
>   test_run:cmd('switch default')
>   

-- 
Serge Petrenko


* Re: [Tarantool-patches] [PATCH 3/3 v2] replication: add test with change space sync mode in a loop
  2020-11-12  9:11 ` [Tarantool-patches] [PATCH 3/3 v2] replication: add test with change space sync mode in a loop sergeyb
@ 2020-11-13 14:05   ` Serge Petrenko
  0 siblings, 0 replies; 14+ messages in thread
From: Serge Petrenko @ 2020-11-13 14:05 UTC (permalink / raw)
  To: sergeyb, tarantool-patches, v.shpilevoy


12.11.2020 12:11, sergeyb@tarantool.org wrote:
> From: Sergey Bronnikov <sergeyb@tarantool.org>
>
> Closes #5055
> Part of #5144
> ---


Thanks for the patch!

LGTM.

>   test/replication/qsync_sync_mode.result   | 114 ++++++++++++++++++++++
>   test/replication/qsync_sync_mode.test.lua |  56 +++++++++++
>   2 files changed, 170 insertions(+)
>   create mode 100644 test/replication/qsync_sync_mode.result
>   create mode 100644 test/replication/qsync_sync_mode.test.lua
>
> diff --git a/test/replication/qsync_sync_mode.result b/test/replication/qsync_sync_mode.result
> new file mode 100644
> index 000000000..66f6a70b2
> --- /dev/null
> +++ b/test/replication/qsync_sync_mode.result
> @@ -0,0 +1,114 @@
> +-- test-run result file version 2
> +env = require('test_run')
> + | ---
> + | ...
> +test_run = env.new()
> + | ---
> + | ...
> +engine = test_run:get_cfg('engine')
> + | ---
> + | ...
> +fiber = require('fiber')
> + | ---
> + | ...
> +math = require('math')
> + | ---
> + | ...
> +
> +orig_synchro_quorum = box.cfg.replication_synchro_quorum
> + | ---
> + | ...
> +orig_synchro_timeout = box.cfg.replication_synchro_timeout
> + | ---
> + | ...
> +
> +math.randomseed(os.time())
> + | ---
> + | ...
> +random_boolean = function()                                                    \
> +    if (math.random(1, 10) > 5) then                                           \
> +        return true                                                            \
> +    else                                                                       \
> +        return false                                                           \
> +    end                                                                        \
> +end
> + | ---
> + | ...
> +
> +box.schema.user.grant('guest', 'replication')
> + | ---
> + | ...
> +
> +-- Set up a cluster with two instances.
> +test_run:cmd('create server replica with rpl_master=default,\
> +                                         script="replication/replica.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica with wait=True, wait_load=True')
> + | ---
> + | - true
> + | ...
> +
> +-- Write data to a leader and enable/disable sync mode in a loop.
> +-- Testcase setup.
> +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> + | ---
> + | ...
> +_ = box.space.sync:create_index('pk')
> + | ---
> + | ...
> +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=1000}
> + | ---
> + | ...
> +
> +-- Testcase body.
> +for i = 1,100 do                                                               \
> +    box.space.sync:alter{is_sync=random_boolean()}                             \
> +    box.space.sync:insert{i}                                                   \
> +    test_run:switch('replica')                                                 \
> +    test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end)     \
> +    test_run:switch('default')                                                 \
> +    test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end)     \
> +end
> + | ---
> + | ...
> +box.space.sync:count() -- 100
> + | ---
> + | - 100
> + | ...
> +
> +-- Testcase cleanup.
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +box.space.sync:drop()
> + | ---
> + | ...
> +
> +-- Teardown.
> +test_run:cmd('switch default')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server replica')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('delete server replica')
> + | ---
> + | - true
> + | ...
> +test_run:cleanup_cluster()
> + | ---
> + | ...
> +box.schema.user.revoke('guest', 'replication')
> + | ---
> + | ...
> +box.cfg{                                                                       \
> +    replication_synchro_quorum = orig_synchro_quorum,                          \
> +    replication_synchro_timeout = orig_synchro_timeout,                        \
> +}
> + | ---
> + | ...
> diff --git a/test/replication/qsync_sync_mode.test.lua b/test/replication/qsync_sync_mode.test.lua
> new file mode 100644
> index 000000000..ea5725169
> --- /dev/null
> +++ b/test/replication/qsync_sync_mode.test.lua
> @@ -0,0 +1,56 @@
> +env = require('test_run')
> +test_run = env.new()
> +engine = test_run:get_cfg('engine')
> +fiber = require('fiber')
> +math = require('math')
> +
> +orig_synchro_quorum = box.cfg.replication_synchro_quorum
> +orig_synchro_timeout = box.cfg.replication_synchro_timeout
> +
> +math.randomseed(os.time())
> +random_boolean = function()                                                    \
> +    if (math.random(1, 10) > 5) then                                           \
> +        return true                                                            \
> +    else                                                                       \
> +        return false                                                           \
> +    end                                                                        \
> +end
> +
> +box.schema.user.grant('guest', 'replication')
> +
> +-- Set up a cluster with two instances.
> +test_run:cmd('create server replica with rpl_master=default,\
> +                                         script="replication/replica.lua"')
> +test_run:cmd('start server replica with wait=True, wait_load=True')
> +
> +-- Write data to a leader and enable/disable sync mode in a loop.
> +-- Testcase setup.
> +_ = box.schema.space.create('sync', {is_sync=true, engine=engine})
> +_ = box.space.sync:create_index('pk')
> +box.cfg{replication_synchro_quorum=2, replication_synchro_timeout=1000}
> +
> +-- Testcase body.
> +for i = 1,100 do                                                               \
> +    box.space.sync:alter{is_sync=random_boolean()}                             \
> +    box.space.sync:insert{i}                                                   \
> +    test_run:switch('replica')                                                 \
> +    test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end)     \
> +    test_run:switch('default')                                                 \
> +    test_run:wait_cond(function() return box.space.sync:get{i} ~= nil end)     \
> +end
> +box.space.sync:count() -- 100
> +
> +-- Testcase cleanup.
> +test_run:switch('default')
> +box.space.sync:drop()
> +
> +-- Teardown.
> +test_run:cmd('switch default')
> +test_run:cmd('stop server replica')
> +test_run:cmd('delete server replica')
> +test_run:cleanup_cluster()
> +box.schema.user.revoke('guest', 'replication')
> +box.cfg{                                                                       \
> +    replication_synchro_quorum = orig_synchro_quorum,                          \
> +    replication_synchro_timeout = orig_synchro_timeout,                        \
> +}

-- 
Serge Petrenko


* Re: [Tarantool-patches] [PATCH 2/3 v2] replication: add test with random leaders promotion and demotion
  2020-11-12  9:10 ` [Tarantool-patches] [PATCH 2/3 v2] replication: add test with random leaders promotion and demotion sergeyb
@ 2020-11-13 15:10   ` Serge Petrenko
  2020-11-16  9:10     ` Sergey Bronnikov
  0 siblings, 1 reply; 14+ messages in thread
From: Serge Petrenko @ 2020-11-13 15:10 UTC (permalink / raw)
  To: sergeyb, tarantool-patches, v.shpilevoy


12.11.2020 12:10, sergeyb@tarantool.org wrote:
> From: Sergey Bronnikov<sergeyb@tarantool.org>
>
> Part of #5055
> Part of #5144


Hi! Thanks for the patch!


> ---
>   test/replication/qsync.lua                    |  30 ++++
>   test/replication/qsync1.lua                   |   1 +
>   test/replication/qsync2.lua                   |   1 +
>   test/replication/qsync3.lua                   |   1 +
>   test/replication/qsync4.lua                   |   1 +
>   test/replication/qsync5.lua                   |   1 +
>   test/replication/qsync_random_leader.result   | 139 ++++++++++++++++++
>   test/replication/qsync_random_leader.test.lua |  68 +++++++++
>   8 files changed, 242 insertions(+)
>   create mode 100644 test/replication/qsync.lua
>   create mode 120000 test/replication/qsync1.lua
>   create mode 120000 test/replication/qsync2.lua
>   create mode 120000 test/replication/qsync3.lua
>   create mode 120000 test/replication/qsync4.lua
>   create mode 120000 test/replication/qsync5.lua
>   create mode 100644 test/replication/qsync_random_leader.result
>   create mode 100644 test/replication/qsync_random_leader.test.lua
>
> diff --git a/test/replication/qsync.lua b/test/replication/qsync.lua
> new file mode 100644
> index 000000000..b15cc18c9
> --- /dev/null
> +++ b/test/replication/qsync.lua
> @@ -0,0 +1,30 @@
> +#!/usr/bin/env tarantool
> +
> +-- get instance name from filename (qsync1.lua => qsync1)
> +local INSTANCE_ID = string.match(arg[0], "%d")
> +
> +local SOCKET_DIR = require('fio').cwd()
> +
> +local function instance_uri(instance_id)
> +    return SOCKET_DIR..'/qsync'..instance_id..'.sock';
> +end
> +
> +-- start console first
> +require('console').listen(os.getenv('ADMIN'))
> +
> +box.cfg({
> +    listen = instance_uri(INSTANCE_ID);
> +    replication = {
> +        instance_uri(1);
> +        instance_uri(2);
> +        instance_uri(3);
> +        instance_uri(4);
> +        instance_uri(5);
> +    };
> +})
> +
> +box.once("bootstrap", function()
> +    box.cfg{replication_synchro_timeout = 1000, replication_synchro_quorum = 5}
> +    box.cfg{read_only = false}

You may move these box.cfg calls to the initial box.cfg above.


> +    box.schema.user.grant("guest", 'replication')
> +end)
> diff --git a/test/replication/qsync1.lua b/test/replication/qsync1.lua
> new file mode 120000
> index 000000000..df9f3a883
> --- /dev/null
> +++ b/test/replication/qsync1.lua
> @@ -0,0 +1 @@
> +qsync.lua
> \ No newline at end of file
> diff --git a/test/replication/qsync2.lua b/test/replication/qsync2.lua
> new file mode 120000
> index 000000000..df9f3a883
> --- /dev/null
> +++ b/test/replication/qsync2.lua
> @@ -0,0 +1 @@
> +qsync.lua
> \ No newline at end of file


> +
> +-- Testcase body.
> +for i=1,300 do                                                                 \
> +    test_run:eval(SERVERS[current_leader_id],                                  \
> +        string.format("box.space.sync:insert{%d}", i))                         \
> +    new_leader_id = random(current_leader_id, #SERVERS)                        \
> +    log.info(string.format("current leader id %d, new leader id %d",           \
> +                           current_leader_id, new_leader_id))                  \
> +    test_run:eval(SERVERS[new_leader_id], "box.ctl.clear_synchro_queue()")     \
> +    replica = random(new_leader_id, #SERVERS)                                  \
> +    test_run:wait_cond(function() return test_run:eval(SERVERS[replica],       \
> +                       string.format("box.space.sync:get{%d}", i)) ~= nil end) \
> +    test_run:wait_cond(function() return test_run:eval(SERVERS[current_leader_id], \
> +                       string.format("box.space.sync:get{%d}", i)) ~= nil end) \
> +    current_leader_id = new_leader_id                                          \
> +end


Discussed verbally. Please update the test case so that insertions
are done with a quorum that is too high to be collected.
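
To illustrate the request, a minimal plain-Lua sketch of the quorum arithmetic. `can_collect_quorum` is a hypothetical helper, not part of Tarantool, and it assumes each instance contributes at most one acknowledgement. With the five instances listed in qsync.lua, a quorum above 5 can never be collected, so a synchronous insert should stay pending until the quorum is lowered or the timeout expires.

```lua
-- Hypothetical helper: can `quorum` acks ever be collected
-- from `cluster_size` instances (one ack per instance)?
local function can_collect_quorum(cluster_size, quorum)
    return quorum <= cluster_size
end

local CLUSTER_SIZE = 5  -- qsync1 .. qsync5

print(can_collect_quorum(CLUSTER_SIZE, 5))  -- true
print(can_collect_quorum(CLUSTER_SIZE, 6))  -- false
```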


> +
> +test_run:switch('qsync1')
> +box.space.sync:count() -- 300
> +
> +-- Teardown.
> +test_run:switch('default')
> +test_run:eval(SERVERS[current_leader_id], 'box.space.sync:drop()')
> +test_run:drop_cluster(SERVERS)
> +box.cfg{                                                                       \
> +    replication_synchro_quorum = orig_synchro_quorum,                          \
> +    replication_synchro_timeout = orig_synchro_timeout,                        \
> +}

-- 
Serge Petrenko


* Re: [Tarantool-patches] [PATCH 2/3 v2] replication: add test with random leaders promotion and demotion
  2020-11-13 15:10   ` Serge Petrenko
@ 2020-11-16  9:10     ` Sergey Bronnikov
  2020-11-16 14:00       ` Sergey Bronnikov
  0 siblings, 1 reply; 14+ messages in thread
From: Sergey Bronnikov @ 2020-11-16  9:10 UTC (permalink / raw)
  To: Serge Petrenko, tarantool-patches, v.shpilevoy

Hi, Serge!

Thanks for the review. I've updated the patch; see the diffs below.


On 13.11.2020 18:10, Serge Petrenko wrote:
>
> 12.11.2020 12:10, sergeyb@tarantool.org wrote:
>> From: Sergey Bronnikov<sergeyb@tarantool.org>
>>
>> Part of #5055
>> Part of #5144
>
>
> Hi! Thanks for the patch!
>
>
>> ---
>>   test/replication/qsync.lua                    |  30 ++++
>>   test/replication/qsync1.lua                   |   1 +
>>   test/replication/qsync2.lua                   |   1 +
>>   test/replication/qsync3.lua                   |   1 +
>>   test/replication/qsync4.lua                   |   1 +
>>   test/replication/qsync5.lua                   |   1 +
>>   test/replication/qsync_random_leader.result   | 139 ++++++++++++++++++
>>   test/replication/qsync_random_leader.test.lua |  68 +++++++++
>>   8 files changed, 242 insertions(+)
>>   create mode 100644 test/replication/qsync.lua
>>   create mode 120000 test/replication/qsync1.lua
>>   create mode 120000 test/replication/qsync2.lua
>>   create mode 120000 test/replication/qsync3.lua
>>   create mode 120000 test/replication/qsync4.lua
>>   create mode 120000 test/replication/qsync5.lua
>>   create mode 100644 test/replication/qsync_random_leader.result
>>   create mode 100644 test/replication/qsync_random_leader.test.lua
>>
>> diff --git a/test/replication/qsync.lua b/test/replication/qsync.lua
>> new file mode 100644
>> index 000000000..b15cc18c9
>> --- /dev/null
>> +++ b/test/replication/qsync.lua
>> @@ -0,0 +1,30 @@
>> +#!/usr/bin/env tarantool
>> +
>> +-- get instance name from filename (qsync1.lua => qsync1)
>> +local INSTANCE_ID = string.match(arg[0], "%d")
>> +
>> +local SOCKET_DIR = require('fio').cwd()
>> +
>> +local function instance_uri(instance_id)
>> +    return SOCKET_DIR..'/qsync'..instance_id..'.sock';
>> +end
>> +
>> +-- start console first
>> +require('console').listen(os.getenv('ADMIN'))
>> +
>> +box.cfg({
>> +    listen = instance_uri(INSTANCE_ID);
>> +    replication = {
>> +        instance_uri(1);
>> +        instance_uri(2);
>> +        instance_uri(3);
>> +        instance_uri(4);
>> +        instance_uri(5);
>> +    };
>> +})
>> +
>> +box.once("bootstrap", function()
>> +    box.cfg{replication_synchro_timeout = 1000, replication_synchro_quorum = 5}
>> +    box.cfg{read_only = false}
>
> You may move these box.cfg calls to the initial box.cfg above.


Agreed. Fixed it with the patch below:

--- a/test/replication/qsync.lua
+++ b/test/replication/qsync.lua
@@ -21,10 +21,11 @@ box.cfg({
          instance_uri(4);
          instance_uri(5);
      };
+    replication_synchro_timeout = 1000;
+    replication_synchro_quorum = 5;
+    read_only = false;
  })

  box.once("bootstrap", function()
-    box.cfg{replication_synchro_timeout = 1000, replication_synchro_quorum = 5}
-    box.cfg{read_only = false}
      box.schema.user.grant("guest", 'replication')
  end)

>
>> +    box.schema.user.grant("guest", 'replication')
>> +end)
>> diff --git a/test/replication/qsync1.lua b/test/replication/qsync1.lua
>> new file mode 120000
>> index 000000000..df9f3a883
>> --- /dev/null
>> +++ b/test/replication/qsync1.lua
>> @@ -0,0 +1 @@
>> +qsync.lua
>> \ No newline at end of file
>> diff --git a/test/replication/qsync2.lua b/test/replication/qsync2.lua
>> new file mode 120000
>> index 000000000..df9f3a883
>> --- /dev/null
>> +++ b/test/replication/qsync2.lua
>> @@ -0,0 +1 @@
>> +qsync.lua
>> \ No newline at end of file
>
>
>> +
>> +-- Testcase body.
>> +for i=1,300 do                                                                 \
>> +    test_run:eval(SERVERS[current_leader_id],                                  \
>> +        string.format("box.space.sync:insert{%d}", i))                         \
>> +    new_leader_id = random(current_leader_id, #SERVERS)                        \
>> +    log.info(string.format("current leader id %d, new leader id %d",           \
>> +                           current_leader_id, new_leader_id))                  \
>> +    test_run:eval(SERVERS[new_leader_id], "box.ctl.clear_synchro_queue()")     \
>> +    replica = random(new_leader_id, #SERVERS)                                  \
>> +    test_run:wait_cond(function() return test_run:eval(SERVERS[replica],       \
>> +                       string.format("box.space.sync:get{%d}", i)) ~= nil end) \
>> +    test_run:wait_cond(function() return test_run:eval(SERVERS[current_leader_id], \
>> +                       string.format("box.space.sync:get{%d}", i)) ~= nil end) \
>> +    current_leader_id = new_leader_id                                          \
>> +end
>
>
> Discussed verbally. Please update the testcase, so that insertions
> are done with too high quorum.

Updated the test and added a case with a 'broken' quorum. The result file
has been updated too.

The number of test iterations has been reduced from 300 to 200.

@@ -3,7 +3,7 @@
  math = require('math')
  fiber = require('fiber')
  test_run = env.new()
-log = require('log')
+netbox = require('net.box')

  orig_synchro_quorum = box.cfg.replication_synchro_quorum
  orig_synchro_timeout = box.cfg.replication_synchro_timeout
@@ -24,10 +24,11 @@
      return r \
  end

+-- Set 'broken' quorum on current leader.
  -- Write value on current leader.
  -- Pick a random replica in a cluster.
--- Promote replica to leader.
--- Make sure value is there.
+-- Set 'good' quorum on it and promote to a leader.
+-- Make sure value is there and on an old leader.

  -- Testcase setup.
  test_run:create_cluster(SERVERS)
@@ -35,28 +36,35 @@
  test_run:switch('qsync1')
  _ = box.schema.space.create('sync', {is_sync=true, engine = test_run:get_cfg('engine')})
  _ = box.space.sync:create_index('primary')
+box.schema.user.grant('guest', 'write', 'space', 'sync')

  test_run:switch('default')
  current_leader_id = 1
  test_run:eval(SERVERS[current_leader_id], "box.ctl.clear_synchro_queue()")

+SOCKET_DIR = require('fio').cwd()
+
  -- Testcase body.
-for i=1,300 do \
+for i=1,200 do \
      test_run:eval(SERVERS[current_leader_id], \
-        string.format("box.space.sync:insert{%d}", i)) \
+        "box.cfg{replication_synchro_quorum=6, replication_synchro_timeout=1000}") \
+    c = netbox.connect(SOCKET_DIR..'/'..SERVERS[current_leader_id]..'.sock') \
+    fiber.create(function() c.space.sync:insert{i} end) \
      new_leader_id = random(current_leader_id, #SERVERS) \
-    log.info(string.format("current leader id %d, new leader id %d", \
-                           current_leader_id, new_leader_id)) \
+    test_run:eval(SERVERS[new_leader_id], \
+        "box.cfg{replication_synchro_quorum=3, replication_synchro_timeout=0.01}") \
      test_run:eval(SERVERS[new_leader_id], "box.ctl.clear_synchro_queue()") \
+    c:close() \
      replica = random(new_leader_id, #SERVERS) \
      test_run:wait_cond(function() return test_run:eval(SERVERS[replica], \
-                       string.format("box.space.sync:get{%d}", i)) ~= nil end) \
+                       string.format("box.space.sync:get{%d}", i))[1] ~= nil end) \
      test_run:wait_cond(function() return test_run:eval(SERVERS[current_leader_id], \
-                       string.format("box.space.sync:get{%d}", i)) ~= nil end) \
+                       string.format("box.space.sync:get{%d}", i))[1] ~= nil end) \
+    new_leader_id = random(current_leader_id, #SERVERS) \
      current_leader_id = new_leader_id \
  end

  test_run:switch('qsync1')
-box.space.sync:count() -- 300
+box.space.sync:count() -- 200

  -- Teardown.
  test_run:switch('default')
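
The `[1]` added to the `wait_cond` checks matters if `test_run:eval()` returns a list of results rather than the bare value (an assumption here, suggested by the change itself): an empty list is still not `nil`, so the old condition would succeed immediately. A plain-Lua sketch, with a hypothetical `fake_eval` standing in for `test_run:eval()`:

```lua
-- Hypothetical stand-in for test_run:eval(): always returns a list
-- of results, which is empty when the tuple is not there yet.
local function fake_eval(tuple_exists)
    if tuple_exists then
        return {{1}}
    end
    return {}
end

-- Old check: the list itself is never nil, even when empty.
print(fake_eval(false) ~= nil)      -- true
-- New check: the first element is nil until the tuple appears.
print(fake_eval(false)[1] ~= nil)   -- false
print(fake_eval(true)[1] ~= nil)    -- true
```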

>
>
>> +
>> +test_run:switch('qsync1')
>> +box.space.sync:count() -- 300
>> +
>> +-- Teardown.
>> +test_run:switch('default')
>> +test_run:eval(SERVERS[current_leader_id], 'box.space.sync:drop()')
>> +test_run:drop_cluster(SERVERS)
>> +box.cfg{                                                                       \
>> +    replication_synchro_quorum = orig_synchro_quorum,                          \
>> +    replication_synchro_timeout = orig_synchro_timeout,                        \
>> +}
>


* Re: [Tarantool-patches] [PATCH 1/3 v2] replication: test clear_synchro_queue() function
  2020-11-13 13:12   ` Serge Petrenko
@ 2020-11-16 11:11     ` Sergey Bronnikov
  2020-11-16 14:44       ` Serge Petrenko
  0 siblings, 1 reply; 14+ messages in thread
From: Sergey Bronnikov @ 2020-11-16 11:11 UTC (permalink / raw)
  To: Serge Petrenko, tarantool-patches, v.shpilevoy

Hi, Serge!

Thanks for the review!

On 13.11.2020 16:12, Serge Petrenko wrote:
>
> 12.11.2020 12:10, sergeyb@tarantool.org wrote:
>> From: Sergey Bronnikov <sergeyb@tarantool.org>
>>
>> Part of #5055
>> Part of #4849
>
> Hi! Thanks for the patch!
>
> Please see 2 comments below.
>
>> ---
>>   test/replication/qsync_basic.result   | 136 ++++++++++++++++++++++++++
>>   test/replication/qsync_basic.test.lua |  50 ++++++++++
>>   2 files changed, 186 insertions(+)
>>
>> diff --git a/test/replication/qsync_basic.result b/test/replication/qsync_basic.result
>> index bd3c3cce1..e848e305a 100644
>> --- a/test/replication/qsync_basic.result
>> +++ b/test/replication/qsync_basic.result
>> @@ -32,6 +32,13 @@ s2.is_sync
>>    | - false
>>    | ...
>>
>> +--
>> +-- gh-4849: clear synchro queue with unconfigured box
>> +--
>
> What do you mean by `unconfigured box`?
>
> box.cfg is called when the default instance (master.lua) is started, before
> the first test line.

Missed it.

>
> If you really want to test `box.ctl.clear_synchro_queue` before box.cfg,
> you need to do it in `app-tap/cfg.test.lua`. It has all the tests for
> `box.ctl` options.

Removed the test case from qsync_basic.test.lua and added it to
app-tap/cfg.test.lua:

--- a/test/replication/qsync_basic.test.lua
+++ b/test/replication/qsync_basic.test.lua
@@ -13,11 +13,6 @@ s1:select{}
  s2 = box.schema.create_space('test2')
  s2.is_sync

---
--- gh-4849: clear synchro queue with unconfigured box
---
-box.ctl.clear_synchro_queue()
-
  -- Net.box takes sync into account.
  box.schema.user.grant('guest', 'super')
  netbox = require('net.box')

--- a/test/app-tap/cfg.test.lua
+++ b/test/app-tap/cfg.test.lua
@@ -3,7 +3,13 @@ local fiber = require('fiber')
  local tap = require('tap')
  local test = tap.test("cfg")

-test:plan(11)
+test:plan(12)
+
+--
+-- gh-4849: clear synchro queue is null with unconfigured box
+--
+local ok, err = pcall(box.ctl.clear_synchro_queue(), nil)
+test:isnil(ok, 'execute clear_synchro_queue with unconfigured box')

  --
  -- gh-4282: box.cfg should not allow nor just ignore nil UUID.
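
Note that `pcall` expects the function value followed by its arguments; writing `pcall(fn(), nil)` calls `fn` first and hands its result to `pcall`. A plain-Lua sketch of the difference, where `may_fail` is a hypothetical function standing in for the call under test (it is assumed to raise, which may not match the real behaviour before box.cfg):

```lua
-- Hypothetical function that always raises an error.
local function may_fail()
    error('not configured')
end

-- Function passed by value: pcall runs it and traps the error.
local ok, err = pcall(may_fail)
print(ok)           -- false
print(type(err))    -- string

-- pcall(may_fail()) calls may_fail() before pcall runs, so the
-- error propagates instead of being caught. Wrapping the expression
-- in another pcall makes this visible:
local outer_ok = pcall(function() return pcall(may_fail()) end)
print(outer_ok)     -- false: the inner pcall never got to run
```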

>> +box.ctl.clear_synchro_queue()
>> + | ---
>> + | ...
>> +
>>   -- Net.box takes sync into account.
>>   box.schema.user.grant('guest', 'super')
>>    | ---
>> @@ -637,6 +644,135 @@ box.space.sync:count()
>>    | - 0
>>    | ...
>>
>> +--
>> +-- gh-4849: clear synchro queue on a master
>> +--
>> +test_run:switch('default')
>> + | ---
>> + | - true
>> + | ...
>> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
>> + | ---
>> + | ...
>> +ok, err = nil
>> + | ---
>> + | ...
>> +f = fiber.create(function()                            \
>> +    ok, err = pcall(box.space.sync.insert, box.space.sync, {10})        \
>> +end)
>> + | ---
>> + | ...
>> +f:status()
>> + | ---
>> + | - suspended
>> + | ...
>> +test_run:switch('replica')
>> + | ---
>> + | - true
>> + | ...
>> +test_run:wait_cond(function() return box.space.sync:get{10} ~= nil end)
>> + | ---
>> + | - true
>> + | ...
>> +test_run:switch('default')
>> + | ---
>> + | - true
>> + | ...
>> +box.cfg{replication_synchro_timeout = 0.1}
>> + | ---
>> + | ...
>> +box.ctl.clear_synchro_queue()
>> + | ---
>> + | ...
>> +test_run:switch('replica')
>> + | ---
>> + | - true
>> + | ...
>> +test_run:wait_cond(function() return box.space.sync:get{10} == nil end)
>> + | ---
>> + | - true
>> + | ...
>> +test_run:switch('default')
>> + | ---
>> + | - true
>> + | ...
>> +test_run:wait_cond(function() return f:status() == 'dead' end)
>> + | ---
>> + | - true
>> + | ...
>> +ok, err
>> + | ---
>> + | - false
>> + | - Quorum collection for a synchronous transaction is timed out
>> + | ...
>> +test_run:wait_cond(function() return box.space.sync:get{10} == nil end)
>> + | ---
>> + | - true
>> + | ...
>> +
>> +--
>> +-- gh-4849: clear synchro queue on a replica, make sure no crashes
>> +--
>> +test_run:switch('default')
>> + | ---
>> + | - true
>> + | ...
>> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
>> + | ---
>> + | ...
>> +ok, err = nil
>> + | ---
>> + | ...
>> +f = fiber.create(function() \
>> +    ok, err = pcall(box.space.sync.insert, box.space.sync, {9})                 \
>> +end)
>> + | ---
>> + | ...
>> +f.status()
>> + | ---
>> + | - running
>> + | ...
>
>
> typo: should be `f:status()`

Updated:

--- a/test/replication/qsync_basic.test.lua
+++ b/test/replication/qsync_basic.test.lua
@@ -284,7 +284,7 @@ ok, err = nil
  f = fiber.create(function() \
      ok, err = pcall(box.space.sync.insert, box.space.sync, {9})                 \
  end)
-f.status()
+f:status()
  test_run:wait_cond(function() return box.space.sync:get{9} ~= nil end)
  test_run:switch('replica')
  box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout=0.01}

>
>
>> +test_run:wait_cond(function() return box.space.sync:get{9} ~= nil end)
>> + | ---
>> + | - true
>> + | ...
>> +test_run:switch('replica')
>> + | ---
>> + | - true
>> + | ...
>> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout=0.01}
>> + | ---
>> + | ...
>> +box.ctl.clear_synchro_queue()
>> + | ---
>> + | ...
>> +test_run:wait_cond(function() return box.space.sync:get{9} == nil end)
>> + | ---
>> + | - true
>> + | ...
>> +test_run:switch('default')
>> + | ---
>> + | - true
>> + | ...
>> +box.cfg{replication_synchro_timeout=0.01}
>> + | ---
>> + | ...
>> +test_run:wait_cond(function() return f:status() == 'dead' end)
>> + | ---
>> + | - true
>> + | ...
>> +ok, err
>> + | ---
>> + | - false
>> + | - Quorum collection for a synchronous transaction is timed out
>> + | ...
>> +test_run:wait_cond(function() return box.space.sync:get{9} == nil end)
>> + | ---
>> + | - true
>> + | ...
>> +
>> +-- Note: cluster may be in a broken state here due to nature of previous test.
>> +
>>   -- Cleanup.
>>   test_run:cmd('switch default')
>>    | ---
>> diff --git a/test/replication/qsync_basic.test.lua b/test/replication/qsync_basic.test.lua
>> index 94235547d..523dfa779 100644
>> --- a/test/replication/qsync_basic.test.lua
>> +++ b/test/replication/qsync_basic.test.lua
>> @@ -13,6 +13,11 @@ s1:select{}
>>   s2 = box.schema.create_space('test2')
>>   s2.is_sync
>>
>> +--
>> +-- gh-4849: clear synchro queue with unconfigured box
>> +--
>> +box.ctl.clear_synchro_queue()
>> +
>>   -- Net.box takes sync into account.
>>   box.schema.user.grant('guest', 'super')
>>   netbox = require('net.box')
>> @@ -248,6 +253,51 @@ for i = 1, 100 do box.space.sync:delete{i} end
>>   test_run:cmd('switch replica')
>>   box.space.sync:count()
>>
>> +--
>> +-- gh-4849: clear synchro queue on a master
>> +--
>> +test_run:switch('default')
>> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
>> +ok, err = nil
>> +f = fiber.create(function()                            \
>> +    ok, err = pcall(box.space.sync.insert, box.space.sync, {10})        \
>> +end)
>> +f:status()
>> +test_run:switch('replica')
>> +test_run:wait_cond(function() return box.space.sync:get{10} ~= nil end)
>> +test_run:switch('default')
>> +box.cfg{replication_synchro_timeout = 0.1}
>> +box.ctl.clear_synchro_queue()
>> +test_run:switch('replica')
>> +test_run:wait_cond(function() return box.space.sync:get{10} == nil end)
>> +test_run:switch('default')
>> +test_run:wait_cond(function() return f:status() == 'dead' end)
>> +ok, err
>> +test_run:wait_cond(function() return box.space.sync:get{10} == nil end)
>> +
>> +--
>> +-- gh-4849: clear synchro queue on a replica, make sure no crashes
>> +--
>> +test_run:switch('default')
>> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout = 1000}
>> +ok, err = nil
>> +f = fiber.create(function() \
>> +    ok, err = pcall(box.space.sync.insert, box.space.sync, 
>> {9})                 \
>> +end)
>> +f.status()
>> +test_run:wait_cond(function() return box.space.sync:get{9} ~= nil end)
>> +test_run:switch('replica')
>> +box.cfg{replication_synchro_quorum = 3, 
>> replication_synchro_timeout=0.01}
>> +box.ctl.clear_synchro_queue()
>> +test_run:wait_cond(function() return box.space.sync:get{9} == nil end)
>> +test_run:switch('default')
>> +box.cfg{replication_synchro_timeout=0.01}
>> +test_run:wait_cond(function() return f:status() == 'dead' end)
>> +ok, err
>> +test_run:wait_cond(function() return box.space.sync:get{9} == nil end)
>> +
>> +-- Note: cluster may be in a broken state here due to nature of 
>> previous test.
>> +
>>   -- Cleanup.
>>   test_run:cmd('switch default')
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/3 v2] replication: add test with random leaders promotion and demotion
  2020-11-16  9:10     ` Sergey Bronnikov
@ 2020-11-16 14:00       ` Sergey Bronnikov
  2020-11-16 14:48         ` Serge Petrenko
  0 siblings, 1 reply; 14+ messages in thread
From: Sergey Bronnikov @ 2020-11-16 14:00 UTC (permalink / raw)
  To: tarantool-patches, Serge Petrenko, Vladislav Shpilevoy


On 16.11.2020 12:10, Sergey Bronnikov wrote:
>
>  test_run:switch('qsync1')
> -box.space.sync:count() -- 300
> +box.space.sync:count() -- 200
>
Sometimes this statement fails with:

[007] replication/qsync_random_leader.test.lua memtx           [ fail ]
[007]
[007] Test failed! Result content mismatch:
[007] --- replication/qsync_random_leader.result        Mon Nov 16 08:41:46 2020
[007] +++ /home/s.bronnikov/work/tarantool/build/test/var/rejects/replication/qsync_random_leader.reject Mon Nov 16 09:57:34 2020
[007] @@ -128,7 +128,7 @@
[007]   | ...
[007]  box.space.sync:count() -- 200
[007]   | ---
[007] - | - 200
[007] + | - 199
[007]   | ...
[007]
[007]  -- Teardown.

so I replaced it with wait_cond():

--- a/test/replication/qsync_random_leader.test.lua
+++ b/test/replication/qsync_random_leader.test.lua
@@ -63,8 +63,8 @@ for i=1,200 do \
      current_leader_id = new_leader_id                                          \
  end

-test_run:switch('qsync1')
-box.space.sync:count() -- 200
+test_run:wait_cond(function() return test_run:eval('qsync1', \
+                   ("box.space.sync:count()")) == 200 end)  \

  -- Teardown.
  test_run:switch('default')
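
For illustration, the difference between a one-shot check and a polling check
can be sketched in plain Lua. This is only a sketch under stated assumptions:
the local wait_cond() below is a hypothetical stand-in, not the test-run
implementation, and count() simulates a replica that is one row behind at the
first read. The busy-wait is there only to keep the example free of Tarantool
dependencies; a real test would yield with fiber.sleep() instead.

```lua
-- Hypothetical stand-in for test_run:wait_cond(); not the test-run code.
-- Polls cond() until it returns a truthy value or `timeout` seconds pass.
local function wait_cond(cond, timeout, delay)
    timeout = timeout or 1
    delay = delay or 0.001
    local deadline = os.clock() + timeout
    while true do
        local res = cond()
        if res then
            return res
        end
        if os.clock() >= deadline then
            return false
        end
        -- Busy-wait between polls (assumption: no fiber scheduler here).
        local resume_at = os.clock() + delay
        while os.clock() < resume_at do end
    end
end

-- Simulated replica: the last row arrives only after the first read.
local applied = 199
local function count()
    local seen = applied
    if applied < 200 then
        applied = applied + 1
    end
    return seen
end

-- A single read may observe the row count before replication catches up.
local one_shot = count()
-- Polling keeps checking until the expected count is reached.
local ok = wait_cond(function() return count() == 200 end)
```

The one-shot read corresponds to the bare box.space.sync:count() line in the
.result file; the polling call tolerates a replica that is briefly behind.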

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH 1/3 v2] replication: test clear_synchro_queue() function
  2020-11-16 11:11     ` Sergey Bronnikov
@ 2020-11-16 14:44       ` Serge Petrenko
  0 siblings, 0 replies; 14+ messages in thread
From: Serge Petrenko @ 2020-11-16 14:44 UTC (permalink / raw)
  To: Sergey Bronnikov, tarantool-patches, v.shpilevoy


16.11.2020 14:11, Sergey Bronnikov wrote:
> Hi, Serge!
>
> thanks for review!

Thanks for the fixes!

LGTM.

>
> On 13.11.2020 16:12, Serge Petrenko wrote:
>>
>> 12.11.2020 12:10, sergeyb@tarantool.org wrote:
>>> From: Sergey Bronnikov <sergeyb@tarantool.org>
>>>
>>> Part of #5055
>>> Part of #4849
>>
>> Hi! Thanks for the patch!
>>
>> Please see 2 comments below.
>>
>>> ---
>>>   test/replication/qsync_basic.result   | 136 
>>> ++++++++++++++++++++++++++
>>>   test/replication/qsync_basic.test.lua |  50 ++++++++++
>>>   2 files changed, 186 insertions(+)
>>>
>>> diff --git a/test/replication/qsync_basic.result 
>>> b/test/replication/qsync_basic.result
>>> index bd3c3cce1..e848e305a 100644
>>> --- a/test/replication/qsync_basic.result
>>> +++ b/test/replication/qsync_basic.result
>>> @@ -32,6 +32,13 @@ s2.is_sync
>>>    | - false
>>>    | ...
>>>   +--
>>> +-- gh-4849: clear synchro queue with unconfigured box
>>> +--
>>
>> What do you mean by `unconfigured box`?
>>
>> box.cfg is called when the default instance (master.lua) is started,
>> before the first test line.
>
> Missed it.
>
>>
>> If you really want to test `box.ctl.clear_synchro_queue` before box.cfg,
>> you need to do it in `app-tap/cfg.test.lua`. It has all the tests for
>> `box.ctl` options.
>
> Removed the test case from qsync_basic.test.lua and added it to
> app-tap/cfg.test.lua:
>
> --- a/test/replication/qsync_basic.test.lua
> +++ b/test/replication/qsync_basic.test.lua
> @@ -13,11 +13,6 @@ s1:select{}
>  s2 = box.schema.create_space('test2')
>  s2.is_sync
>
> ---
> --- gh-4849: clear synchro queue with unconfigured box
> ---
> -box.ctl.clear_synchro_queue()
> -
>  -- Net.box takes sync into account.
>  box.schema.user.grant('guest', 'super')
>  netbox = require('net.box')
>
> --- a/test/app-tap/cfg.test.lua
> +++ b/test/app-tap/cfg.test.lua
> @@ -3,7 +3,13 @@ local fiber = require('fiber')
>  local tap = require('tap')
>  local test = tap.test("cfg")
>
> -test:plan(11)
> +test:plan(12)
> +
> +--
> +-- gh-4849: clear synchro queue is null with unconfigured box
> +--
> +local ok, err = pcall(box.ctl.clear_synchro_queue(), nil)
> +test:isnil(ok, 'execute clear_synchro_queue with unconfigured box')
>
>  --
>  -- gh-4282: box.cfg should not allow nor just ignore nil UUID.
>
>>> +box.ctl.clear_synchro_queue()
>>> + | ---
>>> + | ...
>>> +
>>>   -- Net.box takes sync into account.
>>>   box.schema.user.grant('guest', 'super')
>>>    | ---
>>> @@ -637,6 +644,135 @@ box.space.sync:count()
>>>    | - 0
>>>    | ...
>>>   +--
>>> +-- gh-4849: clear synchro queue on a master
>>> +--
>>> +test_run:switch('default')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout 
>>> = 1000}
>>> + | ---
>>> + | ...
>>> +ok, err = nil
>>> + | ---
>>> + | ...
>>> +f = fiber.create(function()                            \
>>> +    ok, err = pcall(box.space.sync.insert, box.space.sync, 
>>> {10})        \
>>> +end)
>>> + | ---
>>> + | ...
>>> +f:status()
>>> + | ---
>>> + | - suspended
>>> + | ...
>>> +test_run:switch('replica')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:wait_cond(function() return box.space.sync:get{10} ~= nil 
>>> end)
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:switch('default')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +box.cfg{replication_synchro_timeout = 0.1}
>>> + | ---
>>> + | ...
>>> +box.ctl.clear_synchro_queue()
>>> + | ---
>>> + | ...
>>> +test_run:switch('replica')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:wait_cond(function() return box.space.sync:get{10} == nil 
>>> end)
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:switch('default')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:wait_cond(function() return f:status() == 'dead' end)
>>> + | ---
>>> + | - true
>>> + | ...
>>> +ok, err
>>> + | ---
>>> + | - false
>>> + | - Quorum collection for a synchronous transaction is timed out
>>> + | ...
>>> +test_run:wait_cond(function() return box.space.sync:get{10} == nil 
>>> end)
>>> + | ---
>>> + | - true
>>> + | ...
>>> +
>>> +--
>>> +-- gh-4849: clear synchro queue on a replica, make sure no crashes
>>> +--
>>> +test_run:switch('default')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout 
>>> = 1000}
>>> + | ---
>>> + | ...
>>> +ok, err = nil
>>> + | ---
>>> + | ...
>>> +f = fiber.create(function() \
>>> +    ok, err = pcall(box.space.sync.insert, box.space.sync, 
>>> {9})                 \
>>> +end)
>>> + | ---
>>> + | ...
>>> +f.status()
>>> + | ---
>>> + | - running
>>> + | ...
>>
>>
>> typo: should be `f:status()`
>
> Updated:
>
> --- a/test/replication/qsync_basic.test.lua
> +++ b/test/replication/qsync_basic.test.lua
> @@ -284,7 +284,7 @@ ok, err = nil
>  f = fiber.create(function() \
>      ok, err = pcall(box.space.sync.insert, box.space.sync, {9})                 \
>  end)
> -f.status()
> +f:status()
>  test_run:wait_cond(function() return box.space.sync:get{9} ~= nil end)
>  test_run:switch('replica')
>  box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout=0.01}
>
>>
>>
>>> +test_run:wait_cond(function() return box.space.sync:get{9} ~= nil end)
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:switch('replica')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +box.cfg{replication_synchro_quorum = 3, 
>>> replication_synchro_timeout=0.01}
>>> + | ---
>>> + | ...
>>> +box.ctl.clear_synchro_queue()
>>> + | ---
>>> + | ...
>>> +test_run:wait_cond(function() return box.space.sync:get{9} == nil end)
>>> + | ---
>>> + | - true
>>> + | ...
>>> +test_run:switch('default')
>>> + | ---
>>> + | - true
>>> + | ...
>>> +box.cfg{replication_synchro_timeout=0.01}
>>> + | ---
>>> + | ...
>>> +test_run:wait_cond(function() return f:status() == 'dead' end)
>>> + | ---
>>> + | - true
>>> + | ...
>>> +ok, err
>>> + | ---
>>> + | - false
>>> + | - Quorum collection for a synchronous transaction is timed out
>>> + | ...
>>> +test_run:wait_cond(function() return box.space.sync:get{9} == nil end)
>>> + | ---
>>> + | - true
>>> + | ...
>>> +
>>> +-- Note: cluster may be in a broken state here due to nature of 
>>> previous test.
>>> +
>>>   -- Cleanup.
>>>   test_run:cmd('switch default')
>>>    | ---
>>> diff --git a/test/replication/qsync_basic.test.lua 
>>> b/test/replication/qsync_basic.test.lua
>>> index 94235547d..523dfa779 100644
>>> --- a/test/replication/qsync_basic.test.lua
>>> +++ b/test/replication/qsync_basic.test.lua
>>> @@ -13,6 +13,11 @@ s1:select{}
>>>   s2 = box.schema.create_space('test2')
>>>   s2.is_sync
>>>   +--
>>> +-- gh-4849: clear synchro queue with unconfigured box
>>> +--
>>> +box.ctl.clear_synchro_queue()
>>> +
>>>   -- Net.box takes sync into account.
>>>   box.schema.user.grant('guest', 'super')
>>>   netbox = require('net.box')
>>> @@ -248,6 +253,51 @@ for i = 1, 100 do box.space.sync:delete{i} end
>>>   test_run:cmd('switch replica')
>>>   box.space.sync:count()
>>>   +--
>>> +-- gh-4849: clear synchro queue on a master
>>> +--
>>> +test_run:switch('default')
>>> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout 
>>> = 1000}
>>> +ok, err = nil
>>> +f = fiber.create(function()                            \
>>> +    ok, err = pcall(box.space.sync.insert, box.space.sync, 
>>> {10})        \
>>> +end)
>>> +f:status()
>>> +test_run:switch('replica')
>>> +test_run:wait_cond(function() return box.space.sync:get{10} ~= nil 
>>> end)
>>> +test_run:switch('default')
>>> +box.cfg{replication_synchro_timeout = 0.1}
>>> +box.ctl.clear_synchro_queue()
>>> +test_run:switch('replica')
>>> +test_run:wait_cond(function() return box.space.sync:get{10} == nil 
>>> end)
>>> +test_run:switch('default')
>>> +test_run:wait_cond(function() return f:status() == 'dead' end)
>>> +ok, err
>>> +test_run:wait_cond(function() return box.space.sync:get{10} == nil 
>>> end)
>>> +
>>> +--
>>> +-- gh-4849: clear synchro queue on a replica, make sure no crashes
>>> +--
>>> +test_run:switch('default')
>>> +box.cfg{replication_synchro_quorum = 3, replication_synchro_timeout 
>>> = 1000}
>>> +ok, err = nil
>>> +f = fiber.create(function() \
>>> +    ok, err = pcall(box.space.sync.insert, box.space.sync, 
>>> {9})                 \
>>> +end)
>>> +f.status()
>>> +test_run:wait_cond(function() return box.space.sync:get{9} ~= nil end)
>>> +test_run:switch('replica')
>>> +box.cfg{replication_synchro_quorum = 3, 
>>> replication_synchro_timeout=0.01}
>>> +box.ctl.clear_synchro_queue()
>>> +test_run:wait_cond(function() return box.space.sync:get{9} == nil end)
>>> +test_run:switch('default')
>>> +box.cfg{replication_synchro_timeout=0.01}
>>> +test_run:wait_cond(function() return f:status() == 'dead' end)
>>> +ok, err
>>> +test_run:wait_cond(function() return box.space.sync:get{9} == nil end)
>>> +
>>> +-- Note: cluster may be in a broken state here due to nature of 
>>> previous test.
>>> +
>>>   -- Cleanup.
>>>   test_run:cmd('switch default')
>>
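
Two Lua call idioms come up in the hunks quoted above: the `f.status()` vs
`f:status()` typo, and the arguments passed to pcall() in the cfg.test.lua
hunk. A minimal plain-Lua sketch of both follows. The Fiber table is a
hypothetical stand-in, not the Tarantool fiber API; the fallback to a
"current" object is an assumption chosen to mirror the `running` result
recorded for the dot call in the .result file.

```lua
-- Hypothetical stand-in object; not the Tarantool fiber API.
local Fiber = {}
Fiber.__index = Fiber

-- Assumed "current" object that is reported when no receiver is passed.
local current = setmetatable({ state = 'running' }, Fiber)

function Fiber.new(state)
    return setmetatable({ state = state }, Fiber)
end

function Fiber.status(self)
    return (self or current).state
end

local f = Fiber.new('suspended')
local with_colon = f:status()  -- f is passed as self
local with_dot = f.status()    -- no receiver is passed at all

-- pcall() takes the function itself, followed by its arguments.
local function fails(msg)
    error(msg, 0)  -- level 0: no position prefix in the message
end
local ok, err = pcall(fails, 'boom')

-- Calling the function first hands its result (here nil) to pcall().
local function returns_nothing() end
local ok2 = pcall(returns_nothing(), nil)
```

In the second pcall() form, any error raised by the inner call would escape
before pcall() starts, and pcall() itself then reports a failed attempt to
call a non-function value.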
-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/3 v2] replication: add test with random leaders promotion and demotion
  2020-11-16 14:00       ` Sergey Bronnikov
@ 2020-11-16 14:48         ` Serge Petrenko
  2020-11-16 14:53           ` Serge Petrenko
  0 siblings, 1 reply; 14+ messages in thread
From: Serge Petrenko @ 2020-11-16 14:48 UTC (permalink / raw)
  To: Sergey Bronnikov, tarantool-patches, Vladislav Shpilevoy

Thanks for the fixes!

The patch LGTM.

16.11.2020 17:00, Sergey Bronnikov wrote:
>
> On 16.11.2020 12:10, Sergey Bronnikov wrote:
>>
>>  test_run:switch('qsync1')
>> -box.space.sync:count() -- 300
>> +box.space.sync:count() -- 200
>>
> sometimes this statement failed with:
>
> [007] replication/qsync_random_leader.test.lua memtx           [ fail ]
> [007]
> [007] Test failed! Result content mismatch:
> [007] --- replication/qsync_random_leader.result        Mon Nov 16 
> 08:41:46 2020
> [007] +++ 
> /home/s.bronnikov/work/tarantool/build/test/var/rejects/replication/qsync_random_leader.reject 
> Mon Nov 16 09:57:34 2020
> [007] @@ -128,7 +128,7 @@
> [007]   | ...
> [007]  box.space.sync:count() -- 200
> [007]   | ---
> [007] - | - 200
> [007] + | - 199
> [007]   | ...
> [007]
> [007]  -- Teardown.
>
> so I replace it with wait_cond():
>
> --- a/test/replication/qsync_random_leader.test.lua
> +++ b/test/replication/qsync_random_leader.test.lua
> @@ -63,8 +63,8 @@ for i=1,200 do \
>      current_leader_id = 
> new_leader_id                                          \
>  end
>
> -test_run:switch('qsync1')
> -box.space.sync:count() -- 200
> +test_run:wait_cond(function() return test_run:eval('qsync1', \
> +                   ("box.space.sync:count()")) == 200 end)  \
>
>  -- Teardown.
>  test_run:switch('default')
>
-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/3 v2] replication: add test with random leaders promotion and demotion
  2020-11-16 14:48         ` Serge Petrenko
@ 2020-11-16 14:53           ` Serge Petrenko
  2020-11-16 16:04             ` Sergey Bronnikov
  0 siblings, 1 reply; 14+ messages in thread
From: Serge Petrenko @ 2020-11-16 14:53 UTC (permalink / raw)
  To: Sergey Bronnikov, tarantool-patches, Vladislav Shpilevoy


16.11.2020 17:48, Serge Petrenko wrote:
> Thanks for the fixes!
>
> The patch LGTM.
>
One side note.

The test runs for 60 seconds on my laptop now.

We should either add it to `long_run` in `replication/suite.ini` or
reduce the number of iterations to something more reasonable, say 20 or 30.

I personally prefer the latter, if you don't have any objections.

> 16.11.2020 17:00, Sergey Bronnikov wrote:
>>
>> On 16.11.2020 12:10, Sergey Bronnikov wrote:
>>>
>>>  test_run:switch('qsync1')
>>> -box.space.sync:count() -- 300
>>> +box.space.sync:count() -- 200
>>>
>> sometimes this statement failed with:
>>
>> [007] replication/qsync_random_leader.test.lua memtx [ fail ]
>> [007]
>> [007] Test failed! Result content mismatch:
>> [007] --- replication/qsync_random_leader.result        Mon Nov 16 
>> 08:41:46 2020
>> [007] +++ 
>> /home/s.bronnikov/work/tarantool/build/test/var/rejects/replication/qsync_random_leader.reject 
>> Mon Nov 16 09:57:34 2020
>> [007] @@ -128,7 +128,7 @@
>> [007]   | ...
>> [007]  box.space.sync:count() -- 200
>> [007]   | ---
>> [007] - | - 200
>> [007] + | - 199
>> [007]   | ...
>> [007]
>> [007]  -- Teardown.
>>
>> so I replace it with wait_cond():
>>
>> --- a/test/replication/qsync_random_leader.test.lua
>> +++ b/test/replication/qsync_random_leader.test.lua
>> @@ -63,8 +63,8 @@ for i=1,200 do \
>>      current_leader_id = 
>> new_leader_id                                          \
>>  end
>>
>> -test_run:switch('qsync1')
>> -box.space.sync:count() -- 200
>> +test_run:wait_cond(function() return test_run:eval('qsync1', \
>> +                   ("box.space.sync:count()")) == 200 end)  \
>>
>>  -- Teardown.
>>  test_run:switch('default')
>>
-- 
Serge Petrenko

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Tarantool-patches] [PATCH 2/3 v2] replication: add test with random leaders promotion and demotion
  2020-11-16 14:53           ` Serge Petrenko
@ 2020-11-16 16:04             ` Sergey Bronnikov
  0 siblings, 0 replies; 14+ messages in thread
From: Sergey Bronnikov @ 2020-11-16 16:04 UTC (permalink / raw)
  To: Serge Petrenko, tarantool-patches, Vladislav Shpilevoy


On 16.11.2020 17:53, Serge Petrenko wrote:
>
> 16.11.2020 17:48, Serge Petrenko wrote:
>> Thanks for the fixes!
>>
>> The patch LGTM.
>>
> One side note.
>
> The test runs for 60 seconds on my laptop now.
>
> We should either add it to `long_run` in `replication/suite.ini` or
> reduce the number of iterations to something more reasonable, say 20 or 30.
>
> I personally prefer the latter, if you don't have any objections.
>
OK, I reduced the number of iterations to 30.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-11-16 16:04 UTC | newest]

Thread overview: 14+ messages
2020-11-12  9:10 [Tarantool-patches] [PATCH 0/3 v2] Additional qsync tests sergeyb
2020-11-12  9:10 ` [Tarantool-patches] [PATCH 1/3 v2] replication: test clear_synchro_queue() function sergeyb
2020-11-13 13:12   ` Serge Petrenko
2020-11-16 11:11     ` Sergey Bronnikov
2020-11-16 14:44       ` Serge Petrenko
2020-11-12  9:10 ` [Tarantool-patches] [PATCH 2/3 v2] replication: add test with random leaders promotion and demotion sergeyb
2020-11-13 15:10   ` Serge Petrenko
2020-11-16  9:10     ` Sergey Bronnikov
2020-11-16 14:00       ` Sergey Bronnikov
2020-11-16 14:48         ` Serge Petrenko
2020-11-16 14:53           ` Serge Petrenko
2020-11-16 16:04             ` Sergey Bronnikov
2020-11-12  9:11 ` [Tarantool-patches] [PATCH 3/3 v2] replication: add test with change space sync mode in a loop sergeyb
2020-11-13 14:05   ` Serge Petrenko
