Tarantool development patches archive
* [PATCH] test: enable parallel mode for replication tests
@ 2018-09-27 15:38 Sergei Voronezhskii
  2018-10-01  1:36 ` Alexander Turenko
  0 siblings, 1 reply; 13+ messages in thread
From: Sergei Voronezhskii @ 2018-09-27 15:38 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Vladimir Davydov, Alexander Turenko

- need more sleeps and longer timeouts because of the increased I/O load
- at the end of tests which create any replication configuration, need to call
`test_run:cleanup_cluster()`, which clears `box.space._cluster`
- switch on `use_unix_sockets` because of 'Address already in use'
problems
- instead of just checking `box.space.test:count()` or
`#fio.glob('./master/*.xlog')`, need to wait for the values because of the
increased load in the replication process
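
The pattern applied throughout the patch — polling for a condition instead of asserting it once — can be sketched as a small helper (hypothetical, not part of this patch; assumes Tarantool's `fiber` module is available):

```lua
-- Hypothetical wait_cond helper illustrating the polling pattern used in
-- this patch: retry a predicate until it holds or a deadline passes.
local fiber = require('fiber')

local function wait_cond(cond, timeout)
    local deadline = fiber.clock() + (timeout or 10)
    while not cond() do
        if fiber.clock() > deadline then
            error('wait_cond: timed out')
        end
        fiber.sleep(0.001)
    end
end

-- Instead of a one-shot check like `#fio.glob('./master/*.xlog') == 2`:
-- wait_cond(function() return #fio.glob('./master/*.xlog') == 2 end)
```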

Part of #2436, #3232
---
BRANCH: https://github.com/tarantool/tarantool/tree/sergw/enable-parallel-test-replication
 test/replication/before_replace.result      |  9 ++-
 test/replication/before_replace.test.lua    |  7 ++-
 test/replication/catch.result               | 31 ++++++----
 test/replication/catch.test.lua             | 14 +++--
 test/replication/gc.result                  | 18 +++---
 test/replication/gc.test.lua                | 12 ++--
 test/replication/local_spaces.result        |  3 +
 test/replication/local_spaces.test.lua      |  1 +
 test/replication/misc.result                | 65 +++++++++++++++------
 test/replication/misc.test.lua              | 43 ++++++++------
 test/replication/on_replace.result          | 13 +++++
 test/replication/on_replace.test.lua        |  4 ++
 test/replication/once.result                |  2 +-
 test/replication/once.test.lua              |  2 +-
 test/replication/quorum.result              |  9 ++-
 test/replication/quorum.test.lua            |  7 ++-
 test/replication/replica_rejoin.result      |  7 +++
 test/replication/replica_rejoin.test.lua    |  2 +
 test/replication/skip_conflict_row.result   |  7 +++
 test/replication/skip_conflict_row.test.lua |  2 +
 test/replication/status.result              |  7 +++
 test/replication/status.test.lua            |  2 +
 test/replication/suite.ini                  |  3 +-
 test/replication/sync.result                |  7 +++
 test/replication/sync.test.lua              |  2 +
 test/replication/wal_off.result             |  7 +++
 test/replication/wal_off.test.lua           |  2 +
 27 files changed, 209 insertions(+), 79 deletions(-)

diff --git a/test/replication/before_replace.result b/test/replication/before_replace.result
index 858a52de6..a17afb4a3 100644
--- a/test/replication/before_replace.result
+++ b/test/replication/before_replace.result
@@ -76,7 +76,7 @@ test_run:cmd("switch autobootstrap1")
 ---
 - true
 ...
-box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.01)
+box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
 ---
 - ok
 ...
@@ -87,7 +87,7 @@ test_run:cmd("switch autobootstrap2")
 ---
 - true
 ...
-box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.01)
+box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
 ---
 - ok
 ...
@@ -98,7 +98,7 @@ test_run:cmd("switch autobootstrap3")
 ---
 - true
 ...
-box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.01)
+box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
 ---
 - ok
 ...
@@ -223,3 +223,6 @@ test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
 ---
 ...
+test_run:cleanup_cluster()
+---
+...
diff --git a/test/replication/before_replace.test.lua b/test/replication/before_replace.test.lua
index f1e590703..d86f74b05 100644
--- a/test/replication/before_replace.test.lua
+++ b/test/replication/before_replace.test.lua
@@ -46,13 +46,13 @@ test_run:cmd("setopt delimiter ''");
 -- Stall replication and generate incompatible data
 -- on the replicas.
 test_run:cmd("switch autobootstrap1")
-box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.01)
+box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
 for i = 1, 10 do box.space.test:replace{i, i % 3 == 1 and i * 10 or i} end
 test_run:cmd("switch autobootstrap2")
-box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.01)
+box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
 for i = 1, 10 do box.space.test:replace{i, i % 3 == 2 and i * 10 or i} end
 test_run:cmd("switch autobootstrap3")
-box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.01)
+box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
 for i = 1, 10 do box.space.test:replace{i, i % 3 == 0 and i * 10 or i} end
 
 -- Synchronize.
@@ -80,3 +80,4 @@ box.space.test:select()
 -- Cleanup.
 test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
+test_run:cleanup_cluster()
diff --git a/test/replication/catch.result b/test/replication/catch.result
index aebba819f..c0b9bfb19 100644
--- a/test/replication/catch.result
+++ b/test/replication/catch.result
@@ -1,6 +1,9 @@
 env = require('test_run')
 ---
 ...
+fiber = require('fiber')
+---
+...
 test_run = env.new()
 ---
 ...
@@ -58,11 +61,11 @@ test_run:cmd("stop server replica")
 - true
 ...
 -- insert values on the master while replica is stopped and can't fetch them
-for i=1,100 do s:insert{i, 'this is test message12345'} end
+for i=1,1000 do s:insert{i, 'this is test message12345'} end
 ---
 ...
 -- sleep after every tuple
-errinj.set("ERRINJ_RELAY_TIMEOUT", 1000.0)
+errinj.set("ERRINJ_RELAY_TIMEOUT", 2.0)
 ---
 - ok
 ...
@@ -74,6 +77,10 @@ test_run:cmd("switch replica")
 ---
 - true
 ...
+-- After the server restart the 'fiber' variable is not declared, require it again
+fiber = require('fiber')
+---
+...
 -- Check that replica doesn't enter read-write mode before
 -- catching up with the master: to check that we inject sleep into
 -- the master relay_send function and attempt a data modifying
@@ -88,13 +95,12 @@ box.space.test ~= nil
 ---
 - true
 ...
-d = box.space.test:delete{1}
+repeat fiber.sleep(0.001) until box.space.test:get(1) ~= nil
 ---
-- error: Can't modify data because this instance is in read-only mode.
 ...
-box.space.test:get(1) ~= nil
+d = box.space.test:delete{1}
 ---
-- true
+- error: Can't modify data because this instance is in read-only mode.
 ...
 -- case #2: delete tuple by net.box
 test_run:cmd("switch default")
@@ -108,19 +114,21 @@ test_run:cmd("set variable r_uri to 'replica.listen'")
 c = net_box.connect(r_uri)
 ---
 ...
-d = c.space.test:delete{1}
+repeat fiber.sleep(0.001) until c.space.test:get(1) ~= nil
 ---
-- error: Can't modify data because this instance is in read-only mode.
 ...
-c.space.test:get(1) ~= nil
+d = c.space.test:delete{1}
 ---
-- true
+- error: Can't modify data because this instance is in read-only mode.
 ...
 -- check sync
 errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
 ---
 - ok
 ...
+fiber.sleep(2.0) -- wait until the errinj sleep is released
+---
+...
 -- cleanup
 test_run:cmd("stop server replica")
 ---
@@ -130,6 +138,9 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
index 8cc3242f7..68cca831e 100644
--- a/test/replication/catch.test.lua
+++ b/test/replication/catch.test.lua
@@ -1,4 +1,5 @@
 env = require('test_run')
+fiber = require('fiber')
 test_run = env.new()
 engine = test_run:get_cfg('engine')
 
@@ -23,13 +24,15 @@ test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 
 -- insert values on the master while replica is stopped and can't fetch them
-for i=1,100 do s:insert{i, 'this is test message12345'} end
+for i=1,1000 do s:insert{i, 'this is test message12345'} end
 
 -- sleep after every tuple
-errinj.set("ERRINJ_RELAY_TIMEOUT", 1000.0)
+errinj.set("ERRINJ_RELAY_TIMEOUT", 2.0)
 
 test_run:cmd("start server replica with args='0.01'")
 test_run:cmd("switch replica")
+-- After the server restart the 'fiber' variable is not declared, require it again
+fiber = require('fiber')
 
 -- Check that replica doesn't enter read-write mode before
 -- catching up with the master: to check that we inject sleep into
@@ -42,23 +45,26 @@ test_run:cmd("switch replica")
 -- #1: delete tuple on replica
 --
 box.space.test ~= nil
+repeat fiber.sleep(0.001) until box.space.test:get(1) ~= nil
 d = box.space.test:delete{1}
-box.space.test:get(1) ~= nil
 
 -- case #2: delete tuple by net.box
 
 test_run:cmd("switch default")
 test_run:cmd("set variable r_uri to 'replica.listen'")
 c = net_box.connect(r_uri)
+repeat fiber.sleep(0.001) until c.space.test:get(1) ~= nil
 d = c.space.test:delete{1}
-c.space.test:get(1) ~= nil
 
 -- check sync
 errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
+fiber.sleep(2.0) -- wait until the errinj sleep is released
+
 
 -- cleanup
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
 
diff --git a/test/replication/gc.result b/test/replication/gc.result
index 83d0de293..2639c4cf2 100644
--- a/test/replication/gc.result
+++ b/test/replication/gc.result
@@ -1,6 +1,3 @@
-fio = require 'fio'
----
-...
 test_run = require('test_run').new()
 ---
 ...
@@ -13,6 +10,9 @@ replica_set = require('fast_replica')
 fiber = require('fiber')
 ---
 ...
+fio = require('fio')
+---
+...
 test_run:cleanup_cluster()
 ---
 ...
@@ -95,7 +95,7 @@ test_run:cmd("switch replica")
 fiber = require('fiber')
 ---
 ...
-while box.space.test:count() < 200 do fiber.sleep(0.01) end
+while box.space.test == nil or box.space.test:count() < 200 do fiber.sleep(0.01) end
 ---
 ...
 box.space.test:count()
@@ -121,7 +121,9 @@ wait_gc(1)
 ...
 -- Make sure the replica will receive data it is subscribed
 -- to long enough for us to invoke garbage collection.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
+-- In parallel test_run mode we need a value as big as 1.0 here,
+-- because the replica receives data faster than xlogs are created.
+box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 1.0)
 ---
 - ok
 ...
@@ -153,9 +155,8 @@ box.snapshot()
 ---
 - true
 ...
-#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
+while #fio.glob('./master/*.xlog') ~= 2 do fiber.sleep(0.01) end
 ---
-- true
 ...
 -- Remove the timeout injection so that the replica catches
 -- up quickly.
@@ -188,9 +189,8 @@ wait_gc(1)
 ---
 - true
 ...
-#fio.glob('./master/*.xlog') == 0 or fio.listdir('./master')
+while #fio.glob('./master/*.xlog') ~= 0 do fiber.sleep(0.01) end
 ---
-- true
 ...
 --
 -- Check that the master doesn't delete xlog files sent to the
diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua
index eed76850c..74151a243 100644
--- a/test/replication/gc.test.lua
+++ b/test/replication/gc.test.lua
@@ -1,8 +1,8 @@
-fio = require 'fio'
 test_run = require('test_run').new()
 engine = test_run:get_cfg('engine')
 replica_set = require('fast_replica')
 fiber = require('fiber')
+fio = require('fio')
 
 test_run:cleanup_cluster()
 
@@ -52,7 +52,7 @@ test_run:cmd("start server replica")
 -- data from the master. Check it.
 test_run:cmd("switch replica")
 fiber = require('fiber')
-while box.space.test:count() < 200 do fiber.sleep(0.01) end
+while box.space.test == nil or box.space.test:count() < 200 do fiber.sleep(0.01) end
 box.space.test:count()
 test_run:cmd("switch default")
 
@@ -63,7 +63,9 @@ wait_gc(1)
 #fio.glob('./master/*.xlog') == 1 or fio.listdir('./master')
 -- Make sure the replica will receive data it is subscribed
 -- to long enough for us to invoke garbage collection.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
+-- In parallel test_run mode we need a value as big as 1.0 here,
+-- because the replica receives data faster than xlogs are created.
+box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 1.0)
 
 -- Send more data to the replica.
 -- Need to do 2 snapshots here, otherwise the replica would
@@ -78,7 +80,7 @@ box.snapshot()
 -- xlogs needed by the replica.
 box.snapshot()
 #box.info.gc().checkpoints == 1 or box.info.gc()
-#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
+while #fio.glob('./master/*.xlog') ~= 2 do fiber.sleep(0.01) end
 
 -- Remove the timeout injection so that the replica catches
 -- up quickly.
@@ -94,7 +96,7 @@ test_run:cmd("switch default")
 -- from the old checkpoint.
 wait_gc(1)
 #box.info.gc().checkpoints == 1 or box.info.gc()
-#fio.glob('./master/*.xlog') == 0 or fio.listdir('./master')
+while #fio.glob('./master/*.xlog') ~= 0 do fiber.sleep(0.01) end
 --
 -- Check that the master doesn't delete xlog files sent to the
 -- replica until it receives a confirmation that the data has
diff --git a/test/replication/local_spaces.result b/test/replication/local_spaces.result
index 151735530..4de223261 100644
--- a/test/replication/local_spaces.result
+++ b/test/replication/local_spaces.result
@@ -216,6 +216,9 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
diff --git a/test/replication/local_spaces.test.lua b/test/replication/local_spaces.test.lua
index 06e2b0bd2..633cc9f1a 100644
--- a/test/replication/local_spaces.test.lua
+++ b/test/replication/local_spaces.test.lua
@@ -76,6 +76,7 @@ box.space.test3:select()
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cleanup_cluster()
 box.schema.user.revoke('guest', 'replication')
 
 s1:select()
diff --git a/test/replication/misc.result b/test/replication/misc.result
index 0ac48ba34..31ac26e48 100644
--- a/test/replication/misc.result
+++ b/test/replication/misc.result
@@ -88,6 +88,13 @@ test_run:cmd('cleanup server test')
 box.cfg{read_only = false}
 ---
 ...
+test_run:cmd('delete server test')
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 -- gh-3160 - Send heartbeats if there are changes from a remote master only
 SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
 ---
@@ -106,7 +113,7 @@ test_run:cmd("switch autobootstrap1")
 test_run = require('test_run').new()
 ---
 ...
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 ---
 ...
 test_run:cmd("switch autobootstrap2")
@@ -116,7 +123,7 @@ test_run:cmd("switch autobootstrap2")
 test_run = require('test_run').new()
 ---
 ...
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 ---
 ...
 test_run:cmd("switch autobootstrap3")
@@ -129,7 +136,7 @@ test_run = require('test_run').new()
 fiber=require('fiber')
 ---
 ...
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 ---
 ...
 _ = box.schema.space.create('test_timeout'):create_index('pk')
@@ -140,15 +147,16 @@ test_run:cmd("setopt delimiter ';'")
 - true
 ...
 function test_timeout()
+    local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
+    local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
     for i = 0, 99 do 
         box.space.test_timeout:replace({1})
-        fiber.sleep(0.005)
-        local rinfo = box.info.replication
-        if rinfo[1].upstream and rinfo[1].upstream.status ~= 'follow' or
-           rinfo[2].upstream and rinfo[2].upstream.status ~= 'follow' or
-           rinfo[3].upstream and rinfo[3].upstream.status ~= 'follow' then
-            return error('Replication broken')
-        end
+        local n = 200
+        repeat
+            fiber.sleep(0.001)
+            n = n - 1
+            if n == 0 then return error(box.info.replication) end
+        until replicaA.status == 'follow' and replicaB.status == 'follow'
     end
     return true
 end ;
@@ -229,6 +237,9 @@ test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
 ---
 ...
+test_run:cleanup_cluster()
+---
+...
 -- gh-3642 - Check that socket file descriptor doesn't leak
 -- when a replica is disconnected.
 rlimit = require('rlimit')
@@ -249,15 +260,15 @@ lim.rlim_cur = 64
 rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
 ---
 ...
-test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"')
+test_run:cmd('create server bork with rpl_master=default, script="replication/replica.lua"')
 ---
 - true
 ...
-test_run:cmd(string.format('start server sock'))
+test_run:cmd('start server bork')
 ---
 - true
 ...
-test_run:cmd('switch sock')
+test_run:cmd('switch bork')
 ---
 - true
 ...
@@ -299,14 +310,21 @@ lim.rlim_cur = old_fno
 rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
 ---
 ...
-test_run:cmd('stop server sock')
+test_run:cmd("stop server bork")
+---
+- true
+...
+test_run:cmd("cleanup server bork")
 ---
 - true
 ...
-test_run:cmd('cleanup server sock')
+test_run:cmd("delete server bork")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
@@ -342,6 +360,17 @@ test_run:cmd('cleanup server er_load2')
 ---
 - true
 ...
+test_run:cmd('delete server er_load1')
+---
+- true
+...
+test_run:cmd('delete server er_load2')
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 --
 -- Test case for gh-3637. Before the fix replica would exit with
 -- an error. Now check that we don't hang and successfully connect.
@@ -349,9 +378,6 @@ test_run:cmd('cleanup server er_load2')
 fiber = require('fiber')
 ---
 ...
-test_run:cleanup_cluster()
----
-...
 test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
 ---
 - true
@@ -391,6 +417,9 @@ test_run:cmd("delete server replica_auth")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.drop('cluster')
 ---
 ...
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index 56e1bab69..21fa1889c 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -32,6 +32,8 @@ test_run:cmd(string.format('start server test with args="%s"', replica_uuid))
 test_run:cmd('stop server test')
 test_run:cmd('cleanup server test')
 box.cfg{read_only = false}
+test_run:cmd('delete server test')
+test_run:cleanup_cluster()
 
 -- gh-3160 - Send heartbeats if there are changes from a remote master only
 SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
@@ -41,26 +43,27 @@ test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 test_run:wait_fullmesh(SERVERS)
 test_run:cmd("switch autobootstrap1")
 test_run = require('test_run').new()
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 test_run:cmd("switch autobootstrap2")
 test_run = require('test_run').new()
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 test_run:cmd("switch autobootstrap3")
 test_run = require('test_run').new()
 fiber=require('fiber')
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 _ = box.schema.space.create('test_timeout'):create_index('pk')
 test_run:cmd("setopt delimiter ';'")
 function test_timeout()
+    local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
+    local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
     for i = 0, 99 do 
         box.space.test_timeout:replace({1})
-        fiber.sleep(0.005)
-        local rinfo = box.info.replication
-        if rinfo[1].upstream and rinfo[1].upstream.status ~= 'follow' or
-           rinfo[2].upstream and rinfo[2].upstream.status ~= 'follow' or
-           rinfo[3].upstream and rinfo[3].upstream.status ~= 'follow' then
-            return error('Replication broken')
-        end
+        local n = 200
+        repeat
+            fiber.sleep(0.001)
+            n = n - 1
+            if n == 0 then return error(box.info.replication) end
+        until replicaA.status == 'follow' and replicaB.status == 'follow'
     end
     return true
 end ;
@@ -89,6 +92,7 @@ box.space.space1:drop()
 
 test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
+test_run:cleanup_cluster()
 
 -- gh-3642 - Check that socket file descriptor doesn't leak
 -- when a replica is disconnected.
@@ -99,9 +103,9 @@ old_fno = lim.rlim_cur
 lim.rlim_cur = 64
 rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
 
-test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"')
-test_run:cmd(string.format('start server sock'))
-test_run:cmd('switch sock')
+test_run:cmd('create server bork with rpl_master=default, script="replication/replica.lua"')
+test_run:cmd('start server bork')
+test_run:cmd('switch bork')
 test_run = require('test_run').new()
 fiber = require('fiber')
 test_run:cmd("setopt delimiter ';'")
@@ -122,8 +126,10 @@ test_run:cmd('switch default')
 lim.rlim_cur = old_fno
 rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
 
-test_run:cmd('stop server sock')
-test_run:cmd('cleanup server sock')
+test_run:cmd("stop server bork")
+test_run:cmd("cleanup server bork")
+test_run:cmd("delete server bork")
+test_run:cleanup_cluster()
 
 box.schema.user.revoke('guest', 'replication')
 
@@ -138,15 +144,15 @@ test_run:cmd('stop server er_load1')
 -- er_load2 exits automatically.
 test_run:cmd('cleanup server er_load1')
 test_run:cmd('cleanup server er_load2')
+test_run:cmd('delete server er_load1')
+test_run:cmd('delete server er_load2')
+test_run:cleanup_cluster()
 
 --
 -- Test case for gh-3637. Before the fix replica would exit with
 -- an error. Now check that we don't hang and successfully connect.
 --
 fiber = require('fiber')
-
-test_run:cleanup_cluster()
-
 test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
 test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.05'")
 -- Wait a bit to make sure replica waits till user is created.
@@ -161,5 +167,6 @@ _ = test_run:wait_vclock('replica_auth', vclock)
 test_run:cmd("stop server replica_auth")
 test_run:cmd("cleanup server replica_auth")
 test_run:cmd("delete server replica_auth")
+test_run:cleanup_cluster()
 
 box.schema.user.drop('cluster')
diff --git a/test/replication/on_replace.result b/test/replication/on_replace.result
index 4ffa3b25a..2e95b90ea 100644
--- a/test/replication/on_replace.result
+++ b/test/replication/on_replace.result
@@ -63,6 +63,9 @@ test_run:cmd("switch replica")
 ---
 - true
 ...
+fiber = require('fiber')
+---
+...
 while box.space.test:count() < 2 do fiber.sleep(0.01) end
 ---
 ...
@@ -88,6 +91,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
@@ -177,3 +187,6 @@ _ = test_run:cmd('switch default')
 test_run:drop_cluster(SERVERS)
 ---
 ...
+test_run:cleanup_cluster()
+---
+...
diff --git a/test/replication/on_replace.test.lua b/test/replication/on_replace.test.lua
index 371b71cbd..e34832103 100644
--- a/test/replication/on_replace.test.lua
+++ b/test/replication/on_replace.test.lua
@@ -26,6 +26,7 @@ session_type
 test_run:cmd("switch default")
 box.space.test:insert{2}
 test_run:cmd("switch replica")
+fiber = require('fiber')
 while box.space.test:count() < 2 do fiber.sleep(0.01) end
 --
 -- applier
@@ -37,6 +38,8 @@ test_run:cmd("switch default")
 --
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
 
@@ -73,3 +76,4 @@ box.space.s2:select()
 
 _ = test_run:cmd('switch default')
 test_run:drop_cluster(SERVERS)
+test_run:cleanup_cluster()
diff --git a/test/replication/once.result b/test/replication/once.result
index 99ac05b72..e6cda0f92 100644
--- a/test/replication/once.result
+++ b/test/replication/once.result
@@ -54,7 +54,7 @@ ch = fiber.channel(1)
 _ = fiber.create(function() box.once("ro", f, 1) ch:put(true) end)
 ---
 ...
-fiber.sleep(0.001)
+fiber.sleep(0.01)
 ---
 ...
 once -- nil
diff --git a/test/replication/once.test.lua b/test/replication/once.test.lua
index 264c63670..275776abc 100644
--- a/test/replication/once.test.lua
+++ b/test/replication/once.test.lua
@@ -19,7 +19,7 @@ once = nil
 box.cfg{read_only = true}
 ch = fiber.channel(1)
 _ = fiber.create(function() box.once("ro", f, 1) ch:put(true) end)
-fiber.sleep(0.001)
+fiber.sleep(0.01)
 once -- nil
 box.cfg{read_only = false}
 ch:get()
diff --git a/test/replication/quorum.result b/test/replication/quorum.result
index 265b099b7..2642fe8f4 100644
--- a/test/replication/quorum.result
+++ b/test/replication/quorum.result
@@ -435,18 +435,21 @@ test_run:cmd('switch default')
 ---
 - true
 ...
-test_run:cmd('stop server replica_quorum')
+test_run:cmd("stop server replica_quorum")
 ---
 - true
 ...
-test_run:cmd('cleanup server replica_quorum')
+test_run:cmd("cleanup server replica_quorum")
 ---
 - true
 ...
-test_run:cmd('delete server replica_quorum')
+test_run:cmd("delete server replica_quorum")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
diff --git a/test/replication/quorum.test.lua b/test/replication/quorum.test.lua
index 5a43275c2..24d1b27c4 100644
--- a/test/replication/quorum.test.lua
+++ b/test/replication/quorum.test.lua
@@ -166,7 +166,8 @@ test_run:cmd('switch replica_quorum')
 box.cfg{replication={INSTANCE_URI, nonexistent_uri(1)}}
 box.info.id
 test_run:cmd('switch default')
-test_run:cmd('stop server replica_quorum')
-test_run:cmd('cleanup server replica_quorum')
-test_run:cmd('delete server replica_quorum')
+test_run:cmd("stop server replica_quorum")
+test_run:cmd("cleanup server replica_quorum")
+test_run:cmd("delete server replica_quorum")
+test_run:cleanup_cluster()
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index 4370fae4b..37849850f 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -242,6 +242,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index f998f60d0..950ec7532 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -87,5 +87,7 @@ box.space.test:select()
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/skip_conflict_row.result b/test/replication/skip_conflict_row.result
index 29963f56a..6ca13b472 100644
--- a/test/replication/skip_conflict_row.result
+++ b/test/replication/skip_conflict_row.result
@@ -91,6 +91,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
diff --git a/test/replication/skip_conflict_row.test.lua b/test/replication/skip_conflict_row.test.lua
index 5f7d6ead3..4406ced95 100644
--- a/test/replication/skip_conflict_row.test.lua
+++ b/test/replication/skip_conflict_row.test.lua
@@ -31,5 +31,7 @@ box.info.status
 -- cleanup
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/status.result b/test/replication/status.result
index 8394b98c1..9e69f2478 100644
--- a/test/replication/status.result
+++ b/test/replication/status.result
@@ -391,3 +391,10 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
diff --git a/test/replication/status.test.lua b/test/replication/status.test.lua
index 8bb25e0c6..cfdf6acdb 100644
--- a/test/replication/status.test.lua
+++ b/test/replication/status.test.lua
@@ -142,3 +142,5 @@ test_run:cmd('switch default')
 box.schema.user.revoke('guest', 'replication')
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index f4abc7af1..5cbc371c2 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -6,5 +6,6 @@ disabled = consistent.test.lua
 release_disabled = catch.test.lua errinj.test.lua gc.test.lua before_replace.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua
 config = suite.cfg
 lua_libs = lua/fast_replica.lua lua/rlimit.lua
+use_unix_sockets = True
 long_run = prune.test.lua
-is_parallel = False
+is_parallel = True
diff --git a/test/replication/sync.result b/test/replication/sync.result
index 81de60758..b2381ac59 100644
--- a/test/replication/sync.result
+++ b/test/replication/sync.result
@@ -303,6 +303,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
diff --git a/test/replication/sync.test.lua b/test/replication/sync.test.lua
index a5cfab8de..51131667d 100644
--- a/test/replication/sync.test.lua
+++ b/test/replication/sync.test.lua
@@ -145,6 +145,8 @@ test_run:grep_log('replica', 'ER_CFG.*')
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/wal_off.result b/test/replication/wal_off.result
index e3b5709e9..e0ae84bd7 100644
--- a/test/replication/wal_off.result
+++ b/test/replication/wal_off.result
@@ -107,6 +107,13 @@ test_run:cmd("cleanup server wal_off")
 ---
 - true
 ...
+test_run:cmd("delete server wal_off")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
diff --git a/test/replication/wal_off.test.lua b/test/replication/wal_off.test.lua
index 81fcf0b33..110f2f1f7 100644
--- a/test/replication/wal_off.test.lua
+++ b/test/replication/wal_off.test.lua
@@ -37,5 +37,7 @@ box.cfg { replication = "" }
 
 test_run:cmd("stop server wal_off")
 test_run:cmd("cleanup server wal_off")
+test_run:cmd("delete server wal_off")
+test_run:cleanup_cluster()
 
 box.schema.user.revoke('guest', 'replication')
-- 
2.18.0


* Re: [PATCH] test: enable parallel mode for replication tests
  2018-09-27 15:38 [PATCH] test: enable parallel mode for replication tests Sergei Voronezhskii
@ 2018-10-01  1:36 ` Alexander Turenko
  2018-10-01 10:41   ` [tarantool-patches] " Alexander Turenko
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Turenko @ 2018-10-01  1:36 UTC (permalink / raw)
  To: Sergei Voronezhskii; +Cc: tarantool-patches, Vladimir Davydov

Hi, Sergei!

Below I reviewed the 'before_replace' and 'catch' tests. I'm going to split the
review into parts so as not to block you. Understanding how replication works, how
the 25 tests work, and proposing the best way to stabilize them is a significant
amount of work for me, so I'll interleave this work with other tasks and will try
to finish within the week.

WBR, Alexander Turenko.

On Thu, Sep 27, 2018 at 06:38:50PM +0300, Sergei Voronezhskii wrote:
> - need more sleeps and longer timeouts because of the increased I/O load
> - at the end of tests which create any replication configuration, need to call
> `test_run:cleanup_cluster()`, which clears `box.space._cluster`
> - switch on `use_unix_sockets` because of 'Address already in use'
> problems

Need to update test-run too. This option is ignored for non-default
servers in the test-run version on your branch. I hit the 'Address already
in use' error locally when testing your branch.
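
For reference, `use_unix_sockets` is a per-suite test-run option; a minimal
sketch of the relevant suite.ini fragment (the other keys shown here are
illustrative, not the exact contents of test/replication/suite.ini):

```ini
[default]
core = tarantool
script = master.lua
# Switch server consoles from TCP to Unix domain sockets so that
# parallel test runs do not compete for TCP ports.
use_unix_sockets = True
```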

> - instead of just checking `box.space.test:count()` or
> `#fio.glob('./master/*.xlog')` need to wait for values because of
> increading load in replication process
> 
> Part of #2436, #3232
> ---
> BRANCH: https://github.com/tarantool/tarantool/tree/sergw/enable-parallel-test-replication
>  test/replication/before_replace.result      |  9 ++-
>  test/replication/before_replace.test.lua    |  7 ++-
>  test/replication/catch.result               | 31 ++++++----
>  test/replication/catch.test.lua             | 14 +++--
>  test/replication/gc.result                  | 18 +++---
>  test/replication/gc.test.lua                | 12 ++--
>  test/replication/local_spaces.result        |  3 +
>  test/replication/local_spaces.test.lua      |  1 +
>  test/replication/misc.result                | 65 +++++++++++++++------
>  test/replication/misc.test.lua              | 43 ++++++++------
>  test/replication/on_replace.result          | 13 +++++
>  test/replication/on_replace.test.lua        |  4 ++
>  test/replication/once.result                |  2 +-
>  test/replication/once.test.lua              |  2 +-
>  test/replication/quorum.result              |  9 ++-
>  test/replication/quorum.test.lua            |  7 ++-
>  test/replication/replica_rejoin.result      |  7 +++
>  test/replication/replica_rejoin.test.lua    |  2 +
>  test/replication/skip_conflict_row.result   |  7 +++
>  test/replication/skip_conflict_row.test.lua |  2 +
>  test/replication/status.result              |  7 +++
>  test/replication/status.test.lua            |  2 +
>  test/replication/suite.ini                  |  3 +-
>  test/replication/sync.result                |  7 +++
>  test/replication/sync.test.lua              |  2 +
>  test/replication/wal_off.result             |  7 +++
>  test/replication/wal_off.test.lua           |  2 +
>  27 files changed, 209 insertions(+), 79 deletions(-)
> 
> diff --git a/test/replication/before_replace.test.lua b/test/replication/before_replace.test.lua
> index f1e590703..d86f74b05 100644
> --- a/test/replication/before_replace.test.lua
> +++ b/test/replication/before_replace.test.lua
> @@ -46,13 +46,13 @@ test_run:cmd("setopt delimiter ''");
>  -- Stall replication and generate incompatible data
>  -- on the replicas.
>  test_run:cmd("switch autobootstrap1")
> -box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.01)
> +box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
>  for i = 1, 10 do box.space.test:replace{i, i % 3 == 1 and i * 10 or i} end
>  test_run:cmd("switch autobootstrap2")
> -box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.01)
> +box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
>  for i = 1, 10 do box.space.test:replace{i, i % 3 == 2 and i * 10 or i} end
>  test_run:cmd("switch autobootstrap3")
> -box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.01)
> +box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
>  for i = 1, 10 do box.space.test:replace{i, i % 3 == 0 and i * 10 or i} end
>  

I have got the following miscompare locally with the new timeout value (0.1).
Don't know whether it matters, but it was obtained on the old test-run (TCP
sockets were used for non-default server consoles).

[001] --- replication/before_replace.result	Fri Sep 28 14:13:57 2018
[001] +++ replication/before_replace.reject	Mon Oct  1 02:21:22 2018
[001] @@ -146,8 +146,8 @@
[001]    - [6, 60]
[001]    - [7, 70]
[001]    - [8, 80]
[001] -  - [9, 90]
[001] -  - [10, 100]
[001] +  - [9, 9]
[001] +  - [10, 10]
[001]  ...
[001]  test_run:cmd("switch autobootstrap2")
[001]  ---

It is the data on the first node after restart. But 0.01 works well for me.
Don't know why, to be honest.

I don't get why ERRINJ_RELAY_TIMEOUT calls fiber_sleep **after** the write
of the xrow (changing that breaks the test too). As I see it, the 'catch' test
exploits this fact (but I rewrote it, see below).

Anyway, I'm tentative about the right implementation, but propose using an
enable/disable switch instead of a timeout, somewhat like so:

diff --git a/src/box/relay.cc b/src/box/relay.cc
index c90383d4a..078273bd4 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -622,12 +622,19 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
 static void
 relay_send(struct relay *relay, struct xrow_header *packet)
 {
+       struct errinj *inj = errinj(ERRINJ_RELAY_STOP_SEND,
+                                   ERRINJ_BOOL);
+       while (inj->bparam) {
+               fiber_sleep(0.01);
+               inj = errinj(ERRINJ_RELAY_STOP_SEND,
+                            ERRINJ_BOOL);
+       }
        packet->sync = relay->sync;
        relay->last_row_tm = ev_monotonic_now(loop());
        coio_write_xrow(&relay->io, packet);
        fiber_gc();
 
-       struct errinj *inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
+       inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
        if (inj != NULL && inj->dparam > 0)
                fiber_sleep(inj->dparam);
 }
diff --git a/src/errinj.h b/src/errinj.h
index 84a1fbb5e..eaac24f5d 100644
--- a/src/errinj.h
+++ b/src/errinj.h
@@ -94,6 +94,7 @@ struct errinj {
        _(ERRINJ_VY_GC, ERRINJ_BOOL, {.bparam = false}) \
        _(ERRINJ_VY_LOG_FLUSH, ERRINJ_BOOL, {.bparam = false}) \
        _(ERRINJ_VY_LOG_FLUSH_DELAY, ERRINJ_BOOL, {.bparam = false}) \
+       _(ERRINJ_RELAY_STOP_SEND, ERRINJ_BOOL, {.bparam = false}) \
        _(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE, {.dparam = 0}) \
        _(ERRINJ_RELAY_REPORT_INTERVAL, ERRINJ_DOUBLE, {.dparam = 0}) \
        _(ERRINJ_RELAY_FINAL_SLEEP, ERRINJ_BOOL, {.bparam = false}) \
diff --git a/test/replication/before_replace.test.lua b/test/replication/before_replace.test.lua
index d86f74b05..bfe3ba9e8 100644
--- a/test/replication/before_replace.test.lua
+++ b/test/replication/before_replace.test.lua
@@ -46,15 +46,23 @@ test_run:cmd("setopt delimiter ''");
 -- Stall replication and generate incompatible data
 -- on the replicas.
 test_run:cmd("switch autobootstrap1")
-box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
+box.error.injection.set('ERRINJ_RELAY_STOP_SEND', true)
 for i = 1, 10 do box.space.test:replace{i, i % 3 == 1 and i * 10 or i} end
 test_run:cmd("switch autobootstrap2")
-box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
+box.error.injection.set('ERRINJ_RELAY_STOP_SEND', true)
 for i = 1, 10 do box.space.test:replace{i, i % 3 == 2 and i * 10 or i} end
 test_run:cmd("switch autobootstrap3")
-box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
+box.error.injection.set('ERRINJ_RELAY_STOP_SEND', true)
 for i = 1, 10 do box.space.test:replace{i, i % 3 == 0 and i * 10 or i} end
 
+-- Resume replication.
+test_run:cmd("switch autobootstrap1")
+box.error.injection.set('ERRINJ_RELAY_STOP_SEND', false)
+test_run:cmd("switch autobootstrap2")
+box.error.injection.set('ERRINJ_RELAY_STOP_SEND', false)
+test_run:cmd("switch autobootstrap3")
+box.error.injection.set('ERRINJ_RELAY_STOP_SEND', false)
+
 -- Synchronize.
 test_run:cmd("switch default")
 vclock = test_run:get_cluster_vclock(SERVERS)

That works for me.

>  -- Synchronize.
> @@ -80,3 +80,4 @@ box.space.test:select()
>  -- Cleanup.
>  test_run:cmd("switch default")
>  test_run:drop_cluster(SERVERS)
> +test_run:cleanup_cluster()

> diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
> index 8cc3242f7..68cca831e 100644
> --- a/test/replication/catch.test.lua
> +++ b/test/replication/catch.test.lua
> @@ -1,4 +1,5 @@
>  env = require('test_run')
> +fiber = require('fiber')
>  test_run = env.new()
>  engine = test_run:get_cfg('engine')
>  
> @@ -23,13 +24,15 @@ test_run:cmd("switch default")
>  test_run:cmd("stop server replica")
>  
>  -- insert values on the master while replica is stopped and can't fetch them
> -for i=1,100 do s:insert{i, 'this is test message12345'} end
> +for i=1,1000 do s:insert{i, 'this is test message12345'} end
>  
>  -- sleep after every tuple
> -errinj.set("ERRINJ_RELAY_TIMEOUT", 1000.0)
> +errinj.set("ERRINJ_RELAY_TIMEOUT", 2.0)
>  
>  test_run:cmd("start server replica with args='0.01'")
>  test_run:cmd("switch replica")
> +-- After stop server got error variable 'fiber' is not declared
> +fiber = require('fiber')
>  
>  -- Check that replica doesn't enter read-write mode before
>  -- catching up with the master: to check that we inject sleep into
> @@ -42,23 +45,26 @@ test_run:cmd("switch replica")
>  -- #1: delete tuple on replica
>  --
>  box.space.test ~= nil
> +repeat fiber.sleep(0.001) until box.space.test:get(1) ~= nil
>  d = box.space.test:delete{1}
> -box.space.test:get(1) ~= nil
>  
>  -- case #2: delete tuple by net.box
>  
>  test_run:cmd("switch default")
>  test_run:cmd("set variable r_uri to 'replica.listen'")
>  c = net_box.connect(r_uri)
> +repeat fiber.sleep(0.001) until c.space.test:get(1) ~= nil
>  d = c.space.test:delete{1}
> -c.space.test:get(1) ~= nil
>  
>  -- check sync
>  errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
> +fiber.sleep(2.0) -- wait until release errinj sleep
> +
>  
>  -- cleanup
>  test_run:cmd("stop server replica")
>  test_run:cmd("cleanup server replica")
> +test_run:cleanup_cluster()
>  box.space.test:drop()
>  box.schema.user.revoke('guest', 'replication')
>  

I propose to use the replication enable/disable switch here too. Also I think
it is easier to use replace instead of waiting for the first tuple and then
using delete.

The diff:

diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
index 68cca831e..92bb645b7 100644
--- a/test/replication/catch.test.lua
+++ b/test/replication/catch.test.lua
@@ -3,7 +3,6 @@ fiber = require('fiber')
 test_run = env.new()
 engine = test_run:get_cfg('engine')
 
-
 net_box = require('net.box')
 errinj = box.error.injection
 
@@ -14,7 +13,7 @@ test_run:cmd("switch replica")
 
 test_run:cmd("switch default")
 s = box.schema.space.create('test', {engine = engine});
--- vinyl does not support hash index
+-- Vinyl does not support hash index.
 index = s:create_index('primary', {type = (engine == 'vinyl' and 'tree' or 'hash') })
 
 test_run:cmd("switch replica")
@@ -23,48 +22,42 @@ while box.space.test == nil do fiber.sleep(0.01) end
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 
--- insert values on the master while replica is stopped and can't fetch them
-for i=1,1000 do s:insert{i, 'this is test message12345'} end
-
--- sleep after every tuple
-errinj.set("ERRINJ_RELAY_TIMEOUT", 2.0)
+-- Insert values on the master while replica is stopped and can't
+-- fetch them.
+errinj.set("ERRINJ_RELAY_STOP_SEND", true)
+for i = 1, 100 do s:insert{i, 'this is test message12345'} end
 
 test_run:cmd("start server replica with args='0.01'")
 test_run:cmd("switch replica")
--- After stop server got error variable 'fiber' is not declared
-fiber = require('fiber')
 
 -- Check that replica doesn't enter read-write mode before
--- catching up with the master: to check that we inject sleep into
--- the master relay_send function and attempt a data modifying
--- statement in replica while it's still fetching data from the
--- master.
--- In the next two cases we try to delete a tuple while replica is
--- catching up with the master (local delete, remote delete) case
+-- catching up with the master: to check that we stop sending
+-- rows on the master in relay_send function and attempt a data
+-- modifying statement in replica while it's still fetching data
+-- from the master.
 --
--- #1: delete tuple on replica
+-- In the next two cases we try to replace a tuple while replica
+-- is catching up with the master (local replace, remote replace)
+-- case.
+--
+-- Case #1: replace tuple on replica locally.
 --
 box.space.test ~= nil
-repeat fiber.sleep(0.001) until box.space.test:get(1) ~= nil
-d = box.space.test:delete{1}
+box.space.test:replace{1}
 
--- case #2: delete tuple by net.box
+-- Case #2: replace tuple on replica by net.box.
 
 test_run:cmd("switch default")
 test_run:cmd("set variable r_uri to 'replica.listen'")
 c = net_box.connect(r_uri)
-repeat fiber.sleep(0.001) until c.space.test:get(1) ~= nil
-d = c.space.test:delete{1}
-
--- check sync
-errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
-fiber.sleep(2.0) -- wait until release errinj sleep
+c.space.test:replace{1}
 
+-- Resume replication.
+errinj.set("ERRINJ_RELAY_STOP_SEND", false)
 
--- cleanup
+-- Cleanup.
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
 test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
-

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [tarantool-patches] Re: [PATCH] test: enable parallel mode for replication tests
  2018-10-01  1:36 ` Alexander Turenko
@ 2018-10-01 10:41   ` Alexander Turenko
  2018-10-03 14:50     ` Sergei Voronezhskii
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Turenko @ 2018-10-01 10:41 UTC (permalink / raw)
  To: Sergei Voronezhskii; +Cc: tarantool-patches, Vladimir Davydov

I discussed with Vladimir how we should proceed with the tests.

What should be done from QA side:

* clean up box.space._cluster and delete replicas where appropriate;
* use unix sockets to avoid 'address already in use';
* find failures and perform some initial investigation;
* file an issue against the author of the failing test (if they work
  with us).

We should not tweak timeouts to make a test pass (and possibly hide
problems).

So, please, transform all changes except the trivial ones (the first two
bullets above) into issues.

----

Follow-ups for my diffs from the previous email:

* before_replace:

I didn't get the idea initially. My thought was that we try to use the
timeout to set a strict order of operations, but this is not so. The idea
was to slow replication down, so that the inserts race with the replication
applier.

So a timeout should be used here, and it should race with the inserts. Likely
0.01 is the best option. Maybe several cases, or one case with a random
timeout, would be better.

The test shows different tuples after restart for me with 0.1, and with
other timeouts for Vladimir. This looks like a real bug. Please file an
issue instead of hiding it.

* catch:

Vladimir says the switch/delay (not sure which term fits better) is okay
here. Maybe we also need to check that all tuples have arrived on the replica
after re-enabling replication.
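
A rough sketch of such a check, in the style of the proposed catch.test.lua
diff above (the tuple count of 100 matches that diff; `test_run`, `errinj`,
and the `test` space are assumed to be set up as in the test):

```lua
-- Resume replication on the master.
errinj.set("ERRINJ_RELAY_STOP_SEND", false)

-- Instead of a fixed sleep, poll the replica until every tuple
-- inserted on the master has been applied.
test_run:cmd("switch replica")
fiber = require('fiber')
while box.space.test:count() < 100 do fiber.sleep(0.01) end
test_run:cmd("switch default")
```

Newer test-run versions could express the polling loop via
`test_run:wait_cond(...)`, if that helper is available there.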

WBR, Alexander Turenko.

On Mon, Oct 01, 2018 at 04:36:57AM +0300, Alexander Turenko wrote:
> Hi, Sergei!
> 
> Below I reviewed 'before_replace' and 'catch' tests. I'm going to split the
> review into parts to don't block you. Understand how the replication works, how
> 25 tests work and propose best way to stabilize them is significant amount of
> work for me, so I'll interleave this work with other tasks and will try to
> finish this within the week.
> 
> WBR, Alexander Turenko.
> 
> On Thu, Sep 27, 2018 at 06:38:50PM +0300, Sergei Voronezhskii wrote:
> > - need more sleeps and timeout because increasing load on i/o
> > - at the end of tests which create any replication config need to call
> > `test_run:clenup_cluster()` which clears `box.space._cluster`
> > - switch on `use_unix_sockets` because of 'Address already in use'
> > problems
> 
> Need to update test-run too. This option is ignored for non-default
> servers on the test-run version in your branch. I hit 'Address already
> in use' error locally when test your branch.
> 
> > - instead of just checking `box.space.test:count()` or
> > `#fio.glob('./master/*.xlog')` need to wait for values because of
> > increading load in replication process
> > 
> > Part of #2436, #3232
> > ---
> > BRANCH: https://github.com/tarantool/tarantool/tree/sergw/enable-parallel-test-replication
> >  test/replication/before_replace.result      |  9 ++-
> >  test/replication/before_replace.test.lua    |  7 ++-
> >  test/replication/catch.result               | 31 ++++++----
> >  test/replication/catch.test.lua             | 14 +++--
> >  test/replication/gc.result                  | 18 +++---
> >  test/replication/gc.test.lua                | 12 ++--
> >  test/replication/local_spaces.result        |  3 +
> >  test/replication/local_spaces.test.lua      |  1 +
> >  test/replication/misc.result                | 65 +++++++++++++++------
> >  test/replication/misc.test.lua              | 43 ++++++++------
> >  test/replication/on_replace.result          | 13 +++++
> >  test/replication/on_replace.test.lua        |  4 ++
> >  test/replication/once.result                |  2 +-
> >  test/replication/once.test.lua              |  2 +-
> >  test/replication/quorum.result              |  9 ++-
> >  test/replication/quorum.test.lua            |  7 ++-
> >  test/replication/replica_rejoin.result      |  7 +++
> >  test/replication/replica_rejoin.test.lua    |  2 +
> >  test/replication/skip_conflict_row.result   |  7 +++
> >  test/replication/skip_conflict_row.test.lua |  2 +
> >  test/replication/status.result              |  7 +++
> >  test/replication/status.test.lua            |  2 +
> >  test/replication/suite.ini                  |  3 +-
> >  test/replication/sync.result                |  7 +++
> >  test/replication/sync.test.lua              |  2 +
> >  test/replication/wal_off.result             |  7 +++
> >  test/replication/wal_off.test.lua           |  2 +
> >  27 files changed, 209 insertions(+), 79 deletions(-)
> > 
> > diff --git a/test/replication/before_replace.test.lua b/test/replication/before_replace.test.lua
> > index f1e590703..d86f74b05 100644
> > --- a/test/replication/before_replace.test.lua
> > +++ b/test/replication/before_replace.test.lua
> > @@ -46,13 +46,13 @@ test_run:cmd("setopt delimiter ''");
> >  -- Stall replication and generate incompatible data
> >  -- on the replicas.
> >  test_run:cmd("switch autobootstrap1")
> > -box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.01)
> > +box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
> >  for i = 1, 10 do box.space.test:replace{i, i % 3 == 1 and i * 10 or i} end
> >  test_run:cmd("switch autobootstrap2")
> > -box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.01)
> > +box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
> >  for i = 1, 10 do box.space.test:replace{i, i % 3 == 2 and i * 10 or i} end
> >  test_run:cmd("switch autobootstrap3")
> > -box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.01)
> > +box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
> >  for i = 1, 10 do box.space.test:replace{i, i % 3 == 0 and i * 10 or i} end
> >  
> 
> I have got the following miscompare locally with the new timeout value (0.1).
> Don't know whether it matters, but it was got on the old test-run (TCP sockets
> were used for non-default server consoles).
> 
> [001] --- replication/before_replace.result	Fri Sep 28 14:13:57 2018
> [001] +++ replication/before_replace.reject	Mon Oct  1 02:21:22 2018
> [001] @@ -146,8 +146,8 @@
> [001]    - [6, 60]
> [001]    - [7, 70]
> [001]    - [8, 80]
> [001] -  - [9, 90]
> [001] -  - [10, 100]
> [001] +  - [9, 9]
> [001] +  - [10, 10]
> [001]  ...
> [001]  test_run:cmd("switch autobootstrap2")
> [001]  ---
> 
> It is the datum on the first node after restart. But 0.01 works good for me.
> Don't know why, to be honest.
> 
> Don't get why ERRINJ_RELAY_TIMEOUT calls fiber_sleep **after** the write
> of xrow (changing that break the test too). As I see 'catch' test exploits this
> fact (but I rewrote it, see below).
> 
> Anyway. I'm tentative about the right implementation, but propose use an
> enable/disable switch instead of timeout, somehow like so:
> 
> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index c90383d4a..078273bd4 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -622,12 +622,19 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>  static void
>  relay_send(struct relay *relay, struct xrow_header *packet)
>  {
> +       struct errinj *inj = errinj(ERRINJ_RELAY_STOP_SEND,
> +                                   ERRINJ_BOOL);
> +       while (inj->bparam) {
> +               fiber_sleep(0.01);
> +               inj = errinj(ERRINJ_RELAY_STOP_SEND,
> +                            ERRINJ_BOOL);
> +       }
>         packet->sync = relay->sync;
>         relay->last_row_tm = ev_monotonic_now(loop());
>         coio_write_xrow(&relay->io, packet);
>         fiber_gc();
>  
> -       struct errinj *inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
> +       inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
>         if (inj != NULL && inj->dparam > 0)
>                 fiber_sleep(inj->dparam);
>  }
> diff --git a/src/errinj.h b/src/errinj.h
> index 84a1fbb5e..eaac24f5d 100644
> --- a/src/errinj.h
> +++ b/src/errinj.h
> @@ -94,6 +94,7 @@ struct errinj {
>         _(ERRINJ_VY_GC, ERRINJ_BOOL, {.bparam = false}) \
>         _(ERRINJ_VY_LOG_FLUSH, ERRINJ_BOOL, {.bparam = false}) \
>         _(ERRINJ_VY_LOG_FLUSH_DELAY, ERRINJ_BOOL, {.bparam = false}) \
> +       _(ERRINJ_RELAY_STOP_SEND, ERRINJ_BOOL, {.bparam = false}) \
>         _(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE, {.dparam = 0}) \
>         _(ERRINJ_RELAY_REPORT_INTERVAL, ERRINJ_DOUBLE, {.dparam = 0}) \
>         _(ERRINJ_RELAY_FINAL_SLEEP, ERRINJ_BOOL, {.bparam = false}) \
> diff --git a/test/replication/before_replace.test.lua b/test/replication/before_replace.test.lua
> index d86f74b05..bfe3ba9e8 100644
> --- a/test/replication/before_replace.test.lua
> +++ b/test/replication/before_replace.test.lua
> @@ -46,15 +46,23 @@ test_run:cmd("setopt delimiter ''");
>  -- Stall replication and generate incompatible data
>  -- on the replicas.
>  test_run:cmd("switch autobootstrap1")
> -box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
> +box.error.injection.set('ERRINJ_RELAY_STOP_SEND', true)
>  for i = 1, 10 do box.space.test:replace{i, i % 3 == 1 and i * 10 or i} end
>  test_run:cmd("switch autobootstrap2")
> -box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
> +box.error.injection.set('ERRINJ_RELAY_STOP_SEND', true)
>  for i = 1, 10 do box.space.test:replace{i, i % 3 == 2 and i * 10 or i} end
>  test_run:cmd("switch autobootstrap3")
> -box.error.injection.set('ERRINJ_RELAY_TIMEOUT', 0.1)
> +box.error.injection.set('ERRINJ_RELAY_STOP_SEND', true)
>  for i = 1, 10 do box.space.test:replace{i, i % 3 == 0 and i * 10 or i} end
>  
> +-- Resume replication.
> +test_run:cmd("switch autobootstrap1")
> +box.error.injection.set('ERRINJ_RELAY_STOP_SEND', false)
> +test_run:cmd("switch autobootstrap2")
> +box.error.injection.set('ERRINJ_RELAY_STOP_SEND', false)
> +test_run:cmd("switch autobootstrap3")
> +box.error.injection.set('ERRINJ_RELAY_STOP_SEND', false)
> +
>  -- Synchronize.
>  test_run:cmd("switch default")
>  vclock = test_run:get_cluster_vclock(SERVERS)
> 
> That works for me.
> 
> >  -- Synchronize.
> > @@ -80,3 +80,4 @@ box.space.test:select()
> >  -- Cleanup.
> >  test_run:cmd("switch default")
> >  test_run:drop_cluster(SERVERS)
> > +test_run:cleanup_cluster()
> 
> > diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
> > index 8cc3242f7..68cca831e 100644
> > --- a/test/replication/catch.test.lua
> > +++ b/test/replication/catch.test.lua
> > @@ -1,4 +1,5 @@
> >  env = require('test_run')
> > +fiber = require('fiber')
> >  test_run = env.new()
> >  engine = test_run:get_cfg('engine')
> >  
> > @@ -23,13 +24,15 @@ test_run:cmd("switch default")
> >  test_run:cmd("stop server replica")
> >  
> >  -- insert values on the master while replica is stopped and can't fetch them
> > -for i=1,100 do s:insert{i, 'this is test message12345'} end
> > +for i=1,1000 do s:insert{i, 'this is test message12345'} end
> >  
> >  -- sleep after every tuple
> > -errinj.set("ERRINJ_RELAY_TIMEOUT", 1000.0)
> > +errinj.set("ERRINJ_RELAY_TIMEOUT", 2.0)
> >  
> >  test_run:cmd("start server replica with args='0.01'")
> >  test_run:cmd("switch replica")
> > +-- After stop server got error variable 'fiber' is not declared
> > +fiber = require('fiber')
> >  
> >  -- Check that replica doesn't enter read-write mode before
> >  -- catching up with the master: to check that we inject sleep into
> > @@ -42,23 +45,26 @@ test_run:cmd("switch replica")
> >  -- #1: delete tuple on replica
> >  --
> >  box.space.test ~= nil
> > +repeat fiber.sleep(0.001) until box.space.test:get(1) ~= nil
> >  d = box.space.test:delete{1}
> > -box.space.test:get(1) ~= nil
> >  
> >  -- case #2: delete tuple by net.box
> >  
> >  test_run:cmd("switch default")
> >  test_run:cmd("set variable r_uri to 'replica.listen'")
> >  c = net_box.connect(r_uri)
> > +repeat fiber.sleep(0.001) until c.space.test:get(1) ~= nil
> >  d = c.space.test:delete{1}
> > -c.space.test:get(1) ~= nil
> >  
> >  -- check sync
> >  errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
> > +fiber.sleep(2.0) -- wait until release errinj sleep
> > +
> >  
> >  -- cleanup
> >  test_run:cmd("stop server replica")
> >  test_run:cmd("cleanup server replica")
> > +test_run:cleanup_cluster()
> >  box.space.test:drop()
> >  box.schema.user.revoke('guest', 'replication')
> >  
> 
> Proposed to use the replication enable/disable too. Also I think it is
> easier to use replace instead of waiting for the first tuple and then
> use delete.
> 
> The diff:
> 
> diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
> index 68cca831e..92bb645b7 100644
> --- a/test/replication/catch.test.lua
> +++ b/test/replication/catch.test.lua
> @@ -3,7 +3,6 @@ fiber = require('fiber')
>  test_run = env.new()
>  engine = test_run:get_cfg('engine')
>  
> -
>  net_box = require('net.box')
>  errinj = box.error.injection
>  
> @@ -14,7 +13,7 @@ test_run:cmd("switch replica")
>  
>  test_run:cmd("switch default")
>  s = box.schema.space.create('test', {engine = engine});
> --- vinyl does not support hash index
> +-- Vinyl does not support hash index.
>  index = s:create_index('primary', {type = (engine == 'vinyl' and 'tree' or 'hash') })
>  
>  test_run:cmd("switch replica")
> @@ -23,48 +22,42 @@ while box.space.test == nil do fiber.sleep(0.01) end
>  test_run:cmd("switch default")
>  test_run:cmd("stop server replica")
>  
> --- insert values on the master while replica is stopped and can't fetch them
> -for i=1,1000 do s:insert{i, 'this is test message12345'} end
> -
> --- sleep after every tuple
> -errinj.set("ERRINJ_RELAY_TIMEOUT", 2.0)
> +-- Insert values on the master while replica is stopped and can't
> +-- fetch them.
> +errinj.set("ERRINJ_RELAY_STOP_SEND", true)
> +for i = 1, 100 do s:insert{i, 'this is test message12345'} end
>  
>  test_run:cmd("start server replica with args='0.01'")
>  test_run:cmd("switch replica")
> --- After stop server got error variable 'fiber' is not declared
> -fiber = require('fiber')
>  
>  -- Check that replica doesn't enter read-write mode before
> --- catching up with the master: to check that we inject sleep into
> --- the master relay_send function and attempt a data modifying
> --- statement in replica while it's still fetching data from the
> --- master.
> --- In the next two cases we try to delete a tuple while replica is
> --- catching up with the master (local delete, remote delete) case
> +-- catching up with the master: to check that we stop sending
> +-- rows on the master in relay_send function and attempt a data
> +-- modifying statement in replica while it's still fetching data
> +-- from the master.
>  --
> --- #1: delete tuple on replica
> +-- In the next two cases we try to replace a tuple while replica
> +-- is catching up with the master (local replace, remote replace)
> +-- case.
> +--
> +-- Case #1: replace tuple on replica locally.
>  --
>  box.space.test ~= nil
> -repeat fiber.sleep(0.001) until box.space.test:get(1) ~= nil
> -d = box.space.test:delete{1}
> +box.space.test:replace{1}
>  
> --- case #2: delete tuple by net.box
> +-- Case #2: replace tuple on replica by net.box.
>  
>  test_run:cmd("switch default")
>  test_run:cmd("set variable r_uri to 'replica.listen'")
>  c = net_box.connect(r_uri)
> -repeat fiber.sleep(0.001) until c.space.test:get(1) ~= nil
> -d = c.space.test:delete{1}
> -
> --- check sync
> -errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
> -fiber.sleep(2.0) -- wait until release errinj sleep
> +c.space.test:replace{1}
>  
> +-- Resume replicaton.
> +errinj.set("ERRINJ_RELAY_STOP_SEND", false)
>  
> --- cleanup
> +-- Cleanup.
>  test_run:cmd("stop server replica")
>  test_run:cmd("cleanup server replica")
>  test_run:cleanup_cluster()
>  box.space.test:drop()
>  box.schema.user.revoke('guest', 'replication')
> -
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] test: enable parallel mode for replication tests
  2018-10-01 10:41   ` [tarantool-patches] " Alexander Turenko
@ 2018-10-03 14:50     ` Sergei Voronezhskii
  2018-10-05  9:02       ` Sergei Voronezhskii
  0 siblings, 1 reply; 13+ messages in thread
From: Sergei Voronezhskii @ 2018-10-03 14:50 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Alexander Turenko, Vladimir Davydov

- at the end of tests which create any replication config, call
`test_run:cleanup_cluster()`, which clears `box.space._cluster`
- enable `use_unix_sockets` because of 'Address already in use'
problems
- instead of just checking `box.space.test:count()` or
`#fio.glob('./master/*.xlog')`, wait for the values, because of the
increasing load in the replication process
- for the `catch` test, use a new errinj that just pauses `relay_send`,
to check the read-only state of the replica

Part of #2436, #3232
---
BRANCH: https://github.com/tarantool/tarantool/tree/sergw/enable-parallel-test-replication
 src/box/relay.cc                            |  7 ++-
 src/errinj.h                                |  1 +
 test/replication/before_replace.result      |  3 +
 test/replication/before_replace.test.lua    |  1 +
 test/replication/catch.result               | 50 ++++++++--------
 test/replication/catch.test.lua             | 39 ++++++------
 test/replication/gc.result                  | 24 ++++----
 test/replication/gc.test.lua                | 18 +++---
 test/replication/local_spaces.result        |  3 +
 test/replication/local_spaces.test.lua      |  1 +
 test/replication/misc.result                | 66 +++++++++++++++------
 test/replication/misc.test.lua              | 45 ++++++++------
 test/replication/on_replace.result          | 13 ++++
 test/replication/on_replace.test.lua        |  4 ++
 test/replication/once.result                | 12 ++++
 test/replication/once.test.lua              |  3 +
 test/replication/quorum.result              |  9 ++-
 test/replication/quorum.test.lua            |  7 ++-
 test/replication/replica_rejoin.result      |  7 +++
 test/replication/replica_rejoin.test.lua    |  2 +
 test/replication/skip_conflict_row.result   |  7 +++
 test/replication/skip_conflict_row.test.lua |  2 +
 test/replication/status.result              |  7 +++
 test/replication/status.test.lua            |  2 +
 test/replication/suite.ini                  |  3 +-
 test/replication/sync.result                |  7 +++
 test/replication/sync.test.lua              |  2 +
 test/replication/wal_off.result             |  7 +++
 test/replication/wal_off.test.lua           |  2 +
 29 files changed, 242 insertions(+), 112 deletions(-)

diff --git a/src/box/relay.cc b/src/box/relay.cc
index c90383d4a..c7fd53cb4 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -622,12 +622,17 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
 static void
 relay_send(struct relay *relay, struct xrow_header *packet)
 {
+    struct errinj *inj = errinj(ERRINJ_RELAY_STOP_SEND, ERRINJ_BOOL);
+    while (inj->bparam) {
+        fiber_sleep(0.01);
+        inj = errinj(ERRINJ_RELAY_STOP_SEND, ERRINJ_BOOL);
+    }
 	packet->sync = relay->sync;
 	relay->last_row_tm = ev_monotonic_now(loop());
 	coio_write_xrow(&relay->io, packet);
 	fiber_gc();
 
-	struct errinj *inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
+	inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
 	if (inj != NULL && inj->dparam > 0)
 		fiber_sleep(inj->dparam);
 }
diff --git a/src/errinj.h b/src/errinj.h
index 84a1fbb5e..eaac24f5d 100644
--- a/src/errinj.h
+++ b/src/errinj.h
@@ -94,6 +94,7 @@ struct errinj {
 	_(ERRINJ_VY_GC, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_VY_LOG_FLUSH, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_VY_LOG_FLUSH_DELAY, ERRINJ_BOOL, {.bparam = false}) \
+	_(ERRINJ_RELAY_STOP_SEND, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE, {.dparam = 0}) \
 	_(ERRINJ_RELAY_REPORT_INTERVAL, ERRINJ_DOUBLE, {.dparam = 0}) \
 	_(ERRINJ_RELAY_FINAL_SLEEP, ERRINJ_BOOL, {.bparam = false}) \
diff --git a/test/replication/before_replace.result b/test/replication/before_replace.result
index 858a52de6..87973b6d1 100644
--- a/test/replication/before_replace.result
+++ b/test/replication/before_replace.result
@@ -223,3 +223,6 @@ test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
 ---
 ...
+test_run:cleanup_cluster()
+---
+...
diff --git a/test/replication/before_replace.test.lua b/test/replication/before_replace.test.lua
index f1e590703..bcd264521 100644
--- a/test/replication/before_replace.test.lua
+++ b/test/replication/before_replace.test.lua
@@ -80,3 +80,4 @@ box.space.test:select()
 -- Cleanup.
 test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
+test_run:cleanup_cluster()
diff --git a/test/replication/catch.result b/test/replication/catch.result
index aebba819f..6e8e17ac8 100644
--- a/test/replication/catch.result
+++ b/test/replication/catch.result
@@ -1,6 +1,9 @@
 env = require('test_run')
 ---
 ...
+fiber = require('fiber')
+---
+...
 test_run = env.new()
 ---
 ...
@@ -35,7 +38,7 @@ test_run:cmd("switch default")
 s = box.schema.space.create('test', {engine = engine});
 ---
 ...
--- vinyl does not support hash index
+-- Vinyl does not support hash index
 index = s:create_index('primary', {type = (engine == 'vinyl' and 'tree' or 'hash') })
 ---
 ...
@@ -57,14 +60,13 @@ test_run:cmd("stop server replica")
 ---
 - true
 ...
--- insert values on the master while replica is stopped and can't fetch them
-for i=1,100 do s:insert{i, 'this is test message12345'} end
+-- Insert values on the master while replica is stopped and can't fetch them.
+errinj.set('ERRINJ_RELAY_STOP_SEND', true)
 ---
+- ok
 ...
--- sleep after every tuple
-errinj.set("ERRINJ_RELAY_TIMEOUT", 1000.0)
+for i=1,100 do s:insert{i, 'this is test message12345'} end
 ---
-- ok
 ...
 test_run:cmd("start server replica with args='0.01'")
 ---
@@ -75,28 +77,25 @@ test_run:cmd("switch replica")
 - true
 ...
 -- Check that replica doesn't enter read-write mode before
--- catching up with the master: to check that we inject sleep into
--- the master relay_send function and attempt a data modifying
--- statement in replica while it's still fetching data from the
--- master.
--- In the next two cases we try to delete a tuple while replica is
+-- catching up with the master: to check that we stop sending
+-- rows on the master in relay_send function and attempt a data
+-- modifying statement in replica while it's still fetching data
+-- from the master.
+--
+-- In the next two cases we try to replace a tuple while replica is
 -- catching up with the master (local delete, remote delete) case
 --
--- #1: delete tuple on replica
+-- Case #1: replace tuple on replica locally.
 --
 box.space.test ~= nil
 ---
 - true
 ...
-d = box.space.test:delete{1}
+box.space.test:replace{1}
 ---
 - error: Can't modify data because this instance is in read-only mode.
 ...
-box.space.test:get(1) ~= nil
----
-- true
-...
--- case #2: delete tuple by net.box
+-- Case #2: replace tuple on replica by net.box.
 test_run:cmd("switch default")
 ---
 - true
@@ -108,20 +107,16 @@ test_run:cmd("set variable r_uri to 'replica.listen'")
 c = net_box.connect(r_uri)
 ---
 ...
-d = c.space.test:delete{1}
+d = c.space.test:replace{1}
 ---
 - error: Can't modify data because this instance is in read-only mode.
 ...
-c.space.test:get(1) ~= nil
----
-- true
-...
--- check sync
-errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
+-- Resume replication
+errinj.set('ERRINJ_RELAY_STOP_SEND', false)
 ---
 - ok
 ...
--- cleanup
+-- Cleanup
 test_run:cmd("stop server replica")
 ---
 - true
@@ -130,6 +125,9 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
index 8cc3242f7..90a5ce8f1 100644
--- a/test/replication/catch.test.lua
+++ b/test/replication/catch.test.lua
@@ -1,8 +1,8 @@
 env = require('test_run')
+fiber = require('fiber')
 test_run = env.new()
 engine = test_run:get_cfg('engine')
 
-
 net_box = require('net.box')
 errinj = box.error.injection
 
@@ -13,7 +13,7 @@ test_run:cmd("switch replica")
 
 test_run:cmd("switch default")
 s = box.schema.space.create('test', {engine = engine});
--- vinyl does not support hash index
+-- Vinyl does not support hash index
 index = s:create_index('primary', {type = (engine == 'vinyl' and 'tree' or 'hash') })
 
 test_run:cmd("switch replica")
@@ -22,43 +22,42 @@ while box.space.test == nil do fiber.sleep(0.01) end
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 
--- insert values on the master while replica is stopped and can't fetch them
+-- Insert values on the master while replica is stopped and can't fetch them.
+errinj.set('ERRINJ_RELAY_STOP_SEND', true)
 for i=1,100 do s:insert{i, 'this is test message12345'} end
 
--- sleep after every tuple
-errinj.set("ERRINJ_RELAY_TIMEOUT", 1000.0)
-
 test_run:cmd("start server replica with args='0.01'")
 test_run:cmd("switch replica")
 
 -- Check that replica doesn't enter read-write mode before
--- catching up with the master: to check that we inject sleep into
--- the master relay_send function and attempt a data modifying
--- statement in replica while it's still fetching data from the
--- master.
--- In the next two cases we try to delete a tuple while replica is
+-- catching up with the master: to check that we stop sending
+-- rows on the master in relay_send function and attempt a data
+-- modifying statement in replica while it's still fetching data
+-- from the master.
+--
+-- In the next two cases we try to replace a tuple while replica is
 -- catching up with the master (local delete, remote delete) case
 --
--- #1: delete tuple on replica
+-- Case #1: replace tuple on replica locally.
 --
 box.space.test ~= nil
-d = box.space.test:delete{1}
-box.space.test:get(1) ~= nil
+box.space.test:replace{1}
 
--- case #2: delete tuple by net.box
+-- Case #2: replace tuple on replica by net.box.
 
 test_run:cmd("switch default")
 test_run:cmd("set variable r_uri to 'replica.listen'")
 c = net_box.connect(r_uri)
-d = c.space.test:delete{1}
-c.space.test:get(1) ~= nil
+d = c.space.test:replace{1}
+
+-- Resume replication
+errinj.set('ERRINJ_RELAY_STOP_SEND', false)
 
--- check sync
-errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
 
--- cleanup
+-- Cleanup
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
 
diff --git a/test/replication/gc.result b/test/replication/gc.result
index 83d0de293..46a02d0ab 100644
--- a/test/replication/gc.result
+++ b/test/replication/gc.result
@@ -1,6 +1,3 @@
-fio = require 'fio'
----
-...
 test_run = require('test_run').new()
 ---
 ...
@@ -13,6 +10,9 @@ replica_set = require('fast_replica')
 fiber = require('fiber')
 ---
 ...
+fio = require('fio')
+---
+...
 test_run:cleanup_cluster()
 ---
 ...
@@ -95,7 +95,7 @@ test_run:cmd("switch replica")
 fiber = require('fiber')
 ---
 ...
-while box.space.test:count() < 200 do fiber.sleep(0.01) end
+while box.space.test == nil or box.space.test:count() < 200 do fiber.sleep(0.01) end
 ---
 ...
 box.space.test:count()
@@ -119,9 +119,9 @@ wait_gc(1)
 ---
 - true
 ...
--- Make sure the replica will receive data it is subscribed
--- to long enough for us to invoke garbage collection.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
+-- Make sure the replica will not receive data until
+-- we test garbage collection.
+box.error.injection.set("ERRINJ_RELAY_STOP_SEND", true)
 ---
 - ok
 ...
@@ -153,13 +153,12 @@ box.snapshot()
 ---
 - true
 ...
-#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
+while #fio.glob('./master/*.xlog') ~= 2 do fiber.sleep(0.01) end
 ---
-- true
 ...
--- Remove the timeout injection so that the replica catches
+-- Resume replication so that the replica catches
 -- up quickly.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
+box.error.injection.set("ERRINJ_RELAY_STOP_SEND", false)
 ---
 - ok
 ...
@@ -188,9 +187,8 @@ wait_gc(1)
 ---
 - true
 ...
-#fio.glob('./master/*.xlog') == 0 or fio.listdir('./master')
+while #fio.glob('./master/*.xlog') ~= 0 do fiber.sleep(0.01) end
 ---
-- true
 ...
 --
 -- Check that the master doesn't delete xlog files sent to the
diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua
index eed76850c..eb7fee93c 100644
--- a/test/replication/gc.test.lua
+++ b/test/replication/gc.test.lua
@@ -1,8 +1,8 @@
-fio = require 'fio'
 test_run = require('test_run').new()
 engine = test_run:get_cfg('engine')
 replica_set = require('fast_replica')
 fiber = require('fiber')
+fio = require('fio')
 
 test_run:cleanup_cluster()
 
@@ -52,7 +52,7 @@ test_run:cmd("start server replica")
 -- data from the master. Check it.
 test_run:cmd("switch replica")
 fiber = require('fiber')
-while box.space.test:count() < 200 do fiber.sleep(0.01) end
+while box.space.test == nil or box.space.test:count() < 200 do fiber.sleep(0.01) end
 box.space.test:count()
 test_run:cmd("switch default")
 
@@ -61,9 +61,9 @@ test_run:cmd("switch default")
 wait_gc(1)
 #box.info.gc().checkpoints == 1 or box.info.gc()
 #fio.glob('./master/*.xlog') == 1 or fio.listdir('./master')
--- Make sure the replica will receive data it is subscribed
--- to long enough for us to invoke garbage collection.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
+-- Make sure the replica will not receive data until
+-- we test garbage collection.
+box.error.injection.set("ERRINJ_RELAY_STOP_SEND", true)
 
 -- Send more data to the replica.
 -- Need to do 2 snapshots here, otherwise the replica would
@@ -78,11 +78,11 @@ box.snapshot()
 -- xlogs needed by the replica.
 box.snapshot()
 #box.info.gc().checkpoints == 1 or box.info.gc()
-#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
+while #fio.glob('./master/*.xlog') ~= 2 do fiber.sleep(0.01) end
 
--- Remove the timeout injection so that the replica catches
+-- Resume replication so that the replica catches
 -- up quickly.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
+box.error.injection.set("ERRINJ_RELAY_STOP_SEND", false)
 
 -- Check that the replica received all data from the master.
 test_run:cmd("switch replica")
@@ -94,7 +94,7 @@ test_run:cmd("switch default")
 -- from the old checkpoint.
 wait_gc(1)
 #box.info.gc().checkpoints == 1 or box.info.gc()
-#fio.glob('./master/*.xlog') == 0 or fio.listdir('./master')
+while #fio.glob('./master/*.xlog') ~= 0 do fiber.sleep(0.01) end
 --
 -- Check that the master doesn't delete xlog files sent to the
 -- replica until it receives a confirmation that the data has
diff --git a/test/replication/local_spaces.result b/test/replication/local_spaces.result
index 151735530..4de223261 100644
--- a/test/replication/local_spaces.result
+++ b/test/replication/local_spaces.result
@@ -216,6 +216,9 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
diff --git a/test/replication/local_spaces.test.lua b/test/replication/local_spaces.test.lua
index 06e2b0bd2..633cc9f1a 100644
--- a/test/replication/local_spaces.test.lua
+++ b/test/replication/local_spaces.test.lua
@@ -76,6 +76,7 @@ box.space.test3:select()
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cleanup_cluster()
 box.schema.user.revoke('guest', 'replication')
 
 s1:select()
diff --git a/test/replication/misc.result b/test/replication/misc.result
index f8aa8dab6..937ef1b24 100644
--- a/test/replication/misc.result
+++ b/test/replication/misc.result
@@ -88,6 +88,13 @@ test_run:cmd('cleanup server test')
 box.cfg{read_only = false}
 ---
 ...
+test_run:cmd('delete server test')
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 -- gh-3160 - Send heartbeats if there are changes from a remote master only
 SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
 ---
@@ -106,7 +113,7 @@ test_run:cmd("switch autobootstrap1")
 test_run = require('test_run').new()
 ---
 ...
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 ---
 ...
 test_run:cmd("switch autobootstrap2")
@@ -116,7 +123,7 @@ test_run:cmd("switch autobootstrap2")
 test_run = require('test_run').new()
 ---
 ...
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 ---
 ...
 test_run:cmd("switch autobootstrap3")
@@ -129,7 +136,7 @@ test_run = require('test_run').new()
 fiber=require('fiber')
 ---
 ...
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 ---
 ...
 _ = box.schema.space.create('test_timeout'):create_index('pk')
@@ -140,15 +147,16 @@ test_run:cmd("setopt delimiter ';'")
 - true
 ...
 function test_timeout()
+    local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
+    local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
     for i = 0, 99 do 
         box.space.test_timeout:replace({1})
-        fiber.sleep(0.005)
-        local rinfo = box.info.replication
-        if rinfo[1].upstream and rinfo[1].upstream.status ~= 'follow' or
-           rinfo[2].upstream and rinfo[2].upstream.status ~= 'follow' or
-           rinfo[3].upstream and rinfo[3].upstream.status ~= 'follow' then
-            return error('Replication broken')
-        end
+        local n = 200
+        repeat
+            fiber.sleep(0.001)
+            n = n - 1
+            if n == 0 then return error(box.info.replication) end
+        until replicaA.status == 'follow' and replicaB.status == 'follow'
     end
     return true
 end ;
@@ -158,6 +166,7 @@ test_run:cmd("setopt delimiter ''");
 ---
 - true
 ...
+-- The replica status is checked 100 times, each check within replication_timeout.
 test_timeout()
 ---
 - true
@@ -229,6 +238,9 @@ test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
 ---
 ...
+test_run:cleanup_cluster()
+---
+...
 -- gh-3642 - Check that socket file descriptor doesn't leak
 -- when a replica is disconnected.
 rlimit = require('rlimit')
@@ -249,15 +261,15 @@ lim.rlim_cur = 64
 rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
 ---
 ...
-test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"')
+test_run:cmd('create server bork with rpl_master=default, script="replication/replica.lua"')
 ---
 - true
 ...
-test_run:cmd(string.format('start server sock'))
+test_run:cmd('start server bork')
 ---
 - true
 ...
-test_run:cmd('switch sock')
+test_run:cmd('switch bork')
 ---
 - true
 ...
@@ -299,14 +311,21 @@ lim.rlim_cur = old_fno
 rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
 ---
 ...
-test_run:cmd('stop server sock')
+test_run:cmd("stop server bork")
+---
+- true
+...
+test_run:cmd("cleanup server bork")
 ---
 - true
 ...
-test_run:cmd('cleanup server sock')
+test_run:cmd("delete server bork")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
@@ -342,6 +361,17 @@ test_run:cmd('cleanup server er_load2')
 ---
 - true
 ...
+test_run:cmd('delete server er_load1')
+---
+- true
+...
+test_run:cmd('delete server er_load2')
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 --
 -- Test case for gh-3637. Before the fix replica would exit with
 -- an error. Now check that we don't hang and successfully connect.
@@ -349,9 +379,6 @@ test_run:cmd('cleanup server er_load2')
 fiber = require('fiber')
 ---
 ...
-test_run:cleanup_cluster()
----
-...
 test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
 ---
 - true
@@ -391,6 +418,9 @@ test_run:cmd("delete server replica_auth")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.drop('cluster')
 ---
 ...
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index 46726b7f4..cb658f6d0 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -32,6 +32,8 @@ test_run:cmd(string.format('start server test with args="%s"', replica_uuid))
 test_run:cmd('stop server test')
 test_run:cmd('cleanup server test')
 box.cfg{read_only = false}
+test_run:cmd('delete server test')
+test_run:cleanup_cluster()
 
 -- gh-3160 - Send heartbeats if there are changes from a remote master only
 SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
@@ -41,30 +43,32 @@ test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 test_run:wait_fullmesh(SERVERS)
 test_run:cmd("switch autobootstrap1")
 test_run = require('test_run').new()
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 test_run:cmd("switch autobootstrap2")
 test_run = require('test_run').new()
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 test_run:cmd("switch autobootstrap3")
 test_run = require('test_run').new()
 fiber=require('fiber')
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 _ = box.schema.space.create('test_timeout'):create_index('pk')
 test_run:cmd("setopt delimiter ';'")
 function test_timeout()
+    local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
+    local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
     for i = 0, 99 do 
         box.space.test_timeout:replace({1})
-        fiber.sleep(0.005)
-        local rinfo = box.info.replication
-        if rinfo[1].upstream and rinfo[1].upstream.status ~= 'follow' or
-           rinfo[2].upstream and rinfo[2].upstream.status ~= 'follow' or
-           rinfo[3].upstream and rinfo[3].upstream.status ~= 'follow' then
-            return error('Replication broken')
-        end
+        local n = 200
+        repeat
+            fiber.sleep(0.001)
+            n = n - 1
+            if n == 0 then return error(box.info.replication) end
+        until replicaA.status == 'follow' and replicaB.status == 'follow'
     end
     return true
 end ;
 test_run:cmd("setopt delimiter ''");
+-- The replica status is checked 100 times, each check within replication_timeout.
 test_timeout()
 
 -- gh-3247 - Sequence-generated value is not replicated in case
@@ -89,6 +93,7 @@ box.space.space1:drop()
 
 test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
+test_run:cleanup_cluster()
 
 -- gh-3642 - Check that socket file descriptor doesn't leak
 -- when a replica is disconnected.
@@ -99,9 +104,9 @@ old_fno = lim.rlim_cur
 lim.rlim_cur = 64
 rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
 
-test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"')
-test_run:cmd(string.format('start server sock'))
-test_run:cmd('switch sock')
+test_run:cmd('create server bork with rpl_master=default, script="replication/replica.lua"')
+test_run:cmd('start server bork')
+test_run:cmd('switch bork')
 test_run = require('test_run').new()
 fiber = require('fiber')
 test_run:cmd("setopt delimiter ';'")
@@ -122,8 +127,10 @@ test_run:cmd('switch default')
 lim.rlim_cur = old_fno
 rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
 
-test_run:cmd('stop server sock')
-test_run:cmd('cleanup server sock')
+test_run:cmd("stop server bork")
+test_run:cmd("cleanup server bork")
+test_run:cmd("delete server bork")
+test_run:cleanup_cluster()
 
 box.schema.user.revoke('guest', 'replication')
 
@@ -138,15 +145,15 @@ test_run:cmd('stop server er_load1')
 -- er_load2 exits automatically.
 test_run:cmd('cleanup server er_load1')
 test_run:cmd('cleanup server er_load2')
+test_run:cmd('delete server er_load1')
+test_run:cmd('delete server er_load2')
+test_run:cleanup_cluster()
 
 --
 -- Test case for gh-3637. Before the fix replica would exit with
 -- an error. Now check that we don't hang and successfully connect.
 --
 fiber = require('fiber')
-
-test_run:cleanup_cluster()
-
 test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
 test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.05'")
 -- Wait a bit to make sure replica waits till user is created.
@@ -161,6 +168,8 @@ _ = test_run:wait_vclock('replica_auth', vclock)
 test_run:cmd("stop server replica_auth")
 test_run:cmd("cleanup server replica_auth")
 test_run:cmd("delete server replica_auth")
+test_run:cleanup_cluster()
+
 box.schema.user.drop('cluster')
 
 --
diff --git a/test/replication/on_replace.result b/test/replication/on_replace.result
index 4ffa3b25a..2e95b90ea 100644
--- a/test/replication/on_replace.result
+++ b/test/replication/on_replace.result
@@ -63,6 +63,9 @@ test_run:cmd("switch replica")
 ---
 - true
 ...
+fiber = require('fiber')
+---
+...
 while box.space.test:count() < 2 do fiber.sleep(0.01) end
 ---
 ...
@@ -88,6 +91,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
@@ -177,3 +187,6 @@ _ = test_run:cmd('switch default')
 test_run:drop_cluster(SERVERS)
 ---
 ...
+test_run:cleanup_cluster()
+---
+...
diff --git a/test/replication/on_replace.test.lua b/test/replication/on_replace.test.lua
index 371b71cbd..e34832103 100644
--- a/test/replication/on_replace.test.lua
+++ b/test/replication/on_replace.test.lua
@@ -26,6 +26,7 @@ session_type
 test_run:cmd("switch default")
 box.space.test:insert{2}
 test_run:cmd("switch replica")
+fiber = require('fiber')
 while box.space.test:count() < 2 do fiber.sleep(0.01) end
 --
 -- applier
@@ -37,6 +38,8 @@ test_run:cmd("switch default")
 --
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
 
@@ -73,3 +76,4 @@ box.space.s2:select()
 
 _ = test_run:cmd('switch default')
 test_run:drop_cluster(SERVERS)
+test_run:cleanup_cluster()
diff --git a/test/replication/once.result b/test/replication/once.result
index 99ac05b72..fd787915e 100644
--- a/test/replication/once.result
+++ b/test/replication/once.result
@@ -85,3 +85,15 @@ once -- 1
 box.cfg{read_only = false}
 ---
 ...
+box.space._schema:delete{"oncero"}
+---
+- ['oncero']
+...
+box.space._schema:delete{"oncekey"}
+---
+- ['oncekey']
+...
+box.space._schema:delete{"oncetest"}
+---
+- ['oncetest']
+...
diff --git a/test/replication/once.test.lua b/test/replication/once.test.lua
index 264c63670..813fbfdab 100644
--- a/test/replication/once.test.lua
+++ b/test/replication/once.test.lua
@@ -28,3 +28,6 @@ box.cfg{read_only = true}
 box.once("ro", f, 1) -- ok, already done
 once -- 1
 box.cfg{read_only = false}
+box.space._schema:delete{"oncero"}
+box.space._schema:delete{"oncekey"}
+box.space._schema:delete{"oncetest"}
diff --git a/test/replication/quorum.result b/test/replication/quorum.result
index 265b099b7..2642fe8f4 100644
--- a/test/replication/quorum.result
+++ b/test/replication/quorum.result
@@ -435,18 +435,21 @@ test_run:cmd('switch default')
 ---
 - true
 ...
-test_run:cmd('stop server replica_quorum')
+test_run:cmd("stop server replica_quorum")
 ---
 - true
 ...
-test_run:cmd('cleanup server replica_quorum')
+test_run:cmd("cleanup server replica_quorum")
 ---
 - true
 ...
-test_run:cmd('delete server replica_quorum')
+test_run:cmd("delete server replica_quorum")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
diff --git a/test/replication/quorum.test.lua b/test/replication/quorum.test.lua
index 5a43275c2..24d1b27c4 100644
--- a/test/replication/quorum.test.lua
+++ b/test/replication/quorum.test.lua
@@ -166,7 +166,8 @@ test_run:cmd('switch replica_quorum')
 box.cfg{replication={INSTANCE_URI, nonexistent_uri(1)}}
 box.info.id
 test_run:cmd('switch default')
-test_run:cmd('stop server replica_quorum')
-test_run:cmd('cleanup server replica_quorum')
-test_run:cmd('delete server replica_quorum')
+test_run:cmd("stop server replica_quorum")
+test_run:cmd("cleanup server replica_quorum")
+test_run:cmd("delete server replica_quorum")
+test_run:cleanup_cluster()
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index 4370fae4b..37849850f 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -242,6 +242,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index f998f60d0..950ec7532 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -87,5 +87,7 @@ box.space.test:select()
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/skip_conflict_row.result b/test/replication/skip_conflict_row.result
index 29963f56a..6ca13b472 100644
--- a/test/replication/skip_conflict_row.result
+++ b/test/replication/skip_conflict_row.result
@@ -91,6 +91,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
diff --git a/test/replication/skip_conflict_row.test.lua b/test/replication/skip_conflict_row.test.lua
index 5f7d6ead3..4406ced95 100644
--- a/test/replication/skip_conflict_row.test.lua
+++ b/test/replication/skip_conflict_row.test.lua
@@ -31,5 +31,7 @@ box.info.status
 -- cleanup
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/status.result b/test/replication/status.result
index 8394b98c1..9e69f2478 100644
--- a/test/replication/status.result
+++ b/test/replication/status.result
@@ -391,3 +391,10 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
diff --git a/test/replication/status.test.lua b/test/replication/status.test.lua
index 8bb25e0c6..cfdf6acdb 100644
--- a/test/replication/status.test.lua
+++ b/test/replication/status.test.lua
@@ -142,3 +142,5 @@ test_run:cmd('switch default')
 box.schema.user.revoke('guest', 'replication')
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index f4abc7af1..5cbc371c2 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -6,5 +6,6 @@ disabled = consistent.test.lua
 release_disabled = catch.test.lua errinj.test.lua gc.test.lua before_replace.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua
 config = suite.cfg
 lua_libs = lua/fast_replica.lua lua/rlimit.lua
+use_unix_sockets = True
 long_run = prune.test.lua
-is_parallel = False
+is_parallel = True
diff --git a/test/replication/sync.result b/test/replication/sync.result
index 81de60758..b2381ac59 100644
--- a/test/replication/sync.result
+++ b/test/replication/sync.result
@@ -303,6 +303,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
diff --git a/test/replication/sync.test.lua b/test/replication/sync.test.lua
index a5cfab8de..51131667d 100644
--- a/test/replication/sync.test.lua
+++ b/test/replication/sync.test.lua
@@ -145,6 +145,8 @@ test_run:grep_log('replica', 'ER_CFG.*')
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/wal_off.result b/test/replication/wal_off.result
index e3b5709e9..e0ae84bd7 100644
--- a/test/replication/wal_off.result
+++ b/test/replication/wal_off.result
@@ -107,6 +107,13 @@ test_run:cmd("cleanup server wal_off")
 ---
 - true
 ...
+test_run:cmd("delete server wal_off")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
diff --git a/test/replication/wal_off.test.lua b/test/replication/wal_off.test.lua
index 81fcf0b33..110f2f1f7 100644
--- a/test/replication/wal_off.test.lua
+++ b/test/replication/wal_off.test.lua
@@ -37,5 +37,7 @@ box.cfg { replication = "" }
 
 test_run:cmd("stop server wal_off")
 test_run:cmd("cleanup server wal_off")
+test_run:cmd("delete server wal_off")
+test_run:cleanup_cluster()
 
 box.schema.user.revoke('guest', 'replication')
-- 
2.18.0

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] test: enable parallel mode for replication tests
  2018-10-03 14:50     ` Sergei Voronezhskii
@ 2018-10-05  9:02       ` Sergei Voronezhskii
  2018-10-05  9:02         ` [PATCH 1/4] test: cleanup replication tests, parallel mode on Sergei Voronezhskii
                           ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Sergei Voronezhskii @ 2018-10-05  9:02 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Alexander Turenko, Vladimir Davydov

Split the patch that enables parallel mode for replication tests into a series.

BRANCH: https://github.com/tarantool/tarantool/tree/sergw/enable-parallel-test-replication

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] test: cleanup replication tests, parallel mode on
  2018-10-05  9:02       ` Sergei Voronezhskii
@ 2018-10-05  9:02         ` Sergei Voronezhskii
  2018-10-08 19:02           ` Alexander Turenko
  2018-10-05  9:02         ` [PATCH 2/4] test: errinj for pause relay_send Sergei Voronezhskii
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Sergei Voronezhskii @ 2018-10-05  9:02 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Alexander Turenko, Vladimir Davydov

- at the end of tests that create any replication config, call
`test_run:cleanup_cluster()`, which clears `box.space._cluster`
- switch on `use_unix_sockets` to avoid 'Address already in use'
errors
- the `once` test needs to clean up its `once*` keys in `box.space._schema`

Part of #2436, #3232
---
 test/replication/before_replace.result      |  3 ++
 test/replication/before_replace.test.lua    |  1 +
 test/replication/catch.result               |  3 ++
 test/replication/catch.test.lua             |  1 +
 test/replication/local_spaces.result        |  3 ++
 test/replication/local_spaces.test.lua      |  1 +
 test/replication/misc.result                | 44 +++++++++++++++++----
 test/replication/misc.test.lua              | 23 +++++++----
 test/replication/on_replace.result          | 10 +++++
 test/replication/on_replace.test.lua        |  3 ++
 test/replication/once.result                | 12 ++++++
 test/replication/once.test.lua              |  3 ++
 test/replication/quorum.result              |  9 +++--
 test/replication/quorum.test.lua            |  7 ++--
 test/replication/replica_rejoin.result      |  7 ++++
 test/replication/replica_rejoin.test.lua    |  2 +
 test/replication/skip_conflict_row.result   |  7 ++++
 test/replication/skip_conflict_row.test.lua |  2 +
 test/replication/status.result              |  7 ++++
 test/replication/status.test.lua            |  2 +
 test/replication/suite.ini                  |  3 +-
 test/replication/sync.result                |  7 ++++
 test/replication/sync.test.lua              |  2 +
 test/replication/wal_off.result             |  7 ++++
 test/replication/wal_off.test.lua           |  2 +
 25 files changed, 148 insertions(+), 23 deletions(-)

diff --git a/test/replication/before_replace.result b/test/replication/before_replace.result
index 858a52de6..87973b6d1 100644
--- a/test/replication/before_replace.result
+++ b/test/replication/before_replace.result
@@ -223,3 +223,6 @@ test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
 ---
 ...
+test_run:cleanup_cluster()
+---
+...
diff --git a/test/replication/before_replace.test.lua b/test/replication/before_replace.test.lua
index f1e590703..bcd264521 100644
--- a/test/replication/before_replace.test.lua
+++ b/test/replication/before_replace.test.lua
@@ -80,3 +80,4 @@ box.space.test:select()
 -- Cleanup.
 test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
+test_run:cleanup_cluster()
diff --git a/test/replication/catch.result b/test/replication/catch.result
index aebba819f..e23f33cef 100644
--- a/test/replication/catch.result
+++ b/test/replication/catch.result
@@ -130,6 +130,9 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
index 8cc3242f7..217328772 100644
--- a/test/replication/catch.test.lua
+++ b/test/replication/catch.test.lua
@@ -59,6 +59,7 @@ errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
 -- cleanup
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
 
diff --git a/test/replication/local_spaces.result b/test/replication/local_spaces.result
index 151735530..4de223261 100644
--- a/test/replication/local_spaces.result
+++ b/test/replication/local_spaces.result
@@ -216,6 +216,9 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
diff --git a/test/replication/local_spaces.test.lua b/test/replication/local_spaces.test.lua
index 06e2b0bd2..633cc9f1a 100644
--- a/test/replication/local_spaces.test.lua
+++ b/test/replication/local_spaces.test.lua
@@ -76,6 +76,7 @@ box.space.test3:select()
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cleanup_cluster()
 box.schema.user.revoke('guest', 'replication')
 
 s1:select()
diff --git a/test/replication/misc.result b/test/replication/misc.result
index f8aa8dab6..c0c7d482d 100644
--- a/test/replication/misc.result
+++ b/test/replication/misc.result
@@ -88,6 +88,13 @@ test_run:cmd('cleanup server test')
 box.cfg{read_only = false}
 ---
 ...
+test_run:cmd('delete server test')
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 -- gh-3160 - Send heartbeats if there are changes from a remote master only
 SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
 ---
@@ -229,6 +236,9 @@ test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
 ---
 ...
+test_run:cleanup_cluster()
+---
+...
 -- gh-3642 - Check that socket file descriptor doesn't leak
 -- when a replica is disconnected.
 rlimit = require('rlimit')
@@ -249,15 +259,15 @@ lim.rlim_cur = 64
 rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
 ---
 ...
-test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"')
+test_run:cmd('create server bork with rpl_master=default, script="replication/replica.lua"')
 ---
 - true
 ...
-test_run:cmd(string.format('start server sock'))
+test_run:cmd('start server bork')
 ---
 - true
 ...
-test_run:cmd('switch sock')
+test_run:cmd('switch bork')
 ---
 - true
 ...
@@ -299,14 +309,21 @@ lim.rlim_cur = old_fno
 rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
 ---
 ...
-test_run:cmd('stop server sock')
+test_run:cmd("stop server bork")
+---
+- true
+...
+test_run:cmd("cleanup server bork")
 ---
 - true
 ...
-test_run:cmd('cleanup server sock')
+test_run:cmd("delete server bork")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
@@ -342,6 +359,17 @@ test_run:cmd('cleanup server er_load2')
 ---
 - true
 ...
+test_run:cmd('delete server er_load1')
+---
+- true
+...
+test_run:cmd('delete server er_load2')
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 --
 -- Test case for gh-3637. Before the fix replica would exit with
 -- an error. Now check that we don't hang and successfully connect.
@@ -349,9 +377,6 @@ test_run:cmd('cleanup server er_load2')
 fiber = require('fiber')
 ---
 ...
-test_run:cleanup_cluster()
----
-...
 test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
 ---
 - true
@@ -391,6 +416,9 @@ test_run:cmd("delete server replica_auth")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.drop('cluster')
 ---
 ...
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index 46726b7f4..375c8b58a 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -32,6 +32,8 @@ test_run:cmd(string.format('start server test with args="%s"', replica_uuid))
 test_run:cmd('stop server test')
 test_run:cmd('cleanup server test')
 box.cfg{read_only = false}
+test_run:cmd('delete server test')
+test_run:cleanup_cluster()
 
 -- gh-3160 - Send heartbeats if there are changes from a remote master only
 SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
@@ -89,6 +91,7 @@ box.space.space1:drop()
 
 test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
+test_run:cleanup_cluster()
 
 -- gh-3642 - Check that socket file descriptor doesn't leak
 -- when a replica is disconnected.
@@ -99,9 +102,9 @@ old_fno = lim.rlim_cur
 lim.rlim_cur = 64
 rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
 
-test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"')
-test_run:cmd(string.format('start server sock'))
-test_run:cmd('switch sock')
+test_run:cmd('create server bork with rpl_master=default, script="replication/replica.lua"')
+test_run:cmd('start server bork')
+test_run:cmd('switch bork')
 test_run = require('test_run').new()
 fiber = require('fiber')
 test_run:cmd("setopt delimiter ';'")
@@ -122,8 +125,10 @@ test_run:cmd('switch default')
 lim.rlim_cur = old_fno
 rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
 
-test_run:cmd('stop server sock')
-test_run:cmd('cleanup server sock')
+test_run:cmd("stop server bork")
+test_run:cmd("cleanup server bork")
+test_run:cmd("delete server bork")
+test_run:cleanup_cluster()
 
 box.schema.user.revoke('guest', 'replication')
 
@@ -138,15 +143,15 @@ test_run:cmd('stop server er_load1')
 -- er_load2 exits automatically.
 test_run:cmd('cleanup server er_load1')
 test_run:cmd('cleanup server er_load2')
+test_run:cmd('delete server er_load1')
+test_run:cmd('delete server er_load2')
+test_run:cleanup_cluster()
 
 --
 -- Test case for gh-3637. Before the fix replica would exit with
 -- an error. Now check that we don't hang and successfully connect.
 --
 fiber = require('fiber')
-
-test_run:cleanup_cluster()
-
 test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
 test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.05'")
 -- Wait a bit to make sure replica waits till user is created.
@@ -161,6 +166,8 @@ _ = test_run:wait_vclock('replica_auth', vclock)
 test_run:cmd("stop server replica_auth")
 test_run:cmd("cleanup server replica_auth")
 test_run:cmd("delete server replica_auth")
+test_run:cleanup_cluster()
+
 box.schema.user.drop('cluster')
 
 --
diff --git a/test/replication/on_replace.result b/test/replication/on_replace.result
index 4ffa3b25a..8fef8fb14 100644
--- a/test/replication/on_replace.result
+++ b/test/replication/on_replace.result
@@ -88,6 +88,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
@@ -177,3 +184,6 @@ _ = test_run:cmd('switch default')
 test_run:drop_cluster(SERVERS)
 ---
 ...
+test_run:cleanup_cluster()
+---
+...
diff --git a/test/replication/on_replace.test.lua b/test/replication/on_replace.test.lua
index 371b71cbd..23a3313b5 100644
--- a/test/replication/on_replace.test.lua
+++ b/test/replication/on_replace.test.lua
@@ -37,6 +37,8 @@ test_run:cmd("switch default")
 --
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
 
@@ -73,3 +75,4 @@ box.space.s2:select()
 
 _ = test_run:cmd('switch default')
 test_run:drop_cluster(SERVERS)
+test_run:cleanup_cluster()
diff --git a/test/replication/once.result b/test/replication/once.result
index 99ac05b72..fd787915e 100644
--- a/test/replication/once.result
+++ b/test/replication/once.result
@@ -85,3 +85,15 @@ once -- 1
 box.cfg{read_only = false}
 ---
 ...
+box.space._schema:delete{"oncero"}
+---
+- ['oncero']
+...
+box.space._schema:delete{"oncekey"}
+---
+- ['oncekey']
+...
+box.space._schema:delete{"oncetest"}
+---
+- ['oncetest']
+...
diff --git a/test/replication/once.test.lua b/test/replication/once.test.lua
index 264c63670..813fbfdab 100644
--- a/test/replication/once.test.lua
+++ b/test/replication/once.test.lua
@@ -28,3 +28,6 @@ box.cfg{read_only = true}
 box.once("ro", f, 1) -- ok, already done
 once -- 1
 box.cfg{read_only = false}
+box.space._schema:delete{"oncero"}
+box.space._schema:delete{"oncekey"}
+box.space._schema:delete{"oncetest"}
diff --git a/test/replication/quorum.result b/test/replication/quorum.result
index 265b099b7..2642fe8f4 100644
--- a/test/replication/quorum.result
+++ b/test/replication/quorum.result
@@ -435,18 +435,21 @@ test_run:cmd('switch default')
 ---
 - true
 ...
-test_run:cmd('stop server replica_quorum')
+test_run:cmd("stop server replica_quorum")
 ---
 - true
 ...
-test_run:cmd('cleanup server replica_quorum')
+test_run:cmd("cleanup server replica_quorum")
 ---
 - true
 ...
-test_run:cmd('delete server replica_quorum')
+test_run:cmd("delete server replica_quorum")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
diff --git a/test/replication/quorum.test.lua b/test/replication/quorum.test.lua
index 5a43275c2..24d1b27c4 100644
--- a/test/replication/quorum.test.lua
+++ b/test/replication/quorum.test.lua
@@ -166,7 +166,8 @@ test_run:cmd('switch replica_quorum')
 box.cfg{replication={INSTANCE_URI, nonexistent_uri(1)}}
 box.info.id
 test_run:cmd('switch default')
-test_run:cmd('stop server replica_quorum')
-test_run:cmd('cleanup server replica_quorum')
-test_run:cmd('delete server replica_quorum')
+test_run:cmd("stop server replica_quorum")
+test_run:cmd("cleanup server replica_quorum")
+test_run:cmd("delete server replica_quorum")
+test_run:cleanup_cluster()
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index 4370fae4b..37849850f 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -242,6 +242,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index f998f60d0..950ec7532 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -87,5 +87,7 @@ box.space.test:select()
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/skip_conflict_row.result b/test/replication/skip_conflict_row.result
index 29963f56a..6ca13b472 100644
--- a/test/replication/skip_conflict_row.result
+++ b/test/replication/skip_conflict_row.result
@@ -91,6 +91,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
diff --git a/test/replication/skip_conflict_row.test.lua b/test/replication/skip_conflict_row.test.lua
index 5f7d6ead3..4406ced95 100644
--- a/test/replication/skip_conflict_row.test.lua
+++ b/test/replication/skip_conflict_row.test.lua
@@ -31,5 +31,7 @@ box.info.status
 -- cleanup
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/status.result b/test/replication/status.result
index 8394b98c1..9e69f2478 100644
--- a/test/replication/status.result
+++ b/test/replication/status.result
@@ -391,3 +391,10 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
diff --git a/test/replication/status.test.lua b/test/replication/status.test.lua
index 8bb25e0c6..cfdf6acdb 100644
--- a/test/replication/status.test.lua
+++ b/test/replication/status.test.lua
@@ -142,3 +142,5 @@ test_run:cmd('switch default')
 box.schema.user.revoke('guest', 'replication')
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index f4abc7af1..5cbc371c2 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -6,5 +6,6 @@ disabled = consistent.test.lua
 release_disabled = catch.test.lua errinj.test.lua gc.test.lua before_replace.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua
 config = suite.cfg
 lua_libs = lua/fast_replica.lua lua/rlimit.lua
+use_unix_sockets = True
 long_run = prune.test.lua
-is_parallel = False
+is_parallel = True
diff --git a/test/replication/sync.result b/test/replication/sync.result
index 81de60758..b2381ac59 100644
--- a/test/replication/sync.result
+++ b/test/replication/sync.result
@@ -303,6 +303,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
diff --git a/test/replication/sync.test.lua b/test/replication/sync.test.lua
index a5cfab8de..51131667d 100644
--- a/test/replication/sync.test.lua
+++ b/test/replication/sync.test.lua
@@ -145,6 +145,8 @@ test_run:grep_log('replica', 'ER_CFG.*')
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/wal_off.result b/test/replication/wal_off.result
index e3b5709e9..e0ae84bd7 100644
--- a/test/replication/wal_off.result
+++ b/test/replication/wal_off.result
@@ -107,6 +107,13 @@ test_run:cmd("cleanup server wal_off")
 ---
 - true
 ...
+test_run:cmd("delete server wal_off")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
diff --git a/test/replication/wal_off.test.lua b/test/replication/wal_off.test.lua
index 81fcf0b33..110f2f1f7 100644
--- a/test/replication/wal_off.test.lua
+++ b/test/replication/wal_off.test.lua
@@ -37,5 +37,7 @@ box.cfg { replication = "" }
 
 test_run:cmd("stop server wal_off")
 test_run:cmd("cleanup server wal_off")
+test_run:cmd("delete server wal_off")
+test_run:cleanup_cluster()
 
 box.schema.user.revoke('guest', 'replication')
-- 
2.18.0

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/4] test: errinj for pause relay_send
  2018-10-05  9:02       ` Sergei Voronezhskii
  2018-10-05  9:02         ` [PATCH 1/4] test: cleanup replication tests, parallel mode on Sergei Voronezhskii
@ 2018-10-05  9:02         ` Sergei Voronezhskii
  2018-10-08 19:07           ` Alexander Turenko
  2018-10-05  9:02         ` [PATCH 3/4] test: increase timeout to check replica status Sergei Voronezhskii
  2018-10-05  9:02         ` [PATCH 4/4] test: refactor some requirements to pass the runs Sergei Voronezhskii
  3 siblings, 1 reply; 13+ messages in thread
From: Sergei Voronezhskii @ 2018-10-05  9:02 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Alexander Turenko, Vladimir Davydov

Instead of using a timeout, just pause `relay_send`. We can't rely on
timeouts because of varying system load in parallel mode. Add a new
errinj that checks a boolean in a loop: while it is set, `relay_send`
does not proceed to the next statement.

Also change `delete` to `replace` here, and look up the xlog files in
a loop with a short sleep until the file count matches the expected value.
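
On the test side the new injection is used as a pause/resume switch; a
sketch based on the catch.test.lua hunk below (`s` is the test space):

```lua
errinj = box.error.injection
-- Pause relay_send on the master: rows pile up but are not sent,
-- so the replica stays in read-only mode while catching up.
errinj.set('ERRINJ_RELAY_SEND_DELAY', true)
for i = 1, 100 do s:insert{i, 'this is test message12345'} end
-- ... run assertions against the lagging replica here ...
-- Resume replication.
errinj.set('ERRINJ_RELAY_SEND_DELAY', false)
```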

Part of #2436, #3232
---
 src/box/relay.cc                |  7 +++++-
 src/errinj.h                    |  1 +
 test/replication/catch.result   | 44 ++++++++++++++-------------------
 test/replication/catch.test.lua | 36 +++++++++++++--------------
 test/replication/gc.result      | 18 ++++++--------
 test/replication/gc.test.lua    | 16 ++++++------
 6 files changed, 58 insertions(+), 64 deletions(-)

diff --git a/src/box/relay.cc b/src/box/relay.cc
index c90383d4a..8618fa81a 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -622,12 +622,17 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
 static void
 relay_send(struct relay *relay, struct xrow_header *packet)
 {
+	struct errinj *inj = errinj(ERRINJ_RELAY_SEND_DELAY, ERRINJ_BOOL);
+	while (inj->bparam) {
+		fiber_sleep(0.01);
+		inj = errinj(ERRINJ_RELAY_SEND_DELAY, ERRINJ_BOOL);
+	}
 	packet->sync = relay->sync;
 	relay->last_row_tm = ev_monotonic_now(loop());
 	coio_write_xrow(&relay->io, packet);
 	fiber_gc();
 
-	struct errinj *inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
+	inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
 	if (inj != NULL && inj->dparam > 0)
 		fiber_sleep(inj->dparam);
 }
diff --git a/src/errinj.h b/src/errinj.h
index 84a1fbb5e..bf6c15ba5 100644
--- a/src/errinj.h
+++ b/src/errinj.h
@@ -94,6 +94,7 @@ struct errinj {
 	_(ERRINJ_VY_GC, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_VY_LOG_FLUSH, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_VY_LOG_FLUSH_DELAY, ERRINJ_BOOL, {.bparam = false}) \
+	_(ERRINJ_RELAY_SEND_DELAY, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE, {.dparam = 0}) \
 	_(ERRINJ_RELAY_REPORT_INTERVAL, ERRINJ_DOUBLE, {.dparam = 0}) \
 	_(ERRINJ_RELAY_FINAL_SLEEP, ERRINJ_BOOL, {.bparam = false}) \
diff --git a/test/replication/catch.result b/test/replication/catch.result
index e23f33cef..b4ddc5d51 100644
--- a/test/replication/catch.result
+++ b/test/replication/catch.result
@@ -35,7 +35,7 @@ test_run:cmd("switch default")
 s = box.schema.space.create('test', {engine = engine});
 ---
 ...
--- vinyl does not support hash index
+-- Vinyl does not support hash index
 index = s:create_index('primary', {type = (engine == 'vinyl' and 'tree' or 'hash') })
 ---
 ...
@@ -57,14 +57,13 @@ test_run:cmd("stop server replica")
 ---
 - true
 ...
--- insert values on the master while replica is stopped and can't fetch them
-for i=1,100 do s:insert{i, 'this is test message12345'} end
+-- Insert values on the master while replica is stopped and can't fetch them.
+errinj.set('ERRINJ_RELAY_SEND_DELAY', true)
 ---
+- ok
 ...
--- sleep after every tuple
-errinj.set("ERRINJ_RELAY_TIMEOUT", 1000.0)
+for i=1,100 do s:insert{i, 'this is test message12345'} end
 ---
-- ok
 ...
 test_run:cmd("start server replica with args='0.01'")
 ---
@@ -75,28 +74,25 @@ test_run:cmd("switch replica")
 - true
 ...
 -- Check that replica doesn't enter read-write mode before
--- catching up with the master: to check that we inject sleep into
--- the master relay_send function and attempt a data modifying
--- statement in replica while it's still fetching data from the
--- master.
--- In the next two cases we try to delete a tuple while replica is
+-- catching up with the master: to check that we stop sending
+-- rows on the master in relay_send function and attempt a data
+-- modifying statement in replica while it's still fetching data
+-- from the master.
+--
+-- In the next two cases we try to replace a tuple while replica is
 -- catching up with the master (local delete, remote delete) case
 --
--- #1: delete tuple on replica
+-- Case #1: replace tuple on replica locally.
 --
 box.space.test ~= nil
 ---
 - true
 ...
-d = box.space.test:delete{1}
+box.space.test:replace{1}
 ---
 - error: Can't modify data because this instance is in read-only mode.
 ...
-box.space.test:get(1) ~= nil
----
-- true
-...
--- case #2: delete tuple by net.box
+-- Case #2: replace tuple on replica by net.box.
 test_run:cmd("switch default")
 ---
 - true
@@ -108,20 +104,16 @@ test_run:cmd("set variable r_uri to 'replica.listen'")
 c = net_box.connect(r_uri)
 ---
 ...
-d = c.space.test:delete{1}
+d = c.space.test:replace{1}
 ---
 - error: Can't modify data because this instance is in read-only mode.
 ...
-c.space.test:get(1) ~= nil
----
-- true
-...
--- check sync
-errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
+-- Resume replication
+errinj.set('ERRINJ_RELAY_SEND_DELAY', false)
 ---
 - ok
 ...
--- cleanup
+-- Cleanup
 test_run:cmd("stop server replica")
 ---
 - true
diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
index 217328772..5223e3a24 100644
--- a/test/replication/catch.test.lua
+++ b/test/replication/catch.test.lua
@@ -13,7 +13,7 @@ test_run:cmd("switch replica")
 
 test_run:cmd("switch default")
 s = box.schema.space.create('test', {engine = engine});
--- vinyl does not support hash index
+-- Vinyl does not support hash index
 index = s:create_index('primary', {type = (engine == 'vinyl' and 'tree' or 'hash') })
 
 test_run:cmd("switch replica")
@@ -22,41 +22,39 @@ while box.space.test == nil do fiber.sleep(0.01) end
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 
--- insert values on the master while replica is stopped and can't fetch them
+-- Insert values on the master while replica is stopped and can't fetch them.
+errinj.set('ERRINJ_RELAY_SEND_DELAY', true)
 for i=1,100 do s:insert{i, 'this is test message12345'} end
 
--- sleep after every tuple
-errinj.set("ERRINJ_RELAY_TIMEOUT", 1000.0)
-
 test_run:cmd("start server replica with args='0.01'")
 test_run:cmd("switch replica")
 
 -- Check that replica doesn't enter read-write mode before
--- catching up with the master: to check that we inject sleep into
--- the master relay_send function and attempt a data modifying
--- statement in replica while it's still fetching data from the
--- master.
--- In the next two cases we try to delete a tuple while replica is
+-- catching up with the master: to check that we stop sending
+-- rows on the master in relay_send function and attempt a data
+-- modifying statement in replica while it's still fetching data
+-- from the master.
+--
+-- In the next two cases we try to replace a tuple while replica is
 -- catching up with the master (local delete, remote delete) case
 --
--- #1: delete tuple on replica
+-- Case #1: replace tuple on replica locally.
 --
 box.space.test ~= nil
-d = box.space.test:delete{1}
-box.space.test:get(1) ~= nil
+box.space.test:replace{1}
 
--- case #2: delete tuple by net.box
+-- Case #2: replace tuple on replica by net.box.
 
 test_run:cmd("switch default")
 test_run:cmd("set variable r_uri to 'replica.listen'")
 c = net_box.connect(r_uri)
-d = c.space.test:delete{1}
-c.space.test:get(1) ~= nil
+d = c.space.test:replace{1}
+
+-- Resume replication
+errinj.set('ERRINJ_RELAY_SEND_DELAY', false)
 
--- check sync
-errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
 
--- cleanup
+-- Cleanup
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
 test_run:cleanup_cluster()
diff --git a/test/replication/gc.result b/test/replication/gc.result
index 83d0de293..ef6463d87 100644
--- a/test/replication/gc.result
+++ b/test/replication/gc.result
@@ -95,7 +95,7 @@ test_run:cmd("switch replica")
 fiber = require('fiber')
 ---
 ...
-while box.space.test:count() < 200 do fiber.sleep(0.01) end
+while box.space.test == nil or box.space.test:count() < 200 do fiber.sleep(0.01) end
 ---
 ...
 box.space.test:count()
@@ -119,9 +119,9 @@ wait_gc(1)
 ---
 - true
 ...
--- Make sure the replica will receive data it is subscribed
--- to long enough for us to invoke garbage collection.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
+-- Make sure the replica will not receive data until
+-- we test garbage collection.
+box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", true)
 ---
 - ok
 ...
@@ -153,13 +153,12 @@ box.snapshot()
 ---
 - true
 ...
-#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
+while #fio.glob('./master/*.xlog') ~= 2 do fiber.sleep(0.01) end
 ---
-- true
 ...
--- Remove the timeout injection so that the replica catches
+-- Resume replication so that the replica catches
 -- up quickly.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
+box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", false)
 ---
 - ok
 ...
@@ -188,9 +187,8 @@ wait_gc(1)
 ---
 - true
 ...
-#fio.glob('./master/*.xlog') == 0 or fio.listdir('./master')
+while #fio.glob('./master/*.xlog') ~= 0 do fiber.sleep(0.01) end
 ---
-- true
 ...
 --
 -- Check that the master doesn't delete xlog files sent to the
diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua
index eed76850c..ec3bf6baa 100644
--- a/test/replication/gc.test.lua
+++ b/test/replication/gc.test.lua
@@ -52,7 +52,7 @@ test_run:cmd("start server replica")
 -- data from the master. Check it.
 test_run:cmd("switch replica")
 fiber = require('fiber')
-while box.space.test:count() < 200 do fiber.sleep(0.01) end
+while box.space.test == nil or box.space.test:count() < 200 do fiber.sleep(0.01) end
 box.space.test:count()
 test_run:cmd("switch default")
 
@@ -61,9 +61,9 @@ test_run:cmd("switch default")
 wait_gc(1)
 #box.info.gc().checkpoints == 1 or box.info.gc()
 #fio.glob('./master/*.xlog') == 1 or fio.listdir('./master')
--- Make sure the replica will receive data it is subscribed
--- to long enough for us to invoke garbage collection.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
+-- Make sure the replica will not receive data until
+-- we test garbage collection.
+box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", true)
 
 -- Send more data to the replica.
 -- Need to do 2 snapshots here, otherwise the replica would
@@ -78,11 +78,11 @@ box.snapshot()
 -- xlogs needed by the replica.
 box.snapshot()
 #box.info.gc().checkpoints == 1 or box.info.gc()
-#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
+while #fio.glob('./master/*.xlog') ~= 2 do fiber.sleep(0.01) end
 
--- Remove the timeout injection so that the replica catches
+-- Resume replication so that the replica catches
 -- up quickly.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
+box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", false)
 
 -- Check that the replica received all data from the master.
 test_run:cmd("switch replica")
@@ -94,7 +94,7 @@ test_run:cmd("switch default")
 -- from the old checkpoint.
 wait_gc(1)
 #box.info.gc().checkpoints == 1 or box.info.gc()
-#fio.glob('./master/*.xlog') == 0 or fio.listdir('./master')
+while #fio.glob('./master/*.xlog') ~= 0 do fiber.sleep(0.01) end
 --
 -- Check that the master doesn't delete xlog files sent to the
 -- replica until it receives a confirmation that the data has
-- 
2.18.0

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 3/4] test: increase timeout to check replica status
  2018-10-05  9:02       ` Sergei Voronezhskii
  2018-10-05  9:02         ` [PATCH 1/4] test: cleanup replication tests, parallel mode on Sergei Voronezhskii
  2018-10-05  9:02         ` [PATCH 2/4] test: errinj for pause relay_send Sergei Voronezhskii
@ 2018-10-05  9:02         ` Sergei Voronezhskii
  2018-10-08 19:07           ` Alexander Turenko
  2018-10-05  9:02         ` [PATCH 4/4] test: refactor some requirements to pass the runs Sergei Voronezhskii
  3 siblings, 1 reply; 13+ messages in thread
From: Sergei Voronezhskii @ 2018-10-05  9:02 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Alexander Turenko, Vladimir Davydov

The replica status is checked 100 times, each check within
`replication_timeout`. Refactor the code to fetch the upstream tables
properly, then poll the upstream status in a loop with a short sleep
until it enters follow mode. If the number of checks exceeds 200,
break the loop with an error. The retry count of 200 and the short
0.001 sleep are chosen to suit `replication_timeout` and
`replication_connect_timeout`.
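
The core of the refactored wait, extracted from the diff below for
clarity (`replicaA` and `replicaB` are the upstream tables fetched once
before the outer loop):

```lua
-- Poll until both upstreams report 'follow'; 200 retries with a
-- 0.001 s sleep gives a 0.2 s budget, matching replication_timeout.
local n = 200
repeat
    fiber.sleep(0.001)
    n = n - 1
    if n == 0 then return error(box.info.replication) end
until replicaA.status == 'follow' and replicaB.status == 'follow'
```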

Part of #2436, #3232
---
 test/replication/misc.result   | 22 ++++++++++++----------
 test/replication/misc.test.lua | 22 ++++++++++++----------
 2 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/test/replication/misc.result b/test/replication/misc.result
index c0c7d482d..937ef1b24 100644
--- a/test/replication/misc.result
+++ b/test/replication/misc.result
@@ -113,7 +113,7 @@ test_run:cmd("switch autobootstrap1")
 test_run = require('test_run').new()
 ---
 ...
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 ---
 ...
 test_run:cmd("switch autobootstrap2")
@@ -123,7 +123,7 @@ test_run:cmd("switch autobootstrap2")
 test_run = require('test_run').new()
 ---
 ...
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 ---
 ...
 test_run:cmd("switch autobootstrap3")
@@ -136,7 +136,7 @@ test_run = require('test_run').new()
 fiber=require('fiber')
 ---
 ...
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 ---
 ...
 _ = box.schema.space.create('test_timeout'):create_index('pk')
@@ -147,15 +147,16 @@ test_run:cmd("setopt delimiter ';'")
 - true
 ...
 function test_timeout()
+    local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
+    local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
     for i = 0, 99 do 
         box.space.test_timeout:replace({1})
-        fiber.sleep(0.005)
-        local rinfo = box.info.replication
-        if rinfo[1].upstream and rinfo[1].upstream.status ~= 'follow' or
-           rinfo[2].upstream and rinfo[2].upstream.status ~= 'follow' or
-           rinfo[3].upstream and rinfo[3].upstream.status ~= 'follow' then
-            return error('Replication broken')
-        end
+        local n = 200
+        repeat
+            fiber.sleep(0.001)
+            n = n - 1
+            if n == 0 then return error(box.info.replication) end
+        until replicaA.status == 'follow' and replicaB.status == 'follow'
     end
     return true
 end ;
@@ -165,6 +166,7 @@ test_run:cmd("setopt delimiter ''");
 ---
 - true
 ...
+-- the replica status is checked 100 times, each check within replication_timeout
 test_timeout()
 ---
 - true
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index 375c8b58a..cb658f6d0 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -43,30 +43,32 @@ test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 test_run:wait_fullmesh(SERVERS)
 test_run:cmd("switch autobootstrap1")
 test_run = require('test_run').new()
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 test_run:cmd("switch autobootstrap2")
 test_run = require('test_run').new()
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 test_run:cmd("switch autobootstrap3")
 test_run = require('test_run').new()
 fiber=require('fiber')
-box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
+box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
 _ = box.schema.space.create('test_timeout'):create_index('pk')
 test_run:cmd("setopt delimiter ';'")
 function test_timeout()
+    local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
+    local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
     for i = 0, 99 do 
         box.space.test_timeout:replace({1})
-        fiber.sleep(0.005)
-        local rinfo = box.info.replication
-        if rinfo[1].upstream and rinfo[1].upstream.status ~= 'follow' or
-           rinfo[2].upstream and rinfo[2].upstream.status ~= 'follow' or
-           rinfo[3].upstream and rinfo[3].upstream.status ~= 'follow' then
-            return error('Replication broken')
-        end
+        local n = 200
+        repeat
+            fiber.sleep(0.001)
+            n = n - 1
+            if n == 0 then return error(box.info.replication) end
+        until replicaA.status == 'follow' and replicaB.status == 'follow'
     end
     return true
 end ;
 test_run:cmd("setopt delimiter ''");
+-- the replica status is checked 100 times, each check within replication_timeout
 test_timeout()
 
 -- gh-3247 - Sequence-generated value is not replicated in case
-- 
2.18.0

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 4/4] test: refactor some requirements to pass the runs
  2018-10-05  9:02       ` Sergei Voronezhskii
                           ` (2 preceding siblings ...)
  2018-10-05  9:02         ` [PATCH 3/4] test: increase timeout to check replica status Sergei Voronezhskii
@ 2018-10-05  9:02         ` Sergei Voronezhskii
  2018-10-08 19:08           ` Alexander Turenko
  3 siblings, 1 reply; 13+ messages in thread
From: Sergei Voronezhskii @ 2018-10-05  9:02 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Alexander Turenko, Vladimir Davydov

Part of #2436, #3232
---
 test/replication/catch.result        | 3 +++
 test/replication/catch.test.lua      | 2 +-
 test/replication/gc.result           | 6 +++---
 test/replication/gc.test.lua         | 2 +-
 test/replication/on_replace.result   | 3 +++
 test/replication/on_replace.test.lua | 1 +
 6 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/test/replication/catch.result b/test/replication/catch.result
index b4ddc5d51..6e160e076 100644
--- a/test/replication/catch.result
+++ b/test/replication/catch.result
@@ -1,6 +1,9 @@
 env = require('test_run')
 ---
 ...
+fiber = require('fiber')
+---
+...
 test_run = env.new()
 ---
 ...
diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
index 5223e3a24..22adb2868 100644
--- a/test/replication/catch.test.lua
+++ b/test/replication/catch.test.lua
@@ -1,8 +1,8 @@
 env = require('test_run')
+fiber = require('fiber')
 test_run = env.new()
 engine = test_run:get_cfg('engine')
 
-
 net_box = require('net.box')
 errinj = box.error.injection
 
diff --git a/test/replication/gc.result b/test/replication/gc.result
index ef6463d87..7725c36d6 100644
--- a/test/replication/gc.result
+++ b/test/replication/gc.result
@@ -1,6 +1,3 @@
-fio = require 'fio'
----
-...
 test_run = require('test_run').new()
 ---
 ...
@@ -13,6 +10,9 @@ replica_set = require('fast_replica')
 fiber = require('fiber')
 ---
 ...
+fio = require('fio')
+---
+...
 test_run:cleanup_cluster()
 ---
 ...
diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua
index ec3bf6baa..b99253ee5 100644
--- a/test/replication/gc.test.lua
+++ b/test/replication/gc.test.lua
@@ -1,8 +1,8 @@
-fio = require 'fio'
 test_run = require('test_run').new()
 engine = test_run:get_cfg('engine')
 replica_set = require('fast_replica')
 fiber = require('fiber')
+fio = require('fio')
 
 test_run:cleanup_cluster()
 
diff --git a/test/replication/on_replace.result b/test/replication/on_replace.result
index 8fef8fb14..2e95b90ea 100644
--- a/test/replication/on_replace.result
+++ b/test/replication/on_replace.result
@@ -63,6 +63,9 @@ test_run:cmd("switch replica")
 ---
 - true
 ...
+fiber = require('fiber')
+---
+...
 while box.space.test:count() < 2 do fiber.sleep(0.01) end
 ---
 ...
diff --git a/test/replication/on_replace.test.lua b/test/replication/on_replace.test.lua
index 23a3313b5..e34832103 100644
--- a/test/replication/on_replace.test.lua
+++ b/test/replication/on_replace.test.lua
@@ -26,6 +26,7 @@ session_type
 test_run:cmd("switch default")
 box.space.test:insert{2}
 test_run:cmd("switch replica")
+fiber = require('fiber')
 while box.space.test:count() < 2 do fiber.sleep(0.01) end
 --
 -- applier
-- 
2.18.0

* Re: [PATCH 1/4] test: cleanup replication tests, parallel mode on
  2018-10-05  9:02         ` [PATCH 1/4] test: cleanup replication tests, parallel mode on Sergei Voronezhskii
@ 2018-10-08 19:02           ` Alexander Turenko
  0 siblings, 0 replies; 13+ messages in thread
From: Alexander Turenko @ 2018-10-08 19:02 UTC (permalink / raw)
  To: Sergei Voronezhskii; +Cc: tarantool-patches, Vladimir Davydov

On Fri, Oct 05, 2018 at 12:02:12PM +0300, Sergei Voronezhskii wrote:
> - at the end of tests which create any replication config need to call
> `test_run:clenup_cluster()` which clears `box.space._cluster`
> - switch on `use_unix_sockets` because of 'Address already in use'
> problems
> - test `once` need to clean `once*` schemas
> 
> Part of #2436, #3232
> ---

> diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
> index 46726b7f4..375c8b58a 100644
> --- a/test/replication/misc.test.lua
> +++ b/test/replication/misc.test.lua
> @@ -32,6 +32,8 @@ test_run:cmd(string.format('start server test with args="%s"', replica_uuid))
>  test_run:cmd('stop server test')
>  test_run:cmd('cleanup server test')
>  box.cfg{read_only = false}
> +test_run:cmd('delete server test')

delete server is not described in the commit message. BTW, should we
always do delete server after the last stop+cleanup? Or what should
the rule look like?

> @@ -99,9 +102,9 @@ old_fno = lim.rlim_cur
>  lim.rlim_cur = 64
>  rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
>  
> -test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"')
> -test_run:cmd(string.format('start server sock'))
> -test_run:cmd('switch sock')
> +test_run:cmd('create server bork with rpl_master=default, script="replication/replica.lua"')
> +test_run:cmd('start server bork')
> +test_run:cmd('switch bork')

Why sock -> bork? I don't find anything with the same name within the
replication suite. 'Sock' means socket, but what does 'bork' mean? I
think it is a redundant diff.

> diff --git a/test/replication/quorum.test.lua b/test/replication/quorum.test.lua
> index 5a43275c2..24d1b27c4 100644
> --- a/test/replication/quorum.test.lua
> +++ b/test/replication/quorum.test.lua
> @@ -166,7 +166,8 @@ test_run:cmd('switch replica_quorum')
>  box.cfg{replication={INSTANCE_URI, nonexistent_uri(1)}}
>  box.info.id
>  test_run:cmd('switch default')
> -test_run:cmd('stop server replica_quorum')
> -test_run:cmd('cleanup server replica_quorum')
> -test_run:cmd('delete server replica_quorum')
> +test_run:cmd("stop server replica_quorum")
> +test_run:cmd("cleanup server replica_quorum")
> +test_run:cmd("delete server replica_quorum")

Redundant diff.

> +test_run:cleanup_cluster()
>  box.schema.user.revoke('guest', 'replication')

> diff --git a/test/replication/suite.ini b/test/replication/suite.ini
> index f4abc7af1..5cbc371c2 100644
> --- a/test/replication/suite.ini
> +++ b/test/replication/suite.ini
> @@ -6,5 +6,6 @@ disabled = consistent.test.lua
>  release_disabled = catch.test.lua errinj.test.lua gc.test.lua before_replace.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua
>  config = suite.cfg
>  lua_libs = lua/fast_replica.lua lua/rlimit.lua
> +use_unix_sockets = True
>  long_run = prune.test.lua
> -is_parallel = False
> +is_parallel = True

I think it should be enabled after all the fixes, at the end of the
patchset.

* Re: [PATCH 2/4] test: errinj for pause relay_send
  2018-10-05  9:02         ` [PATCH 2/4] test: errinj for pause relay_send Sergei Voronezhskii
@ 2018-10-08 19:07           ` Alexander Turenko
  0 siblings, 0 replies; 13+ messages in thread
From: Alexander Turenko @ 2018-10-08 19:07 UTC (permalink / raw)
  To: Sergei Voronezhskii; +Cc: Vladimir Davydov, tarantool-patches

On Fri, Oct 05, 2018 at 12:02:13PM +0300, Sergei Voronezhskii wrote:
> Instead of using timeout we need just pause `relay_send`. Can't relay

Typo: relay on -> rely on.

> on timeout because of various system load in parallel mode. Add new
> errinj which checks boolean in loop and until it is not `True` do not
> pass the method `relay_send` to the next statement.
> 
> Also here we change `delete` to `replace`. And lookup the xlog files

Ok, we changed it. Why?

The cite from our developers guide [1]:

> Explain the problem that this commit is solving. Focus on why you
> are making this change as opposed to how (the code explains that).

[1]: https://www.tarantool.io/en/doc/1.9/dev_guide/developer_guidelines/#how-to-write-a-commit-message

> in loop with a little sleep, until the file count is not as expected.
> 
> Part of #2436, #3232
> ---
>  src/box/relay.cc                |  7 +++++-
>  src/errinj.h                    |  1 +
>  test/replication/catch.result   | 44 ++++++++++++++-------------------
>  test/replication/catch.test.lua | 36 +++++++++++++--------------
>  test/replication/gc.result      | 18 ++++++--------
>  test/replication/gc.test.lua    | 16 ++++++------
>  6 files changed, 58 insertions(+), 64 deletions(-)
> 
> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index c90383d4a..8618fa81a 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -622,12 +622,17 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
>  static void
>  relay_send(struct relay *relay, struct xrow_header *packet)
>  {
> +    struct errinj *inj = errinj(ERRINJ_RELAY_SEND_DELAY, ERRINJ_BOOL);
> +    while (inj->bparam) {
> +        fiber_sleep(0.01);
> +        inj = errinj(ERRINJ_RELAY_SEND_DELAY, ERRINJ_BOOL);
> +    }

Code style: tab should be used instead of spaces. Please, consider [1].

[1]: https://www.tarantool.io/en/doc/1.9/dev_guide/c_style_guide/#chapter-1-indentation

>  	packet->sync = relay->sync;
>  	relay->last_row_tm = ev_monotonic_now(loop());
>  	coio_write_xrow(&relay->io, packet);
>  	fiber_gc();
>  
> -	struct errinj *inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
> +	inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
>  	if (inj != NULL && inj->dparam > 0)
>  		fiber_sleep(inj->dparam);
>  }

> diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
> index 217328772..5223e3a24 100644
> --- a/test/replication/catch.test.lua
> +++ b/test/replication/catch.test.lua
> @@ -13,7 +13,7 @@ test_run:cmd("switch replica")
>  
>  test_run:cmd("switch default")
>  s = box.schema.space.create('test', {engine = engine});
> --- vinyl does not support hash index
> +-- Vinyl does not support hash index

Do you intend to fix the test according to our code style? If so, then
a period at the end is needed. I commented such things below with a
'Nit' (nitpicking) mark.

>  index = s:create_index('primary', {type = (engine == 'vinyl' and 'tree' or 'hash') })
>  
>  test_run:cmd("switch replica")
> @@ -22,41 +22,39 @@ while box.space.test == nil do fiber.sleep(0.01) end
>  test_run:cmd("switch default")
>  test_run:cmd("stop server replica")
>  
> --- insert values on the master while replica is stopped and can't fetch them
> +-- Insert values on the master while replica is stopped and can't fetch them.

Nit: Comments should be 66 chars long at max.

> +errinj.set('ERRINJ_RELAY_SEND_DELAY', true)
>  for i=1,100 do s:insert{i, 'this is test message12345'} end
>  

Nit: for i=1,100 -> for i = 1, 100

> --- sleep after every tuple
> -errinj.set("ERRINJ_RELAY_TIMEOUT", 1000.0)
> -
>  test_run:cmd("start server replica with args='0.01'")
>  test_run:cmd("switch replica")
>  
>  -- Check that replica doesn't enter read-write mode before
> --- catching up with the master: to check that we inject sleep into
> --- the master relay_send function and attempt a data modifying
> --- statement in replica while it's still fetching data from the
> --- master.
> --- In the next two cases we try to delete a tuple while replica is
> +-- catching up with the master: to check that we stop sending
> +-- rows on the master in relay_send function and attempt a data
> +-- modifying statement in replica while it's still fetching data
> +-- from the master.
> +--
> +-- In the next two cases we try to replace a tuple while replica is

Nit: 67 chars.

>  -- catching up with the master (local delete, remote delete) case
>  --
> --- #1: delete tuple on replica
> +-- Case #1: replace tuple on replica locally.
>  --
>  box.space.test ~= nil
> -d = box.space.test:delete{1}
> -box.space.test:get(1) ~= nil
> +box.space.test:replace{1}
>  
> --- case #2: delete tuple by net.box
> +-- Case #2: replace tuple on replica by net.box.
>  
>  test_run:cmd("switch default")
>  test_run:cmd("set variable r_uri to 'replica.listen'")
>  c = net_box.connect(r_uri)
> -d = c.space.test:delete{1}
> -c.space.test:get(1) ~= nil
> +d = c.space.test:replace{1}
> +
> +-- Resume replicaton
> +errinj.set('ERRINJ_RELAY_SEND_DELAY', false)
>  

Nit: no period at the end.

> --- check sync
> -errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
>  

Nit: two empty lines instead of one.

> --- cleanup
> +-- Cleanup

Nit: no period at the end.

>  test_run:cmd("stop server replica")
>  test_run:cmd("cleanup server replica")

Shouldn't we delete the server replica here?

>  test_run:cleanup_cluster()

Nit: extra empty line at the end.

> diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua
> index eed76850c..ec3bf6baa 100644
> --- a/test/replication/gc.test.lua
> +++ b/test/replication/gc.test.lua

We still use ERRINJ_RELAY_TIMEOUT in lines 30-40. I think this should
be changed to use the delay as well. Please ask me or Vova for details
about the test case.

The commit bae6f037c0df9bcde56611d411bf600341e008b3 was pushed to 1.10
after your commit. You need to rebase your patchset on top of a fresh
1.10.

> @@ -78,11 +78,11 @@ box.snapshot()
>  -- xlogs needed by the replica.
>  box.snapshot()
>  #box.info.gc().checkpoints == 1 or box.info.gc()
> -#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
> +while #fio.glob('./master/*.xlog') ~= 2 do fiber.sleep(0.01) end
>  

This change was made for some file checks, but not for others. I think
the checks should be consistent.

Also I propose to wrap it into a function like wait_gc (say, wait_xlog)
to reuse the pattern and so make the test more readable.

* Re: [PATCH 3/4] test: increase timeout to check replica status
  2018-10-05  9:02         ` [PATCH 3/4] test: increase timeout to check replica status Sergei Voronezhskii
@ 2018-10-08 19:07           ` Alexander Turenko
  0 siblings, 0 replies; 13+ messages in thread
From: Alexander Turenko @ 2018-10-08 19:07 UTC (permalink / raw)
  To: Sergei Voronezhskii
  Cc: tarantool-patches, Vladimir Davydov, Georgy Kirichenko

On Fri, Oct 05, 2018 at 12:02:14PM +0300, Sergei Voronezhskii wrote:
> The replica status is checked 100 times, each check within
> `replica_timeout`. Refactor the code to get properly upstream.
> Then in loop with little sleep check upstreams status until
> it is not in follow mode. If count of checks is more than 200
> break the loop with error. The value 200 and little sleep 0.001
> choosed suitably to `replica_timeout` and `replica_connect_timeout`.

replica_timeout -> replication_timeout
replica_connect_timeout -> replication_connect_timeout

> 
> Part of #2436, #3232
> ---
>  test/replication/misc.result   | 22 ++++++++++++----------
>  test/replication/misc.test.lua | 22 ++++++++++++----------
>  2 files changed, 24 insertions(+), 20 deletions(-)
> 
> diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
> index 375c8b58a..cb658f6d0 100644
> --- a/test/replication/misc.test.lua
> +++ b/test/replication/misc.test.lua
> @@ -43,30 +43,32 @@ test_run:create_cluster(SERVERS, "replication", {args="0.1"})
>  test_run:wait_fullmesh(SERVERS)
>  test_run:cmd("switch autobootstrap1")
>  test_run = require('test_run').new()
> -box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
> +box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
>  test_run:cmd("switch autobootstrap2")
>  test_run = require('test_run').new()
> -box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
> +box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
>  test_run:cmd("switch autobootstrap3")
>  test_run = require('test_run').new()
>  fiber=require('fiber')
> -box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
> +box.cfg{replication_timeout = 0.2, replication_connect_timeout=0.2}
>  _ = box.schema.space.create('test_timeout'):create_index('pk')
>  test_run:cmd("setopt delimiter ';'")
>  function test_timeout()
> +    local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
> +    local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream

Does the 'box' code guarantee that box.info.replication[N].upstream
will update the same table? I don't think so. It is better to get
these values inside the loop.

Nit: too long lines.
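
Moving the reads inside the loop, as suggested, could look like this
(a sketch only; the upstream fallback indices are copied from the
patch, and the rest mirrors the quoted function):

```lua
function test_timeout()
    for i = 0, 99 do
        box.space.test_timeout:replace({1})
        local n = 200
        repeat
            fiber.sleep(0.001)
            n = n - 1
            if n == 0 then return error(box.info.replication) end
            -- Re-read the upstream tables on every iteration instead
            -- of caching them once before the loop.
            local replicaA = box.info.replication[1].upstream or
                             box.info.replication[2].upstream
            local replicaB = box.info.replication[3].upstream or
                             box.info.replication[2].upstream
        until replicaA.status == 'follow' and replicaB.status == 'follow'
    end
    return true
end
```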

>      for i = 0, 99 do 
>          box.space.test_timeout:replace({1})
> -        fiber.sleep(0.005)
> -        local rinfo = box.info.replication
> -        if rinfo[1].upstream and rinfo[1].upstream.status ~= 'follow' or
> -           rinfo[2].upstream and rinfo[2].upstream.status ~= 'follow' or
> -           rinfo[3].upstream and rinfo[3].upstream.status ~= 'follow' then
> -            return error('Replication broken')
> -        end
> +        local n = 200
> +        repeat
> +            fiber.sleep(0.001)
> +            n = n - 1
> +            if n == 0 then return error(box.info.replication) end
> +        until replicaA.status == 'follow' and replicaB.status == 'follow'
>      end
>      return true
>  end ;
>  test_run:cmd("setopt delimiter ''");
> +-- the replica status is checked 100 times, each check within replication_timeout

I don't get the comment 'each check within replication_timeout'; what
does it mean?

Anyway, I think this just broke the test case. It used to check that
replicas do not leave the 'follow' state; now it checks nothing.

I still push for the approach where the QA team works closely with
developers to understand the cases, or at least files issues for
developers to fix their test cases. I am strongly against random
timeout tweaks to get a test 'working'.

Please, elaborate the test case with Georgy (the case was introduced in
195d4462).

>  test_timeout()
>  
>  -- gh-3247 - Sequence-generated value is not replicated in case
> -- 
> 2.18.0
> 

* Re: [PATCH 4/4] test: refactor some requirements to pass the runs
  2018-10-05  9:02         ` [PATCH 4/4] test: refactor some requirements to pass the runs Sergei Voronezhskii
@ 2018-10-08 19:08           ` Alexander Turenko
  0 siblings, 0 replies; 13+ messages in thread
From: Alexander Turenko @ 2018-10-08 19:08 UTC (permalink / raw)
  To: Sergei Voronezhskii; +Cc: tarantool-patches, Vladimir Davydov

> test: refactor some requirements to pass the runs

requirements -> requires

'I did some stuff and things become cool' -- this is how I read the
commit message. What is the problem? Where was it introduced?

WBR, Alexander Turenko.

On Fri, Oct 05, 2018 at 12:02:15PM +0300, Sergei Voronezhskii wrote:
> Part of #2436, #3232
> ---
>  test/replication/catch.result        | 3 +++
>  test/replication/catch.test.lua      | 2 +-
>  test/replication/gc.result           | 6 +++---
>  test/replication/gc.test.lua         | 2 +-
>  test/replication/on_replace.result   | 3 +++
>  test/replication/on_replace.test.lua | 1 +
>  6 files changed, 12 insertions(+), 5 deletions(-)
> 

> diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
> index 5223e3a24..22adb2868 100644
> --- a/test/replication/catch.test.lua
> +++ b/test/replication/catch.test.lua
> @@ -1,8 +1,8 @@
>  env = require('test_run')
> +fiber = require('fiber')
>  test_run = env.new()
>  engine = test_run:get_cfg('engine')
>  
> -
>  net_box = require('net.box')
>  errinj = box.error.injection
>  

The 'fiber' module is used only on the replica and is required right
before use. I don't get what the fix actually fixes.

> diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua
> index ec3bf6baa..b99253ee5 100644
> --- a/test/replication/gc.test.lua
> +++ b/test/replication/gc.test.lua
> @@ -1,8 +1,8 @@
> -fio = require 'fio'
>  test_run = require('test_run').new()
>  engine = test_run:get_cfg('engine')
>  replica_set = require('fast_replica')
>  fiber = require('fiber')
> +fio = require('fio')
>  
>  test_run:cleanup_cluster()
>  

Does this really have any effect?

> diff --git a/test/replication/on_replace.test.lua b/test/replication/on_replace.test.lua
> index 23a3313b5..e34832103 100644
> --- a/test/replication/on_replace.test.lua
> +++ b/test/replication/on_replace.test.lua
> @@ -26,6 +26,7 @@ session_type
>  test_run:cmd("switch default")
>  box.space.test:insert{2}
>  test_run:cmd("switch replica")
> +fiber = require('fiber')
>  while box.space.test:count() < 2 do fiber.sleep(0.01) end
>  --
>  -- applier
> -- 
> 2.18.0
> 

This one looks okay, but I want to see some kind of description of the
problem's cause.

end of thread, other threads:[~2018-10-08 19:08 UTC | newest]

Thread overview: 13+ messages
-- links below jump to the message on this page --
2018-09-27 15:38 [PATCH] test: enable parallel mode for replication tests Sergei Voronezhskii
2018-10-01  1:36 ` Alexander Turenko
2018-10-01 10:41   ` [tarantool-patches] " Alexander Turenko
2018-10-03 14:50     ` Sergei Voronezhskii
2018-10-05  9:02       ` Sergei Voronezhskii
2018-10-05  9:02         ` [PATCH 1/4] test: cleanup replication tests, parallel mode on Sergei Voronezhskii
2018-10-08 19:02           ` Alexander Turenko
2018-10-05  9:02         ` [PATCH 2/4] test: errinj for pause relay_send Sergei Voronezhskii
2018-10-08 19:07           ` Alexander Turenko
2018-10-05  9:02         ` [PATCH 3/4] test: increase timeout to check replica status Sergei Voronezhskii
2018-10-08 19:07           ` Alexander Turenko
2018-10-05  9:02         ` [PATCH 4/4] test: refactor some requirements to pass the runs Sergei Voronezhskii
2018-10-08 19:08           ` Alexander Turenko
