* [PATCH v3 1/5] test: cleanup replication tests
2018-12-06 13:38 ` [PATCH v3 0/5] enable parallel mode for replication tests Sergei Voronezhskii
@ 2018-12-06 13:38 ` Sergei Voronezhskii
2018-12-06 13:38 ` [PATCH v3 2/5] test: errinj for pause relay_send Sergei Voronezhskii
` (4 subsequent siblings)
5 siblings, 0 replies; 21+ messages in thread
From: Sergei Voronezhskii @ 2018-12-06 13:38 UTC (permalink / raw)
To: tarantool-patches; +Cc: Vladimir Davydov
- at the end of tests that create any replication config, call:
  * `test_run:cmd('delete server ...')`, which removes the server object
    from the `TestState.servers` list; this behaviour was taken
    from the `test_run:drop_cluster()` function
  * `test_run:cleanup_cluster()`, which clears `box.space._cluster`
- switch on `use_unix_sockets` to avoid 'Address already in use'
  problems
- the `once` test needs to clean up the `once*` keys it leaves in
  `box.space._schema`
Part of #2436, #3232
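The registry bookkeeping above can be illustrated with a toy model of
the test-run state (the names `TestState` and `servers` come from the
message; the methods are simplified assumptions, not the real test-run
API):

```python
# Toy model of test-run's server registry, showing why the tests must
# call 'delete server' and not just 'cleanup server' in parallel mode.
class Server:
    def __init__(self, name):
        self.name = name
        self.running = False

class TestState:
    def __init__(self):
        self.servers = {}

    def create_server(self, name):
        self.servers[name] = Server(name)

    def cleanup_server(self, name):
        # 'cleanup server' wipes the instance's files but leaves the
        # registry entry behind, so a later run can collide on it.
        self.servers[name].running = False

    def delete_server(self, name):
        # 'delete server' drops the object from the registry, the same
        # thing drop_cluster() does for every server in a cluster.
        self.servers.pop(name, None)
```

With this model, a test that stops at `cleanup` still holds a stale
entry; only `delete` leaves the state empty for the next test.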
---
test/replication/before_replace.result | 6 +++
test/replication/before_replace.test.lua | 2 +
test/replication/catch.result | 7 ++++
test/replication/catch.test.lua | 3 +-
test/replication/gc.result | 4 ++
test/replication/gc.test.lua | 1 +
test/replication/hot_standby.result | 11 +++++
test/replication/hot_standby.test.lua | 3 ++
test/replication/local_spaces.result | 7 ++++
test/replication/local_spaces.test.lua | 2 +
test/replication/misc.result | 46 ++++++++++++++++++---
test/replication/misc.test.lua | 21 +++++++---
test/replication/on_replace.result | 10 +++++
test/replication/on_replace.test.lua | 3 ++
test/replication/once.result | 12 ++++++
test/replication/once.test.lua | 3 ++
test/replication/quorum.result | 3 ++
test/replication/quorum.test.lua | 1 +
test/replication/replica_rejoin.result | 7 ++++
test/replication/replica_rejoin.test.lua | 2 +
test/replication/skip_conflict_row.result | 7 ++++
test/replication/skip_conflict_row.test.lua | 2 +
test/replication/status.result | 7 ++++
test/replication/status.test.lua | 2 +
test/replication/suite.ini | 1 +
test/replication/sync.result | 7 ++++
test/replication/sync.test.lua | 2 +
test/replication/wal_off.result | 7 ++++
test/replication/wal_off.test.lua | 2 +
29 files changed, 178 insertions(+), 13 deletions(-)
diff --git a/test/replication/before_replace.result b/test/replication/before_replace.result
index abfbc4bb3..ced40547e 100644
--- a/test/replication/before_replace.result
+++ b/test/replication/before_replace.result
@@ -226,6 +226,9 @@ test_run:cmd("switch default")
test_run:drop_cluster(SERVERS)
---
...
+test_run:cleanup_cluster()
+---
+...
--
-- gh-3722: Check that when space:before_replace trigger modifies
-- the result of a replicated operation, it writes it to the WAL
@@ -300,6 +303,9 @@ test_run:cmd("delete server replica")
---
- true
...
+test_run:cleanup_cluster()
+---
+...
box.schema.user.revoke('guest', 'replication')
---
...
diff --git a/test/replication/before_replace.test.lua b/test/replication/before_replace.test.lua
index fc444cd35..bcc6dc00d 100644
--- a/test/replication/before_replace.test.lua
+++ b/test/replication/before_replace.test.lua
@@ -80,6 +80,7 @@ box.space.test:select()
-- Cleanup.
test_run:cmd("switch default")
test_run:drop_cluster(SERVERS)
+test_run:cleanup_cluster()
--
-- gh-3722: Check that when space:before_replace trigger modifies
@@ -117,6 +118,7 @@ test_run:cmd("switch default")
test_run:cmd("stop server replica")
test_run:cmd("cleanup server replica")
test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
box.schema.user.revoke('guest', 'replication')
box.space.test:drop()
diff --git a/test/replication/catch.result b/test/replication/catch.result
index aebba819f..663bdc758 100644
--- a/test/replication/catch.result
+++ b/test/replication/catch.result
@@ -130,6 +130,13 @@ test_run:cmd("cleanup server replica")
---
- true
...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
box.space.test:drop()
---
...
diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
index 8cc3242f7..6773675d0 100644
--- a/test/replication/catch.test.lua
+++ b/test/replication/catch.test.lua
@@ -59,6 +59,7 @@ errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
-- cleanup
test_run:cmd("stop server replica")
test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
box.space.test:drop()
box.schema.user.revoke('guest', 'replication')
-
diff --git a/test/replication/gc.result b/test/replication/gc.result
index 6b85513f3..c73544d95 100644
--- a/test/replication/gc.result
+++ b/test/replication/gc.result
@@ -388,6 +388,10 @@ test_run:cmd("cleanup server replica")
---
- true
...
+test_run:cmd("delete server replica")
+---
+- true
+...
_ = s:auto_increment{}
---
...
diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua
index c1de68c6c..1e4e02df9 100644
--- a/test/replication/gc.test.lua
+++ b/test/replication/gc.test.lua
@@ -179,6 +179,7 @@ box.cfg{replication = replica_port}
-- Stop the replica and write a few WALs.
test_run:cmd("stop server replica")
test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
_ = s:auto_increment{}
box.snapshot()
_ = s:auto_increment{}
diff --git a/test/replication/hot_standby.result b/test/replication/hot_standby.result
index e72254f91..b140887df 100644
--- a/test/replication/hot_standby.result
+++ b/test/replication/hot_standby.result
@@ -318,6 +318,10 @@ test_run:cmd("cleanup server hot_standby")
---
- true
...
+test_run:cmd("delete server hot_standby")
+---
+- true
+...
test_run:cmd("deploy server default")
---
- true
@@ -338,3 +342,10 @@ test_run:cmd("cleanup server replica")
---
- true
...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
diff --git a/test/replication/hot_standby.test.lua b/test/replication/hot_standby.test.lua
index 1842ff39a..f43982f15 100644
--- a/test/replication/hot_standby.test.lua
+++ b/test/replication/hot_standby.test.lua
@@ -121,8 +121,11 @@ _select(11, 20)
test_run:cmd("stop server hot_standby")
test_run:cmd("cleanup server hot_standby")
+test_run:cmd("delete server hot_standby")
test_run:cmd("deploy server default")
test_run:cmd("start server default")
test_run:cmd("switch default")
test_run:cmd("stop server replica")
test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
diff --git a/test/replication/local_spaces.result b/test/replication/local_spaces.result
index 151735530..ed1b76da8 100644
--- a/test/replication/local_spaces.result
+++ b/test/replication/local_spaces.result
@@ -216,6 +216,13 @@ test_run:cmd("cleanup server replica")
---
- true
...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
box.schema.user.revoke('guest', 'replication')
---
...
diff --git a/test/replication/local_spaces.test.lua b/test/replication/local_spaces.test.lua
index 06e2b0bd2..bb7294538 100644
--- a/test/replication/local_spaces.test.lua
+++ b/test/replication/local_spaces.test.lua
@@ -76,6 +76,8 @@ box.space.test3:select()
test_run:cmd("switch default")
test_run:cmd("stop server replica")
test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
box.schema.user.revoke('guest', 'replication')
s1:select()
diff --git a/test/replication/misc.result b/test/replication/misc.result
index bebe18db9..32676c4ea 100644
--- a/test/replication/misc.result
+++ b/test/replication/misc.result
@@ -88,6 +88,13 @@ test_run:cmd('cleanup server test')
box.cfg{read_only = false}
---
...
+test_run:cmd('delete server test')
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
-- gh-3160 - Send heartbeats if there are changes from a remote master only
SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
---
@@ -229,6 +236,9 @@ test_run:cmd("switch default")
test_run:drop_cluster(SERVERS)
---
...
+test_run:cleanup_cluster()
+---
+...
-- gh-3642 - Check that socket file descriptor doesn't leak
-- when a replica is disconnected.
rlimit = require('rlimit')
@@ -253,7 +263,7 @@ test_run:cmd('create server sock with rpl_master=default, script="replication/re
---
- true
...
-test_run:cmd(string.format('start server sock'))
+test_run:cmd('start server sock')
---
- true
...
@@ -299,14 +309,21 @@ lim.rlim_cur = old_fno
rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
---
...
-test_run:cmd('stop server sock')
+test_run:cmd("stop server sock")
---
- true
...
-test_run:cmd('cleanup server sock')
+test_run:cmd("cleanup server sock")
---
- true
...
+test_run:cmd("delete server sock")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
box.schema.user.revoke('guest', 'replication')
---
...
@@ -342,6 +359,17 @@ test_run:cmd('cleanup server er_load2')
---
- true
...
+test_run:cmd('delete server er_load1')
+---
+- true
+...
+test_run:cmd('delete server er_load2')
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
--
-- Test case for gh-3637. Before the fix replica would exit with
-- an error. Now check that we don't hang and successfully connect.
@@ -349,9 +377,6 @@ test_run:cmd('cleanup server er_load2')
fiber = require('fiber')
---
...
-test_run:cleanup_cluster()
----
-...
test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
---
- true
@@ -391,6 +416,9 @@ test_run:cmd("delete server replica_auth")
---
- true
...
+test_run:cleanup_cluster()
+---
+...
box.schema.user.drop('cluster')
---
...
@@ -464,6 +492,9 @@ test_run:cmd("delete server replica")
---
- true
...
+test_run:cleanup_cluster()
+---
+...
box.schema.user.revoke('guest', 'replication')
---
...
@@ -561,3 +592,6 @@ test_run:cmd("delete server replica")
---
- true
...
+test_run:cleanup_cluster()
+---
+...
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index eab36dc52..3bf1fc5a1 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -32,6 +32,8 @@ test_run:cmd(string.format('start server test with args="%s"', replica_uuid))
test_run:cmd('stop server test')
test_run:cmd('cleanup server test')
box.cfg{read_only = false}
+test_run:cmd('delete server test')
+test_run:cleanup_cluster()
-- gh-3160 - Send heartbeats if there are changes from a remote master only
SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
@@ -89,6 +91,7 @@ box.space.space1:drop()
test_run:cmd("switch default")
test_run:drop_cluster(SERVERS)
+test_run:cleanup_cluster()
-- gh-3642 - Check that socket file descriptor doesn't leak
-- when a replica is disconnected.
@@ -100,7 +103,7 @@ lim.rlim_cur = 64
rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"')
-test_run:cmd(string.format('start server sock'))
+test_run:cmd('start server sock')
test_run:cmd('switch sock')
test_run = require('test_run').new()
fiber = require('fiber')
@@ -122,8 +125,10 @@ test_run:cmd('switch default')
lim.rlim_cur = old_fno
rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
-test_run:cmd('stop server sock')
-test_run:cmd('cleanup server sock')
+test_run:cmd("stop server sock")
+test_run:cmd("cleanup server sock")
+test_run:cmd("delete server sock")
+test_run:cleanup_cluster()
box.schema.user.revoke('guest', 'replication')
@@ -138,15 +143,15 @@ test_run:cmd('stop server er_load1')
-- er_load2 exits automatically.
test_run:cmd('cleanup server er_load1')
test_run:cmd('cleanup server er_load2')
+test_run:cmd('delete server er_load1')
+test_run:cmd('delete server er_load2')
+test_run:cleanup_cluster()
--
-- Test case for gh-3637. Before the fix replica would exit with
-- an error. Now check that we don't hang and successfully connect.
--
fiber = require('fiber')
-
-test_run:cleanup_cluster()
-
test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.05'")
-- Wait a bit to make sure replica waits till user is created.
@@ -161,6 +166,8 @@ _ = test_run:wait_vclock('replica_auth', vclock)
test_run:cmd("stop server replica_auth")
test_run:cmd("cleanup server replica_auth")
test_run:cmd("delete server replica_auth")
+test_run:cleanup_cluster()
+
box.schema.user.drop('cluster')
--
@@ -190,6 +197,7 @@ while test_run:grep_log('replica', 'duplicate connection') == nil do fiber.sleep
test_run:cmd("stop server replica")
test_run:cmd("cleanup server replica")
test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
box.schema.user.revoke('guest', 'replication')
--
@@ -227,3 +235,4 @@ test_run:cmd("switch default")
test_run:cmd("stop server replica")
test_run:cmd("cleanup server replica")
test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
diff --git a/test/replication/on_replace.result b/test/replication/on_replace.result
index 4ffa3b25a..8fef8fb14 100644
--- a/test/replication/on_replace.result
+++ b/test/replication/on_replace.result
@@ -88,6 +88,13 @@ test_run:cmd("cleanup server replica")
---
- true
...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
box.space.test:drop()
---
...
@@ -177,3 +184,6 @@ _ = test_run:cmd('switch default')
test_run:drop_cluster(SERVERS)
---
...
+test_run:cleanup_cluster()
+---
+...
diff --git a/test/replication/on_replace.test.lua b/test/replication/on_replace.test.lua
index 371b71cbd..23a3313b5 100644
--- a/test/replication/on_replace.test.lua
+++ b/test/replication/on_replace.test.lua
@@ -37,6 +37,8 @@ test_run:cmd("switch default")
--
test_run:cmd("stop server replica")
test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
box.space.test:drop()
box.schema.user.revoke('guest', 'replication')
@@ -73,3 +75,4 @@ box.space.s2:select()
_ = test_run:cmd('switch default')
test_run:drop_cluster(SERVERS)
+test_run:cleanup_cluster()
diff --git a/test/replication/once.result b/test/replication/once.result
index 99ac05b72..fd787915e 100644
--- a/test/replication/once.result
+++ b/test/replication/once.result
@@ -85,3 +85,15 @@ once -- 1
box.cfg{read_only = false}
---
...
+box.space._schema:delete{"oncero"}
+---
+- ['oncero']
+...
+box.space._schema:delete{"oncekey"}
+---
+- ['oncekey']
+...
+box.space._schema:delete{"oncetest"}
+---
+- ['oncetest']
+...
diff --git a/test/replication/once.test.lua b/test/replication/once.test.lua
index 264c63670..813fbfdab 100644
--- a/test/replication/once.test.lua
+++ b/test/replication/once.test.lua
@@ -28,3 +28,6 @@ box.cfg{read_only = true}
box.once("ro", f, 1) -- ok, already done
once -- 1
box.cfg{read_only = false}
+box.space._schema:delete{"oncero"}
+box.space._schema:delete{"oncekey"}
+box.space._schema:delete{"oncetest"}
diff --git a/test/replication/quorum.result b/test/replication/quorum.result
index 265b099b7..ff5fa0150 100644
--- a/test/replication/quorum.result
+++ b/test/replication/quorum.result
@@ -447,6 +447,9 @@ test_run:cmd('delete server replica_quorum')
---
- true
...
+test_run:cleanup_cluster()
+---
+...
box.schema.user.revoke('guest', 'replication')
---
...
diff --git a/test/replication/quorum.test.lua b/test/replication/quorum.test.lua
index 5a43275c2..98febb367 100644
--- a/test/replication/quorum.test.lua
+++ b/test/replication/quorum.test.lua
@@ -169,4 +169,5 @@ test_run:cmd('switch default')
test_run:cmd('stop server replica_quorum')
test_run:cmd('cleanup server replica_quorum')
test_run:cmd('delete server replica_quorum')
+test_run:cleanup_cluster()
box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index df1057d10..87d626e20 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -379,6 +379,13 @@ test_run:cmd("cleanup server replica")
---
- true
...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
box.space.test:drop()
---
...
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index 40094a1af..9bf43eff8 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -138,5 +138,7 @@ test_run:cmd("switch default")
box.cfg{replication = ''}
test_run:cmd("stop server replica")
test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
box.space.test:drop()
box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/skip_conflict_row.result b/test/replication/skip_conflict_row.result
index 29963f56a..6ca13b472 100644
--- a/test/replication/skip_conflict_row.result
+++ b/test/replication/skip_conflict_row.result
@@ -91,6 +91,13 @@ test_run:cmd("cleanup server replica")
---
- true
...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
box.space.test:drop()
---
...
diff --git a/test/replication/skip_conflict_row.test.lua b/test/replication/skip_conflict_row.test.lua
index 5f7d6ead3..4406ced95 100644
--- a/test/replication/skip_conflict_row.test.lua
+++ b/test/replication/skip_conflict_row.test.lua
@@ -31,5 +31,7 @@ box.info.status
-- cleanup
test_run:cmd("stop server replica")
test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
box.space.test:drop()
box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/status.result b/test/replication/status.result
index 8394b98c1..9e69f2478 100644
--- a/test/replication/status.result
+++ b/test/replication/status.result
@@ -391,3 +391,10 @@ test_run:cmd("cleanup server replica")
---
- true
...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
diff --git a/test/replication/status.test.lua b/test/replication/status.test.lua
index 8bb25e0c6..cfdf6acdb 100644
--- a/test/replication/status.test.lua
+++ b/test/replication/status.test.lua
@@ -142,3 +142,5 @@ test_run:cmd('switch default')
box.schema.user.revoke('guest', 'replication')
test_run:cmd("stop server replica")
test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index 569c90480..260309573 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -6,5 +6,6 @@ disabled = consistent.test.lua
release_disabled = catch.test.lua errinj.test.lua gc.test.lua gc_no_space.test.lua before_replace.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua
config = suite.cfg
lua_libs = lua/fast_replica.lua lua/rlimit.lua
+use_unix_sockets = True
long_run = prune.test.lua
is_parallel = False
diff --git a/test/replication/sync.result b/test/replication/sync.result
index dc3a6f69b..b34501dae 100644
--- a/test/replication/sync.result
+++ b/test/replication/sync.result
@@ -352,6 +352,13 @@ test_run:cmd("cleanup server replica")
---
- true
...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
box.space.test:drop()
---
...
diff --git a/test/replication/sync.test.lua b/test/replication/sync.test.lua
index bc7147355..cae97a26f 100644
--- a/test/replication/sync.test.lua
+++ b/test/replication/sync.test.lua
@@ -171,6 +171,8 @@ box.info.ro -- false
test_run:cmd("switch default")
test_run:cmd("stop server replica")
test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
box.space.test:drop()
box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/wal_off.result b/test/replication/wal_off.result
index e3b5709e9..e0ae84bd7 100644
--- a/test/replication/wal_off.result
+++ b/test/replication/wal_off.result
@@ -107,6 +107,13 @@ test_run:cmd("cleanup server wal_off")
---
- true
...
+test_run:cmd("delete server wal_off")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
box.schema.user.revoke('guest', 'replication')
---
...
diff --git a/test/replication/wal_off.test.lua b/test/replication/wal_off.test.lua
index 81fcf0b33..110f2f1f7 100644
--- a/test/replication/wal_off.test.lua
+++ b/test/replication/wal_off.test.lua
@@ -37,5 +37,7 @@ box.cfg { replication = "" }
test_run:cmd("stop server wal_off")
test_run:cmd("cleanup server wal_off")
+test_run:cmd("delete server wal_off")
+test_run:cleanup_cluster()
box.schema.user.revoke('guest', 'replication')
--
2.18.0
* [PATCH v3 2/5] test: errinj for pause relay_send
2018-12-06 13:38 ` [PATCH v3 0/5] enable parallel mode for replication tests Sergei Voronezhskii
2018-12-06 13:38 ` [PATCH v3 1/5] test: cleanup " Sergei Voronezhskii
@ 2018-12-06 13:38 ` Sergei Voronezhskii
2018-12-06 15:44 ` Vladimir Davydov
2018-12-06 13:38 ` [PATCH v3 3/5] test: put require in proper places Sergei Voronezhskii
` (3 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: Sergei Voronezhskii @ 2018-12-06 13:38 UTC (permalink / raw)
To: tarantool-patches; +Cc: Vladimir Davydov
Instead of using a timeout, we need to just pause `relay_send`: we
can't rely on timeouts because of varying system load in parallel
mode. Add a new errinj that checks a boolean in a loop and, until it
is cleared, does not let `relay_send` proceed to the next statement.
To check read-only mode, any tuple modification is enough, so it
suffices to call the `replace` method instead of `delete` followed by
a useless `get` to verify that the tuple was not deleted.
Also, look up the xlog files in a loop with a short sleep until the
file count matches the expected value.
Update box/errinj.result because a new errinj was added.
Part of #2436, #3232
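The xlog polling described above can be sketched as follows (this is a
standalone Python illustration of the technique; the actual helper is
the Lua `wait_xlog` added to gc.test.lua, and the names `expected` and
`step` here are illustrative):

```python
import glob
import os
import tempfile
import time

def wait_xlog(dirpath, expected, timeout=1.0, step=0.05):
    """Poll with a short sleep until the number of *.xlog files in
    dirpath matches one of the counts in `expected`, or the timeout
    expires. Returns True on success, False on timeout."""
    if not isinstance(expected, (set, list, tuple)):
        expected = {expected}
    deadline = time.monotonic() + timeout
    while True:
        if len(glob.glob(os.path.join(dirpath, '*.xlog'))) in expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(step)
```

Accepting a set of counts mirrors the Lua helper's `value_in` check,
since garbage collection may legitimately leave either of two file
counts depending on timing.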
---
src/box/relay.cc | 7 ++-
src/errinj.h | 1 +
test/box/errinj.result | 34 ++++++------
test/replication/catch.result | 48 ++++++++---------
test/replication/catch.test.lua | 41 +++++++--------
test/replication/gc.result | 92 +++++++++++++++++++++------------
test/replication/gc.test.lua | 83 +++++++++++++++++++----------
7 files changed, 181 insertions(+), 125 deletions(-)
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 0034f99a0..17daf76bf 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -635,12 +635,17 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
static void
relay_send(struct relay *relay, struct xrow_header *packet)
{
+ struct errinj *inj = errinj(ERRINJ_RELAY_SEND_DELAY, ERRINJ_BOOL);
+ while (inj != NULL && inj->bparam) {
+ fiber_sleep(0.01);
+ inj = errinj(ERRINJ_RELAY_SEND_DELAY, ERRINJ_BOOL);
+ }
packet->sync = relay->sync;
relay->last_row_tm = ev_monotonic_now(loop());
coio_write_xrow(&relay->io, packet);
fiber_gc();
- struct errinj *inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
+ inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
if (inj != NULL && inj->dparam > 0)
fiber_sleep(inj->dparam);
}
diff --git a/src/errinj.h b/src/errinj.h
index aed570e79..39de63d19 100644
--- a/src/errinj.h
+++ b/src/errinj.h
@@ -95,6 +95,7 @@ struct errinj {
_(ERRINJ_VY_GC, ERRINJ_BOOL, {.bparam = false}) \
_(ERRINJ_VY_LOG_FLUSH, ERRINJ_BOOL, {.bparam = false}) \
_(ERRINJ_VY_LOG_FLUSH_DELAY, ERRINJ_BOOL, {.bparam = false}) \
+ _(ERRINJ_RELAY_SEND_DELAY, ERRINJ_BOOL, {.bparam = false}) \
_(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE, {.dparam = 0}) \
_(ERRINJ_RELAY_REPORT_INTERVAL, ERRINJ_DOUBLE, {.dparam = 0}) \
_(ERRINJ_RELAY_FINAL_SLEEP, ERRINJ_BOOL, {.bparam = false}) \
diff --git a/test/box/errinj.result b/test/box/errinj.result
index 825bb3696..12303670e 100644
--- a/test/box/errinj.result
+++ b/test/box/errinj.result
@@ -30,7 +30,7 @@ errinj.info()
state: false
ERRINJ_WAL_DELAY:
state: false
- ERRINJ_XLOG_READ:
+ ERRINJ_VY_INDEX_DUMP:
state: -1
ERRINJ_WAL_WRITE_EOF:
state: false
@@ -46,6 +46,8 @@ errinj.info()
state: false
ERRINJ_WAL_FALLOCATE:
state: 0
+ ERRINJ_SNAP_COMMIT_DELAY:
+ state: false
ERRINJ_TUPLE_ALLOC:
state: false
ERRINJ_VY_RUN_WRITE_DELAY:
@@ -54,25 +56,25 @@ errinj.info()
state: false
ERRINJ_RELAY_REPORT_INTERVAL:
state: 0
- ERRINJ_HTTP_RESPONSE_ADD_WAIT:
- state: false
+ ERRINJ_WAL_BREAK_LSN:
+ state: -1
ERRINJ_VY_READ_PAGE_TIMEOUT:
state: 0
ERRINJ_XLOG_META:
state: false
- ERRINJ_WAL_BREAK_LSN:
- state: -1
ERRINJ_RELAY_BREAK_LSN:
state: -1
- ERRINJ_WAL_WRITE_DISK:
- state: false
ERRINJ_VY_INDEX_FILE_RENAME:
state: false
+ ERRINJ_WAL_WRITE_DISK:
+ state: false
ERRINJ_VY_RUN_FILE_RENAME:
state: false
+ ERRINJ_VY_LOG_FILE_RENAME:
+ state: false
ERRINJ_VY_RUN_WRITE:
state: false
- ERRINJ_VY_LOG_FILE_RENAME:
+ ERRINJ_HTTP_RESPONSE_ADD_WAIT:
state: false
ERRINJ_VY_LOG_FLUSH_DELAY:
state: false
@@ -86,18 +88,18 @@ errinj.info()
state: false
ERRINJ_WAL_ROTATE:
state: false
- ERRINJ_SNAP_COMMIT_DELAY:
- state: false
ERRINJ_LOG_ROTATE:
state: false
+ ERRINJ_VY_POINT_ITER_WAIT:
+ state: false
ERRINJ_RELAY_EXIT_DELAY:
state: 0
ERRINJ_IPROTO_TX_DELAY:
state: false
- ERRINJ_VY_POINT_ITER_WAIT:
- state: false
ERRINJ_BUILD_INDEX:
state: -1
+ ERRINJ_XLOG_READ:
+ state: -1
ERRINJ_XLOG_GARBAGE:
state: false
ERRINJ_TUPLE_FIELD:
@@ -106,14 +108,14 @@ errinj.info()
state: false
ERRINJ_TESTING:
state: false
- ERRINJ_RELAY_TIMEOUT:
- state: 0
+ ERRINJ_RELAY_SEND_DELAY:
+ state: false
ERRINJ_VY_SQUASH_TIMEOUT:
state: 0
ERRINJ_VY_LOG_FLUSH:
state: false
- ERRINJ_VY_INDEX_DUMP:
- state: -1
+ ERRINJ_RELAY_TIMEOUT:
+ state: 0
...
errinj.set("some-injection", true)
---
diff --git a/test/replication/catch.result b/test/replication/catch.result
index 663bdc758..e1b2995ec 100644
--- a/test/replication/catch.result
+++ b/test/replication/catch.result
@@ -35,7 +35,7 @@ test_run:cmd("switch default")
s = box.schema.space.create('test', {engine = engine});
---
...
--- vinyl does not support hash index
+-- Vinyl does not support hash index.
index = s:create_index('primary', {type = (engine == 'vinyl' and 'tree' or 'hash') })
---
...
@@ -57,14 +57,14 @@ test_run:cmd("stop server replica")
---
- true
...
--- insert values on the master while replica is stopped and can't fetch them
-for i=1,100 do s:insert{i, 'this is test message12345'} end
+-- Insert values on the master while replica is stopped and can't
+-- fetch them.
+errinj.set('ERRINJ_RELAY_SEND_DELAY', true)
---
+- ok
...
--- sleep after every tuple
-errinj.set("ERRINJ_RELAY_TIMEOUT", 1000.0)
+for i = 1, 100 do s:insert{i, 'this is test message12345'} end
---
-- ok
...
test_run:cmd("start server replica with args='0.01'")
---
@@ -75,28 +75,26 @@ test_run:cmd("switch replica")
- true
...
-- Check that replica doesn't enter read-write mode before
--- catching up with the master: to check that we inject sleep into
--- the master relay_send function and attempt a data modifying
--- statement in replica while it's still fetching data from the
--- master.
--- In the next two cases we try to delete a tuple while replica is
--- catching up with the master (local delete, remote delete) case
+-- catching up with the master: to check that we stop sending
+-- rows on the master in relay_send function and attempt a data
+-- modifying statement in replica while it's still fetching data
+-- from the master.
+--
+-- In the next two cases we try to replace a tuple while replica
+-- is catching up with the master (local replace, remote replace)
+-- case.
--
--- #1: delete tuple on replica
+-- Case #1: replace tuple on replica locally.
--
box.space.test ~= nil
---
- true
...
-d = box.space.test:delete{1}
+box.space.test:replace{1}
---
- error: Can't modify data because this instance is in read-only mode.
...
-box.space.test:get(1) ~= nil
----
-- true
-...
--- case #2: delete tuple by net.box
+-- Case #2: replace tuple on replica by net.box.
test_run:cmd("switch default")
---
- true
@@ -108,20 +106,16 @@ test_run:cmd("set variable r_uri to 'replica.listen'")
c = net_box.connect(r_uri)
---
...
-d = c.space.test:delete{1}
+d = c.space.test:replace{1}
---
- error: Can't modify data because this instance is in read-only mode.
...
-c.space.test:get(1) ~= nil
----
-- true
-...
--- check sync
-errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
+-- Resume replication.
+errinj.set('ERRINJ_RELAY_SEND_DELAY', false)
---
- ok
...
--- cleanup
+-- Cleanup.
test_run:cmd("stop server replica")
---
- true
diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
index 6773675d0..d5de88642 100644
--- a/test/replication/catch.test.lua
+++ b/test/replication/catch.test.lua
@@ -13,7 +13,7 @@ test_run:cmd("switch replica")
test_run:cmd("switch default")
s = box.schema.space.create('test', {engine = engine});
--- vinyl does not support hash index
+-- Vinyl does not support hash index.
index = s:create_index('primary', {type = (engine == 'vinyl' and 'tree' or 'hash') })
test_run:cmd("switch replica")
@@ -22,41 +22,40 @@ while box.space.test == nil do fiber.sleep(0.01) end
test_run:cmd("switch default")
test_run:cmd("stop server replica")
--- insert values on the master while replica is stopped and can't fetch them
-for i=1,100 do s:insert{i, 'this is test message12345'} end
-
--- sleep after every tuple
-errinj.set("ERRINJ_RELAY_TIMEOUT", 1000.0)
+-- Insert values on the master while replica is stopped and can't
+-- fetch them.
+errinj.set('ERRINJ_RELAY_SEND_DELAY', true)
+for i = 1, 100 do s:insert{i, 'this is test message12345'} end
test_run:cmd("start server replica with args='0.01'")
test_run:cmd("switch replica")
-- Check that replica doesn't enter read-write mode before
--- catching up with the master: to check that we inject sleep into
--- the master relay_send function and attempt a data modifying
--- statement in replica while it's still fetching data from the
--- master.
--- In the next two cases we try to delete a tuple while replica is
--- catching up with the master (local delete, remote delete) case
+-- catching up with the master: to check that we stop sending
+-- rows on the master in relay_send function and attempt a data
+-- modifying statement in replica while it's still fetching data
+-- from the master.
+--
+-- In the next two cases we try to replace a tuple while replica
+-- is catching up with the master (local replace, remote replace)
+-- case.
--
--- #1: delete tuple on replica
+-- Case #1: replace tuple on replica locally.
--
box.space.test ~= nil
-d = box.space.test:delete{1}
-box.space.test:get(1) ~= nil
+box.space.test:replace{1}
--- case #2: delete tuple by net.box
+-- Case #2: replace tuple on replica by net.box.
test_run:cmd("switch default")
test_run:cmd("set variable r_uri to 'replica.listen'")
c = net_box.connect(r_uri)
-d = c.space.test:delete{1}
-c.space.test:get(1) ~= nil
+d = c.space.test:replace{1}
--- check sync
-errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
+-- Resume replication.
+errinj.set('ERRINJ_RELAY_SEND_DELAY', false)
--- cleanup
+-- Cleanup.
test_run:cmd("stop server replica")
test_run:cmd("cleanup server replica")
test_run:cmd("delete server replica")
diff --git a/test/replication/gc.result b/test/replication/gc.result
index c73544d95..273b77efc 100644
--- a/test/replication/gc.result
+++ b/test/replication/gc.result
@@ -27,6 +27,38 @@ default_checkpoint_count = box.cfg.checkpoint_count
box.cfg{checkpoint_count = 1}
---
...
+test_run:cmd("setopt delimiter ';'")
+---
+- true
+...
+function wait_gc(n)
+ return test_run:wait_cond(function()
+ return #box.info.gc().checkpoints == n
+ end, 10)
+end
+
+function value_in(val, arr)
+ for _, elem in ipairs(arr) do
+ if val == elem then
+ return true
+ end
+ end
+ return false
+end
+
+function wait_xlog(n, timeout)
+ timeout = timeout or 1.0
+ if type(n) ~= 'table' then
+ n = {n}
+ end
+ return test_run:wait_cond(function()
+ return value_in(#fio.glob('./master/*.xlog'), n)
+ end, timeout)
+end
+
+test_run:cmd("setopt delimiter ''") ;
+---
+...
-- Grant permissions needed for replication.
box.schema.user.grant('guest', 'replication')
---
@@ -63,14 +95,13 @@ for i = 1, 100 do s:auto_increment{} end
...
-- Make sure replica join will take long enough for us to
-- invoke garbage collection.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
+box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", true)
---
- ok
...
-- While the replica is receiving the initial data set,
-- make a snapshot and invoke garbage collection, then
--- remove the timeout injection so that we don't have to
--- wait too long for the replica to start.
+-- remove delay to allow replica to start.
test_run:cmd("setopt delimiter ';'")
---
- true
@@ -78,7 +109,7 @@ test_run:cmd("setopt delimiter ';'")
fiber.create(function()
fiber.sleep(0.1)
box.snapshot()
- box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
+ box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", false)
end)
test_run:cmd("setopt delimiter ''");
---
@@ -110,21 +141,17 @@ test_run:cmd("switch default")
...
-- Check that garbage collection removed the snapshot once
-- the replica released the corresponding checkpoint.
-test_run:wait_cond(function() return #box.info.gc().checkpoints == 1 end, 10)
+wait_gc(1) or box.info.gc()
---
- true
...
-#box.info.gc().checkpoints == 1 or box.info.gc()
+wait_xlog(1) or fio.listdir('./master')
---
- true
...
-#fio.glob('./master/*.xlog') == 1 or fio.listdir('./master')
----
-- true
-...
--- Make sure the replica will receive data it is subscribed
--- to long enough for us to invoke garbage collection.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
+-- Make sure the replica will not receive data until
+-- we test garbage collection.
+box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", true)
---
- ok
...
@@ -152,17 +179,17 @@ box.snapshot()
---
- ok
...
-#box.info.gc().checkpoints == 1 or box.info.gc()
+wait_gc(1) or box.info.gc()
---
- true
...
-#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
+wait_xlog(2) or fio.listdir('./master')
---
- true
...
--- Remove the timeout injection so that the replica catches
+-- Resume replication so that the replica catches
-- up quickly.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
+box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", false)
---
- ok
...
@@ -185,11 +212,11 @@ test_run:cmd("switch default")
...
-- Now garbage collection should resume and delete files left
-- from the old checkpoint.
-test_run:wait_cond(function() return #fio.glob('./master/*.xlog') == 0 end, 10)
+wait_gc(1) or box.info.gc()
---
- true
...
-#fio.glob('./master/*.xlog') == 0 or fio.listdir('./master')
+wait_xlog(0) or fio.listdir('./master')
---
- true
...
@@ -228,11 +255,11 @@ fiber.sleep(0.1) -- wait for master to relay data
-- Garbage collection must not delete the old xlog file
-- because it is still needed by the replica, but remove
-- the old snapshot.
-#box.info.gc().checkpoints == 1 or box.info.gc()
+wait_gc(1) or box.info.gc()
---
- true
...
-#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
+wait_xlog(2) or fio.listdir('./master')
---
- true
...
@@ -268,11 +295,11 @@ test_run:cmd("switch default")
- true
...
-- Now it's safe to drop the old xlog.
-test_run:wait_cond(function() return #fio.glob('./master/*.xlog') == 1 end, 10)
+wait_gc(1) or box.info.gc()
---
- true
...
-#fio.glob('./master/*.xlog') == 1 or fio.listdir('./master')
+wait_xlog(1) or fio.listdir('./master')
---
- true
...
@@ -304,11 +331,14 @@ box.snapshot()
---
- ok
...
-#box.info.gc().checkpoints == 1 or box.info.gc()
+wait_gc(1) or box.info.gc()
---
- true
...
-#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
+-- The replica may have managed to download all data
+-- from xlog #1 before it was stopped, in which case
+-- it's OK to collect xlog #1.
+wait_xlog({2, 3}) or fio.listdir('./master')
---
- true
...
@@ -317,11 +347,11 @@ box.snapshot()
test_run:cleanup_cluster()
---
...
-test_run:wait_cond(function() return #fio.glob('./master/*.xlog') == 1 end, 10)
+wait_gc(1) or box.info.gc()
---
- true
...
-#fio.glob('./master/*.xlog') == 1 or fio.listdir('./master')
+wait_xlog(1) or fio.listdir('./master')
---
- true
...
@@ -413,7 +443,7 @@ box.snapshot()
---
- ok
...
-#fio.glob('./master/*.xlog') == 3 or fio.listdir('./master')
+wait_xlog(3) or fio.listdir('./master')
---
- true
...
@@ -426,11 +456,7 @@ box.snapshot()
---
- ok
...
-test_run:wait_cond(function() return #fio.glob('./master/*.xlog') == 0 end, 10)
----
-- true
-...
-#fio.glob('./master/*.xlog') == 0 or fio.listdir('./master')
+wait_xlog(0, 10) or fio.listdir('./master')
---
- true
...
diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua
index 1e4e02df9..9f79120e9 100644
--- a/test/replication/gc.test.lua
+++ b/test/replication/gc.test.lua
@@ -11,6 +11,35 @@ test_run:cmd("create server replica with rpl_master=default, script='replication
default_checkpoint_count = box.cfg.checkpoint_count
box.cfg{checkpoint_count = 1}
+test_run:cmd("setopt delimiter ';'")
+
+function wait_gc(n)
+ return test_run:wait_cond(function()
+ return #box.info.gc().checkpoints == n
+ end, 10)
+end
+
+function value_in(val, arr)
+ for _, elem in ipairs(arr) do
+ if val == elem then
+ return true
+ end
+ end
+ return false
+end
+
+function wait_xlog(n, timeout)
+ timeout = timeout or 1.0
+ if type(n) ~= 'table' then
+ n = {n}
+ end
+ return test_run:wait_cond(function()
+ return value_in(#fio.glob('./master/*.xlog'), n)
+ end, timeout)
+end
+
+test_run:cmd("setopt delimiter ''") ;
+
-- Grant permissions needed for replication.
box.schema.user.grant('guest', 'replication')
@@ -29,17 +58,16 @@ for i = 1, 100 do s:auto_increment{} end
-- Make sure replica join will take long enough for us to
-- invoke garbage collection.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
+box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", true)
-- While the replica is receiving the initial data set,
-- make a snapshot and invoke garbage collection, then
--- remove the timeout injection so that we don't have to
--- wait too long for the replica to start.
+-- remove delay to allow replica to start.
test_run:cmd("setopt delimiter ';'")
fiber.create(function()
fiber.sleep(0.1)
box.snapshot()
- box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
+ box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", false)
end)
test_run:cmd("setopt delimiter ''");
@@ -57,12 +85,11 @@ test_run:cmd("switch default")
-- Check that garbage collection removed the snapshot once
-- the replica released the corresponding checkpoint.
-test_run:wait_cond(function() return #box.info.gc().checkpoints == 1 end, 10)
-#box.info.gc().checkpoints == 1 or box.info.gc()
-#fio.glob('./master/*.xlog') == 1 or fio.listdir('./master')
--- Make sure the replica will receive data it is subscribed
--- to long enough for us to invoke garbage collection.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
+wait_gc(1) or box.info.gc()
+wait_xlog(1) or fio.listdir('./master')
+-- Make sure the replica will not receive data until
+-- we test garbage collection.
+box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", true)
-- Send more data to the replica.
-- Need to do 2 snapshots here, otherwise the replica would
@@ -76,12 +103,12 @@ box.snapshot()
-- Invoke garbage collection. Check that it doesn't remove
-- xlogs needed by the replica.
box.snapshot()
-#box.info.gc().checkpoints == 1 or box.info.gc()
-#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
+wait_gc(1) or box.info.gc()
+wait_xlog(2) or fio.listdir('./master')
--- Remove the timeout injection so that the replica catches
+-- Resume replication so that the replica catches
-- up quickly.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
+box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", false)
-- Check that the replica received all data from the master.
test_run:cmd("switch replica")
@@ -91,8 +118,8 @@ test_run:cmd("switch default")
-- Now garbage collection should resume and delete files left
-- from the old checkpoint.
-test_run:wait_cond(function() return #fio.glob('./master/*.xlog') == 0 end, 10)
-#fio.glob('./master/*.xlog') == 0 or fio.listdir('./master')
+wait_gc(1) or box.info.gc()
+wait_xlog(0) or fio.listdir('./master')
--
-- Check that the master doesn't delete xlog files sent to the
-- replica until it receives a confirmation that the data has
@@ -110,8 +137,8 @@ fiber.sleep(0.1) -- wait for master to relay data
-- Garbage collection must not delete the old xlog file
-- because it is still needed by the replica, but remove
-- the old snapshot.
-#box.info.gc().checkpoints == 1 or box.info.gc()
-#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
+wait_gc(1) or box.info.gc()
+wait_xlog(2) or fio.listdir('./master')
test_run:cmd("switch replica")
-- Unblock the replica and break replication.
box.error.injection.set("ERRINJ_WAL_DELAY", false)
@@ -124,8 +151,8 @@ test_run:wait_cond(function() return box.space.test:count() == 310 end, 10)
box.space.test:count()
test_run:cmd("switch default")
-- Now it's safe to drop the old xlog.
-test_run:wait_cond(function() return #fio.glob('./master/*.xlog') == 1 end, 10)
-#fio.glob('./master/*.xlog') == 1 or fio.listdir('./master')
+wait_gc(1) or box.info.gc()
+wait_xlog(1) or fio.listdir('./master')
-- Stop the replica.
test_run:cmd("stop server replica")
test_run:cmd("cleanup server replica")
@@ -139,14 +166,17 @@ _ = s:auto_increment{}
box.snapshot()
_ = s:auto_increment{}
box.snapshot()
-#box.info.gc().checkpoints == 1 or box.info.gc()
-#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
+wait_gc(1) or box.info.gc()
+-- The replica may have managed to download all data
+-- from xlog #1 before it was stopped, in which case
+-- it's OK to collect xlog #1.
+wait_xlog({2, 3}) or fio.listdir('./master')
-- The xlog should only be deleted after the replica
-- is unregistered.
test_run:cleanup_cluster()
-test_run:wait_cond(function() return #fio.glob('./master/*.xlog') == 1 end, 10)
-#fio.glob('./master/*.xlog') == 1 or fio.listdir('./master')
+wait_gc(1) or box.info.gc()
+wait_xlog(1) or fio.listdir('./master')
--
-- Test that concurrent invocation of the garbage collector works fine.
--
@@ -186,14 +216,13 @@ _ = s:auto_increment{}
box.snapshot()
_ = s:auto_increment{}
box.snapshot()
-#fio.glob('./master/*.xlog') == 3 or fio.listdir('./master')
+wait_xlog(3) or fio.listdir('./master')
-- Delete the replica from the cluster table and check that
-- all xlog files are removed.
test_run:cleanup_cluster()
box.snapshot()
-test_run:wait_cond(function() return #fio.glob('./master/*.xlog') == 0 end, 10)
-#fio.glob('./master/*.xlog') == 0 or fio.listdir('./master')
+wait_xlog(0, 10) or fio.listdir('./master')
-- Restore the config.
box.cfg{replication = {}}
--
2.18.0
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v3 2/5] test: errinj for pause relay_send
2018-12-06 13:38 ` [PATCH v3 2/5] test: errinj for pause relay_send Sergei Voronezhskii
@ 2018-12-06 15:44 ` Vladimir Davydov
0 siblings, 0 replies; 21+ messages in thread
From: Vladimir Davydov @ 2018-12-06 15:44 UTC (permalink / raw)
To: Sergei Voronezhskii; +Cc: tarantool-patches
On Thu, Dec 06, 2018 at 04:38:49PM +0300, Sergei Voronezhskii wrote:
> diff --git a/src/box/relay.cc b/src/box/relay.cc
> index 0034f99a0..17daf76bf 100644
> --- a/src/box/relay.cc
> +++ b/src/box/relay.cc
> @@ -635,12 +635,17 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
> static void
> relay_send(struct relay *relay, struct xrow_header *packet)
> {
> + struct errinj *inj = errinj(ERRINJ_RELAY_SEND_DELAY, ERRINJ_BOOL);
> + while (inj != NULL && inj->bparam) {
> + fiber_sleep(0.01);
> + inj = errinj(ERRINJ_RELAY_SEND_DELAY, ERRINJ_BOOL);
> + }
No need to update 'inj' here. I removed this line.
> @@ -304,11 +331,14 @@ box.snapshot()
> ---
> - ok
> ...
> -#box.info.gc().checkpoints == 1 or box.info.gc()
> +wait_gc(1) or box.info.gc()
> ---
> - true
> ...
> -#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
^^^^^^ 2
> +-- The replica may have managed to download all data
> +-- from xlog #1 before it was stopped, in which case
> +-- it's OK to collect xlog #1.
> +wait_xlog({2, 3}) or fio.listdir('./master')
^^^^^^ 2 or 3
If you did the rebase more thoroughly, you'd notice that now there can't
be 3 xlogs here, only 2. I fixed it (removed value_in; made wait_xlog
always take a number).
Anyway, all this refactoring (I mean the wait_xlog, wait_gc helpers)
doesn't have anything to do with the goal pursued by this patch. I left
it as is for now, but in the future please avoid squashing gratuitous
changes into your patches - it complicates review.
* [PATCH v3 3/5] test: put require in proper places
2018-12-06 13:38 ` [PATCH v3 0/5] enable parallel mode for replication tests Sergei Voronezhskii
2018-12-06 13:38 ` [PATCH v3 1/5] test: cleanup " Sergei Voronezhskii
2018-12-06 13:38 ` [PATCH v3 2/5] test: errinj for pause relay_send Sergei Voronezhskii
@ 2018-12-06 13:38 ` Sergei Voronezhskii
2018-12-06 13:38 ` [PATCH v3 4/5] test: use wait_cond to check follow status Sergei Voronezhskii
` (2 subsequent siblings)
5 siblings, 0 replies; 21+ messages in thread
From: Sergei Voronezhskii @ 2018-12-06 13:38 UTC (permalink / raw)
To: tarantool-patches; +Cc: Vladimir Davydov
* put `require('fiber')` after each server switch command, because
we sometimes got a "'fiber' not defined" error
* use `require('fio')` after `require('test_run').new()`, because
we sometimes got a "'fio' not defined" error
Part of #2436, #3232
---
test/replication/catch.test.lua | 1 -
test/replication/gc.result | 6 +++---
test/replication/gc.test.lua | 2 +-
test/replication/on_replace.result | 3 +++
test/replication/on_replace.test.lua | 1 +
5 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
index d5de88642..7a531df39 100644
--- a/test/replication/catch.test.lua
+++ b/test/replication/catch.test.lua
@@ -2,7 +2,6 @@ env = require('test_run')
test_run = env.new()
engine = test_run:get_cfg('engine')
-
net_box = require('net.box')
errinj = box.error.injection
diff --git a/test/replication/gc.result b/test/replication/gc.result
index 273b77efc..f0cade079 100644
--- a/test/replication/gc.result
+++ b/test/replication/gc.result
@@ -1,6 +1,3 @@
-fio = require 'fio'
----
-...
test_run = require('test_run').new()
---
...
@@ -13,6 +10,9 @@ replica_set = require('fast_replica')
fiber = require('fiber')
---
...
+fio = require('fio')
+---
+...
test_run:cleanup_cluster()
---
...
diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua
index 9f79120e9..899319546 100644
--- a/test/replication/gc.test.lua
+++ b/test/replication/gc.test.lua
@@ -1,8 +1,8 @@
-fio = require 'fio'
test_run = require('test_run').new()
engine = test_run:get_cfg('engine')
replica_set = require('fast_replica')
fiber = require('fiber')
+fio = require('fio')
test_run:cleanup_cluster()
test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
diff --git a/test/replication/on_replace.result b/test/replication/on_replace.result
index 8fef8fb14..2e95b90ea 100644
--- a/test/replication/on_replace.result
+++ b/test/replication/on_replace.result
@@ -63,6 +63,9 @@ test_run:cmd("switch replica")
---
- true
...
+fiber = require('fiber')
+---
+...
while box.space.test:count() < 2 do fiber.sleep(0.01) end
---
...
diff --git a/test/replication/on_replace.test.lua b/test/replication/on_replace.test.lua
index 23a3313b5..e34832103 100644
--- a/test/replication/on_replace.test.lua
+++ b/test/replication/on_replace.test.lua
@@ -26,6 +26,7 @@ session_type
test_run:cmd("switch default")
box.space.test:insert{2}
test_run:cmd("switch replica")
+fiber = require('fiber')
while box.space.test:count() < 2 do fiber.sleep(0.01) end
--
-- applier
--
2.18.0
* [PATCH v3 4/5] test: use wait_cond to check follow status
2018-12-06 13:38 ` [PATCH v3 0/5] enable parallel mode for replication tests Sergei Voronezhskii
` (2 preceding siblings ...)
2018-12-06 13:38 ` [PATCH v3 3/5] test: put require in proper places Sergei Voronezhskii
@ 2018-12-06 13:38 ` Sergei Voronezhskii
2018-12-06 13:38 ` [PATCH v3 5/5] test: replication parallel mode on Sergei Voronezhskii
2018-12-06 15:44 ` [PATCH v3 0/5] enable parallel mode for replication tests Vladimir Davydov
5 siblings, 0 replies; 21+ messages in thread
From: Sergei Voronezhskii @ 2018-12-06 13:38 UTC (permalink / raw)
To: tarantool-patches; +Cc: Vladimir Davydov
After setting timeouts in `box.cfg` and before making a `replace`, we
need to wait for the replicas to reach `follow` status. If
`wait_follow()` then detects a non-`follow` status, it returns true,
which immediately raises an error.
Fixes #3734
Part of #2436, #3232
---
test/replication/misc.result | 21 +++++++++++++++------
test/replication/misc.test.lua | 19 +++++++++++++------
2 files changed, 28 insertions(+), 12 deletions(-)
diff --git a/test/replication/misc.result b/test/replication/misc.result
index 32676c4ea..c32681a7a 100644
--- a/test/replication/misc.result
+++ b/test/replication/misc.result
@@ -146,15 +146,24 @@ test_run:cmd("setopt delimiter ';'")
---
- true
...
+function wait_follow(replicaA, replicaB)
+ return test_run:wait_cond(function()
+ return replicaA.status ~= 'follow' or replicaB.status ~= 'follow'
+ end, 0.01)
+end ;
+---
+...
function test_timeout()
+ local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
+ local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
+ local follows = test_run:wait_cond(function()
+ return replicaA.status == 'follow' or replicaB.status == 'follow'
+ end, 0.1)
+ if not follows then error('replicas not in follow status') end
for i = 0, 99 do
box.space.test_timeout:replace({1})
- fiber.sleep(0.005)
- local rinfo = box.info.replication
- if rinfo[1].upstream and rinfo[1].upstream.status ~= 'follow' or
- rinfo[2].upstream and rinfo[2].upstream.status ~= 'follow' or
- rinfo[3].upstream and rinfo[3].upstream.status ~= 'follow' then
- return error('Replication broken')
+ if wait_follow(replicaA, replicaB) then
+ return error(box.info.replication)
end
end
return true
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index 3bf1fc5a1..6a8af05c3 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -53,15 +53,22 @@ fiber=require('fiber')
box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
_ = box.schema.space.create('test_timeout'):create_index('pk')
test_run:cmd("setopt delimiter ';'")
+function wait_follow(replicaA, replicaB)
+ return test_run:wait_cond(function()
+ return replicaA.status ~= 'follow' or replicaB.status ~= 'follow'
+ end, 0.01)
+end ;
function test_timeout()
+ local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
+ local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
+ local follows = test_run:wait_cond(function()
+ return replicaA.status == 'follow' or replicaB.status == 'follow'
+ end, 0.1)
+ if not follows then error('replicas not in follow status') end
for i = 0, 99 do
box.space.test_timeout:replace({1})
- fiber.sleep(0.005)
- local rinfo = box.info.replication
- if rinfo[1].upstream and rinfo[1].upstream.status ~= 'follow' or
- rinfo[2].upstream and rinfo[2].upstream.status ~= 'follow' or
- rinfo[3].upstream and rinfo[3].upstream.status ~= 'follow' then
- return error('Replication broken')
+ if wait_follow(replicaA, replicaB) then
+ return error(box.info.replication)
end
end
return true
--
2.18.0
* [PATCH v3 5/5] test: replication parallel mode on
2018-12-06 13:38 ` [PATCH v3 0/5] enable parallel mode for replication tests Sergei Voronezhskii
` (3 preceding siblings ...)
2018-12-06 13:38 ` [PATCH v3 4/5] test: use wait_cond to check follow status Sergei Voronezhskii
@ 2018-12-06 13:38 ` Sergei Voronezhskii
2018-12-06 15:44 ` [PATCH v3 0/5] enable parallel mode for replication tests Vladimir Davydov
5 siblings, 0 replies; 21+ messages in thread
From: Sergei Voronezhskii @ 2018-12-06 13:38 UTC (permalink / raw)
To: tarantool-patches; +Cc: Vladimir Davydov
Part of #2436, #3232
---
test/replication/suite.ini | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index 260309573..6e9e3edd0 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -8,4 +8,4 @@ config = suite.cfg
lua_libs = lua/fast_replica.lua lua/rlimit.lua
use_unix_sockets = True
long_run = prune.test.lua
-is_parallel = False
+is_parallel = True
--
2.18.0
* Re: [PATCH v3 0/5] enable parallel mode for replication tests
2018-12-06 13:38 ` [PATCH v3 0/5] enable parallel mode for replication tests Sergei Voronezhskii
` (4 preceding siblings ...)
2018-12-06 13:38 ` [PATCH v3 5/5] test: replication parallel mode on Sergei Voronezhskii
@ 2018-12-06 15:44 ` Vladimir Davydov
5 siblings, 0 replies; 21+ messages in thread
From: Vladimir Davydov @ 2018-12-06 15:44 UTC (permalink / raw)
To: Sergei Voronezhskii; +Cc: tarantool-patches
On Thu, Dec 06, 2018 at 04:38:47PM +0300, Sergei Voronezhskii wrote:
> BRANCH: https://github.com/tarantool/tarantool/tree/sergw/enable-parallel-test-replication-clean-2.1
> After rebasing onto the latest 2.1, fixed conflicts in replication/gc
> and box/errinj
>
> Sergei Voronezhskii (5):
> test: cleanup replication tests
> test: errinj for pause relay_send
> test: put require in proper places
> test: use wait_cond to check follow status
> test: replication parallel mode on
Pushed to 2.1.
Thanks.