Tarantool development patches archive
* [PATCH v2 0/5] test: replication parallel mode on
@ 2018-10-19 16:17 Sergei Voronezhskii
  2018-10-19 16:17 ` [PATCH v2 1/5] test: cleanup replication tests Sergei Voronezhskii
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Sergei Voronezhskii @ 2018-10-19 16:17 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Alexander Turenko, Vladimir Davydov

BRANCH: https://github.com/tarantool/tarantool/tree/sergw/enable-parallel-test-replication-clean
Depends on: https://github.com/tarantool/test-run/pull/128

Rebased on master

Sergei Voronezhskii (5):
  test: cleanup replication tests
  test: errinj for pause relay_send
  test: put require in proper places
  test: use wait_cond to check follow status
  test: replication parallel mode on

 src/box/relay.cc                            |  7 ++-
 src/errinj.h                                |  1 +
 test/replication/before_replace.result      |  3 ++
 test/replication/before_replace.test.lua    |  1 +
 test/replication/catch.result               | 55 ++++++++++----------
 test/replication/catch.test.lua             | 45 ++++++++--------
 test/replication/gc.result                  | 56 ++++++++++----------
 test/replication/gc.test.lua                | 41 ++++++++-------
 test/replication/local_spaces.result        |  3 ++
 test/replication/local_spaces.test.lua      |  1 +
 test/replication/misc.result                | 57 ++++++++++++++++-----
 test/replication/misc.test.lua              | 34 +++++++-----
 test/replication/on_replace.result          | 13 +++++
 test/replication/on_replace.test.lua        |  4 ++
 test/replication/once.result                | 12 +++++
 test/replication/once.test.lua              |  3 ++
 test/replication/quorum.result              |  3 ++
 test/replication/quorum.test.lua            |  1 +
 test/replication/replica_rejoin.result      |  7 +++
 test/replication/replica_rejoin.test.lua    |  2 +
 test/replication/skip_conflict_row.result   |  7 +++
 test/replication/skip_conflict_row.test.lua |  2 +
 test/replication/status.result              |  7 +++
 test/replication/status.test.lua            |  2 +
 test/replication/suite.ini                  |  3 +-
 test/replication/sync.result                |  7 +++
 test/replication/sync.test.lua              |  2 +
 test/replication/wal_off.result             |  7 +++
 test/replication/wal_off.test.lua           |  2 +
 29 files changed, 262 insertions(+), 126 deletions(-)

-- 
2.18.0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 1/5] test: cleanup replication tests
  2018-10-19 16:17 [PATCH v2 0/5] test: replication parallel mode on Sergei Voronezhskii
@ 2018-10-19 16:17 ` Sergei Voronezhskii
  2018-10-21 20:41   ` Alexander Turenko
  2018-10-19 16:17 ` [PATCH v2 2/5] test: errinj for pause relay_send Sergei Voronezhskii
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Sergei Voronezhskii @ 2018-10-19 16:17 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Alexander Turenko, Vladimir Davydov

- At the end of each test that creates any replication configuration,
  call (see the sketch below):
  * `test_run:cmd('delete server ...')`, which removes the server object
    from the `TestState.servers` list; this behaviour was taken from the
    `test_run:drop_cluster()` function;
  * `test_run:cleanup_cluster()`, which clears `box.space._cluster`.
- Switch on `use_unix_sockets` because of 'Address already in use'
  problems.
- The `once` test needs to clean up its `once*` keys in
  `box.space._schema`.

Part of #2436, #3232
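
For reference, the cleanup tail appended to most tests looks roughly
like this (a sketch only; 'replica' stands for whatever server a
particular test creates):

test_run:cmd("stop server replica")
test_run:cmd("cleanup server replica")
-- Remove the server object from the TestState.servers list.
test_run:cmd("delete server replica")
-- Clear box.space._cluster on the master.
test_run:cleanup_cluster()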
---
 test/replication/before_replace.result      |  3 ++
 test/replication/before_replace.test.lua    |  1 +
 test/replication/catch.result               |  7 ++++
 test/replication/catch.test.lua             |  3 +-
 test/replication/gc.result                  |  4 +++
 test/replication/gc.test.lua                |  1 +
 test/replication/local_spaces.result        |  3 ++
 test/replication/local_spaces.test.lua      |  1 +
 test/replication/misc.result                | 40 +++++++++++++++++----
 test/replication/misc.test.lua              | 19 ++++++----
 test/replication/on_replace.result          | 10 ++++++
 test/replication/on_replace.test.lua        |  3 ++
 test/replication/once.result                | 12 +++++++
 test/replication/once.test.lua              |  3 ++
 test/replication/quorum.result              |  3 ++
 test/replication/quorum.test.lua            |  1 +
 test/replication/replica_rejoin.result      |  7 ++++
 test/replication/replica_rejoin.test.lua    |  2 ++
 test/replication/skip_conflict_row.result   |  7 ++++
 test/replication/skip_conflict_row.test.lua |  2 ++
 test/replication/status.result              |  7 ++++
 test/replication/status.test.lua            |  2 ++
 test/replication/suite.ini                  |  1 +
 test/replication/sync.result                |  7 ++++
 test/replication/sync.test.lua              |  2 ++
 test/replication/wal_off.result             |  7 ++++
 test/replication/wal_off.test.lua           |  2 ++
 27 files changed, 147 insertions(+), 13 deletions(-)

diff --git a/test/replication/before_replace.result b/test/replication/before_replace.result
index 858a52de6..87973b6d1 100644
--- a/test/replication/before_replace.result
+++ b/test/replication/before_replace.result
@@ -223,3 +223,6 @@ test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
 ---
 ...
+test_run:cleanup_cluster()
+---
+...
diff --git a/test/replication/before_replace.test.lua b/test/replication/before_replace.test.lua
index f1e590703..bcd264521 100644
--- a/test/replication/before_replace.test.lua
+++ b/test/replication/before_replace.test.lua
@@ -80,3 +80,4 @@ box.space.test:select()
 -- Cleanup.
 test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
+test_run:cleanup_cluster()
diff --git a/test/replication/catch.result b/test/replication/catch.result
index aebba819f..663bdc758 100644
--- a/test/replication/catch.result
+++ b/test/replication/catch.result
@@ -130,6 +130,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
index 8cc3242f7..6773675d0 100644
--- a/test/replication/catch.test.lua
+++ b/test/replication/catch.test.lua
@@ -59,6 +59,7 @@ errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
 -- cleanup
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
-
diff --git a/test/replication/gc.result b/test/replication/gc.result
index 5944ba513..2895bed3b 100644
--- a/test/replication/gc.result
+++ b/test/replication/gc.result
@@ -408,6 +408,10 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
 _ = s:auto_increment{}
 ---
 ...
diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua
index 84c980dd1..5100378b3 100644
--- a/test/replication/gc.test.lua
+++ b/test/replication/gc.test.lua
@@ -190,6 +190,7 @@ box.cfg{replication = replica_port}
 -- Stop the replica and write a few WALs.
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
 _ = s:auto_increment{}
 box.snapshot()
 _ = s:auto_increment{}
diff --git a/test/replication/local_spaces.result b/test/replication/local_spaces.result
index 151735530..4de223261 100644
--- a/test/replication/local_spaces.result
+++ b/test/replication/local_spaces.result
@@ -216,6 +216,9 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
diff --git a/test/replication/local_spaces.test.lua b/test/replication/local_spaces.test.lua
index 06e2b0bd2..633cc9f1a 100644
--- a/test/replication/local_spaces.test.lua
+++ b/test/replication/local_spaces.test.lua
@@ -76,6 +76,7 @@ box.space.test3:select()
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cleanup_cluster()
 box.schema.user.revoke('guest', 'replication')
 
 s1:select()
diff --git a/test/replication/misc.result b/test/replication/misc.result
index f8aa8dab6..02f4d47f4 100644
--- a/test/replication/misc.result
+++ b/test/replication/misc.result
@@ -88,6 +88,13 @@ test_run:cmd('cleanup server test')
 box.cfg{read_only = false}
 ---
 ...
+test_run:cmd('delete server test')
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 -- gh-3160 - Send heartbeats if there are changes from a remote master only
 SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
 ---
@@ -229,6 +236,9 @@ test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
 ---
 ...
+test_run:cleanup_cluster()
+---
+...
 -- gh-3642 - Check that socket file descriptor doesn't leak
 -- when a replica is disconnected.
 rlimit = require('rlimit')
@@ -253,7 +263,7 @@ test_run:cmd('create server sock with rpl_master=default, script="replication/re
 ---
 - true
 ...
-test_run:cmd(string.format('start server sock'))
+test_run:cmd('start server sock')
 ---
 - true
 ...
@@ -299,14 +309,21 @@ lim.rlim_cur = old_fno
 rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
 ---
 ...
-test_run:cmd('stop server sock')
+test_run:cmd("stop server sock")
+---
+- true
+...
+test_run:cmd("cleanup server sock")
 ---
 - true
 ...
-test_run:cmd('cleanup server sock')
+test_run:cmd("delete server sock")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
@@ -342,6 +359,17 @@ test_run:cmd('cleanup server er_load2')
 ---
 - true
 ...
+test_run:cmd('delete server er_load1')
+---
+- true
+...
+test_run:cmd('delete server er_load2')
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 --
 -- Test case for gh-3637. Before the fix replica would exit with
 -- an error. Now check that we don't hang and successfully connect.
@@ -349,9 +377,6 @@ test_run:cmd('cleanup server er_load2')
 fiber = require('fiber')
 ---
 ...
-test_run:cleanup_cluster()
----
-...
 test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
 ---
 - true
@@ -391,6 +416,9 @@ test_run:cmd("delete server replica_auth")
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.drop('cluster')
 ---
 ...
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index 46726b7f4..06ad974db 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -32,6 +32,8 @@ test_run:cmd(string.format('start server test with args="%s"', replica_uuid))
 test_run:cmd('stop server test')
 test_run:cmd('cleanup server test')
 box.cfg{read_only = false}
+test_run:cmd('delete server test')
+test_run:cleanup_cluster()
 
 -- gh-3160 - Send heartbeats if there are changes from a remote master only
 SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
@@ -89,6 +91,7 @@ box.space.space1:drop()
 
 test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
+test_run:cleanup_cluster()
 
 -- gh-3642 - Check that socket file descriptor doesn't leak
 -- when a replica is disconnected.
@@ -100,7 +103,7 @@ lim.rlim_cur = 64
 rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
 
 test_run:cmd('create server sock with rpl_master=default, script="replication/replica.lua"')
-test_run:cmd(string.format('start server sock'))
+test_run:cmd('start server sock')
 test_run:cmd('switch sock')
 test_run = require('test_run').new()
 fiber = require('fiber')
@@ -122,8 +125,10 @@ test_run:cmd('switch default')
 lim.rlim_cur = old_fno
 rlimit.setrlimit(rlimit.RLIMIT_NOFILE, lim)
 
-test_run:cmd('stop server sock')
-test_run:cmd('cleanup server sock')
+test_run:cmd("stop server sock")
+test_run:cmd("cleanup server sock")
+test_run:cmd("delete server sock")
+test_run:cleanup_cluster()
 
 box.schema.user.revoke('guest', 'replication')
 
@@ -138,15 +143,15 @@ test_run:cmd('stop server er_load1')
 -- er_load2 exits automatically.
 test_run:cmd('cleanup server er_load1')
 test_run:cmd('cleanup server er_load2')
+test_run:cmd('delete server er_load1')
+test_run:cmd('delete server er_load2')
+test_run:cleanup_cluster()
 
 --
 -- Test case for gh-3637. Before the fix replica would exit with
 -- an error. Now check that we don't hang and successfully connect.
 --
 fiber = require('fiber')
-
-test_run:cleanup_cluster()
-
 test_run:cmd("create server replica_auth with rpl_master=default, script='replication/replica_auth.lua'")
 test_run:cmd("start server replica_auth with wait=False, wait_load=False, args='cluster:pass 0.05'")
 -- Wait a bit to make sure replica waits till user is created.
@@ -161,6 +166,8 @@ _ = test_run:wait_vclock('replica_auth', vclock)
 test_run:cmd("stop server replica_auth")
 test_run:cmd("cleanup server replica_auth")
 test_run:cmd("delete server replica_auth")
+test_run:cleanup_cluster()
+
 box.schema.user.drop('cluster')
 
 --
diff --git a/test/replication/on_replace.result b/test/replication/on_replace.result
index 4ffa3b25a..8fef8fb14 100644
--- a/test/replication/on_replace.result
+++ b/test/replication/on_replace.result
@@ -88,6 +88,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
@@ -177,3 +184,6 @@ _ = test_run:cmd('switch default')
 test_run:drop_cluster(SERVERS)
 ---
 ...
+test_run:cleanup_cluster()
+---
+...
diff --git a/test/replication/on_replace.test.lua b/test/replication/on_replace.test.lua
index 371b71cbd..23a3313b5 100644
--- a/test/replication/on_replace.test.lua
+++ b/test/replication/on_replace.test.lua
@@ -37,6 +37,8 @@ test_run:cmd("switch default")
 --
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
 
@@ -73,3 +75,4 @@ box.space.s2:select()
 
 _ = test_run:cmd('switch default')
 test_run:drop_cluster(SERVERS)
+test_run:cleanup_cluster()
diff --git a/test/replication/once.result b/test/replication/once.result
index 99ac05b72..fd787915e 100644
--- a/test/replication/once.result
+++ b/test/replication/once.result
@@ -85,3 +85,15 @@ once -- 1
 box.cfg{read_only = false}
 ---
 ...
+box.space._schema:delete{"oncero"}
+---
+- ['oncero']
+...
+box.space._schema:delete{"oncekey"}
+---
+- ['oncekey']
+...
+box.space._schema:delete{"oncetest"}
+---
+- ['oncetest']
+...
diff --git a/test/replication/once.test.lua b/test/replication/once.test.lua
index 264c63670..813fbfdab 100644
--- a/test/replication/once.test.lua
+++ b/test/replication/once.test.lua
@@ -28,3 +28,6 @@ box.cfg{read_only = true}
 box.once("ro", f, 1) -- ok, already done
 once -- 1
 box.cfg{read_only = false}
+box.space._schema:delete{"oncero"}
+box.space._schema:delete{"oncekey"}
+box.space._schema:delete{"oncetest"}
diff --git a/test/replication/quorum.result b/test/replication/quorum.result
index 265b099b7..ff5fa0150 100644
--- a/test/replication/quorum.result
+++ b/test/replication/quorum.result
@@ -447,6 +447,9 @@ test_run:cmd('delete server replica_quorum')
 ---
 - true
 ...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
diff --git a/test/replication/quorum.test.lua b/test/replication/quorum.test.lua
index 5a43275c2..98febb367 100644
--- a/test/replication/quorum.test.lua
+++ b/test/replication/quorum.test.lua
@@ -169,4 +169,5 @@ test_run:cmd('switch default')
 test_run:cmd('stop server replica_quorum')
 test_run:cmd('cleanup server replica_quorum')
 test_run:cmd('delete server replica_quorum')
+test_run:cleanup_cluster()
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/replica_rejoin.result b/test/replication/replica_rejoin.result
index df1057d10..87d626e20 100644
--- a/test/replication/replica_rejoin.result
+++ b/test/replication/replica_rejoin.result
@@ -379,6 +379,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
diff --git a/test/replication/replica_rejoin.test.lua b/test/replication/replica_rejoin.test.lua
index 40094a1af..9bf43eff8 100644
--- a/test/replication/replica_rejoin.test.lua
+++ b/test/replication/replica_rejoin.test.lua
@@ -138,5 +138,7 @@ test_run:cmd("switch default")
 box.cfg{replication = ''}
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/skip_conflict_row.result b/test/replication/skip_conflict_row.result
index 29963f56a..6ca13b472 100644
--- a/test/replication/skip_conflict_row.result
+++ b/test/replication/skip_conflict_row.result
@@ -91,6 +91,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
diff --git a/test/replication/skip_conflict_row.test.lua b/test/replication/skip_conflict_row.test.lua
index 5f7d6ead3..4406ced95 100644
--- a/test/replication/skip_conflict_row.test.lua
+++ b/test/replication/skip_conflict_row.test.lua
@@ -31,5 +31,7 @@ box.info.status
 -- cleanup
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/status.result b/test/replication/status.result
index 8394b98c1..9e69f2478 100644
--- a/test/replication/status.result
+++ b/test/replication/status.result
@@ -391,3 +391,10 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
diff --git a/test/replication/status.test.lua b/test/replication/status.test.lua
index 8bb25e0c6..cfdf6acdb 100644
--- a/test/replication/status.test.lua
+++ b/test/replication/status.test.lua
@@ -142,3 +142,5 @@ test_run:cmd('switch default')
 box.schema.user.revoke('guest', 'replication')
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index f4abc7af1..6e3a73341 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -6,5 +6,6 @@ disabled = consistent.test.lua
 release_disabled = catch.test.lua errinj.test.lua gc.test.lua before_replace.test.lua quorum.test.lua recover_missing_xlog.test.lua sync.test.lua
 config = suite.cfg
 lua_libs = lua/fast_replica.lua lua/rlimit.lua
+use_unix_sockets = True
 long_run = prune.test.lua
 is_parallel = False
diff --git a/test/replication/sync.result b/test/replication/sync.result
index 81de60758..b2381ac59 100644
--- a/test/replication/sync.result
+++ b/test/replication/sync.result
@@ -303,6 +303,13 @@ test_run:cmd("cleanup server replica")
 ---
 - true
 ...
+test_run:cmd("delete server replica")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.space.test:drop()
 ---
 ...
diff --git a/test/replication/sync.test.lua b/test/replication/sync.test.lua
index a5cfab8de..51131667d 100644
--- a/test/replication/sync.test.lua
+++ b/test/replication/sync.test.lua
@@ -145,6 +145,8 @@ test_run:grep_log('replica', 'ER_CFG.*')
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
+test_run:cmd("delete server replica")
+test_run:cleanup_cluster()
 
 box.space.test:drop()
 box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/wal_off.result b/test/replication/wal_off.result
index e3b5709e9..e0ae84bd7 100644
--- a/test/replication/wal_off.result
+++ b/test/replication/wal_off.result
@@ -107,6 +107,13 @@ test_run:cmd("cleanup server wal_off")
 ---
 - true
 ...
+test_run:cmd("delete server wal_off")
+---
+- true
+...
+test_run:cleanup_cluster()
+---
+...
 box.schema.user.revoke('guest', 'replication')
 ---
 ...
diff --git a/test/replication/wal_off.test.lua b/test/replication/wal_off.test.lua
index 81fcf0b33..110f2f1f7 100644
--- a/test/replication/wal_off.test.lua
+++ b/test/replication/wal_off.test.lua
@@ -37,5 +37,7 @@ box.cfg { replication = "" }
 
 test_run:cmd("stop server wal_off")
 test_run:cmd("cleanup server wal_off")
+test_run:cmd("delete server wal_off")
+test_run:cleanup_cluster()
 
 box.schema.user.revoke('guest', 'replication')
-- 
2.18.0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 2/5] test: errinj for pause relay_send
  2018-10-19 16:17 [PATCH v2 0/5] test: replication parallel mode on Sergei Voronezhskii
  2018-10-19 16:17 ` [PATCH v2 1/5] test: cleanup replication tests Sergei Voronezhskii
@ 2018-10-19 16:17 ` Sergei Voronezhskii
  2018-10-21 20:41   ` Alexander Turenko
  2018-10-19 16:17 ` [PATCH v2 3/5] test: put require in proper places Sergei Voronezhskii
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Sergei Voronezhskii @ 2018-10-19 16:17 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Alexander Turenko, Vladimir Davydov

Instead of using a timeout, just pause `relay_send`: we can't rely on
a timeout because of varying system load in parallel mode. Add a new
error injection that spins in a loop inside `relay_send` and does not
let the function proceed to the next statement until the boolean is
reset to false. A usage sketch is given below.

To check the read-only mode we need to modify a tuple; calling the
`replace` method is enough. This replaces the previous `delete` plus
the pointless `get` check that the tuple had not been deleted.

Also, look for the xlog files in a loop with a small sleep until the
file count matches the expected value.

Part of #2436, #3232
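
A usage sketch, based on the catch.test.lua hunk below (`s` is the test
space created on the master):

errinj = box.error.injection
-- Pause relay_send on the master: new rows are not delivered to the
-- replica until the flag is reset.
errinj.set('ERRINJ_RELAY_SEND_DELAY', true)
for i = 1, 100 do s:insert{i, 'this is test message12345'} end
-- ... run the read-only checks on the replica ...
-- Resume replication and let the relay drain the queued rows.
errinj.set('ERRINJ_RELAY_SEND_DELAY', false)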
---
 src/box/relay.cc                |  7 ++++-
 src/errinj.h                    |  1 +
 test/replication/catch.result   | 48 +++++++++++++++------------------
 test/replication/catch.test.lua | 41 ++++++++++++++--------------
 test/replication/gc.result      | 46 ++++++++++++++-----------------
 test/replication/gc.test.lua    | 38 +++++++++++++-------------
 6 files changed, 86 insertions(+), 95 deletions(-)

diff --git a/src/box/relay.cc b/src/box/relay.cc
index 0a1e95af0..81f2b821c 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -631,12 +631,17 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
 static void
 relay_send(struct relay *relay, struct xrow_header *packet)
 {
+	struct errinj *inj = errinj(ERRINJ_RELAY_SEND_DELAY, ERRINJ_BOOL);
+	while (inj->bparam) {
+		fiber_sleep(0.01);
+		inj = errinj(ERRINJ_RELAY_SEND_DELAY, ERRINJ_BOOL);
+	}
 	packet->sync = relay->sync;
 	relay->last_row_tm = ev_monotonic_now(loop());
 	coio_write_xrow(&relay->io, packet);
 	fiber_gc();
 
-	struct errinj *inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
+	inj = errinj(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE);
 	if (inj != NULL && inj->dparam > 0)
 		fiber_sleep(inj->dparam);
 }
diff --git a/src/errinj.h b/src/errinj.h
index 84a1fbb5e..bf6c15ba5 100644
--- a/src/errinj.h
+++ b/src/errinj.h
@@ -94,6 +94,7 @@ struct errinj {
 	_(ERRINJ_VY_GC, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_VY_LOG_FLUSH, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_VY_LOG_FLUSH_DELAY, ERRINJ_BOOL, {.bparam = false}) \
+	_(ERRINJ_RELAY_SEND_DELAY, ERRINJ_BOOL, {.bparam = false}) \
 	_(ERRINJ_RELAY_TIMEOUT, ERRINJ_DOUBLE, {.dparam = 0}) \
 	_(ERRINJ_RELAY_REPORT_INTERVAL, ERRINJ_DOUBLE, {.dparam = 0}) \
 	_(ERRINJ_RELAY_FINAL_SLEEP, ERRINJ_BOOL, {.bparam = false}) \
diff --git a/test/replication/catch.result b/test/replication/catch.result
index 663bdc758..d25932d1b 100644
--- a/test/replication/catch.result
+++ b/test/replication/catch.result
@@ -35,7 +35,7 @@ test_run:cmd("switch default")
 s = box.schema.space.create('test', {engine = engine});
 ---
 ...
--- vinyl does not support hash index
+-- Vinyl does not support hash index.
 index = s:create_index('primary', {type = (engine == 'vinyl' and 'tree' or 'hash') })
 ---
 ...
@@ -57,14 +57,14 @@ test_run:cmd("stop server replica")
 ---
 - true
 ...
--- insert values on the master while replica is stopped and can't fetch them
-for i=1,100 do s:insert{i, 'this is test message12345'} end
+-- Insert values on the master while replica is stopped and can't
+-- fetch them.
+errinj.set('ERRINJ_RELAY_SEND_DELAY', true)
 ---
+- ok
 ...
--- sleep after every tuple
-errinj.set("ERRINJ_RELAY_TIMEOUT", 1000.0)
+for i = 1, 100 do s:insert{i, 'this is test message12345'} end
 ---
-- ok
 ...
 test_run:cmd("start server replica with args='0.01'")
 ---
@@ -75,28 +75,26 @@ test_run:cmd("switch replica")
 - true
 ...
 -- Check that replica doesn't enter read-write mode before
--- catching up with the master: to check that we inject sleep into
--- the master relay_send function and attempt a data modifying
--- statement in replica while it's still fetching data from the
--- master.
--- In the next two cases we try to delete a tuple while replica is
--- catching up with the master (local delete, remote delete) case
+-- catching up with the master: to check that we stop sending
+-- rows on the master in relay_send function and attempt a data
+-- modifying statement in replica while it's still fetching data
+-- from the master.
+--
+-- In the next two cases we try to replace a tuple while replica
+-- is catching up with the master (local delete, remote delete)
+-- case
 --
--- #1: delete tuple on replica
+-- Case #1: replace tuple on replica locally.
 --
 box.space.test ~= nil
 ---
 - true
 ...
-d = box.space.test:delete{1}
+box.space.test:replace{1}
 ---
 - error: Can't modify data because this instance is in read-only mode.
 ...
-box.space.test:get(1) ~= nil
----
-- true
-...
--- case #2: delete tuple by net.box
+-- Case #2: replace tuple on replica by net.box.
 test_run:cmd("switch default")
 ---
 - true
@@ -108,20 +106,16 @@ test_run:cmd("set variable r_uri to 'replica.listen'")
 c = net_box.connect(r_uri)
 ---
 ...
-d = c.space.test:delete{1}
+d = c.space.test:replace{1}
 ---
 - error: Can't modify data because this instance is in read-only mode.
 ...
-c.space.test:get(1) ~= nil
----
-- true
-...
--- check sync
-errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
+-- Resume replicaton.
+errinj.set('ERRINJ_RELAY_SEND_DELAY', false)
 ---
 - ok
 ...
--- cleanup
+-- Cleanup.
 test_run:cmd("stop server replica")
 ---
 - true
diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
index 6773675d0..685c3e869 100644
--- a/test/replication/catch.test.lua
+++ b/test/replication/catch.test.lua
@@ -13,7 +13,7 @@ test_run:cmd("switch replica")
 
 test_run:cmd("switch default")
 s = box.schema.space.create('test', {engine = engine});
--- vinyl does not support hash index
+-- Vinyl does not support hash index.
 index = s:create_index('primary', {type = (engine == 'vinyl' and 'tree' or 'hash') })
 
 test_run:cmd("switch replica")
@@ -22,41 +22,40 @@ while box.space.test == nil do fiber.sleep(0.01) end
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 
--- insert values on the master while replica is stopped and can't fetch them
-for i=1,100 do s:insert{i, 'this is test message12345'} end
-
--- sleep after every tuple
-errinj.set("ERRINJ_RELAY_TIMEOUT", 1000.0)
+-- Insert values on the master while replica is stopped and can't
+-- fetch them.
+errinj.set('ERRINJ_RELAY_SEND_DELAY', true)
+for i = 1, 100 do s:insert{i, 'this is test message12345'} end
 
 test_run:cmd("start server replica with args='0.01'")
 test_run:cmd("switch replica")
 
 -- Check that replica doesn't enter read-write mode before
--- catching up with the master: to check that we inject sleep into
--- the master relay_send function and attempt a data modifying
--- statement in replica while it's still fetching data from the
--- master.
--- In the next two cases we try to delete a tuple while replica is
--- catching up with the master (local delete, remote delete) case
+-- catching up with the master: to check that we stop sending
+-- rows on the master in relay_send function and attempt a data
+-- modifying statement in replica while it's still fetching data
+-- from the master.
+--
+-- In the next two cases we try to replace a tuple while replica
+-- is catching up with the master (local delete, remote delete)
+-- case
 --
--- #1: delete tuple on replica
+-- Case #1: replace tuple on replica locally.
 --
 box.space.test ~= nil
-d = box.space.test:delete{1}
-box.space.test:get(1) ~= nil
+box.space.test:replace{1}
 
--- case #2: delete tuple by net.box
+-- Case #2: replace tuple on replica by net.box.
 
 test_run:cmd("switch default")
 test_run:cmd("set variable r_uri to 'replica.listen'")
 c = net_box.connect(r_uri)
-d = c.space.test:delete{1}
-c.space.test:get(1) ~= nil
+d = c.space.test:replace{1}
 
--- check sync
-errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
+-- Resume replicaton.
+errinj.set('ERRINJ_RELAY_SEND_DELAY', false)
 
--- cleanup
+-- Cleanup.
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
 test_run:cmd("delete server replica")
diff --git a/test/replication/gc.result b/test/replication/gc.result
index 2895bed3b..dad5539de 100644
--- a/test/replication/gc.result
+++ b/test/replication/gc.result
@@ -30,6 +30,9 @@ box.cfg{checkpoint_count = 1}
 function wait_gc(n) while #box.info.gc().checkpoints > n do fiber.sleep(0.01) end end
 ---
 ...
+function wait_xlog(n, timeout) timeout = timeout or 1.0 return test_run:wait_cond(function() return #fio.glob('./master/*.xlog') == n end, timeout) end
+---
+...
 -- Grant permissions needed for replication.
 box.schema.user.grant('guest', 'replication')
 ---
@@ -66,7 +69,7 @@ for i = 1, 100 do s:auto_increment{} end
 ...
 -- Make sure replica join will take long enough for us to
 -- invoke garbage collection.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
+box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", true)
 ---
 - ok
 ...
@@ -81,7 +84,7 @@ test_run:cmd("setopt delimiter ';'")
 fiber.create(function()
     fiber.sleep(0.1)
     box.snapshot()
-    box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
+    box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", false)
 end)
 test_run:cmd("setopt delimiter ''");
 ---
@@ -102,7 +105,7 @@ test_run:cmd("switch replica")
 fiber = require('fiber')
 ---
 ...
-while box.space.test:count() < 200 do fiber.sleep(0.01) end
+while box.space.test == nil or box.space.test:count() < 200 do fiber.sleep(0.01) end
 ---
 ...
 box.space.test:count()
@@ -122,13 +125,13 @@ wait_gc(1)
 ---
 - true
 ...
-#fio.glob('./master/*.xlog') == 1 or fio.listdir('./master')
+wait_xlog(1) or fio.listdir('./master')
 ---
 - true
 ...
--- Make sure the replica will receive data it is subscribed
--- to long enough for us to invoke garbage collection.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
+-- Make sure the replica will not receive data until
+-- we test garbage collection.
+box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", true)
 ---
 - ok
 ...
@@ -160,13 +163,13 @@ box.snapshot()
 ---
 - true
 ...
-#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
+wait_xlog(2) or fio.listdir('./master')
 ---
 - true
 ...
--- Remove the timeout injection so that the replica catches
+-- Resume replicaton so that the replica catches
 -- up quickly.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
+box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", false)
 ---
 - ok
 ...
@@ -195,7 +198,7 @@ wait_gc(1)
 ---
 - true
 ...
-#fio.glob('./master/*.xlog') == 0 or fio.listdir('./master')
+wait_xlog(0) or fio.listdir('./master')
 ---
 - true
 ...
@@ -238,7 +241,7 @@ fiber.sleep(0.1) -- wait for master to relay data
 ---
 - true
 ...
-#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
+wait_xlog(2) or fio.listdir('./master')
 ---
 - true
 ...
@@ -283,7 +286,7 @@ wait_gc(1)
 ---
 - true
 ...
-#fio.glob('./master/*.xlog') == 1 or fio.listdir('./master')
+wait_xlog(1) or fio.listdir('./master')
 ---
 - true
 ...
@@ -319,13 +322,10 @@ box.snapshot()
 ---
 - true
 ...
-xlog_count = #fio.glob('./master/*.xlog')
----
-...
 -- the replica may have managed to download all data
 -- from xlog #1 before it was stopped, in which case
 -- it's OK to collect xlog #1
-xlog_count == 3 or xlog_count == 2 or fio.listdir('./master')
+wait_xlog(3, 0.1) or wait_xlog(2, 0.1) or fio.listdir('./master')
 ---
 - true
 ...
@@ -338,7 +338,7 @@ test_run:cleanup_cluster()
 ---
 - true
 ...
-#fio.glob('./master/*.xlog') == 1 or fio.listdir('./master')
+wait_xlog(1) or fio.listdir('./master')
 ---
 - true
 ...
@@ -433,7 +433,7 @@ box.snapshot()
 ---
 - ok
 ...
-#fio.glob('./master/*.xlog') == 3 or fio.listdir('./master')
+wait_xlog(3) or fio.listdir('./master')
 ---
 - true
 ...
@@ -446,13 +446,7 @@ box.snapshot()
 ---
 - ok
 ...
-t = fiber.time()
----
-...
-while #fio.glob('./master/*xlog') > 0 and fiber.time() - t < 10 do fiber.sleep(0.01) end
----
-...
-#fio.glob('./master/*.xlog') == 0 or fio.listdir('./master')
+wait_xlog(0, 10) or fio.listdir('./master')
 ---
 - true
 ...
diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua
index 5100378b3..22921289d 100644
--- a/test/replication/gc.test.lua
+++ b/test/replication/gc.test.lua
@@ -12,6 +12,7 @@ default_checkpoint_count = box.cfg.checkpoint_count
 box.cfg{checkpoint_count = 1}
 
 function wait_gc(n) while #box.info.gc().checkpoints > n do fiber.sleep(0.01) end end
+function wait_xlog(n, timeout) timeout = timeout or 1.0 return test_run:wait_cond(function() return #fio.glob('./master/*.xlog') == n end, timeout) end
 
 -- Grant permissions needed for replication.
 box.schema.user.grant('guest', 'replication')
@@ -31,7 +32,7 @@ for i = 1, 100 do s:auto_increment{} end
 
 -- Make sure replica join will take long enough for us to
 -- invoke garbage collection.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
+box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", true)
 
 -- While the replica is receiving the initial data set,
 -- make a snapshot and invoke garbage collection, then
@@ -41,7 +42,7 @@ test_run:cmd("setopt delimiter ';'")
 fiber.create(function()
     fiber.sleep(0.1)
     box.snapshot()
-    box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
+    box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", false)
 end)
 test_run:cmd("setopt delimiter ''");
 
@@ -54,7 +55,7 @@ test_run:cmd("start server replica")
 -- data from the master. Check it.
 test_run:cmd("switch replica")
 fiber = require('fiber')
-while box.space.test:count() < 200 do fiber.sleep(0.01) end
+while box.space.test == nil or box.space.test:count() < 200 do fiber.sleep(0.01) end
 box.space.test:count()
 test_run:cmd("switch default")
 
@@ -62,10 +63,10 @@ test_run:cmd("switch default")
 -- the replica released the corresponding checkpoint.
 wait_gc(1)
 #box.info.gc().checkpoints == 1 or box.info.gc()
-#fio.glob('./master/*.xlog') == 1 or fio.listdir('./master')
--- Make sure the replica will receive data it is subscribed
--- to long enough for us to invoke garbage collection.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
+wait_xlog(1) or fio.listdir('./master')
+-- Make sure the replica will not receive data until
+-- we test garbage collection.
+box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", true)
 
 -- Send more data to the replica.
 -- Need to do 2 snapshots here, otherwise the replica would
@@ -80,11 +81,11 @@ box.snapshot()
 -- xlogs needed by the replica.
 box.snapshot()
 #box.info.gc().checkpoints == 1 or box.info.gc()
-#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
+wait_xlog(2) or fio.listdir('./master')
 
--- Remove the timeout injection so that the replica catches
+-- Resume replicaton so that the replica catches
 -- up quickly.
-box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
+box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", false)
 
 -- Check that the replica received all data from the master.
 test_run:cmd("switch replica")
@@ -96,7 +97,7 @@ test_run:cmd("switch default")
 -- from the old checkpoint.
 wait_gc(1)
 #box.info.gc().checkpoints == 1 or box.info.gc()
-#fio.glob('./master/*.xlog') == 0 or fio.listdir('./master')
+wait_xlog(0) or fio.listdir('./master')
 --
 -- Check that the master doesn't delete xlog files sent to the
 -- replica until it receives a confirmation that the data has
@@ -115,7 +116,7 @@ fiber.sleep(0.1) -- wait for master to relay data
 -- because it is still needed by the replica, but remove
 -- the old snapshot.
 #box.info.gc().checkpoints == 1 or box.info.gc()
-#fio.glob('./master/*.xlog') == 2 or fio.listdir('./master')
+wait_xlog(2) or fio.listdir('./master')
 test_run:cmd("switch replica")
 -- Unblock the replica and break replication.
 box.error.injection.set("ERRINJ_WAL_DELAY", false)
@@ -131,7 +132,7 @@ test_run:cmd("switch default")
 -- Now it's safe to drop the old xlog.
 wait_gc(1)
 #box.info.gc().checkpoints == 1 or box.info.gc()
-#fio.glob('./master/*.xlog') == 1 or fio.listdir('./master')
+wait_xlog(1) or fio.listdir('./master')
 -- Stop the replica.
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
@@ -146,17 +147,16 @@ box.snapshot()
 _ = s:auto_increment{}
 box.snapshot()
 #box.info.gc().checkpoints == 1 or box.info.gc()
-xlog_count = #fio.glob('./master/*.xlog')
 -- the replica may have managed to download all data
 -- from xlog #1 before it was stopped, in which case
 -- it's OK to collect xlog #1
-xlog_count == 3 or xlog_count == 2 or fio.listdir('./master')
+wait_xlog(3, 0.1) or wait_xlog(2, 0.1) or fio.listdir('./master')
 
 -- The xlog should only be deleted after the replica
 -- is unregistered.
 test_run:cleanup_cluster()
 #box.info.gc().checkpoints == 1 or box.info.gc()
-#fio.glob('./master/*.xlog') == 1 or fio.listdir('./master')
+wait_xlog(1) or fio.listdir('./master')
 --
 -- Test that concurrent invocation of the garbage collector works fine.
 --
@@ -197,15 +197,13 @@ _ = s:auto_increment{}
 box.snapshot()
 _ = s:auto_increment{}
 box.snapshot()
-#fio.glob('./master/*.xlog') == 3 or fio.listdir('./master')
+wait_xlog(3) or fio.listdir('./master')
 
 -- Delete the replica from the cluster table and check that
 -- all xlog files are removed.
 test_run:cleanup_cluster()
 box.snapshot()
-t = fiber.time()
-while #fio.glob('./master/*xlog') > 0 and fiber.time() - t < 10 do fiber.sleep(0.01) end
-#fio.glob('./master/*.xlog') == 0 or fio.listdir('./master')
+wait_xlog(0, 10) or fio.listdir('./master')
 
 -- Restore the config.
 box.cfg{replication = {}}
-- 
2.18.0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 3/5] test: put require in proper places
  2018-10-19 16:17 [PATCH v2 0/5] test: replication parallel mode on Sergei Voronezhskii
  2018-10-19 16:17 ` [PATCH v2 1/5] test: cleanup replication tests Sergei Voronezhskii
  2018-10-19 16:17 ` [PATCH v2 2/5] test: errinj for pause relay_send Sergei Voronezhskii
@ 2018-10-19 16:17 ` Sergei Voronezhskii
  2018-10-21 20:41   ` Alexander Turenko
  2018-10-19 16:17 ` [PATCH v2 4/5] test: use wait_cond to check follow status Sergei Voronezhskii
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Sergei Voronezhskii @ 2018-10-19 16:17 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Alexander Turenko, Vladimir Davydov

* Put `require('fiber')` after each 'switch server' command, because
otherwise the test sometimes fails with a "'fiber' not defined" error
(see the sketch below).
* Move `require('fio')` after `require('test_run').new()`, because
otherwise the test sometimes fails with a "'fio' not defined" error.

Part of #2436, #3232
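
A sketch of the first point, following the on_replace.test.lua hunk
below (the replica runs in its own Lua state, so globals defined on the
default instance are not visible there):

test_run:cmd("switch replica")
-- 'fiber' has to be required again after the switch.
fiber = require('fiber')
while box.space.test:count() < 2 do fiber.sleep(0.01) end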
---
 test/replication/catch.test.lua      | 1 -
 test/replication/gc.result           | 6 +++---
 test/replication/gc.test.lua         | 2 +-
 test/replication/on_replace.result   | 3 +++
 test/replication/on_replace.test.lua | 1 +
 5 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
index 685c3e869..9ceb2fb9c 100644
--- a/test/replication/catch.test.lua
+++ b/test/replication/catch.test.lua
@@ -2,7 +2,6 @@ env = require('test_run')
 test_run = env.new()
 engine = test_run:get_cfg('engine')
 
-
 net_box = require('net.box')
 errinj = box.error.injection
 
diff --git a/test/replication/gc.result b/test/replication/gc.result
index dad5539de..631944da5 100644
--- a/test/replication/gc.result
+++ b/test/replication/gc.result
@@ -1,6 +1,3 @@
-fio = require 'fio'
----
-...
 test_run = require('test_run').new()
 ---
 ...
@@ -13,6 +10,9 @@ replica_set = require('fast_replica')
 fiber = require('fiber')
 ---
 ...
+fio = require('fio')
+---
+...
 test_run:cleanup_cluster()
 ---
 ...
diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua
index 22921289d..8c3337613 100644
--- a/test/replication/gc.test.lua
+++ b/test/replication/gc.test.lua
@@ -1,8 +1,8 @@
-fio = require 'fio'
 test_run = require('test_run').new()
 engine = test_run:get_cfg('engine')
 replica_set = require('fast_replica')
 fiber = require('fiber')
+fio = require('fio')
 
 test_run:cleanup_cluster()
 test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
diff --git a/test/replication/on_replace.result b/test/replication/on_replace.result
index 8fef8fb14..2e95b90ea 100644
--- a/test/replication/on_replace.result
+++ b/test/replication/on_replace.result
@@ -63,6 +63,9 @@ test_run:cmd("switch replica")
 ---
 - true
 ...
+fiber = require('fiber')
+---
+...
 while box.space.test:count() < 2 do fiber.sleep(0.01) end
 ---
 ...
diff --git a/test/replication/on_replace.test.lua b/test/replication/on_replace.test.lua
index 23a3313b5..e34832103 100644
--- a/test/replication/on_replace.test.lua
+++ b/test/replication/on_replace.test.lua
@@ -26,6 +26,7 @@ session_type
 test_run:cmd("switch default")
 box.space.test:insert{2}
 test_run:cmd("switch replica")
+fiber = require('fiber')
 while box.space.test:count() < 2 do fiber.sleep(0.01) end
 --
 -- applier
-- 
2.18.0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 4/5] test: use wait_cond to check follow status
  2018-10-19 16:17 [PATCH v2 0/5] test: replication parallel mode on Sergei Voronezhskii
                   ` (2 preceding siblings ...)
  2018-10-19 16:17 ` [PATCH v2 3/5] test: put require in proper places Sergei Voronezhskii
@ 2018-10-19 16:17 ` Sergei Voronezhskii
  2018-10-19 23:24   ` Alexander Turenko
  2018-10-19 16:17 ` [PATCH v2 5/5] test: replication parallel mode on Sergei Voronezhskii
  2018-10-19 23:37 ` [PATCH v2 0/5] " Alexander Turenko
  5 siblings, 1 reply; 25+ messages in thread
From: Sergei Voronezhskii @ 2018-10-19 16:17 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Alexander Turenko, Vladimir Davydov

If `test_run:wait_cond()` finds a status other than 'follow', it
returns true, which immediately raises an error. A sketch of the check
is given below.

Fixes #3734
Part of #2436, #3232
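
A sketch of the new check (mirroring the misc.test.lua hunk below;
replicaA and replicaB are the upstream tables picked inside
test_timeout()):

function wait_follow(replicaA, replicaB)
    return test_run:wait_cond(function()
        return replicaA.status ~= 'follow' or replicaB.status ~= 'follow'
    end, 0.01)
end

if wait_follow(replicaA, replicaB) then
    error(box.info.replication)
end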
---
 test/replication/misc.result   | 17 +++++++++++------
 test/replication/misc.test.lua | 15 +++++++++------
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/test/replication/misc.result b/test/replication/misc.result
index 02f4d47f4..58e512c30 100644
--- a/test/replication/misc.result
+++ b/test/replication/misc.result
@@ -146,15 +146,20 @@ test_run:cmd("setopt delimiter ';'")
 ---
 - true
 ...
+function wait_follow(replicaA, replicaB)
+    return test_run:wait_cond(function()
+        return replicaA.status ~= 'follow' or replicaB.status ~= 'follow'
+    end, 0.01)
+end ;
+---
+...
 function test_timeout()
     for i = 0, 99 do 
+        local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
+        local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
         box.space.test_timeout:replace({1})
-        fiber.sleep(0.005)
-        local rinfo = box.info.replication
-        if rinfo[1].upstream and rinfo[1].upstream.status ~= 'follow' or
-           rinfo[2].upstream and rinfo[2].upstream.status ~= 'follow' or
-           rinfo[3].upstream and rinfo[3].upstream.status ~= 'follow' then
-            return error('Replication broken')
+        if wait_follow(replicaA, replicaB) then
+            return error(box.info.replication)
         end
     end
     return true
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index 06ad974db..3866eb3ac 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -53,15 +53,18 @@ fiber=require('fiber')
 box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
 _ = box.schema.space.create('test_timeout'):create_index('pk')
 test_run:cmd("setopt delimiter ';'")
+function wait_follow(replicaA, replicaB)
+    return test_run:wait_cond(function()
+        return replicaA.status ~= 'follow' or replicaB.status ~= 'follow'
+    end, 0.01)
+end ;
 function test_timeout()
     for i = 0, 99 do 
+        local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
+        local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
         box.space.test_timeout:replace({1})
-        fiber.sleep(0.005)
-        local rinfo = box.info.replication
-        if rinfo[1].upstream and rinfo[1].upstream.status ~= 'follow' or
-           rinfo[2].upstream and rinfo[2].upstream.status ~= 'follow' or
-           rinfo[3].upstream and rinfo[3].upstream.status ~= 'follow' then
-            return error('Replication broken')
+        if wait_follow(replicaA, replicaB) then
+            return error(box.info.replication)
         end
     end
     return true
-- 
2.18.0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 5/5] test: replication parallel mode on
  2018-10-19 16:17 [PATCH v2 0/5] test: replication parallel mode on Sergei Voronezhskii
                   ` (3 preceding siblings ...)
  2018-10-19 16:17 ` [PATCH v2 4/5] test: use wait_cond to check follow status Sergei Voronezhskii
@ 2018-10-19 16:17 ` Sergei Voronezhskii
  2018-10-19 23:37 ` [PATCH v2 0/5] " Alexander Turenko
  5 siblings, 0 replies; 25+ messages in thread
From: Sergei Voronezhskii @ 2018-10-19 16:17 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Alexander Turenko, Vladimir Davydov

Part of #2436, #3232
---
 test/replication/suite.ini | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/test/replication/suite.ini b/test/replication/suite.ini
index 6e3a73341..5cbc371c2 100644
--- a/test/replication/suite.ini
+++ b/test/replication/suite.ini
@@ -8,4 +8,4 @@ config = suite.cfg
 lua_libs = lua/fast_replica.lua lua/rlimit.lua
 use_unix_sockets = True
 long_run = prune.test.lua
-is_parallel = False
+is_parallel = True
-- 
2.18.0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 4/5] test: use wait_cond to check follow status
  2018-10-19 16:17 ` [PATCH v2 4/5] test: use wait_cond to check follow status Sergei Voronezhskii
@ 2018-10-19 23:24   ` Alexander Turenko
  2018-10-25 16:43     ` [tarantool-patches] " Sergei Voronezhskii
  0 siblings, 1 reply; 25+ messages in thread
From: Alexander Turenko @ 2018-10-19 23:24 UTC (permalink / raw)
  To: Sergei Voronezhskii; +Cc: tarantool-patches, Vladimir Davydov

Hi!

See below.

WBR, Alexander Turenko.

On Fri, Oct 19, 2018 at 07:17:20PM +0300, Sergei Voronezhskii wrote:
> If `test_run:wait_cond()` found a not 'follow` status it returns true.
> Which immediately causes an error.
> 
> Fixes #3734
> Part of #2436, #3232
> ---
>  test/replication/misc.result   | 17 +++++++++++------
>  test/replication/misc.test.lua | 15 +++++++++------
>  2 files changed, 20 insertions(+), 12 deletions(-)
> 

> diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
> index 06ad974db..3866eb3ac 100644
> --- a/test/replication/misc.test.lua
> +++ b/test/replication/misc.test.lua
> @@ -53,15 +53,18 @@ fiber=require('fiber')
>  box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
>  _ = box.schema.space.create('test_timeout'):create_index('pk')
>  test_run:cmd("setopt delimiter ';'")
> +function wait_follow(replicaA, replicaB)
> +    return test_run:wait_cond(function()
> +        return replicaA.status ~= 'follow' or replicaB.status ~= 'follow'
> +    end, 0.01)
> +end ;
>  function test_timeout()
>      for i = 0, 99 do 
> +        local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
> +        local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
>          box.space.test_timeout:replace({1})
> -        fiber.sleep(0.005)
> -        local rinfo = box.info.replication
> -        if rinfo[1].upstream and rinfo[1].upstream.status ~= 'follow' or
> -           rinfo[2].upstream and rinfo[2].upstream.status ~= 'follow' or
> -           rinfo[3].upstream and rinfo[3].upstream.status ~= 'follow' then
> -            return error('Replication broken')
> +        if wait_follow(replicaA, replicaB) then
> +            return error(box.info.replication)

AFAIU, this test case checks that replicas do not leave the 'follow'
state even for a short time period. We should wait for the 'follow'
state before the loop and then make a number of attempts to catch any
other state (sketched below). I'm not sure, though. Georgy should draw
the line.
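
Something along these lines, purely illustrative (the upstream lookups
are copied from the patch; the timeout and attempt count are arbitrary):

-- First make sure both upstreams are in 'follow'.
test_run:wait_cond(function()
    local a = box.info.replication[1].upstream or box.info.replication[2].upstream
    local b = box.info.replication[3].upstream or box.info.replication[2].upstream
    return a.status == 'follow' and b.status == 'follow'
end, 10)
-- Then try to catch any transition away from 'follow'.
for i = 1, 100 do
    box.space.test_timeout:replace({1})
    local a = box.info.replication[1].upstream or box.info.replication[2].upstream
    local b = box.info.replication[3].upstream or box.info.replication[2].upstream
    if a.status ~= 'follow' or b.status ~= 'follow' then
        error(box.info.replication)
    end
end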

I still think that correcting test cases is a developer's
responsibility. If you want to do it, please discuss it with the author
beforehand. This will save us the time we now spend on extra review
iterations.

>          end
>      end
>      return true
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/5] test: replication parallel mode on
  2018-10-19 16:17 [PATCH v2 0/5] test: replication parallel mode on Sergei Voronezhskii
                   ` (4 preceding siblings ...)
  2018-10-19 16:17 ` [PATCH v2 5/5] test: replication parallel mode on Sergei Voronezhskii
@ 2018-10-19 23:37 ` Alexander Turenko
  2018-10-19 23:44   ` Alexander Turenko
  5 siblings, 1 reply; 25+ messages in thread
From: Alexander Turenko @ 2018-10-19 23:37 UTC (permalink / raw)
  To: Sergei Voronezhskii; +Cc: tarantool-patches, Vladimir Davydov

It seems you caught a segfault on the replication suite with this
patch :)

https://travis-ci.org/tarantool/tarantool/jobs/443475291#L2591

Please comment on whether you are able to reproduce it somehow (at least
in CI or maybe on some specific machine or docker image) and file an
issue. Please do not restart this job / build, so that we keep at least
some proof link, because right now it is not clear whether the case is
easy to reproduce.

WBR, Alexander Turenko.

On Fri, Oct 19, 2018 at 07:17:16PM +0300, Sergei Voronezhskii wrote:
> BRANCH: https://github.com/tarantool/tarantool/tree/sergw/enable-parallel-test-replication-clean
> Depends on: https://github.com/tarantool/test-run/pull/128
> 
> Rebased on master
> 
> Sergei Voronezhskii (5):
>   test: cleanup replication tests
>   test: errinj for pause relay_send
>   test: put require in proper places
>   test: use wait_cond to check follow status
>   test: replication parallel mode on

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/5] test: replication parallel mode on
  2018-10-19 23:37 ` [PATCH v2 0/5] " Alexander Turenko
@ 2018-10-19 23:44   ` Alexander Turenko
  2018-10-26 12:41     ` Re[2]: " Sergei Voronezhskii
  2018-10-26 12:44     ` Sergei Voronezhskii
  0 siblings, 2 replies; 25+ messages in thread
From: Alexander Turenko @ 2018-10-19 23:44 UTC (permalink / raw)
  To: Sergei Voronezhskii; +Cc: tarantool-patches, Vladimir Davydov

I also observed a failure and a hang on box/:

https://travis-ci.org/tarantool/tarantool/jobs/443475294#L2496
https://travis-ci.org/tarantool/tarantool/jobs/443475296#L6173

Maybe test-run does something incorrect and the suites are not fully
independent in the parallel run? Maybe some tests from box/ and
replication/ conflict on some global resource, say the same unix socket
name in /tmp?

Please, try to reproduce the issue and fix / report it.

WBR, Alexander Turenko.

On Sat, Oct 20, 2018 at 02:37:01AM +0300, Alexander Turenko wrote:
> It seems you catched segfault on the replication suite with this patch
> :)
> 
> https://travis-ci.org/tarantool/tarantool/jobs/443475291#L2591
> 
> Please, comment whether you are able to reproduce it somehow (at least
> in CI or maybe some specific machine or docker image) and file an issue.
> Please, do not restart this job / build to have at least some proof
> link, because now it does not clear whether the case is easy to
> reproduce.
> 
> WBR, Alexander Turenko.
> 
> On Fri, Oct 19, 2018 at 07:17:16PM +0300, Sergei Voronezhskii wrote:
> > BRANCH: https://github.com/tarantool/tarantool/tree/sergw/enable-parallel-test-replication-clean
> > Depends on: https://github.com/tarantool/test-run/pull/128
> > 
> > Rebased on master
> > 
> > Sergei Voronezhskii (5):
> >   test: cleanup replication tests
> >   test: errinj for pause relay_send
> >   test: put require in proper places
> >   test: use wait_cond to check follow status
> >   test: replication parallel mode on

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/5] test: cleanup replication tests
  2018-10-19 16:17 ` [PATCH v2 1/5] test: cleanup replication tests Sergei Voronezhskii
@ 2018-10-21 20:41   ` Alexander Turenko
  2018-10-22  8:07     ` Re[2]: " Sergei Voronezhskii
  0 siblings, 1 reply; 25+ messages in thread
From: Alexander Turenko @ 2018-10-21 20:41 UTC (permalink / raw)
  To: Sergei Voronezhskii; +Cc: tarantool-patches, Vladimir Davydov

OK except for the minor comments below.

It would be great if you replied to the review messages, citing the
reviewer's comments and saying what was changed or why it was not
changed. It is hard to manually track all the questions that were
raised, especially when the patch is relatively big.

For small changes (especially in a relatively big patch) you can simply
reply to the comments. Ideally, also attach a diff against the previous
version at the end of the email (just call `git diff` before committing
your changes and paste it). I can see the whole diff on the branch and
paste the related hunks to comment on them. Other reviewers may ask you
to paste the whole new diff explicitly; I personally prefer to run git
diff and paste the parts I want to comment on.

WBR, Alexander Turenko.

On Fri, Oct 19, 2018 at 07:17:17PM +0300, Sergei Voronezhskii wrote:
> - at the end of tests which create any replication config need to call:
>   * `test_run:cmd('delete server ...')` removes server object
> from `TestState.servers` list, this behaviour was taken
> from `test_run:drop_cluster()` function

Please indent it like so (for ease of reading):

- xxx
  * yyy
    zzz
    aaa
- fff
  eee
  * ggg
    hhh

> diff --git a/test/replication/local_spaces.test.lua b/test/replication/local_spaces.test.lua
> index 06e2b0bd2..633cc9f1a 100644
> --- a/test/replication/local_spaces.test.lua
> +++ b/test/replication/local_spaces.test.lua
> @@ -76,6 +76,7 @@ box.space.test3:select()
>  test_run:cmd("switch default")
>  test_run:cmd("stop server replica")
>  test_run:cmd("cleanup server replica")
> +test_run:cleanup_cluster()
>  box.schema.user.revoke('guest', 'replication')
>  
>  s1:select()

Is test_run:cmd("delete server replica") missing here?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/5] test: errinj for pause relay_send
  2018-10-19 16:17 ` [PATCH v2 2/5] test: errinj for pause relay_send Sergei Voronezhskii
@ 2018-10-21 20:41   ` Alexander Turenko
  2018-10-22  8:42     ` Re[2]: " Sergei Voronezhskii
  0 siblings, 1 reply; 25+ messages in thread
From: Alexander Turenko @ 2018-10-21 20:41 UTC (permalink / raw)
  To: Sergei Voronezhskii; +Cc: tarantool-patches, Vladimir Davydov

Hi!

I don't have objections in general. Some minor comments are below.

Please reply with the fixes; don't just resend the whole patch.

WBR, Alexander Turenko.

> Instead of using timeout we need just pause `relay_send`. Can't rely
> on timeout because of various system load in parallel mode. Add new
> errinj which checks boolean in loop and until it is not `True` do not
> pass the method `relay_send` to the next statement.
> 
> To check the read-only mode, need to make a modification of tuple. It
> is enough to call `replace` method. Instead of `delete` and then
> useless verification that we have not delete space by using `get`
> method.
> 

delete space -> delete tuple?

> +-- In the next two cases we try to replace a tuple while replica
> +-- is catching up with the master (local delete, remote delete)

delete -> replace

> +-- case

Nit: period at the end.

> --- check sync
> -errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
> +-- Resume replicaton.

replicaton -> replication

> diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua
> index 5100378b3..22921289d 100644
> --- a/test/replication/gc.test.lua
> +++ b/test/replication/gc.test.lua
> @@ -12,6 +12,7 @@ default_checkpoint_count = box.cfg.checkpoint_count
>  box.cfg{checkpoint_count = 1}
>  
>  function wait_gc(n) while #box.info.gc().checkpoints > n do fiber.sleep(0.01) end end
> +function wait_xlog(n, timeout) timeout = timeout or 1.0 return test_run:wait_cond(function() return #fio.glob('./master/*.xlog') == n end, timeout) end
>

Use 'set delimiter' and write it in several lines. Also, below I propose
to support 'n' being a table, to allow the count of files to be one of
several values. You can use an auxiliary function like the following plus
a type(n) == 'table' check.

function value_in(val, arr)
    for _, elem in ipairs(arr) do
        if val == elem then
            return true
        end
    end
    return false
end
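
For example, the whole thing could look like the following sketch (untested;
it uses the value_in() helper above and test-run's wait_cond()):

test_run:cmd("setopt delimiter ';'")
-- Wait until the number of xlog files on the master equals 'n';
-- 'n' may be a single number or a table of acceptable counts.
function wait_xlog(n, timeout)
    timeout = timeout or 1.0
    return test_run:wait_cond(function()
        local count = #fio.glob('./master/*.xlog')
        if type(n) == 'table' then
            return value_in(count, n)
        end
        return count == n
    end, timeout)
end;
test_run:cmd("setopt delimiter ''");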

> @@ -31,7 +32,7 @@ for i = 1, 100 do s:auto_increment{} end
>  
>  -- Make sure replica join will take long enough for us to
>  -- invoke garbage collection.
> -box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
> +box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", true)
>  
>  -- While the replica is receiving the initial data set,
>  -- make a snapshot and invoke garbage collection, then
> @@ -41,7 +42,7 @@ test_run:cmd("setopt delimiter ';'")
>  fiber.create(function()
>      fiber.sleep(0.1)
>      box.snapshot()
> -    box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
> +    box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", false)
>  end)
>  test_run:cmd("setopt delimiter ''");
>  

The entire comment:

> -- While the replica is receiving the initial data set,
> -- make a snapshot and invoke garbage collection, then
> -- remove the timeout injection so that we don't have to
> -- wait too long for the replica to start.

Proposed: then remove delay to allow replica to start.

> --- Remove the timeout injection so that the replica catches
> +-- Resume replicaton so that the replica catches

replicaton -> replication

> @@ -146,17 +147,16 @@ box.snapshot()
>  _ = s:auto_increment{}
>  box.snapshot()
>  #box.info.gc().checkpoints == 1 or box.info.gc()
> -xlog_count = #fio.glob('./master/*.xlog')
>  -- the replica may have managed to download all data
>  -- from xlog #1 before it was stopped, in which case
>  -- it's OK to collect xlog #1
> -xlog_count == 3 or xlog_count == 2 or fio.listdir('./master')
> +wait_xlog(3, 0.1) or wait_xlog(2, 0.1) or fio.listdir('./master')

You set the timeout to 1.0 for the other cases, but 0.2 here. So, is 0.2
enough? It would be better to allow the function to accept a table like
{2, 3} as the file count. Use 'set delimiter' and update the function.
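
The call site could then become something like this sketch (assuming the
table-aware wait_xlog sketched above, with its default 1.0 timeout):

wait_xlog({2, 3}) or fio.listdir('./master')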

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/5] test: put require in proper places
  2018-10-19 16:17 ` [PATCH v2 3/5] test: put require in proper places Sergei Voronezhskii
@ 2018-10-21 20:41   ` Alexander Turenko
  0 siblings, 0 replies; 25+ messages in thread
From: Alexander Turenko @ 2018-10-21 20:41 UTC (permalink / raw)
  To: Sergei Voronezhskii; +Cc: tarantool-patches, Vladimir Davydov

This commit looks good to me.

WBR, Alexander Turenko.

On Fri, Oct 19, 2018 at 07:17:19PM +0300, Sergei Voronezhskii wrote:
> * put `require('fiber')` after each switch server command, because
> sometimes got 'fiber' not defined error
> * use `require('fio')` after `require('test_run').new()`, because
> sometimes got 'fio' not defined error
> 
> Part of #2436, #3232

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re[2]: [PATCH v2 1/5] test: cleanup replication tests
  2018-10-21 20:41   ` Alexander Turenko
@ 2018-10-22  8:07     ` Sergei Voronezhskii
  2018-10-23  3:21       ` Alexander Turenko
  0 siblings, 1 reply; 25+ messages in thread
From: Sergei Voronezhskii @ 2018-10-22  8:07 UTC (permalink / raw)
  To: Alexander Turenko; +Cc: tarantool-patches, Vladimir Davydov

[-- Attachment #1: Type: text/plain, Size: 1895 bytes --]




>Sunday, October 21, 2018, 23:41 +03:00 from Alexander Turenko <alexander.turenko@tarantool.org>:
>
>Ok except minor comments below.
>
>It would be great if you would answer review messages, citing the reviewer's
>comments and saying what was changed or why it was not changed. It is
>hard to manually track all the questions that were raised, esp. when the
>patch is relatively big.
>
>For small changes (esp. in a relatively big patch) you can just reply to
>the comments. Ideally, also attach a diff against the previous version at
>the end of the email (just run `git diff` before committing your changes
>and paste it). I can see the whole diff on the branch and paste the
>related hunks to comment on them. Other reviewers may ask you to paste the
>whole new diff explicitly; I personally prefer to run git diff and paste
>the parts I want to comment on.
>
>WBR, Alexander Turenko.
>
>On Fri, Oct 19, 2018 at 07:17:17PM +0300, Sergei Voronezhskii wrote:
>> - at the end of tests which create any replication config need to call:
>>   * `test_run:cmd('delete server ...')` removes server object
>> from `TestState.servers` list, this behaviour was taken
>> from `test_run:drop_cluster()` function
>
>Please indent it like so (for easier reading):
>
>- xxx
>  * yyy
>    zzz
>    aaa
>- fff
>  eee
>  * ggg
>    hhh 
fixed
>
>> diff --git a/test/replication/local_spaces.test.lua b/test/replication/local_spaces.test.lua
>> index 06e2b0bd2..633cc9f1a 100644
>> --- a/test/replication/local_spaces.test.lua
>> +++ b/test/replication/local_spaces.test.lua
>> @@ -76,6 +76,7 @@ box.space.test3:select()
>>  test_run:cmd("switch default")
>>  test_run:cmd("stop server replica")
>>  test_run:cmd("cleanup server replica")
>> +test_run:cleanup_cluster()
>>  box.schema.user.revoke('guest', 'replication')
>> 
>>  s1:select()
>
>Missed test_run:cmd("delete server replica")?
fixed

-- 
Sergei Voronezhskii

[-- Attachment #2: Type: text/html, Size: 2679 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re[2]: [PATCH v2 2/5] test: errinj for pause relay_send
  2018-10-21 20:41   ` Alexander Turenko
@ 2018-10-22  8:42     ` Sergei Voronezhskii
  2018-10-23  3:22       ` Alexander Turenko
  0 siblings, 1 reply; 25+ messages in thread
From: Sergei Voronezhskii @ 2018-10-22  8:42 UTC (permalink / raw)
  To: Alexander Turenko; +Cc: tarantool-patches, Vladimir Davydov

[-- Attachment #1: Type: text/plain, Size: 4070 bytes --]


>Sunday, October 21, 2018, 23:41 +03:00 from Alexander Turenko <alexander.turenko@tarantool.org>:
>
>Hi!
>
>I don't have objections in general. Some minor comments are below.
>
>Please, answer with fixes, don't just send the whole patch.
>
>WBR, Alexander Turenko.
>
>> Instead of using timeout we need just pause `relay_send`. Can't rely
>> on timeout because of various system load in parallel mode. Add new
>> errinj which checks boolean in loop and until it is not `True` do not
>> pass the method `relay_send` to the next statement.
>> 
>> To check the read-only mode, need to make a modification of tuple. It
>> is enough to call `replace` method. Instead of `delete` and then
>> useless verification that we have not delete space by using `get`
>> method.
>> 
>
>delete space -> delete tuple? 
fixed
>
>
>> +-- In the next two cases we try to replace a tuple while replica
>> +-- is catching up with the master (local delete, remote delete)
>
>delete -> replace
fixed
>
>> +-- case
>
>Nit: period at the end. 
fixed
>
>
>> --- check sync
>> -errinj.set("ERRINJ_RELAY_TIMEOUT", 0)
>> +-- Resume replicaton.
>
>replicaton -> replication
fixed
>
>> diff --git a/test/replication/gc.test.lua b/test/replication/gc.test.lua
>> index 5100378b3..22921289d 100644
>> --- a/test/replication/gc.test.lua
>> +++ b/test/replication/gc.test.lua
>> @@ -12,6 +12,7 @@ default_checkpoint_count = box.cfg.checkpoint_count
>>  box.cfg{checkpoint_count = 1}
>> 
>>  function wait_gc(n) while #box.info.gc().checkpoints > n do fiber.sleep(0.01) end end
>> +function wait_xlog(n, timeout) timeout = timeout or 1.0 return test_run:wait_cond(function() return #fio.glob('./master/*.xlog') == n end, timeout) end
>>
>
>Use 'set delimiter' and write it in several lines. Also, below I propose
>to support 'n' being a table, to allow the count of files to be one of
>several values. You can use an auxiliary function like the following plus
>a type(n) == 'table' check.
>
>function value_in(val, arr)
>    for _, elem in ipairs(arr) do
>        if val == elem then
>            return true
>        end
>    end
>    return false
>end
fixed
>
>> @@ -31,7 +32,7 @@ for i = 1, 100 do s:auto_increment{} end
>> 
>>  -- Make sure replica join will take long enough for us to
>>  -- invoke garbage collection.
>> -box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.05)
>> +box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", true)
>> 
>>  -- While the replica is receiving the initial data set,
>>  -- make a snapshot and invoke garbage collection, then
>> @@ -41,7 +42,7 @@ test_run:cmd("setopt delimiter ';'")
>>  fiber.create(function()
>>      fiber.sleep(0.1)
>>      box.snapshot()
>> -    box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
>> +    box.error.injection.set("ERRINJ_RELAY_SEND_DELAY", false)
>>  end)
>>  test_run:cmd("setopt delimiter ''");
>> 
>
>The entire comment:
>
>> -- While the replica is receiving the initial data set,
>> -- make a snapshot and invoke garbage collection, then
>> -- remove the timeout injection so that we don't have to
>> -- wait too long for the replica to start.
>
>Proposed: then remove delay to allow replica to start.
fixed
>
>> --- Remove the timeout injection so that the replica catches
>> +-- Resume replicaton so that the replica catches
>
>replicaton -> replication
fixed
>
>> @@ -146,17 +147,16 @@ box.snapshot()
>>  _ = s:auto_increment{}
>>  box.snapshot()
>>  #box.info.gc().checkpoints == 1 or box.info.gc()
>> -xlog_count = #fio.glob('./master/*.xlog')
>>  -- the replica may have managed to download all data
>>  -- from xlog #1 before it was stopped, in which case
>>  -- it's OK to collect xlog #1
>> -xlog_count == 3 or xlog_count == 2 or fio.listdir('./master')
>> +wait_xlog(3, 0.1) or wait_xlog(2, 0.1) or fio.listdir('./master')
>
>You set the timeout to 1.0 for the other cases, but 0.2 here. So, is 0.2
>enough? It would be better to allow the function to accept a table like
>{2, 3} as the file count. Use 'set delimiter' and update the function.
fixed

-- 
Sergei Voronezhskii

[-- Attachment #2: Type: text/html, Size: 6571 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/5] test: cleanup replication tests
  2018-10-22  8:07     ` Re[2]: " Sergei Voronezhskii
@ 2018-10-23  3:21       ` Alexander Turenko
  0 siblings, 0 replies; 25+ messages in thread
From: Alexander Turenko @ 2018-10-23  3:21 UTC (permalink / raw)
  To: Sergei Voronezhskii; +Cc: tarantool-patches, Vladimir Davydov

Now ok for this commit.

WBR, Alexander Turenko.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 2/5] test: errinj for pause relay_send
  2018-10-22  8:42     ` Re[2]: " Sergei Voronezhskii
@ 2018-10-23  3:22       ` Alexander Turenko
  0 siblings, 0 replies; 25+ messages in thread
From: Alexander Turenko @ 2018-10-23  3:22 UTC (permalink / raw)
  To: Sergei Voronezhskii; +Cc: tarantool-patches, Vladimir Davydov

Have no more objections against this commit.

WBR, Alexander Turenko.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v2 4/5] test: use wait_cond to check follow status
  2018-10-19 23:24   ` Alexander Turenko
@ 2018-10-25 16:43     ` Sergei Voronezhskii
  2018-10-29 10:41       ` Alexander Turenko
  0 siblings, 1 reply; 25+ messages in thread
From: Sergei Voronezhskii @ 2018-10-25 16:43 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Vladimir Davydov, Alexander Turenko, Georgy Kirichenko

[-- Attachment #1: Type: text/plain, Size: 3275 bytes --]




>Saturday, October 20, 2018, 2:24 +03:00 from Alexander Turenko <alexander.turenko@tarantool.org>:
>
>Hi!
>
>See below.
>
>WBR, Alexander Turenko.
>
>On Fri, Oct 19, 2018 at 07:17:20PM +0300, Sergei Voronezhskii wrote:
>> If `test_run:wait_cond()` found a not 'follow` status it returns true.
>> Which immediately causes an error.
>> 
>> Fixes #3734
>> Part of #2436, #3232
>> ---
>>  test/replication/misc.result   | 17 +++++++++++------
>>  test/replication/misc.test.lua | 15 +++++++++------
>>  2 files changed, 20 insertions(+), 12 deletions(-)
>> 
>
>> diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
>> index 06ad974db..3866eb3ac 100644
>> --- a/test/replication/misc.test.lua
>> +++ b/test/replication/misc.test.lua
>> @@ -53,15 +53,18 @@ fiber=require('fiber')
>>  box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
>>  _ = box.schema.space.create('test_timeout'):create_index('pk')
>>  test_run:cmd("setopt delimiter ';'")
>> +function wait_follow(replicaA, replicaB)
>> +    return test_run:wait_cond(function()
>> +        return replicaA.status ~= 'follow' or replicaB.status ~= 'follow'
>> +    end, 0.01)
>> +end ;
>>  function test_timeout()
>>      for i = 0, 99 do 
>> +        local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
>> +        local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
>>          box.space.test_timeout:replace({1})
>> -        fiber.sleep(0.005)
>> -        local rinfo = box.info.replication
>> -        if rinfo[1].upstream and rinfo[1].upstream.status ~= 'follow' or
>> -           rinfo[2].upstream and rinfo[2].upstream.status ~= 'follow' or
>> -           rinfo[3].upstream and rinfo[3].upstream.status ~= 'follow' then
>> -            return error('Replication broken')
>> +        if wait_follow(replicaA, replicaB) then
>> +            return error(box.info.replication)
>
>AFAIU, this test case checks that replicas do not leave from 'follow'
>state even for a short time period. We should wait for 'follow' state
>before the loop and perform some amount of attemps to catch an another
>state. I don't sure, though. Georgy should draw the line.
>
>I still think correction of test cases is a developer responsibility. If
>you want to do it, please, discuss it with the author before. This will
>save us some time we spend now on those extra review iterations.
Georgy and I discussed how to do it:
function test_timeout()
    local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
    local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
    local follows = test_run:wait_cond(function()
        return replicaA.status == 'follow' or replicaB.status == 'follow'
    end, 0.1)
    if not follows then error('replicas not in follow status') end
    for i = 0, 99 do
        box.space.test_timeout:replace({1})
        if wait_follow(replicaA, replicaB) then
            return error(box.info.replication)
        end
    end
    return true
end ;

Branch was updated.
>
>>          end
>>      end
>>      return true
>> -- 
>> 2.18.0
>


-- 
Sergei Voronezhskii

[-- Attachment #2: Type: text/html, Size: 4405 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re[2]: [PATCH v2 0/5] test: replication parallel mode on
  2018-10-19 23:44   ` Alexander Turenko
@ 2018-10-26 12:41     ` Sergei Voronezhskii
  2018-10-26 12:44     ` Sergei Voronezhskii
  1 sibling, 0 replies; 25+ messages in thread
From: Sergei Voronezhskii @ 2018-10-26 12:41 UTC (permalink / raw)
  To: Alexander Turenko; +Cc: tarantool-patches, Vladimir Davydov

[-- Attachment #1: Type: text/plain, Size: 1771 bytes --]




>Saturday, October 20, 2018, 2:44 +03:00 from Alexander Turenko <alexander.turenko@tarantool.org>:
>
>I also observed a fail and a hang on box/:
>
>https://travis-ci.org/tarantool/tarantool/jobs/443475294#L2496
>https://travis-ci.org/tarantool/tarantool/jobs/443475296#L6173
>
>Maybe test-run do something incorrect and the suites are not fully
>independent in the parallel run? Maybe some tests from box/ and
>replication/ conflicts on some global resource, say the same unix socket
>name in /tmp?
>
>Please, try to reproduce the issue and fix / report it.
Fixed (https://github.com/tarantool/tarantool/issues/3765).
>
>WBR, Alexander Turenko.
>
>On Sat, Oct 20, 2018 at 02:37:01AM +0300, Alexander Turenko wrote:
>> It seems you catched segfault on the replication suite with this patch
>> :)
>> 
>>  https://travis-ci.org/tarantool/tarantool/jobs/443475291#L2591
>> 
>> Please, comment whether you are able to reproduce it somehow (at least
>> in CI or maybe some specific machine or docker image) and file an issue.
>> Please, do not restart this job / build to have at least some proof
>> link, because now it does not clear whether the case is easy to
>> reproduce.
>> 
>> WBR, Alexander Turenko.
>> 
>> On Fri, Oct 19, 2018 at 07:17:16PM +0300, Sergei Voronezhskii wrote:
>> > BRANCH:  https://github.com/tarantool/tarantool/tree/sergw/enable-parallel-test-replication-clean
>> > Depends on:  https://github.com/tarantool/test-run/pull/128
>> > 
>> > Rebased on master
>> > 
>> > Sergei Voronezhskii (5):
>> >   test: cleanup replication tests
>> >   test: errinj for pause relay_send
>> >   test: put require in proper places
>> >   test: use wait_cond to check follow status
>> >   test: replication parallel mode on


-- 
Sergei Voronezhskii

[-- Attachment #2: Type: text/html, Size: 3068 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re[2]: [PATCH v2 0/5] test: replication parallel mode on
  2018-10-19 23:44   ` Alexander Turenko
  2018-10-26 12:41     ` Re[2]: " Sergei Voronezhskii
@ 2018-10-26 12:44     ` Sergei Voronezhskii
  2018-10-30 17:38       ` Re[3]: " Sergei Voronezhskii
  1 sibling, 1 reply; 25+ messages in thread
From: Sergei Voronezhskii @ 2018-10-26 12:44 UTC (permalink / raw)
  To: Alexander Turenko; +Cc: tarantool-patches, Vladimir Davydov

[-- Attachment #1: Type: text/plain, Size: 1820 bytes --]




>Saturday, October 20, 2018, 2:44 +03:00 from Alexander Turenko <alexander.turenko@tarantool.org>:
>
>I also observed a fail and a hang on box/:
>
>https://travis-ci.org/tarantool/tarantool/jobs/443475294#L2496
>https://travis-ci.org/tarantool/tarantool/jobs/443475296#L6173
>
>Maybe test-run do something incorrect and the suites are not fully
>independent in the parallel run? Maybe some tests from box/ and
>replication/ conflicts on some global resource, say the same unix socket
>name in /tmp?
>
>Please, try to reproduce the issue and fix / report it.
Fixed a problem with box/errinj (after adding the new errinj, the ordering of the 'box.error.injection.info()' output changed).
>
>WBR, Alexander Turenko.
>
>On Sat, Oct 20, 2018 at 02:37:01AM +0300, Alexander Turenko wrote:
>> It seems you catched segfault on the replication suite with this patch
>> :)
>> 
>>  https://travis-ci.org/tarantool/tarantool/jobs/443475291#L2591
>> 
>> Please, comment whether you are able to reproduce it somehow (at least
>> in CI or maybe some specific machine or docker image) and file an issue.
>> Please, do not restart this job / build to have at least some proof
>> link, because now it does not clear whether the case is easy to
>> reproduce.
>> 
>> WBR, Alexander Turenko.
>> 
>> On Fri, Oct 19, 2018 at 07:17:16PM +0300, Sergei Voronezhskii wrote:
>> > BRANCH:  https://github.com/tarantool/tarantool/tree/sergw/enable-parallel-test-replication-clean
>> > Depends on:  https://github.com/tarantool/test-run/pull/128
>> > 
>> > Rebased on master
>> > 
>> > Sergei Voronezhskii (5):
>> >   test: cleanup replication tests
>> >   test: errinj for pause relay_send
>> >   test: put require in proper places
>> >   test: use wait_cond to check follow status
>> >   test: replication parallel mode on


-- 
Sergei Voronezhskii

[-- Attachment #2: Type: text/html, Size: 3050 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v2 4/5] test: use wait_cond to check follow status
  2018-10-25 16:43     ` [tarantool-patches] " Sergei Voronezhskii
@ 2018-10-29 10:41       ` Alexander Turenko
  2018-10-31 21:38         ` Re[2]: " Sergei Voronezhskii
  0 siblings, 1 reply; 25+ messages in thread
From: Alexander Turenko @ 2018-10-29 10:41 UTC (permalink / raw)
  To: Sergei Voronezhskii
  Cc: tarantool-patches, Vladimir Davydov, Georgy Kirichenko

> >> If `test_run:wait_cond()` found a not 'follow` status it returns true.
> >> Which immediately causes an error.
> >> 
> >> Fixes #3734
> >> Part of #2436, #3232
> >> ---
> >>  test/replication/misc.result   | 17 +++++++++++------
> >>  test/replication/misc.test.lua | 15 +++++++++------
> >>  2 files changed, 20 insertions(+), 12 deletions(-)
> >> 
> >
> >> diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
> >> index 06ad974db..3866eb3ac 100644
> >> --- a/test/replication/misc.test.lua
> >> +++ b/test/replication/misc.test.lua
> >> @@ -53,15 +53,18 @@ fiber=require('fiber')
> >>  box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
> >>  _ = box.schema.space.create('test_timeout'):create_index('pk')
> >>  test_run:cmd("setopt delimiter ';'")
> >> +function wait_follow(replicaA, replicaB)
> >> +    return test_run:wait_cond(function()
> >> +        return replicaA.status ~= 'follow' or replicaB.status ~= 'follow'
> >> +    end, 0.01)
> >> +end ;
> >>  function test_timeout()
> >>      for i = 0, 99 do 
> >> +        local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
> >> +        local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
> >>          box.space.test_timeout:replace({1})
> >> -        fiber.sleep(0.005)
> >> -        local rinfo = box.info.replication
> >> -        if rinfo[1].upstream and rinfo[1].upstream.status ~= 'follow' or
> >> -           rinfo[2].upstream and rinfo[2].upstream.status ~= 'follow' or
> >> -           rinfo[3].upstream and rinfo[3].upstream.status ~= 'follow' then
> >> -            return error('Replication broken')
> >> +        if wait_follow(replicaA, replicaB) then
> >> +            return error(box.info.replication)
> >
> >AFAIU, this test case checks that replicas do not leave from 'follow'
> >state even for a short time period. We should wait for 'follow' state
> >before the loop and perform some amount of attemps to catch an another
> >state. I don't sure, though. Georgy should draw the line.
> >
> >I still think correction of test cases is a developer responsibility. If
> >you want to do it, please, discuss it with the author before. This will
> >save us some time we spend now on those extra review iterations.
>
> We discussed with Georgy how to do it:
> function test_timeout()
>     local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
>     local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
>     local follows = test_run:wait_cond(function()
>         return replicaA.status == 'follow' or replicaB.status == 'follow'
>     end, 0.1)
>     if not follows then error('replicas not in follow status') end
>     for i = 0, 99 do
>         box.space.test_timeout:replace({1})
>         if wait_follow(replicaA, replicaB) then
>             return error(box.info.replication)
>         end
>     end
>     return true
> end ;
> 
> Branch was updated.

Now I understand the approach. It looks good. I think it should be
commented inside the test, because it is counter-intuitive that wait_xxx
function returns true when something went wrong.

WBR, Alexander Turenko.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re[3]: [PATCH v2 0/5] test: replication parallel mode on
  2018-10-26 12:44     ` Sergei Voronezhskii
@ 2018-10-30 17:38       ` Sergei Voronezhskii
  2018-10-31 18:28         ` Alexander Turenko
  0 siblings, 1 reply; 25+ messages in thread
From: Sergei Voronezhskii @ 2018-10-30 17:38 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Vladimir Davydov, Alexander Turenko

[-- Attachment #1: Type: text/plain, Size: 2095 bytes --]




>Friday, October 26, 2018, 15:45 +03:00 from Sergei Voronezhskii <sergw@tarantool.org>:
>
>
>
>
>>Saturday, October 20, 2018, 2:44 +03:00 from Alexander Turenko < alexander.turenko@tarantool.org >:
>>
>>I also observed a fail and a hang on box/:
>>
>>https://travis-ci.org/tarantool/tarantool/jobs/443475294#L2496
>>https://travis-ci.org/tarantool/tarantool/jobs/443475296#L6173
>>
>>Maybe test-run do something incorrect and the suites are not fully
>>independent in the parallel run? Maybe some tests from box/ and
>>replication/ conflicts on some global resource, say the same unix socket
>>name in /tmp?
>>
>>Please, try to reproduce the issue and fix / report it.
>Fixed a problem with box/errinj (after adding the new errinj, the ordering of the 'box.error.injection.info()' output changed).
The problem with engine/ddl does not depend on enabling replication tests in parallel mode.
>
>>
>>WBR, Alexander Turenko.
>>
>>On Sat, Oct 20, 2018 at 02:37:01AM +0300, Alexander Turenko wrote:
>>> It seems you catched segfault on the replication suite with this patch
>>> :)
>>> 
>>>  https://travis-ci.org/tarantool/tarantool/jobs/443475291#L2591
>>> 
>>> Please, comment whether you are able to reproduce it somehow (at least
>>> in CI or maybe some specific machine or docker image) and file an issue.
>>> Please, do not restart this job / build to have at least some proof
>>> link, because now it does not clear whether the case is easy to
>>> reproduce.
>>> 
>>> WBR, Alexander Turenko.
>>> 
>>> On Fri, Oct 19, 2018 at 07:17:16PM +0300, Sergei Voronezhskii wrote:
>>> > BRANCH:  https://github.com/tarantool/tarantool/tree/sergw/enable-parallel-test-replication-clean
>>> > Depends on:  https://github.com/tarantool/test-run/pull/128
>>> > 
>>> > Rebased on master
>>> > 
>>> > Sergei Voronezhskii (5):
>>> >   test: cleanup replication tests
>>> >   test: errinj for pause relay_send
>>> >   test: put require in proper places
>>> >   test: use wait_cond to check follow status
>>> >   test: replication parallel mode on
>
>
>-- 
>Sergei Voronezhskii


-- 
Sergei Voronezhskii

[-- Attachment #2: Type: text/html, Size: 4170 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 0/5] test: replication parallel mode on
  2018-10-30 17:38       ` Re[3]: " Sergei Voronezhskii
@ 2018-10-31 18:28         ` Alexander Turenko
  2018-11-26 13:04           ` Re[2]: " Sergei Voronezhskii
  0 siblings, 1 reply; 25+ messages in thread
From: Alexander Turenko @ 2018-10-31 18:28 UTC (permalink / raw)
  To: Sergei Voronezhskii; +Cc: tarantool-patches, Vladimir Davydov

> >>I also observed a fail and a hang on box/:
> >>
> >>https://travis-ci.org/tarantool/tarantool/jobs/443475294#L2496
> >>https://travis-ci.org/tarantool/tarantool/jobs/443475296#L6173
> >>
> >>Maybe test-run do something incorrect and the suites are not fully
> >>independent in the parallel run? Maybe some tests from box/ and
> >>replication/ conflicts on some global resource, say the same unix socket
> >>name in /tmp?
> >>
> >>Please, try to reproduce the issue and fix / report it.
> >
> >Fixed problem with box/errinj (after adding new errinj, ordering of output 'box.error.injection.info()' changed) 
>
> The problem with engine/ddl does not depend on enabling replication tests in parallel mode.

Please file an issue about the flaky test even if you do not have many
details at the moment.

So it seems I have no more comments. Please, proceed with Vova.

> >
> >>
> >>WBR, Alexander Turenko.
> >>
> >>On Sat, Oct 20, 2018 at 02:37:01AM +0300, Alexander Turenko wrote:
> >>> It seems you catched segfault on the replication suite with this patch
> >>> :)
> >>> 
> >>>  https://travis-ci.org/tarantool/tarantool/jobs/443475291#L2591

To track: it was introduced in the patch and was fixed. See [1] for
details.

[1]: https://github.com/tarantool/tarantool/issues/3765

WBR, Alexander Turenko.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re[2]: [tarantool-patches] Re: [PATCH v2 4/5] test: use wait_cond to check follow status
  2018-10-29 10:41       ` Alexander Turenko
@ 2018-10-31 21:38         ` Sergei Voronezhskii
  0 siblings, 0 replies; 25+ messages in thread
From: Sergei Voronezhskii @ 2018-10-31 21:38 UTC (permalink / raw)
  To: Alexander Turenko; +Cc: tarantool-patches, Vladimir Davydov, Georgy Kirichenko




>Monday, October 29, 2018, 13:41 +03:00 from Alexander Turenko <alexander.turenko@tarantool.org>:
>
>> >> If `test_run:wait_cond()` found a not 'follow` status it returns true.
>> >> Which immediately causes an error.
>> >> 
>> >> Fixes #3734
>> >> Part of #2436, #3232
>> >> ---
>> >>  test/replication/misc.result   | 17 +++++++++++------
>> >>  test/replication/misc.test.lua | 15 +++++++++------
>> >>  2 files changed, 20 insertions(+), 12 deletions(-)
>> >> 
>> >
>> >> diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
>> >> index 06ad974db..3866eb3ac 100644
>> >> --- a/test/replication/misc.test.lua
>> >> +++ b/test/replication/misc.test.lua
>> >> @@ -53,15 +53,18 @@ fiber=require('fiber')
>> >>  box.cfg{replication_timeout = 0.01, replication_connect_timeout=0.01}
>> >>  _ = box.schema.space.create('test_timeout'):create_index('pk')
>> >>  test_run:cmd("setopt delimiter ';'")
>> >> +function wait_follow(replicaA, replicaB)
>> >> +    return test_run:wait_cond(function()
>> >> +        return replicaA.status ~= 'follow' or replicaB.status ~= 'follow'
>> >> +    end, 0.01)
>> >> +end ;
>> >>  function test_timeout()
>> >>      for i = 0, 99 do 
>> >> +        local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
>> >> +        local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
>> >>          box.space.test_timeout:replace({1})
>> >> -        fiber.sleep(0.005)
>> >> -        local rinfo = box.info.replication
>> >> -        if rinfo[1].upstream and rinfo[1].upstream.status ~= 'follow' or
>> >> -           rinfo[2].upstream and rinfo[2].upstream.status ~= 'follow' or
>> >> -           rinfo[3].upstream and rinfo[3].upstream.status ~= 'follow' then
>> >> -            return error('Replication broken')
>> >> +        if wait_follow(replicaA, replicaB) then
>> >> +            return error(box.info.replication)
>> >
>> >AFAIU, this test case checks that replicas do not leave from 'follow'
>> >state even for a short time period. We should wait for 'follow' state
>> >before the loop and perform some amount of attemps to catch an another
>> >state. I don't sure, though. Georgy should draw the line.
>> >
>> >I still think correction of test cases is a developer responsibility. If
>> >you want to do it, please, discuss it with the author before. This will
>> >save us some time we spend now on those extra review iterations.
>>
>> We discussed with Georgy how to do it:
>> function test_timeout()
>>     local replicaA = box.info.replication[1].upstream or box.info.replication[2].upstream
>>     local replicaB = box.info.replication[3].upstream or box.info.replication[2].upstream
>>     local follows = test_run:wait_cond(function()
>>         return replicaA.status == 'follow' or replicaB.status == 'follow'
>>     end, 0.1)
>>     if not follows then error('replicas not in follow status') end
>>     for i = 0, 99 do
>>         box.space.test_timeout:replace({1})
>>         if wait_follow(replicaA, replicaB) then
>>             return error(box.info.replication)
>>         end
>>     end
>>     return true
>> end ;
>> 
>> Branch was updated.
>
>Now I understand the approach. It looks good. I think it should be
>commented inside the test, because it is counter-intuitive that wait_xxx
>function returns true when something went wrong.

Updated wait_follow with a comment:
-- wait_cond() calls the function in a loop while it returns false.
-- It returns true if a replica's status has changed from the 'follow'
-- state, which immediately causes an error.
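
In context, the commented helper looks roughly like this (a sketch; the body
is the same as in the diff above):

test_run:cmd("setopt delimiter ';'")
-- wait_cond() calls the function in a loop while it returns false.
-- It returns true if a replica's status has changed from the 'follow'
-- state, which immediately causes an error in test_timeout().
function wait_follow(replicaA, replicaB)
    return test_run:wait_cond(function()
        return replicaA.status ~= 'follow' or replicaB.status ~= 'follow'
    end, 0.01)
end ;
test_run:cmd("setopt delimiter ''");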

>WBR, Alexander Turenko.


-- 
Sergei Voronezhskii

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re[2]: [PATCH v2 0/5] test: replication parallel mode on
  2018-10-31 18:28         ` Alexander Turenko
@ 2018-11-26 13:04           ` Sergei Voronezhskii
  2018-12-05  4:44             ` Re[3]: " Sergei Voronezhskii
  0 siblings, 1 reply; 25+ messages in thread
From: Sergei Voronezhskii @ 2018-11-26 13:04 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Alexander Turenko, Vladimir Davydov

[-- Attachment #1: Type: text/plain, Size: 1883 bytes --]


I've checked with 1.10 and it's ok:
BUILD:  https://travis-ci.org/tarantool/tarantool/builds/458795680  
BRANCH:  https://github.com/tarantool/tarantool/tree/sergw/enable-parallel-test-replication-clean

I've also rebased on 2.1 and it's ok:
BUILD:  https://travis-ci.org/tarantool/tarantool/builds/459728459
BRANCH:  https://github.com/tarantool/tarantool/tree/sergw/enable-parallel-test-replication-clean-2.1

>Wednesday, October 31, 2018, 21:28 +03:00 from Alexander Turenko < alexander.turenko@tarantool.org >:
>
>> >>I also observed a fail and a hang on box/:
>> >>
>> >> https://travis-ci.org/tarantool/tarantool/jobs/443475294#L2496
>> >> https://travis-ci.org/tarantool/tarantool/jobs/443475296#L6173
>> >>
>> >>Maybe test-run do something incorrect and the suites are not fully
>> >>independent in the parallel run? Maybe some tests from box/ and
>> >>replication/ conflicts on some global resource, say the same unix socket
>> >>name in /tmp?
>> >>
>> >>Please, try to reproduce the issue and fix / report it.
>> >
>> >Fixed problem with box/errinj (after adding new errinj, ordering of output 'box.error.injection.info()' changed) 
>>
>> Problem with engine/ddl is not depends on enabling replication tests in parallel mode.
>
>Please, file an issue re flaky test even if you do not have much details
>at the time.
>
>So it seems I have no more comments. Please, proceed with Vova.
>
>> >
>> >>
>> >>WBR, Alexander Turenko.
>> >>
>> >>On Sat, Oct 20, 2018 at 02:37:01AM +0300, Alexander Turenko wrote:
>> >>> It seems you catched segfault on the replication suite with this patch
>> >>> :)
>> >>> 
>> >>>  https://travis-ci.org/tarantool/tarantool/jobs/443475291#L2591
>
>To track: it was introduced in the patch and was fixed. See [1] for
>details.
>
>[1]:  https://github.com/tarantool/tarantool/issues/3765
>
>WBR, Alexander Turenko.


-- 
Sergei Voronezhskii

[-- Attachment #2: Type: text/html, Size: 3957 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re[3]: [PATCH v2 0/5] test: replication parallel mode on
  2018-11-26 13:04           ` Re[2]: " Sergei Voronezhskii
@ 2018-12-05  4:44             ` Sergei Voronezhskii
  0 siblings, 0 replies; 25+ messages in thread
From: Sergei Voronezhskii @ 2018-12-05  4:44 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Alexander Turenko, Vladimir Davydov

[-- Attachment #1: Type: text/plain, Size: 2073 bytes --]

Up

>Monday, November 26, 2018, 16:04 +03:00 from Sergei Voronezhskii <sergw@tarantool.org>:
>
>I've checked with 1.10 and it's ok:
>BUILD:  https://travis-ci.org/tarantool/tarantool/builds/458795680  
>BRANCH:  https://github.com/tarantool/tarantool/tree/sergw/enable-parallel-test-replication-clean
>
>I've also rebased on 2.1 and it's ok:
>BUILD:  https://travis-ci.org/tarantool/tarantool/builds/459728459
>BRANCH:  https://github.com/tarantool/tarantool/tree/sergw/enable-parallel-test-replication-clean-2.1
>
>>Wednesday, October 31, 2018, 21:28 +03:00 from Alexander Turenko < alexander.turenko@tarantool.org >:
>>
>>> >>I also observed a fail and a hang on box/:
>>> >>
>>> >> https://travis-ci.org/tarantool/tarantool/jobs/443475294#L2496
>>> >> https://travis-ci.org/tarantool/tarantool/jobs/443475296#L6173
>>> >>
>>> >>Maybe test-run do something incorrect and the suites are not fully
>>> >>independent in the parallel run? Maybe some tests from box/ and
>>> >>replication/ conflicts on some global resource, say the same unix socket
>>> >>name in /tmp?
>>> >>
>>> >>Please, try to reproduce the issue and fix / report it.
>>> >
>>> >Fixed problem with box/errinj (after adding new errinj, ordering of output 'box.error.injection.info()' changed) 
>>>
>>> Problem with engine/ddl is not depends on enabling replication tests in parallel mode.
>>
>>Please, file an issue re flaky test even if you do not have much details
>>at the time.
>>
>>So it seems I have no more comments. Please, proceed with Vova.
>>
>>> >
>>> >>
>>> >>WBR, Alexander Turenko.
>>> >>
>>> >>On Sat, Oct 20, 2018 at 02:37:01AM +0300, Alexander Turenko wrote:
>>> >>> It seems you catched segfault on the replication suite with this patch
>>> >>> :)
>>> >>> 
>>> >>>  https://travis-ci.org/tarantool/tarantool/jobs/443475291#L2591
>>
>>To track: it was introduced in the patch and was fixed. See [1] for
>>details.
>>
>>[1]:  https://github.com/tarantool/tarantool/issues/3765
>>
>>WBR, Alexander Turenko.
>
>
>-- 
>Sergei Voronezhskii


-- 
Sergei Voronezhskii

[-- Attachment #2: Type: text/html, Size: 4858 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2018-12-05  4:44 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-19 16:17 [PATCH v2 0/5] test: replication parallel mode on Sergei Voronezhskii
2018-10-19 16:17 ` [PATCH v2 1/5] test: cleanup replication tests Sergei Voronezhskii
2018-10-21 20:41   ` Alexander Turenko
2018-10-22  8:07     ` Re[2]: " Sergei Voronezhskii
2018-10-23  3:21       ` Alexander Turenko
2018-10-19 16:17 ` [PATCH v2 2/5] test: errinj for pause relay_send Sergei Voronezhskii
2018-10-21 20:41   ` Alexander Turenko
2018-10-22  8:42     ` Re[2]: " Sergei Voronezhskii
2018-10-23  3:22       ` Alexander Turenko
2018-10-19 16:17 ` [PATCH v2 3/5] test: put require in proper places Sergei Voronezhskii
2018-10-21 20:41   ` Alexander Turenko
2018-10-19 16:17 ` [PATCH v2 4/5] test: use wait_cond to check follow status Sergei Voronezhskii
2018-10-19 23:24   ` Alexander Turenko
2018-10-25 16:43     ` [tarantool-patches] " Sergei Voronezhskii
2018-10-29 10:41       ` Alexander Turenko
2018-10-31 21:38         ` Re[2]: " Sergei Voronezhskii
2018-10-19 16:17 ` [PATCH v2 5/5] test: replication parallel mode on Sergei Voronezhskii
2018-10-19 23:37 ` [PATCH v2 0/5] " Alexander Turenko
2018-10-19 23:44   ` Alexander Turenko
2018-10-26 12:41     ` Re[2]: " Sergei Voronezhskii
2018-10-26 12:44     ` Sergei Voronezhskii
2018-10-30 17:38       ` Re[3]: " Sergei Voronezhskii
2018-10-31 18:28         ` Alexander Turenko
2018-11-26 13:04           ` Re[2]: " Sergei Voronezhskii
2018-12-05  4:44             ` Re[3]: " Sergei Voronezhskii

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox