Tarantool development patches archive
* [PATCH v4 0/3] replication: do not ignore replication_connect_quorum.
@ 2018-08-14 10:02 Serge Petrenko
  2018-08-14 10:02 ` [PATCH v4 1/3] test: update test-run Serge Petrenko
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Serge Petrenko @ 2018-08-14 10:02 UTC (permalink / raw)
  To: tarantool-patches; +Cc: georgy, vdavydov.dev, Serge Petrenko

https://github.com/tarantool/tarantool/issues/3428
https://github.com/tarantool/tarantool/tree/sergepetrenko/gh-3428-replication-connect-quorum

Previously the replication_connect_quorum setting was ignored during initial
bootstrap and during replication reconfiguration: the instance tried to
connect to every other instance listed in box.cfg.replication and threw
an error if it failed to do so. Now the instance tries to connect to as many
of the listed instances as possible within replication_connect_timeout, and
doesn't throw an error as long as it was able to connect to
replication_connect_quorum instances.
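
The change in the success condition can be illustrated with a minimal sketch.
It is written in Python for brevity and is not the code from src/box; the
function names are hypothetical, and the sketch only models the rule stated
above, leaving out the waiting within replication_connect_timeout.

```python
def connect_ok_before(connected: int, configured: int) -> bool:
    """Old rule: every instance listed in box.cfg.replication must be reached."""
    return connected == configured


def connect_ok_after(connected: int, quorum: int) -> bool:
    """New rule: reaching replication_connect_quorum instances is enough."""
    return connected >= quorum


# Three instances configured, one of them is down, quorum is set to 2.
print(connect_ok_before(connected=2, configured=3))  # False
print(connect_ok_after(connected=2, quorum=2))       # True
```

With three configured instances and one unreachable, the old rule reports a
failure while the new rule succeeds once the quorum of 2 is met.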

The first patch updates test-run to allow passing arguments to instances
started with create_cluster.

The second patch uses the new test-run functionality to update the replication
tests with easy-to-control start options for instances.
It also alters the on_replace.lua instance file to eliminate a possible timing
error, where code executed before box.once could cause the error
"user 'cluster' is not found".
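
The instance files in the second patch read the replication timeout from the
first start argument and the connect timeout from the optional second one,
falling back to 30.0 when it is absent. A rough Python model of that parsing
follows; parse_start_args is a hypothetical helper, and treating the args
value as a single whitespace-separated string is an assumption about how
test-run hands it over.

```python
DEFAULT_CONNECT_TIMEOUT = 30.0  # fallback used by the Lua instance files


def parse_start_args(args: str):
    """Split an args string such as "0.1 0.5" into the two timeouts.

    Returns (replication_timeout, replication_connect_timeout).
    The first value is None when no argument is given at all.
    """
    parts = args.split()  # tolerates repeated spaces between values
    timeout = float(parts[0]) if len(parts) > 0 else None
    if len(parts) > 1:
        connect_timeout = float(parts[1])
    else:
        connect_timeout = DEFAULT_CONNECT_TIMEOUT
    return timeout, connect_timeout


print(parse_start_args("0.1"))      # (0.1, 30.0)
print(parse_start_args("0.1 0.5"))  # (0.1, 0.5)
```

This mirrors why create_cluster is called with args="0.1" alone, while
restarts pass "0.1 0.5" to shorten the connect timeout.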

The third patch introduces the new replication_connect_quorum behaviour and adds
a test case to check that replication_connect_quorum isn't ignored anymore.

Changes in v4: 
  - split the patch into 3 separate patches.
  - update test-run to allow passing start
    arguments to create_cluster().
  - update comments to match new behaviour.
  - in test instances when no timeout option
    is passed, set a default option.
  - be consistent when choosing timeouts
    for tests.

Changes in v3:
 - added a documentation request to TarantoolBot to the
   commit message.
 - removed the timeout parameter from box_sync_replication()
   and replicaset_connect().
 - rewrote replication tests to start instances with
   different replication_connect_timeout and
   replication_timeout parameters. This made the tests run
   faster and more stable.
 - added a new test case to replication/quorum.test.lua
   to check that replication_connect_quorum isn't ignored
   anymore during bootstrap and reconfiguration.

Changes in v2: 
 - change test/replication/ddl.lua instance file to fix 
   test failure on Travis.

Serge Petrenko (3):
  test: update test-run
  Add arguments to replication test instances.
  replication: do not ignore replication_connect_quorum.

 src/box/box.cc                                 | 35 +++++++++-----
 src/box/replication.cc                         | 11 +++--
 src/box/replication.h                          |  7 +--
 test-run                                       |  2 +-
 test/replication-py/init_storage.test.py       |  2 +-
 test/replication-py/master.lua                 |  1 +
 test/replication-py/replica.lua                |  1 +
 test/replication/autobootstrap.lua             |  6 ++-
 test/replication/autobootstrap.result          |  4 +-
 test/replication/autobootstrap.test.lua        |  4 +-
 test/replication/autobootstrap_guest.lua       |  7 ++-
 test/replication/autobootstrap_guest.result    |  2 +-
 test/replication/autobootstrap_guest.test.lua  |  2 +-
 test/replication/before_replace.result         |  8 ++--
 test/replication/before_replace.test.lua       |  8 ++--
 test/replication/catch.result                  |  2 +-
 test/replication/catch.test.lua                |  2 +-
 test/replication/ddl.lua                       |  7 ++-
 test/replication/ddl.result                    |  2 +-
 test/replication/ddl.test.lua                  |  2 +-
 test/replication/errinj.result                 |  6 +--
 test/replication/errinj.test.lua               |  6 +--
 test/replication/master.lua                    |  1 +
 test/replication/master_quorum.lua             |  7 ++-
 test/replication/misc.result                   |  2 +-
 test/replication/misc.test.lua                 |  2 +-
 test/replication/on_replace.lua                | 14 ++++--
 test/replication/on_replace.result             |  2 +-
 test/replication/on_replace.test.lua           |  2 +-
 test/replication/quorum.lua                    |  8 +++-
 test/replication/quorum.result                 | 65 ++++++++++++++++++++++----
 test/replication/quorum.test.lua               | 36 ++++++++++----
 test/replication/rebootstrap.lua               |  8 +++-
 test/replication/rebootstrap.result            |  6 +--
 test/replication/rebootstrap.test.lua          |  6 +--
 test/replication/recover_missing_xlog.result   |  4 +-
 test/replication/recover_missing_xlog.test.lua |  4 +-
 test/replication/replica_no_quorum.lua         |  3 +-
 test/replication/replica_quorum.lua            | 24 ++++++++++
 test/replication/replica_timeout.lua           |  3 +-
 test/replication/replica_uuid_ro.lua           |  7 ++-
 test/replication/replicaset_ro_mostly.result   |  8 ++--
 test/replication/replicaset_ro_mostly.test.lua |  8 ++--
 43 files changed, 251 insertions(+), 96 deletions(-)
 create mode 100644 test/replication/replica_quorum.lua

-- 
2.15.2 (Apple Git-101.1)


* [PATCH v4 1/3] test: update test-run
  2018-08-14 10:02 [PATCH v4 0/3] replication: do not ignore replication_connect_quorum Serge Petrenko
@ 2018-08-14 10:02 ` Serge Petrenko
  2018-08-14 10:02 ` [PATCH v4 2/3] Add arguments to replication test instances Serge Petrenko
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Serge Petrenko @ 2018-08-14 10:02 UTC (permalink / raw)
  To: tarantool-patches; +Cc: georgy, vdavydov.dev, Serge Petrenko

* allow passing arguments to servers started with create_cluster()
---
 test-run | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/test-run b/test-run
index ed45e1dbd..0aa25ae8a 160000
--- a/test-run
+++ b/test-run
@@ -1 +1 @@
-Subproject commit ed45e1dbd36ab9109b84ef7189ef9d7e4b813fb9
+Subproject commit 0aa25ae8a9d4af977b3c3478cba3ccdc4ef81d35
-- 
2.15.2 (Apple Git-101.1)


* [PATCH v4 2/3] Add arguments to replication test instances.
  2018-08-14 10:02 [PATCH v4 0/3] replication: do not ignore replication_connect_quorum Serge Petrenko
  2018-08-14 10:02 ` [PATCH v4 1/3] test: update test-run Serge Petrenko
@ 2018-08-14 10:02 ` Serge Petrenko
  2018-08-14 10:02 ` [PATCH v4 3/3] replication: do not ignore replication_connect_quorum Serge Petrenko
  2018-08-14 17:07 ` [PATCH v4 0/3] " Vladimir Davydov
  3 siblings, 0 replies; 5+ messages in thread
From: Serge Petrenko @ 2018-08-14 10:02 UTC (permalink / raw)
  To: tarantool-patches; +Cc: georgy, vdavydov.dev, Serge Petrenko

Add start arguments to replication test instances to control
replication_timeout and replication_connect_timeout settings between
restarts.

Prerequisite #3428
---
 test/replication-py/init_storage.test.py       |  2 +-
 test/replication-py/master.lua                 |  1 +
 test/replication-py/replica.lua                |  1 +
 test/replication/autobootstrap.lua             |  6 +++++-
 test/replication/autobootstrap.result          |  4 ++--
 test/replication/autobootstrap.test.lua        |  4 ++--
 test/replication/autobootstrap_guest.lua       |  7 ++++++-
 test/replication/autobootstrap_guest.result    |  2 +-
 test/replication/autobootstrap_guest.test.lua  |  2 +-
 test/replication/before_replace.result         |  8 ++++----
 test/replication/before_replace.test.lua       |  8 ++++----
 test/replication/catch.result                  |  2 +-
 test/replication/catch.test.lua                |  2 +-
 test/replication/ddl.lua                       |  7 ++++++-
 test/replication/ddl.result                    |  2 +-
 test/replication/ddl.test.lua                  |  2 +-
 test/replication/errinj.result                 |  6 +++---
 test/replication/errinj.test.lua               |  6 +++---
 test/replication/master.lua                    |  1 +
 test/replication/master_quorum.lua             |  7 ++++++-
 test/replication/misc.result                   |  2 +-
 test/replication/misc.test.lua                 |  2 +-
 test/replication/on_replace.lua                | 14 +++++++++-----
 test/replication/on_replace.result             |  2 +-
 test/replication/on_replace.test.lua           |  2 +-
 test/replication/quorum.lua                    |  8 ++++++--
 test/replication/quorum.result                 | 16 ++++++++--------
 test/replication/quorum.test.lua               | 16 ++++++++--------
 test/replication/rebootstrap.lua               |  8 ++++++--
 test/replication/rebootstrap.result            |  6 +++---
 test/replication/rebootstrap.test.lua          |  6 +++---
 test/replication/recover_missing_xlog.result   |  4 ++--
 test/replication/recover_missing_xlog.test.lua |  4 ++--
 test/replication/replica_no_quorum.lua         |  3 ++-
 test/replication/replica_timeout.lua           |  3 ++-
 test/replication/replica_uuid_ro.lua           |  7 ++++++-
 test/replication/replicaset_ro_mostly.result   |  8 +++++---
 test/replication/replicaset_ro_mostly.test.lua |  8 +++++---
 38 files changed, 122 insertions(+), 77 deletions(-)

diff --git a/test/replication-py/init_storage.test.py b/test/replication-py/init_storage.test.py
index 0911a02c0..32b4639f1 100644
--- a/test/replication-py/init_storage.test.py
+++ b/test/replication-py/init_storage.test.py
@@ -57,7 +57,7 @@ print '-------------------------------------------------------------'
 
 server.stop()
 replica = TarantoolServer(server.ini)
-replica.script = 'replication/replica.lua'
+replica.script = 'replication-py/replica.lua'
 replica.vardir = server.vardir #os.path.join(server.vardir, 'replica')
 replica.rpl_master = master
 replica.deploy(wait=False)
diff --git a/test/replication-py/master.lua b/test/replication-py/master.lua
index 0f9f7a6f0..e924b5495 100644
--- a/test/replication-py/master.lua
+++ b/test/replication-py/master.lua
@@ -3,6 +3,7 @@ os = require('os')
 box.cfg({
     listen              = os.getenv("LISTEN"),
     memtx_memory        = 107374182,
+    replication_timeout = 0.1
 })
 
 require('console').listen(os.getenv('ADMIN'))
diff --git a/test/replication-py/replica.lua b/test/replication-py/replica.lua
index 278291bba..32d888eff 100644
--- a/test/replication-py/replica.lua
+++ b/test/replication-py/replica.lua
@@ -7,6 +7,7 @@ box.cfg({
     listen              = os.getenv("LISTEN"),
     replication         = os.getenv("MASTER"),
     memtx_memory        = 107374182,
+    replication_timeout = 0.1
 })
 
 box_cfg_done = true
diff --git a/test/replication/autobootstrap.lua b/test/replication/autobootstrap.lua
index 4f55417ae..856b36e66 100644
--- a/test/replication/autobootstrap.lua
+++ b/test/replication/autobootstrap.lua
@@ -5,6 +5,9 @@ local INSTANCE_ID = string.match(arg[0], "%d")
 local USER = 'cluster'
 local PASSWORD = 'somepassword'
 local SOCKET_DIR = require('fio').cwd()
+local TIMEOUT = tonumber(arg[1])
+local CON_TIMEOUT = arg[2] and tonumber(arg[2]) or 30.0
+
 local function instance_uri(instance_id)
     --return 'localhost:'..(3310 + instance_id)
     return SOCKET_DIR..'/autobootstrap'..instance_id..'.sock';
@@ -21,7 +24,8 @@ box.cfg({
         USER..':'..PASSWORD..'@'..instance_uri(2);
         USER..':'..PASSWORD..'@'..instance_uri(3);
     };
-    replication_connect_timeout = 0.5,
+    replication_timeout = TIMEOUT;
+    replication_connect_timeout = CON_TIMEOUT;
 })
 
 box.once("bootstrap", function()
diff --git a/test/replication/autobootstrap.result b/test/replication/autobootstrap.result
index 04aeb4315..91badc1f1 100644
--- a/test/replication/autobootstrap.result
+++ b/test/replication/autobootstrap.result
@@ -13,7 +13,7 @@ SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
 --
 -- Start servers
 --
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 ---
 ...
 --
@@ -161,7 +161,7 @@ box.space.test_u:select()
 _ = test_run:cmd("switch autobootstrap1")
 ---
 ...
-_ = test_run:cmd("restart server autobootstrap1 with cleanup=1")
+_ = test_run:cmd("restart server autobootstrap1 with cleanup=1, args ='0.1 0.5'")
 _ = box.space.test_u:replace({5, 6, 7, 8})
 ---
 ...
diff --git a/test/replication/autobootstrap.test.lua b/test/replication/autobootstrap.test.lua
index f1e2a9991..752d5f317 100644
--- a/test/replication/autobootstrap.test.lua
+++ b/test/replication/autobootstrap.test.lua
@@ -8,7 +8,7 @@ SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
 --
 -- Start servers
 --
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 
 --
 -- Wait for full mesh
@@ -76,7 +76,7 @@ box.space.test_u:select()
 -- Rebootstrap one node and check that others follow.
 --
 _ = test_run:cmd("switch autobootstrap1")
-_ = test_run:cmd("restart server autobootstrap1 with cleanup=1")
+_ = test_run:cmd("restart server autobootstrap1 with cleanup=1, args ='0.1 0.5'")
 
 _ = box.space.test_u:replace({5, 6, 7, 8})
 box.space.test_u:select()
diff --git a/test/replication/autobootstrap_guest.lua b/test/replication/autobootstrap_guest.lua
index 40fef2c7a..d7176ae5b 100644
--- a/test/replication/autobootstrap_guest.lua
+++ b/test/replication/autobootstrap_guest.lua
@@ -4,6 +4,10 @@
 local INSTANCE_ID = string.match(arg[0], "%d")
 
 local SOCKET_DIR = require('fio').cwd()
+
+local TIMEOUT = tonumber(arg[1])
+local CON_TIMEOUT = arg[2] and tonumber(arg[2]) or 30.0
+
 local function instance_uri(instance_id)
     --return 'localhost:'..(3310 + instance_id)
     return SOCKET_DIR..'/autobootstrap_guest'..instance_id..'.sock';
@@ -20,7 +24,8 @@ box.cfg({
         instance_uri(2);
         instance_uri(3);
     };
-    replication_connect_timeout = 0.5,
+    replication_timeout = TIMEOUT;
+    replication_connect_timeout = CON_TIMEOUT;
 })
 
 box.once("bootstrap", function()
diff --git a/test/replication/autobootstrap_guest.result b/test/replication/autobootstrap_guest.result
index 49f9bee01..1efef310c 100644
--- a/test/replication/autobootstrap_guest.result
+++ b/test/replication/autobootstrap_guest.result
@@ -13,7 +13,7 @@ SERVERS = { 'autobootstrap_guest1', 'autobootstrap_guest2', 'autobootstrap_guest
 --
 -- Start servers
 --
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 ---
 ...
 --
diff --git a/test/replication/autobootstrap_guest.test.lua b/test/replication/autobootstrap_guest.test.lua
index 70250cff9..3aad8a4da 100644
--- a/test/replication/autobootstrap_guest.test.lua
+++ b/test/replication/autobootstrap_guest.test.lua
@@ -7,7 +7,7 @@ SERVERS = { 'autobootstrap_guest1', 'autobootstrap_guest2', 'autobootstrap_guest
 --
 -- Start servers
 --
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 
 --
 -- Wait for full mesh
diff --git a/test/replication/before_replace.result b/test/replication/before_replace.result
index d561b4813..00f9bcb8b 100644
--- a/test/replication/before_replace.result
+++ b/test/replication/before_replace.result
@@ -11,7 +11,7 @@ SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
 ---
 ...
 -- Deploy a cluster.
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 ---
 ...
 test_run:wait_fullmesh(SERVERS)
@@ -125,7 +125,7 @@ box.space.test:select()
   - [9, 90]
   - [10, 100]
 ...
-test_run:cmd('restart server autobootstrap1')
+test_run:cmd('restart server autobootstrap1 with args="0.1 0.5"')
 box.space.test:select()
 ---
 - - [1, 10]
@@ -156,7 +156,7 @@ box.space.test:select()
   - [9, 90]
   - [10, 100]
 ...
-test_run:cmd('restart server autobootstrap2')
+test_run:cmd('restart server autobootstrap2 with args="0.1 0.5"')
 box.space.test:select()
 ---
 - - [1, 10]
@@ -187,7 +187,7 @@ box.space.test:select()
   - [9, 90]
   - [10, 100]
 ...
-test_run:cmd('restart server autobootstrap3')
+test_run:cmd('restart server autobootstrap3 with args="0.1 0.5"')
 box.space.test:select()
 ---
 - - [1, 10]
diff --git a/test/replication/before_replace.test.lua b/test/replication/before_replace.test.lua
index 2c6912d06..b49d438b4 100644
--- a/test/replication/before_replace.test.lua
+++ b/test/replication/before_replace.test.lua
@@ -7,7 +7,7 @@ test_run = env.new()
 SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
 
 -- Deploy a cluster.
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 test_run:wait_fullmesh(SERVERS)
 
 -- Setup space:before_replace trigger on all replicas.
@@ -54,15 +54,15 @@ vclock2 = test_run:wait_cluster_vclock(SERVERS, vclock)
 -- and the state persists after restart.
 test_run:cmd("switch autobootstrap1")
 box.space.test:select()
-test_run:cmd('restart server autobootstrap1')
+test_run:cmd('restart server autobootstrap1 with args="0.1 0.5"')
 box.space.test:select()
 test_run:cmd("switch autobootstrap2")
 box.space.test:select()
-test_run:cmd('restart server autobootstrap2')
+test_run:cmd('restart server autobootstrap2 with args="0.1 0.5"')
 box.space.test:select()
 test_run:cmd("switch autobootstrap3")
 box.space.test:select()
-test_run:cmd('restart server autobootstrap3')
+test_run:cmd('restart server autobootstrap3 with args="0.1 0.5"')
 box.space.test:select()
 
 
diff --git a/test/replication/catch.result b/test/replication/catch.result
index 91be32725..0f72e89e2 100644
--- a/test/replication/catch.result
+++ b/test/replication/catch.result
@@ -23,7 +23,7 @@ test_run:cmd("create server replica with rpl_master=default, script='replication
 ---
 - true
 ...
-test_run:cmd("start server replica with args='1'")
+test_run:cmd("start server replica with args='0.1'")
 ---
 - true
 ...
diff --git a/test/replication/catch.test.lua b/test/replication/catch.test.lua
index 2e2e97bc4..457f910e9 100644
--- a/test/replication/catch.test.lua
+++ b/test/replication/catch.test.lua
@@ -9,7 +9,7 @@ errinj = box.error.injection
 
 box.schema.user.grant('guest', 'replication')
 test_run:cmd("create server replica with rpl_master=default, script='replication/replica_timeout.lua'")
-test_run:cmd("start server replica with args='1'")
+test_run:cmd("start server replica with args='0.1'")
 test_run:cmd("switch replica")
 
 test_run:cmd("switch default")
diff --git a/test/replication/ddl.lua b/test/replication/ddl.lua
index 694f40eac..72cf1db69 100644
--- a/test/replication/ddl.lua
+++ b/test/replication/ddl.lua
@@ -5,6 +5,10 @@ local INSTANCE_ID = string.match(arg[0], "%d")
 local USER = 'cluster'
 local PASSWORD = 'somepassword'
 local SOCKET_DIR = require('fio').cwd()
+
+local TIMEOUT = tonumber(arg[1])
+local CON_TIMEOUT  = arg[2] and tonumber(arg[2]) or 30.0
+
 local function instance_uri(instance_id)
     --return 'localhost:'..(3310 + instance_id)
     return SOCKET_DIR..'/autobootstrap'..instance_id..'.sock';
@@ -22,7 +26,8 @@ box.cfg({
         USER..':'..PASSWORD..'@'..instance_uri(3);
         USER..':'..PASSWORD..'@'..instance_uri(4);
     };
-    replication_connect_timeout = 0.5,
+    replication_timeout = TIMEOUT,
+    replication_connect_timeout = CON_TIMEOUT,
 })
 
 box.once("bootstrap", function()
diff --git a/test/replication/ddl.result b/test/replication/ddl.result
index cc61fd4ce..8cd54cdfb 100644
--- a/test/replication/ddl.result
+++ b/test/replication/ddl.result
@@ -5,7 +5,7 @@ SERVERS = { 'ddl1', 'ddl2', 'ddl3', 'ddl4' }
 ---
 ...
 -- Deploy a cluster.
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 ---
 ...
 test_run:wait_fullmesh(SERVERS)
diff --git a/test/replication/ddl.test.lua b/test/replication/ddl.test.lua
index 4cf4ffa38..f56071adc 100644
--- a/test/replication/ddl.test.lua
+++ b/test/replication/ddl.test.lua
@@ -3,7 +3,7 @@ test_run = require('test_run').new()
 SERVERS = { 'ddl1', 'ddl2', 'ddl3', 'ddl4' }
 
 -- Deploy a cluster.
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 test_run:wait_fullmesh(SERVERS)
 test_run:cmd("switch ddl1")
 test_run = require('test_run').new()
diff --git a/test/replication/errinj.result b/test/replication/errinj.result
index ca8af2988..3fc432010 100644
--- a/test/replication/errinj.result
+++ b/test/replication/errinj.result
@@ -418,7 +418,7 @@ test_run:cmd("create server replica_timeout with rpl_master=default, script='rep
 ---
 - true
 ...
-test_run:cmd("start server replica_timeout with args='0.01'")
+test_run:cmd("start server replica_timeout with args='0.01 0.5'")
 ---
 - true
 ...
@@ -474,7 +474,7 @@ errinj.set("ERRINJ_RELAY_REPORT_INTERVAL", 0)
 ...
 -- Check replica's ACKs don't prevent the master from sending
 -- heartbeat messages (gh-3160).
-test_run:cmd("start server replica_timeout with args='0.009'")
+test_run:cmd("start server replica_timeout with args='0.009 0.5'")
 ---
 - true
 ...
@@ -522,7 +522,7 @@ for i = 0, 9999 do box.space.test:replace({i, 4, 5, 'test'}) end
 -- during the join stage, i.e. a replica with a minuscule
 -- timeout successfully bootstraps and breaks connection only
 -- after subscribe.
-test_run:cmd("start server replica_timeout with args='0.00001'")
+test_run:cmd("start server replica_timeout with args='0.00001 0.5'")
 ---
 - true
 ...
diff --git a/test/replication/errinj.test.lua b/test/replication/errinj.test.lua
index 463d89a8f..37375f45e 100644
--- a/test/replication/errinj.test.lua
+++ b/test/replication/errinj.test.lua
@@ -173,7 +173,7 @@ errinj.set("ERRINJ_RELAY_EXIT_DELAY", 0)
 box.cfg{replication_timeout = 0.01}
 
 test_run:cmd("create server replica_timeout with rpl_master=default, script='replication/replica_timeout.lua'")
-test_run:cmd("start server replica_timeout with args='0.01'")
+test_run:cmd("start server replica_timeout with args='0.01 0.5'")
 test_run:cmd("switch replica_timeout")
 
 fiber = require('fiber')
@@ -199,7 +199,7 @@ errinj.set("ERRINJ_RELAY_REPORT_INTERVAL", 0)
 -- Check replica's ACKs don't prevent the master from sending
 -- heartbeat messages (gh-3160).
 
-test_run:cmd("start server replica_timeout with args='0.009'")
+test_run:cmd("start server replica_timeout with args='0.009 0.5'")
 test_run:cmd("switch replica_timeout")
 
 fiber = require('fiber')
@@ -219,7 +219,7 @@ for i = 0, 9999 do box.space.test:replace({i, 4, 5, 'test'}) end
 -- during the join stage, i.e. a replica with a minuscule
 -- timeout successfully bootstraps and breaks connection only
 -- after subscribe.
-test_run:cmd("start server replica_timeout with args='0.00001'")
+test_run:cmd("start server replica_timeout with args='0.00001 0.5'")
 test_run:cmd("switch replica_timeout")
 fiber = require('fiber')
 while box.info.replication[1].upstream.message ~= 'timed out' do fiber.sleep(0.0001) end
diff --git a/test/replication/master.lua b/test/replication/master.lua
index 6d431aaeb..9b96b7891 100644
--- a/test/replication/master.lua
+++ b/test/replication/master.lua
@@ -4,6 +4,7 @@ box.cfg({
     listen              = os.getenv("LISTEN"),
     memtx_memory        = 107374182,
     replication_connect_timeout = 0.5,
+    replication_timeout = 0.1
 })
 
 require('console').listen(os.getenv('ADMIN'))
diff --git a/test/replication/master_quorum.lua b/test/replication/master_quorum.lua
index fb5f7ec2b..05272ac5e 100644
--- a/test/replication/master_quorum.lua
+++ b/test/replication/master_quorum.lua
@@ -4,6 +4,10 @@
 local INSTANCE_ID = string.match(arg[0], "%d")
 
 local SOCKET_DIR = require('fio').cwd()
+
+local TIMEOUT = tonumber(arg[1])
+local CON_TIMEOUT = arg[2] and tonumber(arg[2]) or 30.0
+
 local function instance_uri(instance_id)
     --return 'localhost:'..(3310 + instance_id)
     return SOCKET_DIR..'/master_quorum'..instance_id..'.sock';
@@ -20,7 +24,8 @@ box.cfg({
         instance_uri(2);
     };
     replication_connect_quorum = 0;
-    replication_connect_timeout = 0.1;
+    replication_timeout = TIMEOUT;
+    replication_connect_timeout = CON_TIMEOUT;
 })
 
 test_run = require('test_run').new()
diff --git a/test/replication/misc.result b/test/replication/misc.result
index 9d9d010c0..9df2a2c4b 100644
--- a/test/replication/misc.result
+++ b/test/replication/misc.result
@@ -93,7 +93,7 @@ SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
 ---
 ...
 -- Deploy a cluster.
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 ---
 ...
 test_run:wait_fullmesh(SERVERS)
diff --git a/test/replication/misc.test.lua b/test/replication/misc.test.lua
index da5a90239..979c5d58c 100644
--- a/test/replication/misc.test.lua
+++ b/test/replication/misc.test.lua
@@ -37,7 +37,7 @@ box.cfg{read_only = false}
 SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
 
 -- Deploy a cluster.
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 test_run:wait_fullmesh(SERVERS)
 test_run:cmd("switch autobootstrap1")
 test_run = require('test_run').new()
diff --git a/test/replication/on_replace.lua b/test/replication/on_replace.lua
index 03f15d94c..40c12a9ea 100644
--- a/test/replication/on_replace.lua
+++ b/test/replication/on_replace.lua
@@ -5,6 +5,10 @@ local INSTANCE_ID = string.match(arg[0], "%d")
 local USER = 'cluster'
 local PASSWORD = 'somepassword'
 local SOCKET_DIR = require('fio').cwd()
+
+local TIMEOUT = tonumber(arg[1])
+local CON_TIMEOUT = arg[2] and tonumber(arg[2]) or 30.0
+
 local function instance_uri(instance_id)
     --return 'localhost:'..(3310 + instance_id)
     return SOCKET_DIR..'/on_replace'..instance_id..'.sock';
@@ -12,6 +16,9 @@ end
 
 -- start console first
 require('console').listen(os.getenv('ADMIN'))
+env = require('test_run')
+test_run = env.new()
+engine = test_run:get_cfg('engine')
 
 box.cfg({
     listen = instance_uri(INSTANCE_ID);
@@ -20,13 +27,10 @@ box.cfg({
         USER..':'..PASSWORD..'@'..instance_uri(1);
         USER..':'..PASSWORD..'@'..instance_uri(2);
     };
-    replication_connect_timeout = 0.5,
+    replication_timeout = TIMEOUT,
+    replication_connect_timeout = CON_TIMEOUT,
 })
 
-env = require('test_run')
-test_run = env.new()
-engine = test_run:get_cfg('engine')
-
 box.once("bootstrap", function()
     box.schema.user.create(USER, { password = PASSWORD })
     box.schema.user.grant(USER, 'replication')
diff --git a/test/replication/on_replace.result b/test/replication/on_replace.result
index 1736c53b7..4ffa3b25a 100644
--- a/test/replication/on_replace.result
+++ b/test/replication/on_replace.result
@@ -98,7 +98,7 @@ box.schema.user.revoke('guest', 'replication')
 SERVERS = { 'on_replace1', 'on_replace2' }
 ---
 ...
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.2"})
 ---
 ...
 test_run:wait_fullmesh(SERVERS)
diff --git a/test/replication/on_replace.test.lua b/test/replication/on_replace.test.lua
index 10bba2ddf..371b71cbd 100644
--- a/test/replication/on_replace.test.lua
+++ b/test/replication/on_replace.test.lua
@@ -44,7 +44,7 @@ box.schema.user.revoke('guest', 'replication')
 -- gh-2682 on_replace on slave server with data change
 
 SERVERS = { 'on_replace1', 'on_replace2' }
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.2"})
 test_run:wait_fullmesh(SERVERS)
 
 test_run:cmd('switch on_replace1')
diff --git a/test/replication/quorum.lua b/test/replication/quorum.lua
index 9c7bf5c93..f61c8748f 100644
--- a/test/replication/quorum.lua
+++ b/test/replication/quorum.lua
@@ -4,6 +4,10 @@
 local INSTANCE_ID = string.match(arg[0], "%d")
 
 local SOCKET_DIR = require('fio').cwd()
+
+local TIMEOUT = tonumber(arg[1])
+local CON_TIMEOUT = arg[2] and tonumber(arg[2]) or 30.0
+
 local function instance_uri(instance_id)
     --return 'localhost:'..(3310 + instance_id)
     return SOCKET_DIR..'/quorum'..instance_id..'.sock';
@@ -14,9 +18,9 @@ require('console').listen(os.getenv('ADMIN'))
 
 box.cfg({
     listen = instance_uri(INSTANCE_ID);
-    replication_timeout = 0.05;
+    replication_timeout = TIMEOUT;
     replication_sync_lag = 0.01;
-    replication_connect_timeout = 0.1;
+    replication_connect_timeout = CON_TIMEOUT;
     replication_connect_quorum = 3;
     replication = {
         instance_uri(1);
diff --git a/test/replication/quorum.result b/test/replication/quorum.result
index 8f6e7a070..a55b0d087 100644
--- a/test/replication/quorum.result
+++ b/test/replication/quorum.result
@@ -5,7 +5,7 @@ SERVERS = {'quorum1', 'quorum2', 'quorum3'}
 ---
 ...
 -- Deploy a cluster.
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 ---
 ...
 test_run:wait_fullmesh(SERVERS)
@@ -27,7 +27,7 @@ test_run:cmd('switch quorum2')
 ---
 - true
 ...
-test_run:cmd('restart server quorum2')
+test_run:cmd('restart server quorum2 with args="0.1 0.5"')
 box.info.status -- orphan
 ---
 - orphan
@@ -51,7 +51,7 @@ box.info.status -- running
 ---
 - running
 ...
-test_run:cmd('restart server quorum2')
+test_run:cmd('restart server quorum2 with args="0.1 0.5"')
 box.info.status -- orphan
 ---
 - orphan
@@ -82,7 +82,7 @@ box.info.status -- running
 ---
 - running
 ...
-test_run:cmd('restart server quorum2')
+test_run:cmd('restart server quorum2 with args="0.1 0.5"')
 box.info.status -- orphan
 ---
 - orphan
@@ -99,7 +99,7 @@ box.space.test:replace{100} -- error
 ---
 - error: Can't modify data because this instance is in read-only mode.
 ...
-test_run:cmd('start server quorum1')
+test_run:cmd('start server quorum1 with args="0.1 0.5"')
 ---
 - true
 ...
@@ -158,7 +158,7 @@ fiber = require('fiber')
 fiber.sleep(0.1)
 ---
 ...
-test_run:cmd('start server quorum1')
+test_run:cmd('start server quorum1 with args="0.1  0.5"')
 ---
 - true
 ...
@@ -196,7 +196,7 @@ test_run:cmd('switch quorum1')
 ---
 - true
 ...
-test_run:cmd('restart server quorum1 with cleanup=1')
+test_run:cmd('restart server quorum1 with cleanup=1, args="0.1 0.5"')
 box.space.test:count() -- 100
 ---
 - 100
@@ -356,7 +356,7 @@ SERVERS = {'master_quorum1', 'master_quorum2'}
 ---
 ...
 -- Deploy a cluster.
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 ---
 ...
 test_run:wait_fullmesh(SERVERS)
diff --git a/test/replication/quorum.test.lua b/test/replication/quorum.test.lua
index 1df0ae1e7..8290f8ea5 100644
--- a/test/replication/quorum.test.lua
+++ b/test/replication/quorum.test.lua
@@ -3,7 +3,7 @@ test_run = require('test_run').new()
 SERVERS = {'quorum1', 'quorum2', 'quorum3'}
 
 -- Deploy a cluster.
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 test_run:wait_fullmesh(SERVERS)
 
 -- Stop one replica and try to restart another one.
@@ -18,7 +18,7 @@ test_run:cmd('stop server quorum1')
 
 test_run:cmd('switch quorum2')
 
-test_run:cmd('restart server quorum2')
+test_run:cmd('restart server quorum2 with args="0.1 0.5"')
 box.info.status -- orphan
 box.ctl.wait_rw(0.001) -- timeout
 box.info.ro -- true
@@ -26,7 +26,7 @@ box.space.test:replace{100} -- error
 box.cfg{replication={}}
 box.info.status -- running
 
-test_run:cmd('restart server quorum2')
+test_run:cmd('restart server quorum2 with args="0.1 0.5"')
 box.info.status -- orphan
 box.ctl.wait_rw(0.001) -- timeout
 box.info.ro -- true
@@ -36,12 +36,12 @@ box.ctl.wait_rw()
 box.info.ro -- false
 box.info.status -- running
 
-test_run:cmd('restart server quorum2')
+test_run:cmd('restart server quorum2 with args="0.1 0.5"')
 box.info.status -- orphan
 box.ctl.wait_rw(0.001) -- timeout
 box.info.ro -- true
 box.space.test:replace{100} -- error
-test_run:cmd('start server quorum1')
+test_run:cmd('start server quorum1 with args="0.1 0.5"')
 box.ctl.wait_rw()
 box.info.ro -- false
 box.info.status -- running
@@ -63,7 +63,7 @@ for i = 1, 100 do box.space.test:insert{i} end
 fiber = require('fiber')
 fiber.sleep(0.1)
 
-test_run:cmd('start server quorum1')
+test_run:cmd('start server quorum1 with args="0.1  0.5"')
 test_run:cmd('switch quorum1')
 box.space.test:count() -- 100
 
@@ -79,7 +79,7 @@ test_run:cmd('switch quorum2')
 box.snapshot()
 
 test_run:cmd('switch quorum1')
-test_run:cmd('restart server quorum1 with cleanup=1')
+test_run:cmd('restart server quorum1 with cleanup=1, args="0.1 0.5"')
 
 box.space.test:count() -- 100
 
@@ -136,7 +136,7 @@ box.schema.user.revoke('guest', 'replication')
 -- Second case, check that master-master works.
 SERVERS = {'master_quorum1', 'master_quorum2'}
 -- Deploy a cluster.
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 test_run:wait_fullmesh(SERVERS)
 test_run:cmd("switch master_quorum1")
 repl = box.cfg.replication
diff --git a/test/replication/rebootstrap.lua b/test/replication/rebootstrap.lua
index e743577e4..3e7d8f062 100644
--- a/test/replication/rebootstrap.lua
+++ b/test/replication/rebootstrap.lua
@@ -4,6 +4,10 @@
 local INSTANCE_ID = string.match(arg[0], "%d")
 
 local SOCKET_DIR = require('fio').cwd()
+
+local TIMEOUT = tonumber(arg[1])
+local CON_TIMEOUT = arg[2] and tonumber(arg[2]) or 30.0
+
 local function instance_uri(instance_id)
     return SOCKET_DIR..'/rebootstrap'..instance_id..'.sock';
 end
@@ -14,8 +18,8 @@ require('console').listen(os.getenv('ADMIN'))
 box.cfg({
     listen = instance_uri(INSTANCE_ID),
     instance_uuid = '12345678-abcd-1234-abcd-123456789ef' .. INSTANCE_ID,
-    replication_timeout = 0.1,
-    replication_connect_timeout = 0.5,
+    replication_timeout = TIMEOUT,
+    replication_connect_timeout = CON_TIMEOUT,
     replication = {
         instance_uri(1);
         instance_uri(2);
diff --git a/test/replication/rebootstrap.result b/test/replication/rebootstrap.result
index afbfc8e65..ea390c19f 100644
--- a/test/replication/rebootstrap.result
+++ b/test/replication/rebootstrap.result
@@ -4,7 +4,7 @@ test_run = require('test_run').new()
 SERVERS = {'rebootstrap1', 'rebootstrap2'}
 ---
 ...
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 ---
 ...
 test_run:wait_fullmesh(SERVERS)
@@ -20,11 +20,11 @@ test_run:cmd('stop server rebootstrap1')
 ---
 - true
 ...
-test_run:cmd('restart server rebootstrap2 with cleanup=True, wait=False, wait_load=False')
+test_run:cmd('restart server rebootstrap2 with cleanup=True, wait=False, wait_load=False, args="0.1 2.0"')
 ---
 - true
 ...
-test_run:cmd('start server rebootstrap1')
+test_run:cmd('start server rebootstrap1 with args="0.1 0.5"')
 ---
 - true
 ...
diff --git a/test/replication/rebootstrap.test.lua b/test/replication/rebootstrap.test.lua
index 954726ddb..8ddf77912 100644
--- a/test/replication/rebootstrap.test.lua
+++ b/test/replication/rebootstrap.test.lua
@@ -2,7 +2,7 @@ test_run = require('test_run').new()
 
 SERVERS = {'rebootstrap1', 'rebootstrap2'}
 
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 test_run:wait_fullmesh(SERVERS)
 
 --
@@ -12,8 +12,8 @@ test_run:wait_fullmesh(SERVERS)
 -- in 'orphan' mode.
 --
 test_run:cmd('stop server rebootstrap1')
-test_run:cmd('restart server rebootstrap2 with cleanup=True, wait=False, wait_load=False')
-test_run:cmd('start server rebootstrap1')
+test_run:cmd('restart server rebootstrap2 with cleanup=True, wait=False, wait_load=False, args="0.1 2.0"')
+test_run:cmd('start server rebootstrap1 with args="0.1 0.5"')
 test_run:cmd('switch rebootstrap1')
 box.info.status -- running
 
diff --git a/test/replication/recover_missing_xlog.result b/test/replication/recover_missing_xlog.result
index 027f8761e..5ed05c635 100644
--- a/test/replication/recover_missing_xlog.result
+++ b/test/replication/recover_missing_xlog.result
@@ -8,7 +8,7 @@ SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
 ---
 ...
 -- Start servers
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 ---
 ...
 -- Wait for full mesh
@@ -66,7 +66,7 @@ fio.unlink(fio.pathjoin(fio.abspath("."), string.format('autobootstrap1/%020d.xl
 ---
 - true
 ...
-test_run:cmd("start server autobootstrap1")
+test_run:cmd('start server autobootstrap1 with args="0.1 0.5"')
 ---
 - true
 ...
diff --git a/test/replication/recover_missing_xlog.test.lua b/test/replication/recover_missing_xlog.test.lua
index 57bc7d31f..d2d378837 100644
--- a/test/replication/recover_missing_xlog.test.lua
+++ b/test/replication/recover_missing_xlog.test.lua
@@ -3,7 +3,7 @@ test_run = env.new()
 
 SERVERS = { 'autobootstrap1', 'autobootstrap2', 'autobootstrap3' }
 -- Start servers
-test_run:create_cluster(SERVERS)
+test_run:create_cluster(SERVERS, "replication", {args="0.1"})
 -- Wait for full mesh
 test_run:wait_fullmesh(SERVERS)
 
@@ -28,7 +28,7 @@ fio = require('fio')
 -- Also check that there is no concurrency, i.e. master is
 -- in 'read-only' mode unless it receives all data.
 fio.unlink(fio.pathjoin(fio.abspath("."), string.format('autobootstrap1/%020d.xlog', 8)))
-test_run:cmd("start server autobootstrap1")
+test_run:cmd('start server autobootstrap1 with args="0.1 0.5"')
 
 test_run:cmd("switch autobootstrap1")
 for i = 10, 19 do box.space.test:insert{i, 'test' .. i} end
diff --git a/test/replication/replica_no_quorum.lua b/test/replication/replica_no_quorum.lua
index b9edeea94..c30c043cc 100644
--- a/test/replication/replica_no_quorum.lua
+++ b/test/replication/replica_no_quorum.lua
@@ -5,7 +5,8 @@ box.cfg({
     replication         = os.getenv("MASTER"),
     memtx_memory        = 107374182,
     replication_connect_quorum = 0,
-    replication_connect_timeout = 0.1,
+    replication_timeout = 0.1,
+    replication_connect_timeout = 0.5,
 })
 
 require('console').listen(os.getenv('ADMIN'))
diff --git a/test/replication/replica_timeout.lua b/test/replication/replica_timeout.lua
index 64f119763..38922fa3d 100644
--- a/test/replication/replica_timeout.lua
+++ b/test/replication/replica_timeout.lua
@@ -1,13 +1,14 @@
 #!/usr/bin/env tarantool
 
 local TIMEOUT = tonumber(arg[1])
+local CON_TIMEOUT = arg[2] and tonumber(arg[2]) or 30.0
 
 box.cfg({
     listen              = os.getenv("LISTEN"),
     replication         = os.getenv("MASTER"),
     memtx_memory        = 107374182,
     replication_timeout = TIMEOUT,
-    replication_connect_timeout = TIMEOUT * 3,
+    replication_connect_timeout = CON_TIMEOUT,
 })
 
 require('console').listen(os.getenv('ADMIN'))
diff --git a/test/replication/replica_uuid_ro.lua b/test/replication/replica_uuid_ro.lua
index 8e1c6cc47..d5ba55852 100644
--- a/test/replication/replica_uuid_ro.lua
+++ b/test/replication/replica_uuid_ro.lua
@@ -5,6 +5,10 @@ local INSTANCE_ID = string.match(arg[0], "%d")
 local USER = 'cluster'
 local PASSWORD = 'somepassword'
 local SOCKET_DIR = require('fio').cwd()
+
+local TIMEOUT = tonumber(arg[2])
+local CON_TIMEOUT = arg[3] and tonumber(arg[3]) or 30.0
+
 local function instance_uri(instance_id)
     --return 'localhost:'..(3310 + instance_id)
     return SOCKET_DIR..'/replica_uuid_ro'..instance_id..'.sock';
@@ -22,7 +26,8 @@ box.cfg({
         USER..':'..PASSWORD..'@'..instance_uri(2);
     };
     read_only = (INSTANCE_ID ~= '1' and true or false);
-    replication_connect_timeout = 0.5,
+    replication_timeout = TIMEOUT;
+    replication_connect_timeout = CON_TIMEOUT;
 })
 
 box.once("bootstrap", function()
diff --git a/test/replication/replicaset_ro_mostly.result b/test/replication/replicaset_ro_mostly.result
index b9e8f1fe8..1ce7d6f8e 100644
--- a/test/replication/replicaset_ro_mostly.result
+++ b/test/replication/replicaset_ro_mostly.result
@@ -27,7 +27,7 @@ UUID = sort({uuid1, uuid2}, sort_cmp)
 create_cluster_cmd1 = 'create server %s with script="replication/%s.lua"'
 ---
 ...
-create_cluster_cmd2 = 'start server %s with args="%s", wait_load=False, wait=False'
+create_cluster_cmd2 = 'start server %s with args="%s %s", wait_load=False, wait=False'
 ---
 ...
 test_run:cmd("setopt delimiter ';'")
@@ -37,7 +37,9 @@ test_run:cmd("setopt delimiter ';'")
 function create_cluster_uuid(servers, uuids)
     for i, name in ipairs(servers) do
         test_run:cmd(create_cluster_cmd1:format(name, name))
-        test_run:cmd(create_cluster_cmd2:format(name, uuids[i]))
+    end
+    for i, name in ipairs(servers) do
+        test_run:cmd(create_cluster_cmd2:format(name, uuids[i], "0.1"))
     end
 end;
 ---
@@ -61,7 +63,7 @@ test_run:cmd(create_cluster_cmd1:format(name, name))
 ---
 - true
 ...
-test_run:cmd(create_cluster_cmd2:format(name, uuid.new()))
+test_run:cmd(create_cluster_cmd2:format(name, uuid.new(), "0.1"))
 ---
 - true
 ...
diff --git a/test/replication/replicaset_ro_mostly.test.lua b/test/replication/replicaset_ro_mostly.test.lua
index f2c2d0d11..c75af7218 100644
--- a/test/replication/replicaset_ro_mostly.test.lua
+++ b/test/replication/replicaset_ro_mostly.test.lua
@@ -12,13 +12,15 @@ function sort(t) table.sort(t, sort_cmp) return t end
 UUID = sort({uuid1, uuid2}, sort_cmp)
 
 create_cluster_cmd1 = 'create server %s with script="replication/%s.lua"'
-create_cluster_cmd2 = 'start server %s with args="%s", wait_load=False, wait=False'
+create_cluster_cmd2 = 'start server %s with args="%s %s", wait_load=False, wait=False'
 
 test_run:cmd("setopt delimiter ';'")
 function create_cluster_uuid(servers, uuids)
     for i, name in ipairs(servers) do
         test_run:cmd(create_cluster_cmd1:format(name, name))
-        test_run:cmd(create_cluster_cmd2:format(name, uuids[i]))
+    end
+    for i, name in ipairs(servers) do
+        test_run:cmd(create_cluster_cmd2:format(name, uuids[i], "0.1"))
     end
 end;
 test_run:cmd("setopt delimiter ''");
@@ -30,7 +32,7 @@ test_run:wait_fullmesh(SERVERS)
 -- Add third replica
 name = 'replica_uuid_ro3'
 test_run:cmd(create_cluster_cmd1:format(name, name))
-test_run:cmd(create_cluster_cmd2:format(name, uuid.new()))
+test_run:cmd(create_cluster_cmd2:format(name, uuid.new(), "0.1"))
 test_run:cmd('switch replica_uuid_ro3')
 test_run:cmd('switch default')
 
-- 
2.15.2 (Apple Git-101.1)


* [PATCH v4 3/3] replication: do not ignore replication_connect_quorum.
  2018-08-14 10:02 [PATCH v4 0/3] replication: do not ignore replication_connect_quorum Serge Petrenko
  2018-08-14 10:02 ` [PATCH v4 1/3] test: update test-run Serge Petrenko
  2018-08-14 10:02 ` [PATCH v4 2/3] Add arguments to replication test instances Serge Petrenko
@ 2018-08-14 10:02 ` Serge Petrenko
  2018-08-14 17:07 ` [PATCH v4 0/3] " Vladimir Davydov
  3 siblings, 0 replies; 5+ messages in thread
From: Serge Petrenko @ 2018-08-14 10:02 UTC (permalink / raw)
  To: tarantool-patches; +Cc: georgy, vdavydov.dev, Serge Petrenko

On bootstrap and after replication reconfiguration
replication_connect_quorum was ignored. The instance tried to connect to
every replica listed in replication parameter, and errored if it wasn't
possible.
The patch alters this behaviour. An instance still tries to connect to
every node listed in replication, but does not raise an error if it was
able to connect to at least replication_connect_quorum instances.

Closes #3428

@TarantoolBot document
Title: replication_connect_quorum is not ignored.
Now on replica set bootstrap and in case of replication reconfiguration
(e.g. calling box.cfg{replication=...} for the second time) tarantool
doesn't fail if it couldn't connect to every replica, but could
connect to replication_connect_quorum replicas. If after
replication_connect_timeout seconds the instance is not connected to at
least replication_connect_quorum other instances, we throw an error.
---
 src/box/box.cc                      | 35 +++++++++++++++++---------
 src/box/replication.cc              | 11 ++++++---
 src/box/replication.h               |  7 +++---
 test/replication/quorum.result      | 49 +++++++++++++++++++++++++++++++++++++
 test/replication/quorum.test.lua    | 20 +++++++++++++++
 test/replication/replica_quorum.lua | 24 ++++++++++++++++++
 6 files changed, 128 insertions(+), 18 deletions(-)
 create mode 100644 test/replication/replica_quorum.lua
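
Note for readers of the archive (not part of the patch): the two checks
changed in replicaset_connect() can be modelled in isolation. The sketch
below is a simplified, hypothetical model, not the Tarantool code itself.
ConnectState, effective_quorum(), quorum_unreachable() and must_fail() are
names invented for illustration; only the two conditions are taken from the
diff to src/box/replication.cc further down.

```cpp
#include <algorithm>

// Hypothetical stand-in for the counters that replicaset_connect()
// keeps while waiting for appliers (names are illustrative only).
struct ConnectState {
	int connected; // appliers that have connected so far
	int failed;    // appliers that have failed so far
};

// The configured quorum capped by the number of configured peers,
// mirroring MIN(count, replication_connect_quorum) in the patch.
inline int effective_quorum(int count, int configured_quorum)
{
	return std::min(count, configured_quorum);
}

// Early exit from the wait loop: true when the quorum cannot be
// reached even if every applier that has not failed yet connects.
inline bool quorum_unreachable(const ConnectState &s, int count, int quorum)
{
	return count - s.failed < quorum;
}

// Decision taken after the wait loop when not all peers connected:
// raise an error only if the caller asked for a quorum and fewer
// than 'quorum' appliers are connected.
inline bool must_fail(const ConnectState &s, int quorum, bool connect_quorum)
{
	return connect_quorum && s.connected < quorum;
}
```

With the numbers used by the new test case (three URIs in replication and
replication_connect_quorum = 1), a state of one connected and two failed
appliers gives must_fail() == false, so the configuration is accepted. If
all three had failed, must_fail() would be true when connect_quorum is set.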

diff --git a/src/box/box.cc b/src/box/box.cc
index e3eb2738f..8d7454d1f 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -595,7 +595,7 @@ cfg_get_replication(int *p_count)
  * don't start appliers.
  */
 static void
-box_sync_replication(double timeout, bool connect_all)
+box_sync_replication(bool connect_quorum)
 {
 	int count = 0;
 	struct applier **appliers = cfg_get_replication(&count);
@@ -607,7 +607,7 @@ box_sync_replication(double timeout, bool connect_all)
 			applier_delete(appliers[i]); /* doesn't affect diag */
 	});
 
-	replicaset_connect(appliers, count, timeout, connect_all);
+	replicaset_connect(appliers, count, connect_quorum);
 
 	guard.is_active = false;
 }
@@ -625,8 +625,13 @@ box_set_replication(void)
 	}
 
 	box_check_replication();
-	/* Try to connect to all replicas within the timeout period */
-	box_sync_replication(replication_connect_timeout, true);
+	/*
+	 * Try to connect to all replicas within the timeout period.
+	 * The configuration will succeed as long as we've managed
+	 * to connect to at least replication_connect_quorum
+	 * masters.
+	 */
+	box_sync_replication(true);
 	/* Follow replica */
 	replicaset_follow();
 }
@@ -1865,8 +1870,13 @@ box_cfg_xc(void)
 
 		title("orphan");
 
-		/* Wait for the cluster to start up */
-		box_sync_replication(replication_connect_timeout, false);
+		/*
+		 * In case of recovering from a checkpoint we
+		 * don't need to wait for 'quorum' masters, since
+		 * the recovered _cluster space will have all the
+		 * information about cluster.
+		 */
+		box_sync_replication(false);
 	} else {
 		if (!tt_uuid_is_nil(&instance_uuid))
 			INSTANCE_UUID = instance_uuid;
@@ -1883,12 +1893,15 @@ box_cfg_xc(void)
 		/*
 		 * Wait for the cluster to start up.
 		 *
-		 * Note, when bootstrapping a new instance, we have to
-		 * connect to all masters to make sure all replicas
-		 * receive the same replica set UUID when a new cluster
-		 * is deployed.
+		 * Note, when bootstrapping a new instance, we try to
+		 * connect to all masters during timeout to make sure
+		 * all replicas receive the same replica set UUID when
+		 * a new cluster is deployed.
+		 * If we fail to do so, settle with connecting to
+		 * 'replication_connect_quorum' masters.
+		 * If this also fails, throw an error.
 		 */
-		box_sync_replication(TIMEOUT_INFINITY, true);
+		box_sync_replication(true);
 		/* Bootstrap a new master */
 		bootstrap(&replicaset_uuid, &is_bootstrap_leader);
 	}
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 528fe4459..861ce34ea 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -46,7 +46,7 @@ struct tt_uuid INSTANCE_UUID;
 struct tt_uuid REPLICASET_UUID;
 
 double replication_timeout = 1.0; /* seconds */
-double replication_connect_timeout = 4.0; /* seconds */
+double replication_connect_timeout = 30.0; /* seconds */
 int replication_connect_quorum = REPLICATION_CONNECT_QUORUM_ALL;
 double replication_sync_lag = 10.0; /* seconds */
 
@@ -540,7 +540,7 @@ applier_on_connect_f(struct trigger *trigger, void *event)
 
 void
 replicaset_connect(struct applier **appliers, int count,
-		   double timeout, bool connect_all)
+		   bool connect_quorum)
 {
 	if (count == 0) {
 		/* Cleanup the replica set. */
@@ -571,6 +571,9 @@ replicaset_connect(struct applier **appliers, int count,
 	state.connected = state.failed = 0;
 	fiber_cond_create(&state.wakeup);
 
+	double timeout = replication_connect_timeout;
+	int quorum = MIN(count, replication_connect_quorum);
+
 	/* Add triggers and start simulations connection to remote peers */
 	for (int i = 0; i < count; i++) {
 		struct applier *applier = appliers[i];
@@ -587,7 +590,7 @@ replicaset_connect(struct applier **appliers, int count,
 		double wait_start = ev_monotonic_now(loop());
 		if (fiber_cond_wait_timeout(&state.wakeup, timeout) != 0)
 			break;
-		if (state.failed > 0 && connect_all)
+		if (count - state.failed < quorum)
 			break;
 		timeout -= ev_monotonic_now(loop()) - wait_start;
 	}
@@ -595,7 +598,7 @@ replicaset_connect(struct applier **appliers, int count,
 		say_crit("failed to connect to %d out of %d replicas",
 			 count - state.connected, count);
 		/* Timeout or connection failure. */
-		if (connect_all)
+		if (connect_quorum && state.connected < quorum)
 			goto error;
 	} else {
 		say_verbose("connected to %d replicas", state.connected);
diff --git a/src/box/replication.h b/src/box/replication.h
index 95122eb45..06a2867b6 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -357,12 +357,13 @@ replicaset_add(uint32_t replica_id, const struct tt_uuid *instance_uuid);
  * \param appliers the array of appliers
  * \param count size of appliers array
  * \param timeout connection timeout
- * \param connect_all if this flag is set, fail unless all
- *                    appliers have successfully connected
+ * \param connect_quorum if this flag is set, fail unless at
+ *                       least replication_connect_quorum
+ *                       appliers have successfully connected.
  */
 void
 replicaset_connect(struct applier **appliers, int count,
-		   double timeout, bool connect_all);
+		   bool connect_quorum);
 
 /**
  * Resume all appliers registered with the replica set.
diff --git a/test/replication/quorum.result b/test/replication/quorum.result
index a55b0d087..265b099b7 100644
--- a/test/replication/quorum.result
+++ b/test/replication/quorum.result
@@ -401,3 +401,52 @@ test_run:cmd("switch default")
 test_run:drop_cluster(SERVERS)
 ---
 ...
+-- Test that quorum is not ignored neither during bootstrap, nor
+-- during reconfiguration.
+box.schema.user.grant('guest', 'replication')
+---
+...
+test_run:cmd('create server replica_quorum with script="replication/replica_quorum.lua"')
+---
+- true
+...
+-- Arguments are: replication_connect_quorum, replication_timeout
+-- replication_connect_timeout.
+-- If replication_connect_quorum was ignored here, the instance
+-- would exit with an error.
+test_run:cmd('start server replica_quorum with wait=True, wait_load=True, args="1 0.05 0.1"')
+---
+- true
+...
+test_run:cmd('switch replica_quorum')
+---
+- true
+...
+-- If replication_connect_quorum was ignored here, the instance
+-- would exit with an error.
+box.cfg{replication={INSTANCE_URI, nonexistent_uri(1)}}
+---
+...
+box.info.id
+---
+- 1
+...
+test_run:cmd('switch default')
+---
+- true
+...
+test_run:cmd('stop server replica_quorum')
+---
+- true
+...
+test_run:cmd('cleanup server replica_quorum')
+---
+- true
+...
+test_run:cmd('delete server replica_quorum')
+---
+- true
+...
+box.schema.user.revoke('guest', 'replication')
+---
+...
diff --git a/test/replication/quorum.test.lua b/test/replication/quorum.test.lua
index 8290f8ea5..5a43275c2 100644
--- a/test/replication/quorum.test.lua
+++ b/test/replication/quorum.test.lua
@@ -150,3 +150,23 @@ box.space.test:select()
 test_run:cmd("switch default")
 -- Cleanup.
 test_run:drop_cluster(SERVERS)
+
+-- Test that quorum is not ignored neither during bootstrap, nor
+-- during reconfiguration.
+box.schema.user.grant('guest', 'replication')
+test_run:cmd('create server replica_quorum with script="replication/replica_quorum.lua"')
+-- Arguments are: replication_connect_quorum, replication_timeout
+-- replication_connect_timeout.
+-- If replication_connect_quorum was ignored here, the instance
+-- would exit with an error.
+test_run:cmd('start server replica_quorum with wait=True, wait_load=True, args="1 0.05 0.1"')
+test_run:cmd('switch replica_quorum')
+-- If replication_connect_quorum was ignored here, the instance
+-- would exit with an error.
+box.cfg{replication={INSTANCE_URI, nonexistent_uri(1)}}
+box.info.id
+test_run:cmd('switch default')
+test_run:cmd('stop server replica_quorum')
+test_run:cmd('cleanup server replica_quorum')
+test_run:cmd('delete server replica_quorum')
+box.schema.user.revoke('guest', 'replication')
diff --git a/test/replication/replica_quorum.lua b/test/replication/replica_quorum.lua
new file mode 100644
index 000000000..dd42b8214
--- /dev/null
+++ b/test/replication/replica_quorum.lua
@@ -0,0 +1,24 @@
+#!/usr/bin/env tarantool
+
+local SOCKET_DIR = require('fio').cwd()
+
+local QUORUM = tonumber(arg[1])
+local TIMEOUT = arg[2] and tonumber(arg[2]) or 0.1
+local CON_TIMEOUT = arg[3] and tonumber(arg[3]) or 30.0
+INSTANCE_URI = SOCKET_DIR .. '/replica_quorum.sock'
+
+function nonexistent_uri(id)
+    return SOCKET_DIR .. '/replica_quorum' .. (1000 + id) .. '.sock'
+end
+
+require('console').listen(os.getenv('ADMIN'))
+
+box.cfg{
+    listen = INSTANCE_URI,
+    replication_timeout = TIMEOUT,
+    replication_connect_timeout = CON_TIMEOUT,
+    replication_connect_quorum = QUORUM,
+    replication = {INSTANCE_URI,
+                   nonexistent_uri(1),
+                   nonexistent_uri(2)}
+}
-- 
2.15.2 (Apple Git-101.1)


* Re: [PATCH v4 0/3] replication: do not ignore replication_connect_quorum.
  2018-08-14 10:02 [PATCH v4 0/3] replication: do not ignore replication_connect_quorum Serge Petrenko
                   ` (2 preceding siblings ...)
  2018-08-14 10:02 ` [PATCH v4 3/3] replication: do not ignore replication_connect_quorum Serge Petrenko
@ 2018-08-14 17:07 ` Vladimir Davydov
  3 siblings, 0 replies; 5+ messages in thread
From: Vladimir Davydov @ 2018-08-14 17:07 UTC (permalink / raw)
  To: Serge Petrenko; +Cc: tarantool-patches, georgy

Pushed to 1.9
