* [tarantool-patches] [PATCH 1/3] box: make replication_sync_lag option dynamic
@ 2018-08-30 23:38 Olga Arkhangelskaia
2018-08-30 23:38 ` [tarantool-patches] [PATCH v3 2/3] box: add replication_sync_timeout Olga Arkhangelskaia
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Olga Arkhangelskaia @ 2018-08-30 23:38 UTC (permalink / raw)
To: tarantool-patches; +Cc: Olga Arkhangelskaia
In gh-3427 replication_sync_lag should be taken into account during
replication reconfiguration. To configure replication properly,
this parameter is made dynamic so it can be changed on demand.
@TarantoolBot document
Title: replication_sync_lag option can be set dynamically
replication_sync_lag can now be set at any time.
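Making a box.cfg option dynamic follows the same plumbing each time, visible in the diff below: a box_check_*() validator, a box_set_*() applier, a Lua C wrapper, and entries in dynamic_cfg / dynamic_cfg_skip_at_load. A condensed sketch of the check/set split (standalone C with hypothetical names, not the actual box.cc):

```c
#include <assert.h>

static double replication_sync_lag = 10.0;	/* effective value */

/* Validate the raw cfg value; box.cfg{} rejects lag <= 0 (ER_CFG). */
static int
check_sync_lag(double lag)
{
	return lag > 0 ? 0 : -1;
}

/* Apply only after validation, so a bad value never takes effect. */
static int
set_sync_lag(double lag)
{
	if (check_sync_lag(lag) != 0)
		return -1;
	replication_sync_lag = lag;
	return 0;
}
```

The same split is why box_cfg_xc() can call box_set_replication_sync_lag() both at load time and on every later box.cfg{} call.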
---
src/box/box.cc | 8 +++++++-
src/box/box.h | 1 +
src/box/lua/cfg.cc | 12 ++++++++++++
src/box/lua/load_cfg.lua | 2 ++
test/box-tap/cfg.test.lua | 7 ++++++-
5 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index 8d7454d1f..7155ad085 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -656,6 +656,12 @@ box_set_replication_connect_quorum(void)
replicaset_check_quorum();
}
+void
+box_set_replication_sync_lag(void)
+{
+ replication_sync_lag = box_check_replication_sync_lag();
+}
+
void
box_bind(void)
{
@@ -1747,7 +1753,7 @@ box_cfg_xc(void)
box_set_replication_timeout();
box_set_replication_connect_timeout();
box_set_replication_connect_quorum();
- replication_sync_lag = box_check_replication_sync_lag();
+ box_set_replication_sync_lag();
xstream_create(&join_stream, apply_initial_join_row);
xstream_create(&subscribe_stream, apply_row);
diff --git a/src/box/box.h b/src/box/box.h
index 9dfb3fd2a..3090fdcdb 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -176,6 +176,7 @@ void box_set_vinyl_timeout(void);
void box_set_replication_timeout(void);
void box_set_replication_connect_timeout(void);
void box_set_replication_connect_quorum(void);
+void box_set_replication_sync_lag(void);
extern "C" {
#endif /* defined(__cplusplus) */
diff --git a/src/box/lua/cfg.cc b/src/box/lua/cfg.cc
index 0ca150877..5442723b5 100644
--- a/src/box/lua/cfg.cc
+++ b/src/box/lua/cfg.cc
@@ -262,6 +262,17 @@ lbox_cfg_set_replication_connect_quorum(struct lua_State *L)
return 0;
}
+static int
+lbox_cfg_set_replication_sync_lag(struct lua_State *L)
+{
+ try {
+ box_set_replication_sync_lag();
+ } catch (Exception *) {
+ luaT_error(L);
+ }
+ return 0;
+}
+
void
box_lua_cfg_init(struct lua_State *L)
{
@@ -286,6 +297,7 @@ box_lua_cfg_init(struct lua_State *L)
{"cfg_set_replication_timeout", lbox_cfg_set_replication_timeout},
{"cfg_set_replication_connect_timeout", lbox_cfg_set_replication_connect_timeout},
{"cfg_set_replication_connect_quorum", lbox_cfg_set_replication_connect_quorum},
+ {"cfg_set_replication_sync_lag", lbox_cfg_set_replication_sync_lag},
{NULL, NULL}
};
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index 2a7142de6..f803d8987 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -199,6 +199,7 @@ local dynamic_cfg = {
replication_timeout = private.cfg_set_replication_timeout,
replication_connect_timeout = private.cfg_set_replication_connect_timeout,
replication_connect_quorum = private.cfg_set_replication_connect_quorum,
+ replication_sync_lag = private.cfg_set_replication_sync_lag,
instance_uuid = function()
if box.cfg.instance_uuid ~= box.info.uuid then
box.error(box.error.CFG, 'instance_uuid',
@@ -220,6 +221,7 @@ local dynamic_cfg_skip_at_load = {
replication_timeout = true,
replication_connect_timeout = true,
replication_connect_quorum = true,
+ replication_sync_lag = true,
wal_dir_rescan_delay = true,
custom_proc_title = true,
force_recovery = true,
diff --git a/test/box-tap/cfg.test.lua b/test/box-tap/cfg.test.lua
index 5e72004ca..d315346de 100755
--- a/test/box-tap/cfg.test.lua
+++ b/test/box-tap/cfg.test.lua
@@ -6,7 +6,7 @@ local socket = require('socket')
local fio = require('fio')
local uuid = require('uuid')
local msgpack = require('msgpack')
-test:plan(90)
+test:plan(91)
--------------------------------------------------------------------------------
-- Invalid values
@@ -95,6 +95,11 @@ test:ok(status and result == 'table', 'configured box')
invalid('log_level', 'unknown')
+lag = box.cfg.replication_sync_lag
+status, result = pcall(box.cfg, {replication_sync_lag = 1})
+test:ok(status, "dynamic replication_sync_lag")
+pcall(box.cfg, {replication_sync_lag = lag})
+
--------------------------------------------------------------------------------
-- gh-534: Segmentation fault after two bad wal_mode settings
--------------------------------------------------------------------------------
--
2.14.3 (Apple Git-98)
^ permalink raw reply [flat|nested] 6+ messages in thread
* [tarantool-patches] [PATCH v3 2/3] box: add replication_sync_timeout
2018-08-30 23:38 [tarantool-patches] [PATCH 1/3] box: make replication_sync_lag option dynamic Olga Arkhangelskaia
@ 2018-08-30 23:38 ` Olga Arkhangelskaia
2018-08-30 23:38 ` [tarantool-patches] [PATCH v7 3/3] box: adds replication sync after cfg. update Olga Arkhangelskaia
2018-09-04 13:24 ` [tarantool-patches] Re: [PATCH 1/3] box: make replication_sync_lag option dynamic Kirill Yukhin
2 siblings, 0 replies; 6+ messages in thread
From: Olga Arkhangelskaia @ 2018-08-30 23:38 UTC (permalink / raw)
To: tarantool-patches; +Cc: Olga Arkhangelskaia
In the scope of gh-3427 we need a timeout in case the replica set
waits for synchronization for too long, or even forever. The default
value is 300 seconds.
Closes #3674
@TarantoolBot document
Title: Introduce new option replication_sync_timeout.
After an initial bootstrap or a replication configuration change we
need to sync up with the replication quorum. Sometimes the sync can
take too long, or replication_sync_lag can be smaller than the network
latency, and the replica gets stuck in a sync loop that cannot be
cancelled. To avoid such situations replication_sync_timeout can be
used: once the time set in replication_sync_timeout has passed, the
replica enters the orphan state.
The option can be set dynamically. The default value is 300 seconds.
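The replicaset_sync() hunk below enforces this timeout as an absolute deadline on a monotonic clock, computed once before the wait loop, so spurious wakeups cannot extend the total wait. A minimal sketch of that arithmetic (standalone helpers that only model the calculation; in Tarantool the clock is ev_monotonic_now() and the wait is fiber_cond_wait_deadline()):

```c
#include <assert.h>

/*
 * Deadline-based waiting: compute the deadline once from the
 * configured timeout; each wakeup then waits only for the time
 * remaining, never restarting the full timeout.
 */
static double
sync_deadline(double start_time, double sync_timeout)
{
	return start_time + sync_timeout;
}

/* Time left until the deadline, clamped at zero once it has passed. */
static double
time_remaining(double now, double deadline)
{
	return now < deadline ? deadline - now : 0.0;
}
```

With replication_sync_timeout = 300 and a wakeup 250 seconds in, the loop waits at most 50 more seconds before giving up.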
---
https://github.com/tarantool/tarantool/issues/3647
https://github.com/tarantool/tarantool/tree/OKriw/gh-3427-replication-no-sync-1.9
v1:
https://www.freelists.org/post/tarantool-patches/PATCH-23-box-add-replication-sync-lag-timeout
v2:
https://www.freelists.org/post/tarantool-patches/PATCH-v2-23-box-add-replication-sync-timeout
Changes in v2:
- renamed replication_sync_lag_timeout to replication_sync_timeout
- fiber_cond_timeout changed to deadline
- default time is set to 300
Changes in v3:
- fixed spaces, empty lines
src/box/box.cc | 19 ++++++++++++++++++
src/box/box.h | 1 +
src/box/lua/cfg.cc | 12 ++++++++++++
src/box/lua/load_cfg.lua | 4 ++++
src/box/replication.cc | 14 +++++++++++---
src/box/replication.h | 6 ++++++
test/app-tap/init_script.result | 43 +++++++++++++++++++++--------------------
test/box-tap/cfg.test.lua | 9 ++++++++-
test/box/admin.result | 2 ++
test/box/cfg.result | 4 ++++
10 files changed, 89 insertions(+), 25 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index 7155ad085..dcedfd002 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -420,6 +420,17 @@ box_check_replication_sync_lag(void)
return lag;
}
+static double
+box_check_replication_sync_timeout(void)
+{
+ double timeout = cfg_getd("replication_sync_timeout");
+ if (timeout <= 0) {
+ tnt_raise(ClientError, ER_CFG, "replication_sync_timeout",
+ "the value must be greater than 0");
+ }
+ return timeout;
+}
+
static void
box_check_instance_uuid(struct tt_uuid *uuid)
{
@@ -546,6 +557,7 @@ box_check_config()
box_check_replication_connect_timeout();
box_check_replication_connect_quorum();
box_check_replication_sync_lag();
+ box_check_replication_sync_timeout();
box_check_readahead(cfg_geti("readahead"));
box_check_checkpoint_count(cfg_geti("checkpoint_count"));
box_check_wal_max_rows(cfg_geti64("rows_per_wal"));
@@ -662,6 +674,12 @@ box_set_replication_sync_lag(void)
replication_sync_lag = box_check_replication_sync_lag();
}
+void
+box_set_replication_sync_timeout(void)
+{
+ replication_sync_timeout = box_check_replication_sync_timeout();
+}
+
void
box_bind(void)
{
@@ -1754,6 +1772,7 @@ box_cfg_xc(void)
box_set_replication_connect_timeout();
box_set_replication_connect_quorum();
box_set_replication_sync_lag();
+ box_set_replication_sync_timeout();
xstream_create(&join_stream, apply_initial_join_row);
xstream_create(&subscribe_stream, apply_row);
diff --git a/src/box/box.h b/src/box/box.h
index 3090fdcdb..6e1c13f59 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -177,6 +177,7 @@ void box_set_replication_timeout(void);
void box_set_replication_connect_timeout(void);
void box_set_replication_connect_quorum(void);
void box_set_replication_sync_lag(void);
+void box_set_replication_sync_timeout(void);
extern "C" {
#endif /* defined(__cplusplus) */
diff --git a/src/box/lua/cfg.cc b/src/box/lua/cfg.cc
index 5442723b5..17431dc9f 100644
--- a/src/box/lua/cfg.cc
+++ b/src/box/lua/cfg.cc
@@ -273,6 +273,17 @@ lbox_cfg_set_replication_sync_lag(struct lua_State *L)
return 0;
}
+static int
+lbox_cfg_set_replication_sync_timeout(struct lua_State *L)
+{
+ try {
+ box_set_replication_sync_timeout();
+ } catch (Exception *) {
+ luaT_error(L);
+ }
+ return 0;
+}
+
void
box_lua_cfg_init(struct lua_State *L)
{
@@ -298,6 +309,7 @@ box_lua_cfg_init(struct lua_State *L)
{"cfg_set_replication_connect_timeout", lbox_cfg_set_replication_connect_timeout},
{"cfg_set_replication_connect_quorum", lbox_cfg_set_replication_connect_quorum},
{"cfg_set_replication_sync_lag", lbox_cfg_set_replication_sync_lag},
+ {"cfg_set_replication_sync_timeout", lbox_cfg_set_replication_sync_timeout},
{NULL, NULL}
};
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index f803d8987..01cfe0b4d 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -72,6 +72,7 @@ local default_cfg = {
worker_pool_threads = 4,
replication_timeout = 1,
replication_sync_lag = 10,
+ replication_sync_timeout = 300,
replication_connect_timeout = 30,
replication_connect_quorum = nil, -- connect all
}
@@ -128,6 +129,7 @@ local template_cfg = {
worker_pool_threads = 'number',
replication_timeout = 'number',
replication_sync_lag = 'number',
+ replication_sync_timeout = 'number',
replication_connect_timeout = 'number',
replication_connect_quorum = 'number',
}
@@ -200,6 +202,7 @@ local dynamic_cfg = {
replication_connect_timeout = private.cfg_set_replication_connect_timeout,
replication_connect_quorum = private.cfg_set_replication_connect_quorum,
replication_sync_lag = private.cfg_set_replication_sync_lag,
+ replication_sync_timeout = private.cfg_set_replication_sync_timeout,
instance_uuid = function()
if box.cfg.instance_uuid ~= box.info.uuid then
box.error(box.error.CFG, 'instance_uuid',
@@ -222,6 +225,7 @@ local dynamic_cfg_skip_at_load = {
replication_connect_timeout = true,
replication_connect_quorum = true,
replication_sync_lag = true,
+ replication_sync_timeout = true,
wal_dir_rescan_delay = true,
custom_proc_title = true,
force_recovery = true,
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 861ce34ea..4001c86a3 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -49,6 +49,7 @@ double replication_timeout = 1.0; /* seconds */
double replication_connect_timeout = 30.0; /* seconds */
int replication_connect_quorum = REPLICATION_CONNECT_QUORUM_ALL;
double replication_sync_lag = 10.0; /* seconds */
+double replication_sync_timeout = 300.0; /* seconds */
struct replicaset replicaset;
@@ -673,12 +674,19 @@ replicaset_sync(void)
/*
* Wait until all connected replicas synchronize up to
- * replication_sync_lag
+ * replication_sync_lag or return on replication_sync_timeout
*/
+ double start_time = ev_monotonic_now(loop());
+ double deadline = start_time + replication_sync_timeout;
while (replicaset.applier.synced < quorum &&
replicaset.applier.connected +
- replicaset.applier.loading >= quorum)
- fiber_cond_wait(&replicaset.applier.cond);
+ replicaset.applier.loading >= quorum) {
+ if (fiber_cond_wait_deadline(&replicaset.applier.cond,
+ deadline) != 0)
+ break;
+ }
if (replicaset.applier.synced < quorum) {
/*
diff --git a/src/box/replication.h b/src/box/replication.h
index 06a2867b6..a6f1dbf69 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -126,6 +126,12 @@ extern int replication_connect_quorum;
*/
extern double replication_sync_lag;
+/**
+ * Time to wait before entering the orphan state in case of
+ * unsuccessful synchronization.
+ */
+extern double replication_sync_timeout;
+
/**
* Wait for the given period of time before trying to reconnect
* to a master.
diff --git a/test/app-tap/init_script.result b/test/app-tap/init_script.result
index eea9f5bcf..261ddf3a4 100644
--- a/test/app-tap/init_script.result
+++ b/test/app-tap/init_script.result
@@ -23,27 +23,28 @@ box.cfg
18 readahead:16320
19 replication_connect_timeout:30
20 replication_sync_lag:10
-21 replication_timeout:1
-22 rows_per_wal:500000
-23 slab_alloc_factor:1.05
-24 too_long_threshold:0.5
-25 vinyl_bloom_fpr:0.05
-26 vinyl_cache:134217728
-27 vinyl_dir:.
-28 vinyl_max_tuple_size:1048576
-29 vinyl_memory:134217728
-30 vinyl_page_size:8192
-31 vinyl_range_size:1073741824
-32 vinyl_read_threads:1
-33 vinyl_run_count_per_level:2
-34 vinyl_run_size_ratio:3.5
-35 vinyl_timeout:60
-36 vinyl_write_threads:2
-37 wal_dir:.
-38 wal_dir_rescan_delay:2
-39 wal_max_size:268435456
-40 wal_mode:write
-41 worker_pool_threads:4
+21 replication_sync_timeout:300
+22 replication_timeout:1
+23 rows_per_wal:500000
+24 slab_alloc_factor:1.05
+25 too_long_threshold:0.5
+26 vinyl_bloom_fpr:0.05
+27 vinyl_cache:134217728
+28 vinyl_dir:.
+29 vinyl_max_tuple_size:1048576
+30 vinyl_memory:134217728
+31 vinyl_page_size:8192
+32 vinyl_range_size:1073741824
+33 vinyl_read_threads:1
+34 vinyl_run_count_per_level:2
+35 vinyl_run_size_ratio:3.5
+36 vinyl_timeout:60
+37 vinyl_write_threads:2
+38 wal_dir:.
+39 wal_dir_rescan_delay:2
+40 wal_max_size:268435456
+41 wal_mode:write
+42 worker_pool_threads:4
--
-- Test insert from detached fiber
--
diff --git a/test/box-tap/cfg.test.lua b/test/box-tap/cfg.test.lua
index d315346de..023a2af72 100755
--- a/test/box-tap/cfg.test.lua
+++ b/test/box-tap/cfg.test.lua
@@ -6,7 +6,7 @@ local socket = require('socket')
local fio = require('fio')
local uuid = require('uuid')
local msgpack = require('msgpack')
-test:plan(91)
+test:plan(94)
--------------------------------------------------------------------------------
-- Invalid values
@@ -29,6 +29,8 @@ invalid('replication_timeout', -1)
invalid('replication_timeout', 0)
invalid('replication_sync_lag', -1)
invalid('replication_sync_lag', 0)
+invalid('replication_sync_timeout', -1)
+invalid('replication_sync_timeout', 0)
invalid('replication_connect_timeout', -1)
invalid('replication_connect_timeout', 0)
invalid('replication_connect_quorum', -1)
@@ -100,6 +102,11 @@ status, result = pcall(box.cfg, {replication_sync_lag = 1})
test:ok(status, "dynamic replication_sync_lag")
pcall(box.cfg, {replication_sync_lag = lag})
+timeout = box.cfg.replication_sync_timeout
+status, result = pcall(box.cfg, {replication_sync_timeout = 10})
+test:ok(status, "dynamic replication_sync_timeout")
+pcall(box.cfg, {replication_sync_timeout = timeout})
+
--------------------------------------------------------------------------------
-- gh-534: Segmentation fault after two bad wal_mode settings
--------------------------------------------------------------------------------
diff --git a/test/box/admin.result b/test/box/admin.result
index c3e318a6a..ace88e6e9 100644
--- a/test/box/admin.result
+++ b/test/box/admin.result
@@ -58,6 +58,8 @@ cfg_filter(box.cfg)
- 30
- - replication_sync_lag
- 10
+ - - replication_sync_timeout
+ - 300
- - replication_timeout
- 1
- - rows_per_wal
diff --git a/test/box/cfg.result b/test/box/cfg.result
index a2df83310..816178513 100644
--- a/test/box/cfg.result
+++ b/test/box/cfg.result
@@ -54,6 +54,8 @@ cfg_filter(box.cfg)
- 30
- - replication_sync_lag
- 10
+ - - replication_sync_timeout
+ - 300
- - replication_timeout
- 1
- - rows_per_wal
@@ -143,6 +145,8 @@ cfg_filter(box.cfg)
- 30
- - replication_sync_lag
- 10
+ - - replication_sync_timeout
+ - 300
- - replication_timeout
- 1
- - rows_per_wal
--
2.14.3 (Apple Git-98)
^ permalink raw reply [flat|nested] 6+ messages in thread
* [tarantool-patches] [PATCH v7 3/3] box: adds replication sync after cfg. update
2018-08-30 23:38 [tarantool-patches] [PATCH 1/3] box: make replication_sync_lag option dynamic Olga Arkhangelskaia
2018-08-30 23:38 ` [tarantool-patches] [PATCH v3 2/3] box: add replication_sync_timeout Olga Arkhangelskaia
@ 2018-08-30 23:38 ` Olga Arkhangelskaia
2018-08-31 13:01 ` Vladimir Davydov
2018-09-04 13:24 ` [tarantool-patches] Re: [PATCH 1/3] box: make replication_sync_lag option dynamic Kirill Yukhin
2 siblings, 1 reply; 6+ messages in thread
From: Olga Arkhangelskaia @ 2018-08-30 23:38 UTC (permalink / raw)
To: tarantool-patches; +Cc: Olga Arkhangelskaia
When a replica reconnects to the replica set not for the first time,
no synchronization is performed, so the replica may give away
outdated data.
Closes #3427
@TarantoolBot document
Title: Orphan status after configuration update or initial bootstrap.
After an initial bootstrap or a configuration update an instance can
end up in the orphan state in two cases: if it synced up with fewer
replicas than the quorum, or if it failed to sync up within the time
specified in replication_sync_timeout.
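The two failure cases above reduce to one predicate checked when the sync wait loop exits: the instance ends up orphan iff fewer than 'quorum' appliers are synced, whether because the deadline expired or because appliers dropped out. A hedged model of that check (hypothetical helper, not the actual replicaset_sync()):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * After the sync wait loop exits, decide whether the instance
 * stays in (or enters) the orphan state. A quorum of zero means
 * replication is effectively unconfigured, so the instance is
 * never orphaned in that case.
 */
static bool
must_enter_orphan(int synced, int quorum)
{
	if (quorum == 0)
		return false;
	return synced < quorum;
}
```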
---
https://github.com/tarantool/tarantool/issues/3427
https://github.com/tarantool/tarantool/tree/OKriw/gh-3427-replication-no-sync-1.9
v1:
https://www.freelists.org/post/tarantool-patches/PATCH-replication-adds-replication-sync-after-cfg-update
v2:
https://www.freelists.org/post/tarantool-patches/PATCH-v2-replication-adds-replication-sync-after-cfg-update
v3:
https://www.freelists.org/post/tarantool-patches/PATCH-v3-box-adds-replication-sync-after-cfg-update
v4:
https://www.freelists.org/post/tarantool-patches/PATCH-v4-22-box-adds-replication-sync-after-cfg-update
v5:
https://www.freelists.org/post/tarantool-patches/PATCH-v5-33-box-adds-replication-sync-after-cfg-update
v6:
https://www.freelists.org/post/tarantool-patches/PATCH-v6-33-box-adds-replication-sync-after-cfg-update
Changes in v2:
- fixed test
- changed replicaset_sync
Changes in v3:
- now we raise the exception when sync is not successful.
- fixed test
- renamed test
Changes in v4:
- fixed test
- replication_sync_lag is made dynamic in a separate patch
- removed unnecessary error type
- moved say_crit to another place
- in case of sync error we rollback to prev. config
Changes in v5:
- added test case
- now we don't roll back to prev. cfg
Changes in v6:
- set orphan
- added testcases
Changes in v7:
- fixed test with orphan state (added error inj.)
- now we enter the orphan state at the end of sync - this point needs to be discussed
- no good check for a big replication lag - needs to be discussed
src/box/box.cc | 17 +++++
src/box/box.h | 8 +++
src/box/replication.cc | 1 +
test/replication/sync.result | 146 +++++++++++++++++++++++++++++++++++++++++
test/replication/sync.test.lua | 73 +++++++++++++++++++++
5 files changed, 245 insertions(+)
create mode 100644 test/replication/sync.result
create mode 100644 test/replication/sync.test.lua
diff --git a/src/box/box.cc b/src/box/box.cc
index dcedfd002..ca7bb60c2 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -231,6 +231,19 @@ box_clear_orphan(void)
title("running");
}
+void
+box_set_orphan(void)
+{
+ if (is_orphan)
+ return; /* nothing to do */
+
+ is_orphan = true;
+ fiber_cond_broadcast(&ro_cond);
+
+ /* Update the title to reflect the new status. */
+ title("orphan");
+}
+
struct wal_stream {
struct xstream base;
/** How many rows have been recovered so far. */
@@ -646,6 +659,10 @@ box_set_replication(void)
box_sync_replication(true);
/* Follow replica */
replicaset_follow();
+ /* Sync the replica up to the quorum.
+ * If we fail to sync up, the replica is left in the orphan state.
+ */
+ replicaset_sync();
}
void
diff --git a/src/box/box.h b/src/box/box.h
index 6e1c13f59..15a305fb1 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -107,6 +107,14 @@ box_wait_ro(bool ro, double timeout);
void
box_clear_orphan(void);
+/**
+ * Switch this instance from 'running' to 'orphan' state.
+ * Called on configuration change if this instance failed to
+ * synchronize with enough replicas to form a quorum.
+ */
+void
+box_set_orphan(void);
+
/** True if snapshot is in progress. */
extern bool box_checkpoint_is_in_progress;
/** Incremented with each next snapshot. */
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 4001c86a3..ef365c17f 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -695,6 +695,7 @@ replicaset_sync(void)
* in 'orphan' state.
*/
say_crit("entering orphan mode");
+ box_set_orphan();
return;
}
diff --git a/test/replication/sync.result b/test/replication/sync.result
new file mode 100644
index 000000000..bca3250d1
--- /dev/null
+++ b/test/replication/sync.result
@@ -0,0 +1,146 @@
+fiber = require('fiber')
+---
+...
+--
+-- gh-3427: no sync after configuration update
+--
+--
+-- successful sync
+--
+env = require('test_run')
+---
+...
+test_run = env.new()
+---
+...
+engine = test_run:get_cfg('engine')
+---
+...
+box.schema.user.grant('guest', 'replication')
+---
+...
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+---
+- true
+...
+test_run:cmd("start server replica")
+---
+- true
+...
+s = box.schema.space.create('test', {engine = engine})
+---
+...
+index = s:create_index('primary')
+---
+...
+-- change replica configuration
+test_run:cmd("switch replica")
+---
+- true
+...
+replication = box.cfg.replication
+---
+...
+box.cfg{replication={}}
+---
+...
+test_run:cmd("switch default")
+---
+- true
+...
+-- insert values on the master while replica is unconfigured
+box.begin() for i = 1, 100 do box.space.test:insert{i, i} end box.commit()
+---
+...
+box.space.test:count()
+---
+- 100
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.cfg{replication = replication}
+---
+...
+box.space.test:count() == 100
+---
+- true
+...
+--
+-- unsuccessful sync entering orphan state
+--
+box.cfg{replication={}}
+---
+...
+box.cfg{replication_sync_timeout = 0.000001}
+---
+...
+test_run:cmd("switch default")
+---
+- true
+...
+-- insert values on the master while replica is unconfigured
+box.begin() for i = 101, 200 do box.space.test:insert{i, i} end box.commit()
+---
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.cfg{replication = replication}
+---
+...
+status = box.info.status
+---
+...
+status == "orphan"
+---
+- true
+...
+while status == "orphan" do require'fiber'.sleep(0.1) status = box.info.status end
+---
+...
+--
+-- replication_sync_lag is too big
+--
+box.cfg{replication_sync_lag = 100}
+---
+...
+test_run:cmd("switch default")
+---
+- true
+...
+function f () box.begin() for i = 201, 500 do box.space.test:insert{i, i} end box.commit(); end
+---
+...
+_=fiber.create(f)
+---
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.space.test:count() < 500
+---
+- true
+...
+test_run:cmd("switch default")
+---
+- true
+...
+-- cleanup
+test_run:cmd("stop server replica")
+---
+- true
+...
+test_run:cmd("cleanup server replica")
+---
+- true
+...
+box.space.test:drop()
+---
+...
+box.schema.user.revoke('guest', 'replication')
+---
+...
diff --git a/test/replication/sync.test.lua b/test/replication/sync.test.lua
new file mode 100644
index 000000000..c15581a50
--- /dev/null
+++ b/test/replication/sync.test.lua
@@ -0,0 +1,73 @@
+fiber = require('fiber')
+--
+-- gh-3427: no sync after configuration update
+--
+
+--
+-- successful sync
+--
+
+env = require('test_run')
+test_run = env.new()
+engine = test_run:get_cfg('engine')
+
+box.schema.user.grant('guest', 'replication')
+
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+test_run:cmd("start server replica")
+
+s = box.schema.space.create('test', {engine = engine})
+index = s:create_index('primary')
+
+-- change replica configuration
+test_run:cmd("switch replica")
+replication = box.cfg.replication
+box.cfg{replication={}}
+
+test_run:cmd("switch default")
+-- insert values on the master while replica is unconfigured
+box.begin() for i = 1, 100 do box.space.test:insert{i, i} end box.commit()
+box.space.test:count()
+
+test_run:cmd("switch replica")
+box.cfg{replication = replication}
+box.space.test:count() == 100
+
+--
+-- unsuccessful sync entering orphan state
+--
+box.cfg{replication={}}
+box.cfg{replication_sync_timeout = 1}
+
+test_run:cmd("switch default")
+-- insert values on the master while replica is unconfigured
+box.begin() for i = 101, 200 do box.space.test:insert{i, i} end box.commit()
+box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.01)
+test_run:cmd("switch replica")
+
+box.cfg{replication = replication}
+status = box.info.status
+status == "orphan"
+
+while status == "orphan" do require'fiber'.sleep(0.1) status = box.info.status end
+
+--
+-- replication_sync_lag is too big
+--
+
+box.cfg{replication_sync_lag = 100}
+
+test_run:cmd("switch default")
+
+function f () box.begin() for i = 201, 500 do box.space.test:insert{i, i} end box.commit(); end
+_=fiber.create(f)
+
+test_run:cmd("switch replica")
+box.space.test:count() < 500
+
+test_run:cmd("switch default")
+-- cleanup
+test_run:cmd("stop server replica")
+test_run:cmd("cleanup server replica")
+box.space.test:drop()
+box.schema.user.revoke('guest', 'replication')
--
2.14.3 (Apple Git-98)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [tarantool-patches] [PATCH v7 3/3] box: adds replication sync after cfg. update
2018-08-30 23:38 ` [tarantool-patches] [PATCH v7 3/3] box: adds replication sync after cfg. update Olga Arkhangelskaia
@ 2018-08-31 13:01 ` Vladimir Davydov
0 siblings, 0 replies; 6+ messages in thread
From: Vladimir Davydov @ 2018-08-31 13:01 UTC (permalink / raw)
To: Konstantin Osipov; +Cc: tarantool-patches, Olga Arkhangelskaia
I reworked this patch, added a proper test, and force-pushed it
to the branch:
https://github.com/tarantool/tarantool/commits/OKriw/gh-3427-replication-no-sync-1.9
Kostja, please take a look. The new patch is below:
From 43839be2feb3aad444b44b84076d5f0f6b374b55 Mon Sep 17 00:00:00 2001
From: Vladimir Davydov <vdavydov.dev@gmail.com>
Date: Fri, 31 Aug 2018 13:11:58 +0300
Subject: [PATCH] box: sync on replication configuration update
Now box.cfg() doesn't return until 'quorum' appliers are in sync not
only on initial configuration, but also on replication configuration
update. If it fails to synchronize within replication_sync_timeout,
box.cfg() returns without an error, but the instance enters 'orphan'
state, which is basically read-only mode. In the meantime, appliers
will keep trying to synchronize in the background, and the instance
will leave 'orphan' state as soon as enough appliers are in sync.
Note, this patch also changes logging a bit:
- 'ready to accept request' is printed on startup before syncing
with the replica set, because although the instance is read-only
at that time, it can indeed accept all sorts of ro requests.
- For 'connecting', 'connected', 'synchronizing' messages, we now
use 'info' logging level, not 'verbose' as they used to be, because
those messages are important as they give the admin an idea of what's
going on with the instance, and they can't flood logs.
- 'sync complete' message is also printed as 'info', not 'crit',
because there's nothing critical about it (it's not an error).
Also note that we only enter 'orphan' state if we failed to synchronize.
In particular, if the instance manages to synchronize with all replicas
within a timeout, it will jump from 'loading' straight into 'running'
bypassing 'orphan' state. This is done for the sake of consistency
between initial configuration and reconfiguration.
Closes #3427
@TarantoolBot document
Title: Sync on replication configuration update
The behavior of box.cfg() on replication configuration update is
now consistent with initial configuration, that is box.cfg() will
not return until it synchronizes with as many masters as specified
by replication_connect_quorum configuration option or the timeout
specified by replication_connect_sync occurs. On timeout, it will
return without an error, but the instance will enter 'orphan' state.
It will leave 'orphan' state as soon as enough appliers have synced.
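The box.cc hunk below replaces the separate set/clear helpers with one idempotent box_set_orphan(bool): the broadcast and title change fire only on an actual state transition, so repeated calls with the same value are no-ops. A standalone model of that toggle (the counter stands in for fiber_cond_broadcast(); not the real box.cc):

```c
#include <assert.h>
#include <stdbool.h>

static bool is_orphan;	/* current state; starts as 'running' */
static int broadcasts;	/* stands in for fiber_cond_broadcast() */

static void
set_orphan(bool orphan)
{
	if (is_orphan == orphan)
		return;	/* nothing to do: no wakeup, no retitle */
	is_orphan = orphan;
	/* Wake fibers blocked in box_wait_ro() on a real transition. */
	broadcasts++;
}
```

Making the toggle idempotent lets both replicaset_sync() and replicaset_check_quorum() call it unconditionally.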
diff --git a/src/box/box.cc b/src/box/box.cc
index dcedfd00..63b2974f 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -110,7 +110,7 @@ static fiber_cond ro_cond;
* synchronize to a sufficient number of replicas to form
* a quorum and so was forced to switch to read-only mode.
*/
-static bool is_orphan = true;
+static bool is_orphan;
/* Use the shared instance of xstream for all appliers */
static struct xstream join_stream;
@@ -219,16 +219,22 @@ box_wait_ro(bool ro, double timeout)
}
void
-box_clear_orphan(void)
+box_set_orphan(bool orphan)
{
- if (!is_orphan)
+ if (is_orphan == orphan)
return; /* nothing to do */
- is_orphan = false;
+ is_orphan = orphan;
fiber_cond_broadcast(&ro_cond);
/* Update the title to reflect the new status. */
- title("running");
+ if (is_orphan) {
+ say_crit("entering orphan mode");
+ title("orphan");
+ } else {
+ say_crit("leaving orphan mode");
+ title("running");
+ }
}
struct wal_stream {
@@ -646,6 +652,8 @@ box_set_replication(void)
box_sync_replication(true);
/* Follow replica */
replicaset_follow();
+ /* Wait until appliers are in sync */
+ replicaset_sync();
}
void
@@ -1893,8 +1901,6 @@ box_cfg_xc(void)
/** Begin listening only when the local recovery is complete. */
box_listen();
- title("orphan");
-
/*
* In case of recovering from a checkpoint we
* don't need to wait for 'quorum' masters, since
@@ -1913,8 +1919,6 @@ box_cfg_xc(void)
*/
box_listen();
- title("orphan");
-
/*
* Wait for the cluster to start up.
*
@@ -1951,25 +1955,17 @@ box_cfg_xc(void)
rmean_cleanup(rmean_box);
- /*
- * If this instance is a leader of a newly bootstrapped
- * cluster, it is uptodate by definition so leave the
- * 'orphan' mode right away to let it initialize cluster
- * schema.
- */
- if (is_bootstrap_leader)
- box_clear_orphan();
-
/* Follow replica */
replicaset_follow();
fiber_gc();
is_box_configured = true;
+ title("running");
+ say_info("ready to accept requests");
+
if (!is_bootstrap_leader)
replicaset_sync();
-
- say_info("ready to accept requests");
}
void
diff --git a/src/box/box.h b/src/box/box.h
index 6e1c13f5..5e5ec7d3 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -100,12 +100,16 @@ int
box_wait_ro(bool ro, double timeout);
/**
- * Switch this instance from 'orphan' to 'running' state.
- * Called on initial configuration as soon as this instance
- * synchronizes with enough replicas to form a quorum.
+ * Switch this instance from 'orphan' to 'running' state or
+ * vice versa depending on the value of the function argument.
+ *
+ * An instance enters 'orphan' state on returning from box.cfg()
+ * if it failed to synchronize with 'quorum' replicas within a
+ * specified timeout. It will keep trying to synchronize in the
+ * background and leave 'orphan' state once it's done.
*/
void
-box_clear_orphan(void);
+box_set_orphan(bool orphan);
/** True if snapshot is in progress. */
extern bool box_checkpoint_is_in_progress;
diff --git a/src/box/replication.cc b/src/box/replication.cc
index ff56f442..b0952740 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -548,7 +548,8 @@ replicaset_connect(struct applier **appliers, int count,
replicaset_update(appliers, count);
return;
}
- say_verbose("connecting to %d replicas", count);
+
+ say_info("connecting to %d replicas", count);
/*
* Simultaneously connect to remote peers to receive their UUIDs
@@ -602,7 +603,7 @@ replicaset_connect(struct applier **appliers, int count,
if (connect_quorum && state.connected < quorum)
goto error;
} else {
- say_verbose("connected to %d replicas", state.connected);
+ say_info("connected to %d replicas", state.connected);
}
for (int i = 0; i < count; i++) {
@@ -636,13 +637,6 @@ error:
void
replicaset_follow(void)
{
- if (replicaset.applier.total == 0) {
- /*
- * Replication is not configured.
- */
- box_clear_orphan();
- return;
- }
struct replica *replica;
replicaset_foreach(replica) {
/* Resume connected appliers. */
@@ -653,13 +647,6 @@ replicaset_follow(void)
/* Restart appliers that failed to connect. */
applier_start(replica->applier);
}
- if (replicaset_quorum() == 0) {
- /*
- * Leaving orphan mode immediately since
- * replication_connect_quorum is set to 0.
- */
- box_clear_orphan();
- }
}
void
@@ -667,10 +654,16 @@ replicaset_sync(void)
{
int quorum = replicaset_quorum();
- if (quorum == 0)
+ if (quorum == 0) {
+ /*
+ * Quorum is 0 or replication is not configured.
+ * Leaving 'orphan' state immediately.
+ */
+ box_set_orphan(false);
return;
+ }
- say_verbose("synchronizing with %d replicas", quorum);
+ say_info("synchronizing with %d replicas", quorum);
/*
* Wait until all connected replicas synchronize up to
@@ -691,22 +684,21 @@ replicaset_sync(void)
* Do not stall configuration, leave the instance
* in 'orphan' state.
*/
- say_crit("entering orphan mode");
- return;
+ say_crit("failed to synchronize with %d out of %d replicas",
+ replicaset.applier.total - replicaset.applier.synced,
+ replicaset.applier.total);
+ box_set_orphan(true);
+ } else {
+ say_info("replica set sync complete");
+ box_set_orphan(false);
}
-
- say_crit("replica set sync complete, quorum of %d "
- "replicas formed", quorum);
}
void
replicaset_check_quorum(void)
{
- if (replicaset.applier.synced >= replicaset_quorum()) {
- if (replicaset_quorum() > 0)
- say_crit("leaving orphan mode");
- box_clear_orphan();
- }
+ if (replicaset.applier.synced >= replicaset_quorum())
+ box_set_orphan(false);
}
void
diff --git a/test/replication/sync.result b/test/replication/sync.result
new file mode 100644
index 00000000..8994aa3e
--- /dev/null
+++ b/test/replication/sync.result
@@ -0,0 +1,236 @@
+fiber = require('fiber')
+---
+...
+test_run = require('test_run').new()
+---
+...
+engine = test_run:get_cfg('engine')
+---
+...
+box.schema.user.grant('guest', 'replication')
+---
+...
+_ = box.schema.space.create('test', {engine = engine})
+---
+...
+_ = box.space.test:create_index('pk')
+---
+...
+-- Slow down replication a little to test replication_sync_lag.
+box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.001)
+---
+- ok
+...
+-- Helper that adds some records to the space and then starts
+-- a fiber to add more records in the background.
+test_run:cmd("setopt delimiter ';'")
+---
+- true
+...
+count = 0;
+---
+...
+function fill()
+ for i = count + 1, count + 100 do
+ box.space.test:replace{i}
+ end
+ fiber.create(function()
+ for i = count + 101, count + 200 do
+ fiber.sleep(0.0001)
+ box.space.test:replace{i}
+ end
+ end)
+ count = count + 200
+end;
+---
+...
+test_run:cmd("setopt delimiter ''");
+---
+- true
+...
+-- Deploy a replica.
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+---
+- true
+...
+test_run:cmd("start server replica")
+---
+- true
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+fiber = require('fiber')
+---
+...
+-- Stop replication.
+replication = box.cfg.replication
+---
+...
+box.cfg{replication = {}}
+---
+...
+-- Fill the space.
+test_run:cmd("switch default")
+---
+- true
+...
+fill()
+---
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+-- Resume replication.
+--
+-- Since max allowed lag is small, all records should arrive
+-- by the time box.cfg() returns.
+--
+box.cfg{replication_sync_lag = 0.001}
+---
+...
+box.cfg{replication = replication}
+---
+...
+box.space.test:count() == 200
+---
+- true
+...
+box.info.status -- running
+---
+- running
+...
+box.info.ro -- false
+---
+- false
+...
+-- Stop replication.
+replication = box.cfg.replication
+---
+...
+box.cfg{replication = {}}
+---
+...
+-- Fill the space.
+test_run:cmd("switch default")
+---
+- true
+...
+fill()
+---
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+-- Resume replication.
+--
+-- Since max allowed lag is big, not all records will arrive
+-- upon returning from box.cfg() but the instance won't enter
+-- orphan state.
+--
+box.cfg{replication_sync_lag = 1}
+---
+...
+box.cfg{replication = replication}
+---
+...
+box.space.test:count() < 400
+---
+- true
+...
+box.info.status -- running
+---
+- running
+...
+box.info.ro -- false
+---
+- false
+...
+-- Wait for remaining rows to arrive.
+repeat fiber.sleep(0.01) until box.space.test:count() == 400
+---
+...
+-- Stop replication.
+replication = box.cfg.replication
+---
+...
+box.cfg{replication = {}}
+---
+...
+-- Fill the space.
+test_run:cmd("switch default")
+---
+- true
+...
+fill()
+---
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+-- Resume replication.
+--
+-- Since replication_sync_timeout is small, box.cfg() will return
+-- before all records arrive, and the instance will enter
+-- 'orphan' state.
+--
+box.cfg{replication_sync_lag = 0.001, replication_sync_timeout = 0.001}
+---
+...
+box.cfg{replication = replication}
+---
+...
+box.space.test:count() < 600
+---
+- true
+...
+box.info.status -- orphan
+---
+- orphan
+...
+box.info.ro -- true
+---
+- true
+...
+-- Wait for remaining rows to arrive.
+repeat fiber.sleep(0.01) until box.space.test:count() == 600
+---
+...
+-- Make sure replica leaves orphan state.
+repeat fiber.sleep(0.01) until box.info.status ~= 'orphan'
+---
+...
+box.info.status -- running
+---
+- running
+...
+box.info.ro -- false
+---
+- false
+...
+test_run:cmd("switch default")
+---
+- true
+...
+test_run:cmd("stop server replica")
+---
+- true
+...
+test_run:cmd("cleanup server replica")
+---
+- true
+...
+box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
+---
+- ok
+...
+box.space.test:drop()
+---
+...
+box.schema.user.revoke('guest', 'replication')
+---
+...
diff --git a/test/replication/sync.test.lua b/test/replication/sync.test.lua
new file mode 100644
index 00000000..3cef825d
--- /dev/null
+++ b/test/replication/sync.test.lua
@@ -0,0 +1,116 @@
+fiber = require('fiber')
+test_run = require('test_run').new()
+engine = test_run:get_cfg('engine')
+
+box.schema.user.grant('guest', 'replication')
+_ = box.schema.space.create('test', {engine = engine})
+_ = box.space.test:create_index('pk')
+
+-- Slow down replication a little to test replication_sync_lag.
+box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.001)
+
+-- Helper that adds some records to the space and then starts
+-- a fiber to add more records in the background.
+test_run:cmd("setopt delimiter ';'")
+count = 0;
+function fill()
+ for i = count + 1, count + 100 do
+ box.space.test:replace{i}
+ end
+ fiber.create(function()
+ for i = count + 101, count + 200 do
+ fiber.sleep(0.0001)
+ box.space.test:replace{i}
+ end
+ end)
+ count = count + 200
+end;
+test_run:cmd("setopt delimiter ''");
+
+-- Deploy a replica.
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+test_run:cmd("start server replica")
+test_run:cmd("switch replica")
+
+fiber = require('fiber')
+
+-- Stop replication.
+replication = box.cfg.replication
+box.cfg{replication = {}}
+
+-- Fill the space.
+test_run:cmd("switch default")
+fill()
+test_run:cmd("switch replica")
+
+-- Resume replication.
+--
+-- Since max allowed lag is small, all records should arrive
+-- by the time box.cfg() returns.
+--
+box.cfg{replication_sync_lag = 0.001}
+box.cfg{replication = replication}
+box.space.test:count() == 200
+box.info.status -- running
+box.info.ro -- false
+
+-- Stop replication.
+replication = box.cfg.replication
+box.cfg{replication = {}}
+
+-- Fill the space.
+test_run:cmd("switch default")
+fill()
+test_run:cmd("switch replica")
+
+-- Resume replication.
+--
+-- Since max allowed lag is big, not all records will arrive
+-- upon returning from box.cfg() but the instance won't enter
+-- orphan state.
+--
+box.cfg{replication_sync_lag = 1}
+box.cfg{replication = replication}
+box.space.test:count() < 400
+box.info.status -- running
+box.info.ro -- false
+
+-- Wait for remaining rows to arrive.
+repeat fiber.sleep(0.01) until box.space.test:count() == 400
+
+-- Stop replication.
+replication = box.cfg.replication
+box.cfg{replication = {}}
+
+-- Fill the space.
+test_run:cmd("switch default")
+fill()
+test_run:cmd("switch replica")
+
+-- Resume replication.
+--
+-- Since replication_sync_timeout is small, box.cfg() will return
+-- before all records arrive, and the instance will enter
+-- 'orphan' state.
+--
+box.cfg{replication_sync_lag = 0.001, replication_sync_timeout = 0.001}
+box.cfg{replication = replication}
+box.space.test:count() < 600
+box.info.status -- orphan
+box.info.ro -- true
+
+-- Wait for remaining rows to arrive.
+repeat fiber.sleep(0.01) until box.space.test:count() == 600
+
+-- Make sure replica leaves orphan state.
+repeat fiber.sleep(0.01) until box.info.status ~= 'orphan'
+box.info.status -- running
+box.info.ro -- false
+
+test_run:cmd("switch default")
+test_run:cmd("stop server replica")
+test_run:cmd("cleanup server replica")
+
+box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
+box.space.test:drop()
+box.schema.user.revoke('guest', 'replication')
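
The orphan transition this test exercises can also be reproduced interactively. A minimal sketch, assuming a replica that is already configured against a master and a `replication` variable holding the saved upstream URIs (the option values are illustrative):

```lua
-- On the replica: request a sync that cannot complete within the
-- timeout, so box.cfg() returns while the instance is still syncing.
box.cfg{replication_sync_lag = 0.001, replication_sync_timeout = 0.001}
box.cfg{replication = replication}  -- 'replication' saved earlier
box.info.status                     -- 'orphan' until sync completes
box.info.ro                         -- true while in 'orphan' state
```

Once the replica catches up to within `replication_sync_lag` of the master, it leaves 'orphan' state in the background and `box.info.status` returns to 'running'.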
* [tarantool-patches] Re: [PATCH 1/3] box: make replication_sync_lag option dynamic
2018-08-30 23:38 [tarantool-patches] [PATCH 1/3] box: make replication_sync_lag option dynamic Olga Arkhangelskaia
2018-08-30 23:38 ` [tarantool-patches] [PATCH v3 2/3] box: add replication_sync_timeout Olga Arkhangelskaia
2018-08-30 23:38 ` [tarantool-patches] [PATCH v7 3/3] box: adds replication sync after cfg. update Olga Arkhangelskaia
@ 2018-09-04 13:24 ` Kirill Yukhin
2 siblings, 0 replies; 6+ messages in thread
From: Kirill Yukhin @ 2018-09-04 13:24 UTC (permalink / raw)
To: tarantool-patches; +Cc: Olga Arkhangelskaia
Hello,
On Aug 31 02:38, Olga Arkhangelskaia wrote:
> In gh-3427 replication_sync_lag should be taken into account during
> replication reconfiguration. In order to configure replication properly,
> this parameter is made dynamic and can be changed on demand.
>
> @TarantoolBot document
> Title: replication_sync_lag option can be set dynamically
> replication_sync_lag can now be set at any time.
I've checked the updated patchset into the 1.9 branch.
--
Regards, Kirill Yukhin
* [tarantool-patches] [PATCH 1/3] box: make replication_sync_lag option dynamic
@ 2018-08-30 14:11 Olga Arkhangelskaia
0 siblings, 0 replies; 6+ messages in thread
From: Olga Arkhangelskaia @ 2018-08-30 14:11 UTC (permalink / raw)
To: tarantool-patches; +Cc: Olga Arkhangelskaia
In gh-3427 replication_sync_lag should be taken into account during
replication reconfiguration. In order to configure replication properly,
this parameter is made dynamic and can be changed on demand.
@TarantoolBot document
Title: replication_sync_lag option can be set dynamically
replication_sync_lag can now be set at any time.
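
For the docbot request, a minimal usage sketch (the values are illustrative):

```lua
-- Sketch: replication_sync_lag can now be changed at runtime,
-- without restarting the instance.
box.cfg{}                             -- initial configuration
box.cfg{replication_sync_lag = 0.1}   -- tighten the allowed lag
box.cfg.replication_sync_lag          -- 0.1
```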
---
src/box/box.cc | 8 +++++++-
src/box/box.h | 1 +
src/box/lua/cfg.cc | 12 ++++++++++++
src/box/lua/load_cfg.lua | 2 ++
test/box-tap/cfg.test.lua | 7 ++++++-
5 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index 8d7454d1f..7155ad085 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -656,6 +656,12 @@ box_set_replication_connect_quorum(void)
replicaset_check_quorum();
}
+void
+box_set_replication_sync_lag(void)
+{
+ replication_sync_lag = box_check_replication_sync_lag();
+}
+
void
box_bind(void)
{
@@ -1747,7 +1753,7 @@ box_cfg_xc(void)
box_set_replication_timeout();
box_set_replication_connect_timeout();
box_set_replication_connect_quorum();
- replication_sync_lag = box_check_replication_sync_lag();
+ box_set_replication_sync_lag();
xstream_create(&join_stream, apply_initial_join_row);
xstream_create(&subscribe_stream, apply_row);
diff --git a/src/box/box.h b/src/box/box.h
index 9dfb3fd2a..3090fdcdb 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -176,6 +176,7 @@ void box_set_vinyl_timeout(void);
void box_set_replication_timeout(void);
void box_set_replication_connect_timeout(void);
void box_set_replication_connect_quorum(void);
+void box_set_replication_sync_lag(void);
extern "C" {
#endif /* defined(__cplusplus) */
diff --git a/src/box/lua/cfg.cc b/src/box/lua/cfg.cc
index 0ca150877..5442723b5 100644
--- a/src/box/lua/cfg.cc
+++ b/src/box/lua/cfg.cc
@@ -262,6 +262,17 @@ lbox_cfg_set_replication_connect_quorum(struct lua_State *L)
return 0;
}
+static int
+lbox_cfg_set_replication_sync_lag(struct lua_State *L)
+{
+ try {
+ box_set_replication_sync_lag();
+ } catch (Exception *) {
+ luaT_error(L);
+ }
+ return 0;
+}
+
void
box_lua_cfg_init(struct lua_State *L)
{
@@ -286,6 +297,7 @@ box_lua_cfg_init(struct lua_State *L)
{"cfg_set_replication_timeout", lbox_cfg_set_replication_timeout},
{"cfg_set_replication_connect_timeout", lbox_cfg_set_replication_connect_timeout},
{"cfg_set_replication_connect_quorum", lbox_cfg_set_replication_connect_quorum},
+ {"cfg_set_replication_sync_lag", lbox_cfg_set_replication_sync_lag},
{NULL, NULL}
};
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index 2a7142de6..f803d8987 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -199,6 +199,7 @@ local dynamic_cfg = {
replication_timeout = private.cfg_set_replication_timeout,
replication_connect_timeout = private.cfg_set_replication_connect_timeout,
replication_connect_quorum = private.cfg_set_replication_connect_quorum,
+ replication_sync_lag = private.cfg_set_replication_sync_lag,
instance_uuid = function()
if box.cfg.instance_uuid ~= box.info.uuid then
box.error(box.error.CFG, 'instance_uuid',
@@ -220,6 +221,7 @@ local dynamic_cfg_skip_at_load = {
replication_timeout = true,
replication_connect_timeout = true,
replication_connect_quorum = true,
+ replication_sync_lag = true,
wal_dir_rescan_delay = true,
custom_proc_title = true,
force_recovery = true,
diff --git a/test/box-tap/cfg.test.lua b/test/box-tap/cfg.test.lua
index 5e72004ca..d315346de 100755
--- a/test/box-tap/cfg.test.lua
+++ b/test/box-tap/cfg.test.lua
@@ -6,7 +6,7 @@ local socket = require('socket')
local fio = require('fio')
local uuid = require('uuid')
local msgpack = require('msgpack')
-test:plan(90)
+test:plan(91)
--------------------------------------------------------------------------------
-- Invalid values
@@ -95,6 +95,11 @@ test:ok(status and result == 'table', 'configured box')
invalid('log_level', 'unknown')
+lag = box.cfg.replication_sync_lag
+status, result = pcall(box.cfg, {replication_sync_lag = 1})
+test:ok(status, "dynamic replication_sync_lag")
+pcall(box.cfg, {replication_sync_lag = lag})
+
--------------------------------------------------------------------------------
-- gh-534: Segmentation fault after two bad wal_mode settings
--------------------------------------------------------------------------------
--
2.14.3 (Apple Git-98)