* [tarantool-patches] [PATCH 1/3] box: make replication_sync_lag option dynamic @ 2018-08-30 23:38 Olga Arkhangelskaia
From: Olga Arkhangelskaia @ 2018-08-30 23:38 UTC
To: tarantool-patches; +Cc: Olga Arkhangelskaia

In gh-3427 replication_sync_lag should be taken into account during replication reconfiguration. In order to configure replication properly, this parameter is made dynamic and can be changed on demand.

@TarantoolBot document
Title: replication_sync_lag option can be set dynamically
replication_sync_lag can now be set at any time.

---
 src/box/box.cc | 8 +++++++- src/box/box.h | 1 + src/box/lua/cfg.cc | 12 ++++++++++++ src/box/lua/load_cfg.lua | 2 ++ test/box-tap/cfg.test.lua | 7 ++++++- 5 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc index 8d7454d1f..7155ad085 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -656,6 +656,12 @@ box_set_replication_connect_quorum(void) replicaset_check_quorum(); } +void +box_set_replication_sync_lag(void) +{ + replication_sync_lag = box_check_replication_sync_lag(); +} + void box_bind(void) { @@ -1747,7 +1753,7 @@ box_cfg_xc(void) box_set_replication_timeout(); box_set_replication_connect_timeout(); box_set_replication_connect_quorum(); - replication_sync_lag = box_check_replication_sync_lag(); + box_set_replication_sync_lag(); xstream_create(&join_stream, apply_initial_join_row); xstream_create(&subscribe_stream, apply_row); diff --git a/src/box/box.h b/src/box/box.h index 9dfb3fd2a..3090fdcdb 100644 --- a/src/box/box.h +++ b/src/box/box.h @@ -176,6 +176,7 @@ void box_set_vinyl_timeout(void); void box_set_replication_timeout(void); void box_set_replication_connect_timeout(void); void box_set_replication_connect_quorum(void); +void box_set_replication_sync_lag(void); extern "C" { #endif /*
defined(__cplusplus) */ diff --git a/src/box/lua/cfg.cc b/src/box/lua/cfg.cc index 0ca150877..5442723b5 100644 --- a/src/box/lua/cfg.cc +++ b/src/box/lua/cfg.cc @@ -262,6 +262,17 @@ lbox_cfg_set_replication_connect_quorum(struct lua_State *L) return 0; } +static int +lbox_cfg_set_replication_sync_lag(struct lua_State *L) +{ + try { + box_set_replication_sync_lag(); + } catch (Exception *) { + luaT_error(L); + } + return 0; +} + void box_lua_cfg_init(struct lua_State *L) { @@ -286,6 +297,7 @@ box_lua_cfg_init(struct lua_State *L) {"cfg_set_replication_timeout", lbox_cfg_set_replication_timeout}, {"cfg_set_replication_connect_timeout", lbox_cfg_set_replication_connect_timeout}, {"cfg_set_replication_connect_quorum", lbox_cfg_set_replication_connect_quorum}, + {"cfg_set_replication_sync_lag", lbox_cfg_set_replication_sync_lag}, {NULL, NULL} }; diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua index 2a7142de6..f803d8987 100644 --- a/src/box/lua/load_cfg.lua +++ b/src/box/lua/load_cfg.lua @@ -199,6 +199,7 @@ local dynamic_cfg = { replication_timeout = private.cfg_set_replication_timeout, replication_connect_timeout = private.cfg_set_replication_connect_timeout, replication_connect_quorum = private.cfg_set_replication_connect_quorum, + replication_sync_lag = private.cfg_set_replication_sync_lag, instance_uuid = function() if box.cfg.instance_uuid ~= box.info.uuid then box.error(box.error.CFG, 'instance_uuid', @@ -220,6 +221,7 @@ local dynamic_cfg_skip_at_load = { replication_timeout = true, replication_connect_timeout = true, replication_connect_quorum = true, + replication_sync_lag = true, wal_dir_rescan_delay = true, custom_proc_title = true, force_recovery = true, diff --git a/test/box-tap/cfg.test.lua b/test/box-tap/cfg.test.lua index 5e72004ca..d315346de 100755 --- a/test/box-tap/cfg.test.lua +++ b/test/box-tap/cfg.test.lua @@ -6,7 +6,7 @@ local socket = require('socket') local fio = require('fio') local uuid = require('uuid') local msgpack = 
require('msgpack') -test:plan(90) +test:plan(91) -------------------------------------------------------------------------------- -- Invalid values @@ -95,6 +95,11 @@ test:ok(status and result == 'table', 'configured box') invalid('log_level', 'unknown') +lag = box.cfg.replication_sync_lag +status, result = pcall(box.cfg, {replication_sync_lag = 1}) +test:ok(status, "dynamic replication_sync_lag") +pcall(box.cfg, {replication_sync_lag = lag}) + -------------------------------------------------------------------------------- -- gh-534: Segmentation fault after two bad wal_mode settings -------------------------------------------------------------------------------- -- 2.14.3 (Apple Git-98)
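The check/set split used in patch 1 is the pattern that makes a box.cfg option dynamic: box_check_* validates the configured value without side effects, and box_set_* applies it, so a rejected value never leaves the option half-applied. A minimal standalone sketch of that pattern follows; cfg_getd, the option table, and the exception type here are illustrative stand-ins, not Tarantool's actual definitions.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for Tarantool's configuration storage and the
// global option variable that replication code reads.
static std::map<std::string, double> cfg = {{"replication_sync_lag", 10.0}};
static double replication_sync_lag = 10.0;

static double cfg_getd(const std::string &name) { return cfg.at(name); }

// box_check_*: validate the configured value, raise on bad input,
// touch no global state.
static double box_check_replication_sync_lag() {
    double lag = cfg_getd("replication_sync_lag");
    if (lag <= 0)
        throw std::invalid_argument(
            "replication_sync_lag: the value must be greater than 0");
    return lag;
}

// box_set_*: apply the validated value. Because validation happens
// first, a failed box.cfg{} call leaves the old value intact - which
// is what makes it safe to call at any time, i.e. dynamic.
static void box_set_replication_sync_lag() {
    replication_sync_lag = box_check_replication_sync_lag();
}
```

The same shape repeats for every dynamic option in this series: the Lua-side dynamic_cfg table dispatches to a cfg_set_* trampoline, which calls the box_set_* function above.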
* [tarantool-patches] [PATCH v3 2/3] box: add replication_sync_timeout @ 2018-08-30 23:38 Olga Arkhangelskaia
From: Olga Arkhangelskaia @ 2018-08-30 23:38 UTC
To: tarantool-patches; +Cc: Olga Arkhangelskaia

In the scope of gh-3427 we need a timeout in case the replica set waits for synchronization for too long, or even forever. The default value is 300 seconds.

Closes #3674

@TarantoolBot document
Title: Introduce new option replication_sync_timeout.
After initial bootstrap or after a replication configuration change we need to sync up with the replication quorum. Sometimes the sync can take too long, or replication_sync_lag can be smaller than the network latency, in which case the replica would be stuck in a sync loop that cannot be cancelled. To avoid such situations, replication_sync_timeout can be used: when the time set in replication_sync_timeout has passed, the replica enters orphan state. The option can be set dynamically. The default value is 300 seconds.
---
https://github.com/tarantool/tarantool/issues/3647
https://github.com/tarantool/tarantool/tree/OKriw/gh-3427-replication-no-sync-1.9
v1: https://www.freelists.org/post/tarantool-patches/PATCH-23-box-add-replication-sync-lag-timeout
v2: https://www.freelists.org/post/tarantool-patches/PATCH-v2-23-box-add-replication-sync-timeout

Changes in v2:
- renamed replication_sync_lag_timeout to replication_sync_timeout
- fiber_cond_timeout changed to deadline
- default time is set to 300

Changes in v3:
- fixed spaces, empty lines

 src/box/box.cc | 19 ++++++++++++++++++ src/box/box.h | 1 + src/box/lua/cfg.cc | 12 ++++++++++++ src/box/lua/load_cfg.lua | 4 ++++ src/box/replication.cc | 14 +++++++++++--- src/box/replication.h | 6 ++++++ test/app-tap/init_script.result | 43 +++++++++++++++++++++-------------------- test/box-tap/cfg.test.lua | 9 ++++++++- test/box/admin.result | 2 ++ test/box/cfg.result | 4 ++++ 10 files changed, 89 insertions(+), 25 deletions(-)

diff --git a/src/box/box.cc b/src/box/box.cc index 7155ad085..dcedfd002 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -420,6 +420,17 @@ box_check_replication_sync_lag(void) return lag; } +static double +box_check_replication_sync_timeout(void) +{ + double timeout = cfg_getd("replication_sync_timeout"); + if (timeout <= 0) { + tnt_raise(ClientError, ER_CFG, "replication_sync_timeout", + "the value must be greater than 0"); + } + return timeout; +} + static void box_check_instance_uuid(struct tt_uuid *uuid) { @@ -546,6 +557,7 @@ box_check_config() box_check_replication_connect_timeout(); box_check_replication_connect_quorum(); box_check_replication_sync_lag(); + box_check_replication_sync_timeout(); box_check_readahead(cfg_geti("readahead")); box_check_checkpoint_count(cfg_geti("checkpoint_count")); box_check_wal_max_rows(cfg_geti64("rows_per_wal")); @@ -662,6 +674,12 @@ box_set_replication_sync_lag(void) replication_sync_lag = box_check_replication_sync_lag(); } +void +box_set_replication_sync_timeout(void) +{ +
replication_sync_timeout = box_check_replication_sync_timeout(); +} + void box_bind(void) { @@ -1754,6 +1772,7 @@ box_cfg_xc(void) box_set_replication_connect_timeout(); box_set_replication_connect_quorum(); box_set_replication_sync_lag(); + box_set_replication_sync_timeout(); xstream_create(&join_stream, apply_initial_join_row); xstream_create(&subscribe_stream, apply_row); diff --git a/src/box/box.h b/src/box/box.h index 3090fdcdb..6e1c13f59 100644 --- a/src/box/box.h +++ b/src/box/box.h @@ -177,6 +177,7 @@ void box_set_replication_timeout(void); void box_set_replication_connect_timeout(void); void box_set_replication_connect_quorum(void); void box_set_replication_sync_lag(void); +void box_set_replication_sync_timeout(void); extern "C" { #endif /* defined(__cplusplus) */ diff --git a/src/box/lua/cfg.cc b/src/box/lua/cfg.cc index 5442723b5..17431dc9f 100644 --- a/src/box/lua/cfg.cc +++ b/src/box/lua/cfg.cc @@ -273,6 +273,17 @@ lbox_cfg_set_replication_sync_lag(struct lua_State *L) return 0; } +static int +lbox_cfg_set_replication_sync_timeout(struct lua_State *L) +{ + try { + box_set_replication_sync_timeout(); + } catch (Exception *) { + luaT_error(L); + } + return 0; +} + void box_lua_cfg_init(struct lua_State *L) { @@ -298,6 +309,7 @@ box_lua_cfg_init(struct lua_State *L) {"cfg_set_replication_connect_timeout", lbox_cfg_set_replication_connect_timeout}, {"cfg_set_replication_connect_quorum", lbox_cfg_set_replication_connect_quorum}, {"cfg_set_replication_sync_lag", lbox_cfg_set_replication_sync_lag}, + {"cfg_set_replication_sync_timeout", lbox_cfg_set_replication_sync_timeout}, {NULL, NULL} }; diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua index f803d8987..01cfe0b4d 100644 --- a/src/box/lua/load_cfg.lua +++ b/src/box/lua/load_cfg.lua @@ -72,6 +72,7 @@ local default_cfg = { worker_pool_threads = 4, replication_timeout = 1, replication_sync_lag = 10, + replication_sync_timeout = 300, replication_connect_timeout = 30, replication_connect_quorum = 
nil, -- connect all } @@ -128,6 +129,7 @@ local template_cfg = { worker_pool_threads = 'number', replication_timeout = 'number', replication_sync_lag = 'number', + replication_sync_timeout = 'number', replication_connect_timeout = 'number', replication_connect_quorum = 'number', } @@ -200,6 +202,7 @@ local dynamic_cfg = { replication_connect_timeout = private.cfg_set_replication_connect_timeout, replication_connect_quorum = private.cfg_set_replication_connect_quorum, replication_sync_lag = private.cfg_set_replication_sync_lag, + replication_sync_timeout = private.cfg_set_replication_sync_timeout, instance_uuid = function() if box.cfg.instance_uuid ~= box.info.uuid then box.error(box.error.CFG, 'instance_uuid', @@ -222,6 +225,7 @@ local dynamic_cfg_skip_at_load = { replication_connect_timeout = true, replication_connect_quorum = true, replication_sync_lag = true, + replication_sync_timeout = true, wal_dir_rescan_delay = true, custom_proc_title = true, force_recovery = true, diff --git a/src/box/replication.cc b/src/box/replication.cc index 861ce34ea..4001c86a3 100644 --- a/src/box/replication.cc +++ b/src/box/replication.cc @@ -49,6 +49,7 @@ double replication_timeout = 1.0; /* seconds */ double replication_connect_timeout = 30.0; /* seconds */ int replication_connect_quorum = REPLICATION_CONNECT_QUORUM_ALL; double replication_sync_lag = 10.0; /* seconds */ +double replication_sync_timeout = 300.0; /* seconds */ struct replicaset replicaset; @@ -673,12 +674,19 @@ replicaset_sync(void) /* * Wait until all connected replicas synchronize up to - * replication_sync_lag + * replication_sync_lag or return on replication_sync_timeout */ + double start_time = ev_monotonic_now(loop()); + double deadline = start_time + replication_sync_timeout; while (replicaset.applier.synced < quorum && replicaset.applier.connected + - replicaset.applier.loading >= quorum) - fiber_cond_wait(&replicaset.applier.cond); + replicaset.applier.loading >= quorum) { + if 
(fiber_cond_wait_deadline(&replicaset.applier.cond, + deadline) != 0) { + break; + } + + } if (replicaset.applier.synced < quorum) { /* diff --git a/src/box/replication.h b/src/box/replication.h index 06a2867b6..a6f1dbf69 100644 --- a/src/box/replication.h +++ b/src/box/replication.h @@ -126,6 +126,12 @@ extern int replication_connect_quorum; */ extern double replication_sync_lag; +/** + * Time to wait before entering orphan state in case of unsuccessful + * synchronization. + */ +extern double replication_sync_timeout; + /** * Wait for the given period of time before trying to reconnect * to a master. diff --git a/test/app-tap/init_script.result b/test/app-tap/init_script.result index eea9f5bcf..261ddf3a4 100644 --- a/test/app-tap/init_script.result +++ b/test/app-tap/init_script.result @@ -23,27 +23,28 @@ box.cfg 18 readahead:16320 19 replication_connect_timeout:30 20 replication_sync_lag:10 -21 replication_timeout:1 -22 rows_per_wal:500000 -23 slab_alloc_factor:1.05 -24 too_long_threshold:0.5 -25 vinyl_bloom_fpr:0.05 -26 vinyl_cache:134217728 -27 vinyl_dir:. -28 vinyl_max_tuple_size:1048576 -29 vinyl_memory:134217728 -30 vinyl_page_size:8192 -31 vinyl_range_size:1073741824 -32 vinyl_read_threads:1 -33 vinyl_run_count_per_level:2 -34 vinyl_run_size_ratio:3.5 -35 vinyl_timeout:60 -36 vinyl_write_threads:2 -37 wal_dir:. -38 wal_dir_rescan_delay:2 -39 wal_max_size:268435456 -40 wal_mode:write -41 worker_pool_threads:4 +21 replication_sync_timeout:300 +22 replication_timeout:1 +23 rows_per_wal:500000 +24 slab_alloc_factor:1.05 +25 too_long_threshold:0.5 +26 vinyl_bloom_fpr:0.05 +27 vinyl_cache:134217728 +28 vinyl_dir:. +29 vinyl_max_tuple_size:1048576 +30 vinyl_memory:134217728 +31 vinyl_page_size:8192 +32 vinyl_range_size:1073741824 +33 vinyl_read_threads:1 +34 vinyl_run_count_per_level:2 +35 vinyl_run_size_ratio:3.5 +36 vinyl_timeout:60 +37 vinyl_write_threads:2 +38 wal_dir:.
+39 wal_dir_rescan_delay:2 +40 wal_max_size:268435456 +41 wal_mode:write +42 worker_pool_threads:4 -- -- Test insert from detached fiber -- diff --git a/test/box-tap/cfg.test.lua b/test/box-tap/cfg.test.lua index d315346de..023a2af72 100755 --- a/test/box-tap/cfg.test.lua +++ b/test/box-tap/cfg.test.lua @@ -6,7 +6,7 @@ local socket = require('socket') local fio = require('fio') local uuid = require('uuid') local msgpack = require('msgpack') -test:plan(91) +test:plan(94) -------------------------------------------------------------------------------- -- Invalid values @@ -29,6 +29,8 @@ invalid('replication_timeout', -1) invalid('replication_timeout', 0) invalid('replication_sync_lag', -1) invalid('replication_sync_lag', 0) +invalid('replication_sync_timeout', -1) +invalid('replication_sync_timeout', 0) invalid('replication_connect_timeout', -1) invalid('replication_connect_timeout', 0) invalid('replication_connect_quorum', -1) @@ -100,6 +102,11 @@ status, result = pcall(box.cfg, {replication_sync_lag = 1}) test:ok(status, "dynamic replication_sync_lag") pcall(box.cfg, {replication_sync_lag = lag}) +timeout = box.cfg.replication_sync_timeout +status, result = pcall(box.cfg, {replication_sync_timeout = 10}) +test:ok(status, "dynamic replication_sync_timeout") +pcall(box.cfg, {replication_sync_timeout = timeout}) + -------------------------------------------------------------------------------- -- gh-534: Segmentation fault after two bad wal_mode settings -------------------------------------------------------------------------------- diff --git a/test/box/admin.result b/test/box/admin.result index c3e318a6a..ace88e6e9 100644 --- a/test/box/admin.result +++ b/test/box/admin.result @@ -58,6 +58,8 @@ cfg_filter(box.cfg) - 30 - - replication_sync_lag - 10 + - - replication_sync_timeout + - 300 - - replication_timeout - 1 - - rows_per_wal diff --git a/test/box/cfg.result b/test/box/cfg.result index a2df83310..816178513 100644 ---
b/test/box/cfg.result @@ -54,6 +54,8 @@ cfg_filter(box.cfg) - 30 - - replication_sync_lag - 10 + - - replication_sync_timeout + - 300 - - replication_timeout - 1 - - rows_per_wal @@ -143,6 +145,8 @@ cfg_filter(box.cfg) - 30 - - replication_sync_lag - 10 + - - replication_sync_timeout + - 300 - - replication_timeout - 1 - - rows_per_wal -- 2.14.3 (Apple Git-98)
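The v2-to-v3 change in this patch replaced a per-wait timeout with an absolute deadline computed once before the loop, so spurious wakeups of the applier condition variable do not extend the total wait. Tarantool does this with fiber_cond_wait_deadline on cooperative fibers; the sketch below shows the same deadline pattern with std::condition_variable as a thread-based stand-in (the function name and signature are illustrative, not Tarantool API).

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Wait until pred() becomes true, but never past `deadline`.
// The deadline is fixed by the caller before the loop starts, mirroring
// the patch: waking up and re-checking does not restart the clock.
template <class Pred>
bool sync_with_deadline(std::mutex &m, std::condition_variable &cv,
                        std::chrono::steady_clock::time_point deadline,
                        Pred pred) {
    std::unique_lock<std::mutex> lock(m);
    while (!pred()) {
        if (cv.wait_until(lock, deadline) == std::cv_status::timeout)
            return pred();  // timed out: report whatever state we reached
    }
    return true;
}
```

In replicaset_sync() the predicate corresponds to "synced appliers reached quorum", and a false return is the path that leaves the instance in orphan state in patch 3.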
* [tarantool-patches] [PATCH v7 3/3] box: adds replication sync after cfg. update @ 2018-08-30 23:38 Olga Arkhangelskaia
From: Olga Arkhangelskaia @ 2018-08-30 23:38 UTC
To: tarantool-patches; +Cc: Olga Arkhangelskaia

When a replica reconnects to the replica set (not for the first time), there is no synchronization. Such behavior leads to giving away outdated data.

Closes #3427

@TarantoolBot document
Title: Orphan status after configuration update or initial bootstrap.
After initial bootstrap or a configuration update an instance can get orphan status in two cases: if it synced up with fewer replicas than the quorum, or if it failed to sync up within the time specified in replication_sync_timeout.
---
https://github.com/tarantool/tarantool/issues/3427
https://github.com/tarantool/tarantool/tree/OKriw/gh-3427-replication-no-sync-1.9
v1: https://www.freelists.org/post/tarantool-patches/PATCH-replication-adds-replication-sync-after-cfg-update
v2: https://www.freelists.org/post/tarantool-patches/PATCH-v2-replication-adds-replication-sync-after-cfg-update
v3: https://www.freelists.org/post/tarantool-patches/PATCH-v3-box-adds-replication-sync-after-cfg-update
v4: https://www.freelists.org/post/tarantool-patches/PATCH-v4-22-box-adds-replication-sync-after-cfg-update
v5: https://www.freelists.org/post/tarantool-patches/PATCH-v5-33-box-adds-replication-sync-after-cfg-update
v6: https://www.freelists.org/post/tarantool-patches/PATCH-v6-33-box-adds-replication-sync-after-cfg-update

Changes in v2:
- fixed test
- changed replicaset_sync

Changes in v3:
- now we raise the exception when sync is not successful
- fixed test
- renamed test

Changes in v4:
- fixed test
- replication_sync_lag is made dynamic in a separate patch
- removed unnecessary error type
- moved say_crit to another place
- in case of sync error we roll back to the previous config

Changes in v5:
- added test case
- now we don't roll back to the previous cfg

Changes in v6:
- set orphan
- added testcases

Changes in v7:
- fixed test with orphan state (added error inj.)
- now we enter orphan state at the end of sync - this point needs to be discussed
- no good check for big replication lag - needs to be discussed

 src/box/box.cc | 17 +++++ src/box/box.h | 8 +++ src/box/replication.cc | 1 + test/replication/sync.result | 146 +++++++++++++++++++++++++++++++++++++++++ test/replication/sync.test.lua | 73 +++++++++++++++++++++ 5 files changed, 245 insertions(+) create mode 100644 test/replication/sync.result create mode 100644 test/replication/sync.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc index dcedfd002..ca7bb60c2 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -231,6 +231,19 @@ box_clear_orphan(void) title("running"); } +void +box_set_orphan(void) +{ + if (is_orphan) + return; /* nothing to do */ + + is_orphan = true; + fiber_cond_broadcast(&ro_cond); + + /* Update the title to reflect the new status. */ + title("orphan"); +} + struct wal_stream { struct xstream base; /** How many rows have been recovered so far. */ @@ -646,6 +659,10 @@ box_set_replication(void) box_sync_replication(true); /* Follow replica */ replicaset_follow(); + /* Set orphan and sync replica up to quorum. + * If we fail to sync up, replica will be left in orphan state. + */ + replicaset_sync(); } void diff --git a/src/box/box.h b/src/box/box.h index 6e1c13f59..15a305fb1 100644 --- a/src/box/box.h +++ b/src/box/box.h @@ -107,6 +107,14 @@ box_wait_ro(bool ro, double timeout); void box_clear_orphan(void); +/** + * Switch this instance from 'running' to 'orphan' state. + * Called on configuration change if this instance failed to + * synchronize with enough replicas to form a quorum. + */ +void +box_set_orphan(void); + /** True if snapshot is in progress. */ extern bool box_checkpoint_is_in_progress; /** Incremented with each next snapshot.
*/ diff --git a/src/box/replication.cc b/src/box/replication.cc index 4001c86a3..ef365c17f 100644 --- a/src/box/replication.cc +++ b/src/box/replication.cc @@ -695,6 +695,7 @@ replicaset_sync(void) * in 'orphan' state. */ say_crit("entering orphan mode"); + box_set_orphan(); return; } diff --git a/test/replication/sync.result b/test/replication/sync.result new file mode 100644 index 000000000..bca3250d1 --- /dev/null +++ b/test/replication/sync.result @@ -0,0 +1,146 @@ +fiber = require('fiber') +--- +... +-- +-- gh-3427: no sync after configuration update +-- +-- +-- successful sync +-- +env = require('test_run') +--- +... +test_run = env.new() +--- +... +engine = test_run:get_cfg('engine') +--- +... +box.schema.user.grant('guest', 'replication') +--- +... +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") +--- +- true +... +test_run:cmd("start server replica") +--- +- true +... +s = box.schema.space.create('test', {engine = engine}) +--- +... +index = s:create_index('primary') +--- +... +-- change replica configuration +test_run:cmd("switch replica") +--- +- true +... +replication = box.cfg.replication +--- +... +box.cfg{replication={}} +--- +... +test_run:cmd("switch default") +--- +- true +... +-- insert values on the master while replica is unconfigured +box.begin() for i = 1, 100 do box.space.test:insert{i, i} end box.commit() +--- +... +box.space.test:count() +--- +- 100 +... +test_run:cmd("switch replica") +--- +- true +... +box.cfg{replication = replication} +--- +... +box.space.test:count() == 100 +--- +- true +... +-- +-- unsuccessful sync entering orphan state +-- +box.cfg{replication={}} +--- +... +box.cfg{replication_sync_timeout = 0.000001} +--- +... +test_run:cmd("switch default") +--- +- true +... +-- insert values on the master while replica is unconfigured +box.begin() for i = 101, 200 do box.space.test:insert{i, i} end box.commit() +--- +... +test_run:cmd("switch replica") +--- +- true +... 
+box.cfg{replication = replication} +--- +... +status = box.info.status +--- +... +status == "orphan" +--- +- true +... +while status == "orphan" do require'fiber'.sleep(0.1) status = box.info.status end +--- +... +-- +-- replication_sync_lag is too big +-- +box.cfg{replication_sync_lag = 100} +--- +... +test_run:cmd("switch default") +--- +- true +... +function f () box.begin() for i = 201, 500 do box.space.test:insert{i, i} end box.commit(); end +--- +... +_=fiber.create(f) +--- +... +test_run:cmd("switch replica") +--- +- true +... +box.space.test:count() < 500 +--- +- true +... +test_run:cmd("switch default") +--- +- true +... +-- cleanup +test_run:cmd("stop server replica") +--- +- true +... +test_run:cmd("cleanup server replica") +--- +- true +... +box.space.test:drop() +--- +... +box.schema.user.revoke('guest', 'replication') +--- +... diff --git a/test/replication/sync.test.lua b/test/replication/sync.test.lua new file mode 100644 index 000000000..c15581a50 --- /dev/null +++ b/test/replication/sync.test.lua @@ -0,0 +1,73 @@ +fiber = require('fiber') +-- +-- gh-3427: no sync after configuration update +-- + +-- +-- successful sync +-- + +env = require('test_run') +test_run = env.new() +engine = test_run:get_cfg('engine') + +box.schema.user.grant('guest', 'replication') + +test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'") +test_run:cmd("start server replica") + +s = box.schema.space.create('test', {engine = engine}) +index = s:create_index('primary') + +-- change replica configuration +test_run:cmd("switch replica") +replication = box.cfg.replication +box.cfg{replication={}} + +test_run:cmd("switch default") +-- insert values on the master while replica is unconfigured +box.begin() for i = 1, 100 do box.space.test:insert{i, i} end box.commit() +box.space.test:count() + +test_run:cmd("switch replica") +box.cfg{replication = replication} +box.space.test:count() == 100 + +-- +-- unsuccessful sync entering orphan 
state +-- +box.cfg{replication={}} +box.cfg{replication_sync_timeout = 1} + +test_run:cmd("switch default") +-- insert values on the master while replica is unconfigured +box.begin() for i = 101, 200 do box.space.test:insert{i, i} end box.commit() +box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.01) +test_run:cmd("switch replica") + +box.cfg{replication = replication} +status = box.info.status +status == "orphan" + +while status == "orphan" do require'fiber'.sleep(0.1) status = box.info.status end + +-- +-- replication_sync_lag is too big +-- + +box.cfg{replication_sync_lag = 100} + +test_run:cmd("switch default") + +function f () box.begin() for i = 201, 500 do box.space.test:insert{i, i} end box.commit(); end +_=fiber.create(f) + +test_run:cmd("switch replica") +box.space.test:count() < 500 + +test_run:cmd("switch default") +-- cleanup
+test_run:cmd("stop server replica") +test_run:cmd("cleanup server replica") +box.space.test:drop() +box.schema.user.revoke('guest', 'replication') -- 2.14.3 (Apple Git-98)
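The orphan flag this patch sets (and that the follow-up rework merges into a single box_set_orphan(bool)) boils down to an idempotent read-only gate: flip the flag, retitle the process, and wake whoever is blocked waiting for the state to change. A hedged model of that state machine follows; Instance and its fields are illustrative, not Tarantool's types, and the fiber broadcast is reduced to a comment.

```cpp
#include <cassert>
#include <string>

// Minimal model of the instance status flag. is_orphan guards
// read-only mode, and the process title mirrors it. The early return
// makes the transition idempotent, as in the patch.
struct Instance {
    bool is_orphan = false;
    std::string title = "running";

    void set_orphan(bool orphan) {
        if (is_orphan == orphan)
            return; /* nothing to do */
        is_orphan = orphan;
        title = orphan ? "orphan" : "running";
        // The real code also does fiber_cond_broadcast(&ro_cond) here,
        // so fibers blocked in box_wait_ro() re-check the state.
    }

    bool is_ro() const { return is_orphan; }
};
```

With this shape, replicaset_sync() flips the gate closed on timeout and replicaset_check_quorum() flips it open once enough appliers have synced, which is exactly the background recovery path described in the reply below.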
* Re: [tarantool-patches] [PATCH v7 3/3] box: adds replication sync after cfg. update @ 2018-08-31 13:01 Vladimir Davydov
From: Vladimir Davydov @ 2018-08-31 13:01 UTC
To: Konstantin Osipov; +Cc: tarantool-patches, Olga Arkhangelskaia

I reworked this patch, added a proper test, and force-pushed it to the branch: https://github.com/tarantool/tarantool/commits/OKriw/gh-3427-replication-no-sync-1.9

Kostja, please take a look. The new patch is below:

From 43839be2feb3aad444b44b84076d5f0f6b374b55 Mon Sep 17 00:00:00 2001
From: Vladimir Davydov <vdavydov.dev@gmail.com>
Date: Fri, 31 Aug 2018 13:11:58 +0300
Subject: [PATCH] box: sync on replication configuration update

Now box.cfg() doesn't return until 'quorum' appliers are in sync not only on initial configuration, but also on replication configuration update. If it fails to synchronize within replication_sync_timeout, box.cfg() returns without an error, but the instance enters 'orphan' state, which is basically read-only mode. In the meantime, appliers will keep trying to synchronize in the background, and the instance will leave 'orphan' state as soon as enough appliers are in sync.

Note, this patch also changes logging a bit:
- 'ready to accept requests' is printed on startup before syncing with the replica set, because although the instance is read-only at that time, it can indeed accept all sorts of ro requests.
- For 'connecting', 'connected', 'synchronizing' messages, we now use the 'info' logging level, not 'verbose' as they used to be, because those messages are important as they give the admin an idea of what's going on with the instance, and they can't flood logs.
- The 'sync complete' message is also printed as 'info', not 'crit', because there's nothing critical about it (it's not an error).
Also note that we only enter 'orphan' state if we failed to synchronize. In particular, if the instance manages to synchronize with all replicas within a timeout, it will jump from 'loading' straight into 'running', bypassing 'orphan' state. This is done for the sake of consistency between initial configuration and reconfiguration.

Closes #3427

@TarantoolBot document
Title: Sync on replication configuration update
The behavior of box.cfg() on replication configuration update is now consistent with initial configuration, that is, box.cfg() will not return until it synchronizes with as many masters as specified by the replication_connect_quorum configuration option or the timeout specified by replication_sync_timeout occurs. On timeout, it will return without an error, but the instance will enter 'orphan' state. It will leave 'orphan' state as soon as enough appliers have synced.

diff --git a/src/box/box.cc b/src/box/box.cc index dcedfd00..63b2974f 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -110,7 +110,7 @@ static fiber_cond ro_cond; * synchronize to a sufficient number of replicas to form * a quorum and so was forced to switch to read-only mode. */ -static bool is_orphan = true; +static bool is_orphan; /* Use the shared instance of xstream for all appliers */ static struct xstream join_stream; @@ -219,16 +219,22 @@ box_wait_ro(bool ro, double timeout) } void -box_clear_orphan(void) +box_set_orphan(bool orphan) { - if (!is_orphan) + if (is_orphan == orphan) return; /* nothing to do */ - is_orphan = false; + is_orphan = orphan; fiber_cond_broadcast(&ro_cond); /* Update the title to reflect the new status.
*/ - title("running"); + if (is_orphan) { + say_crit("entering orphan mode"); + title("orphan"); + } else { + say_crit("leaving orphan mode"); + title("running"); + } } struct wal_stream { @@ -646,6 +652,8 @@ box_set_replication(void) box_sync_replication(true); /* Follow replica */ replicaset_follow(); + /* Wait until appliers are in sync */ + replicaset_sync(); } void @@ -1893,8 +1901,6 @@ box_cfg_xc(void) /** Begin listening only when the local recovery is complete. */ box_listen(); - title("orphan"); - /* * In case of recovering from a checkpoint we * don't need to wait for 'quorum' masters, since @@ -1913,8 +1919,6 @@ box_cfg_xc(void) */ box_listen(); - title("orphan"); - /* * Wait for the cluster to start up. * @@ -1951,25 +1955,17 @@ box_cfg_xc(void) rmean_cleanup(rmean_box); - /* - * If this instance is a leader of a newly bootstrapped - * cluster, it is uptodate by definition so leave the - * 'orphan' mode right away to let it initialize cluster - * schema. - */ - if (is_bootstrap_leader) - box_clear_orphan(); - /* Follow replica */ replicaset_follow(); fiber_gc(); is_box_configured = true; + title("running"); + say_info("ready to accept requests"); + if (!is_bootstrap_leader) replicaset_sync(); - - say_info("ready to accept requests"); } void diff --git a/src/box/box.h b/src/box/box.h index 6e1c13f5..5e5ec7d3 100644 --- a/src/box/box.h +++ b/src/box/box.h @@ -100,12 +100,16 @@ int box_wait_ro(bool ro, double timeout); /** - * Switch this instance from 'orphan' to 'running' state. - * Called on initial configuration as soon as this instance - * synchronizes with enough replicas to form a quorum. + * Switch this instance from 'orphan' to 'running' state or + * vice versa depending on the value of the function argument. + * + * An instance enters 'orphan' state on returning from box.cfg() + * if it failed to synchronize with 'quorum' replicas within a + * specified timeout.
It will keep trying to synchronize in the + * background and leave 'orphan' state once it's done. */ void -box_clear_orphan(void); +box_set_orphan(bool orphan); /** True if snapshot is in progress. */ extern bool box_checkpoint_is_in_progress; diff --git a/src/box/replication.cc b/src/box/replication.cc index ff56f442..b0952740 100644 --- a/src/box/replication.cc +++ b/src/box/replication.cc @@ -548,7 +548,8 @@ replicaset_connect(struct applier **appliers, int count, replicaset_update(appliers, count); return; } - say_verbose("connecting to %d replicas", count); + + say_info("connecting to %d replicas", count); /* * Simultaneously connect to remote peers to receive their UUIDs @@ -602,7 +603,7 @@ replicaset_connect(struct applier **appliers, int count, if (connect_quorum && state.connected < quorum) goto error; } else { - say_verbose("connected to %d replicas", state.connected); + say_info("connected to %d replicas", state.connected); } for (int i = 0; i < count; i++) { @@ -636,13 +637,6 @@ error: void replicaset_follow(void) { - if (replicaset.applier.total == 0) { - /* - * Replication is not configured. - */ - box_clear_orphan(); - return; - } struct replica *replica; replicaset_foreach(replica) { /* Resume connected appliers. */ @@ -653,13 +647,6 @@ replicaset_follow(void) /* Restart appliers that failed to connect. */ applier_start(replica->applier); } - if (replicaset_quorum() == 0) { - /* - * Leaving orphan mode immediately since - * replication_connect_quorum is set to 0. - */ - box_clear_orphan(); - } } void @@ -667,10 +654,16 @@ replicaset_sync(void) { int quorum = replicaset_quorum(); - if (quorum == 0) + if (quorum == 0) { + /* + * Quorum is 0 or replication is not configured. + * Leaving 'orphan' state immediately. 
+ */ + box_set_orphan(false); return; + } - say_verbose("synchronizing with %d replicas", quorum); + say_info("synchronizing with %d replicas", quorum); /* * Wait until all connected replicas synchronize up to @@ -691,22 +684,21 @@ replicaset_sync(void) * Do not stall configuration, leave the instance * in 'orphan' state. */ - say_crit("entering orphan mode"); - return; + say_crit("failed to synchronize with %d out of %d replicas", + replicaset.applier.total - replicaset.applier.synced, + replicaset.applier.total); + box_set_orphan(true); + } else { + say_info("replica set sync complete"); + box_set_orphan(false); } - - say_crit("replica set sync complete, quorum of %d " - "replicas formed", quorum); } void replicaset_check_quorum(void) { - if (replicaset.applier.synced >= replicaset_quorum()) { - if (replicaset_quorum() > 0) - say_crit("leaving orphan mode"); - box_clear_orphan(); - } + if (replicaset.applier.synced >= replicaset_quorum()) + box_set_orphan(false); } void diff --git a/test/replication/sync.result b/test/replication/sync.result new file mode 100644 index 00000000..8994aa3e --- /dev/null +++ b/test/replication/sync.result @@ -0,0 +1,236 @@ +fiber = require('fiber') +--- +... +test_run = require('test_run').new() +--- +... +engine = test_run:get_cfg('engine') +--- +... +box.schema.user.grant('guest', 'replication') +--- +... +_ = box.schema.space.create('test', {engine = engine}) +--- +... +_ = box.space.test:create_index('pk') +--- +... +-- Slow down replication a little to test replication_sync_lag. +box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.001) +--- +- ok +... +-- Helper that adds some records to the space and then starts +-- a fiber to add more records in the background. +test_run:cmd("setopt delimiter ';'") +--- +- true +... +count = 0; +--- +... 
+function fill()
+    for i = count + 1, count + 100 do
+        box.space.test:replace{i}
+    end
+    fiber.create(function()
+        for i = count + 101, count + 200 do
+            fiber.sleep(0.0001)
+            box.space.test:replace{i}
+        end
+    end)
+    count = count + 200
+end;
+---
+...
+test_run:cmd("setopt delimiter ''");
+---
+- true
+...
+-- Deploy a replica.
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+---
+- true
+...
+test_run:cmd("start server replica")
+---
+- true
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+fiber = require('fiber')
+---
+...
+-- Stop replication.
+replication = box.cfg.replication
+---
+...
+box.cfg{replication = {}}
+---
+...
+-- Fill the space.
+test_run:cmd("switch default")
+---
+- true
+...
+fill()
+---
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+-- Resume replication.
+--
+-- Since max allowed lag is small, all records should arrive
+-- by the time box.cfg() returns.
+--
+box.cfg{replication_sync_lag = 0.001}
+---
+...
+box.cfg{replication = replication}
+---
+...
+box.space.test:count() == 200
+---
+- true
+...
+box.info.status -- running
+---
+- running
+...
+box.info.ro -- false
+---
+- false
+...
+-- Stop replication.
+replication = box.cfg.replication
+---
+...
+box.cfg{replication = {}}
+---
+...
+-- Fill the space.
+test_run:cmd("switch default")
+---
+- true
+...
+fill()
+---
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+-- Resume replication.
+--
+-- Since max allowed lag is big, not all records will arrive
+-- upon returning from box.cfg() but the instance won't enter
+-- orphan state.
+--
+box.cfg{replication_sync_lag = 1}
+---
+...
+box.cfg{replication = replication}
+---
+...
+box.space.test:count() < 400
+---
+- true
+...
+box.info.status -- running
+---
+- running
+...
+box.info.ro -- false
+---
+- false
+...
+-- Wait for remaining rows to arrive.
+repeat fiber.sleep(0.01) until box.space.test:count() == 400
+---
+...
+-- Stop replication.
+replication = box.cfg.replication
+---
+...
+box.cfg{replication = {}}
+---
+...
+-- Fill the space.
+test_run:cmd("switch default")
+---
+- true
+...
+fill()
+---
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+-- Resume replication.
+--
+-- Since the sync timeout is small, not all records will arrive
+-- upon returning from box.cfg() and the instance will enter
+-- orphan state.
+--
+box.cfg{replication_sync_lag = 0.001, replication_sync_timeout = 0.001}
+---
+...
+box.cfg{replication = replication}
+---
+...
+box.space.test:count() < 600
+---
+- true
+...
+box.info.status -- orphan
+---
+- orphan
+...
+box.info.ro -- true
+---
+- true
+...
+-- Wait for remaining rows to arrive.
+repeat fiber.sleep(0.01) until box.space.test:count() == 600
+---
+...
+-- Make sure replica leaves orphan state.
+repeat fiber.sleep(0.01) until box.info.status ~= 'orphan'
+---
+...
+box.info.status -- running
+---
+- running
+...
+box.info.ro -- false
+---
+- false
+...
+test_run:cmd("switch default")
+---
+- true
+...
+test_run:cmd("stop server replica")
+---
+- true
+...
+test_run:cmd("cleanup server replica")
+---
+- true
+...
+box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
+---
+- ok
+...
+box.space.test:drop()
+---
+...
+box.schema.user.revoke('guest', 'replication')
+---
+...
diff --git a/test/replication/sync.test.lua b/test/replication/sync.test.lua
new file mode 100644
index 00000000..3cef825d
--- /dev/null
+++ b/test/replication/sync.test.lua
@@ -0,0 +1,116 @@
+fiber = require('fiber')
+test_run = require('test_run').new()
+engine = test_run:get_cfg('engine')
+
+box.schema.user.grant('guest', 'replication')
+_ = box.schema.space.create('test', {engine = engine})
+_ = box.space.test:create_index('pk')
+
+-- Slow down replication a little to test replication_sync_lag.
+box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0.001)
+
+-- Helper that adds some records to the space and then starts
+-- a fiber to add more records in the background.
+test_run:cmd("setopt delimiter ';'")
+count = 0;
+function fill()
+    for i = count + 1, count + 100 do
+        box.space.test:replace{i}
+    end
+    fiber.create(function()
+        for i = count + 101, count + 200 do
+            fiber.sleep(0.0001)
+            box.space.test:replace{i}
+        end
+    end)
+    count = count + 200
+end;
+test_run:cmd("setopt delimiter ''");
+
+-- Deploy a replica.
+test_run:cmd("create server replica with rpl_master=default, script='replication/replica.lua'")
+test_run:cmd("start server replica")
+test_run:cmd("switch replica")
+
+fiber = require('fiber')
+
+-- Stop replication.
+replication = box.cfg.replication
+box.cfg{replication = {}}
+
+-- Fill the space.
+test_run:cmd("switch default")
+fill()
+test_run:cmd("switch replica")
+
+-- Resume replication.
+--
+-- Since max allowed lag is small, all records should arrive
+-- by the time box.cfg() returns.
+--
+box.cfg{replication_sync_lag = 0.001}
+box.cfg{replication = replication}
+box.space.test:count() == 200
+box.info.status -- running
+box.info.ro -- false
+
+-- Stop replication.
+replication = box.cfg.replication
+box.cfg{replication = {}}
+
+-- Fill the space.
+test_run:cmd("switch default")
+fill()
+test_run:cmd("switch replica")
+
+-- Resume replication.
+--
+-- Since max allowed lag is big, not all records will arrive
+-- upon returning from box.cfg() but the instance won't enter
+-- orphan state.
+--
+box.cfg{replication_sync_lag = 1}
+box.cfg{replication = replication}
+box.space.test:count() < 400
+box.info.status -- running
+box.info.ro -- false
+
+-- Wait for remaining rows to arrive.
+repeat fiber.sleep(0.01) until box.space.test:count() == 400
+
+-- Stop replication.
+replication = box.cfg.replication
+box.cfg{replication = {}}
+
+-- Fill the space.
+test_run:cmd("switch default")
+fill()
+test_run:cmd("switch replica")
+
+-- Resume replication.
+--
+-- Since the sync timeout is small, not all records will arrive
+-- upon returning from box.cfg() and the instance will enter
+-- orphan state.
+--
+box.cfg{replication_sync_lag = 0.001, replication_sync_timeout = 0.001}
+box.cfg{replication = replication}
+box.space.test:count() < 600
+box.info.status -- orphan
+box.info.ro -- true
+
+-- Wait for remaining rows to arrive.
+repeat fiber.sleep(0.01) until box.space.test:count() == 600
+
+-- Make sure replica leaves orphan state.
+repeat fiber.sleep(0.01) until box.info.status ~= 'orphan'
+box.info.status -- running
+box.info.ro -- false
+
+test_run:cmd("switch default")
+test_run:cmd("stop server replica")
+test_run:cmd("cleanup server replica")
+
+box.error.injection.set("ERRINJ_RELAY_TIMEOUT", 0)
+box.space.test:drop()
+box.schema.user.revoke('guest', 'replication')

^ permalink raw reply	[flat|nested] 5+ messages in thread
* [tarantool-patches] Re: [PATCH 1/3] box: make replication_sync_lag option dynamic
  2018-08-30 23:38 [tarantool-patches] [PATCH 1/3] box: make replication_sync_lag option dynamic Olga Arkhangelskaia
  2018-08-30 23:38 ` [tarantool-patches] [PATCH v3 2/3] box: add replication_sync_timeout Olga Arkhangelskaia
@ 2018-09-04 13:24 ` Kirill Yukhin
  2 siblings, 0 replies; 5+ messages in thread
From: Kirill Yukhin @ 2018-09-04 13:24 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Olga Arkhangelskaia

Hello,

On 31 Aug 02:38, Olga Arkhangelskaia wrote:
> In gh-3427 replication_sync_lag should be taken into account during
> replication reconfiguration. In order to configure replication properly
> this parameter is made dynamical and can be changed on demand.
>
> @TarantoolBot document
> Title: recation_sync_lag option can be set dynamically
> recation_sync_lag now can be set at any time.
I've checked the updated patchset into the 1.9 branch.

--
Regards, Kirill Yukhin

^ permalink raw reply	[flat|nested] 5+ messages in thread
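For readers following the thread, the net effect of patch 1/3 is that the lag option becomes a runtime knob. A minimal usage sketch (the value 0.1 is illustrative, not a recommendation from the patchset):

```lua
-- After this patchset, replication_sync_lag may be changed at any
-- time with a plain box.cfg{} call, not only at initial configuration.
box.cfg{replication_sync_lag = 0.1}
-- The new value is picked up by subsequent replication
-- reconfiguration and sync checks.
```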
end of thread, other threads: [~2018-09-04 13:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-30 23:38 [tarantool-patches] [PATCH 1/3] box: make replication_sync_lag option dynamic Olga Arkhangelskaia
2018-08-30 23:38 ` [tarantool-patches] [PATCH v3 2/3] box: add replication_sync_timeout Olga Arkhangelskaia
2018-08-30 23:38 ` [tarantool-patches] [PATCH v7 3/3] box: adds replication sync after cfg. update Olga Arkhangelskaia
2018-08-31 13:01   ` Vladimir Davydov
2018-09-04 13:24 ` [tarantool-patches] Re: [PATCH 1/3] box: make replication_sync_lag option dynamic Kirill Yukhin
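Summing up the thread: the sync behavior added in patch 3/3 reduces to a small state decision inside `replicaset_sync()`. A sketch in Python (an illustrative model only — `sync_outcome` and its signature are not part of Tarantool's C API):

```python
def sync_outcome(quorum, synced, timed_out):
    """Model of the patched replicaset_sync() decision: which state
    the instance is left in when box.cfg() returns."""
    if quorum == 0:
        # Quorum is 0 or replication is not configured:
        # leave 'orphan' state immediately.
        return "running"
    if timed_out and synced < quorum:
        # Failed to sync within replication_sync_timeout: stay in
        # 'orphan' (read-only) state; appliers keep syncing in the
        # background, and replicaset_check_quorum() clears the flag
        # once enough replicas have caught up.
        return "orphan"
    return "running"

print(sync_outcome(0, 0, False))  # running
print(sync_outcome(2, 1, True))   # orphan
print(sync_outcome(2, 2, False))  # running
```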