* [tarantool-patches] Re: [tarantool-patches] Re: [PATCH v2 1/2] box: added replication_dead/rw_gap options
From: Olga Arkhangelskaia @ 2018-10-23 18:32 UTC
To: Konstantin Osipov, tarantool-patches

On 23/10/2018 10:10, Konstantin Osipov wrote:
> * Olga Arkhangelskaia <arkholga@tarantool.org> [18/10/13 08:20]:
>> In scope of gh-3110 we need options that store periods of time,
>> to be compared with time of last activity of relay and applier.
>> This patch introduces replication_dead_gap and replication_rw_gap options.
>>
>> replication_dead_gap is configured in box.cfg, with default 0 value.
>> If time that passed from now till last reader/writer activity of given replica
>> exceeds replication_dead_gap value, replica is suspected to be dead.
>> replication_dead_gap is measured in hours.
>>
>> replication_rw_gap is configured in box.cfg, with default 0 value.
>> If time difference between last reader activity and last writer activity of
>> given replica exceeds replication_rw_gap value, replica is suspected to be dead.
>> replication_rw_gap is measured in hours.
> Why do we need this if we have heartbeats?

I used to think that we need some parameters, settable by the user,
to check that a replica is not active.
For example, if a replica is not active for XXXX seconds, it is dead.
However, I did not think about the idea of passing this parameter as a
function argument: list_dead_replicas(XXXX). So I will throw the options away.

Another question worth discussing is what kind of statistics to use
for declaring a replica dead.
There are two ways. One is to save the time of the last write/read by the
applier and relay. I implemented that, but as Vova pointed out, maybe we
need to save the period of time that a replica spends in the stopped status
instead. So we decided to do the statistics in a separate patch set,
implement both ways, and then decide. However, maybe you have better ideas.

>
> And with swim on board we will have gossip information about entire replica set?

I have read about swim, and as I understand it:
if we have a replica set with some topology other than full mesh, we can save
the dead replicas mask, numbers, etc. (obtained by calling
list_dead_replicas on some of the replicas), and in the end, after some
questioning, we will definitely have information about every replica
in the set.
Is that what you mean?
If not, can you be more specific?

>
>> --
* [tarantool-patches] Re: [PATCH v2 1/2] box: added replication_dead/rw_gap options
From: Konstantin Osipov @ 2018-10-24 16:49 UTC
To: Olga Arkhangelskaia; +Cc: tarantool-patches

* Olga Arkhangelskaia <arkholga@tarantool.org> [18/10/23 21:41]:
> >> In scope of gh-3110 we need options that store periods of time,
> >> to be compared with time of last activity of relay and applier.
> >> This patch introduces replication_dead_gap and replication_rw_gap options.
> >>
> >> replication_dead_gap is configured in box.cfg, with default 0 value.
> >> If time that passed from now till last reader/writer activity of given replica
> >> exceeds replication_dead_gap value, replica is suspected to be dead.
> >> replication_dead_gap is measured in hours.
> >>
> >> replication_rw_gap is configured in box.cfg, with default 0 value.
> >> If time difference between last reader activity and last writer activity of
> >> given replica exceeds replication_rw_gap value, replica is suspected to be dead.
> >> replication_rw_gap is measured in hours.
> > Why do we need this if we have heartbeats?
> I used to think that we need some parameters, settable by the user,
> to check that a replica is not active.
> For example, if a replica is not active for XXXX seconds, it is dead.
> However, I did not think about the idea of passing this parameter as a
> function argument: list_dead_replicas(XXXX). So I will throw the options away.

OK.

>
> Another question worth discussing is what kind of statistics to use
> for declaring a replica dead.
> There are two ways. One is to save the time of the last write/read by the
> applier and relay. I implemented that, but as Vova pointed out, maybe we
> need to save the period of time that a replica spends in the stopped status
> instead. So we decided to do the statistics in a separate patch set,
> implement both ways, and then decide. However, maybe you have better ideas.

I think that unless these statistics are persistent, they are of little value.

> > And with swim on board we will have gossip information about entire replica set?
> I have read about swim, and as I understand it:
> if we have a replica set with some topology other than full mesh, we can save
> the dead replicas mask, numbers, etc. (obtained by calling
> list_dead_replicas on some of the replicas), and in the end, after some
> questioning, we will definitely have information about every replica
> in the set.
> Is that what you mean?
> If not, can you be more specific?

We can simply query which replicas are dead according to swim and
correlate this information with relay state. If a replica is dead
according to relay/applier state and it's dead according to swim,
it's dead.

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
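
The rule proposed at the end of this message (a replica is dead only if both the relay/applier state and swim say so) can be sketched as a set intersection. This is only an illustration in C++: the `replica_id` type and the `confirmed_dead` function are hypothetical names, not part of the patch or of any Tarantool or swim API, and it assumes both sources can be reduced to plain sets of replica ids.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical replica identifier; the patch does not define this type.
using replica_id = std::uint32_t;

// Return the ids that BOTH sources report as dead, in ascending order.
// A replica reported by only one source is not included.
std::vector<replica_id>
confirmed_dead(const std::set<replica_id> &dead_by_relay,
	       const std::set<replica_id> &dead_by_swim)
{
	std::vector<replica_id> result;
	// std::set iterates in sorted order, as set_intersection requires.
	std::set_intersection(dead_by_relay.begin(), dead_by_relay.end(),
			      dead_by_swim.begin(), dead_by_swim.end(),
			      std::back_inserter(result));
	return result;
}
```

Requiring agreement from both sources trades detection speed for fewer false positives: a replica that only one side considers silent stays in the set.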
* [tarantool-patches] [PATCH v2 0/2] detect and throw away dead replicas
From: Olga Arkhangelskaia @ 2018-10-12 19:45 UTC
To: tarantool-patches; +Cc: Olga Arkhangelskaia

According to previous discussions, the way a replica's bad state is
detected is changed completely. Now we maintain two time differences:
between now and the last activity of the applier, and between now and the
last activity of the relay. These values can be found in
box.info.replication.lar/law.
We use hours, but I still have some doubts; maybe we should display
days, hours and minutes.

Lar/law are compared with replication_dead/rw_gap, which should be
configured beforehand via box.cfg.
The question here: I am no longer sure about replication_rw_gap. The reason
I added this parameter is the idea that, in the master case, if the
difference between applier and relay activity is too big, there is a big
chance that something is wrong with the replica.

The last problem I want to discuss is the test cases: the test takes too
much time, and there is no separate case for the applier. I mean that relay
and rw_gap can be tested separately by turning off replication and tuning
the gap parameters; however, I do not see a case where only lar is lagging
seriously.

If you have ideas on how to make this functionality better, please share.
I will be glad to see other opinions.
---
Branch: https://github.com/tarantool/tarantool/tree/OKriw/gh-3110-prune-dead-replica-from-replicaset-1.10
Issue: https://github.com/tarantool/tarantool/issues/3110
v1: https://www.freelists.org/post/tarantool-patches/PATCH-rfc-schema-add-possibility-to-find-and-throw-away-dead-replicas

Changes v2:
- changed the way of replicas death detection
- added special box options
- changed test
- now only dead replicas are shown
- added function to throw away any replica

Olga Arkhangelskaia (2):
  box: added replication_dead/rw_gap options
  ctl: added functionality to detect and prune dead replicas

 src/box/CMakeLists.txt         |   1 +
 src/box/box.cc                 |  34 ++++++
 src/box/box.h                  |   2 +
 src/box/lua/cfg.cc             |  24 +++++
 src/box/lua/ctl.lua            |  58 ++++++++++
 src/box/lua/info.c             |  10 ++
 src/box/lua/init.c             |   2 +
 src/box/lua/load_cfg.lua       |   8 ++
 src/box/relay.cc               |   6 ++
 src/box/relay.h                |   4 +
 src/box/replication.cc         |   3 +-
 src/box/replication.h          |  12 +++
 test/box/admin.result          |   4 +
 test/box/cfg.result            |   8 ++
 test/replication/trim.lua      |  66 ++++++++++++
 test/replication/trim.result   | 237 +++++++++++++++++++++++++++++++++++++++++
 test/replication/trim.test.lua |  93 ++++++++++++++++
 test/replication/trim1.lua     |   1 +
 test/replication/trim2.lua     |   1 +
 test/replication/trim3.lua     |   1 +
 test/replication/trim4.lua     |   1 +
 21 files changed, 575 insertions(+), 1 deletion(-)
 create mode 100644 src/box/lua/ctl.lua
 create mode 100644 test/replication/trim.lua
 create mode 100644 test/replication/trim.result
 create mode 100644 test/replication/trim.test.lua
 create mode 120000 test/replication/trim1.lua
 create mode 120000 test/replication/trim2.lua
 create mode 120000 test/replication/trim3.lua
 create mode 120000 test/replication/trim4.lua

-- 
2.14.3 (Apple Git-98)
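
The cover letter wonders whether lar/law should be shown as days, hours and minutes rather than as a bare number of hours. A minimal sketch of such a conversion follows; `format_gap` is a hypothetical helper, not something the patch contains, and it assumes a non-negative input rounded to the nearest minute.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper: render a non-negative duration given in hours
// as "<days>d <hours>h <minutes>m", rounded to the nearest minute.
std::string
format_gap(double hours)
{
	long long total_minutes = std::llround(hours * 60.0);
	long long days = total_minutes / (24 * 60);
	long long rest = total_minutes % (24 * 60);
	return std::to_string(days) + "d " +
	       std::to_string(rest / 60) + "h " +
	       std::to_string(rest % 60) + "m";
}
```

For example, a value of 49.5 hours would be rendered as `2d 1h 30m`.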
* [tarantool-patches] [PATCH v2 1/2] box: added replication_dead/rw_gap options
From: Olga Arkhangelskaia @ 2018-10-12 19:45 UTC
To: tarantool-patches; +Cc: Olga Arkhangelskaia

In scope of gh-3110 we need options that store periods of time,
to be compared with time of last activity of relay and applier.
This patch introduces replication_dead_gap and replication_rw_gap options.

replication_dead_gap is configured in box.cfg, with default 0 value.
If time that passed from now till last reader/writer activity of given replica
exceeds replication_dead_gap value, replica is suspected to be dead.
replication_dead_gap is measured in hours.

replication_rw_gap is configured in box.cfg, with default 0 value.
If time difference between last reader activity and last writer activity of
given replica exceeds replication_rw_gap value, replica is suspected to be dead.
replication_rw_gap is measured in hours.
---
 src/box/box.cc           | 34 ++++++++++++++++++++++++++++++++++
 src/box/box.h            |  2 ++
 src/box/lua/cfg.cc       | 24 ++++++++++++++++++++++++
 src/box/lua/load_cfg.lua |  8 ++++++++
 src/box/replication.cc   |  3 ++-
 src/box/replication.h    | 12 ++++++++++++
 test/box/admin.result    |  4 ++++
 test/box/cfg.result      |  8 ++++++++
 8 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/src/box/box.cc b/src/box/box.cc
index 7e32b9fc7..f74e012f7 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -465,6 +465,28 @@ box_check_replication_sync_timeout(void)
 	return timeout;
 }
 
+static double
+box_check_replication_dead_gap(void)
+{
+	double gap = cfg_getd("replication_dead_gap");
+	if (gap <= 0) {
+		tnt_raise(ClientError, ER_CFG, "replication_dead_gap",
+			  "the value must be greater than 0");
+	}
+	return gap;
+}
+
+static double
+box_check_replication_rw_gap(void)
+{
+	double gap = cfg_getd("replication_rw_gap");
+	if (gap <= 0) {
+		tnt_raise(ClientError, ER_CFG, "replication_rw_gap",
+			  "the value must be greater than 0");
+	}
+	return gap;
+}
+
 static void
 box_check_instance_uuid(struct tt_uuid *uuid)
 {
@@ -739,6 +761,18 @@ box_set_replication_sync_timeout(void)
 	replication_sync_timeout = box_check_replication_sync_timeout();
 }
 
+void
+box_set_replication_dead_gap(void)
+{
+	replication_dead_gap = box_check_replication_dead_gap();
+}
+
+void
+box_set_replication_rw_gap(void)
+{
+	replication_rw_gap = box_check_replication_rw_gap();
+}
+
 void
 box_set_replication_skip_conflict(void)
 {
diff --git a/src/box/box.h b/src/box/box.h
index 9930d4a1a..bfb5bb873 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -198,6 +198,8 @@ void box_set_replication_connect_quorum(void);
 void box_set_replication_sync_lag(void);
 void box_set_replication_sync_timeout(void);
 void box_set_replication_skip_conflict(void);
+void box_set_replication_dead_gap(void);
+void box_set_replication_rw_gap(void);
 void box_set_net_msg_max(void);
 
 extern "C" {
diff --git a/src/box/lua/cfg.cc b/src/box/lua/cfg.cc
index c3825591c..f34b34bee 100644
--- a/src/box/lua/cfg.cc
+++ b/src/box/lua/cfg.cc
@@ -316,6 +316,28 @@ lbox_cfg_set_replication_sync_timeout(struct lua_State *L)
 	return 0;
 }
 
+static int
+lbox_cfg_set_replication_dead_gap(struct lua_State *L)
+{
+	try {
+		box_set_replication_dead_gap();
+	} catch (Exception *) {
+		luaT_error(L);
+	}
+	return 0;
+}
+
+static int
+lbox_cfg_set_replication_rw_gap(struct lua_State *L)
+{
+	try {
+		box_set_replication_rw_gap();
+	} catch (Exception *) {
+		luaT_error(L);
+	}
+	return 0;
+}
+
 static int
 lbox_cfg_set_replication_skip_conflict(struct lua_State *L)
 {
@@ -353,6 +375,8 @@ box_lua_cfg_init(struct lua_State *L)
 		{"cfg_set_replication_sync_lag", lbox_cfg_set_replication_sync_lag},
 		{"cfg_set_replication_sync_timeout", lbox_cfg_set_replication_sync_timeout},
 		{"cfg_set_replication_skip_conflict", lbox_cfg_set_replication_skip_conflict},
+		{"cfg_set_replication_dead_gap", lbox_cfg_set_replication_dead_gap},
+		{"cfg_set_replication_rw_gap", lbox_cfg_set_replication_rw_gap},
 		{"cfg_set_net_msg_max", lbox_cfg_set_net_msg_max},
 		{NULL, NULL}
 	};
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index f62f4dc1e..c15769dfe 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -76,6 +76,8 @@ local default_cfg = {
     replication_connect_timeout = 30,
     replication_connect_quorum = nil, -- connect all
     replication_skip_conflict = false,
+    replication_dead_gap = 0,
+    replication_rw_gap = 0,
     feedback_enabled = true,
     feedback_host = "https://feedback.tarantool.io",
     feedback_interval = 3600,
@@ -138,6 +140,8 @@ local template_cfg = {
     replication_connect_timeout = 'number',
     replication_connect_quorum = 'number',
     replication_skip_conflict = 'boolean',
+    replication_dead_gap = 'number',
+    replication_rw_gap = 'number',
     feedback_enabled = 'boolean',
     feedback_host = 'string',
     feedback_interval = 'number',
@@ -232,6 +236,8 @@ local dynamic_cfg = {
     replication_sync_lag = private.cfg_set_replication_sync_lag,
     replication_sync_timeout = private.cfg_set_replication_sync_timeout,
     replication_skip_conflict = private.cfg_set_replication_skip_conflict,
+    replication_dead_gap = private.cfg_set_replication_dead_gap,
+    replication_rw_gap = private.cfg_set_replication_rw_gap,
     instance_uuid = check_instance_uuid,
     replicaset_uuid = check_replicaset_uuid,
     net_msg_max = private.cfg_set_net_msg_max,
@@ -248,6 +254,8 @@ local dynamic_cfg_skip_at_load = {
     replication_connect_quorum = true,
     replication_sync_lag = true,
     replication_sync_timeout = true,
+    replication_dead_gap = true,
+    replication_rw_gap = true,
     wal_dir_rescan_delay = true,
     custom_proc_title = true,
     force_recovery = true,
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 2cb4ec0f8..392f8d9fd 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -53,7 +53,8 @@ int replication_connect_quorum = REPLICATION_CONNECT_QUORUM_ALL;
 double replication_sync_lag = 10.0; /* seconds */
 double replication_sync_timeout = 300.0; /* seconds */
 bool replication_skip_conflict = false;
-
+double replication_dead_gap = 0.0; /* hours */
+double replication_rw_gap = 0.0; /* hours */
 struct replicaset replicaset;
 
 static int
diff --git a/src/box/replication.h b/src/box/replication.h
index 2ac620d86..bf0c99fb7 100644
--- a/src/box/replication.h
+++ b/src/box/replication.h
@@ -137,6 +137,18 @@ extern double replication_sync_timeout;
  */
 extern bool replication_skip_conflict;
 
+/*
+ * If replica is not active during time that exceeds replication_dead_gap it
+ * is considered as dead replica and can be thrown out from system space.
+ */
+extern double replication_dead_gap;
+
+/*
+ * If replica has both roles and gap between read/write activities exceeds this
+ * value it can be considered as dead one.
+ */
+extern double replication_rw_gap;
+
 /**
  * Wait for the given period of time before trying to reconnect
  * to a master.
diff --git a/test/box/admin.result b/test/box/admin.result
index 8048460a1..5341e6f78 100644
--- a/test/box/admin.result
+++ b/test/box/admin.result
@@ -64,6 +64,10 @@ cfg_filter(box.cfg)
     - 16320
   - - replication_connect_timeout
     - 30
+  - - replication_dead_gap
+    - 0
+  - - replication_rw_gap
+    - 0
   - - replication_skip_conflict
     - false
   - - replication_sync_lag
diff --git a/test/box/cfg.result b/test/box/cfg.result
index 515033754..2f6bcd788 100644
--- a/test/box/cfg.result
+++ b/test/box/cfg.result
@@ -60,6 +60,10 @@ cfg_filter(box.cfg)
     - 16320
   - - replication_connect_timeout
     - 30
+  - - replication_dead_gap
+    - 0
+  - - replication_rw_gap
+    - 0
   - - replication_skip_conflict
     - false
   - - replication_sync_lag
@@ -161,6 +165,10 @@ cfg_filter(box.cfg)
     - 16320
   - - replication_connect_timeout
     - 30
+  - - replication_dead_gap
+    - 0
+  - - replication_rw_gap
+    - 0
   - - replication_skip_conflict
     - false
   - - replication_sync_lag
-- 
2.14.3 (Apple Git-98)
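
This patch only adds the two options; the comparison itself lives in the second patch of the series, which is not shown here. The sketch below illustrates the two checks as the commit message describes them, under stated assumptions: `replica_activity`, `suspected_by_dead_gap` and `suspected_by_rw_gap` are hypothetical names, timestamps are in seconds, a gap of 0 (the default) is treated as "check disabled", and the dead-gap check uses the most recent of the two activities. None of these choices is confirmed by the patch.

```cpp
#include <algorithm>
#include <cmath>

// The options are documented as hours; timestamps here are in seconds.
constexpr double SECONDS_PER_HOUR = 3600.0;

// Hypothetical per-replica record of last activity timestamps.
struct replica_activity {
	double last_read;  /* last reader (applier) activity, seconds */
	double last_write; /* last writer (relay) activity, seconds */
};

// True if the replica has been idle for longer than dead_gap_hours.
// Assumption: a non-positive gap disables the check.
bool
suspected_by_dead_gap(const replica_activity &a, double now,
		      double dead_gap_hours)
{
	if (dead_gap_hours <= 0)
		return false;
	double idle = now - std::max(a.last_read, a.last_write);
	return idle > dead_gap_hours * SECONDS_PER_HOUR;
}

// True if reader and writer activity are further apart than rw_gap_hours.
// Assumption: a non-positive gap disables the check.
bool
suspected_by_rw_gap(const replica_activity &a, double rw_gap_hours)
{
	if (rw_gap_hours <= 0)
		return false;
	double diff = std::fabs(a.last_read - a.last_write);
	return diff > rw_gap_hours * SECONDS_PER_HOUR;
}
```

Both comparisons are strict, matching the word "exceeds" in the commit message: a replica idle for exactly the configured gap is not yet suspected.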
* [tarantool-patches] Re: [PATCH v2 1/2] box: added replication_dead/rw_gap options
From: Konstantin Osipov @ 2018-10-23 7:10 UTC
To: tarantool-patches; +Cc: Olga Arkhangelskaia

* Olga Arkhangelskaia <arkholga@tarantool.org> [18/10/13 08:20]:
> In scope of gh-3110 we need options that store periods of time,
> to be compared with time of last activity of relay and applier.
> This patch introduces replication_dead_gap and replication_rw_gap options.
>
> replication_dead_gap is configured in box.cfg, with default 0 value.
> If time that passed from now till last reader/writer activity of given replica
> exceeds replication_dead_gap value, replica is suspected to be dead.
> replication_dead_gap is measured in hours.
>
> replication_rw_gap is configured in box.cfg, with default 0 value.
> If time difference between last reader activity and last writer activity of
> given replica exceeds replication_rw_gap value, replica is suspected to be dead.
> replication_rw_gap is measured in hours.

Why do we need this if we have heartbeats?

And with swim on board we will have gossip information about entire replica set?

> --

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov