* [Tarantool-patches] [PATCH 0/2] gc/xlog: delay xlog cleanup until relays are subscribed @ 2021-03-18 18:41 Cyrill Gorcunov via Tarantool-patches
  2021-03-18 18:41 ` [Tarantool-patches] [PATCH 1/2] " Cyrill Gorcunov via Tarantool-patches
  2021-03-18 18:41 ` [Tarantool-patches] [PATCH 2/2] test: add a test for wal_cleanup_delay option Cyrill Gorcunov via Tarantool-patches
  0 siblings, 2 replies; 14+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-03-18 18:41 UTC (permalink / raw)
To: tml; +Cc: Vladislav Shpilevoy, Mons Anderson

Guys, take a look please once time permits.

issue https://github.com/tarantool/tarantool/issues/5806
branch gorcunov/gh-5806-xlog-gc

Cyrill Gorcunov (2):
  gc/xlog: delay xlog cleanup until relays are subscribed
  test: add a test for wal_cleanup_delay option

 src/box/box.cc                                |  11 +
 src/box/gc.c                                  |  75 +++-
 src/box/gc.h                                  |  30 ++
 src/box/lua/info.c                            |   8 +
 src/box/lua/load_cfg.lua                      |   3 +
 src/box/relay.cc                              |  20 ++
 src/box/replication.cc                        |   2 +
 test/app-tap/init_script.result               |   1 +
 test/box/admin.result                         |   2 +
 test/box/cfg.result                           |   4 +
 test/replication/gh-5806-master.lua           |  13 +
 test/replication/gh-5806-slave.lua            |  13 +
 test/replication/gh-5806-xlog-cleanup.result  | 336 ++++++++++++++++++
 .../replication/gh-5806-xlog-cleanup.test.lua | 148 ++++++++
 14 files changed, 663 insertions(+), 3 deletions(-)
 create mode 100644 test/replication/gh-5806-master.lua
 create mode 100644 test/replication/gh-5806-slave.lua
 create mode 100644 test/replication/gh-5806-xlog-cleanup.result
 create mode 100644 test/replication/gh-5806-xlog-cleanup.test.lua

base-commit: a0ade20662e0d55c912a72656e6dc75defd7c7f4
--
2.30.2

^ permalink raw reply [flat|nested] 14+ messages in thread
* [Tarantool-patches] [PATCH 1/2] gc/xlog: delay xlog cleanup until relays are subscribed 2021-03-18 18:41 [Tarantool-patches] [PATCH 0/2] gc/xlog: delay xlog cleanup until relays are subscribed Cyrill Gorcunov via Tarantool-patches @ 2021-03-18 18:41 ` Cyrill Gorcunov via Tarantool-patches 2021-03-18 23:04 ` Vladislav Shpilevoy via Tarantool-patches 2021-03-19 13:50 ` Serge Petrenko via Tarantool-patches 2021-03-18 18:41 ` [Tarantool-patches] [PATCH 2/2] test: add a test for wal_cleanup_delay option Cyrill Gorcunov via Tarantool-patches 1 sibling, 2 replies; 14+ messages in thread From: Cyrill Gorcunov via Tarantool-patches @ 2021-03-18 18:41 UTC (permalink / raw) To: tml; +Cc: Vladislav Shpilevoy, Mons Anderson In case if replica managed to be far behind the master node (so there are a number of xlog files present after the last master's snapshot) then once master node get restarted it may clean up the xlogs needed by the replica to subscribe in a fast way and instead the replica will have to rejoin reading a number of data back. Lets try to address this by delaying xlog files cleanup until replicas are got subscribed and relays are up and running. For this sake we start with cleanup fiber spinning in nop cycle ("paused" mode) and use a delay counter to wait until relays decrement them. This implies that if `_cluster` system space is not empty upon restart and the registered replica somehow vanished completely and won't ever come back, then the node administrator has to drop this replica from `_cluster` manually. Note that this delayed cleanup start doesn't prevent WAL engine from removing old files if there is no space left on a storage device. The WAL will simply drop old data without a question. We need to take into account that some administrators might not need this functionality at all, for this sake we introduce "wal_cleanup_delay" configuration option which allows to enable or disable the delay. Closes #5806 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> @TarantoolBot document Title: Add wal_cleanup_delay configuration parameter The `wal_cleanup_delay` option defines a delay in seconds before write ahead log files (`*.xlog`) are getting started to prune upon a node restart. This option is isnored in case if a node is running as an anonymous replica (`replication_anon = true`). Similarly if replication is unused or there is no plans to use replication at all then this option should not be considered. An initial problem to solve is the case where a node is operating so fast that its replicas do not manage to reach the node state and in case if the node is restarted at this moment (for various reasons, for example due to power outage) then unfetched `*.xlog` files might be pruned during restart. In result replicas will not find these files on the main node and have to reread all data back which is a very expensive procedure. Since replicas are tracked via `_cluster` system space this we use its content to count subscribed replicas and when all of them are up and running the cleanup procedure is automatically enabled even if `wal_cleanup_delay` is not expired. The `wal_cleanup_delay` should be set to: - `-1` to wait infinitely until all existing replicas are subscribed; - `0` to disable the cleanup delay; - `>0` to wait for specified number of seconds. By default it is set to `14400` seconds (ie `4` hours). 
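For illustration only (this snippet is not part of the patch or of the documentation request above): the three accepted values of `wal_cleanup_delay` reduce to a single wait timeout for the cleanup fiber. `TIMEOUT_INFINITY` is given a fake value here so the sketch is self-contained.

``` C
#include <stdbool.h>

#define TIMEOUT_INFINITY (365.0 * 86400.0 * 100.0) /* stand-in constant */

/* The validation rule spelled out: either -1 or any non-negative value. */
static bool
wal_cleanup_delay_is_valid(double delay)
{
	return delay == -1 || delay >= 0;
}

/* How a configured delay maps onto the cleanup fiber's wait. */
static double
wal_cleanup_delay_to_timeout(double delay)
{
	if (delay == -1)
		return TIMEOUT_INFINITY; /* wait until every replica subscribes */
	return delay; /* 0 disables the pause, >0 waits that many seconds */
}
```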
In case if registered replica is lost forever and timeout is set to infinity then a preferred way to enable cleanup procedure is not setting up a small timeout value but rather to delete this replica from `_cluster` space manually. Note that the option does *not* prevent WAL engine from removing old `*.xlog` files if there is no space left on a storage device, WAL engine can remove them in a force way. Current state of `*.xlog` garbage collector can be found in `box.info.gc()` output. For example ``` Lua tarantool> box.info.gc() --- ... cleanup_is_paused: false delay_ref: 0 ``` The `cleanup_is_paused` shows if cleanup fiber is paused or not, and `delay_ref` represents number of replicas to be subscribed if the cleanup fiber is in `pause` mode. --- src/box/box.cc | 11 +++++ src/box/gc.c | 75 +++++++++++++++++++++++++++++++-- src/box/gc.h | 30 +++++++++++++ src/box/lua/info.c | 8 ++++ src/box/lua/load_cfg.lua | 3 ++ src/box/relay.cc | 20 +++++++++ src/box/replication.cc | 2 + test/app-tap/init_script.result | 1 + test/box/admin.result | 2 + test/box/cfg.result | 4 ++ 10 files changed, 153 insertions(+), 3 deletions(-) diff --git a/src/box/box.cc b/src/box/box.cc index b4a1a5e07..ee017c2dc 100644 --- a/src/box/box.cc +++ b/src/box/box.cc @@ -754,6 +754,15 @@ box_check_wal_mode(const char *mode_name) return (enum wal_mode) mode; } +static void +box_check_wal_cleanup_delay(int tmo) +{ + if (tmo < 0 && tmo != -1) { + tnt_raise(ClientError, ER_CFG, "wal_cleanup_delay", + "the value must be either >= 0 or -1"); + } +} + static void box_check_readahead(int readahead) { @@ -899,6 +908,7 @@ box_check_config(void) box_check_checkpoint_count(cfg_geti("checkpoint_count")); box_check_wal_max_size(cfg_geti64("wal_max_size")); box_check_wal_mode(cfg_gets("wal_mode")); + box_check_wal_cleanup_delay(cfg_geti("wal_cleanup_delay")); if (box_check_memory_quota("memtx_memory") < 0) diag_raise(); box_check_memtx_min_tuple_size(cfg_geti64("memtx_min_tuple_size")); @@ -3045,6 +3055,7 @@ box_cfg_xc(void) bootstrap(&instance_uuid, &replicaset_uuid, &is_bootstrap_leader); } + gc_delay_unref(); fiber_gc(); bootstrap_journal_guard.is_active = false; diff --git a/src/box/gc.c b/src/box/gc.c index 9af4ef958..595b98bdf 100644 --- a/src/box/gc.c +++ b/src/box/gc.c @@ -46,6 +46,7 @@ #include <small/rlist.h> #include <tarantool_ev.h> +#include "cfg.h" #include "diag.h" #include "errcode.h" #include "fiber.h" @@ -107,6 +108,25 @@ gc_init(void) /* Don't delete any files until recovery is complete. */ gc.min_checkpoint_count = INT_MAX; + gc.wal_cleanup_delay = cfg_geti("wal_cleanup_delay"); + gc.delay_ref = 0; + + if (cfg_geti("replication_anon") != 0) { + /* + * Anonymous replicas are read only + * so no need to keep XLOGs. + */ + gc.cleanup_is_paused = false; + } else if (gc.wal_cleanup_delay == 0) { + /* + * The delay is disabled explicitly. + */ + gc.cleanup_is_paused = false; + } else { + say_info("gc: wal/engine cleanup is paused"); + gc.cleanup_is_paused = true; + } + vclock_create(&gc.vclock); rlist_create(&gc.checkpoints); gc_tree_new(&gc.consumers); @@ -238,6 +258,30 @@ static int gc_cleanup_fiber_f(va_list ap) { (void)ap; + + /* + * Stage 1 (optional): in case if we're booting + * up with cleanup disabled lets do wait in a + * separate cycle to minimize branching on stage 2. + */ + if (gc.cleanup_is_paused) { + ev_tstamp tmo = gc.wal_cleanup_delay == -1 ? 
+ TIMEOUT_INFINITY : gc.wal_cleanup_delay; + while (!fiber_is_cancelled()) { + if (fiber_yield_timeout(tmo)) { + say_info("gc: wal/engine cleanup is resumed " + "due to timeout expiration"); + gc.cleanup_is_paused = false; + break; + } + if (!gc.cleanup_is_paused) + break; + } + } + + /* + * Stage 2: a regular cleanup cycle. + */ while (!fiber_is_cancelled()) { int64_t delta = gc.cleanup_scheduled - gc.cleanup_completed; if (delta == 0) { @@ -253,6 +297,29 @@ gc_cleanup_fiber_f(va_list ap) return 0; } +void +gc_delay_ref(void) +{ + if (gc.cleanup_is_paused) { + assert(gc.delay_ref >= 0); + gc.delay_ref++; + } +} + +void +gc_delay_unref(void) +{ + if (gc.cleanup_is_paused) { + assert(gc.delay_ref > 0); + gc.delay_ref--; + if (gc.delay_ref == 0) { + say_info("gc: wal/engine cleanup is resumed"); + gc.cleanup_is_paused = false; + fiber_wakeup(gc.cleanup_fiber); + } + } +} + /** * Trigger asynchronous garbage collection. */ @@ -278,9 +345,11 @@ gc_schedule_cleanup(void) static void gc_wait_cleanup(void) { - int64_t scheduled = gc.cleanup_scheduled; - while (gc.cleanup_completed < scheduled) - fiber_cond_wait(&gc.cleanup_cond); + if (!gc.cleanup_is_paused) { + int64_t scheduled = gc.cleanup_scheduled; + while (gc.cleanup_completed < scheduled) + fiber_cond_wait(&gc.cleanup_cond); + } } void diff --git a/src/box/gc.h b/src/box/gc.h index 2a568c5f9..9e73d39b0 100644 --- a/src/box/gc.h +++ b/src/box/gc.h @@ -147,6 +147,24 @@ struct gc_state { * taken at that moment of time. */ int64_t cleanup_completed, cleanup_scheduled; + /** + * A counter to wait until all replicas are managed to + * subscribe so that we can enable cleanup fiber to + * remove old XLOGs. Otherwise some replicas might be + * far behind the master node and after the master + * node been restarted they will have to reread all + * data back due to XlogGapError, ie too early deleted + * XLOGs. + */ + int64_t delay_ref; + /** + * Delay timeout in seconds. + */ + int32_t wal_cleanup_delay; + /** + * When set the cleanup fiber is paused. + */ + bool cleanup_is_paused; /** * Set if there's a fiber making a checkpoint right now. */ @@ -206,6 +224,18 @@ gc_init(void); void gc_free(void); +/** + * Increment a reference to delay counter. + */ +void +gc_delay_ref(void); + +/** + * Decrement a reference from the delay counter. + */ +void +gc_delay_unref(void); + /** * Advance the garbage collector vclock to the given position. * Deactivate WAL consumers that need older data. 
diff --git a/src/box/lua/info.c b/src/box/lua/info.c index c4c9fa0a0..3230b5bbb 100644 --- a/src/box/lua/info.c +++ b/src/box/lua/info.c @@ -445,6 +445,14 @@ lbox_info_gc_call(struct lua_State *L) lua_pushboolean(L, gc.checkpoint_is_in_progress); lua_settable(L, -3); + lua_pushstring(L, "cleanup_is_paused"); + lua_pushboolean(L, gc.cleanup_is_paused); + lua_settable(L, -3); + + lua_pushstring(L, "delay_ref"); + luaL_pushint64(L, gc.delay_ref); + lua_settable(L, -3); + lua_pushstring(L, "checkpoints"); lua_newtable(L); diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua index aac216932..1ca9d5e22 100644 --- a/src/box/lua/load_cfg.lua +++ b/src/box/lua/load_cfg.lua @@ -72,6 +72,7 @@ local default_cfg = { wal_mode = "write", wal_max_size = 256 * 1024 * 1024, wal_dir_rescan_delay= 2, + wal_cleanup_delay = 4 * 3600, force_recovery = false, replication = nil, instance_uuid = nil, @@ -154,6 +155,7 @@ local template_cfg = { wal_mode = 'string', wal_max_size = 'number', wal_dir_rescan_delay= 'number', + wal_cleanup_delay = 'number', force_recovery = 'boolean', replication = 'string, number, table', instance_uuid = 'string', @@ -378,6 +380,7 @@ local dynamic_cfg_skip_at_load = { replication_skip_conflict = true, replication_anon = true, wal_dir_rescan_delay = true, + wal_cleanup_delay = true, custom_proc_title = true, force_recovery = true, instance_uuid = true, diff --git a/src/box/relay.cc b/src/box/relay.cc index 41f949e8e..5d09690ad 100644 --- a/src/box/relay.cc +++ b/src/box/relay.cc @@ -668,6 +668,13 @@ relay_send_is_raft_enabled(struct relay *relay, } } +static void +relay_gc_delay_unref(struct cmsg *msg) +{ + (void)msg; + gc_delay_unref(); +} + /** * A libev callback invoked when a relay client socket is ready * for read. This currently only happens when the client closes @@ -721,6 +728,19 @@ relay_subscribe_f(va_list ap) */ relay_send_heartbeat(relay); + /* + * Now we can resume wal/engine gc as relay + * is up and running. + */ + if (!relay->replica->anon) { + static const struct cmsg_hop gc_delay_route[] = { + {relay_gc_delay_unref, NULL} + }; + struct cmsg gc_delay_msg; + cmsg_init(&gc_delay_msg, gc_delay_route); + cpipe_push(&relay->tx_pipe, &gc_delay_msg); + } + /* * Run the event loop until the connection is broken * or an error occurs. diff --git a/src/box/replication.cc b/src/box/replication.cc index 1fa8843e7..aefb812b3 100644 --- a/src/box/replication.cc +++ b/src/box/replication.cc @@ -250,6 +250,7 @@ replica_set_id(struct replica *replica, uint32_t replica_id) tt_uuid_str(&replica->uuid)); } replicaset.replica_by_id[replica_id] = replica; + gc_delay_ref(); ++replicaset.registered_count; say_info("assigned id %d to replica %s", replica->id, tt_uuid_str(&replica->uuid)); @@ -273,6 +274,7 @@ replica_clear_id(struct replica *replica) replicaset.replica_by_id[replica->id] = NULL; assert(replicaset.registered_count > 0); --replicaset.registered_count; + gc_delay_unref(); if (replica->id == instance_id) { /* See replica_check_id(). */ assert(replicaset.is_joining); diff --git a/test/app-tap/init_script.result b/test/app-tap/init_script.result index 76fe2ea27..33d861ecd 100644 --- a/test/app-tap/init_script.result +++ b/test/app-tap/init_script.result @@ -53,6 +53,7 @@ vinyl_run_count_per_level:2 vinyl_run_size_ratio:3.5 vinyl_timeout:60 vinyl_write_threads:4 +wal_cleanup_delay:14400 wal_dir:. 
wal_dir_rescan_delay:2 wal_max_size:268435456 diff --git a/test/box/admin.result b/test/box/admin.result index c9a6ff9e4..3443dbf96 100644 --- a/test/box/admin.result +++ b/test/box/admin.result @@ -127,6 +127,8 @@ cfg_filter(box.cfg) - 60 - - vinyl_write_threads - 4 + - - wal_cleanup_delay + - 14400 - - wal_dir - <hidden> - - wal_dir_rescan_delay diff --git a/test/box/cfg.result b/test/box/cfg.result index ae37b28f0..87be6053d 100644 --- a/test/box/cfg.result +++ b/test/box/cfg.result @@ -115,6 +115,8 @@ cfg_filter(box.cfg) | - 60 | - - vinyl_write_threads | - 4 + | - - wal_cleanup_delay + | - 14400 | - - wal_dir | - <hidden> | - - wal_dir_rescan_delay @@ -232,6 +234,8 @@ cfg_filter(box.cfg) | - 60 | - - vinyl_write_threads | - 4 + | - - wal_cleanup_delay + | - 14400 | - - wal_dir | - <hidden> | - - wal_dir_rescan_delay -- 2.30.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
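Before the review starts, a compressed, self-contained model of the reference counting the patch introduces may help. Only the gc_delay_ref()/gc_delay_unref() names are taken from the patch; everything else is simplified for illustration (in particular, the timeout path is omitted and the instance's own `_cluster` entry, which the patch unreferences right after bootstrap, is not modeled).

``` C
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static struct {
	int64_t delay_ref;      /* registered replicas not yet subscribed */
	bool cleanup_is_paused; /* cleanup fiber parked in its wait stage */
} gc = { .delay_ref = 0, .cleanup_is_paused = true };

/* Called for each replica id found in _cluster during local recovery. */
static void
gc_delay_ref(void)
{
	if (gc.cleanup_is_paused)
		gc.delay_ref++;
}

/* Called when the matching relay subscribes (or the id is dropped). */
static void
gc_delay_unref(void)
{
	if (!gc.cleanup_is_paused)
		return;
	assert(gc.delay_ref > 0);
	if (--gc.delay_ref == 0) {
		gc.cleanup_is_paused = false; /* real code also wakes the fiber */
		printf("cleanup resumed\n");
	}
}

int
main(void)
{
	gc_delay_ref();   /* replica 2 is in _cluster */
	gc_delay_ref();   /* replica 3 is in _cluster */
	gc_delay_unref(); /* relay to replica 2 subscribed */
	gc_delay_unref(); /* relay to replica 3 subscribed: resume cleanup */
	return 0;
}
```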
* Re: [Tarantool-patches] [PATCH 1/2] gc/xlog: delay xlog cleanup until relays are subscribed 2021-03-18 18:41 ` [Tarantool-patches] [PATCH 1/2] " Cyrill Gorcunov via Tarantool-patches @ 2021-03-18 23:04 ` Vladislav Shpilevoy via Tarantool-patches 2021-03-19 11:03 ` Cyrill Gorcunov via Tarantool-patches 2021-03-19 13:40 ` Serge Petrenko via Tarantool-patches 2021-03-19 13:50 ` Serge Petrenko via Tarantool-patches 1 sibling, 2 replies; 14+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-03-18 23:04 UTC (permalink / raw) To: Cyrill Gorcunov, tml; +Cc: Mons Anderson Hi! Thanks for the patch! Generally looks fine except details. AFAIU, we go for the timeout option for the only reason that there might be non-fullmesh typologies, when a replica is in _cluster, but does not have relays to other nodes. If we would know the topology, we could tell which nodes will relay to who, and would be able to detect when a replica needs to keep the logs, and when don't. Not even the entire topology. Just the info from the adjacent nodes. From our own box.cfg.replication, and from box.cfg.replication of these nodes. I still don't understand what was wrong with implementing some kind of topology discovery for this 2-level topology tree. For instance when applier on a replica is connected, the remote node sends us a flag whether it is going to connect back. If the flag is false, we don't keep the logs for that instance. See 13 comments below. > @TarantoolBot document > Title: Add wal_cleanup_delay configuration parameter 1. There are 2 issues with the name. Firstly, 'cleanup' conflicts with the existing 'gc' name, which is already exposed in box.info.gc. It is not called box.info.cleanup, so I would propose to use 'gc' here too. Another issue I described in the first patchset's discussion. It is not really a delay. Because it is simply ignored when the replication feels like it. It must either have 'replication' prefix to designate it is not just a fixated timeout to keep the WALs and it depends on the replication, or at least it must mention "max". To designate it is not the exact strict timeout for keeping the logs. > The `wal_cleanup_delay` option defines a delay in seconds > before write ahead log files (`*.xlog`) are getting started > to prune upon a node restart. > > This option is isnored in case if a node is running as 2. isnored -> ignored. > an anonymous replica (`replication_anon = true`). Similarly > if replication is unused or there is no plans to use > replication at all then this option should not be considered. > > An initial problem to solve is the case where a node is operating > so fast that its replicas do not manage to reach the node state > and in case if the node is restarted at this moment (for various > reasons, for example due to power outage) then unfetched `*.xlog` > files might be pruned during restart. In result replicas will not > find these files on the main node and have to reread all data back > which is a very expensive procedure. > > Since replicas are tracked via `_cluster` system space this we use > its content to count subscribed replicas and when all of them are > up and running the cleanup procedure is automatically enabled even > if `wal_cleanup_delay` is not expired. > > The `wal_cleanup_delay` should be set to: > > - `-1` to wait infinitely until all existing replicas are subscribed; > - `0` to disable the cleanup delay; > - `>0` to wait for specified number of seconds. > > By default it is set to `14400` seconds (ie `4` hours). 
> > In case if registered replica is lost forever and timeout is set to > infinity then a preferred way to enable cleanup procedure is not setting > up a small timeout value but rather to delete this replica from `_cluster` > space manually. > > Note that the option does *not* prevent WAL engine from removing > old `*.xlog` files if there is no space left on a storage device, > WAL engine can remove them in a force way. > > Current state of `*.xlog` garbage collector can be found in > `box.info.gc()` output. For example > > ``` Lua > tarantool> box.info.gc() > --- > ... > cleanup_is_paused: false 3. 'gc.cleanup' is a tautology. If you want to expose the flag, it better be simply 'is_paused' IMO. > delay_ref: 0 4. Can't the user get the same by looking at _cluster size - box.info.gc().consumers count ? > ``` > > The `cleanup_is_paused` shows if cleanup fiber is paused or not, > and `delay_ref` represents number of replicas to be subscribed > if the cleanup fiber is in `pause` mode. > --- > src/box/box.cc | 11 +++++ > src/box/gc.c | 75 +++++++++++++++++++++++++++++++-- > src/box/gc.h | 30 +++++++++++++ > src/box/lua/info.c | 8 ++++ > src/box/lua/load_cfg.lua | 3 ++ > src/box/relay.cc | 20 +++++++++ > src/box/replication.cc | 2 + > test/app-tap/init_script.result | 1 + > test/box/admin.result | 2 + > test/box/cfg.result | 4 ++ > 10 files changed, 153 insertions(+), 3 deletions(-) > > diff --git a/src/box/box.cc b/src/box/box.cc > index b4a1a5e07..ee017c2dc 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -754,6 +754,15 @@ box_check_wal_mode(const char *mode_name) > return (enum wal_mode) mode; > } > > +static void > +box_check_wal_cleanup_delay(int tmo) > +{ > + if (tmo < 0 && tmo != -1) { > + tnt_raise(ClientError, ER_CFG, "wal_cleanup_delay", > + "the value must be either >= 0 or -1"); 3. As you could notice, we try not to use C++ exceptions in the new code. Lets use diag_set(). For example, like the new option wal_max_queue_size does. 4. Can you move cfg_geti() into this function? I see no reason why do you need to pass it as a parameter. The function is used in a single place. > + } > +} > @@ -3045,6 +3055,7 @@ box_cfg_xc(void) > bootstrap(&instance_uuid, &replicaset_uuid, > &is_bootstrap_leader); > } > + gc_delay_unref(); 5. Why? > fiber_gc(); > > bootstrap_journal_guard.is_active = false; > diff --git a/src/box/gc.c b/src/box/gc.c > index 9af4ef958..595b98bdf 100644 > --- a/src/box/gc.c > +++ b/src/box/gc.c > @@ -107,6 +108,25 @@ gc_init(void) > /* Don't delete any files until recovery is complete. */ > gc.min_checkpoint_count = INT_MAX; > > + gc.wal_cleanup_delay = cfg_geti("wal_cleanup_delay"); 6. Please, pass the options as arguments of gc_init(). Having the global options disseminated over the sources does not simplify understanding of what subsystems on which options depend. That would be consistent with not using cfg_get* fuctions almost anywhere except a few big 'main' files, and one sql file, which is also wrong, but is a mess in many other ways as well. > + gc.delay_ref = 0; > + > + if (cfg_geti("replication_anon") != 0) { 7. GC does not really need to know if the instance is anon. This could be both controlled by wal_cleanup_delay cfg and the caller of gc_init(): box_cfg_xc(). Otherwise you made GC depend on replication details in the code, which is not great. We must keep the dependencies as low as possible. In box_cfg_xc you can get both wal_cleanup_delay and replication_anon and check if the anon is set, then you call gc_init(0). 
If it is false, you call gc_init(wal_cleanup_delay). GC only knows about some delay obtained from its owner. > + /* > + * Anonymous replicas are read only > + * so no need to keep XLOGs. > + */ > + gc.cleanup_is_paused = false; > + } else if (gc.wal_cleanup_delay == 0) { > + /* > + * The delay is disabled explicitly. > + */ > + gc.cleanup_is_paused = false; > + } else { > + say_info("gc: wal/engine cleanup is paused"); > + gc.cleanup_is_paused = true; > + } > + > vclock_create(&gc.vclock); > rlist_create(&gc.checkpoints); > gc_tree_new(&gc.consumers); > @@ -238,6 +258,30 @@ static int > gc_cleanup_fiber_f(va_list ap) > { > (void)ap; > + > + /* > + * Stage 1 (optional): in case if we're booting > + * up with cleanup disabled lets do wait in a > + * separate cycle to minimize branching on stage 2. > + */ > + if (gc.cleanup_is_paused) { > + ev_tstamp tmo = gc.wal_cleanup_delay == -1 ? > + TIMEOUT_INFINITY : gc.wal_cleanup_delay; 8. Please, call it 'timeout'. 'tmo' looks super weird, and is never used anywhere except this patch. 9. What if I see the timout was too big, and I cal box.cfg{wal_cleanup_delay = 0}? It seems I can only reduce it on restart. Can you support a dynamic update? It should be easy. You need to store gc deadline instead of the delay. So when you change the timeout, you can see if with the new timeout the deadline is lready reached, and then can start GC earlier. If I increase the timeout, GC could be paused again, or simply ignored, or banned. But the decrease is more important, as I think. > + while (!fiber_is_cancelled()) { > + if (fiber_yield_timeout(tmo)) { > + say_info("gc: wal/engine cleanup is resumed " > + "due to timeout expiration"); 10. Other gc logs don't have 'gc' prefix. I would propose to follow it in the new logs too. > + gc.cleanup_is_paused = false; > + break; > + } > + if (!gc.cleanup_is_paused) > + break; > + } > + } > + > + /* > + * Stage 2: a regular cleanup cycle. > + */ > while (!fiber_is_cancelled()) { > int64_t delta = gc.cleanup_scheduled - gc.cleanup_completed; > if (delta == 0) { > @@ -253,6 +297,29 @@ gc_cleanup_fiber_f(va_list ap) > return 0; > } > > +void > +gc_delay_ref(void) > +{ > + if (gc.cleanup_is_paused) { > + assert(gc.delay_ref >= 0); > + gc.delay_ref++; > + } > +} > + > +void > +gc_delay_unref(void) > +{ > + if (gc.cleanup_is_paused) { > + assert(gc.delay_ref > 0); > + gc.delay_ref--; 11. If it is not paused and GC started earlier due to a timeout, the user will see 'delay_ref' in box.info even after all the replicas are connected. > + if (gc.delay_ref == 0) { > + say_info("gc: wal/engine cleanup is resumed"); > + gc.cleanup_is_paused = false; > + fiber_wakeup(gc.cleanup_fiber); > + } > + } > +} > + > /** > * Trigger asynchronous garbage collection. > */ > @@ -278,9 +345,11 @@ gc_schedule_cleanup(void) > static void > gc_wait_cleanup(void) > { > - int64_t scheduled = gc.cleanup_scheduled; > - while (gc.cleanup_completed < scheduled) > - fiber_cond_wait(&gc.cleanup_cond); > + if (!gc.cleanup_is_paused) { > + int64_t scheduled = gc.cleanup_scheduled; > + while (gc.cleanup_completed < scheduled) > + fiber_cond_wait(&gc.cleanup_cond); > + } 12. The function is called 'wait_cleanup', but it does not really wait in case GC is paused. It looks wrong. 
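One possible shape for what comment 12 asks, sketched here for reference and not part of the patch: keep gc_wait_cleanup() an unconditional wait and let the caller skip it while the cleanup fiber is paused. The call site shown is an assumption.

``` C
/* gc.c: the function keeps its "wait" semantics... */
static void
gc_wait_cleanup(void)
{
	int64_t scheduled = gc.cleanup_scheduled;
	while (gc.cleanup_completed < scheduled)
		fiber_cond_wait(&gc.cleanup_cond);
}

/*
 * ...and the caller decides whether waiting makes sense at all
 * (the exact call site is an assumption):
 */
	if (!gc.cleanup_is_paused)
		gc_wait_cleanup();
```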
> diff --git a/src/box/relay.cc b/src/box/relay.cc > index 41f949e8e..5d09690ad 100644 > --- a/src/box/relay.cc > +++ b/src/box/relay.cc > @@ -668,6 +668,13 @@ relay_send_is_raft_enabled(struct relay *relay, > } > } > > +static void > +relay_gc_delay_unref(struct cmsg *msg) > +{ > + (void)msg; > + gc_delay_unref(); 13. You don't need a separate callback, and don't need to call it from the relay thread. Relay already works with GC - it calls gc_consumer_register() before starting the thread. You can do the unref in the same place. Starting from the consumer registration the logs are going to be kept anyway. Looks like it would be simpler if it works. > +} > + > /** > * A libev callback invoked when a relay client socket is ready > * for read. This currently only happens when the client closes > @@ -721,6 +728,19 @@ relay_subscribe_f(va_list ap) > */ > relay_send_heartbeat(relay); > > + /* > + * Now we can resume wal/engine gc as relay > + * is up and running. > + */ > + if (!relay->replica->anon) { > + static const struct cmsg_hop gc_delay_route[] = { > + {relay_gc_delay_unref, NULL} > + }; > + struct cmsg gc_delay_msg; > + cmsg_init(&gc_delay_msg, gc_delay_route); > + cpipe_push(&relay->tx_pipe, &gc_delay_msg); > + } > + > /* > * Run the event loop until the connection is broken > * or an error occurs. ^ permalink raw reply [flat|nested] 14+ messages in thread
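For comment 9, a deadline-based dynamic setter could look roughly like this. This is a sketch only: gc_set_wal_cleanup_delay(), gc.pause_start and gc.delay_deadline are hypothetical names, the delay field is assumed to be stored as a double, and the remaining identifiers are the ones visible in the patch.

``` C
void
gc_set_wal_cleanup_delay(double delay)
{
	gc.wal_cleanup_delay = delay;
	if (!gc.cleanup_is_paused)
		return;
	/* Count the deadline from the moment the pause started. */
	gc.delay_deadline = delay == -1 ? TIMEOUT_INFINITY :
			    gc.pause_start + delay;
	if (fiber_clock() >= gc.delay_deadline)
		gc.cleanup_is_paused = false;
	/* Wake the cleanup fiber so it re-arms (or skips) its wait. */
	fiber_wakeup(gc.cleanup_fiber);
}
```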
* Re: [Tarantool-patches] [PATCH 1/2] gc/xlog: delay xlog cleanup until relays are subscribed 2021-03-18 23:04 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-03-19 11:03 ` Cyrill Gorcunov via Tarantool-patches 2021-03-19 22:17 ` Vladislav Shpilevoy via Tarantool-patches 2021-03-19 13:40 ` Serge Petrenko via Tarantool-patches 1 sibling, 1 reply; 14+ messages in thread From: Cyrill Gorcunov via Tarantool-patches @ 2021-03-19 11:03 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: Mons Anderson, tml On Fri, Mar 19, 2021 at 12:04:09AM +0100, Vladislav Shpilevoy wrote: > > AFAIU, we go for the timeout option for the only reason that > there might be non-fullmesh typologies, when a replica is in > _cluster, but does not have relays to other nodes. Exactly. > If we would know the topology, we could tell which nodes will > relay to who, and would be able to detect when a replica needs > to keep the logs, and when don't. Not even the entire topology. > Just the info from the adjacent nodes. From our own box.cfg.replication, > and from box.cfg.replication of these nodes. > > I still don't understand what was wrong with implementing some kind > of topology discovery for this 2-level topology tree. For instance > when applier on a replica is connected, the remote node sends us a > flag whether it is going to connect back. If the flag is false, we don't > keep the logs for that instance. There is nothing wrong and I think we should do it. Could you please elaborate the details? You mean to extend applier protocol data, so that it would send not only vclock but also the flag if it going to be setting up a relay? > See 13 comments below. > > > @TarantoolBot document > > Title: Add wal_cleanup_delay configuration parameter > > 1. There are 2 issues with the name. Firstly, 'cleanup' conflicts > with the existing 'gc' name, which is already exposed in box.info.gc. > It is not called box.info.cleanup, so I would propose to use 'gc' > here too. > > Another issue I described in the first patchset's discussion. It is > not really a delay. Because it is simply ignored when the replication > feels like it. It must either have 'replication' prefix to designate > it is not just a fixated timeout to keep the WALs and it depends on > the replication, or at least it must mention "max". To designate it > is not the exact strict timeout for keeping the logs. Vlad, personally I don't mind to name this option whatever you like, just gimme a name and I'll use it. > > The `wal_cleanup_delay` option defines a delay in seconds > > before write ahead log files (`*.xlog`) are getting started > > to prune upon a node restart. > > > > This option is isnored in case if a node is running as > > 2. isnored -> ignored. Ouch, I happen to miss this while been running aspell. Thanks! > > > > Current state of `*.xlog` garbage collector can be found in > > `box.info.gc()` output. For example > > > > ``` Lua > > tarantool> box.info.gc() > > --- > > ... > > cleanup_is_paused: false > > 3. 'gc.cleanup' is a tautology. If you want to expose the > flag, it better be simply 'is_paused' IMO. OK > > delay_ref: 0 > > 4. Can't the user get the same by looking at > > _cluster size - box.info.gc().consumers count > > ? I would prefer to keep this as a separate output simply because I need an internal variable for gc code itself and reading it is a way more convenient than fetching _cluster size and then consumers and do some math after. Moreover I need a way to find out _internal_ counter value for debug purpose. 
> > +static void > > +box_check_wal_cleanup_delay(int tmo) > > +{ > > + if (tmo < 0 && tmo != -1) { > > + tnt_raise(ClientError, ER_CFG, "wal_cleanup_delay", > > + "the value must be either >= 0 or -1"); > > 3. As you could notice, we try not to use C++ exceptions in the new > code. Lets use diag_set(). For example, like the new option > wal_max_queue_size does. OK > > 4. Can you move cfg_geti() into this function? I see no reason why > do you need to pass it as a parameter. The function is used in > a single place. Sure > > + } > > +} > > @@ -3045,6 +3055,7 @@ box_cfg_xc(void) > > bootstrap(&instance_uuid, &replicaset_uuid, > > &is_bootstrap_leader); > > } > > + gc_delay_unref(); > > 5. Why? For case where you don't have replicas at all thus you need to unref yourself from counting and the counter shrinks to zero enabling gc. > > > fiber_gc(); > > > > bootstrap_journal_guard.is_active = false; > > diff --git a/src/box/gc.c b/src/box/gc.c > > index 9af4ef958..595b98bdf 100644 > > --- a/src/box/gc.c > > +++ b/src/box/gc.c > > @@ -107,6 +108,25 @@ gc_init(void) > > /* Don't delete any files until recovery is complete. */ > > gc.min_checkpoint_count = INT_MAX; > > > > + gc.wal_cleanup_delay = cfg_geti("wal_cleanup_delay"); > > 6. Please, pass the options as arguments of gc_init(). Having the > global options disseminated over the sources does not simplify > understanding of what subsystems on which options depend. > > That would be consistent with not using cfg_get* fuctions almost > anywhere except a few big 'main' files, and one sql file, which > is also wrong, but is a mess in many other ways as well. OK > > > + gc.delay_ref = 0; > > + > > + if (cfg_geti("replication_anon") != 0) { > > 7. GC does not really need to know if the instance is anon. This > could be both controlled by wal_cleanup_delay cfg and the caller > of gc_init(): box_cfg_xc(). Otherwise you made GC depend on > replication details in the code, which is not great. We must keep > the dependencies as low as possible. > > In box_cfg_xc you can get both wal_cleanup_delay and replication_anon > and check if the anon is set, then you call gc_init(0). If it is > false, you call gc_init(wal_cleanup_delay). GC only knows about some > delay obtained from its owner. OK > > + > > + /* > > + * Stage 1 (optional): in case if we're booting > > + * up with cleanup disabled lets do wait in a > > + * separate cycle to minimize branching on stage 2. > > + */ > > + if (gc.cleanup_is_paused) { > > + ev_tstamp tmo = gc.wal_cleanup_delay == -1 ? > > + TIMEOUT_INFINITY : gc.wal_cleanup_delay; > > 8. Please, call it 'timeout'. 'tmo' looks super weird, and is never > used anywhere except this patch. TMO is well known abbreviation in network stack but sure will rename. > > 9. What if I see the timout was too big, and I cal > box.cfg{wal_cleanup_delay = 0}? It seems I can only reduce it on > restart. Can you support a dynamic update? It should be easy. You > need to store gc deadline instead of the delay. So when you change > the timeout, you can see if with the new timeout the deadline is > lready reached, and then can start GC earlier. Thanks, will make it dynamic. > If I increase the timeout, GC could be paused again, or simply > ignored, or banned. But the decrease is more important, as I think. > > > + while (!fiber_is_cancelled()) { > > + if (fiber_yield_timeout(tmo)) { > > + say_info("gc: wal/engine cleanup is resumed " > > + "due to timeout expiration"); > > 10. Other gc logs don't have 'gc' prefix. 
I would propose to follow > it in the new logs too. This makes easier for grepping the logs but sure will drop. > > +void > > +gc_delay_unref(void) > > +{ > > + if (gc.cleanup_is_paused) { > > + assert(gc.delay_ref > 0); > > + gc.delay_ref--; > > 11. If it is not paused and GC started earlier due to a > timeout, the user will see 'delay_ref' in box.info even > after all the replicas are connected. Yes, and this gonna be a sign that we're exited due to timeout. Such output should not be treated as an error but if you prefer I can zap the counter for this case. > > + if (!gc.cleanup_is_paused) { > > + int64_t scheduled = gc.cleanup_scheduled; > > + while (gc.cleanup_completed < scheduled) > > + fiber_cond_wait(&gc.cleanup_cond); > > + } > > 12. The function is called 'wait_cleanup', but it does not really > wait in case GC is paused. It looks wrong. Hard to say. paused state is rather internal state and when someone is waiting for cleanup to complete but cleanup is turned off then I consider it pretty natural to exit immediately. And I moved this "if" into the function itself simply because we may use this helper in future and we should not stuck forever in "waiting" if cleanup is disabled. I can move this "if" to the caller side though if you prefer. > > > > +static void > > +relay_gc_delay_unref(struct cmsg *msg) > > +{ > > + (void)msg; > > + gc_delay_unref(); > > 13. You don't need a separate callback, and don't need to call it > from the relay thread. Relay already works with GC - it calls > gc_consumer_register() before starting the thread. You can do the > unref in the same place. Starting from the consumer registration > the logs are going to be kept anyway. > > Looks like it would be simpler if it works. Actually initially i did it this way, right before creating relay thread. But I think this is not good and that's why: when relay is starting it may fail in a number of places (the thread itself is not created; thread is created but then faiber creation failed with eception) and I think we should decrement the reference when we only pretty sure that there won't be new errors inside relay cycle. What would happen when say relay fiber is triggering an error? iproto will write an error and as far as I understand the replica will try to reconnect. Thus we should keep the logs until relay is subscribed for sure. Cyrill ^ permalink raw reply [flat|nested] 14+ messages in thread
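Comments 6 and 7, both acknowledged in the reply above, boil down to roughly the following; this is a sketch, and the exact call-site placement in box.cc is an assumption.

``` C
/* box.cc: the caller reads the cfg and decides the effective delay. */
	double wal_cleanup_delay = cfg_getd("wal_cleanup_delay");
	if (cfg_geti("replication_anon") != 0) {
		/* Anonymous replicas are read only, no need to keep xlogs. */
		wal_cleanup_delay = 0;
	}
	gc_init(wal_cleanup_delay);

/* gc.c: gc only ever sees a number. */
void
gc_init(double wal_cleanup_delay)
{
	gc.wal_cleanup_delay = wal_cleanup_delay;
	gc.delay_ref = 0;
	gc.cleanup_is_paused = wal_cleanup_delay != 0;
	if (gc.cleanup_is_paused)
		say_info("wal/engine cleanup is paused");
	/* ... the rest of gc_init() stays as in the patch ... */
}
```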
* Re: [Tarantool-patches] [PATCH 1/2] gc/xlog: delay xlog cleanup until relays are subscribed 2021-03-19 11:03 ` Cyrill Gorcunov via Tarantool-patches @ 2021-03-19 22:17 ` Vladislav Shpilevoy via Tarantool-patches 2021-03-22 9:05 ` Serge Petrenko via Tarantool-patches 0 siblings, 1 reply; 14+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-03-19 22:17 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: Mons Anderson, tml >> If we would know the topology, we could tell which nodes will >> relay to who, and would be able to detect when a replica needs >> to keep the logs, and when don't. Not even the entire topology. >> Just the info from the adjacent nodes. From our own box.cfg.replication, >> and from box.cfg.replication of these nodes. >> >> I still don't understand what was wrong with implementing some kind >> of topology discovery for this 2-level topology tree. For instance >> when applier on a replica is connected, the remote node sends us a >> flag whether it is going to connect back. If the flag is false, we don't >> keep the logs for that instance. > > There is nothing wrong and I think we should do it. Could you please > elaborate the details? You mean to extend applier protocol data, so > that it would send not only vclock but also the flag if it going to > be setting up a relay? Nevermind, it won't work for certain topologies. The only way is to have it in _cluster, and make _cluster synchronous space. But this is a big separate task, so probably timeout is fine for now. >> See 13 comments below. >> >>> @TarantoolBot document >>> Title: Add wal_cleanup_delay configuration parameter >> >> 1. There are 2 issues with the name. Firstly, 'cleanup' conflicts >> with the existing 'gc' name, which is already exposed in box.info.gc. >> It is not called box.info.cleanup, so I would propose to use 'gc' >> here too. >> >> Another issue I described in the first patchset's discussion. It is >> not really a delay. Because it is simply ignored when the replication >> feels like it. It must either have 'replication' prefix to designate >> it is not just a fixated timeout to keep the WALs and it depends on >> the replication, or at least it must mention "max". To designate it >> is not the exact strict timeout for keeping the logs. > > Vlad, personally I don't mind to name this option whatever you like, > just gimme a name and I'll use it. I still don't like the name, but I am outvoted. We won't even add 'max' to it. The current name stays, unfortunately. >>> + } >>> +} >>> @@ -3045,6 +3055,7 @@ box_cfg_xc(void) >>> bootstrap(&instance_uuid, &replicaset_uuid, >>> &is_bootstrap_leader); >>> } >>> + gc_delay_unref(); >> >> 5. Why? > > For case where you don't have replicas at all thus you need > to unref yourself from counting and the counter shrinks to > zero enabling gc. Lets add a comment about this. >>> +void >>> +gc_delay_unref(void) >>> +{ >>> + if (gc.cleanup_is_paused) { >>> + assert(gc.delay_ref > 0); >>> + gc.delay_ref--; >> >> 11. If it is not paused and GC started earlier due to a >> timeout, the user will see 'delay_ref' in box.info even >> after all the replicas are connected. > > Yes, and this gonna be a sign that we're exited due to > timeout. Such output should not be treated as an error > but if you prefer I can zap the counter for this case. I would prefer not to have delay_ref in the public monitoring at all, but you should ask Mons about it now. 
>>> + if (!gc.cleanup_is_paused) { >>> + int64_t scheduled = gc.cleanup_scheduled; >>> + while (gc.cleanup_completed < scheduled) >>> + fiber_cond_wait(&gc.cleanup_cond); >>> + } >> >> 12. The function is called 'wait_cleanup', but it does not really >> wait in case GC is paused. It looks wrong. > > Hard to say. paused state is rather internal state and > when someone is waiting for cleanup to complete but cleanup > is turned off then I consider it pretty natural to exit > immediately. And I moved this "if" into the function itself > simply because we may use this helper in future and we should > not stuck forever in "waiting" if cleanup is disabled. > > I can move this "if" to the caller side though if you prefer. Probably this would be better. Because the function is called 'wait cleanup', not 'try wait or return immediately'. The comment in the place where the function is called says it guarantees that by the time box.snapshot() returns, all outdated checkpoint files have been removed It is misleading now. >>> +static void >>> +relay_gc_delay_unref(struct cmsg *msg) >>> +{ >>> + (void)msg; >>> + gc_delay_unref(); >> >> 13. You don't need a separate callback, and don't need to call it >> from the relay thread. Relay already works with GC - it calls >> gc_consumer_register() before starting the thread. You can do the >> unref in the same place. Starting from the consumer registration >> the logs are going to be kept anyway. >> >> Looks like it would be simpler if it works. > > Actually initially i did it this way, right before creating > relay thread. But I think this is not good and that's why: > when relay is starting it may fail in a number of places > (the thread itself is not created; thread is created but > then faiber creation failed with eception) and I think we > should decrement the reference when we only pretty sure that > there won't be new errors inside relay cycle. Why? New errors won't break anything. They also can happen after you did unref in your patch. > What would happen when say relay fiber is triggering an > error? iproto will write an error and as far as I understand > the replica will try to reconnect. Thus we should keep the > logs until relay is subscribed for sure. GC consumer keeps the logs while the replica tries to reconnect. But what I don't understand now - how does it work if the struct replica objects create the consumer objects right in replica_set_id()? Look at relay_subscribe(). If the replica is not anon, then 'replica->id != REPLICA_ID_NIL' is true (because there is an assertion). It means replica_set_id() was already called. And this means replica->gc is already not NULL. Therefore the check "replica->gc == NULL && !replica->anon" is never true. Am I missing something? Doesn't it mean all the _cluster nodes have a GC consumer on the current node, and nothing is deleted until all the nodes connect? Regardless of your patch. I feel like I miss something important. Because before the patch we have xlog gap errors somehow. Which means the consumers are dropped somewhere. Can you investigate? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/2] gc/xlog: delay xlog cleanup until relays are subscribed 2021-03-19 22:17 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-03-22 9:05 ` Serge Petrenko via Tarantool-patches 0 siblings, 0 replies; 14+ messages in thread From: Serge Petrenko via Tarantool-patches @ 2021-03-22 9:05 UTC (permalink / raw) To: Vladislav Shpilevoy, Cyrill Gorcunov; +Cc: Mons Anderson, tml 20.03.2021 01:17, Vladislav Shpilevoy пишет: > But what I don't understand now - how does it work if the struct > replica objects create the consumer objects right in replica_set_id()? > > Look at relay_subscribe(). If the replica is not anon, then > 'replica->id != REPLICA_ID_NIL' is true (because there is an assertion). > It means replica_set_id() was already called. And this means replica->gc > is already not NULL. Therefore the check "replica->gc == NULL && !replica->anon" > is never true. Am I missing something? replica_set_id() only sets a gc consumer for a previously anonymous replica, which has just registered. This is not needed at all, likely. Because box_process_register() already creates a gc consumer. And replaces replica's gc consumer, if any, with this freshly created one later. So this piece of code in replica_set_id() is extraneous. It only makes sense when an anonymous replica was connected to some instance, then it registers on another instance and the first instance receives anonymous replica's _cluster registration via replication. This is still unneeded. Because once previously anonymous replica resubscribes, it'll get a gc consumer like everyone else does. This is quite misleading, indeed. It took me a while to understand what's happening. I'll probably return with a patch removing this piece in replica_set_id() altogether. -- Serge Petrenko ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/2] gc/xlog: delay xlog cleanup until relays are subscribed 2021-03-18 23:04 ` Vladislav Shpilevoy via Tarantool-patches 2021-03-19 11:03 ` Cyrill Gorcunov via Tarantool-patches @ 2021-03-19 13:40 ` Serge Petrenko via Tarantool-patches 2021-03-19 13:57 ` Konstantin Osipov via Tarantool-patches 1 sibling, 1 reply; 14+ messages in thread From: Serge Petrenko via Tarantool-patches @ 2021-03-19 13:40 UTC (permalink / raw) To: Vladislav Shpilevoy, Cyrill Gorcunov, tml; +Cc: Mons Anderson 19.03.2021 02:04, Vladislav Shpilevoy пишет: > Hi! Thanks for the patch! > > Generally looks fine except details. > > > AFAIU, we go for the timeout option for the only reason that > there might be non-fullmesh typologies, when a replica is in > _cluster, but does not have relays to other nodes. > > If we would know the topology, we could tell which nodes will > relay to who, and would be able to detect when a replica needs > to keep the logs, and when don't. Not even the entire topology. > Just the info from the adjacent nodes. From our own box.cfg.replication, > and from box.cfg.replication of these nodes. > > I still don't understand what was wrong with implementing some kind > of topology discovery for this 2-level topology tree. For instance > when applier on a replica is connected, the remote node sends us a > flag whether it is going to connect back. If the flag is false, we don't > keep the logs for that instance. IMO this topology discovery is not so easy to implement. And we've chosen this particular fix approach instead of 'persistent GC state' for its simplicity. How do you know whether a node is going to connect back? It has a corresponding entry in box.cfg.replication, sure, but how does it understand that this entry (URI) corresponds to the replica that has just connected? IIRC we had similar problems with inability to understand who's who judging solely by URI in some other fix. Don't remember which one exactly. Moreover, you may have something strange like a cascade topology. Say, there are 1 <- 2 <- 3 servers, with arrows showing replica-to-master connection. When 2 is up, how can it understand that it should wait for 3? -- Serge Petrenko ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/2] gc/xlog: delay xlog cleanup until relays are subscribed
  2021-03-19 13:40 ` Serge Petrenko via Tarantool-patches
@ 2021-03-19 13:57 ` Konstantin Osipov via Tarantool-patches
  0 siblings, 0 replies; 14+ messages in thread
From: Konstantin Osipov via Tarantool-patches @ 2021-03-19 13:57 UTC (permalink / raw)
To: Serge Petrenko; +Cc: Mons Anderson, tml, Vladislav Shpilevoy

* Serge Petrenko <sergepetrenko@tarantool.org> [21/03/19 16:42]:
> IMO this topology discovery is not so easy to implement. And we've chosen
> this particular fix approach instead of 'persistent GC state' for its simplicity.

The topology discovery was necessary before there was raft. In raft, one
need not discover the topology: there is a configuration change log entry
for a topology change. Oops, there are no configuration changes in
tarantool raft...

--
Konstantin Osipov, Moscow, Russia

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/2] gc/xlog: delay xlog cleanup until relays are subscribed 2021-03-18 18:41 ` [Tarantool-patches] [PATCH 1/2] " Cyrill Gorcunov via Tarantool-patches 2021-03-18 23:04 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-03-19 13:50 ` Serge Petrenko via Tarantool-patches 2021-03-19 15:14 ` Cyrill Gorcunov via Tarantool-patches 1 sibling, 1 reply; 14+ messages in thread From: Serge Petrenko via Tarantool-patches @ 2021-03-19 13:50 UTC (permalink / raw) To: Cyrill Gorcunov, tml; +Cc: Mons Anderson, Vladislav Shpilevoy 18.03.2021 21:41, Cyrill Gorcunov пишет: > In case if replica managed to be far behind the master node > (so there are a number of xlog files present after the last ... > src/box/replication.cc | 2 + > test/app-tap/init_script.result | 1 + > test/box/admin.result | 2 + > test/box/cfg.result | 4 ++ > 10 files changed, 153 insertions(+), 3 deletions(-) Hi! Thanks for the patch! Please find 2 minor comments below. > > diff --git a/src/box/box.cc b/src/box/box.cc > index b4a1a5e07..ee017c2dc 100644 > --- a/src/box/box.cc > +++ b/src/box/box.cc > @@ -754,6 +754,15 @@ box_check_wal_mode(const char *mode_name) > return (enum wal_mode) mode; > } > > +static void > +box_check_wal_cleanup_delay(int tmo) > +{ > + if (tmo < 0 && tmo != -1) { > + tnt_raise(ClientError, ER_CFG, "wal_cleanup_delay", > + "the value must be either >= 0 or -1"); > + } > +} > + 1. AFAICS all other delay and timeout parameters are 'double'. Let's make this one also a double. > static void > box_check_readahead(int readahead) > { > @@ -899,6 +908,7 @@ box_check_config(void) > box_check_checkpoint_count(cfg_geti("checkpoint_count")); > box_check_wal_max_size(cfg_geti64("wal_max_size")); > box_check_wal_mode(cfg_gets("wal_mode")); > + box_check_wal_cleanup_delay(cfg_geti("wal_cleanup_delay")); > if (box_check_memory_quota("memtx_memory") < 0) > diag_raise(); > box_check_memtx_min_tuple_size(cfg_geti64("memtx_min_tuple_size")); > @@ -3045,6 +3055,7 @@ box_cfg_xc(void) > bootstrap(&instance_uuid, &replicaset_uuid, > &is_bootstrap_leader); > } > + gc_delay_unref(); > fiber_gc(); > > bootstrap_journal_guard.is_active = false; ... > diff --git a/src/box/relay.cc b/src/box/relay.cc > index 41f949e8e..5d09690ad 100644 > --- a/src/box/relay.cc > +++ b/src/box/relay.cc > @@ -668,6 +668,13 @@ relay_send_is_raft_enabled(struct relay *relay, > } > } > > +static void > +relay_gc_delay_unref(struct cmsg *msg) > +{ > + (void)msg; > + gc_delay_unref(); > +} > + > /** > * A libev callback invoked when a relay client socket is ready > * for read. This currently only happens when the client closes > @@ -721,6 +728,19 @@ relay_subscribe_f(va_list ap) > */ > relay_send_heartbeat(relay); > > + /* > + * Now we can resume wal/engine gc as relay > + * is up and running. > + */ > + if (!relay->replica->anon) { > + static const struct cmsg_hop gc_delay_route[] = { > + {relay_gc_delay_unref, NULL} > + }; > + struct cmsg gc_delay_msg; > + cmsg_init(&gc_delay_msg, gc_delay_route); > + cpipe_push(&relay->tx_pipe, &gc_delay_msg); > + } > + 2. I agree with Vlad here. No need to call gc_delay_unref() from the relay thread. We need to wait until a corresponding gc consumer is registered. This is done in relay_subscribe(), which is in tx thread. In fact, you may as well move the gc_delay_unref() call directly to gc_consumer_register(). This would look even better, IMO. > /* > * Run the event loop until the connection is broken > * or an error occurs. 
> diff --git a/src/box/replication.cc b/src/box/replication.cc > index 1fa8843e7..aefb812b3 100644 > --- a/src/box/replication.cc > +++ b/src/box/replication.cc > @@ -250,6 +250,7 @@ replica_set_id(struct replica *replica, uint32_t replica_id) > tt_uuid_str(&replica->uuid)); > } > replicaset.replica_by_id[replica_id] = replica; > + gc_delay_ref(); > ++replicaset.registered_count; > say_info("assigned id %d to replica %s", > replica->id, tt_uuid_str(&replica->uuid)); > @@ -273,6 +274,7 @@ replica_clear_id(struct replica *replica) > replicaset.replica_by_id[replica->id] = NULL; > assert(replicaset.registered_count > 0); > --replicaset.registered_count; > + gc_delay_unref(); > if (replica->id == instance_id) { > /* See replica_check_id(). */ > assert(replicaset.is_joining); > ... -- Serge Petrenko ^ permalink raw reply [flat|nested] 14+ messages in thread
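Taken together, Vladislav's earlier points (diag_set() instead of tnt_raise(), cfg_geti() moved inside the checker) and Serge's comment 1 (use a double) would give roughly this shape. It is a sketch; the -1/0 return convention simply mirrors the other box_check_*() helpers.

``` C
static int
box_check_wal_cleanup_delay(void)
{
	double delay = cfg_getd("wal_cleanup_delay");
	if (delay < 0 && delay != -1) {
		diag_set(ClientError, ER_CFG, "wal_cleanup_delay",
			 "the value must be either >= 0 or -1");
		return -1;
	}
	return 0;
}

/* In box_check_config(): */
	if (box_check_wal_cleanup_delay() != 0)
		diag_raise();
```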
* Re: [Tarantool-patches] [PATCH 1/2] gc/xlog: delay xlog cleanup until relays are subscribed
  2021-03-19 13:50 ` Serge Petrenko via Tarantool-patches
@ 2021-03-19 15:14 ` Cyrill Gorcunov via Tarantool-patches
  0 siblings, 0 replies; 14+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-03-19 15:14 UTC (permalink / raw)
To: Serge Petrenko; +Cc: Mons Anderson, tml, Vladislav Shpilevoy

On Fri, Mar 19, 2021 at 04:50:38PM +0300, Serge Petrenko wrote:
> > +static void
> > +box_check_wal_cleanup_delay(int tmo)
> > +{
> > +	if (tmo < 0 && tmo != -1) {
> > +		tnt_raise(ClientError, ER_CFG, "wal_cleanup_delay",
> > +			  "the value must be either >= 0 or -1");
> > +	}
> > +}
> > +
>
> 1. AFAICS all other delay and timeout parameters are 'double'.
> Let's make this one also a double.

Sure

> > +	/*
> > +	 * Now we can resume wal/engine gc as relay
> > +	 * is up and running.
> > +	 */
> > +	if (!relay->replica->anon) {
> > +		static const struct cmsg_hop gc_delay_route[] = {
> > +			{relay_gc_delay_unref, NULL}
> > +		};
> > +		struct cmsg gc_delay_msg;
> > +		cmsg_init(&gc_delay_msg, gc_delay_route);
> > +		cpipe_push(&relay->tx_pipe, &gc_delay_msg);
> > +	}
> > +
>
> 2. I agree with Vlad here. No need to call gc_delay_unref() from the relay
> thread. We need to wait until a corresponding gc consumer is registered.
> This is done in relay_subscribe(), which is in tx thread.
>
> In fact, you may as well move the gc_delay_unref() call directly to
> gc_consumer_register(). This would look even better, IMO.

I don't mind moving it inside gc_consumer_register itself. Still, this
bothers me a bit: gc_consumer_register might be used not only by relays,
since it is a general API. Would it be OK in the long term?

^ permalink raw reply [flat|nested] 14+ messages in thread
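A sketch of the variant agreed above, with the body of gc_consumer_register() elided; the signature is reproduced from memory and may differ from the tree, so treat it as approximate.

``` C
struct gc_consumer *
gc_consumer_register(const struct vclock *vclock, const char *format, ...)
{
	/* ... existing body: create the consumer, add it to gc.consumers ... */

	/*
	 * The new consumer pins the xlogs from here on, so the delayed
	 * cleanup no longer needs to wait for this subscriber.  Note the
	 * concern above: a non-relay caller would also drop a reference,
	 * which may or may not be desirable.
	 */
	gc_delay_unref();

	/* ... return the consumer as before ... */
}
```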
* [Tarantool-patches] [PATCH 2/2] test: add a test for wal_cleanup_delay option 2021-03-18 18:41 [Tarantool-patches] [PATCH 0/2] gc/xlog: delay xlog cleanup until relays are subscribed Cyrill Gorcunov via Tarantool-patches 2021-03-18 18:41 ` [Tarantool-patches] [PATCH 1/2] " Cyrill Gorcunov via Tarantool-patches @ 2021-03-18 18:41 ` Cyrill Gorcunov via Tarantool-patches 2021-03-18 23:04 ` Vladislav Shpilevoy via Tarantool-patches 1 sibling, 1 reply; 14+ messages in thread From: Cyrill Gorcunov via Tarantool-patches @ 2021-03-18 18:41 UTC (permalink / raw) To: tml; +Cc: Vladislav Shpilevoy, Mons Anderson Part-of #5806 Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> --- test/replication/gh-5806-master.lua | 13 + test/replication/gh-5806-slave.lua | 13 + test/replication/gh-5806-xlog-cleanup.result | 336 ++++++++++++++++++ .../replication/gh-5806-xlog-cleanup.test.lua | 148 ++++++++ 4 files changed, 510 insertions(+) create mode 100644 test/replication/gh-5806-master.lua create mode 100644 test/replication/gh-5806-slave.lua create mode 100644 test/replication/gh-5806-xlog-cleanup.result create mode 100644 test/replication/gh-5806-xlog-cleanup.test.lua diff --git a/test/replication/gh-5806-master.lua b/test/replication/gh-5806-master.lua new file mode 100644 index 000000000..0404965d3 --- /dev/null +++ b/test/replication/gh-5806-master.lua @@ -0,0 +1,13 @@ +#!/usr/bin/env tarantool + +require('console').listen(os.getenv('ADMIN')) + +function func_xlog_snap(space, value) + space:insert(value) + box.snapshot() +end + +box.cfg({ + listen = os.getenv("LISTEN"), + wal_cleanup_delay = tonumber(arg[1]) or 0, +}) diff --git a/test/replication/gh-5806-slave.lua b/test/replication/gh-5806-slave.lua new file mode 100644 index 000000000..917dbb1ae --- /dev/null +++ b/test/replication/gh-5806-slave.lua @@ -0,0 +1,13 @@ +#!/usr/bin/env tarantool + +require('console').listen(os.getenv('ADMIN')) + +function func_xlog_snap(space, value) + space:insert(value) + box.snapshot() +end + +box.cfg({ + listen = os.getenv("LISTEN"), + replication = os.getenv("MASTER"), +}) diff --git a/test/replication/gh-5806-xlog-cleanup.result b/test/replication/gh-5806-xlog-cleanup.result new file mode 100644 index 000000000..97355a8bf --- /dev/null +++ b/test/replication/gh-5806-xlog-cleanup.result @@ -0,0 +1,336 @@ +-- test-run result file version 2 +-- +-- gh-5806: defer xlog cleanup to keep xlogs until +-- replicas present in "_cluster" are connected. +-- Otherwise we are getting XlogGapError since +-- master might go far forwad from replica and +-- replica won't be able to connect without full +-- rebootstrap. +-- + +fiber = require('fiber') + | --- + | ... +test_run = require('test_run').new() + | --- + | ... +engine = test_run:get_cfg('engine') + | --- + | ... + +-- +-- Case 1. +-- +-- First lets make sure we're getting XlogGapError in +-- case if wal_cleanup_delay is not used. +-- + +test_run:cmd('create server master with script="replication/gh-5806-master.lua"') + | --- + | - true + | ... +test_run:cmd('start server master with wait=True, wait_load=True') + | --- + | - true + | ... + +test_run:switch('master') + | --- + | - true + | ... +box.schema.user.grant('guest', 'replication') + | --- + | ... + +-- +-- Keep small number of snaps to force cleanup +-- procedure be more intensive. +box.cfg{checkpoint_count = 1} + | --- + | ... + +engine = test_run:get_cfg('engine') + | --- + | ... +s = box.schema.space.create('test', {engine = engine}) + | --- + | ... +_ = s:create_index('pk') + | --- + | ... 
+ +test_run:switch('default') + | --- + | - true + | ... +test_run:cmd('create server replica with rpl_master=master,\ + script="replication/gh-5806-slave.lua"') + | --- + | - true + | ... +test_run:cmd('start server replica with wait=True, wait_load=True') + | --- + | - true + | ... + +test_run:switch('replica') + | --- + | - true + | ... +box.cfg{checkpoint_count = 1} + | --- + | ... +s = box.schema.space.create('testtemp', {temporary = true}) + | --- + | ... +_ = s:create_index('pk') + | --- + | ... +for i=1,2 do func_xlog_snap(box.space.testtemp, {i}) end + | --- + | ... + +-- +-- Stop the replica node and generate +-- first range of xlogs on the master. +test_run:switch('default') + | --- + | - true + | ... +test_run:cmd('stop server replica') + | --- + | - true + | ... + +test_run:switch('master') + | --- + | - true + | ... +for i=1,2 do func_xlog_snap(box.space.test, {i}) end + | --- + | ... + +-- +-- Restart the masted and generate the +-- next range of xlogs. +test_run:switch('default') + | --- + | - true + | ... +test_run:cmd('stop server master') + | --- + | - true + | ... +test_run:cmd('start server master with wait_load=True') + | --- + | - true + | ... +test_run:switch('master') + | --- + | - true + | ... +for i=3,4 do func_xlog_snap(box.space.test, {i}) end + | --- + | ... + +-- +-- Restart master node and the replica then. +test_run:switch('default') + | --- + | - true + | ... +test_run:cmd('stop server master') + | --- + | - true + | ... +test_run:cmd('start server master with wait_load=True') + | --- + | - true + | ... +test_run:cmd('start server replica with wait=False, wait_load=False') + | --- + | - true + | ... + +-- +-- Wait error to appear. +while test_run:grep_log("master", "XlogGapError") == nil do fiber.sleep(0.01) end + | --- + | ... + +-- +-- Cleanup. +test_run:cmd('stop server master') + | --- + | - true + | ... +test_run:cmd('stop server replica') + | --- + | - true + | ... +test_run:cmd('delete server master') + | --- + | - true + | ... +test_run:cmd('delete server replica') + | --- + | - true + | ... + +-- +-- Case 2. +-- +-- Lets make sure we're not getting XlogGapError in +-- case if wal_cleanup_delay is used. +-- + +test_run:cmd('create server master with script="replication/gh-5806-master.lua"') + | --- + | - true + | ... +test_run:cmd('start server master with args="3600", wait=True, wait_load=True') + | --- + | - true + | ... + +test_run:switch('master') + | --- + | - true + | ... +box.schema.user.grant('guest', 'replication') + | --- + | ... + +-- +-- Keep small number of snaps to force cleanup +-- procedure be more intensive. +box.cfg{checkpoint_count = 1} + | --- + | ... + +engine = test_run:get_cfg('engine') + | --- + | ... +s = box.schema.space.create('test', {engine = engine}) + | --- + | ... +_ = s:create_index('pk') + | --- + | ... + +test_run:switch('default') + | --- + | - true + | ... +test_run:cmd('create server replica with rpl_master=master,\ + script="replication/gh-5806-slave.lua"') + | --- + | - true + | ... +test_run:cmd('start server replica with wait=True, wait_load=True') + | --- + | - true + | ... + +test_run:switch('replica') + | --- + | - true + | ... +box.cfg{checkpoint_count = 1} + | --- + | ... +s = box.schema.space.create('testtemp', {temporary = true}) + | --- + | ... +_ = s:create_index('pk') + | --- + | ... +for i=1,2 do func_xlog_snap(box.space.testtemp, {i}) end + | --- + | ... + +-- +-- Stop the replica node and generate +-- first range of xlogs on the master. +test_run:switch('default') + | --- + | - true + | ... 
+test_run:cmd('stop server replica') + | --- + | - true + | ... + +test_run:switch('master') + | --- + | - true + | ... +for i=1,2 do func_xlog_snap(box.space.test, {i}) end + | --- + | ... + +-- +-- Restart the masted and generate the +-- next range of xlogs. +test_run:switch('default') + | --- + | - true + | ... +test_run:cmd('stop server master') + | --- + | - true + | ... +test_run:cmd('start server master with args="3600", wait=True, wait_load=True') + | --- + | - true + | ... +test_run:switch('master') + | --- + | - true + | ... +for i=3,4 do func_xlog_snap(box.space.test, {i}) end + | --- + | ... + +-- +-- Restart master node and the replica then. +test_run:switch('default') + | --- + | - true + | ... +test_run:cmd('stop server master') + | --- + | - true + | ... +test_run:cmd('start server master with args="3600", wait=True, wait_load=True') + | --- + | - true + | ... +test_run:cmd('start server replica with wait=True, wait_load=True') + | --- + | - true + | ... + +assert(test_run:grep_log("master", "XlogGapError") == nil) + | --- + | - true + | ... + +-- +-- Cleanup. +test_run:cmd('stop server master') + | --- + | - true + | ... +test_run:cmd('stop server replica') + | --- + | - true + | ... +test_run:cmd('delete server master') + | --- + | - true + | ... +test_run:cmd('delete server replica') + | --- + | - true + | ... diff --git a/test/replication/gh-5806-xlog-cleanup.test.lua b/test/replication/gh-5806-xlog-cleanup.test.lua new file mode 100644 index 000000000..6232e3964 --- /dev/null +++ b/test/replication/gh-5806-xlog-cleanup.test.lua @@ -0,0 +1,148 @@ +-- +-- gh-5806: defer xlog cleanup to keep xlogs until +-- replicas present in "_cluster" are connected. +-- Otherwise we are getting XlogGapError since +-- master might go far forwad from replica and +-- replica won't be able to connect without full +-- rebootstrap. +-- + +fiber = require('fiber') +test_run = require('test_run').new() +engine = test_run:get_cfg('engine') + +-- +-- Case 1. +-- +-- First lets make sure we're getting XlogGapError in +-- case if wal_cleanup_delay is not used. +-- + +test_run:cmd('create server master with script="replication/gh-5806-master.lua"') +test_run:cmd('start server master with wait=True, wait_load=True') + +test_run:switch('master') +box.schema.user.grant('guest', 'replication') + +-- +-- Keep small number of snaps to force cleanup +-- procedure be more intensive. +box.cfg{checkpoint_count = 1} + +engine = test_run:get_cfg('engine') +s = box.schema.space.create('test', {engine = engine}) +_ = s:create_index('pk') + +test_run:switch('default') +test_run:cmd('create server replica with rpl_master=master,\ + script="replication/gh-5806-slave.lua"') +test_run:cmd('start server replica with wait=True, wait_load=True') + +test_run:switch('replica') +box.cfg{checkpoint_count = 1} +s = box.schema.space.create('testtemp', {temporary = true}) +_ = s:create_index('pk') +for i=1,2 do func_xlog_snap(box.space.testtemp, {i}) end + +-- +-- Stop the replica node and generate +-- first range of xlogs on the master. +test_run:switch('default') +test_run:cmd('stop server replica') + +test_run:switch('master') +for i=1,2 do func_xlog_snap(box.space.test, {i}) end + +-- +-- Restart the masted and generate the +-- next range of xlogs. +test_run:switch('default') +test_run:cmd('stop server master') +test_run:cmd('start server master with wait_load=True') +test_run:switch('master') +for i=3,4 do func_xlog_snap(box.space.test, {i}) end + +-- +-- Restart master node and the replica then. 
+test_run:switch('default') +test_run:cmd('stop server master') +test_run:cmd('start server master with wait_load=True') +test_run:cmd('start server replica with wait=False, wait_load=False') + +-- +-- Wait error to appear. +while test_run:grep_log("master", "XlogGapError") == nil do fiber.sleep(0.01) end + +-- +-- Cleanup. +test_run:cmd('stop server master') +test_run:cmd('stop server replica') +test_run:cmd('delete server master') +test_run:cmd('delete server replica') + +-- +-- Case 2. +-- +-- Lets make sure we're not getting XlogGapError in +-- case if wal_cleanup_delay is used. +-- + +test_run:cmd('create server master with script="replication/gh-5806-master.lua"') +test_run:cmd('start server master with args="3600", wait=True, wait_load=True') + +test_run:switch('master') +box.schema.user.grant('guest', 'replication') + +-- +-- Keep small number of snaps to force cleanup +-- procedure be more intensive. +box.cfg{checkpoint_count = 1} + +engine = test_run:get_cfg('engine') +s = box.schema.space.create('test', {engine = engine}) +_ = s:create_index('pk') + +test_run:switch('default') +test_run:cmd('create server replica with rpl_master=master,\ + script="replication/gh-5806-slave.lua"') +test_run:cmd('start server replica with wait=True, wait_load=True') + +test_run:switch('replica') +box.cfg{checkpoint_count = 1} +s = box.schema.space.create('testtemp', {temporary = true}) +_ = s:create_index('pk') +for i=1,2 do func_xlog_snap(box.space.testtemp, {i}) end + +-- +-- Stop the replica node and generate +-- first range of xlogs on the master. +test_run:switch('default') +test_run:cmd('stop server replica') + +test_run:switch('master') +for i=1,2 do func_xlog_snap(box.space.test, {i}) end + +-- +-- Restart the masted and generate the +-- next range of xlogs. +test_run:switch('default') +test_run:cmd('stop server master') +test_run:cmd('start server master with args="3600", wait=True, wait_load=True') +test_run:switch('master') +for i=3,4 do func_xlog_snap(box.space.test, {i}) end + +-- +-- Restart master node and the replica then. +test_run:switch('default') +test_run:cmd('stop server master') +test_run:cmd('start server master with args="3600", wait=True, wait_load=True') +test_run:cmd('start server replica with wait=True, wait_load=True') + +assert(test_run:grep_log("master", "XlogGapError") == nil) + +-- +-- Cleanup. +test_run:cmd('stop server master') +test_run:cmd('stop server replica') +test_run:cmd('delete server master') +test_run:cmd('delete server replica') -- 2.30.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
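For anyone reproducing this locally, the test is presumably driven by the usual test-run harness from the tree; the exact invocation below (path and lack of extra options) is an assumption:

    $ cd test
    $ ./test-run.py replication/gh-5806-xlog-cleanup.test.lua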
* Re: [Tarantool-patches] [PATCH 2/2] test: add a test for wal_cleanup_delay option 2021-03-18 18:41 ` [Tarantool-patches] [PATCH 2/2] test: add a test for wal_cleanup_delay option Cyrill Gorcunov via Tarantool-patches @ 2021-03-18 23:04 ` Vladislav Shpilevoy via Tarantool-patches 2021-03-19 12:14 ` Cyrill Gorcunov via Tarantool-patches 0 siblings, 1 reply; 14+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-03-18 23:04 UTC (permalink / raw) To: Cyrill Gorcunov, tml; +Cc: Mons Anderson Thanks for the patch! See 8 comments below. On 18.03.2021 19:41, Cyrill Gorcunov via Tarantool-patches wrote: > Part-of #5806 > > Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> > --- > test/replication/gh-5806-master.lua | 13 + > test/replication/gh-5806-slave.lua | 13 + > test/replication/gh-5806-xlog-cleanup.result | 336 ++++++++++++++++++ > .../replication/gh-5806-xlog-cleanup.test.lua | 148 ++++++++ > 4 files changed, 510 insertions(+) > create mode 100644 test/replication/gh-5806-master.lua > create mode 100644 test/replication/gh-5806-slave.lua > create mode 100644 test/replication/gh-5806-xlog-cleanup.result > create mode 100644 test/replication/gh-5806-xlog-cleanup.test.lua > > diff --git a/test/replication/gh-5806-master.lua b/test/replication/gh-5806-master.lua > new file mode 100644 > index 000000000..0404965d3 > --- /dev/null > +++ b/test/replication/gh-5806-master.lua > @@ -0,0 +1,13 @@ > +#!/usr/bin/env tarantool > + > +require('console').listen(os.getenv('ADMIN')) > + > +function func_xlog_snap(space, value) 1. This fails luacheck (which I don't like having in the tests, but still we must keep it green): Checking test/replication/gh-5806-master.lua 1 warning test/replication/gh-5806-master.lua:5:10: (W111) setting non-standard global variable func_xlog_snap Checking test/replication/gh-5806-slave.lua 1 warning test/replication/gh-5806-slave.lua:5:10: (W111) setting non-standard global variable func_xlog_snap Also why does it have 'func' prefix? It is a function obviously, we don't add 'func' prefix to all functions. > + space:insert(value) > + box.snapshot() > +end > diff --git a/test/replication/gh-5806-xlog-cleanup.result b/test/replication/gh-5806-xlog-cleanup.result > new file mode 100644 > index 000000000..97355a8bf > --- /dev/null > +++ b/test/replication/gh-5806-xlog-cleanup.result > @@ -0,0 +1,336 @@ > +-- test-run result file version 2 > +-- > +-- gh-5806: defer xlog cleanup to keep xlogs until > +-- replicas present in "_cluster" are connected. > +-- Otherwise we are getting XlogGapError since > +-- master might go far forwad from replica and 2. forwad -> forward. > +-- replica won't be able to connect without full > +-- rebootstrap. > +-- > + > +fiber = require('fiber') > + | --- > + | ... > +test_run = require('test_run').new() > + | --- > + | ... > +engine = test_run:get_cfg('engine') > + | --- > + | ... > + > +-- > +-- Case 1. > +-- > +-- First lets make sure we're getting XlogGapError in > +-- case if wal_cleanup_delay is not used. > +-- > + > +test_run:cmd('create server master with script="replication/gh-5806-master.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('start server master with wait=True, wait_load=True') > + | --- > + | - true > + | ... > + > +test_run:switch('master') > + | --- > + | - true > + | ... > +box.schema.user.grant('guest', 'replication') > + | --- > + | ... > + > +-- > +-- Keep small number of snaps to force cleanup > +-- procedure be more intensive. > +box.cfg{checkpoint_count = 1} > + | --- > + | ... 
> + > +engine = test_run:get_cfg('engine') > + | --- > + | ... > +s = box.schema.space.create('test', {engine = engine}) > + | --- > + | ... > +_ = s:create_index('pk') > + | --- > + | ... > + > +test_run:switch('default') > + | --- > + | - true > + | ... > +test_run:cmd('create server replica with rpl_master=master,\ > + script="replication/gh-5806-slave.lua"') > + | --- > + | - true > + | ... > +test_run:cmd('start server replica with wait=True, wait_load=True') > + | --- > + | - true > + | ... > + > +test_run:switch('replica') > + | --- > + | - true > + | ... > +box.cfg{checkpoint_count = 1} > + | --- > + | ... > +s = box.schema.space.create('testtemp', {temporary = true}) > + | --- > + | ... > +_ = s:create_index('pk') > + | --- > + | ... > +for i=1,2 do func_xlog_snap(box.space.testtemp, {i}) end 3. Honestly, it would look much simpler if it would be just 4 lines with 2 inserts and 2 snapshots. 4. Why do you do rw requests both on the replica and master? And why do you need 2 spaces? > + | --- > + | ... > + > +-- > +-- Stop the replica node and generate > +-- first range of xlogs on the master. > +test_run:switch('default') > + | --- > + | - true > + | ... > +test_run:cmd('stop server replica') > + | --- > + | - true > + | ... > + > +test_run:switch('master') > + | --- > + | - true > + | ... > +for i=1,2 do func_xlog_snap(box.space.test, {i}) end > + | --- > + | ... > + > +-- > +-- Restart the masted and generate the 5. masted -> master. > +-- next range of xlogs. > +test_run:switch('default') > + | --- > + | - true > + | ... > +test_run:cmd('stop server master') > + | --- > + | - true > + | ... > +test_run:cmd('start server master with wait_load=True') 6. Does 'restart server master' command work? > + | --- > + | - true > + | ... > +test_run:switch('master') > + | --- > + | - true > + | ... > +for i=3,4 do func_xlog_snap(box.space.test, {i}) end > + | --- > + | ... > + > +-- > +-- Restart master node and the replica then. 7. Why do you need to restart the master 2 times? > +test_run:switch('default') > + | --- > + | - true > + | ... > +test_run:cmd('stop server master') > + | --- > + | - true > + | ... > +test_run:cmd('start server master with wait_load=True') > + | --- > + | - true > + | ... > +test_run:cmd('start server replica with wait=False, wait_load=False') > + | --- > + | - true > + | ... > + > +-- > +-- Wait error to appear. > +while test_run:grep_log("master", "XlogGapError") == nil do fiber.sleep(0.01) end 8. We have test_run:wait_log(). ^ permalink raw reply [flat|nested] 14+ messages in thread
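A sketch of what comments 1, 3 and 8 taken together would mean for the test body; the space name is the one used in the test, and the wait_log() signature is assumed from its typical usage elsewhere in the suite rather than taken from its documentation:

    -- no helper in the instance scripts: plain inserts and snapshots
    box.space.test:insert({1})
    box.snapshot()
    box.space.test:insert({2})
    box.snapshot()

    -- instead of the grep_log()/fiber.sleep() polling loop:
    test_run:wait_log('master', 'XlogGapError', nil, 100)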
* Re: [Tarantool-patches] [PATCH 2/2] test: add a test for wal_cleanup_delay option 2021-03-18 23:04 ` Vladislav Shpilevoy via Tarantool-patches @ 2021-03-19 12:14 ` Cyrill Gorcunov via Tarantool-patches 2021-03-19 22:17 ` Vladislav Shpilevoy via Tarantool-patches 0 siblings, 1 reply; 14+ messages in thread From: Cyrill Gorcunov via Tarantool-patches @ 2021-03-19 12:14 UTC (permalink / raw) To: Vladislav Shpilevoy; +Cc: tml, Mons Anderson On Fri, Mar 19, 2021 at 12:04:11AM +0100, Vladislav Shpilevoy wrote: > > + > > +function func_xlog_snap(space, value) > > 1. This fails luacheck (which I don't like having in the tests, > but still we must keep it green): > > Checking test/replication/gh-5806-master.lua 1 warning > > test/replication/gh-5806-master.lua:5:10: (W111) setting non-standard global variable func_xlog_snap > > Checking test/replication/gh-5806-slave.lua 1 warning > > test/replication/gh-5806-slave.lua:5:10: (W111) setting non-standard global variable func_xlog_snap Thanks, Vlad! Actually I don't understand what this warning means. Should I make these functions local or what? > > Also why does it have 'func' prefix? It is a function > obviously, we don't add 'func' prefix to all functions. OK > > +-- > > +-- gh-5806: defer xlog cleanup to keep xlogs until > > +-- replicas present in "_cluster" are connected. > > +-- Otherwise we are getting XlogGapError since > > +-- master might go far forwad from replica and > > 2. forwad -> forward. Thanks! > > + | --- > > + | ... > > +for i=1,2 do func_xlog_snap(box.space.testtemp, {i}) end > > 3. Honestly, it would look much simpler if it would be just 4 > lines with 2 inserts and 2 snapshots. Sure, I can make it so. > > 4. Why do you do rw requests both on the replica and master? To intermix data so that lsns would be different > And why do you need 2 spaces? This allows to write xlog data specifically on replica and only in this case I managed to reach XlogGapError (otherwise the replica reported that it is too old and restarted rejoin procedure). > > + > > +-- > > +-- Restart the masted and generate the > > 5. masted -> master. Thanks:) > > > +-- next range of xlogs. > > +test_run:switch('default') > > + | --- > > + | - true > > + | ... > > +test_run:cmd('stop server master') > > + | --- > > + | - true > > + | ... > > +test_run:cmd('start server master with wait_load=True') > > 6. Does 'restart server master' command work? I must confess I didn't know about this command. Will try. > > +-- > > +-- Restart master node and the replica then. > > 7. Why do you need to restart the master 2 times? To make sure the GC passed a few iterations. Maybe we could manage with one as well. Thanks for the point will try. >> > +-- > > +-- Wait error to appear. > > +while test_run:grep_log("master", "XlogGapError") == nil do fiber.sleep(0.01) end > > 8. We have test_run:wait_log(). Didn't know, thanks! ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Tarantool-patches] [PATCH 2/2] test: add a test for wal_cleanup_delay option 2021-03-19 12:14 ` Cyrill Gorcunov via Tarantool-patches @ 2021-03-19 22:17 ` Vladislav Shpilevoy via Tarantool-patches 0 siblings, 0 replies; 14+ messages in thread From: Vladislav Shpilevoy via Tarantool-patches @ 2021-03-19 22:17 UTC (permalink / raw) To: Cyrill Gorcunov; +Cc: tml, Mons Anderson On 19.03.2021 13:14, Cyrill Gorcunov via Tarantool-patches wrote: > On Fri, Mar 19, 2021 at 12:04:11AM +0100, Vladislav Shpilevoy wrote: >>> + >>> +function func_xlog_snap(space, value) >> >> 1. This fails luacheck (which I don't like having in the tests, >> but still we must keep it green): >> >> Checking test/replication/gh-5806-master.lua 1 warning >> >> test/replication/gh-5806-master.lua:5:10: (W111) setting non-standard global variable func_xlog_snap >> >> Checking test/replication/gh-5806-slave.lua 1 warning >> >> test/replication/gh-5806-slave.lua:5:10: (W111) setting non-standard global variable func_xlog_snap > > Thanks, Vlad! Actually I don't understand what this warning means. Should I make > these functions local or what? Yeah, we can't declare global variables/functions. When you want a function, you must either assign it to an index in _G, or have it in a separate module and add it via 'require()' in the test's body. But it should not matter now if you delete this function. >> 4. Why do you do rw requests both on the replica and master? > > To intermix data so that lsns would be different Why should they be different (this all better be in the comments)? >> And why do you need 2 spaces? > > This allows to write xlog data specifically on replica and > only in this case I managed to reach XlogGapError (otherwise > the replica reported that it is too old and restarted rejoin > procedure). Aha, so it can't rejoin automatically, because this would lead to data loss. I see now. Please, document it in the test. ^ permalink raw reply [flat|nested] 14+ messages in thread
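If the helper were kept in the instance scripts after all, a minimal sketch of the "_G index" variant described above; whether the repo's luacheck configuration accepts this form is an assumption:

    -- gh-5806-master.lua / gh-5806-slave.lua: assign into _G explicitly
    -- instead of declaring a new global, to avoid the W111 warning.
    _G.xlog_snap = function(space, value)
        space:insert(value)
        box.snapshot()
    end

    -- the call sites in the test body stay the same, e.g.:
    -- xlog_snap(box.space.test, {1})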