* [Tarantool-patches] [PATCH v2 0/3] gc/xlog: delay xlog cleanup until relays are subscribed
@ 2021-03-20 13:15 Cyrill Gorcunov via Tarantool-patches
2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 1/3] " Cyrill Gorcunov via Tarantool-patches
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-03-20 13:15 UTC (permalink / raw)
To: tml; +Cc: Vladislav Shpilevoy, Mons Anderson
Take a look please.
v2:
- rebase code to the fresh master branch
- keep wal_cleanup_delay option name
- pass wal_cleanup_delay as an option to gc_init, so it
won't be dependent on cfg engine
- add comment about gc_delay_unref in plain bootstrap mode
- allow to setup wal_cleanup_delay dynamically
- update comment in gc_wait_cleanup and call it conditionally
- declare wal_cleanup_delay as a double
- rename gc.cleanup_is_paused to gc.is_paused and update output
- do not show ref counter in box.info.gc() output
- update documentation
- move gc_delay_unref inside relay_subscribe call which runs
in tx context (instead of relay's context)
- update tests:
- add a comment why we need a temp space on replica node
- use explicit insert/snapshot operations
- shrkink the number of insert/snapshot to speedup testing
- use "restart" instead of stop/start pair
- use wait_log helper instead of own function
- add is_paused test
issue https://github.com/tarantool/tarantool/issues/5806
branch gorcunov/gh-5806-xlog-gc-2
Cyrill Gorcunov (3):
gc/xlog: delay xlog cleanup until relays are subscribed
test: add a test for wal_cleanup_delay option
test: box-tap/gc -- add test for is_paused field
src/box/box.cc | 46 +-
src/box/box.h | 1 +
src/box/gc.c | 97 ++++-
src/box/gc.h | 38 +-
src/box/lua/cfg.cc | 9 +
src/box/lua/info.c | 4 +
src/box/lua/load_cfg.lua | 4 +
src/box/relay.cc | 1 +
src/box/replication.cc | 2 +
test/app-tap/init_script.result | 1 +
test/box-tap/gc.test.lua | 3 +-
test/box/admin.result | 2 +
test/box/cfg.result | 4 +
test/replication/gh-5806-master.lua | 8 +
test/replication/gh-5806-slave.lua | 8 +
test/replication/gh-5806-xlog-cleanup.result | 397 ++++++++++++++++++
.../replication/gh-5806-xlog-cleanup.test.lua | 166 ++++++++
test/unit/snap_quorum_delay.cc | 2 +-
18 files changed, 784 insertions(+), 9 deletions(-)
create mode 100644 test/replication/gh-5806-master.lua
create mode 100644 test/replication/gh-5806-slave.lua
create mode 100644 test/replication/gh-5806-xlog-cleanup.result
create mode 100644 test/replication/gh-5806-xlog-cleanup.test.lua
base-commit: f4e248c0c13a46beee238fbebc38ef687ef09d02
--
2.30.2
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Tarantool-patches] [PATCH v2 1/3] gc/xlog: delay xlog cleanup until relays are subscribed
2021-03-20 13:15 [Tarantool-patches] [PATCH v2 0/3] gc/xlog: delay xlog cleanup until relays are subscribed Cyrill Gorcunov via Tarantool-patches
@ 2021-03-20 13:15 ` Cyrill Gorcunov via Tarantool-patches
2021-03-22 8:07 ` Konstantin Osipov via Tarantool-patches
2021-03-22 21:40 ` Vladislav Shpilevoy via Tarantool-patches
2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 2/3] test: add a test for wal_cleanup_delay option Cyrill Gorcunov via Tarantool-patches
2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 3/3] test: box-tap/gc -- add test for is_paused field Cyrill Gorcunov via Tarantool-patches
2 siblings, 2 replies; 11+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-03-20 13:15 UTC (permalink / raw)
To: tml; +Cc: Vladislav Shpilevoy, Mons Anderson
In case if replica managed to be far behind the master node
(so there are a number of xlog files present after the last
master's snapshot) then once master node get restarted it
may clean up the xlogs needed by the replica to subscribe
in a fast way and instead the replica will have to rejoin
reading a number of data back.
Lets try to address this by delaying xlog files cleanup
until replicas are got subscribed and relays are up
and running. For this sake we start with cleanup fiber
spinning in nop cycle ("paused" mode) and use a delay
counter to wait until relays decrement them.
This implies that if `_cluster` system space is not empty
upon restart and the registered replica somehow vanished
completely and won't ever come back, then the node
administrator has to drop this replica from `_cluster`
manually.
Note that this delayed cleanup start doesn't prevent
WAL engine from removing old files if there is no
space left on a storage device. The WAL will simply
drop old data without a question.
We need to take into account that some administrators
might not need this functionality at all, for this
sake we introduce "wal_cleanup_delay" configuration
option which allows to enable or disable the delay.
Closes #5806
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
@TarantoolBot document
Title: Add wal_cleanup_delay configuration parameter
The `wal_cleanup_delay` option defines a delay in seconds
before write ahead log files (`*.xlog`) are getting started
to prune upon a node restart.
This option is ignored in case if a node is running as
an anonymous replica (`replication_anon = true`). Similarly
if replication is unused or there is no plans to use
replication at all then this option should not be considered.
An initial problem to solve is the case where a node is operating
so fast that its replicas do not manage to reach the node state
and in case if the node is restarted at this moment (for various
reasons, for example due to power outage) then `*.xlog` files might
be pruned during restart. In result replicas will not find these
files on the main node and have to reread all data back which
is a very expensive procedure.
Since replicas are tracked via `_cluster` system space this we use
its content to count subscribed replicas and when all of them are
up and running the cleanup procedure is automatically enabled even
if `wal_cleanup_delay` is not expired.
The `wal_cleanup_delay` should be set to:
- `-1` to wait infinitely until all existing replicas are subscribed;
- `0` to disable the cleanup delay;
- `>0` to wait for specified number of seconds.
By default it is set to `14400` seconds (ie `4` hours).
In case if registered replica is lost forever and timeout is set to
infinity then a preferred way to enable cleanup procedure is not setting
up a small timeout value but rather to delete this replica from `_cluster`
space manually.
Note that the option does *not* prevent WAL engine from removing
old `*.xlog` files if there is no space left on a storage device,
WAL engine can remove them in a force way.
Current state of `*.xlog` garbage collector can be found in
`box.info.gc()` output. For example
``` Lua
tarantool> box.info.gc()
---
...
cleanup_is_paused: false
```
The `cleanup_is_paused` shows if cleanup fiber is paused or not.
---
src/box/box.cc | 46 +++++++++++++++-
src/box/box.h | 1 +
src/box/gc.c | 97 +++++++++++++++++++++++++++++++--
src/box/gc.h | 38 ++++++++++++-
src/box/lua/cfg.cc | 9 +++
src/box/lua/info.c | 4 ++
src/box/lua/load_cfg.lua | 4 ++
src/box/relay.cc | 1 +
src/box/replication.cc | 2 +
test/app-tap/init_script.result | 1 +
test/box/admin.result | 2 +
test/box/cfg.result | 4 ++
test/unit/snap_quorum_delay.cc | 2 +-
13 files changed, 203 insertions(+), 8 deletions(-)
diff --git a/src/box/box.cc b/src/box/box.cc
index cc59564e1..b9fd7af00 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -771,6 +771,29 @@ box_check_wal_queue_max_size(void)
return size;
}
+static double
+box_check_wal_cleanup_delay(void)
+{
+ double value = cfg_geti("wal_cleanup_delay");
+ if (value < 0) {
+ diag_set(ClientError, ER_CFG, "wal_cleanup_delay",
+ "the value must be >= 0");
+ return -1;
+ }
+
+ /*
+ * Anonymous replicas do not require
+ * delay the cleanup procedure since they
+ * are read only.
+ */
+ if (cfg_geti("replication_anon") != 0) {
+ if (value != 0)
+ value = 0;
+ }
+
+ return value;
+}
+
static void
box_check_readahead(int readahead)
{
@@ -918,6 +941,8 @@ box_check_config(void)
box_check_wal_mode(cfg_gets("wal_mode"));
if (box_check_wal_queue_max_size() < 0)
diag_raise();
+ if (box_check_wal_cleanup_delay() < 0)
+ diag_raise();
if (box_check_memory_quota("memtx_memory") < 0)
diag_raise();
box_check_memtx_min_tuple_size(cfg_geti64("memtx_min_tuple_size"));
@@ -1465,6 +1490,16 @@ box_set_wal_queue_max_size(void)
return 0;
}
+int
+box_set_wal_cleanup_delay(void)
+{
+ double delay = box_check_wal_cleanup_delay();
+ if (delay < 0)
+ return -1;
+ gc_set_wal_cleanup_delay(delay);
+ return 0;
+}
+
void
box_set_vinyl_memory(void)
{
@@ -3000,7 +3035,7 @@ box_cfg_xc(void)
rmean_box = rmean_new(iproto_type_strs, IPROTO_TYPE_STAT_MAX);
rmean_error = rmean_new(rmean_error_strings, RMEAN_ERROR_LAST);
- gc_init();
+ gc_init(box_check_wal_cleanup_delay());
engine_init();
schema_init();
replication_init();
@@ -3076,6 +3111,15 @@ box_cfg_xc(void)
}
fiber_gc();
+ /*
+ * Exclude self from GC delay because we care
+ * about remote replicas only, still for ref/unref
+ * balance we do reference self node initially and
+ * downgrade it to zero when there is no replication
+ * set at all.
+ */
+ gc_delay_unref();
+
bootstrap_journal_guard.is_active = false;
assert(current_journal != &bootstrap_journal);
diff --git a/src/box/box.h b/src/box/box.h
index 65215b087..e2321b9b0 100644
--- a/src/box/box.h
+++ b/src/box/box.h
@@ -243,6 +243,7 @@ void box_set_checkpoint_count(void);
void box_set_checkpoint_interval(void);
void box_set_checkpoint_wal_threshold(void);
int box_set_wal_queue_max_size(void);
+int box_set_wal_cleanup_delay(void);
void box_set_memtx_memory(void);
void box_set_memtx_max_tuple_size(void);
void box_set_vinyl_memory(void);
diff --git a/src/box/gc.c b/src/box/gc.c
index 9af4ef958..5418fd31d 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -46,6 +46,7 @@
#include <small/rlist.h>
#include <tarantool_ev.h>
+#include "cfg.h"
#include "diag.h"
#include "errcode.h"
#include "fiber.h"
@@ -102,11 +103,18 @@ gc_checkpoint_delete(struct gc_checkpoint *checkpoint)
}
void
-gc_init(void)
+gc_init(double wal_cleanup_delay)
{
/* Don't delete any files until recovery is complete. */
gc.min_checkpoint_count = INT_MAX;
+ gc.wal_cleanup_delay = wal_cleanup_delay;
+ gc.is_paused = wal_cleanup_delay > 0;
+ gc.delay_ref = 0;
+
+ if (gc.is_paused)
+ say_info("wal/engine cleanup is paused");
+
vclock_create(&gc.vclock);
rlist_create(&gc.checkpoints);
gc_tree_new(&gc.consumers);
@@ -238,6 +246,47 @@ static int
gc_cleanup_fiber_f(va_list ap)
{
(void)ap;
+
+ /*
+ * Stage 1 (optional): in case if we're booting
+ * up with cleanup disabled lets do wait in a
+ * separate cycle to minimize branching on stage 2.
+ */
+ if (gc.is_paused) {
+ ev_tstamp timeout = gc.wal_cleanup_delay;
+ while (!fiber_is_cancelled()) {
+ ev_tstamp clock_start = fiber_clock();
+ if (fiber_yield_timeout(timeout)) {
+ say_info("wal/engine cleanup is resumed "
+ "due to timeout expiration");
+ gc.is_paused = false;
+ gc.delay_ref = 0;
+ break;
+ }
+
+ /*
+ * If a last reference is dropped
+ * we can exit out early.
+ */
+ if (!gc.is_paused)
+ break;
+
+ ev_tstamp elapsed = fiber_clock() - clock_start;
+ timeout = gc.wal_cleanup_delay - elapsed;
+ if (timeout <= 0) {
+ say_info("wal/engine cleanup is resumed "
+ "due to reconfigured timeout "
+ "expiration");
+ gc.is_paused = false;
+ gc.delay_ref = 0;
+ break;
+ }
+ }
+ }
+
+ /*
+ * Stage 2: a regular cleanup cycle.
+ */
while (!fiber_is_cancelled()) {
int64_t delta = gc.cleanup_scheduled - gc.cleanup_completed;
if (delta == 0) {
@@ -253,6 +302,43 @@ gc_cleanup_fiber_f(va_list ap)
return 0;
}
+void
+gc_set_wal_cleanup_delay(double wal_cleanup_delay)
+{
+ gc.wal_cleanup_delay = wal_cleanup_delay;
+ /*
+ * This routine may be called at arbitrary
+ * moment thus we must be sure the cleanup
+ * fiber is paused to not wake up it when
+ * it is already in a regular cleanup stage.
+ */
+ if (gc.is_paused)
+ fiber_wakeup(gc.cleanup_fiber);
+}
+
+void
+gc_delay_ref(void)
+{
+ if (gc.is_paused) {
+ assert(gc.delay_ref >= 0);
+ gc.delay_ref++;
+ }
+}
+
+void
+gc_delay_unref(void)
+{
+ if (gc.is_paused) {
+ assert(gc.delay_ref > 0);
+ gc.delay_ref--;
+ if (gc.delay_ref == 0) {
+ say_info("wal/engine cleanup is resumed");
+ gc.is_paused = false;
+ fiber_wakeup(gc.cleanup_fiber);
+ }
+ }
+}
+
/**
* Trigger asynchronous garbage collection.
*/
@@ -462,11 +548,12 @@ gc_checkpoint(void)
* Wait for background garbage collection that might
* have been triggered by this checkpoint to complete.
* Strictly speaking, it isn't necessary, but it
- * simplifies testing as it guarantees that by the
- * time box.snapshot() returns, all outdated checkpoint
- * files have been removed.
+ * simplifies testing. Same time if GC is paused and
+ * waiting for old XLOGs to be read by replicas the
+ * cleanup won't happen immediately after the checkpoint.
*/
- gc_wait_cleanup();
+ if (!gc.is_paused)
+ gc_wait_cleanup();
return 0;
}
diff --git a/src/box/gc.h b/src/box/gc.h
index 2a568c5f9..aa8f2070a 100644
--- a/src/box/gc.h
+++ b/src/box/gc.h
@@ -147,6 +147,24 @@ struct gc_state {
* taken at that moment of time.
*/
int64_t cleanup_completed, cleanup_scheduled;
+ /**
+ * A counter to wait until all replicas are managed to
+ * subscribe so that we can enable cleanup fiber to
+ * remove old XLOGs. Otherwise some replicas might be
+ * far behind the master node and after the master
+ * node been restarted they will have to reread all
+ * data back due to XlogGapError, ie too early deleted
+ * XLOGs.
+ */
+ int64_t delay_ref;
+ /**
+ * Delay timeout in seconds.
+ */
+ double wal_cleanup_delay;
+ /**
+ * When set the cleanup fiber is paused.
+ */
+ bool is_paused;
/**
* Set if there's a fiber making a checkpoint right now.
*/
@@ -198,7 +216,7 @@ gc_last_checkpoint(void)
* Initialize the garbage collection state.
*/
void
-gc_init(void);
+gc_init(double wal_cleanup_delay);
/**
* Destroy the garbage collection state.
@@ -206,6 +224,24 @@ gc_init(void);
void
gc_free(void);
+/**
+ * Set a new delay value.
+ */
+void
+gc_set_wal_cleanup_delay(double wal_cleanup_delay);
+
+/**
+ * Increment a reference to delay counter.
+ */
+void
+gc_delay_ref(void);
+
+/**
+ * Decrement a reference from the delay counter.
+ */
+void
+gc_delay_unref(void);
+
/**
* Advance the garbage collector vclock to the given position.
* Deactivate WAL consumers that need older data.
diff --git a/src/box/lua/cfg.cc b/src/box/lua/cfg.cc
index b37a93ed8..1142e2726 100644
--- a/src/box/lua/cfg.cc
+++ b/src/box/lua/cfg.cc
@@ -172,6 +172,14 @@ lbox_cfg_set_wal_queue_max_size(struct lua_State *L)
return 0;
}
+static int
+lbox_cfg_set_wal_cleanup_delay(struct lua_State *L)
+{
+ if (box_set_wal_cleanup_delay() < 0)
+ luaT_error(L);
+ return 0;
+}
+
static int
lbox_cfg_set_read_only(struct lua_State *L)
{
@@ -408,6 +416,7 @@ box_lua_cfg_init(struct lua_State *L)
{"cfg_set_checkpoint_interval", lbox_cfg_set_checkpoint_interval},
{"cfg_set_checkpoint_wal_threshold", lbox_cfg_set_checkpoint_wal_threshold},
{"cfg_set_wal_queue_max_size", lbox_cfg_set_wal_queue_max_size},
+ {"cfg_set_wal_cleanup_delay", lbox_cfg_set_wal_cleanup_delay},
{"cfg_set_read_only", lbox_cfg_set_read_only},
{"cfg_set_memtx_memory", lbox_cfg_set_memtx_memory},
{"cfg_set_memtx_max_tuple_size", lbox_cfg_set_memtx_max_tuple_size},
diff --git a/src/box/lua/info.c b/src/box/lua/info.c
index c4c9fa0a0..fb39799a1 100644
--- a/src/box/lua/info.c
+++ b/src/box/lua/info.c
@@ -445,6 +445,10 @@ lbox_info_gc_call(struct lua_State *L)
lua_pushboolean(L, gc.checkpoint_is_in_progress);
lua_settable(L, -3);
+ lua_pushstring(L, "is_paused");
+ lua_pushboolean(L, gc.is_paused);
+ lua_settable(L, -3);
+
lua_pushstring(L, "checkpoints");
lua_newtable(L);
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index 9967f992e..5ed8c12f0 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -73,6 +73,7 @@ local default_cfg = {
wal_max_size = 256 * 1024 * 1024,
wal_dir_rescan_delay= 2,
wal_queue_max_size = 16 * 1024 * 1024,
+ wal_cleanup_delay = 4 * 3600,
force_recovery = false,
replication = nil,
instance_uuid = nil,
@@ -155,6 +156,7 @@ local template_cfg = {
wal_mode = 'string',
wal_max_size = 'number',
wal_dir_rescan_delay= 'number',
+ wal_cleanup_delay = 'number',
force_recovery = 'boolean',
replication = 'string, number, table',
instance_uuid = 'string',
@@ -289,6 +291,7 @@ local dynamic_cfg = {
feedback_interval = ifdef_feedback_set_params,
-- do nothing, affects new replicas, which query this value on start
wal_dir_rescan_delay = function() end,
+ wal_cleanup_delay = private.cfg_set_wal_cleanup_delay,
custom_proc_title = function()
require('title').update(box.cfg.custom_proc_title)
end,
@@ -381,6 +384,7 @@ local dynamic_cfg_skip_at_load = {
replication_skip_conflict = true,
replication_anon = true,
wal_dir_rescan_delay = true,
+ wal_cleanup_delay = true,
custom_proc_title = true,
force_recovery = true,
instance_uuid = true,
diff --git a/src/box/relay.cc b/src/box/relay.cc
index 41f949e8e..6edee86bf 100644
--- a/src/box/relay.cc
+++ b/src/box/relay.cc
@@ -824,6 +824,7 @@ relay_subscribe(struct replica *replica, int fd, uint64_t sync,
tt_uuid_str(&replica->uuid));
if (replica->gc == NULL)
diag_raise();
+ gc_delay_unref();
}
relay_start(relay, fd, sync, relay_send_row);
diff --git a/src/box/replication.cc b/src/box/replication.cc
index 1fa8843e7..aefb812b3 100644
--- a/src/box/replication.cc
+++ b/src/box/replication.cc
@@ -250,6 +250,7 @@ replica_set_id(struct replica *replica, uint32_t replica_id)
tt_uuid_str(&replica->uuid));
}
replicaset.replica_by_id[replica_id] = replica;
+ gc_delay_ref();
++replicaset.registered_count;
say_info("assigned id %d to replica %s",
replica->id, tt_uuid_str(&replica->uuid));
@@ -273,6 +274,7 @@ replica_clear_id(struct replica *replica)
replicaset.replica_by_id[replica->id] = NULL;
assert(replicaset.registered_count > 0);
--replicaset.registered_count;
+ gc_delay_unref();
if (replica->id == instance_id) {
/* See replica_check_id(). */
assert(replicaset.is_joining);
diff --git a/test/app-tap/init_script.result b/test/app-tap/init_script.result
index 74cdb3efb..1fdd9a227 100644
--- a/test/app-tap/init_script.result
+++ b/test/app-tap/init_script.result
@@ -53,6 +53,7 @@ vinyl_run_count_per_level:2
vinyl_run_size_ratio:3.5
vinyl_timeout:60
vinyl_write_threads:4
+wal_cleanup_delay:14400
wal_dir:.
wal_dir_rescan_delay:2
wal_max_size:268435456
diff --git a/test/box/admin.result b/test/box/admin.result
index b619ae6cb..f8e8808e3 100644
--- a/test/box/admin.result
+++ b/test/box/admin.result
@@ -127,6 +127,8 @@ cfg_filter(box.cfg)
- 60
- - vinyl_write_threads
- 4
+ - - wal_cleanup_delay
+ - 14400
- - wal_dir
- <hidden>
- - wal_dir_rescan_delay
diff --git a/test/box/cfg.result b/test/box/cfg.result
index 2c3d5a981..693c1b521 100644
--- a/test/box/cfg.result
+++ b/test/box/cfg.result
@@ -115,6 +115,8 @@ cfg_filter(box.cfg)
| - 60
| - - vinyl_write_threads
| - 4
+ | - - wal_cleanup_delay
+ | - 14400
| - - wal_dir
| - <hidden>
| - - wal_dir_rescan_delay
@@ -234,6 +236,8 @@ cfg_filter(box.cfg)
| - 60
| - - vinyl_write_threads
| - 4
+ | - - wal_cleanup_delay
+ | - 14400
| - - wal_dir
| - <hidden>
| - - wal_dir_rescan_delay
diff --git a/test/unit/snap_quorum_delay.cc b/test/unit/snap_quorum_delay.cc
index 803bbbea8..c4edbcbcb 100644
--- a/test/unit/snap_quorum_delay.cc
+++ b/test/unit/snap_quorum_delay.cc
@@ -236,7 +236,7 @@ main(void)
{
memory_init();
fiber_init(fiber_c_invoke);
- gc_init();
+ gc_init(0);
txn_limbo_init();
instance_id = 1;
--
2.30.2
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Tarantool-patches] [PATCH v2 2/3] test: add a test for wal_cleanup_delay option
2021-03-20 13:15 [Tarantool-patches] [PATCH v2 0/3] gc/xlog: delay xlog cleanup until relays are subscribed Cyrill Gorcunov via Tarantool-patches
2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 1/3] " Cyrill Gorcunov via Tarantool-patches
@ 2021-03-20 13:15 ` Cyrill Gorcunov via Tarantool-patches
2021-03-22 21:40 ` Vladislav Shpilevoy via Tarantool-patches
2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 3/3] test: box-tap/gc -- add test for is_paused field Cyrill Gorcunov via Tarantool-patches
2 siblings, 1 reply; 11+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-03-20 13:15 UTC (permalink / raw)
To: tml; +Cc: Vladislav Shpilevoy, Mons Anderson
Part-of #5806
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
test/replication/gh-5806-master.lua | 8 +
test/replication/gh-5806-slave.lua | 8 +
test/replication/gh-5806-xlog-cleanup.result | 397 ++++++++++++++++++
.../replication/gh-5806-xlog-cleanup.test.lua | 166 ++++++++
4 files changed, 579 insertions(+)
create mode 100644 test/replication/gh-5806-master.lua
create mode 100644 test/replication/gh-5806-slave.lua
create mode 100644 test/replication/gh-5806-xlog-cleanup.result
create mode 100644 test/replication/gh-5806-xlog-cleanup.test.lua
diff --git a/test/replication/gh-5806-master.lua b/test/replication/gh-5806-master.lua
new file mode 100644
index 000000000..bc15dab67
--- /dev/null
+++ b/test/replication/gh-5806-master.lua
@@ -0,0 +1,8 @@
+#!/usr/bin/env tarantool
+
+require('console').listen(os.getenv('ADMIN'))
+
+box.cfg({
+ listen = os.getenv("LISTEN"),
+ wal_cleanup_delay = tonumber(arg[1]) or 0,
+})
diff --git a/test/replication/gh-5806-slave.lua b/test/replication/gh-5806-slave.lua
new file mode 100644
index 000000000..3abb3e035
--- /dev/null
+++ b/test/replication/gh-5806-slave.lua
@@ -0,0 +1,8 @@
+#!/usr/bin/env tarantool
+
+require('console').listen(os.getenv('ADMIN'))
+
+box.cfg({
+ listen = os.getenv("LISTEN"),
+ replication = os.getenv("MASTER"),
+})
diff --git a/test/replication/gh-5806-xlog-cleanup.result b/test/replication/gh-5806-xlog-cleanup.result
new file mode 100644
index 000000000..26e7519fc
--- /dev/null
+++ b/test/replication/gh-5806-xlog-cleanup.result
@@ -0,0 +1,397 @@
+-- test-run result file version 2
+--
+-- gh-5806: defer xlog cleanup to keep xlogs until
+-- replicas present in "_cluster" are connected.
+-- Otherwise we are getting XlogGapError since
+-- master might go far forwad from replica and
+-- replica won't be able to connect without full
+-- rebootstrap.
+--
+
+fiber = require('fiber')
+ | ---
+ | ...
+test_run = require('test_run').new()
+ | ---
+ | ...
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+
+--
+-- Case 1.
+--
+-- First lets make sure we're getting XlogGapError in
+-- case if wal_cleanup_delay is not used.
+--
+
+test_run:cmd('create server master with script="replication/gh-5806-master.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server master with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+
+--
+-- Keep small number of snaps to force cleanup
+-- procedure be more intensive.
+box.cfg{checkpoint_count = 1}
+ | ---
+ | ...
+
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+s = box.schema.space.create('test', {engine = engine})
+ | ---
+ | ...
+_ = s:create_index('pk')
+ | ---
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server replica with rpl_master=master,\
+ script="replication/gh-5806-slave.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+--
+-- On replica we create an own space which allows us to
+-- use more complex scenario and disableds replica from
+-- automatic rejoin. Otherwise XlogGapError won't happen.
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.cfg{checkpoint_count = 1}
+ | ---
+ | ...
+s = box.schema.space.create('testtemp', {temporary = true})
+ | ---
+ | ...
+_ = s:create_index('pk')
+ | ---
+ | ...
+box.space.testtemp:insert({1})
+ | ---
+ | - [1]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+box.space.testtemp:insert({2})
+ | ---
+ | - [2]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+
+--
+-- Stop the replica node and generate
+-- first range of xlogs on the master.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.space.test:insert({1})
+ | ---
+ | - [1]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+box.space.test:insert({2})
+ | ---
+ | - [2]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+
+--
+-- Restart the master and generate the
+-- next range of xlogs.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('restart server master with wait_load=True')
+ | ---
+ | - true
+ | ...
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.space.test:insert({3})
+ | ---
+ | - [3]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+box.space.test:insert({4})
+ | ---
+ | - [4]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+
+--
+-- Start replica and wait for error.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=False, wait_load=False')
+ | ---
+ | - true
+ | ...
+
+--
+-- Wait error to appear.
+test_run:wait_log('master', 'XlogGapError', 1024, 0.1) ~= nil
+ | ---
+ | - true
+ | ...
+
+--
+-- Cleanup.
+test_run:cmd('stop server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
+
+--
+-- Case 2.
+--
+-- Lets make sure we're not getting XlogGapError in
+-- case if wal_cleanup_delay is used.
+--
+
+test_run:cmd('create server master with script="replication/gh-5806-master.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server master with args="3600", wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.schema.user.grant('guest', 'replication')
+ | ---
+ | ...
+
+--
+-- Keep small number of snaps to force cleanup
+-- procedure be more intensive.
+box.cfg{checkpoint_count = 1}
+ | ---
+ | ...
+
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+s = box.schema.space.create('test', {engine = engine})
+ | ---
+ | ...
+_ = s:create_index('pk')
+ | ---
+ | ...
+
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('create server replica with rpl_master=master,\
+ script="replication/gh-5806-slave.lua"')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+test_run:switch('replica')
+ | ---
+ | - true
+ | ...
+box.cfg{checkpoint_count = 1}
+ | ---
+ | ...
+s = box.schema.space.create('testtemp', {temporary = true})
+ | ---
+ | ...
+_ = s:create_index('pk')
+ | ---
+ | ...
+box.space.testtemp:insert({1})
+ | ---
+ | - [1]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+box.space.testtemp:insert({2})
+ | ---
+ | - [2]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+
+--
+-- Stop the replica node and generate
+-- first range of xlogs on the master.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.space.test:insert({1})
+ | ---
+ | - [1]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+box.space.test:insert({2})
+ | ---
+ | - [2]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+
+--
+-- Restart the master and generate the
+-- next range of xlogs.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('restart server master with args="3600", wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+test_run:switch('master')
+ | ---
+ | - true
+ | ...
+box.space.test:insert({3})
+ | ---
+ | - [3]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+box.space.test:insert({4})
+ | ---
+ | - [4]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+
+--
+-- Restart master node and the replica then.
+test_run:switch('default')
+ | ---
+ | - true
+ | ...
+test_run:cmd('start server replica with wait=True, wait_load=True')
+ | ---
+ | - true
+ | ...
+
+--
+-- Make sure no error happened.
+assert(test_run:grep_log("master", "XlogGapError") == nil)
+ | ---
+ | - true
+ | ...
+
+--
+-- Cleanup.
+test_run:cmd('stop server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('stop server replica')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server master')
+ | ---
+ | - true
+ | ...
+test_run:cmd('delete server replica')
+ | ---
+ | - true
+ | ...
diff --git a/test/replication/gh-5806-xlog-cleanup.test.lua b/test/replication/gh-5806-xlog-cleanup.test.lua
new file mode 100644
index 000000000..2ff5a4f3a
--- /dev/null
+++ b/test/replication/gh-5806-xlog-cleanup.test.lua
@@ -0,0 +1,166 @@
+--
+-- gh-5806: defer xlog cleanup to keep xlogs until
+-- replicas present in "_cluster" are connected.
+-- Otherwise we are getting XlogGapError since
+-- master might go far forwad from replica and
+-- replica won't be able to connect without full
+-- rebootstrap.
+--
+
+fiber = require('fiber')
+test_run = require('test_run').new()
+engine = test_run:get_cfg('engine')
+
+--
+-- Case 1.
+--
+-- First lets make sure we're getting XlogGapError in
+-- case if wal_cleanup_delay is not used.
+--
+
+test_run:cmd('create server master with script="replication/gh-5806-master.lua"')
+test_run:cmd('start server master with wait=True, wait_load=True')
+
+test_run:switch('master')
+box.schema.user.grant('guest', 'replication')
+
+--
+-- Keep small number of snaps to force cleanup
+-- procedure be more intensive.
+box.cfg{checkpoint_count = 1}
+
+engine = test_run:get_cfg('engine')
+s = box.schema.space.create('test', {engine = engine})
+_ = s:create_index('pk')
+
+test_run:switch('default')
+test_run:cmd('create server replica with rpl_master=master,\
+ script="replication/gh-5806-slave.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+--
+-- On replica we create an own space which allows us to
+-- use more complex scenario and disableds replica from
+-- automatic rejoin. Otherwise XlogGapError won't happen.
+test_run:switch('replica')
+box.cfg{checkpoint_count = 1}
+s = box.schema.space.create('testtemp', {temporary = true})
+_ = s:create_index('pk')
+box.space.testtemp:insert({1})
+box.snapshot()
+box.space.testtemp:insert({2})
+box.snapshot()
+
+--
+-- Stop the replica node and generate
+-- first range of xlogs on the master.
+test_run:switch('default')
+test_run:cmd('stop server replica')
+
+test_run:switch('master')
+box.space.test:insert({1})
+box.snapshot()
+box.space.test:insert({2})
+box.snapshot()
+
+--
+-- Restart the master and generate the
+-- next range of xlogs.
+test_run:switch('default')
+test_run:cmd('restart server master with wait_load=True')
+test_run:switch('master')
+box.space.test:insert({3})
+box.snapshot()
+box.space.test:insert({4})
+box.snapshot()
+
+--
+-- Start replica and wait for error.
+test_run:switch('default')
+test_run:cmd('start server replica with wait=False, wait_load=False')
+
+--
+-- Wait error to appear.
+test_run:wait_log('master', 'XlogGapError', 1024, 0.1) ~= nil
+
+--
+-- Cleanup.
+test_run:cmd('stop server master')
+test_run:cmd('stop server replica')
+test_run:cmd('delete server master')
+test_run:cmd('delete server replica')
+
+--
+-- Case 2.
+--
+-- Lets make sure we're not getting XlogGapError in
+-- case if wal_cleanup_delay is used.
+--
+
+test_run:cmd('create server master with script="replication/gh-5806-master.lua"')
+test_run:cmd('start server master with args="3600", wait=True, wait_load=True')
+
+test_run:switch('master')
+box.schema.user.grant('guest', 'replication')
+
+--
+-- Keep small number of snaps to force cleanup
+-- procedure be more intensive.
+box.cfg{checkpoint_count = 1}
+
+engine = test_run:get_cfg('engine')
+s = box.schema.space.create('test', {engine = engine})
+_ = s:create_index('pk')
+
+test_run:switch('default')
+test_run:cmd('create server replica with rpl_master=master,\
+ script="replication/gh-5806-slave.lua"')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+test_run:switch('replica')
+box.cfg{checkpoint_count = 1}
+s = box.schema.space.create('testtemp', {temporary = true})
+_ = s:create_index('pk')
+box.space.testtemp:insert({1})
+box.snapshot()
+box.space.testtemp:insert({2})
+box.snapshot()
+
+--
+-- Stop the replica node and generate
+-- first range of xlogs on the master.
+test_run:switch('default')
+test_run:cmd('stop server replica')
+
+test_run:switch('master')
+box.space.test:insert({1})
+box.snapshot()
+box.space.test:insert({2})
+box.snapshot()
+
+--
+-- Restart the master and generate the
+-- next range of xlogs.
+test_run:switch('default')
+test_run:cmd('restart server master with args="3600", wait=True, wait_load=True')
+test_run:switch('master')
+box.space.test:insert({3})
+box.snapshot()
+box.space.test:insert({4})
+box.snapshot()
+
+--
+-- Restart master node and the replica then.
+test_run:switch('default')
+test_run:cmd('start server replica with wait=True, wait_load=True')
+
+--
+-- Make sure no error happened.
+assert(test_run:grep_log("master", "XlogGapError") == nil)
+
+--
+-- Cleanup.
+test_run:cmd('stop server master')
+test_run:cmd('stop server replica')
+test_run:cmd('delete server master')
+test_run:cmd('delete server replica')
--
2.30.2
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Tarantool-patches] [PATCH v2 3/3] test: box-tap/gc -- add test for is_paused field
2021-03-20 13:15 [Tarantool-patches] [PATCH v2 0/3] gc/xlog: delay xlog cleanup until relays are subscribed Cyrill Gorcunov via Tarantool-patches
2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 1/3] " Cyrill Gorcunov via Tarantool-patches
2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 2/3] test: add a test for wal_cleanup_delay option Cyrill Gorcunov via Tarantool-patches
@ 2021-03-20 13:15 ` Cyrill Gorcunov via Tarantool-patches
2 siblings, 0 replies; 11+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-03-20 13:15 UTC (permalink / raw)
To: tml; +Cc: Vladislav Shpilevoy, Mons Anderson
Once simple bootstrap is complete and there is no
replicas used we should run with gc unpaused.
Part-of #5806
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
test/box-tap/gc.test.lua | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/test/box-tap/gc.test.lua b/test/box-tap/gc.test.lua
index f0155779c..ced87547e 100755
--- a/test/box-tap/gc.test.lua
+++ b/test/box-tap/gc.test.lua
@@ -8,11 +8,12 @@ local debug = type(box.error.injection) == "table"
-- check box.info.gc() is false if snapshot is not in progress
local test = tap.test('box.info.gc')
-test:plan(1 + (debug and 1 or 0))
+test:plan(2 + (debug and 1 or 0))
local gc = box.info.gc()
test:is(gc.checkpoint_is_in_progress, false, "checkpoint is not in progress")
+test:is(gc.is_paused, false, "GC is not paused")
-- check box.info.gc() is true if snapshot is in progress
--
--
2.30.2
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 1/3] gc/xlog: delay xlog cleanup until relays are subscribed
2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 1/3] " Cyrill Gorcunov via Tarantool-patches
@ 2021-03-22 8:07 ` Konstantin Osipov via Tarantool-patches
2021-03-22 8:30 ` Cyrill Gorcunov via Tarantool-patches
2021-03-22 21:40 ` Vladislav Shpilevoy via Tarantool-patches
1 sibling, 1 reply; 11+ messages in thread
From: Konstantin Osipov via Tarantool-patches @ 2021-03-22 8:07 UTC (permalink / raw)
To: Cyrill Gorcunov; +Cc: Mons Anderson, tml, Vladislav Shpilevoy
* Cyrill Gorcunov <gorcunov@gmail.com> [21/03/22 10:02]:
> Since replicas are tracked via `_cluster` system space this we use
> its content to count subscribed replicas and when all of them are
> up and running the cleanup procedure is automatically enabled even
> if `wal_cleanup_delay` is not expired.
>
> The `wal_cleanup_delay` should be set to:
>
> - `-1` to wait infinitely until all existing replicas are subscribed;
> - `0` to disable the cleanup delay;
> - `>0` to wait for specified number of seconds.
wal_gc_delay maybe?
--
Konstantin Osipov, Moscow, Russia
https://scylladb.com
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 1/3] gc/xlog: delay xlog cleanup until relays are subscribed
2021-03-22 8:07 ` Konstantin Osipov via Tarantool-patches
@ 2021-03-22 8:30 ` Cyrill Gorcunov via Tarantool-patches
0 siblings, 0 replies; 11+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-03-22 8:30 UTC (permalink / raw)
To: Konstantin Osipov; +Cc: Mons Anderson, tml, Vladislav Shpilevoy
On Mon, Mar 22, 2021 at 11:07:46AM +0300, Konstantin Osipov wrote:
> >
> > The `wal_cleanup_delay` should be set to:
> >
> > - `-1` to wait infinitely until all existing replicas are subscribed;
> > - `0` to disable the cleanup delay;
> > - `>0` to wait for specified number of seconds.
>
> wal_gc_delay maybe?
I like this name even more than wal_cleanup_delay, thanks! If noone
object I will rename. Lets wait for other people replies.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 1/3] gc/xlog: delay xlog cleanup until relays are subscribed
2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 1/3] " Cyrill Gorcunov via Tarantool-patches
2021-03-22 8:07 ` Konstantin Osipov via Tarantool-patches
@ 2021-03-22 21:40 ` Vladislav Shpilevoy via Tarantool-patches
2021-03-23 7:28 ` Cyrill Gorcunov via Tarantool-patches
1 sibling, 1 reply; 11+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-03-22 21:40 UTC (permalink / raw)
To: Cyrill Gorcunov, tml; +Cc: Mons Anderson
Hi! Thanks for working on this!
See 8 comments below.
> Current state of `*.xlog` garbage collector can be found in
> `box.info.gc()` output. For example
>
> ``` Lua
> tarantool> box.info.gc()
> ---
> ...
> cleanup_is_paused: false
> ```
1. Isn't it 'is_paused' instead of 'cleanup_is_paused'?
> The `cleanup_is_paused` shows if cleanup fiber is paused or not.
> ---
> diff --git a/src/box/box.cc b/src/box/box.cc
> index cc59564e1..b9fd7af00 100644
> --- a/src/box/box.cc
> +++ b/src/box/box.cc
> @@ -771,6 +771,29 @@ box_check_wal_queue_max_size(void)
> return size;
> }
>
> +static double
> +box_check_wal_cleanup_delay(void)
> +{
> + double value = cfg_geti("wal_cleanup_delay");
2. cfg_geti() returns an integer. Please, use cfg_getd().
> + if (value < 0) {
> + diag_set(ClientError, ER_CFG, "wal_cleanup_delay",
> + "the value must be >= 0");
> + return -1;
> + }
> +
> + /*
> + * Anonymous replicas do not require
> + * delay the cleanup procedure since they
> + * are read only.
> + */
> + if (cfg_geti("replication_anon") != 0) {
> + if (value != 0)
> + value = 0;
3. Still it makes sense to check that if the option is set,
it is valid. Regardless of what is the anon value.
I also think the function should return the delay exactly as
it is set, not corrected by the anon value. Because these
box_check functions are not for configuring. They are for
checking a single option.
Also you should use 'replication_anon' global variable instead
of the config, which might be not installed at this moment yet.
> + }
> +
> + return value;
> +}
> @@ -1465,6 +1490,16 @@ box_set_wal_queue_max_size(void)
> return 0;
> }
>
> +int
> +box_set_wal_cleanup_delay(void)
> +{
> + double delay = box_check_wal_cleanup_delay();
> + if (delay < 0)
> + return -1;
4. Here you could do the correction. Look at the option
value and at 'replication_anon' global variable. Not at
the anon config.
> + gc_set_wal_cleanup_delay(delay);
> + return 0;
> +}
> +
> void
> box_set_vinyl_memory(void)
> {
> @@ -3000,7 +3035,7 @@ box_cfg_xc(void)
> rmean_box = rmean_new(iproto_type_strs, IPROTO_TYPE_STAT_MAX);
> rmean_error = rmean_new(rmean_error_strings, RMEAN_ERROR_LAST);
>
> - gc_init();
> + gc_init(box_check_wal_cleanup_delay());
5. Here is the problem: you rely on replication_anon setting, but it is installed
below, after GC is initialized.
Perhaps you could set the delay to infinite and after the non-dynamic options
are done, box_set_wal_cleanup_delay() would be called by load_cfg.lua and
would lower the delay to 0 or whatever is configured.
> engine_init();
> schema_init();
> replication_init();
> diff --git a/src/box/gc.c b/src/box/gc.c
> index 9af4ef958..5418fd31d 100644
> --- a/src/box/gc.c
> +++ b/src/box/gc.c
> @@ -46,6 +46,7 @@
> #include <small/rlist.h>
> #include <tarantool_ev.h>
>
> +#include "cfg.h"
6. Not necessary anymore.
> #include "diag.h"
> #include "errcode.h"
> #include "fiber.h"
> @@ -238,6 +246,47 @@ static int
> gc_cleanup_fiber_f(va_list ap)
> {
> (void)ap;
> +
> + /*
> + * Stage 1 (optional): in case if we're booting
> + * up with cleanup disabled lets do wait in a
> + * separate cycle to minimize branching on stage 2.
> + */
> + if (gc.is_paused) {
> + ev_tstamp timeout = gc.wal_cleanup_delay;
7. Please, use double for timestamps and timeouts. Because it
is used in almost all the other code working with time.
> + while (!fiber_is_cancelled()) {
> + ev_tstamp clock_start = fiber_clock();
> + if (fiber_yield_timeout(timeout)) {
> + say_info("wal/engine cleanup is resumed "
> + "due to timeout expiration");
> + gc.is_paused = false;
> + gc.delay_ref = 0;
> + break;
> + }
> +
> + /*
> + * If a last reference is dropped
> + * we can exit out early.
> + */
> + if (!gc.is_paused)
> + break;
> +
> + ev_tstamp elapsed = fiber_clock() - clock_start;
> + timeout = gc.wal_cleanup_delay - elapsed;
8. In the previous review I said about remembering a timestamp not
accidentally. But because otherwise I can make this loop roll
forever if I update wal_cleanup_delay with some period < timeout.
For instance, it was 5. You remember clock_start. Then 2.5 secs
pass and I update it to 4.99999. elapsed is 2.5 secs. Timeout is
4.99999 - 2.5 = ~2.5. It starts again. I update the timeout to
4.99998 after another 2.5 secs. Elapsed is 2.5 secs again, and the
next timeout is about 2.5 secs. And so on.
In your patch it will happen even if I increase the timeout, not
only decrease. The fluctuations might be accidental in case I
calculate it from something in Lua, get precision loss, and call
box.cfg{} with a certain period until the instance accepts
requests.
This might be fixed if clock_start is moved out of the loop.
> + if (timeout <= 0) {
> + say_info("wal/engine cleanup is resumed "
> + "due to reconfigured timeout "
> + "expiration");
> + gc.is_paused = false;
> + gc.delay_ref = 0;
> + break;
> + }
> + }
> + }
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 2/3] test: add a test for wal_cleanup_delay option
2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 2/3] test: add a test for wal_cleanup_delay option Cyrill Gorcunov via Tarantool-patches
@ 2021-03-22 21:40 ` Vladislav Shpilevoy via Tarantool-patches
0 siblings, 0 replies; 11+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-03-22 21:40 UTC (permalink / raw)
To: Cyrill Gorcunov, tml; +Cc: Mons Anderson
I appreciate the work you did here!
See 4 comments below.
> diff --git a/test/replication/gh-5806-xlog-cleanup.result b/test/replication/gh-5806-xlog-cleanup.result
> new file mode 100644
> index 000000000..26e7519fc
> --- /dev/null
> +++ b/test/replication/gh-5806-xlog-cleanup.result
> @@ -0,0 +1,397 @@
> +--
> +-- Keep small number of snaps to force cleanup
> +-- procedure be more intensive.
> +box.cfg{checkpoint_count = 1}
> + | ---
> + | ...
> +
> +engine = test_run:get_cfg('engine')
> + | ---
> + | ...
> +s = box.schema.space.create('test', {engine = engine})
> + | ---
> + | ...
> +_ = s:create_index('pk')
> + | ---
> + | ...
> +
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('create server replica with rpl_master=master,\
> + script="replication/gh-5806-slave.lua"')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('start server replica with wait=True, wait_load=True')
> + | ---
> + | - true
> + | ...
> +
> +--
> +-- On replica we create an own space which allows us to
> +-- use more complex scenario and disableds replica from
1. disableds -> disables.
2. Please, be more specific. Why does it prevent from automatic
rejoin? Because it would lead to loss of the replica's own
data.
> +-- automatic rejoin. Otherwise XlogGapError won't happen.
> +test_run:switch('replica')
> + | ---
> + | - true
> + | ...
> +box.cfg{checkpoint_count = 1}
> + | ---
> + | ...
> +s = box.schema.space.create('testtemp', {temporary = true})
3. Why do you need it to be temporary = true?
> + | ---
> + | ...
> +_ = s:create_index('pk')
> + | ---
> + | ...
> +box.space.testtemp:insert({1})
> + | ---
> + | - [1]
> + | ...
> +box.snapshot()
> + | ---
> + | - ok
> + | ...
> +box.space.testtemp:insert({2})
> + | ---
> + | - [2]
> + | ...
> +box.snapshot()
> + | ---
> + | - ok
> + | ...
> +
> +--
> +-- Stop the replica node and generate
> +-- first range of xlogs on the master.
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
> +test_run:cmd('stop server replica')
> + | ---
> + | - true
> + | ...
> +
> +test_run:switch('master')
> + | ---
> + | - true
> + | ...
> +box.space.test:insert({1})
> + | ---
> + | - [1]
> + | ...
> +box.snapshot()
> + | ---
> + | - ok
> + | ...
> +box.space.test:insert({2})
> + | ---
> + | - [2]
> + | ...
> +box.snapshot()
> + | ---
> + | - ok
> + | ...
> +
> +--
> +-- Restart the master and generate the
> +-- next range of xlogs.
4. I think I already asked, but why do you need to restart the
master and generate a second pair of snapshots? Why isn't it enough
to have 1 pair created before the restart? Whatever is the answer,
it must be in the comments.
> +test_run:switch('default')
> + | ---
> + | - true
> + | ...
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 1/3] gc/xlog: delay xlog cleanup until relays are subscribed
2021-03-22 21:40 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-03-23 7:28 ` Cyrill Gorcunov via Tarantool-patches
2021-03-23 11:25 ` Cyrill Gorcunov via Tarantool-patches
0 siblings, 1 reply; 11+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-03-23 7:28 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: tml, Mons Anderson
On Mon, Mar 22, 2021 at 10:40:34PM +0100, Vladislav Shpilevoy wrote:
> Hi! Thanks for working on this!
>
> See 8 comments below.
>
> > Current state of `*.xlog` garbage collector can be found in
> > `box.info.gc()` output. For example
> >
> > ``` Lua
> > tarantool> box.info.gc()
> > ---
> > ...
> > cleanup_is_paused: false
> > ```
>
> 1. Isn't it 'is_paused' instead of 'cleanup_is_paused'?
Yeah, sorry. Thanks!
> >
> > +static double
> > +box_check_wal_cleanup_delay(void)
> > +{
> > + double value = cfg_geti("wal_cleanup_delay");
>
> 2. cfg_geti() returns an integer. Please, use cfg_getd().
I did it on a purpose: we accepts time in whole seconds,
not milliseconds and etc. But sure, no problem will change
it to cfg_getd.
> > +
> > + /*
> > + * Anonymous replicas do not require
> > + * delay the cleanup procedure since they
> > + * are read only.
> > + */
> > + if (cfg_geti("replication_anon") != 0) {
> > + if (value != 0)
> > + value = 0;
>
> 3. Still it makes sense to check that if the option is set,
> it is valid. Regardless of what is the anon value.
OK
>
> I also think the function should return the delay exactly as
> it is set, not corrected by the anon value. Because these
> box_check functions are not for configuring. They are for
> checking a single option.
OK
>
> Also you should use 'replication_anon' global variable instead
> of the config, which might be not installed at this moment yet.
What would happen if one setup both 'wal_cleanup_delay' and
'replication_anon' in config at once. Which C's replication_anon
value I will be reading? The C's replication_anon is set after
the cfg procedure complete, so since I operate on values obrained
from Lua level I need to use cfg_geti("replication_anon") because
at this moment only Lua level is consistent and replication_anon
may have a value from previous box.cfg call.
> >
> > +int
> > +box_set_wal_cleanup_delay(void)
> > +{
> > + double delay = box_check_wal_cleanup_delay();
> > + if (delay < 0)
> > + return -1;
>
> 4. Here you could do the correction. Look at the option
> value and at 'replication_anon' global variable. Not at
> the anon config.
Need to think about this moment, thanks!
>
> > + gc_set_wal_cleanup_delay(delay);
> > + return 0;
> > +}
> > +
> > void
> > box_set_vinyl_memory(void)
> > {
> > @@ -3000,7 +3035,7 @@ box_cfg_xc(void)
> > rmean_box = rmean_new(iproto_type_strs, IPROTO_TYPE_STAT_MAX);
> > rmean_error = rmean_new(rmean_error_strings, RMEAN_ERROR_LAST);
> >
> > - gc_init();
> > + gc_init(box_check_wal_cleanup_delay());
>
> 5. Here is the problem: you rely on replication_anon setting, but it is installed
> below, after GC is initialized.
Exactly, that's why I use raw cfg_geti("replication_anon") instead of
global replication_anon variable.
> Perhaps you could set the delay to infinite and after the non-dynamic options
> are done, box_set_wal_cleanup_delay() would be called by load_cfg.lua and
> would lower the delay to 0 or whatever is configured.
>
> > engine_init();
> > schema_init();
> > replication_init();
> > diff --git a/src/box/gc.c b/src/box/gc.c
> > index 9af4ef958..5418fd31d 100644
> > --- a/src/box/gc.c
> > +++ b/src/box/gc.c
> > @@ -46,6 +46,7 @@
> > #include <small/rlist.h>
> > #include <tarantool_ev.h>
> >
> > +#include "cfg.h"
>
> 6. Not necessary anymore.
+1
> > gc_cleanup_fiber_f(va_list ap)
> > {
> > (void)ap;
> > +
> > + /*
> > + * Stage 1 (optional): in case if we're booting
> > + * up with cleanup disabled lets do wait in a
> > + * separate cycle to minimize branching on stage 2.
> > + */
> > + if (gc.is_paused) {
> > + ev_tstamp timeout = gc.wal_cleanup_delay;
>
> 7. Please, use double for timestamps and timeouts. Because it
> is used in almost all the other code working with time.
Sure
> > + while (!fiber_is_cancelled()) {
> > + ev_tstamp clock_start = fiber_clock();
> > + if (fiber_yield_timeout(timeout)) {
> > + say_info("wal/engine cleanup is resumed "
> > + "due to timeout expiration");
> > + gc.is_paused = false;
> > + gc.delay_ref = 0;
> > + break;
> > + }
> > +
> > + /*
> > + * If a last reference is dropped
> > + * we can exit out early.
> > + */
> > + if (!gc.is_paused)
> > + break;
> > +
> > + ev_tstamp elapsed = fiber_clock() - clock_start;
> > + timeout = gc.wal_cleanup_delay - elapsed;
>
> 8. In the previous review I said about remembering a timestamp not
> accidentally. But because otherwise I can make this loop roll
> forever if I update wal_cleanup_delay with some period < timeout.
>
> For instance, it was 5. You remember clock_start. Then 2.5 secs
> pass and I update it to 4.99999. elapsed is 2.5 secs. Timeout is
> 4.99999 - 2.5 = ~2.5. It starts again. I update the timeout to
> 4.99998 after another 2.5 secs. Elapsed is 2.5 secs again, and the
> next timeout is about 2.5 secs. And so on.
>
> In your patch it will happen even if I increase the timeout, not
> only decrease. The fluctuations might be accidental in case I
> calculate it from something in Lua, get precision loss, and call
> box.cfg{} with a certain period until the instance accepts
> requests.
>
> This might be fixed if clock_start is moved out of the loop.
What you're talking about is fp rouding errors and you're to
work really hard to enter into a such loop. But I see your point
and will update. Btw the use of FP value for seconds is a big
mistake in architecure, not only here but all over the code
and I think we should get rid of this very strange approach
(and I must confess I don't understand *why* on earth the
FP is used for delays in ev library).
Cyrill
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 1/3] gc/xlog: delay xlog cleanup until relays are subscribed
2021-03-23 7:28 ` Cyrill Gorcunov via Tarantool-patches
@ 2021-03-23 11:25 ` Cyrill Gorcunov via Tarantool-patches
2021-03-23 12:43 ` Cyrill Gorcunov via Tarantool-patches
0 siblings, 1 reply; 11+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-03-23 11:25 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: Mons Anderson, tml
On Tue, Mar 23, 2021 at 10:28:46AM +0300, Cyrill Gorcunov wrote:
>
> >
> > Also you should use 'replication_anon' global variable instead
> > of the config, which might be not installed at this moment yet.
>
> What would happen if one setup both 'wal_cleanup_delay' and
> 'replication_anon' in config at once. Which C's replication_anon
> value I will be reading? The C's replication_anon is set after
> the cfg procedure complete, so since I operate on values obrained
> from Lua level I need to use cfg_geti("replication_anon") because
> at this moment only Lua level is consistent and replication_anon
> may have a value from previous box.cfg call.
Vlad, here are some moments I don't understand.
1) gc_init is called from box_cfg_xc, and here is a call chain
box.cfg{}
box.cfg = locked(load_cfg)
pcall(private.cfg_check)
-- goes into C layer --
lbox_cfg_check
box_check_config
...
box_check_replication
if (box_check_wal_cleanup_delay() < 0)
diag_raise();
here either all values are sane and
passes the tests, or configuration
stage aborts
-- back to Lua level --
lbox_cfg_load
-- jusmp to C again --
box_cfg
box_cfg_xc
...
gc_init(box_check_wal_cleanup_delay());
here the box_check_wal_cleanup_delay won't fail because
it is already verified but we can't be sure that any
dependent variable (as global replication_anon) been
already set or not. And we should not bound to order
of box_check_wal_cleanup_delay() invocation relatively
to some other helpers.
Thus I think we should use cfg_geti("replication_anon") inside
box_check_wal_cleanup_delay code instead because only the
level cfg_x() is consistent here.
On reconfiguration stage the scheme is similar we must
not access C's global replication_anon variable which
is rather a cached value from cfg_geti("replication_anon").
Am I convinced you? I mean current code with access to
cfg_geti("replication_anon") looks more preferred.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Tarantool-patches] [PATCH v2 1/3] gc/xlog: delay xlog cleanup until relays are subscribed
2021-03-23 11:25 ` Cyrill Gorcunov via Tarantool-patches
@ 2021-03-23 12:43 ` Cyrill Gorcunov via Tarantool-patches
0 siblings, 0 replies; 11+ messages in thread
From: Cyrill Gorcunov via Tarantool-patches @ 2021-03-23 12:43 UTC (permalink / raw)
To: Vladislav Shpilevoy; +Cc: Mons Anderson, tml
On Tue, Mar 23, 2021 at 02:25:26PM +0300, Cyrill Gorcunov wrote:
> On Tue, Mar 23, 2021 at 10:28:46AM +0300, Cyrill Gorcunov wrote:
> >
> > >
> > > Also you should use 'replication_anon' global variable instead
> > > of the config, which might be not installed at this moment yet.
> >
> > What would happen if one setup both 'wal_cleanup_delay' and
> > 'replication_anon' in config at once. Which C's replication_anon
> > value I will be reading? The C's replication_anon is set after
> > the cfg procedure complete, so since I operate on values obrained
> > from Lua level I need to use cfg_geti("replication_anon") because
> > at this moment only Lua level is consistent and replication_anon
> > may have a value from previous box.cfg call.
Here is a diff, take a look please.
---
diff --git a/src/box/box.cc b/src/box/box.cc
index b9fd7af00..81e001680 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -774,23 +774,13 @@ box_check_wal_queue_max_size(void)
static double
box_check_wal_cleanup_delay(void)
{
- double value = cfg_geti("wal_cleanup_delay");
+ double value = cfg_getd("wal_cleanup_delay");
if (value < 0) {
diag_set(ClientError, ER_CFG, "wal_cleanup_delay",
"the value must be >= 0");
return -1;
}
- /*
- * Anonymous replicas do not require
- * delay the cleanup procedure since they
- * are read only.
- */
- if (cfg_geti("replication_anon") != 0) {
- if (value != 0)
- value = 0;
- }
-
return value;
}
@@ -1496,6 +1486,13 @@ box_set_wal_cleanup_delay(void)
double delay = box_check_wal_cleanup_delay();
if (delay < 0)
return -1;
+ /*
+ * Anonymous replicas do not require
+ * delay the cleanup procedure since they
+ * are read only.
+ */
+ if (replication_anon)
+ delay = 0;
gc_set_wal_cleanup_delay(delay);
return 0;
}
diff --git a/src/box/gc.c b/src/box/gc.c
index 5418fd31d..e1d7a1187 100644
--- a/src/box/gc.c
+++ b/src/box/gc.c
@@ -46,7 +46,6 @@
#include <small/rlist.h>
#include <tarantool_ev.h>
-#include "cfg.h"
#include "diag.h"
#include "errcode.h"
#include "fiber.h"
@@ -253,10 +252,13 @@ gc_cleanup_fiber_f(va_list ap)
* separate cycle to minimize branching on stage 2.
*/
if (gc.is_paused) {
- ev_tstamp timeout = gc.wal_cleanup_delay;
+ double start_time = fiber_clock();
while (!fiber_is_cancelled()) {
- ev_tstamp clock_start = fiber_clock();
- if (fiber_yield_timeout(timeout)) {
+ double deadline = start_time + gc.wal_cleanup_delay;
+ double timeout = gc.wal_cleanup_delay;
+
+ if (fiber_clock() >= deadline ||
+ fiber_yield_timeout(timeout)) {
say_info("wal/engine cleanup is resumed "
"due to timeout expiration");
gc.is_paused = false;
@@ -270,17 +272,6 @@ gc_cleanup_fiber_f(va_list ap)
*/
if (!gc.is_paused)
break;
-
- ev_tstamp elapsed = fiber_clock() - clock_start;
- timeout = gc.wal_cleanup_delay - elapsed;
- if (timeout <= 0) {
- say_info("wal/engine cleanup is resumed "
- "due to reconfigured timeout "
- "expiration");
- gc.is_paused = false;
- gc.delay_ref = 0;
- break;
- }
}
}
diff --git a/src/box/lua/load_cfg.lua b/src/box/lua/load_cfg.lua
index 5ed8c12f0..44bb95ed1 100644
--- a/src/box/lua/load_cfg.lua
+++ b/src/box/lua/load_cfg.lua
@@ -352,6 +352,8 @@ local dynamic_cfg_order = {
-- the new one. This should be fixed when box.cfg is able to
-- apply some parameters together and atomically.
replication_anon = 250,
+ -- Cleanup delay should be ignored if replication_anon is set.
+ wal_cleanup_delay = 260,
election_mode = 300,
election_timeout = 320,
}
@@ -384,7 +386,6 @@ local dynamic_cfg_skip_at_load = {
replication_skip_conflict = true,
replication_anon = true,
wal_dir_rescan_delay = true,
- wal_cleanup_delay = true,
custom_proc_title = true,
force_recovery = true,
instance_uuid = true,
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2021-03-23 12:43 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-20 13:15 [Tarantool-patches] [PATCH v2 0/3] gc/xlog: delay xlog cleanup until relays are subscribed Cyrill Gorcunov via Tarantool-patches
2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 1/3] " Cyrill Gorcunov via Tarantool-patches
2021-03-22 8:07 ` Konstantin Osipov via Tarantool-patches
2021-03-22 8:30 ` Cyrill Gorcunov via Tarantool-patches
2021-03-22 21:40 ` Vladislav Shpilevoy via Tarantool-patches
2021-03-23 7:28 ` Cyrill Gorcunov via Tarantool-patches
2021-03-23 11:25 ` Cyrill Gorcunov via Tarantool-patches
2021-03-23 12:43 ` Cyrill Gorcunov via Tarantool-patches
2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 2/3] test: add a test for wal_cleanup_delay option Cyrill Gorcunov via Tarantool-patches
2021-03-22 21:40 ` Vladislav Shpilevoy via Tarantool-patches
2021-03-20 13:15 ` [Tarantool-patches] [PATCH v2 3/3] test: box-tap/gc -- add test for is_paused field Cyrill Gorcunov via Tarantool-patches
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox