[PATCH v2 00/10] Allow to limit size of WAL files
Vladimir Davydov
vdavydov.dev at gmail.com
Sat Dec 8 18:48:04 MSK 2018

Tarantool makes a checkpoint every box.cfg.checkpoint_interval seconds
and keeps the last box.cfg.checkpoint_count checkpoints. It also keeps
all intermediate WAL files. Currently, it isn't possible to set a
checkpoint trigger based on the total size of WAL files, which makes it
difficult to estimate the minimal amount of disk space that needs to be
allotted to a Tarantool instance for storing WALs to eliminate the
possibility of ENOSPC errors. For example, under normal conditions a
Tarantool instance may write 1 GB of WAL files every
box.cfg.checkpoint_interval seconds, so one may assume that 1 GB times
box.cfg.checkpoint_count should be enough for the WAL partition, but
there's no guarantee it won't write 10 GB between checkpoints when the
load is extreme.

So we've agreed that we must provide users with one more configuration
option that can be used to impose a limit on the total size of WAL
files. The new option is called box.cfg.checkpoint_wal_threshold. Once
the configured threshold is exceeded, the WAL thread notifies the
checkpoint daemon that it's time to make a new checkpoint and delete
old WAL files. Note that the new option limits only the size of WAL
files created since the last checkpoint, because backup WAL files are
not needed for recovery and can be deleted in case of an emergency
ENOSPC.

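To illustrate the trigger described above, here is a minimal sketch of
the accounting the WAL thread would need: count the bytes written since
the last checkpoint and signal once when the count first exceeds the
threshold. All names (struct wal_size_tracker and its functions) are
hypothetical and made up for this illustration; this is not the code
from the patch set, and the actual notification of the checkpoint
daemon is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only; not taken from the patches. */
struct wal_size_tracker {
	/* Limit on the size of WAL files written since the last checkpoint. */
	int64_t threshold;
	/* Bytes written to WAL files since the last checkpoint. */
	int64_t size_since_checkpoint;
	/* Set once the threshold has been reported, to avoid repeats. */
	bool is_notified;
};

static void
wal_size_tracker_create(struct wal_size_tracker *t, int64_t threshold)
{
	t->threshold = threshold;
	t->size_since_checkpoint = 0;
	t->is_notified = false;
}

/*
 * Account for nbytes written to the WAL. Returns true only once per
 * checkpoint cycle: on the write that makes the accumulated size
 * exceed the threshold. The caller would then wake up the checkpoint
 * daemon.
 */
static bool
wal_size_tracker_on_write(struct wal_size_tracker *t, int64_t nbytes)
{
	assert(nbytes >= 0);
	t->size_since_checkpoint += nbytes;
	if (t->is_notified || t->size_since_checkpoint <= t->threshold)
		return false;
	t->is_notified = true;
	return true;
}

/* Reset the accounting after a checkpoint has been made. */
static void
wal_size_tracker_on_checkpoint(struct wal_size_tracker *t)
{
	t->size_since_checkpoint = 0;
	t->is_notified = false;
}
```

Only the bytes written since the last checkpoint are counted, which
matches the note above about older WAL files not being covered by the
limit. Reporting the event once per cycle is a design choice of this
sketch, so that a burst of writes does not flood the daemon with
notifications.
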
https://github.com/tarantool/tarantool/issues/1082
https://github.com/tarantool/tarantool/commits/dv/gh-1082-wal-checkpoint-threshold
v2 addresses Kostja's comments on v1:
https://www.freelists.org/post/tarantool-patches/PATCH-09-Allow-to-limit-size-of-WAL-files
The most important changes are:
- Don't piggyback on the WAL request for notifying TX about WAL events.
- Factor out checkpoint scheduling logic and cover it with a unit test.
- Move checkpoint daemon to the GC module.
 - Don't treat checkpoint_wal_threshold=0 specially and change the
   default value to 1 exabyte.
Vladimir Davydov (10):
  gc: do not use WAL watcher API for deactivating stale consumers
  wal: simplify watcher API
  box: fix certain cfg options initialized twice on recovery
  box: don't use box_checkpoint_is_in_progress outside box.cc
  box: move checkpointing to gc module
  gc: some renames
  Introduce checkpoint schedule module
  Rewrite checkpoint daemon in C
  wal: pass struct instead of vclock to checkpoint methods
  wal: trigger checkpoint if there are too many WALs

 src/box/CMakeLists.txt                  |   2 +-
 src/box/box.cc                          |  72 ++++------
 src/box/box.h                           |   7 +-
 src/box/checkpoint_schedule.c           |  76 ++++++++++
 src/box/checkpoint_schedule.h           |  85 +++++++++++
 src/box/gc.c                            | 246 +++++++++++++++++++++++++-------
 src/box/gc.h                            |  72 +++++++---
 src/box/lua/cfg.cc                      |  24 ++++
 src/box/lua/checkpoint_daemon.lua       | 136 ------------------
 src/box/lua/init.c                      |   2 -
 src/box/lua/load_cfg.lua                |  13 +-
 src/box/relay.cc                        |  12 +-
 src/box/wal.c                           | 237 +++++++++++++++++++++++-------
 src/box/wal.h                           |  85 ++++++-----
 src/main.cc                             |  21 ++-
 test/app-tap/init_script.result         |  87 +++++------
 test/box/admin.result                   |   2 +
 test/box/cfg.result                     |   4 +
 test/unit/CMakeLists.txt                |   6 +
 test/unit/checkpoint_schedule.c         |  96 +++++++++++++
 test/unit/checkpoint_schedule.result    |  41 ++++++
 test/xlog/checkpoint_daemon.result      | 143 ++-----------------
 test/xlog/checkpoint_daemon.test.lua    |  61 +-------
 test/xlog/checkpoint_threshold.result   | 115 +++++++++++++++
 test/xlog/checkpoint_threshold.test.lua |  63 ++++++++
 test/xlog/suite.ini                     |   2 +-
 26 files changed, 1119 insertions(+), 591 deletions(-)
 create mode 100644 src/box/checkpoint_schedule.c
 create mode 100644 src/box/checkpoint_schedule.h
 delete mode 100644 src/box/lua/checkpoint_daemon.lua
 create mode 100644 test/unit/checkpoint_schedule.c
 create mode 100644 test/unit/checkpoint_schedule.result
 create mode 100644 test/xlog/checkpoint_threshold.result
 create mode 100644 test/xlog/checkpoint_threshold.test.lua

--
2.11.0