[PATCH v3 0/5] Delete old WAL files if running out of disk space

Vladimir Davydov vdavydov.dev at gmail.com
Wed Oct 24 16:43:12 MSK 2018


If a replica permanently stops working for some reason, it keeps pinning
the WAL files it would need to resume until it is deleted from the
_cluster system space or the master is restarted. This happens in
production when an admin drops a replica and forgets to remove it from
the master, and it is quite annoying because it may result in ENOSPC
errors on the master.

This patch set attempts to mitigate this problem by making the WAL
thread delete old WAL files and shoot off old replicas automatically
when it runs out of disk space.
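
To illustrate the idea only, here is a minimal, self-contained sketch of
a "free the oldest file and retry on ENOSPC" loop. It runs against an
in-memory model of a disk, not against real files. The struct and all
sim_* names are hypothetical and are not taken from the patches; the
actual series does this in the WAL thread together with the garbage
collector.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical in-memory model of a WAL directory (not Tarantool code). */
struct sim_disk {
	size_t free_bytes;      /* space currently available */
	size_t old_wal_size[8]; /* sizes of old WAL files, oldest first */
	size_t old_wal_count;   /* old WAL files still present */
	size_t first;           /* index of the oldest remaining file */
};

/* Try to reserve len bytes; fail with ENOSPC if they do not fit. */
static int
sim_reserve(struct sim_disk *d, size_t len)
{
	if (len > d->free_bytes) {
		errno = ENOSPC;
		return -1;
	}
	d->free_bytes -= len;
	return 0;
}

/* Delete the oldest WAL file and reclaim its space; -1 if none is left. */
static int
sim_delete_oldest(struct sim_disk *d)
{
	if (d->old_wal_count == 0)
		return -1;
	d->free_bytes += d->old_wal_size[d->first];
	d->first++;
	d->old_wal_count--;
	return 0;
}

/*
 * Reserve len bytes, deleting old WAL files one by one while the
 * reservation keeps failing with ENOSPC. Gives up with ENOSPC once
 * there is nothing left to delete.
 */
static int
sim_reserve_with_gc(struct sim_disk *d, size_t len)
{
	while (sim_reserve(d, len) != 0) {
		if (errno != ENOSPC)
			return -1;
		if (sim_delete_oldest(d) != 0) {
			errno = ENOSPC;
			return -1;
		}
	}
	return 0;
}
```

In this model a replica that still needs a deleted file is simply left
behind; how such replicas are detected and dropped is outside the scope
of the sketch.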

https://github.com/tarantool/tarantool/issues/3397
https://github.com/tarantool/tarantool/commits/dv/gh-3397-wal-auto-deletion

Changes in v3:
 - Do some renames suggested by Kostja (journal_entry->len et al =>
   approx_len; xlog->alloc_len => allocated).
 - Do not subscribe the garbage collector to all WAL events.

v2: https://www.freelists.org/post/tarantool-patches/PATCH-v2-04-Delete-old-WAL-files-if-running-out-of-disk-space

Changes in v2:
 - Simplify WAL fallocate logic and move it from xlog.c (xlog_fallocate)
   to wal.c (wal_fallocate), because it is the WAL thread's business.
   Now we simply fallocate() in 1 MB blocks.
 - Rework xdir_collect_garbage(): pass flags instead of bool + number of
   files to delete; also, introduce xdir_has_garbage() to make the code
   more straightforward.
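
As a rough illustration of preallocating in 1 MB blocks, the sketch
below computes how many bytes one might request before writing, given
the space already reserved and the approximate length of the pending
rows. The constant and function names are hypothetical, and rounding up
to whole blocks is an assumption of this sketch; the exact policy used
by wal_fallocate may differ.

```c
#include <stddef.h>

/* Hypothetical name; 1 MB block size as mentioned in the changelog. */
enum { PREALLOC_BLOCK = 1024 * 1024 };

/*
 * Return the number of bytes to preallocate so that at least `needed`
 * bytes are reserved, given that `allocated` bytes are reserved already.
 * The result is 0 or a whole number of PREALLOC_BLOCK-sized blocks.
 */
static size_t
prealloc_len(size_t allocated, size_t needed)
{
	if (needed <= allocated)
		return 0; /* enough space is reserved already */
	size_t missing = needed - allocated;
	/* Round the shortfall up to a whole number of blocks. */
	size_t blocks = (missing + PREALLOC_BLOCK - 1) / PREALLOC_BLOCK;
	return blocks * PREALLOC_BLOCK;
}
```

The returned length would then be passed to a call such as fallocate(),
and the reserved counter decreased as rows are actually written.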

v1: https://www.freelists.org/post/tarantool-patches/PATCH-05-Delete-old-WAL-files-if-running-out-of-disk-space

Vladimir Davydov (5):
  wal: preallocate disk space before writing rows
  wal: pass wal_watcher_msg to wal_watcher callback
  wal: rename wal_watcher->events to pending_events
  wal: add event_mask to wal_watcher
  wal: delete old wal files when running out of disk space

 CMakeLists.txt                        |   1 +
 src/box/box.cc                        |   9 +-
 src/box/gc.c                          |  66 +++++++++-
 src/box/gc.h                          |  31 +++++
 src/box/journal.c                     |   1 +
 src/box/journal.h                     |   4 +
 src/box/relay.cc                      |  12 +-
 src/box/txn.c                         |   1 +
 src/box/wal.c                         | 161 ++++++++++++++++++-----
 src/box/wal.h                         |  51 ++++++--
 src/box/xlog.c                        |  48 +++++++
 src/box/xlog.h                        |  49 +++++++
 src/box/xrow.h                        |  13 ++
 src/errinj.h                          |   1 +
 src/trivia/config.h.cmake             |   1 +
 test/box/errinj.result                |  18 +--
 test/replication/gc_no_space.result   | 234 ++++++++++++++++++++++++++++++++++
 test/replication/gc_no_space.test.lua | 103 +++++++++++++++
 test/replication/suite.ini            |   2 +-
 19 files changed, 743 insertions(+), 63 deletions(-)
 create mode 100644 test/replication/gc_no_space.result
 create mode 100644 test/replication/gc_no_space.test.lua

-- 
2.11.0



