[Tarantool-patches] [PATCH v5 1/3] gc/xlog: delay xlog cleanup until relays are subscribed

Serge Petrenko sergepetrenko at tarantool.org
Fri Mar 26 16:42:00 MSK 2021



26.03.2021 15:06, Cyrill Gorcunov wrote:
> If a replica falls far behind the master node (so that
> a number of xlog files exist after the master's last
> snapshot), then once the master node is restarted it
> may clean up the xlogs the replica needs to subscribe
> quickly, and the replica will instead have to rejoin,
> rereading a large amount of data.
>
> Let's address this by delaying xlog file cleanup
> until the replicas are subscribed and the relays are up
> and running. For this we start with the cleanup fiber
> spinning in a nop cycle ("paused" mode) and use a delay
> counter, which the relays decrement, to wait for them.
>
> This implies that if the `_cluster` system space is not empty
> upon restart and a registered replica has somehow vanished
> completely and won't ever come back, then the node
> administrator has to drop this replica from `_cluster`
> manually.
>
> Note that this delayed cleanup start doesn't prevent
> the WAL engine from removing old files if there is no
> space left on the storage device. The WAL will simply
> drop old data without asking.
>
> We need to take into account that some administrators
> might not need this functionality at all, so we
> introduce the "wal_cleanup_delay" configuration
> option, which allows enabling or disabling the delay.
>
> Closes #5806
>
> Signed-off-by: Cyrill Gorcunov <gorcunov at gmail.com>
>
> @TarantoolBot document
> Title: Add wal_cleanup_delay configuration parameter
>
> The `wal_cleanup_delay` option defines a delay in seconds
> before write-ahead log files (`*.xlog`) start being
> pruned after a node restart.
>
> This option is ignored if a node is running as
> an anonymous replica (`replication_anon = true`). Similarly,
> if replication is unused or there are no plans to use
> replication at all, this option need not be considered.
>
> The problem being solved is the case where a node operates
> so fast that its replicas cannot keep up with its state.
> If the node is restarted at this moment (for various reasons,
> for example due to a power outage), `*.xlog` files might be
> pruned during the restart. As a result, replicas will not find
> these files on the main node and will have to reread all the
> data, which is a very expensive procedure.
>
> Since replicas are tracked via the `_cluster` system space, we use
> its content to count subscribed replicas, and when all of them are
> up and running the cleanup procedure is automatically enabled even
> if `wal_cleanup_delay` has not expired.
>
> The `wal_cleanup_delay` option should be set to:
>
>   - `0` to disable the cleanup delay;
>   - `> 0` to wait for the specified number of seconds.
>
> By default it is set to `14400` seconds (i.e. `4` hours).
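
To make the option semantics above concrete, here is a small sketch of
the decision rule as I read it from the description. The function
`cleanup_may_run` and its parameters are hypothetical and do not appear
in the patch; it only restates the documented behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether xlog cleanup may run.
 *   delay_sec    - value of wal_cleanup_delay (0 disables the delay)
 *   elapsed_sec  - seconds passed since the node restarted
 *   unsubscribed - registered replicas that have not subscribed yet
 */
static bool
cleanup_may_run(double delay_sec, double elapsed_sec, int unsubscribed)
{
	assert(unsubscribed >= 0);
	if (delay_sec <= 0)
		return true;	/* the delay is disabled */
	if (unsubscribed == 0)
		return true;	/* all replicas are up, nothing to wait for */
	return elapsed_sec >= delay_sec;	/* otherwise wait for timeout */
}
```

For example, with the default of `14400` seconds and one replica still
missing, cleanup stays paused until four hours have passed.
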
>
> If a registered replica is lost forever and the timeout is set to
> infinity, the preferred way to enable the cleanup procedure is not to
> set a small timeout value but rather to delete this replica from the
> `_cluster` space manually.
>
> Note that the option does *not* prevent the WAL engine from removing
> old `*.xlog` files if there is no space left on the storage device;
> the WAL engine can remove them forcibly.
>
> The current state of the `*.xlog` garbage collector can be found in
> the `box.info.gc()` output. For example:
>
> ``` Lua
>   tarantool> box.info.gc()
>   ---
>     ...
>     is_paused: false
> ```
>
> The `is_paused` field shows whether the cleanup fiber is paused.
> ---
>   .../unreleased/add-wal_cleanup_delay.md       |  5 +
>   src/box/box.cc                                | 41 ++++++++
>   src/box/box.h                                 |  1 +
>   src/box/gc.c                                  | 95 ++++++++++++++++++-
>   src/box/gc.h                                  | 36 +++++++
>   src/box/lua/cfg.cc                            |  9 ++
>   src/box/lua/info.c                            |  4 +
>   src/box/lua/load_cfg.lua                      |  5 +
>   src/box/relay.cc                              |  1 +
>   src/box/replication.cc                        |  2 +
>   test/app-tap/init_script.result               |  1 +
>   test/box/admin.result                         |  2 +
>   test/box/cfg.result                           |  4 +
>   test/replication/replica_rejoin.lua           | 22 +++++
>   test/replication/replica_rejoin.result        | 18 +++-
>   test/replication/replica_rejoin.test.lua      | 11 ++-
>   test/vinyl/replica_rejoin.lua                 |  5 +-
>   test/vinyl/replica_rejoin.result              | 13 +++
>   test/vinyl/replica_rejoin.test.lua            |  8 ++
>   19 files changed, 275 insertions(+), 8 deletions(-)
>   create mode 100644 changelogs/unreleased/add-wal_cleanup_delay.md
>   create mode 100644 test/replication/replica_rejoin.lua

Thanks for the patch! LGTM.

-- 
Serge Petrenko


