[Tarantool-patches] [PATCH v5 0/3] gc/xlog: delay xlog cleanup until relays are subscribed

Cyrill Gorcunov gorcunov at gmail.com
Fri Mar 26 15:06:02 MSK 2021


Please take a look.

v2:
 - rebase the code onto the fresh master branch
 - keep wal_cleanup_delay option name
 - pass wal_cleanup_delay as an option to gc_init, so that it
   does not depend on the cfg engine
 - add comment about gc_delay_unref in plain bootstrap mode
 - allow setting wal_cleanup_delay dynamically
 - update comment in gc_wait_cleanup and call it conditionally
 - declare wal_cleanup_delay as a double
 - rename gc.cleanup_is_paused to gc.is_paused and update output
 - do not show ref counter in box.info.gc() output
 - update documentation
 - move gc_delay_unref into the relay_subscribe call, which runs
   in the tx context (instead of the relay's context)
 - update tests:
   - add a comment why we need a temp space on replica node
   - use explicit insert/snapshot operations
   - shrink the number of inserts/snapshots to speed up testing
   - use "restart" instead of a stop/start pair
   - use the wait_log helper instead of a custom function
   - add is_paused test
v3:
 - fix changelog
 - rework box_check_wal_cleanup_delay, the replication_anon
   setting is considered only in box_set_wal_cleanup_delay,
   ie when config is checked and parsed, moreover the order
   of setup is set to be behind "replication_anon" option
   processing
 - the delay cycle now uses a deadline instead of a
   per-iteration timeout calculation
 - use `double` type for timestamp
 - test update
   - verify `.is_paused` value
   - minimize number of inserts
   - no need to use temporary space, regular space works as well
   - add comments on why we should restart the master node
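A minimal sketch of the deadline-based wait mentioned in the v3 notes, written in C. It assumes a simulated clock (sim_now, sim_wait) in place of the real event-loop clock and fiber condition wait; all names here are hypothetical and are not taken from gc.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulated clock standing in for the event loop clock. */
static double sim_now = 0.0;

/*
 * Hypothetical stand-in for a timed condition wait: the waiter wakes up
 * after `step` seconds (a spurious wakeup) or when `timeout` expires,
 * whichever comes first.
 */
static void sim_wait(double timeout, double step)
{
	sim_now += timeout < step ? timeout : step;
}

/*
 * Wait until *is_paused is cleared or `delay` seconds pass. The deadline
 * is computed once, so each wakeup only waits for the remaining time
 * instead of restarting the full delay.
 */
static double wait_cleanup_delay(double delay, const bool *is_paused,
				 double step)
{
	assert(delay >= 0);
	double deadline = sim_now + delay;
	while (*is_paused) {
		double remaining = deadline - sim_now;
		if (remaining <= 0)
			break;
		sim_wait(remaining, step);
	}
	return sim_now;
}
```

With a per-iteration timeout equal to the full delay, every spurious wakeup would restart the wait; the fixed deadline bounds the total wait to `delay`.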
v4:
 - drop the argument from gc_init(): since the delay value is
   configured from the load_cfg.lua script, there is no need to
   read it early; simply start gc paused and unpause it on demand
 - move the unpause message to the main wait cycle
 - test update:
   - verify tests and fix replication/replica_rejoin, since it
     waits for xlogs to be cleaned up earlier than the delay allows
   - use a 10-second timeout for XlogGapError instead of 0.1 seconds;
     this is the common deadline value
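The "start gc paused and unpause it on demand" approach can be sketched as a small state machine. This is an illustration under assumptions: the struct, the reference counter and all function names are hypothetical; the only ideas taken from this letter are the is_paused field and that gc_delay_unref is called when a relay subscribes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical GC state; not the real struct from src/box/gc.h. */
struct gc_sketch {
	bool is_paused;   /* cleanup is held back while true */
	int delay_ref;    /* relays still awaited (assumption) */
	int cleanups_run; /* how many times cleanup actually ran */
};

/* GC always starts paused; no configuration is read at init time. */
static void gc_sketch_init(struct gc_sketch *gc)
{
	gc->is_paused = true;
	gc->delay_ref = 0;
	gc->cleanups_run = 0;
}

/* Register one replica whose relay must subscribe before cleanup. */
static void gc_sketch_delay_ref(struct gc_sketch *gc)
{
	gc->delay_ref++;
}

/* Lift the pause (on the last relay subscription or on timeout). */
static void gc_sketch_unpause(struct gc_sketch *gc)
{
	gc->is_paused = false;
}

/* Called when a relay subscribes; the last one lifts the pause. */
static void gc_sketch_delay_unref(struct gc_sketch *gc)
{
	assert(gc->delay_ref > 0);
	if (--gc->delay_ref == 0)
		gc_sketch_unpause(gc);
}

/* Cleanup is a no-op while paused. */
static void gc_sketch_try_cleanup(struct gc_sketch *gc)
{
	if (gc->is_paused)
		return;
	gc->cleanups_run++;
}
```

Because the pause is the initial state, init needs no delay value at all; a timeout path would simply call gc_sketch_unpause() directly.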
v5:
 - define limits for `wal_cleanup_delay`: it must be either 0
   or in the range [0.001; TIMEOUT_INFINITY], so that a floating-point
   epsilon is not treated as a meaningful value
 - fix the comment explaining why an anonymous replica does not use the delay
 - rework the delayed cleanup cycle
 - test update:
   - update vinyl/replica_rejoin -- we need to disable cleanup
     delay explicitly
   - update replication/replica_rejoin for the same reason
   - drop unneeded test_run:switch() calls
   - add a test case where the timeout is decreased and the cleanup
     fiber is kicked to run even with a stuck replica
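The v5 limits for `wal_cleanup_delay` can be expressed as a small validator. A minimal sketch, assuming a hypothetical function name and placeholder bounds (the real TIMEOUT_INFINITY constant is not reproduced here); it only illustrates why tiny positive values such as 1e-9 are rejected while 0 is still accepted.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Placeholder bounds for illustration only (assumption). */
#define SKETCH_MIN_DELAY 0.001
#define SKETCH_TIMEOUT_INFINITY (100.0 * 365 * 86400)

/*
 * Hypothetical check: accept exactly 0 or a value in
 * [SKETCH_MIN_DELAY; SKETCH_TIMEOUT_INFINITY]. Tiny positive values
 * (e.g. a floating-point epsilon), negatives and NaN are rejected.
 */
static bool wal_cleanup_delay_is_valid(double delay)
{
	if (isnan(delay))
		return false;
	if (delay == 0)
		return true;
	return delay >= SKETCH_MIN_DELAY && delay <= SKETCH_TIMEOUT_INFINITY;
}
```

Checking 0 separately keeps it as an exact special value, so no epsilon comparison is needed anywhere else.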

issue https://github.com/tarantool/tarantool/issues/5806
branch gorcunov/gh-5806-xlog-gc-5

Cyrill Gorcunov (3):
  gc/xlog: delay xlog cleanup until relays are subscribed
  test: add a test for wal_cleanup_delay option
  test: box-tap/gc -- add test for is_paused field

 .../unreleased/add-wal_cleanup_delay.md       |   5 +
 src/box/box.cc                                |  41 ++
 src/box/box.h                                 |   1 +
 src/box/gc.c                                  |  95 +++-
 src/box/gc.h                                  |  36 ++
 src/box/lua/cfg.cc                            |   9 +
 src/box/lua/info.c                            |   4 +
 src/box/lua/load_cfg.lua                      |   5 +
 src/box/relay.cc                              |   1 +
 src/box/replication.cc                        |   2 +
 test/app-tap/init_script.result               |   1 +
 test/box-tap/gc.test.lua                      |   3 +-
 test/box/admin.result                         |   2 +
 test/box/cfg.result                           |   4 +
 test/replication/gh-5806-master.lua           |   8 +
 test/replication/gh-5806-slave.lua            |   8 +
 test/replication/gh-5806-xlog-cleanup.result  | 435 ++++++++++++++++++
 .../replication/gh-5806-xlog-cleanup.test.lua | 188 ++++++++
 test/replication/replica_rejoin.lua           |  22 +
 test/replication/replica_rejoin.result        |  18 +-
 test/replication/replica_rejoin.test.lua      |  11 +-
 test/vinyl/replica_rejoin.lua                 |   5 +-
 test/vinyl/replica_rejoin.result              |  13 +
 test/vinyl/replica_rejoin.test.lua            |   8 +
 24 files changed, 916 insertions(+), 9 deletions(-)
 create mode 100644 changelogs/unreleased/add-wal_cleanup_delay.md
 create mode 100644 test/replication/gh-5806-master.lua
 create mode 100644 test/replication/gh-5806-slave.lua
 create mode 100644 test/replication/gh-5806-xlog-cleanup.result
 create mode 100644 test/replication/gh-5806-xlog-cleanup.test.lua
 create mode 100644 test/replication/replica_rejoin.lua


base-commit: f4e248c0c13a46beee238fbebc38ef687ef09d02
-- 
2.30.2


