[Tarantool-patches] [PATCH v5 0/3] gc/xlog: delay xlog cleanup until relays are subscribed
Cyrill Gorcunov
gorcunov at gmail.com
Fri Mar 26 15:06:02 MSK 2021
Take a look please.
v2:
- rebase code to the fresh master branch
- keep wal_cleanup_delay option name
- pass wal_cleanup_delay as an option to gc_init, so it
  does not depend on the cfg engine
- add comment about gc_delay_unref in plain bootstrap mode
- allow setting wal_cleanup_delay dynamically
- update comment in gc_wait_cleanup and call it conditionally
- declare wal_cleanup_delay as a double
- rename gc.cleanup_is_paused to gc.is_paused and update output
- do not show ref counter in box.info.gc() output
- update documentation
- move gc_delay_unref inside the relay_subscribe call, which runs
  in tx context (instead of relay's context)
- update tests:
  - add a comment why we need a temp space on replica node
  - use explicit insert/snapshot operations
  - shrink the number of insert/snapshot operations to speed up testing
  - use "restart" instead of stop/start pair
  - use wait_log helper instead of own function
  - add is_paused test
v3:
- fix changelog
- rework box_check_wal_cleanup_delay: the replication_anon
  setting is considered only in box_set_wal_cleanup_delay,
  i.e. when the config is checked and parsed; moreover,
  wal_cleanup_delay is now set up after the "replication_anon"
  option is processed
- the delay cycle now uses a deadline instead of a per-cycle
  calculation
- use `double` type for timestamp
- test update:
  - verify `.is_paused` value
  - minimize number of inserts
  - no need to use temporary space, regular space works as well
  - add comments on why we should restart the master node
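
To illustrate the deadline-based delay cycle mentioned in v3, here is
a minimal standalone sketch. It is not the code from gc.c: the name
sketch_remaining_delay is hypothetical and the clock is passed in as a
plain double. The point is only that the deadline is computed once and
each wakeup derives the remaining wait from it, so spurious wakeups do
not restart the full delay.

```c
/*
 * Hypothetical helper, not part of the patch: given a deadline that
 * was computed once (start time + wal_cleanup_delay) and the current
 * time, return how long the cleanup fiber still has to wait.
 */
static double
sketch_remaining_delay(double deadline, double now)
{
	double left = deadline - now;
	/* Deadline already passed: no more waiting is needed. */
	return left > 0 ? left : 0;
}
```

With a per-cycle calculation every wakeup would wait for the whole
delay again; with a deadline the total wait is bounded by the
configured value.
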
v4:
- drop the argument from gc_init(): since we configure the delay
  value from the load_cfg.lua script, there is no need to read the
  delay early; simply start gc paused and unpause it on demand
- move unpause message to main wait cycle
- test update:
  - verify tests and fix replication/replica_rejoin, since it waits
    for xlogs to be cleaned up too early
  - use 10 seconds for XlogGapError instead of 0.1 second, this is
    a common deadline value
v5:
- define limits for `wal_cleanup_delay`: it should be either 0,
  or in range [0.001; TIMEOUT_INFINITY]. This is done so that a
  floating-point epsilon is not treated as a meaningful value
- fix comment about why anon replica is not using delay
- rework the delayed cleanup cycle
- test update:
  - update vinyl/replica_rejoin -- we need to disable cleanup
    delay explicitly
  - update replication/replica_rejoin for same reason
  - drop unneeded test_run:switch() calls
  - add a testcase where timeout is decreased and cleanup
    fiber is kicked to run even with stuck replica
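
For reference, the v5 limits on `wal_cleanup_delay` can be sketched as
the standalone check below. This is an illustration, not the body of
box_check_wal_cleanup_delay: the function name and both SKETCH_*
constants are hypothetical, and the value used for the upper bound is
only an assumed stand-in for TIMEOUT_INFINITY.

```c
#include <stdbool.h>

/* Assumption: stand-in for TIMEOUT_INFINITY, roughly 100 years in seconds. */
#define SKETCH_TIMEOUT_INFINITY (365.0 * 86400 * 100)
/* Smallest non-zero delay accepted, per the v5 changelog. */
#define SKETCH_MIN_DELAY 0.001

/*
 * Hypothetical validator: the delay is either exactly 0 (cleanup is
 * not delayed) or lies in [0.001; TIMEOUT_INFINITY]. Tiny positive
 * values, negative values and NaN are rejected.
 */
static bool
sketch_wal_cleanup_delay_is_valid(double delay)
{
	if (delay == 0)
		return true;
	return delay >= SKETCH_MIN_DELAY &&
	       delay <= SKETCH_TIMEOUT_INFINITY;
}
```

Rejecting everything in (0; 0.001) means a value that differs from
zero only by rounding noise is never mistaken for a real timeout.
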
issue https://github.com/tarantool/tarantool/issues/5806
branch gorcunov/gh-5806-xlog-gc-5
Cyrill Gorcunov (3):
gc/xlog: delay xlog cleanup until relays are subscribed
test: add a test for wal_cleanup_delay option
test: box-tap/gc -- add test for is_paused field
.../unreleased/add-wal_cleanup_delay.md | 5 +
src/box/box.cc | 41 ++
src/box/box.h | 1 +
src/box/gc.c | 95 +++-
src/box/gc.h | 36 ++
src/box/lua/cfg.cc | 9 +
src/box/lua/info.c | 4 +
src/box/lua/load_cfg.lua | 5 +
src/box/relay.cc | 1 +
src/box/replication.cc | 2 +
test/app-tap/init_script.result | 1 +
test/box-tap/gc.test.lua | 3 +-
test/box/admin.result | 2 +
test/box/cfg.result | 4 +
test/replication/gh-5806-master.lua | 8 +
test/replication/gh-5806-slave.lua | 8 +
test/replication/gh-5806-xlog-cleanup.result | 435 ++++++++++++++++++
.../replication/gh-5806-xlog-cleanup.test.lua | 188 ++++++++
test/replication/replica_rejoin.lua | 22 +
test/replication/replica_rejoin.result | 18 +-
test/replication/replica_rejoin.test.lua | 11 +-
test/vinyl/replica_rejoin.lua | 5 +-
test/vinyl/replica_rejoin.result | 13 +
test/vinyl/replica_rejoin.test.lua | 8 +
24 files changed, 916 insertions(+), 9 deletions(-)
create mode 100644 changelogs/unreleased/add-wal_cleanup_delay.md
create mode 100644 test/replication/gh-5806-master.lua
create mode 100644 test/replication/gh-5806-slave.lua
create mode 100644 test/replication/gh-5806-xlog-cleanup.result
create mode 100644 test/replication/gh-5806-xlog-cleanup.test.lua
create mode 100644 test/replication/replica_rejoin.lua
base-commit: f4e248c0c13a46beee238fbebc38ef687ef09d02
--
2.30.2