[Tarantool-patches] [PATCH v6 0/3] gc/xlog: delay xlog cleanup until relays are subscribed
Kirill Yukhin
kyukhin at tarantool.org
Wed Mar 31 11:28:39 MSK 2021
Hello,
On 27 Mar 14:13, Cyrill Gorcunov wrote:
> Take a look please.
>
> v2:
> - rebase code to the fresh master branch
> - keep wal_cleanup_delay option name
> - pass wal_cleanup_delay as an option to gc_init, so it
> won't be dependent on cfg engine
> - add comment about gc_delay_unref in plain bootstrap mode
> - allow to setup wal_cleanup_delay dynamically
> - update comment in gc_wait_cleanup and call it conditionally
> - declare wal_cleanup_delay as a double
> - rename gc.cleanup_is_paused to gc.is_paused and update output
> - do not show ref counter in box.info.gc() output
> - update documentation
> - move gc_delay_unref inside relay_subscribe call which runs
> in tx context (instead of relay's context)
> - update tests:
> - add a comment on why we need a temp space on the replica node
> - use explicit insert/snapshot operations
> - shrink the number of insert/snapshot operations to speed up testing
> - use "restart" instead of stop/start pair
> - use the wait_log helper instead of a custom function
> - add is_paused test
> v3:
> - fix changelog
> - rework box_check_wal_cleanup_delay: the replication_anon
> setting is considered only in box_set_wal_cleanup_delay,
> i.e. when the config is checked and parsed; moreover, this
> option is now set up after the "replication_anon" option
> is processed
> - delay cycle now considers deadline instead of per cycle
> calculation
> - use `double` type for timestamp
> - test update
> - verify `.is_paused` value
> - minimize number of inserts
> - no need to use temporary space, regular space works as well
> - add comments on why we should restart the master node
> v4:
> - drop the argument from gc_init(): since the delay value is
> configured from the load_cfg.lua script, there is no need to read
> it early; simply start gc paused and unpause it on demand
> - move unpause message to main wait cycle
> - test update:
> - verify tests and fix replication/replica_rejoin since it waits
> for xlogs to be cleaned up too early
> - use 10 seconds for XlogGapError instead of 0.1 seconds; this is
> the common deadline value
> v5:
> - define limits for `wal_cleanup_delay`: it should be either 0
> or in the range [0.001; TIMEOUT_INFINITY]. This is done so that an
> fp epsilon is not treated as a meaningful value
> - fix comment about why anon replica is not using delay
> - rework the delayed cleanup cycle
> - test update:
> - update vinyl/replica_rejoin -- we need to disable cleanup
> delay explicitly
> - update replication/replica_rejoin for same reason
> - drop unneeded test_run:switch() calls
> - add a testcase where timeout is decreased and cleanup
> fiber is kicked to run even with stuck replica
>
> v6:
> - test update:
> - simplify replica_rejoin.lua to drop unneeded data
> - update the main test to check that _cluster cleanup triggers
> the fiber to run
>
> issue https://github.com/tarantool/tarantool/issues/5806
> branch gorcunov/gh-5806-xlog-gc-6
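The v3 and v5 notes describe two properties of `wal_cleanup_delay`: the accepted value range (0, or [0.001; TIMEOUT_INFINITY]) and a wait cycle driven by a fixed deadline rather than a per-cycle calculation. The C sketch below is a hypothetical illustration of those two rules, not the code from the patch; the `sketch_*` names, the `SKETCH_TIMEOUT_INFINITY` value, and modelling fiber wakeups as an array of timestamps are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed upper bound, mirroring a 100-year TIMEOUT_INFINITY
 * constant; the exact value is an assumption of this sketch.
 */
#define SKETCH_TIMEOUT_INFINITY (365.0 * 86400.0 * 100.0)

/*
 * Hypothetical validator: 0 disables the delay, otherwise the value
 * must lie in [0.001, SKETCH_TIMEOUT_INFINITY] so that tiny
 * floating-point values are rejected instead of acting as a delay.
 */
static bool
sketch_check_cleanup_delay(double delay)
{
	if (delay == 0)
		return true;
	return delay >= 0.001 && delay <= SKETCH_TIMEOUT_INFINITY;
}

/*
 * Hypothetical model of the delayed-cleanup cycle. The deadline is
 * computed once from the start time; each wakeup only compares the
 * current time against it. Returns the time at which cleanup resumes:
 * the first wakeup where all relays are subscribed or the deadline
 * has passed, or the deadline itself if no such wakeup occurs.
 */
static double
sketch_resume_time(double start, double delay, const double *wakeups,
		   const bool *all_subscribed, int n)
{
	assert(sketch_check_cleanup_delay(delay));
	double deadline = start + delay; /* fixed once, not per cycle */
	for (int i = 0; i < n; i++) {
		if (all_subscribed[i] || wakeups[i] >= deadline)
			return wakeups[i];
	}
	return deadline;
}
```

Fixing the deadline once means that repeated wakeups (for example, when one relay subscribes while others are still missing) cannot stretch the total wait beyond the configured delay, which is presumably the point of the v3 change.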
I've checked your patch into 2.6, 2.7 and master.
--
Regards, Kirill Yukhin