[Tarantool-patches] [PATCH v6 0/3] gc/xlog: delay xlog cleanup until relays are subscribed

Kirill Yukhin kyukhin at tarantool.org
Wed Mar 31 11:28:39 MSK 2021


Hello,

On 27 Mar 14:13, Cyrill Gorcunov wrote:
> Take a look please.
> 
> v2:
>  - rebase code to the fresh master branch
>  - keep wal_cleanup_delay option name
>  - pass wal_cleanup_delay as an option to gc_init, so it
>    won't be dependent on cfg engine
>  - add comment about gc_delay_unref in plain bootstrap mode
>  - allow setting wal_cleanup_delay dynamically
>  - update comment in gc_wait_cleanup and call it conditionally
>  - declare wal_cleanup_delay as a double
>  - rename gc.cleanup_is_paused to gc.is_paused and update output
>  - do not show ref counter in box.info.gc() output
>  - update documentation
>  - move gc_delay_unref inside relay_subscribe call which runs
>    in tx context (instead of relay's context)
>  - update tests:
>    - add a comment why we need a temp space on replica node
>    - use explicit insert/snapshot operations
>    - shrink the number of insert/snapshot operations to speed up testing
>    - use "restart" instead of stop/start pair
>    - use wait_log helper instead of own function
>    - add is_paused test
> v3:
>  - fix changelog
>  - rework box_check_wal_cleanup_delay, the replication_anon
>    setting is considered only in box_set_wal_cleanup_delay,
>    ie when config is checked and parsed, moreover the order
>    of setup is set to be behind "replication_anon" option
>    processing
>  - delay cycle now considers deadline instead of per cycle
>    calculation
>  - use `double` type for timestamp
>  - test update
>    - verify `.is_paused` value
>    - minimize number of inserts
>    - no need to use temporary space, regular space works as well
>    - add comments on why we should restart the master node
> v4:
>  - drop argument from gc_init(), since we're configuring delay
>    value from load_cfg.lua script there is no need to read the
>    delay early, simply start gc paused and unpause it on demand
>  - move unpause message to main wait cycle
>  - test update:
>    - verify tests and fix replication/replica_rejoin since it waits
>      for xlogs to be cleaned up too early
>    - use 10 seconds for XlogGapError instead of 0.1 second, this is
>      a common deadline value
> v5:
>  - define limits for `wal_cleanup_delay`: it should be either 0,
>    or in range [0.001; TIMEOUT_INFINITY]. This is done to not consider
>    fp epsilon as a meaningful value
>  - fix comment about why anon replica is not using delay
>  - rework the delayed cleanup cycle
>  - test update:
>    - update vinyl/replica_rejoin -- we need to disable cleanup
>      delay explicitly
>    - update replication/replica_rejoin for same reason
>    - drop unneeded test_run:switch() calls
>    - add a testcase where timeout is decreased and cleanup
>      fiber is kicked to run even with stuck replica
> 
> v6:
>   - test update:
>    - replica_rejoin.lua simplified to drop unneeded data
>    - update main test to check if _cluster cleanup triggers
>      the fiber to run
> 
> issue https://github.com/tarantool/tarantool/issues/5806
> branch gorcunov/gh-5806-xlog-gc-6
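
The v5 limits on wal_cleanup_delay (either 0, or within
[0.001; TIMEOUT_INFINITY]) can be illustrated with a minimal sketch.
This is not the code from the patch: the helper name
wal_cleanup_delay_is_valid is hypothetical, and the TIMEOUT_INFINITY
value below is an assumption (roughly 100 years in seconds) used only
to keep the example self-contained.

```c
#include <stdbool.h>

/*
 * Assumption: stand-in for Tarantool's TIMEOUT_INFINITY constant,
 * taken here as about 100 years expressed in seconds.
 */
#define TIMEOUT_INFINITY (365 * 86400 * 100.0)

/*
 * Hypothetical helper: return true if the delay is acceptable under
 * the v5 rule. Zero disables the delay; any other value must lie in
 * [0.001; TIMEOUT_INFINITY], so a floating point epsilon is rejected.
 * NaN fails every comparison and is therefore rejected as well.
 */
static bool
wal_cleanup_delay_is_valid(double delay)
{
	if (delay == 0)
		return true;
	return delay >= 0.001 && delay <= TIMEOUT_INFINITY;
}
```

With this check, 0 and 0.001 are accepted, while 1e-9, negative
values, and anything above TIMEOUT_INFINITY are rejected.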

I've checked your patch into 2.6, 2.7 and master.


--
Regards, Kirill Yukhin

