Tarantool development patches archive
From: Oleg Babin via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>,
	tarantool-patches@dev.tarantool.org,
	yaroslav.dynnikov@tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 7/9] gc: introduce reactive garbage collector
Date: Thu, 11 Feb 2021 09:50:58 +0300	[thread overview]
Message-ID: <c14f8fe2-d15b-9e35-3104-8efebacc9982@tarantool.org> (raw)
In-Reply-To: <edbf5fc1-b206-a28e-34d8-fc70bf63f704@tarantool.org>

Thanks for your fixes! LGTM.

On 11/02/2021 01:35, Vladislav Shpilevoy wrote:
> Thanks for the review!
>
> On 10.02.2021 10:00, Oleg Babin wrote:
>> Thanks for your patch.
>>
>> As I see you've introduced some new parameters: "LUA_CHUNK_SIZE" and "GC_BACKOFF_INTERVAL".
> I decided not to go into too much detail and not to describe private
> constants in the commit message. GC_BACKOFF_INTERVAL is explained
> where it is used, and LUA_CHUNK_SIZE is quite obvious if you look at
> its usage.
>
>> I think it's better to describe them in the commit message to understand more clearly how the new algorithm works.
> These constants are not super relevant to the algorithm's core
> idea. It does not matter much for the reactive GC concept if I
> yield in table utility functions, or if I have a backoff timeout.
> These could be considered 'optimizations' or 'amendments'. I would
> consider them small details not worth mentioning in the commit
> message.
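[Editor's aside: the chunked-yield idea behind LUA_CHUNK_SIZE can be sketched as follows. This is a simplified, hypothetical illustration in plain Lua; the function name, chunk size, and yield hook are invented for the example and are not vshard's actual code.]

```lua
-- Copy a table in chunks, invoking `yield` after every `chunk_size`
-- processed items so a long scan does not monopolize the thread.
-- In Tarantool the yield hook would be fiber.yield(); here it is a
-- plain callback so the sketch runs anywhere.
function copy_in_chunks(src, chunk_size, yield)
    local res, count = {}, 0
    for k, v in pairs(src) do
        res[k] = v
        count = count + 1
        if count % chunk_size == 0 then
            yield()
        end
    end
    return res, count
end
```

With 10 items and a chunk size of 3, the copy yields 3 times: after items 3, 6, and 9.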
>
>> I see that you didn't update the comment above the "gc_bucket_f" function. Is it still relevant?
> No, irrelevant, thanks for noticing. Here is the diff:
>
> ====================
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index 99f92a0..1ea8069 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -1543,14 +1543,16 @@ local function gc_bucket_drop(status, route_map)
>   end
>   
>   --
> --- Garbage collector. Works on masters. The garbage collector
> --- wakes up once per specified time.
> +-- Garbage collector. Works on masters. The garbage collector wakes up when
> +-- state of any bucket changes.
>   -- After wakeup it follows the plan:
> --- 1) Check if _bucket has changed. If not, then sleep again;
> --- 2) Scan user spaces for sent and garbage buckets, delete
> ---    garbage data in batches of limited size;
> --- 3) Delete GARBAGE buckets from _bucket immediately, and
> ---    schedule SENT buckets for deletion after a timeout;
> +-- 1) Check if state of any bucket has really changed. If not, then sleep again;
> +-- 2) Delete all GARBAGE and SENT buckets along with their data in chunks of
> +--    limited size.
> +-- 3) Bucket destinations are saved into a global route_map to reroute incoming
> +--    requests from routers in case they didn't notice the buckets being moved.
> +--    The saved routes are scheduled for deletion after a timeout, which is
> +--    checked on each iteration of this loop.
>   -- 4) Sleep, go to (1).
>   -- For each step details see comments in the code.
>   --
> ====================
>
> The full new patch below.
>
> ====================
>      gc: introduce reactive garbage collector
>      
>      Garbage collector is a fiber on a master node which deletes
>      GARBAGE and SENT buckets along with their data.
>      
>      It was proactive. It used to wake up with a constant period to
>      find and delete the needed buckets.
>      
>      But this won't work with the future feature called 'map-reduce'.
>      Map-reduce as a preparation stage will need to ensure that all
>      buckets on a storage are readable and writable. With the current
>      GC algorithm if a bucket is sent, it won't be deleted for the next
>      5 seconds by default. During this time all new map-reduce requests
>      can't execute.
>      
>      This is not acceptable. Neither is a too frequent wakeup of the
>      GC fiber, because it would waste TX thread time.
>      
>      The patch makes the GC fiber wake up not on a timeout but on
>      events happening in the _bucket space. The GC fiber sleeps on a
>      condition variable which is signaled when _bucket is changed.
>      
>      Once GC sees work to do, it won't sleep until it is done. It will
>      only yield.
>      
>      This makes GC delete SENT and GARBAGE buckets as soon as possible,
>      reducing the waiting time for the incoming map-reduce requests.
>      
>      Needed for #147
>      
>      @TarantoolBot document
>      Title: VShard: deprecate cfg option 'collect_bucket_garbage_interval'
>      It was used to specify the interval between bucket garbage
>      collection steps. It was needed because garbage collection in
>      vshard was proactive. It didn't react to newly appeared garbage
>      buckets immediately.
>      
>      Since 0.1.17 garbage collection is reactive. It starts working on
>      garbage buckets immediately as they appear, and sleeps the rest
>      of the time. The option is no longer used and does not affect
>      anything.
>      
>      I suppose it can be deleted from the documentation. Or left with
>      a big label 'deprecated' + the explanation above.
>      
>      An attempt to use the option does not cause an error, but logs a
>      warning.
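[Editor's aside: the generation-counter scheme the commit message describes might look roughly like this. This is a plain-Lua approximation; the real loop in vshard/storage/init.lua sleeps on a fiber cond, which is replaced here by a trigger counter so the sketch is self-contained, and all names are illustrative.]

```lua
-- Reactive GC sketch: a generation counter is bumped by the _bucket
-- on_replace trigger; the GC loop keeps collecting until it has caught
-- up with the latest generation, then goes back to sleep.
gc = {
    generation = 0, -- bumped on every change of the _bucket space
    collected = 0,  -- last generation the GC has fully processed
}

-- Stand-in for the on_replace trigger on _bucket. In vshard it also
-- broadcasts a cond to wake the sleeping GC fiber.
function gc.on_bucket_change()
    gc.generation = gc.generation + 1
end

-- One "awake" period of the GC fiber: collect repeatedly until no new
-- changes arrived during the run, then return (i.e. go back to sleep
-- until on_bucket_change() fires again).
function gc.run_until_idle(collect)
    while gc.collected < gc.generation do
        local gen = gc.generation
        collect() -- drop SENT/GARBAGE buckets; may yield, never sleeps
        gc.collected = gen
    end
end
```

Two bucket changes followed by one run trigger a single collection pass; a change arriving mid-pass forces one more pass before the fiber sleeps again.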
>
> diff --git a/test/lua_libs/storage_template.lua b/test/lua_libs/storage_template.lua
> index 21409bd..8df89f6 100644
> --- a/test/lua_libs/storage_template.lua
> +++ b/test/lua_libs/storage_template.lua
> @@ -172,6 +172,5 @@ function wait_bucket_is_collected(id)
>               return true
>           end
>           vshard.storage.recovery_wakeup()
> -        vshard.storage.garbage_collector_wakeup()
>       end)
>   end
> diff --git a/test/misc/reconfigure.result b/test/misc/reconfigure.result
> index 168be5d..3b34841 100644
> --- a/test/misc/reconfigure.result
> +++ b/test/misc/reconfigure.result
> @@ -83,9 +83,6 @@ cfg.collect_lua_garbage = true
>   cfg.rebalancer_max_receiving = 1000
>   ---
>   ...
> -cfg.collect_bucket_garbage_interval = 100
> ----
> -...
>   cfg.invalid_option = 'kek'
>   ---
>   ...
> @@ -105,10 +102,6 @@ vshard.storage.internal.rebalancer_max_receiving ~= 1000
>   ---
>   - true
>   ...
> -vshard.storage.internal.collect_bucket_garbage_interval ~= 100
> ----
> -- true
> -...
>   cfg.sync_timeout = nil
>   ---
>   ...
> @@ -118,9 +111,6 @@ cfg.collect_lua_garbage = nil
>   cfg.rebalancer_max_receiving = nil
>   ---
>   ...
> -cfg.collect_bucket_garbage_interval = nil
> ----
> -...
>   cfg.invalid_option = nil
>   ---
>   ...
> diff --git a/test/misc/reconfigure.test.lua b/test/misc/reconfigure.test.lua
> index e891010..348628c 100644
> --- a/test/misc/reconfigure.test.lua
> +++ b/test/misc/reconfigure.test.lua
> @@ -33,17 +33,14 @@ vshard.storage.internal.sync_timeout
>   cfg.sync_timeout = 100
>   cfg.collect_lua_garbage = true
>   cfg.rebalancer_max_receiving = 1000
> -cfg.collect_bucket_garbage_interval = 100
>   cfg.invalid_option = 'kek'
>   vshard.storage.cfg(cfg, util.name_to_uuid.storage_1_a)
>   not vshard.storage.internal.collect_lua_garbage
>   vshard.storage.internal.sync_timeout
>   vshard.storage.internal.rebalancer_max_receiving ~= 1000
> -vshard.storage.internal.collect_bucket_garbage_interval ~= 100
>   cfg.sync_timeout = nil
>   cfg.collect_lua_garbage = nil
>   cfg.rebalancer_max_receiving = nil
> -cfg.collect_bucket_garbage_interval = nil
>   cfg.invalid_option = nil
>   
>   --
> diff --git a/test/rebalancer/bucket_ref.result b/test/rebalancer/bucket_ref.result
> index b8fc7ff..9df7480 100644
> --- a/test/rebalancer/bucket_ref.result
> +++ b/test/rebalancer/bucket_ref.result
> @@ -184,9 +184,6 @@ vshard.storage.bucket_unref(1, 'read')
>   - true
>   ...
>   -- Force GC to take an RO lock on the bucket now.
> -vshard.storage.garbage_collector_wakeup()
> ----
> -...
>   vshard.storage.buckets_info(1)
>   ---
>   - 1:
> @@ -203,7 +200,6 @@ while true do
>       if i.status == vshard.consts.BUCKET.GARBAGE and i.ro_lock then
>           break
>       end
> -    vshard.storage.garbage_collector_wakeup()
>       fiber.sleep(0.01)
>   end;
>   ---
> @@ -235,14 +231,6 @@ finish_refs = true
>   while f1:status() ~= 'dead' do fiber.sleep(0.01) end
>   ---
>   ...
> -vshard.storage.buckets_info(1)
> ----
> -- 1:
> -    status: garbage
> -    ro_lock: true
> -    destination: <replicaset_2>
> -    id: 1
> -...
>   wait_bucket_is_collected(1)
>   ---
>   ...
> diff --git a/test/rebalancer/bucket_ref.test.lua b/test/rebalancer/bucket_ref.test.lua
> index 213ced3..1b032ff 100644
> --- a/test/rebalancer/bucket_ref.test.lua
> +++ b/test/rebalancer/bucket_ref.test.lua
> @@ -56,7 +56,6 @@ vshard.storage.bucket_unref(1, 'write') -- Error, no refs.
>   vshard.storage.bucket_ref(1, 'read')
>   vshard.storage.bucket_unref(1, 'read')
>   -- Force GC to take an RO lock on the bucket now.
> -vshard.storage.garbage_collector_wakeup()
>   vshard.storage.buckets_info(1)
>   _ = test_run:cmd("setopt delimiter ';'")
>   while true do
> @@ -64,7 +63,6 @@ while true do
>       if i.status == vshard.consts.BUCKET.GARBAGE and i.ro_lock then
>           break
>       end
> -    vshard.storage.garbage_collector_wakeup()
>       fiber.sleep(0.01)
>   end;
>   _ = test_run:cmd("setopt delimiter ''");
> @@ -72,7 +70,6 @@ vshard.storage.buckets_info(1)
>   vshard.storage.bucket_refro(1)
>   finish_refs = true
>   while f1:status() ~= 'dead' do fiber.sleep(0.01) end
> -vshard.storage.buckets_info(1)
>   wait_bucket_is_collected(1)
>   _ = test_run:switch('box_2_a')
>   vshard.storage.buckets_info(1)
> diff --git a/test/rebalancer/errinj.result b/test/rebalancer/errinj.result
> index e50eb72..0ddb1c9 100644
> --- a/test/rebalancer/errinj.result
> +++ b/test/rebalancer/errinj.result
> @@ -226,17 +226,6 @@ ret2, err2
>   - true
>   - null
>   ...
> -_bucket:get{35}
> ----
> -- [35, 'sent', '<replicaset_2>']
> -...
> -_bucket:get{36}
> ----
> -- [36, 'sent', '<replicaset_2>']
> -...
> --- Buckets became 'active' on box_2_a, but still are sending on
> --- box_1_a. Wait until it is marked as garbage on box_1_a by the
> --- recovery fiber.
>   wait_bucket_is_collected(35)
>   ---
>   ...
> diff --git a/test/rebalancer/errinj.test.lua b/test/rebalancer/errinj.test.lua
> index 2cc4a69..a60f3d7 100644
> --- a/test/rebalancer/errinj.test.lua
> +++ b/test/rebalancer/errinj.test.lua
> @@ -102,11 +102,6 @@ _ = test_run:switch('box_1_a')
>   while f1:status() ~= 'dead' or f2:status() ~= 'dead' do fiber.sleep(0.001) end
>   ret1, err1
>   ret2, err2
> -_bucket:get{35}
> -_bucket:get{36}
> --- Buckets became 'active' on box_2_a, but still are sending on
> --- box_1_a. Wait until it is marked as garbage on box_1_a by the
> --- recovery fiber.
>   wait_bucket_is_collected(35)
>   wait_bucket_is_collected(36)
>   _ = test_run:switch('box_2_a')
> diff --git a/test/rebalancer/receiving_bucket.result b/test/rebalancer/receiving_bucket.result
> index 7d3612b..ad93445 100644
> --- a/test/rebalancer/receiving_bucket.result
> +++ b/test/rebalancer/receiving_bucket.result
> @@ -366,14 +366,6 @@ vshard.storage.bucket_send(1, util.replicasets[1], {timeout = 0.3})
>   ---
>   - true
>   ...
> -vshard.storage.buckets_info(1)
> ----
> -- 1:
> -    status: sent
> -    ro_lock: true
> -    destination: <replicaset_1>
> -    id: 1
> -...
>   wait_bucket_is_collected(1)
>   ---
>   ...
> diff --git a/test/rebalancer/receiving_bucket.test.lua b/test/rebalancer/receiving_bucket.test.lua
> index 24534b3..2cf6382 100644
> --- a/test/rebalancer/receiving_bucket.test.lua
> +++ b/test/rebalancer/receiving_bucket.test.lua
> @@ -136,7 +136,6 @@ box.space.test3:select{100}
>   -- Now the bucket is unreferenced and can be transferred.
>   _ = test_run:switch('box_2_a')
>   vshard.storage.bucket_send(1, util.replicasets[1], {timeout = 0.3})
> -vshard.storage.buckets_info(1)
>   wait_bucket_is_collected(1)
>   vshard.storage.buckets_info(1)
>   _ = test_run:switch('box_1_a')
> diff --git a/test/reload_evolution/storage.result b/test/reload_evolution/storage.result
> index 753687f..9d30a04 100644
> --- a/test/reload_evolution/storage.result
> +++ b/test/reload_evolution/storage.result
> @@ -92,7 +92,7 @@ test_run:grep_log('storage_2_a', 'vshard.storage.reload_evolution: upgraded to')
>   ...
>   vshard.storage.internal.reload_version
>   ---
> -- 2
> +- 3
>   ...
>   --
>   -- gh-237: should be only one trigger. During gh-237 the trigger installation
> diff --git a/test/router/reroute_wrong_bucket.result b/test/router/reroute_wrong_bucket.result
> index 049bdef..ac340eb 100644
> --- a/test/router/reroute_wrong_bucket.result
> +++ b/test/router/reroute_wrong_bucket.result
> @@ -37,7 +37,7 @@ test_run:switch('storage_1_a')
>   ---
>   - true
>   ...
> -cfg.collect_bucket_garbage_interval = 100
> +vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
>   ---
>   ...
>   vshard.storage.cfg(cfg, util.name_to_uuid.storage_1_a)
> @@ -53,7 +53,7 @@ test_run:switch('storage_2_a')
>   ---
>   - true
>   ...
> -cfg.collect_bucket_garbage_interval = 100
> +vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
>   ---
>   ...
>   vshard.storage.cfg(cfg, util.name_to_uuid.storage_2_a)
> @@ -202,12 +202,12 @@ test_run:grep_log('router_1', 'please update configuration')
>   err
>   ---
>   - bucket_id: 100
> -  reason: write is prohibited
> +  reason: Not found
>     code: 1
>     destination: ac522f65-aa94-4134-9f64-51ee384f1a54
>     type: ShardingError
>     name: WRONG_BUCKET
> -  message: 'Cannot perform action with bucket 100, reason: write is prohibited'
> +  message: 'Cannot perform action with bucket 100, reason: Not found'
>   ...
>   --
>   -- Now try again, but update configuration during call(). It must
> diff --git a/test/router/reroute_wrong_bucket.test.lua b/test/router/reroute_wrong_bucket.test.lua
> index 9e6e804..207aac3 100644
> --- a/test/router/reroute_wrong_bucket.test.lua
> +++ b/test/router/reroute_wrong_bucket.test.lua
> @@ -11,13 +11,13 @@ util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, 'bootstrap_storage(\'memt
>   test_run:cmd('create server router_1 with script="router/router_1.lua"')
>   test_run:cmd('start server router_1')
>   test_run:switch('storage_1_a')
> -cfg.collect_bucket_garbage_interval = 100
> +vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
>   vshard.storage.cfg(cfg, util.name_to_uuid.storage_1_a)
>   vshard.storage.rebalancer_disable()
>   for i = 1, 100 do box.space._bucket:replace{i, vshard.consts.BUCKET.ACTIVE} end
>   
>   test_run:switch('storage_2_a')
> -cfg.collect_bucket_garbage_interval = 100
> +vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
>   vshard.storage.cfg(cfg, util.name_to_uuid.storage_2_a)
>   vshard.storage.rebalancer_disable()
>   for i = 101, 200 do box.space._bucket:replace{i, vshard.consts.BUCKET.ACTIVE} end
> diff --git a/test/storage/recovery.result b/test/storage/recovery.result
> index f833fe7..8ccb0b9 100644
> --- a/test/storage/recovery.result
> +++ b/test/storage/recovery.result
> @@ -79,8 +79,7 @@ _bucket = box.space._bucket
>   ...
>   _bucket:select{}
>   ---
> -- - [2, 'garbage', '<replicaset_2>']
> -  - [3, 'garbage', '<replicaset_2>']
> +- []
>   ...
>   _ = test_run:switch('storage_2_a')
>   ---
> diff --git a/test/storage/storage.result b/test/storage/storage.result
> index 424bc4c..0550ad1 100644
> --- a/test/storage/storage.result
> +++ b/test/storage/storage.result
> @@ -547,6 +547,9 @@ vshard.storage.bucket_send(1, util.replicasets[2])
>   ---
>   - true
>   ...
> +wait_bucket_is_collected(1)
> +---
> +...
>   _ = test_run:switch("storage_2_a")
>   ---
>   ...
> @@ -567,12 +570,7 @@ _ = test_run:switch("storage_1_a")
>   ...
>   vshard.storage.buckets_info()
>   ---
> -- 1:
> -    status: sent
> -    ro_lock: true
> -    destination: <replicaset_2>
> -    id: 1
> -  2:
> +- 2:
>       status: active
>       id: 2
>   ...
> diff --git a/test/storage/storage.test.lua b/test/storage/storage.test.lua
> index d631b51..d8fbd94 100644
> --- a/test/storage/storage.test.lua
> +++ b/test/storage/storage.test.lua
> @@ -136,6 +136,7 @@ vshard.storage.bucket_send(1, util.replicasets[1])
>   
>   -- Successful transfer.
>   vshard.storage.bucket_send(1, util.replicasets[2])
> +wait_bucket_is_collected(1)
>   _ = test_run:switch("storage_2_a")
>   vshard.storage.buckets_info()
>   _ = test_run:switch("storage_1_a")
> diff --git a/test/unit/config.result b/test/unit/config.result
> index dfd0219..e0b2482 100644
> --- a/test/unit/config.result
> +++ b/test/unit/config.result
> @@ -428,33 +428,6 @@ _ = lcfg.check(cfg)
>   --
>   -- gh-77: garbage collection options.
>   --
> -cfg.collect_bucket_garbage_interval = 'str'
> ----
> -...
> -check(cfg)
> ----
> -- Garbage bucket collect interval must be positive number
> -...
> -cfg.collect_bucket_garbage_interval = 0
> ----
> -...
> -check(cfg)
> ----
> -- Garbage bucket collect interval must be positive number
> -...
> -cfg.collect_bucket_garbage_interval = -1
> ----
> -...
> -check(cfg)
> ----
> -- Garbage bucket collect interval must be positive number
> -...
> -cfg.collect_bucket_garbage_interval = 100.5
> ----
> -...
> -_ = lcfg.check(cfg)
> ----
> -...
>   cfg.collect_lua_garbage = 100
>   ---
>   ...
> @@ -615,6 +588,12 @@ lcfg.check(cfg).rebalancer_max_sending
>   cfg.rebalancer_max_sending = nil
>   ---
>   ...
> -cfg.sharding = nil
> +--
> +-- Deprecated option does not break anything.
> +--
> +cfg.collect_bucket_garbage_interval = 100
> +---
> +...
> +_ = lcfg.check(cfg)
>   ---
>   ...
> diff --git a/test/unit/config.test.lua b/test/unit/config.test.lua
> index ada43db..a1c9f07 100644
> --- a/test/unit/config.test.lua
> +++ b/test/unit/config.test.lua
> @@ -175,15 +175,6 @@ _ = lcfg.check(cfg)
>   --
>   -- gh-77: garbage collection options.
>   --
> -cfg.collect_bucket_garbage_interval = 'str'
> -check(cfg)
> -cfg.collect_bucket_garbage_interval = 0
> -check(cfg)
> -cfg.collect_bucket_garbage_interval = -1
> -check(cfg)
> -cfg.collect_bucket_garbage_interval = 100.5
> -_ = lcfg.check(cfg)
> -
>   cfg.collect_lua_garbage = 100
>   check(cfg)
>   cfg.collect_lua_garbage = true
> @@ -244,4 +235,9 @@ util.check_error(lcfg.check, cfg)
>   cfg.rebalancer_max_sending = 15
>   lcfg.check(cfg).rebalancer_max_sending
>   cfg.rebalancer_max_sending = nil
> -cfg.sharding = nil
> +
> +--
> +-- Deprecated option does not break anything.
> +--
> +cfg.collect_bucket_garbage_interval = 100
> +_ = lcfg.check(cfg)
> diff --git a/test/unit/garbage.result b/test/unit/garbage.result
> index 74d9ccf..a530496 100644
> --- a/test/unit/garbage.result
> +++ b/test/unit/garbage.result
> @@ -31,9 +31,6 @@ test_run:cmd("setopt delimiter ''");
>   vshard.storage.internal.shard_index = 'bucket_id'
>   ---
>   ...
> -vshard.storage.internal.collect_bucket_garbage_interval = vshard.consts.DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL
> ----
> -...
>   --
>   -- Find nothing if no bucket_id anywhere, or there is no index
>   -- by it, or bucket_id is not unsigned.
> @@ -151,6 +148,9 @@ format[1] = {name = 'id', type = 'unsigned'}
>   format[2] = {name = 'status', type = 'string'}
>   ---
>   ...
> +format[3] = {name = 'destination', type = 'string', is_nullable = true}
> +---
> +...
>   _bucket = box.schema.create_space('_bucket', {format = format})
>   ---
>   ...
> @@ -172,22 +172,6 @@ _bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
>   ---
>   - [3, 'active']
>   ...
> -_bucket:replace{4, vshard.consts.BUCKET.SENT}
> ----
> -- [4, 'sent']
> -...
> -_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> ----
> -- [5, 'garbage']
> -...
> -_bucket:replace{6, vshard.consts.BUCKET.GARBAGE}
> ----
> -- [6, 'garbage']
> -...
> -_bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
> ----
> -- [200, 'garbage']
> -...
>   s = box.schema.create_space('test', {engine = engine})
>   ---
>   ...
> @@ -213,7 +197,7 @@ s:replace{4, 2}
>   ---
>   - [4, 2]
>   ...
> -gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
> +gc_bucket_drop = vshard.storage.internal.gc_bucket_drop
>   ---
>   ...
>   s2 = box.schema.create_space('test2', {engine = engine})
> @@ -249,6 +233,10 @@ function fill_spaces_with_garbage()
>       s2:replace{6, 4}
>       s2:replace{7, 5}
>       s2:replace{7, 6}
> +    _bucket:replace{4, vshard.consts.BUCKET.SENT, 'destination1'}
> +    _bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> +    _bucket:replace{6, vshard.consts.BUCKET.GARBAGE, 'destination2'}
> +    _bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
>   end;
>   ---
>   ...
> @@ -267,12 +255,22 @@ fill_spaces_with_garbage()
>   ---
>   - 1107
>   ...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> +route_map = {}
> +---
> +...
> +gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
>   ---
> -- - 5
> -  - 6
> -  - 200
>   - true
> +- null
> +...
> +route_map
> +---
> +- - null
> +  - null
> +  - null
> +  - null
> +  - null
> +  - destination2
>   ...
>   #s2:select{}
>   ---
> @@ -282,10 +280,20 @@ gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
>   ---
>   - 7
>   ...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> +route_map = {}
> +---
> +...
> +gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
>   ---
> -- - 4
>   - true
> +- null
> +...
> +route_map
> +---
> +- - null
> +  - null
> +  - null
> +  - destination1
>   ...
>   s2:select{}
>   ---
> @@ -303,17 +311,22 @@ s:select{}
>     - [6, 100]
>   ...
>   -- Nothing deleted - update collected generation.
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> +route_map = {}
> +---
> +...
> +gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
>   ---
> -- - 5
> -  - 6
> -  - 200
>   - true
> +- null
>   ...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> +gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
>   ---
> -- - 4
>   - true
> +- null
> +...
> +route_map
> +---
> +- []
>   ...
>   #s2:select{}
>   ---
> @@ -329,15 +342,20 @@ gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
>   fill_spaces_with_garbage()
>   ---
>   ...
> -_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
> +_ = _bucket:on_replace(function()                                               \
> +    local gen = vshard.storage.internal.bucket_generation                       \
> +    vshard.storage.internal.bucket_generation = gen + 1                         \
> +    vshard.storage.internal.bucket_generation_cond:broadcast()                  \
> +end)
>   ---
>   ...
>   f = fiber.create(vshard.storage.internal.gc_bucket_f)
>   ---
>   ...
>   -- Wait until garbage collection is finished.
> -while s2:count() ~= 3 or s:count() ~= 6 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return s2:count() == 3 and s:count() == 6 end)
>   ---
> +- true
>   ...
>   s:select{}
>   ---
> @@ -360,7 +378,6 @@ _bucket:select{}
>   - - [1, 'active']
>     - [2, 'receiving']
>     - [3, 'active']
> -  - [4, 'sent']
>   ...
>   --
>   -- Test deletion of 'sent' buckets after a specified timeout.
> @@ -370,8 +387,9 @@ _bucket:replace{2, vshard.consts.BUCKET.SENT}
>   - [2, 'sent']
>   ...
>   -- Wait deletion after a while.
> -while _bucket:get{2} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{2} end)
>   ---
> +- true
>   ...
>   _bucket:select{}
>   ---
> @@ -410,8 +428,9 @@ _bucket:replace{4, vshard.consts.BUCKET.SENT}
>   ---
>   - [4, 'sent']
>   ...
> -while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{4} end)
>   ---
> +- true
>   ...
>   --
>   -- Test WAL errors during deletion from _bucket.
> @@ -434,11 +453,14 @@ s:replace{6, 4}
>   ---
>   - [6, 4]
>   ...
> -while not test_run:grep_log("default", "Error during deletion of empty sent buckets") do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_log('default', 'Error during garbage collection step',            \
> +                  65536, 10)
>   ---
> +- Error during garbage collection step
>   ...
> -while #sk:select{4} ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return #sk:select{4} == 0 end)
>   ---
> +- true
>   ...
>   s:select{}
>   ---
> @@ -454,8 +476,9 @@ _bucket:select{}
>   _ = _bucket:on_replace(nil, rollback_on_delete)
>   ---
>   ...
> -while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{4} end)
>   ---
> +- true
>   ...
>   f:cancel()
>   ---
> @@ -562,8 +585,9 @@ for i = 1, 2000 do _bucket:replace{i, vshard.consts.BUCKET.GARBAGE} s:replace{i,
>   f = fiber.create(vshard.storage.internal.gc_bucket_f)
>   ---
>   ...
> -while _bucket:count() ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return _bucket:count() == 0 end)
>   ---
> +- true
>   ...
>   _bucket:select{}
>   ---
> diff --git a/test/unit/garbage.test.lua b/test/unit/garbage.test.lua
> index 30079fa..250afb0 100644
> --- a/test/unit/garbage.test.lua
> +++ b/test/unit/garbage.test.lua
> @@ -15,7 +15,6 @@ end;
>   test_run:cmd("setopt delimiter ''");
>   
>   vshard.storage.internal.shard_index = 'bucket_id'
> -vshard.storage.internal.collect_bucket_garbage_interval = vshard.consts.DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL
>   
>   --
>   -- Find nothing if no bucket_id anywhere, or there is no index
> @@ -75,16 +74,13 @@ s:drop()
>   format = {}
>   format[1] = {name = 'id', type = 'unsigned'}
>   format[2] = {name = 'status', type = 'string'}
> +format[3] = {name = 'destination', type = 'string', is_nullable = true}
>   _bucket = box.schema.create_space('_bucket', {format = format})
>   _ = _bucket:create_index('pk')
>   _ = _bucket:create_index('status', {parts = {{2, 'string'}}, unique = false})
>   _bucket:replace{1, vshard.consts.BUCKET.ACTIVE}
>   _bucket:replace{2, vshard.consts.BUCKET.RECEIVING}
>   _bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
> -_bucket:replace{4, vshard.consts.BUCKET.SENT}
> -_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> -_bucket:replace{6, vshard.consts.BUCKET.GARBAGE}
> -_bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
>   
>   s = box.schema.create_space('test', {engine = engine})
>   pk = s:create_index('pk')
> @@ -94,7 +90,7 @@ s:replace{2, 1}
>   s:replace{3, 2}
>   s:replace{4, 2}
>   
> -gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
> +gc_bucket_drop = vshard.storage.internal.gc_bucket_drop
>   s2 = box.schema.create_space('test2', {engine = engine})
>   pk2 = s2:create_index('pk')
>   sk2 = s2:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
> @@ -114,6 +110,10 @@ function fill_spaces_with_garbage()
>       s2:replace{6, 4}
>       s2:replace{7, 5}
>       s2:replace{7, 6}
> +    _bucket:replace{4, vshard.consts.BUCKET.SENT, 'destination1'}
> +    _bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> +    _bucket:replace{6, vshard.consts.BUCKET.GARBAGE, 'destination2'}
> +    _bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
>   end;
>   test_run:cmd("setopt delimiter ''");
>   
> @@ -121,15 +121,21 @@ fill_spaces_with_garbage()
>   
>   #s2:select{}
>   #s:select{}
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> +route_map = {}
> +gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
> +route_map
>   #s2:select{}
>   #s:select{}
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> +route_map = {}
> +gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
> +route_map
>   s2:select{}
>   s:select{}
>   -- Nothing deleted - update collected generation.
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> +route_map = {}
> +gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
> +gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
> +route_map
>   #s2:select{}
>   #s:select{}
>   
> @@ -137,10 +143,14 @@ gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
>   -- Test continuous garbage collection via background fiber.
>   --
>   fill_spaces_with_garbage()
> -_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
> +_ = _bucket:on_replace(function()                                               \
> +    local gen = vshard.storage.internal.bucket_generation                       \
> +    vshard.storage.internal.bucket_generation = gen + 1                         \
> +    vshard.storage.internal.bucket_generation_cond:broadcast()                  \
> +end)
>   f = fiber.create(vshard.storage.internal.gc_bucket_f)
>   -- Wait until garbage collection is finished.
> -while s2:count() ~= 3 or s:count() ~= 6 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return s2:count() == 3 and s:count() == 6 end)
>   s:select{}
>   s2:select{}
>   -- Check garbage bucket is deleted by background fiber.
> @@ -150,7 +160,7 @@ _bucket:select{}
>   --
>   _bucket:replace{2, vshard.consts.BUCKET.SENT}
>   -- Wait deletion after a while.
> -while _bucket:get{2} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{2} end)
>   _bucket:select{}
>   s:select{}
>   s2:select{}
> @@ -162,7 +172,7 @@ _bucket:replace{4, vshard.consts.BUCKET.ACTIVE}
>   s:replace{5, 4}
>   s:replace{6, 4}
>   _bucket:replace{4, vshard.consts.BUCKET.SENT}
> -while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{4} end)
>   
>   --
>   -- Test WAL errors during deletion from _bucket.
> @@ -172,12 +182,13 @@ _ = _bucket:on_replace(rollback_on_delete)
>   _bucket:replace{4, vshard.consts.BUCKET.SENT}
>   s:replace{5, 4}
>   s:replace{6, 4}
> -while not test_run:grep_log("default", "Error during deletion of empty sent buckets") do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> -while #sk:select{4} ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_log('default', 'Error during garbage collection step',            \
> +                  65536, 10)
> +test_run:wait_cond(function() return #sk:select{4} == 0 end)
>   s:select{}
>   _bucket:select{}
>   _ = _bucket:on_replace(nil, rollback_on_delete)
> -while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{4} end)
>   
>   f:cancel()
>   
> @@ -220,7 +231,7 @@ for i = 1, 2000 do _bucket:replace{i, vshard.consts.BUCKET.GARBAGE} s:replace{i,
>   #s:select{}
>   #s2:select{}
>   f = fiber.create(vshard.storage.internal.gc_bucket_f)
> -while _bucket:count() ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return _bucket:count() == 0 end)
>   _bucket:select{}
>   s:select{}
>   s2:select{}
> diff --git a/test/unit/garbage_errinj.result b/test/unit/garbage_errinj.result
> deleted file mode 100644
> index 92c8039..0000000
> --- a/test/unit/garbage_errinj.result
> +++ /dev/null
> @@ -1,223 +0,0 @@
> -test_run = require('test_run').new()
> ----
> -...
> -vshard = require('vshard')
> ----
> -...
> -fiber = require('fiber')
> ----
> -...
> -engine = test_run:get_cfg('engine')
> ----
> -...
> -vshard.storage.internal.shard_index = 'bucket_id'
> ----
> -...
> -format = {}
> ----
> -...
> -format[1] = {name = 'id', type = 'unsigned'}
> ----
> -...
> -format[2] = {name = 'status', type = 'string', is_nullable = true}
> ----
> -...
> -_bucket = box.schema.create_space('_bucket', {format = format})
> ----
> -...
> -_ = _bucket:create_index('pk')
> ----
> -...
> -_ = _bucket:create_index('status', {parts = {{2, 'string'}}, unique = false})
> ----
> -...
> -_bucket:replace{1, vshard.consts.BUCKET.ACTIVE}
> ----
> -- [1, 'active']
> -...
> -_bucket:replace{2, vshard.consts.BUCKET.RECEIVING}
> ----
> -- [2, 'receiving']
> -...
> -_bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
> ----
> -- [3, 'active']
> -...
> -_bucket:replace{4, vshard.consts.BUCKET.SENT}
> ----
> -- [4, 'sent']
> -...
> -_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> ----
> -- [5, 'garbage']
> -...
> -s = box.schema.create_space('test', {engine = engine})
> ----
> -...
> -pk = s:create_index('pk')
> ----
> -...
> -sk = s:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
> ----
> -...
> -s:replace{1, 1}
> ----
> -- [1, 1]
> -...
> -s:replace{2, 1}
> ----
> -- [2, 1]
> -...
> -s:replace{3, 2}
> ----
> -- [3, 2]
> -...
> -s:replace{4, 2}
> ----
> -- [4, 2]
> -...
> -s:replace{5, 100}
> ----
> -- [5, 100]
> -...
> -s:replace{6, 100}
> ----
> -- [6, 100]
> -...
> -s:replace{7, 4}
> ----
> -- [7, 4]
> -...
> -s:replace{8, 5}
> ----
> -- [8, 5]
> -...
> -s2 = box.schema.create_space('test2', {engine = engine})
> ----
> -...
> -pk2 = s2:create_index('pk')
> ----
> -...
> -sk2 = s2:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
> ----
> -...
> -s2:replace{1, 1}
> ----
> -- [1, 1]
> -...
> -s2:replace{3, 3}
> ----
> -- [3, 3]
> -...
> -for i = 7, 1107 do s:replace{i, 200} end
> ----
> -...
> -s2:replace{4, 200}
> ----
> -- [4, 200]
> -...
> -s2:replace{5, 100}
> ----
> -- [5, 100]
> -...
> -s2:replace{5, 300}
> ----
> -- [5, 300]
> -...
> -s2:replace{6, 4}
> ----
> -- [6, 4]
> -...
> -s2:replace{7, 5}
> ----
> -- [7, 5]
> -...
> -gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
> ----
> -...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> ----
> -- - 4
> -- true
> -...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> ----
> -- - 5
> -- true
> -...
> ---
> --- Test _bucket generation change during garbage buckets search.
> ---
> -s:truncate()
> ----
> -...
> -_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
> ----
> -...
> -vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = true
> ----
> -...
> -f = fiber.create(function() gc_bucket_step_by_type(vshard.consts.BUCKET.SENT) gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE) end)
> ----
> -...
> -_bucket:replace{4, vshard.consts.BUCKET.GARBAGE}
> ----
> -- [4, 'garbage']
> -...
> -s:replace{5, 4}
> ----
> -- [5, 4]
> -...
> -s:replace{6, 4}
> ----
> -- [6, 4]
> -...
> -#s:select{}
> ----
> -- 2
> -...
> -vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = false
> ----
> -...
> -while f:status() ~= 'dead' do fiber.sleep(0.1) end
> ----
> -...
> --- Nothing is deleted - _bucket:replace() has changed _bucket
> --- generation during search of garbage buckets.
> -#s:select{}
> ----
> -- 2
> -...
> -_bucket:select{4}
> ----
> -- - [4, 'garbage']
> -...
> --- Next step deletes garbage ok.
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> ----
> -- []
> -- true
> -...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> ----
> -- - 4
> -  - 5
> -- true
> -...
> -#s:select{}
> ----
> -- 0
> -...
> -_bucket:delete{4}
> ----
> -- [4, 'garbage']
> -...
> -s2:drop()
> ----
> -...
> -s:drop()
> ----
> -...
> -_bucket:drop()
> ----
> -...
> diff --git a/test/unit/garbage_errinj.test.lua b/test/unit/garbage_errinj.test.lua
> deleted file mode 100644
> index 31184b9..0000000
> --- a/test/unit/garbage_errinj.test.lua
> +++ /dev/null
> @@ -1,73 +0,0 @@
> -test_run = require('test_run').new()
> -vshard = require('vshard')
> -fiber = require('fiber')
> -
> -engine = test_run:get_cfg('engine')
> -vshard.storage.internal.shard_index = 'bucket_id'
> -
> -format = {}
> -format[1] = {name = 'id', type = 'unsigned'}
> -format[2] = {name = 'status', type = 'string', is_nullable = true}
> -_bucket = box.schema.create_space('_bucket', {format = format})
> -_ = _bucket:create_index('pk')
> -_ = _bucket:create_index('status', {parts = {{2, 'string'}}, unique = false})
> -_bucket:replace{1, vshard.consts.BUCKET.ACTIVE}
> -_bucket:replace{2, vshard.consts.BUCKET.RECEIVING}
> -_bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
> -_bucket:replace{4, vshard.consts.BUCKET.SENT}
> -_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> -
> -s = box.schema.create_space('test', {engine = engine})
> -pk = s:create_index('pk')
> -sk = s:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
> -s:replace{1, 1}
> -s:replace{2, 1}
> -s:replace{3, 2}
> -s:replace{4, 2}
> -s:replace{5, 100}
> -s:replace{6, 100}
> -s:replace{7, 4}
> -s:replace{8, 5}
> -
> -s2 = box.schema.create_space('test2', {engine = engine})
> -pk2 = s2:create_index('pk')
> -sk2 = s2:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
> -s2:replace{1, 1}
> -s2:replace{3, 3}
> -for i = 7, 1107 do s:replace{i, 200} end
> -s2:replace{4, 200}
> -s2:replace{5, 100}
> -s2:replace{5, 300}
> -s2:replace{6, 4}
> -s2:replace{7, 5}
> -
> -gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> -
> ---
> --- Test _bucket generation change during garbage buckets search.
> ---
> -s:truncate()
> -_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
> -vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = true
> -f = fiber.create(function() gc_bucket_step_by_type(vshard.consts.BUCKET.SENT) gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE) end)
> -_bucket:replace{4, vshard.consts.BUCKET.GARBAGE}
> -s:replace{5, 4}
> -s:replace{6, 4}
> -#s:select{}
> -vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = false
> -while f:status() ~= 'dead' do fiber.sleep(0.1) end
> --- Nothing is deleted - _bucket:replace() has changed _bucket
> --- generation during search of garbage buckets.
> -#s:select{}
> -_bucket:select{4}
> --- Next step deletes garbage ok.
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> -#s:select{}
> -_bucket:delete{4}
> -
> -s2:drop()
> -s:drop()
> -_bucket:drop()
> diff --git a/vshard/cfg.lua b/vshard/cfg.lua
> index f7d5dbc..63d5414 100644
> --- a/vshard/cfg.lua
> +++ b/vshard/cfg.lua
> @@ -251,9 +251,8 @@ local cfg_template = {
>           max = consts.REBALANCER_MAX_SENDING_MAX
>       },
>       collect_bucket_garbage_interval = {
> -        type = 'positive number', name = 'Garbage bucket collect interval',
> -        is_optional = true,
> -        default = consts.DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL
> +        name = 'Garbage bucket collect interval', is_deprecated = true,
> +        reason = 'Has no effect anymore'
>       },
>       collect_lua_garbage = {
>           type = 'boolean', name = 'Garbage Lua collect necessity',
> diff --git a/vshard/consts.lua b/vshard/consts.lua
> index 8c2a8b0..3f1585a 100644
> --- a/vshard/consts.lua
> +++ b/vshard/consts.lua
> @@ -23,6 +23,7 @@ return {
>       DEFAULT_BUCKET_COUNT = 3000;
>       BUCKET_SENT_GARBAGE_DELAY = 0.5;
>       BUCKET_CHUNK_SIZE = 1000;
> +    LUA_CHUNK_SIZE = 100000,
>       DEFAULT_REBALANCER_DISBALANCE_THRESHOLD = 1;
>       REBALANCER_IDLE_INTERVAL = 60 * 60;
>       REBALANCER_WORK_INTERVAL = 10;
> @@ -37,7 +38,7 @@ return {
>       DEFAULT_FAILOVER_PING_TIMEOUT = 5;
>       DEFAULT_SYNC_TIMEOUT = 1;
>       RECONNECT_TIMEOUT = 0.5;
> -    DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL = 0.5;
> +    GC_BACKOFF_INTERVAL = 5,
>       RECOVERY_INTERVAL = 5;
>       COLLECT_LUA_GARBAGE_INTERVAL = 100;
>   
> @@ -45,4 +46,6 @@ return {
>       DISCOVERY_WORK_INTERVAL = 1,
>       DISCOVERY_WORK_STEP = 0.01,
>       DISCOVERY_TIMEOUT = 10,
> +
> +    TIMEOUT_INFINITY = 500 * 365 * 86400,
>   }
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index adf1c20..1ea8069 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -69,7 +69,6 @@ if not M then
>           total_bucket_count = 0,
>           errinj = {
>               ERRINJ_CFG = false,
> -            ERRINJ_BUCKET_FIND_GARBAGE_DELAY = false,
>               ERRINJ_RELOAD = false,
>               ERRINJ_CFG_DELAY = false,
>               ERRINJ_LONG_RECEIVE = false,
> @@ -96,6 +95,8 @@ if not M then
>           -- detect that _bucket was not changed between yields.
>           --
>           bucket_generation = 0,
> +        -- Condition variable fired on generation update.
> +        bucket_generation_cond = lfiber.cond(),
>           --
>           -- Reference to the function used as on_replace trigger on
>           -- _bucket space. It is used to replace the trigger with
> @@ -107,12 +108,14 @@ if not M then
>           -- replace the old function is to keep its reference.
>           --
>           bucket_on_replace = nil,
> +        -- Redirects for recently sent buckets. They are kept for a while to
> +        -- help routers find the new location of sent and deleted buckets
> +        -- without a full cluster scan.
> +        route_map = {},
>   
>           ------------------- Garbage collection -------------------
>           -- Fiber to remove garbage buckets data.
>           collect_bucket_garbage_fiber = nil,
> -        -- Do buckets garbage collection once per this time.
> -        collect_bucket_garbage_interval = nil,
>           -- Boolean lua_gc state (create periodic gc task).
>           collect_lua_garbage = nil,
>   
> @@ -173,6 +176,7 @@ end
>   --
>   local function bucket_generation_increment()
>       M.bucket_generation = M.bucket_generation + 1
> +    M.bucket_generation_cond:broadcast()
>   end
>   
>   --
> @@ -758,8 +762,9 @@ local function bucket_check_state(bucket_id, mode)
>       else
>           return bucket
>       end
> +    local dst = bucket and bucket.destination or M.route_map[bucket_id]
>       return bucket, lerror.vshard(lerror.code.WRONG_BUCKET, bucket_id, reason,
> -                                 bucket and bucket.destination)
> +                                 dst)
>   end
>   
>   --
> @@ -804,11 +809,23 @@ end
>   --
>   local function bucket_unrefro(bucket_id)
>       local ref = M.bucket_refs[bucket_id]
> -    if not ref or ref.ro == 0 then
> +    local count = ref and ref.ro or 0
> +    if count == 0 then
>           return nil, lerror.vshard(lerror.code.WRONG_BUCKET, bucket_id,
>                                     "no refs", nil)
>       end
> -    ref.ro = ref.ro - 1
> +    if count == 1 then
> +        ref.ro = 0
> +        if ref.ro_lock then
> +            -- Garbage collector is waiting for the bucket if RO
> +            -- is locked. Let it know it has one more bucket to
> +            -- collect. It relies on generation, so its increment
> +            -- is enough.
> +            bucket_generation_increment()
> +        end
> +        return true
> +    end
> +    ref.ro = count - 1
>       return true
>   end
>   
> @@ -1481,79 +1498,44 @@ local function gc_bucket_in_space(space, bucket_id, status)
>   end
>   
>   --
> --- Remove tuples from buckets of a specified type.
> --- @param type Type of buckets to gc.
> --- @retval List of ids of empty buckets of the type.
> +-- Drop buckets with the given status along with their data in all spaces.
> +-- @param status Status of target buckets.
> +-- @param route_map Destinations of deleted buckets are saved into this table.
>   --
> -local function gc_bucket_step_by_type(type)
> -    local sharded_spaces = find_sharded_spaces()
> -    local empty_buckets = {}
> +local function gc_bucket_drop_xc(status, route_map)
>       local limit = consts.BUCKET_CHUNK_SIZE
> -    local is_all_collected = true
> -    for _, bucket in box.space._bucket.index.status:pairs(type) do
> -        local bucket_id = bucket.id
> -        local ref = M.bucket_refs[bucket_id]
> +    local _bucket = box.space._bucket
> +    local sharded_spaces = find_sharded_spaces()
> +    for _, b in _bucket.index.status:pairs(status) do
> +        local id = b.id
> +        local ref = M.bucket_refs[id]
>           if ref then
>               assert(ref.rw == 0)
>               if ref.ro ~= 0 then
>                   ref.ro_lock = true
> -                is_all_collected = false
>                   goto continue
>               end
> -            M.bucket_refs[bucket_id] = nil
> +            M.bucket_refs[id] = nil
>           end
>           for _, space in pairs(sharded_spaces) do
> -            gc_bucket_in_space_xc(space, bucket_id, type)
> +            gc_bucket_in_space_xc(space, id, status)
>               limit = limit - 1
>               if limit == 0 then
>                   lfiber.sleep(0)
>                   limit = consts.BUCKET_CHUNK_SIZE
>               end
>           end
> -        table.insert(empty_buckets, bucket.id)
> -::continue::
> +        route_map[id] = b.destination
> +        _bucket:delete{id}
> +    ::continue::
>       end
> -    return empty_buckets, is_all_collected
> -end
> -
> ---
> --- Drop buckets with ids in the list.
> --- @param bucket_ids Bucket ids to drop.
> --- @param status Expected bucket status.
> ---
> -local function gc_bucket_drop_xc(bucket_ids, status)
> -    if #bucket_ids == 0 then
> -        return
> -    end
> -    local limit = consts.BUCKET_CHUNK_SIZE
> -    box.begin()
> -    local _bucket = box.space._bucket
> -    for _, id in pairs(bucket_ids) do
> -        local bucket_exists = _bucket:get{id} ~= nil
> -        local b = _bucket:get{id}
> -        if b then
> -            if b.status ~= status then
> -                return error(string.format('Bucket %d status is changed. Was '..
> -                                           '%s, became %s', id, status,
> -                                           b.status))
> -            end
> -            _bucket:delete{id}
> -        end
> -        limit = limit - 1
> -        if limit == 0 then
> -            box.commit()
> -            box.begin()
> -            limit = consts.BUCKET_CHUNK_SIZE
> -        end
> -    end
> -    box.commit()
>   end
>   
>   --
>   -- Exception safe version of gc_bucket_drop_xc.
>   --
> -local function gc_bucket_drop(bucket_ids, status)
> -    local status, err = pcall(gc_bucket_drop_xc, bucket_ids, status)
> +local function gc_bucket_drop(status, route_map)
> +    local status, err = pcall(gc_bucket_drop_xc, status, route_map)
>       if not status then
>           box.rollback()
>       end
> @@ -1561,14 +1543,16 @@ local function gc_bucket_drop(bucket_ids, status)
>   end
>   
>   --
> --- Garbage collector. Works on masters. The garbage collector
> --- wakes up once per specified time.
> +-- Garbage collector. Works on masters. The garbage collector wakes up when
> +-- state of any bucket changes.
>   -- After wakeup it follows the plan:
> --- 1) Check if _bucket has changed. If not, then sleep again;
> --- 2) Scan user spaces for sent and garbage buckets, delete
> ---    garbage data in batches of limited size;
> --- 3) Delete GARBAGE buckets from _bucket immediately, and
> ---    schedule SENT buckets for deletion after a timeout;
> +-- 1) Check if state of any bucket has really changed. If not, then sleep again;
> +-- 2) Delete all GARBAGE and SENT buckets along with their data in chunks of
> +--    limited size.
> +-- 3) Bucket destinations are saved into a global route_map to reroute incoming
> +--    requests from routers in case they didn't notice the buckets being moved.
> +--    The saved routes are scheduled for deletion after a timeout, which is
> +--    checked on each iteration of this loop.
>   -- 4) Sleep, go to (1).
>   -- For each step details see comments in the code.
>   --
> @@ -1580,65 +1564,75 @@ function gc_bucket_f()
>       -- generation == bucket generation. In such a case the fiber
>       -- does nothing until next _bucket change.
>       local bucket_generation_collected = -1
> -    -- Empty sent buckets are collected into an array. After a
> -    -- specified time interval the buckets are deleted both from
> -    -- this array and from _bucket space.
> -    local buckets_for_redirect = {}
> -    local buckets_for_redirect_ts = fiber_clock()
> -    -- Empty sent buckets, updated after each step, and when
> -    -- buckets_for_redirect is deleted, it gets empty_sent_buckets
> -    -- for next deletion.
> -    local empty_garbage_buckets, empty_sent_buckets, status, err
> +    local bucket_generation_current = M.bucket_generation
> +    -- Deleted buckets are saved into a route map to redirect routers if they
> +    -- didn't discover new location of the buckets yet. However route map does
> +    -- not grow infinitely. Otherwise it would end up storing redirects for all
> +    -- buckets in the cluster, which could also be outdated.
> +    -- Garbage collector periodically drops old routes from the map. For that it
> +    -- remembers state of route map in one moment, and after a while clears the
> +    -- remembered routes from the global route map.
> +    local route_map = M.route_map
> +    local route_map_old = {}
> +    local route_map_deadline = 0
> +    local status, err
>       while M.module_version == module_version do
> -        -- Check if no changes in buckets configuration.
> -        if bucket_generation_collected ~= M.bucket_generation then
> -            local bucket_generation = M.bucket_generation
> -            local is_sent_collected, is_garbage_collected
> -            status, empty_garbage_buckets, is_garbage_collected =
> -                pcall(gc_bucket_step_by_type, consts.BUCKET.GARBAGE)
> -            if not status then
> -                err = empty_garbage_buckets
> -                goto check_error
> -            end
> -            status, empty_sent_buckets, is_sent_collected =
> -                pcall(gc_bucket_step_by_type, consts.BUCKET.SENT)
> -            if not status then
> -                err = empty_sent_buckets
> -                goto check_error
> +        if bucket_generation_collected ~= bucket_generation_current then
> +            status, err = gc_bucket_drop(consts.BUCKET.GARBAGE, route_map)
> +            if status then
> +                status, err = gc_bucket_drop(consts.BUCKET.SENT, route_map)
>               end
> -            status, err = gc_bucket_drop(empty_garbage_buckets,
> -                                         consts.BUCKET.GARBAGE)
> -::check_error::
>               if not status then
>                   box.rollback()
>                   log.error('Error during garbage collection step: %s', err)
> -                goto continue
> +            else
> +                -- Don't use global generation. During the collection it could
> +                -- already change. Instead, remember the generation known before
> +                -- the collection has started.
> +                -- Since the collection also changes the generation, it makes
> +                -- the GC always run at least twice. But typically on the
> +                -- second iteration it should not find any buckets to collect,
> +                -- and then the collected generation matches the global one.
> +                bucket_generation_collected = bucket_generation_current
>               end
> -            if is_sent_collected and is_garbage_collected then
> -                bucket_generation_collected = bucket_generation
> +        else
> +            status = true
> +        end
> +
> +        local sleep_time = route_map_deadline - fiber_clock()
> +        if sleep_time <= 0 then
> +            local chunk = consts.LUA_CHUNK_SIZE
> +            util.table_minus_yield(route_map, route_map_old, chunk)
> +            route_map_old = util.table_copy_yield(route_map, chunk)
> +            if next(route_map_old) then
> +                sleep_time = consts.BUCKET_SENT_GARBAGE_DELAY
> +            else
> +                sleep_time = consts.TIMEOUT_INFINITY
>               end
> +            route_map_deadline = fiber_clock() + sleep_time
>           end
> +        bucket_generation_current = M.bucket_generation
>   
> -        if fiber_clock() - buckets_for_redirect_ts >=
> -           consts.BUCKET_SENT_GARBAGE_DELAY then
> -            status, err = gc_bucket_drop(buckets_for_redirect,
> -                                         consts.BUCKET.SENT)
> -            if not status then
> -                buckets_for_redirect = {}
> -                empty_sent_buckets = {}
> -                bucket_generation_collected = -1
> -                log.error('Error during deletion of empty sent buckets: %s',
> -                          err)
> -            elseif M.module_version ~= module_version then
> -                return
> +        if bucket_generation_current ~= bucket_generation_collected then
> +            -- Generation was changed during collection. Or *by* collection.
> +            if status then
> +                -- Retry immediately. If the generation was changed by the
> +                -- collection itself, it will notice it next iteration, and go
> +                -- to proper sleep.
> +                sleep_time = 0
>               else
> -                buckets_for_redirect = empty_sent_buckets or {}
> -                empty_sent_buckets = nil
> -                buckets_for_redirect_ts = fiber_clock()
> +                -- An error happened during the collection. It makes no sense
> +                -- to retry on each iteration of the event loop. The most likely
> +                -- errors are either a WAL error or a transaction abort - both
> +                -- look like an issue in the user's code and can't be fixed
> +                -- quickly anyway. Backoff.
> +                sleep_time = consts.GC_BACKOFF_INTERVAL
>               end
>           end
> -::continue::
> -        lfiber.sleep(M.collect_bucket_garbage_interval)
> +
> +        if M.module_version == module_version then
> +            M.bucket_generation_cond:wait(sleep_time)
> +        end
>       end
>   end
>   
> @@ -2423,8 +2417,6 @@ local function storage_cfg(cfg, this_replica_uuid, is_reload)
>           vshard_cfg.rebalancer_disbalance_threshold
>       M.rebalancer_receiving_quota = vshard_cfg.rebalancer_max_receiving
>       M.shard_index = vshard_cfg.shard_index
> -    M.collect_bucket_garbage_interval =
> -        vshard_cfg.collect_bucket_garbage_interval
>       M.collect_lua_garbage = vshard_cfg.collect_lua_garbage
>       M.rebalancer_worker_count = vshard_cfg.rebalancer_max_sending
>       M.current_cfg = cfg
> @@ -2678,6 +2670,9 @@ else
>           storage_cfg(M.current_cfg, M.this_replica.uuid, true)
>       end
>       M.module_version = M.module_version + 1
> +    -- Background fibers could sleep waiting for bucket changes.
> +    -- Let them know it is time to reload.
> +    bucket_generation_increment()
>   end
>   
>   M.recovery_f = recovery_f
> @@ -2688,7 +2683,7 @@ M.gc_bucket_f = gc_bucket_f
>   -- These functions are saved in M not for atomic reload, but for
>   -- unit testing.
>   --
> -M.gc_bucket_step_by_type = gc_bucket_step_by_type
> +M.gc_bucket_drop = gc_bucket_drop
>   M.rebalancer_build_routes = rebalancer_build_routes
>   M.rebalancer_calculate_metrics = rebalancer_calculate_metrics
>   M.cached_find_sharded_spaces = find_sharded_spaces
> diff --git a/vshard/storage/reload_evolution.lua b/vshard/storage/reload_evolution.lua
> index f38af74..484f499 100644
> --- a/vshard/storage/reload_evolution.lua
> +++ b/vshard/storage/reload_evolution.lua
> @@ -4,6 +4,7 @@
>   -- in a commit.
>   --
>   local log = require('log')
> +local fiber = require('fiber')
>   
>   --
>   -- Array of upgrade functions.
> @@ -25,6 +26,13 @@ migrations[#migrations + 1] = function(M)
>       end
>   end
>   
> +migrations[#migrations + 1] = function(M)
> +    if not M.route_map then
> +        M.bucket_generation_cond = fiber.cond()
> +        M.route_map = {}
> +    end
> +end
> +
>   --
>   -- Perform an update based on a version stored in `M` (internals).
>   -- @param M Old module internals which should be updated.

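The two-snapshot route-map aging scheme in the `gc_bucket_f` hunk above is a generic expiry technique that may be easier to see outside the diff. The Python sketch below is my own illustration, not vshard code (all names are mine; the real implementation is the Lua quoted above, where `table_copy_yield`/`table_minus_yield` additionally yield every `LUA_CHUNK_SIZE` entries to stay cooperative). The idea: snapshot the live map, wait one delay interval, then drop only the entries that still match the snapshot — a redirect refreshed in the meantime survives another full interval.

```python
GARBAGE_DELAY = 0.5  # plays the role of BUCKET_SENT_GARBAGE_DELAY


def minus(live, old):
    # Remove from `live` every entry that is unchanged since the snapshot.
    # Entries rewritten after the snapshot (different value) are kept.
    for key, value in old.items():
        if live.get(key) == value:
            del live[key]


def gc_routes_step(live, old, deadline, now):
    # One iteration of the aging loop: once the deadline passes, expire
    # the previous snapshot and take a new one. Returns the updated
    # (snapshot, deadline) pair; an empty snapshot means "sleep forever"
    # until something new appears in the live map.
    if now >= deadline:
        minus(live, old)
        old = dict(live)
        deadline = now + (GARBAGE_DELAY if old else float('inf'))
    return old, deadline
```

For example, a redirect recorded just before an expiry pass is snapshotted but not deleted; it is only dropped on the following pass, so every route lives for at least one full delay interval after its last update.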

Thread overview: 36+ messages
2021-02-09 23:46 [Tarantool-patches] [PATCH 0/9] VShard Map-Reduce, part 1, preparations Vladislav Shpilevoy via Tarantool-patches
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 1/9] rlist: move rlist to a new module Vladislav Shpilevoy via Tarantool-patches
2021-02-10  8:57   ` Oleg Babin via Tarantool-patches
2021-02-11  6:50     ` Oleg Babin via Tarantool-patches
2021-02-12  0:09       ` Vladislav Shpilevoy via Tarantool-patches
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 2/9] Use fiber.clock() instead of .time() everywhere Vladislav Shpilevoy via Tarantool-patches
2021-02-10  8:57   ` Oleg Babin via Tarantool-patches
2021-02-10 22:33     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 3/9] test: introduce a helper to wait for bucket GC Vladislav Shpilevoy via Tarantool-patches
2021-02-10  8:57   ` Oleg Babin via Tarantool-patches
2021-02-10 22:33     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-11  6:50       ` Oleg Babin via Tarantool-patches
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 4/9] storage: bucket_recv() should check rs lock Vladislav Shpilevoy via Tarantool-patches
2021-02-10  8:59   ` Oleg Babin via Tarantool-patches
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 5/9] util: introduce yielding table functions Vladislav Shpilevoy via Tarantool-patches
2021-02-10  8:59   ` Oleg Babin via Tarantool-patches
2021-02-10 22:34     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-11  6:50       ` Oleg Babin via Tarantool-patches
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 6/9] cfg: introduce 'deprecated option' feature Vladislav Shpilevoy via Tarantool-patches
2021-02-10  8:59   ` Oleg Babin via Tarantool-patches
2021-02-10 22:34     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-11  6:50       ` Oleg Babin via Tarantool-patches
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 7/9] gc: introduce reactive garbage collector Vladislav Shpilevoy via Tarantool-patches
2021-02-10  9:00   ` Oleg Babin via Tarantool-patches
2021-02-10 22:35     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-11  6:50       ` Oleg Babin via Tarantool-patches [this message]
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 8/9] recovery: introduce reactive recovery Vladislav Shpilevoy via Tarantool-patches
2021-02-10  9:00   ` Oleg Babin via Tarantool-patches
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 9/9] util: introduce binary heap data structure Vladislav Shpilevoy via Tarantool-patches
2021-02-10  9:01   ` Oleg Babin via Tarantool-patches
2021-02-10 22:36     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-11  6:51       ` Oleg Babin via Tarantool-patches
2021-02-12  0:09         ` Vladislav Shpilevoy via Tarantool-patches
2021-03-05 22:03   ` Vladislav Shpilevoy via Tarantool-patches
2021-02-09 23:51 ` [Tarantool-patches] [PATCH 0/9] VShard Map-Reduce, part 1, preparations Vladislav Shpilevoy via Tarantool-patches
2021-02-12 11:02   ` Oleg Babin via Tarantool-patches
