* [Tarantool-patches] [PATCH 1/9] rlist: move rlist to a new module
2021-02-09 23:46 [Tarantool-patches] [PATCH 0/9] VShard Map-Reduce, part 1, preparations Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-09 23:46 ` Vladislav Shpilevoy via Tarantool-patches
2021-02-10 8:57 ` Oleg Babin via Tarantool-patches
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 2/9] Use fiber.clock() instead of .time() everywhere Vladislav Shpilevoy via Tarantool-patches
` (8 subsequent siblings)
9 siblings, 1 reply; 36+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-09 23:46 UTC (permalink / raw)
To: tarantool-patches, olegrok, yaroslav.dynnikov
Rlist in storage/init.lua implemented a container similar to rlist
in libsmall in the Tarantool core: a doubly-linked list.
It does not depend on anything in storage/init.lua and should have
been a separate module from the beginning.
Now init.lua is going to grow even more as part of the map-reduce
feature, beyond 3k lines if nothing is moved out. It was decided
(by me) that this crosses the threshold at which it is time to
split init.lua into separate modules.
The patch takes the low-hanging fruit by moving rlist into its
own module.
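For readers skimming the diff, the structure being moved is small enough to sketch. The following is a rough Python transliteration of the new vshard/rlist.lua (illustration only, not part of the patch; the class and field names merely mirror the Lua code):

```python
# Illustration only: a rough Python transliteration of the Lua
# rlist added in vshard/rlist.lua below, to show the technique.
from types import SimpleNamespace


class RList:
    """Intrusive doubly-linked list: linked objects carry their own
    'prev'/'next' fields, so add/remove are O(1) without wrapper
    nodes, and an object can sit in at most one list at a time."""

    def __init__(self):
        self.first = None
        self.last = None
        self.count = 0

    def add_tail(self, obj):
        last = self.last
        if last is not None:
            last.next = obj
            obj.prev = last
        else:
            self.first = obj
        self.last = obj
        self.count += 1

    def remove(self, obj):
        # Removing an object which is not in the list is a no-op;
        # membership is detected, not assumed, hence the flag.
        prev = getattr(obj, 'prev', None)
        nxt = getattr(obj, 'next', None)
        belongs = False
        if prev is not None:
            belongs = True
            prev.next = nxt
        if nxt is not None:
            belongs = True
            nxt.prev = prev
        obj.prev = None
        obj.next = None
        if self.last is obj:
            belongs = True
            self.last = prev
        if self.first is obj:
            belongs = True
            self.first = nxt
        if belongs:
            self.count -= 1
```

The Lua version below attaches these functions as methods via a metatable, which is what turns the old rlist_add_tail(list, obj) calls in storage/init.lua into list:add_tail(obj).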
---
test/unit/rebalancer.result | 99 -----------------------------
test/unit/rebalancer.test.lua | 27 --------
test/unit/rlist.result | 114 ++++++++++++++++++++++++++++++++++
test/unit/rlist.test.lua | 33 ++++++++++
vshard/rlist.lua | 53 ++++++++++++++++
vshard/storage/init.lua | 68 +++-----------------
6 files changed, 208 insertions(+), 186 deletions(-)
create mode 100644 test/unit/rlist.result
create mode 100644 test/unit/rlist.test.lua
create mode 100644 vshard/rlist.lua
diff --git a/test/unit/rebalancer.result b/test/unit/rebalancer.result
index 2fb30e2..19aa480 100644
--- a/test/unit/rebalancer.result
+++ b/test/unit/rebalancer.result
@@ -1008,105 +1008,6 @@ build_routes(replicasets)
-- the latter is a dispenser. It is a structure which hands out
-- destination UUIDs in a round-robin manner to worker fibers.
--
-list = rlist.new()
----
-...
-list
----
-- count: 0
-...
-obj1 = {i = 1}
----
-...
-rlist.remove(list, obj1)
----
-...
-list
----
-- count: 0
-...
-rlist.add_tail(list, obj1)
----
-...
-list
----
-- count: 1
- last: &0
- i: 1
- first: *0
-...
-rlist.remove(list, obj1)
----
-...
-list
----
-- count: 0
-...
-obj1
----
-- i: 1
-...
-rlist.add_tail(list, obj1)
----
-...
-obj2 = {i = 2}
----
-...
-rlist.add_tail(list, obj2)
----
-...
-list
----
-- count: 2
- last: &0
- i: 2
- prev: &1
- i: 1
- next: *0
- first: *1
-...
-obj3 = {i = 3}
----
-...
-rlist.add_tail(list, obj3)
----
-...
-list
----
-- count: 3
- last: &0
- i: 3
- prev: &1
- i: 2
- next: *0
- prev: &2
- i: 1
- next: *1
- first: *2
-...
-rlist.remove(list, obj2)
----
-...
-list
----
-- count: 2
- last: &0
- i: 3
- prev: &1
- i: 1
- next: *0
- first: *1
-...
-rlist.remove(list, obj1)
----
-...
-list
----
-- count: 1
- last: &0
- i: 3
- first: *0
-...
d = dispenser.create({uuid = 15})
---
...
diff --git a/test/unit/rebalancer.test.lua b/test/unit/rebalancer.test.lua
index a4e18c1..8087d42 100644
--- a/test/unit/rebalancer.test.lua
+++ b/test/unit/rebalancer.test.lua
@@ -246,33 +246,6 @@ build_routes(replicasets)
-- the latter is a dispenser. It is a structure which hands out
-- destination UUIDs in a round-robin manner to worker fibers.
--
-list = rlist.new()
-list
-
-obj1 = {i = 1}
-rlist.remove(list, obj1)
-list
-
-rlist.add_tail(list, obj1)
-list
-
-rlist.remove(list, obj1)
-list
-obj1
-
-rlist.add_tail(list, obj1)
-obj2 = {i = 2}
-rlist.add_tail(list, obj2)
-list
-obj3 = {i = 3}
-rlist.add_tail(list, obj3)
-list
-
-rlist.remove(list, obj2)
-list
-rlist.remove(list, obj1)
-list
-
d = dispenser.create({uuid = 15})
dispenser.pop(d)
for i = 1, 14 do assert(dispenser.pop(d) == 'uuid', i) end
diff --git a/test/unit/rlist.result b/test/unit/rlist.result
new file mode 100644
index 0000000..c8aabc0
--- /dev/null
+++ b/test/unit/rlist.result
@@ -0,0 +1,114 @@
+-- test-run result file version 2
+--
+-- gh-161: parallel rebalancer. One of the most important parts of the latter is
+-- a dispenser. It is a structure which hands out destination UUIDs in a
+-- round-robin manner to worker fibers. It uses the rlist data structure.
+--
+rlist = require('vshard.rlist')
+ | ---
+ | ...
+
+list = rlist.new()
+ | ---
+ | ...
+list
+ | ---
+ | - count: 0
+ | ...
+
+obj1 = {i = 1}
+ | ---
+ | ...
+list:remove(obj1)
+ | ---
+ | ...
+list
+ | ---
+ | - count: 0
+ | ...
+
+list:add_tail(obj1)
+ | ---
+ | ...
+list
+ | ---
+ | - count: 1
+ | last: &0
+ | i: 1
+ | first: *0
+ | ...
+
+list:remove(obj1)
+ | ---
+ | ...
+list
+ | ---
+ | - count: 0
+ | ...
+obj1
+ | ---
+ | - i: 1
+ | ...
+
+list:add_tail(obj1)
+ | ---
+ | ...
+obj2 = {i = 2}
+ | ---
+ | ...
+list:add_tail(obj2)
+ | ---
+ | ...
+list
+ | ---
+ | - count: 2
+ | last: &0
+ | i: 2
+ | prev: &1
+ | i: 1
+ | next: *0
+ | first: *1
+ | ...
+obj3 = {i = 3}
+ | ---
+ | ...
+list:add_tail(obj3)
+ | ---
+ | ...
+list
+ | ---
+ | - count: 3
+ | last: &0
+ | i: 3
+ | prev: &1
+ | i: 2
+ | next: *0
+ | prev: &2
+ | i: 1
+ | next: *1
+ | first: *2
+ | ...
+
+list:remove(obj2)
+ | ---
+ | ...
+list
+ | ---
+ | - count: 2
+ | last: &0
+ | i: 3
+ | prev: &1
+ | i: 1
+ | next: *0
+ | first: *1
+ | ...
+list:remove(obj1)
+ | ---
+ | ...
+list
+ | ---
+ | - count: 1
+ | last: &0
+ | i: 3
+ | first: *0
+ | ...
diff --git a/test/unit/rlist.test.lua b/test/unit/rlist.test.lua
new file mode 100644
index 0000000..db52955
--- /dev/null
+++ b/test/unit/rlist.test.lua
@@ -0,0 +1,33 @@
+--
+-- gh-161: parallel rebalancer. One of the most important parts of the latter is
+-- a dispenser. It is a structure which hands out destination UUIDs in a
+-- round-robin manner to worker fibers. It uses the rlist data structure.
+--
+rlist = require('vshard.rlist')
+
+list = rlist.new()
+list
+
+obj1 = {i = 1}
+list:remove(obj1)
+list
+
+list:add_tail(obj1)
+list
+
+list:remove(obj1)
+list
+obj1
+
+list:add_tail(obj1)
+obj2 = {i = 2}
+list:add_tail(obj2)
+list
+obj3 = {i = 3}
+list:add_tail(obj3)
+list
+
+list:remove(obj2)
+list
+list:remove(obj1)
+list
diff --git a/vshard/rlist.lua b/vshard/rlist.lua
new file mode 100644
index 0000000..4be5382
--- /dev/null
+++ b/vshard/rlist.lua
@@ -0,0 +1,53 @@
+--
+-- A subset of rlist methods from the main repository. Rlist is a
+-- doubly linked list, and is used here to implement a queue of
+-- routes in the parallel rebalancer.
+--
+local rlist_mt = {}
+
+function rlist_mt.add_tail(rlist, object)
+ local last = rlist.last
+ if last then
+ last.next = object
+ object.prev = last
+ else
+ rlist.first = object
+ end
+ rlist.last = object
+ rlist.count = rlist.count + 1
+end
+
+function rlist_mt.remove(rlist, object)
+ local prev = object.prev
+ local next = object.next
+ local belongs_to_list = false
+ if prev then
+ belongs_to_list = true
+ prev.next = next
+ end
+ if next then
+ belongs_to_list = true
+ next.prev = prev
+ end
+ object.prev = nil
+ object.next = nil
+ if rlist.last == object then
+ belongs_to_list = true
+ rlist.last = prev
+ end
+ if rlist.first == object then
+ belongs_to_list = true
+ rlist.first = next
+ end
+ if belongs_to_list then
+ rlist.count = rlist.count - 1
+ end
+end
+
+local function rlist_new()
+ return setmetatable({count = 0}, {__index = rlist_mt})
+end
+
+return {
+ new = rlist_new,
+}
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index 5464824..1b48bf1 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -13,12 +13,13 @@ if rawget(_G, MODULE_INTERNALS) then
'vshard.consts', 'vshard.error', 'vshard.cfg',
'vshard.replicaset', 'vshard.util',
'vshard.storage.reload_evolution',
- 'vshard.lua_gc',
+ 'vshard.lua_gc', 'vshard.rlist'
}
for _, module in pairs(vshard_modules) do
package.loaded[module] = nil
end
end
+local rlist = require('vshard.rlist')
local consts = require('vshard.consts')
local lerror = require('vshard.error')
local lcfg = require('vshard.cfg')
@@ -1786,54 +1787,6 @@ local function rebalancer_build_routes(replicasets)
return bucket_routes
end
---
--- A subset of rlist methods from the main repository. Rlist is a
--- doubly linked list, and is used here to implement a queue of
--- routes in the parallel rebalancer.
---
-local function rlist_new()
- return {count = 0}
-end
-
-local function rlist_add_tail(rlist, object)
- local last = rlist.last
- if last then
- last.next = object
- object.prev = last
- else
- rlist.first = object
- end
- rlist.last = object
- rlist.count = rlist.count + 1
-end
-
-local function rlist_remove(rlist, object)
- local prev = object.prev
- local next = object.next
- local belongs_to_list = false
- if prev then
- belongs_to_list = true
- prev.next = next
- end
- if next then
- belongs_to_list = true
- next.prev = prev
- end
- object.prev = nil
- object.next = nil
- if rlist.last == object then
- belongs_to_list = true
- rlist.last = prev
- end
- if rlist.first == object then
- belongs_to_list = true
- rlist.first = next
- end
- if belongs_to_list then
- rlist.count = rlist.count - 1
- end
-end
-
--
-- Dispenser is a container of routes received from the
-- rebalancer. Its task is to hand out the routes to worker fibers
@@ -1842,7 +1795,7 @@ end
-- receiver nodes.
--
local function route_dispenser_create(routes)
- local rlist = rlist_new()
+ local rlist = rlist.new()
local map = {}
for uuid, bucket_count in pairs(routes) do
local new = {
@@ -1873,7 +1826,7 @@ local function route_dispenser_create(routes)
-- the main applier fiber does some analysis on the
-- destinations.
map[uuid] = new
- rlist_add_tail(rlist, new)
+ rlist:add_tail(new)
end
return {
rlist = rlist,
@@ -1892,7 +1845,7 @@ local function route_dispenser_put(dispenser, uuid)
local bucket_count = dst.bucket_count + 1
dst.bucket_count = bucket_count
if bucket_count == 1 then
- rlist_add_tail(dispenser.rlist, dst)
+ dispenser.rlist:add_tail(dst)
end
end
end
@@ -1909,7 +1862,7 @@ local function route_dispenser_skip(dispenser, uuid)
local dst = map[uuid]
if dst then
map[uuid] = nil
- rlist_remove(dispenser.rlist, dst)
+ dispenser.rlist:remove(dst)
end
end
@@ -1952,9 +1905,9 @@ local function route_dispenser_pop(dispenser)
if dst then
local bucket_count = dst.bucket_count - 1
dst.bucket_count = bucket_count
- rlist_remove(rlist, dst)
+ rlist:remove(dst)
if bucket_count > 0 then
- rlist_add_tail(rlist, dst)
+ rlist:add_tail(dst)
end
return dst.uuid
end
@@ -2742,11 +2695,6 @@ M.route_dispenser = {
pop = route_dispenser_pop,
sent = route_dispenser_sent,
}
-M.rlist = {
- new = rlist_new,
- add_tail = rlist_add_tail,
- remove = rlist_remove,
-}
M.schema_latest_version = schema_latest_version
M.schema_current_version = schema_current_version
M.schema_upgrade_master = schema_upgrade_master
--
2.24.3 (Apple Git-128)
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/9] rlist: move rlist to a new module
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 1/9] rlist: move rlist to a new module Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-10 8:57 ` Oleg Babin via Tarantool-patches
2021-02-11 6:50 ` Oleg Babin via Tarantool-patches
0 siblings, 1 reply; 36+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-02-10 8:57 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov
Hi! Thanks for your patch. LGTM.
On 10/02/2021 02:46, Vladislav Shpilevoy wrote:
> [...]
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH 1/9] rlist: move rlist to a new module
2021-02-10 8:57 ` Oleg Babin via Tarantool-patches
@ 2021-02-11 6:50 ` Oleg Babin via Tarantool-patches
2021-02-12 0:09 ` Vladislav Shpilevoy via Tarantool-patches
0 siblings, 1 reply; 36+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-02-11 6:50 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov
I've noticed that you've missed adding the new file to vshard/CMakeLists.txt [1].
It will break the build.
[1] https://github.com/tarantool/vshard/blob/master/vshard/CMakeLists.txt#L9
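The missing piece is a one-line change: the new source must appear among the installed Lua files, or a deployed vshard will fail on require('vshard.rlist'). Hypothetically, the fix looks something like the fragment below (a sketch only; the real contents of vshard/CMakeLists.txt differ, and every file name here except rlist.lua is an assumption):

```cmake
# Hypothetical sketch, not the actual file: append rlist.lua to the
# list of installed vshard sources so the module ships with the rock.
install(FILES lua_gc.lua rlist.lua  # ...existing entries kept as-is
        DESTINATION ${TARANTOOL_INSTALL_LUADIR}/vshard)
```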
On 10/02/2021 11:57, Oleg Babin via Tarantool-patches wrote:
> Hi! Thanks for your patch. LGTM.
>
> On 10/02/2021 02:46, Vladislav Shpilevoy wrote:
>> Rlist in storage/init.lua implemented a container similar to rlist
>> in libsmall in Tarantool core. Doubly-linked list.
>>
>> It does not depend on anything in storage/init.lua, and should
>> have been done in a separate module from the beginning.
>>
>> Now init.lua is going to grow even more in scope of map-reduce
>> feature, beyond 3k lines if nothing would be moved out. It was
>> decided (by me) that it crosses the border of when it is time to
>> split init.lua into separate modules.
>>
>> The patch takes the low hanging fruit by moving rlist into its
>> own module.
>> ---
>> test/unit/rebalancer.result | 99 -----------------------------
>> test/unit/rebalancer.test.lua | 27 --------
>> test/unit/rlist.result | 114 ++++++++++++++++++++++++++++++++++
>> test/unit/rlist.test.lua | 33 ++++++++++
>> vshard/rlist.lua | 53 ++++++++++++++++
>> vshard/storage/init.lua | 68 +++-----------------
>> 6 files changed, 208 insertions(+), 186 deletions(-)
>> create mode 100644 test/unit/rlist.result
>> create mode 100644 test/unit/rlist.test.lua
>> create mode 100644 vshard/rlist.lua
>>
>> diff --git a/test/unit/rebalancer.result b/test/unit/rebalancer.result
>> index 2fb30e2..19aa480 100644
>> --- a/test/unit/rebalancer.result
>> +++ b/test/unit/rebalancer.result
>> @@ -1008,105 +1008,6 @@ build_routes(replicasets)
>> -- the latter is a dispenser. It is a structure which hands out
>> -- destination UUIDs in a round-robin manner to worker fibers.
>> --
>> -list = rlist.new()
>> ----
>> -...
>> -list
>> ----
>> -- count: 0
>> -...
>> -obj1 = {i = 1}
>> ----
>> -...
>> -rlist.remove(list, obj1)
>> ----
>> -...
>> -list
>> ----
>> -- count: 0
>> -...
>> -rlist.add_tail(list, obj1)
>> ----
>> -...
>> -list
>> ----
>> -- count: 1
>> - last: &0
>> - i: 1
>> - first: *0
>> -...
>> -rlist.remove(list, obj1)
>> ----
>> -...
>> -list
>> ----
>> -- count: 0
>> -...
>> -obj1
>> ----
>> -- i: 1
>> -...
>> -rlist.add_tail(list, obj1)
>> ----
>> -...
>> -obj2 = {i = 2}
>> ----
>> -...
>> -rlist.add_tail(list, obj2)
>> ----
>> -...
>> -list
>> ----
>> -- count: 2
>> - last: &0
>> - i: 2
>> - prev: &1
>> - i: 1
>> - next: *0
>> - first: *1
>> -...
>> -obj3 = {i = 3}
>> ----
>> -...
>> -rlist.add_tail(list, obj3)
>> ----
>> -...
>> -list
>> ----
>> -- count: 3
>> - last: &0
>> - i: 3
>> - prev: &1
>> - i: 2
>> - next: *0
>> - prev: &2
>> - i: 1
>> - next: *1
>> - first: *2
>> -...
>> -rlist.remove(list, obj2)
>> ----
>> -...
>> -list
>> ----
>> -- count: 2
>> - last: &0
>> - i: 3
>> - prev: &1
>> - i: 1
>> - next: *0
>> - first: *1
>> -...
>> -rlist.remove(list, obj1)
>> ----
>> -...
>> -list
>> ----
>> -- count: 1
>> - last: &0
>> - i: 3
>> - first: *0
>> -...
>> d = dispenser.create({uuid = 15})
>> ---
>> ...
>> diff --git a/test/unit/rebalancer.test.lua
>> b/test/unit/rebalancer.test.lua
>> index a4e18c1..8087d42 100644
>> --- a/test/unit/rebalancer.test.lua
>> +++ b/test/unit/rebalancer.test.lua
>> @@ -246,33 +246,6 @@ build_routes(replicasets)
>> -- the latter is a dispenser. It is a structure which hands out
>> -- destination UUIDs in a round-robin manner to worker fibers.
>> --
>> -list = rlist.new()
>> -list
>> -
>> -obj1 = {i = 1}
>> -rlist.remove(list, obj1)
>> -list
>> -
>> -rlist.add_tail(list, obj1)
>> -list
>> -
>> -rlist.remove(list, obj1)
>> -list
>> -obj1
>> -
>> -rlist.add_tail(list, obj1)
>> -obj2 = {i = 2}
>> -rlist.add_tail(list, obj2)
>> -list
>> -obj3 = {i = 3}
>> -rlist.add_tail(list, obj3)
>> -list
>> -
>> -rlist.remove(list, obj2)
>> -list
>> -rlist.remove(list, obj1)
>> -list
>> -
>> d = dispenser.create({uuid = 15})
>> dispenser.pop(d)
>> for i = 1, 14 do assert(dispenser.pop(d) == 'uuid', i) end
>> diff --git a/test/unit/rlist.result b/test/unit/rlist.result
>> new file mode 100644
>> index 0000000..c8aabc0
>> --- /dev/null
>> +++ b/test/unit/rlist.result
>> @@ -0,0 +1,114 @@
>> +-- test-run result file version 2
>> +--
>> +-- gh-161: parallel rebalancer. One of the most important part of
>> the latter is
>> +-- a dispenser. It is a structure which hands out destination UUIDs
>> in a
>> +-- round-robin manner to worker fibers. It uses rlist data structure.
>> +--
>> +rlist = require('vshard.rlist')
>> + | ---
>> + | ...
>> +
>> +list = rlist.new()
>> + | ---
>> + | ...
>> +list
>> + | ---
>> + | - count: 0
>> + | ...
>> +
>> +obj1 = {i = 1}
>> + | ---
>> + | ...
>> +list:remove(obj1)
>> + | ---
>> + | ...
>> +list
>> + | ---
>> + | - count: 0
>> + | ...
>> +
>> +list:add_tail(obj1)
>> + | ---
>> + | ...
>> +list
>> + | ---
>> + | - count: 1
>> + | last: &0
>> + | i: 1
>> + | first: *0
>> + | ...
>> +
>> +list:remove(obj1)
>> + | ---
>> + | ...
>> +list
>> + | ---
>> + | - count: 0
>> + | ...
>> +obj1
>> + | ---
>> + | - i: 1
>> + | ...
>> +
>> +list:add_tail(obj1)
>> + | ---
>> + | ...
>> +obj2 = {i = 2}
>> + | ---
>> + | ...
>> +list:add_tail(obj2)
>> + | ---
>> + | ...
>> +list
>> + | ---
>> + | - count: 2
>> + | last: &0
>> + | i: 2
>> + | prev: &1
>> + | i: 1
>> + | next: *0
>> + | first: *1
>> + | ...
>> +obj3 = {i = 3}
>> + | ---
>> + | ...
>> +list:add_tail(obj3)
>> + | ---
>> + | ...
>> +list
>> + | ---
>> + | - count: 3
>> + | last: &0
>> + | i: 3
>> + | prev: &1
>> + | i: 2
>> + | next: *0
>> + | prev: &2
>> + | i: 1
>> + | next: *1
>> + | first: *2
>> + | ...
>> +
>> +list:remove(obj2)
>> + | ---
>> + | ...
>> +list
>> + | ---
>> + | - count: 2
>> + | last: &0
>> + | i: 3
>> + | prev: &1
>> + | i: 1
>> + | next: *0
>> + | first: *1
>> + | ...
>> +list:remove(obj1)
>> + | ---
>> + | ...
>> +list
>> + | ---
>> + | - count: 1
>> + | last: &0
>> + | i: 3
>> + | first: *0
>> + | ...
>> diff --git a/test/unit/rlist.test.lua b/test/unit/rlist.test.lua
>> new file mode 100644
>> index 0000000..db52955
>> --- /dev/null
>> +++ b/test/unit/rlist.test.lua
>> @@ -0,0 +1,33 @@
>> +--
>> +-- gh-161: parallel rebalancer. One of the most important part of
>> the latter is
>> +-- a dispenser. It is a structure which hands out destination UUIDs
>> in a
>> +-- round-robin manner to worker fibers. It uses rlist data structure.
>> +--
>> +rlist = require('vshard.rlist')
>> +
>> +list = rlist.new()
>> +list
>> +
>> +obj1 = {i = 1}
>> +list:remove(obj1)
>> +list
>> +
>> +list:add_tail(obj1)
>> +list
>> +
>> +list:remove(obj1)
>> +list
>> +obj1
>> +
>> +list:add_tail(obj1)
>> +obj2 = {i = 2}
>> +list:add_tail(obj2)
>> +list
>> +obj3 = {i = 3}
>> +list:add_tail(obj3)
>> +list
>> +
>> +list:remove(obj2)
>> +list
>> +list:remove(obj1)
>> +list
>> diff --git a/vshard/rlist.lua b/vshard/rlist.lua
>> new file mode 100644
>> index 0000000..4be5382
>> --- /dev/null
>> +++ b/vshard/rlist.lua
>> @@ -0,0 +1,53 @@
>> +--
>> +-- A subset of rlist methods from the main repository. Rlist is a
>> +-- doubly linked list, and is used here to implement a queue of
>> +-- routes in the parallel rebalancer.
>> +--
>> +local rlist_mt = {}
>> +
>> +function rlist_mt.add_tail(rlist, object)
>> + local last = rlist.last
>> + if last then
>> + last.next = object
>> + object.prev = last
>> + else
>> + rlist.first = object
>> + end
>> + rlist.last = object
>> + rlist.count = rlist.count + 1
>> +end
>> +
>> +function rlist_mt.remove(rlist, object)
>> + local prev = object.prev
>> + local next = object.next
>> + local belongs_to_list = false
>> + if prev then
>> + belongs_to_list = true
>> + prev.next = next
>> + end
>> + if next then
>> + belongs_to_list = true
>> + next.prev = prev
>> + end
>> + object.prev = nil
>> + object.next = nil
>> + if rlist.last == object then
>> + belongs_to_list = true
>> + rlist.last = prev
>> + end
>> + if rlist.first == object then
>> + belongs_to_list = true
>> + rlist.first = next
>> + end
>> + if belongs_to_list then
>> + rlist.count = rlist.count - 1
>> + end
>> +end
>> +
>> +local function rlist_new()
>> + return setmetatable({count = 0}, {__index = rlist_mt})
>> +end
>> +
>> +return {
>> + new = rlist_new,
>> +}
>> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
>> index 5464824..1b48bf1 100644
>> --- a/vshard/storage/init.lua
>> +++ b/vshard/storage/init.lua
>> @@ -13,12 +13,13 @@ if rawget(_G, MODULE_INTERNALS) then
>> 'vshard.consts', 'vshard.error', 'vshard.cfg',
>> 'vshard.replicaset', 'vshard.util',
>> 'vshard.storage.reload_evolution',
>> - 'vshard.lua_gc',
>> + 'vshard.lua_gc', 'vshard.rlist'
>> }
>> for _, module in pairs(vshard_modules) do
>> package.loaded[module] = nil
>> end
>> end
>> +local rlist = require('vshard.rlist')
>> local consts = require('vshard.consts')
>> local lerror = require('vshard.error')
>> local lcfg = require('vshard.cfg')
>> @@ -1786,54 +1787,6 @@ local function
>> rebalancer_build_routes(replicasets)
>> return bucket_routes
>> end
>> ---
>> --- A subset of rlist methods from the main repository. Rlist is a
>> --- doubly linked list, and is used here to implement a queue of
>> --- routes in the parallel rebalancer.
>> ---
>> -local function rlist_new()
>> - return {count = 0}
>> -end
>> -
>> -local function rlist_add_tail(rlist, object)
>> - local last = rlist.last
>> - if last then
>> - last.next = object
>> - object.prev = last
>> - else
>> - rlist.first = object
>> - end
>> - rlist.last = object
>> - rlist.count = rlist.count + 1
>> -end
>> -
>> -local function rlist_remove(rlist, object)
>> - local prev = object.prev
>> - local next = object.next
>> - local belongs_to_list = false
>> - if prev then
>> - belongs_to_list = true
>> - prev.next = next
>> - end
>> - if next then
>> - belongs_to_list = true
>> - next.prev = prev
>> - end
>> - object.prev = nil
>> - object.next = nil
>> - if rlist.last == object then
>> - belongs_to_list = true
>> - rlist.last = prev
>> - end
>> - if rlist.first == object then
>> - belongs_to_list = true
>> - rlist.first = next
>> - end
>> - if belongs_to_list then
>> - rlist.count = rlist.count - 1
>> - end
>> -end
>> -
>> --
>> -- Dispenser is a container of routes received from the
>> -- rebalancer. Its task is to hand out the routes to worker fibers
>> @@ -1842,7 +1795,7 @@ end
>> -- receiver nodes.
>> --
>> local function route_dispenser_create(routes)
>> - local rlist = rlist_new()
>> + local rlist = rlist.new()
>> local map = {}
>> for uuid, bucket_count in pairs(routes) do
>> local new = {
>> @@ -1873,7 +1826,7 @@ local function route_dispenser_create(routes)
>> -- the main applier fiber does some analysis on the
>> -- destinations.
>> map[uuid] = new
>> - rlist_add_tail(rlist, new)
>> + rlist:add_tail(new)
>> end
>> return {
>> rlist = rlist,
>> @@ -1892,7 +1845,7 @@ local function route_dispenser_put(dispenser, uuid)
>> local bucket_count = dst.bucket_count + 1
>> dst.bucket_count = bucket_count
>> if bucket_count == 1 then
>> - rlist_add_tail(dispenser.rlist, dst)
>> + dispenser.rlist:add_tail(dst)
>> end
>> end
>> end
>> @@ -1909,7 +1862,7 @@ local function route_dispenser_skip(dispenser, uuid)
>> local dst = map[uuid]
>> if dst then
>> map[uuid] = nil
>> - rlist_remove(dispenser.rlist, dst)
>> + dispenser.rlist:remove(dst)
>> end
>> end
>> @@ -1952,9 +1905,9 @@ local function route_dispenser_pop(dispenser)
>> if dst then
>> local bucket_count = dst.bucket_count - 1
>> dst.bucket_count = bucket_count
>> - rlist_remove(rlist, dst)
>> + rlist:remove(dst)
>> if bucket_count > 0 then
>> - rlist_add_tail(rlist, dst)
>> + rlist:add_tail(dst)
>> end
>> return dst.uuid
>> end
>> @@ -2742,11 +2695,6 @@ M.route_dispenser = {
>> pop = route_dispenser_pop,
>> sent = route_dispenser_sent,
>> }
>> -M.rlist = {
>> - new = rlist_new,
>> - add_tail = rlist_add_tail,
>> - remove = rlist_remove,
>> -}
>> M.schema_latest_version = schema_latest_version
>> M.schema_current_version = schema_current_version
>> M.schema_upgrade_master = schema_upgrade_master
^ permalink raw reply [flat|nested] 36+ messages in thread
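For readers following along, the API of the new module can be exercised with the sketch below. The metatable from the diff above is inlined so the snippet runs in plain Lua without vshard installed; the objects `a`, `b`, `c` and the variable names are illustrative, not from the patch.

```lua
-- Minimal inline copy of the rlist module from the patch, so the
-- usage below is runnable standalone ('next_' replaces the original
-- local 'next' purely to avoid shadowing the Lua builtin).
local rlist_mt = {}

function rlist_mt.add_tail(rlist, object)
    local last = rlist.last
    if last then
        last.next = object
        object.prev = last
    else
        rlist.first = object
    end
    rlist.last = object
    rlist.count = rlist.count + 1
end

function rlist_mt.remove(rlist, object)
    local prev, next_ = object.prev, object.next
    local belongs_to_list = false
    if prev then
        belongs_to_list = true
        prev.next = next_
    end
    if next_ then
        belongs_to_list = true
        next_.prev = prev
    end
    object.prev = nil
    object.next = nil
    if rlist.last == object then
        belongs_to_list = true
        rlist.last = prev
    end
    if rlist.first == object then
        belongs_to_list = true
        rlist.first = next_
    end
    if belongs_to_list then
        rlist.count = rlist.count - 1
    end
end

local function rlist_new()
    return setmetatable({count = 0}, {__index = rlist_mt})
end

-- Usage: queue-like behavior, as in the parallel rebalancer.
local list = rlist_new()
local a, b, c = {id = 'a'}, {id = 'b'}, {id = 'c'}
list:add_tail(a)
list:add_tail(b)
list:add_tail(c)
assert(list.count == 3 and list.first == a and list.last == c)

list:remove(b)                  -- unlink from the middle
assert(list.count == 2 and a.next == c and c.prev == a)

list:remove(b)                  -- removing a detached object is a no-op
assert(list.count == 2)
```

Note how `remove()` uses the `belongs_to_list` flag to keep `count` correct when asked to remove an object that is not actually linked.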
* Re: [Tarantool-patches] [PATCH 1/9] rlist: move rlist to a new module
2021-02-11 6:50 ` Oleg Babin via Tarantool-patches
@ 2021-02-12 0:09 ` Vladislav Shpilevoy via Tarantool-patches
0 siblings, 0 replies; 36+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-12 0:09 UTC (permalink / raw)
To: Oleg Babin, tarantool-patches, yaroslav.dynnikov
On 11.02.2021 07:50, Oleg Babin wrote:
> I've noticed that you forgot to add the new file to vshard/CMakeLists.txt [1]
>
> It will break the build.
>
>
> [1] https://github.com/tarantool/vshard/blob/master/vshard/CMakeLists.txt#L9
Thanks for noticing! Fixed:
====================
diff --git a/vshard/CMakeLists.txt b/vshard/CMakeLists.txt
index 607be54..1063da8 100644
--- a/vshard/CMakeLists.txt
+++ b/vshard/CMakeLists.txt
@@ -7,4 +7,4 @@ add_subdirectory(router)
# Install module
install(FILES cfg.lua error.lua consts.lua hash.lua init.lua replicaset.lua
- util.lua lua_gc.lua DESTINATION ${TARANTOOL_INSTALL_LUADIR}/vshard)
+ util.lua lua_gc.lua rlist.lua DESTINATION ${TARANTOOL_INSTALL_LUADIR}/vshard)
====================
* [Tarantool-patches] [PATCH 2/9] Use fiber.clock() instead of .time() everywhere
2021-02-09 23:46 [Tarantool-patches] [PATCH 0/9] VShard Map-Reduce, part 1, preparations Vladislav Shpilevoy via Tarantool-patches
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 1/9] rlist: move rlist to a new module Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-09 23:46 ` Vladislav Shpilevoy via Tarantool-patches
2021-02-10 8:57 ` Oleg Babin via Tarantool-patches
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 3/9] test: introduce a helper to wait for bucket GC Vladislav Shpilevoy via Tarantool-patches
` (7 subsequent siblings)
9 siblings, 1 reply; 36+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-09 23:46 UTC (permalink / raw)
To: tarantool-patches, olegrok, yaroslav.dynnikov
fiber.time() returns wall-clock time. It is affected by time
corrections in the system and can be non-monotonic.
The patch makes everything in vshard use fiber.clock() instead of
fiber.time(). Also, the fiber.clock function is saved as an upvalue
in every module that uses it. This makes the code a bit shorter and
saves one indexing of the 'fiber' table.
The main reason is that the future map-reduce feature will use the
current time quite often. In some places it will probably be the
slowest action (given how slow FFI can be when not compiled by the
JIT).
Needed for #147
---
test/failover/failover.result | 4 ++--
test/failover/failover.test.lua | 4 ++--
vshard/replicaset.lua | 13 +++++++------
vshard/router/init.lua | 16 ++++++++--------
vshard/storage/init.lua | 16 ++++++++--------
5 files changed, 27 insertions(+), 26 deletions(-)
diff --git a/test/failover/failover.result b/test/failover/failover.result
index 452694c..bae57fa 100644
--- a/test/failover/failover.result
+++ b/test/failover/failover.result
@@ -261,13 +261,13 @@ test_run:cmd('start server box_1_d')
---
- true
...
-ts1 = fiber.time()
+ts1 = fiber.clock()
---
...
while rs1.replica.name ~= 'box_1_d' do fiber.sleep(0.1) end
---
...
-ts2 = fiber.time()
+ts2 = fiber.clock()
---
...
ts2 - ts1 < vshard.consts.FAILOVER_UP_TIMEOUT
diff --git a/test/failover/failover.test.lua b/test/failover/failover.test.lua
index 13c517b..a969e0e 100644
--- a/test/failover/failover.test.lua
+++ b/test/failover/failover.test.lua
@@ -109,9 +109,9 @@ test_run:switch('router_1')
-- Revive the best replica. A router must reconnect to it in
-- FAILOVER_UP_TIMEOUT seconds.
test_run:cmd('start server box_1_d')
-ts1 = fiber.time()
+ts1 = fiber.clock()
while rs1.replica.name ~= 'box_1_d' do fiber.sleep(0.1) end
-ts2 = fiber.time()
+ts2 = fiber.clock()
ts2 - ts1 < vshard.consts.FAILOVER_UP_TIMEOUT
test_run:grep_log('router_1', 'New replica box_1_d%(storage%@')
diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
index b13d05e..a74c0f8 100644
--- a/vshard/replicaset.lua
+++ b/vshard/replicaset.lua
@@ -54,6 +54,7 @@ local luri = require('uri')
local luuid = require('uuid')
local ffi = require('ffi')
local util = require('vshard.util')
+local clock = fiber.clock
local gsc = util.generate_self_checker
--
@@ -88,7 +89,7 @@ local function netbox_on_connect(conn)
-- biggest priority. Really, it is not neccessary to
-- increase replica connection priority, if the current
-- one already has the biggest priority. (See failover_f).
- rs.replica_up_ts = fiber.time()
+ rs.replica_up_ts = clock()
end
end
@@ -100,7 +101,7 @@ local function netbox_on_disconnect(conn)
assert(conn.replica)
-- Replica is down - remember this time to decrease replica
-- priority after FAILOVER_DOWN_TIMEOUT seconds.
- conn.replica.down_ts = fiber.time()
+ conn.replica.down_ts = clock()
end
--
@@ -174,7 +175,7 @@ local function replicaset_up_replica_priority(replicaset)
local old_replica = replicaset.replica
if old_replica == replicaset.priority_list[1] and
old_replica:is_connected() then
- replicaset.replica_up_ts = fiber.time()
+ replicaset.replica_up_ts = clock()
return
end
for _, replica in pairs(replicaset.priority_list) do
@@ -403,7 +404,7 @@ local function replicaset_template_multicallro(prefer_replica, balance)
net_status, err = pcall(box.error, box.error.TIMEOUT)
return nil, lerror.make(err)
end
- local end_time = fiber.time() + timeout
+ local end_time = clock() + timeout
while not net_status and timeout > 0 do
replica, err = pick_next_replica(replicaset)
if not replica then
@@ -412,7 +413,7 @@ local function replicaset_template_multicallro(prefer_replica, balance)
opts.timeout = timeout
net_status, storage_status, retval, err =
replica_call(replica, func, args, opts)
- timeout = end_time - fiber.time()
+ timeout = end_time - clock()
if not net_status and not storage_status and
not can_retry_after_error(retval) then
-- There is no sense to retry LuaJit errors, such as
@@ -680,7 +681,7 @@ local function buildall(sharding_cfg)
else
zone_weights = {}
end
- local curr_ts = fiber.time()
+ local curr_ts = clock()
for replicaset_uuid, replicaset in pairs(sharding_cfg.sharding) do
local new_replicaset = setmetatable({
replicas = {},
diff --git a/vshard/router/init.lua b/vshard/router/init.lua
index ba1f863..a530c29 100644
--- a/vshard/router/init.lua
+++ b/vshard/router/init.lua
@@ -1,6 +1,7 @@
local log = require('log')
local lfiber = require('fiber')
local table_new = require('table.new')
+local clock = lfiber.clock
local MODULE_INTERNALS = '__module_vshard_router'
-- Reload requirements, in case this module is reloaded manually.
@@ -527,7 +528,7 @@ local function router_call_impl(router, bucket_id, mode, prefer_replica,
end
local timeout = opts.timeout or consts.CALL_TIMEOUT_MIN
local replicaset, err
- local tend = lfiber.time() + timeout
+ local tend = clock() + timeout
if bucket_id > router.total_bucket_count or bucket_id <= 0 then
error('Bucket is unreachable: bucket id is out of range')
end
@@ -551,7 +552,7 @@ local function router_call_impl(router, bucket_id, mode, prefer_replica,
replicaset, err = bucket_resolve(router, bucket_id)
if replicaset then
::replicaset_is_found::
- opts.timeout = tend - lfiber.time()
+ opts.timeout = tend - clock()
local storage_call_status, call_status, call_error =
replicaset[call](replicaset, 'vshard.storage.call',
{bucket_id, mode, func, args}, opts)
@@ -583,7 +584,7 @@ local function router_call_impl(router, bucket_id, mode, prefer_replica,
-- if reconfiguration had been started,
-- and while is not executed on router,
-- but already is executed on storages.
- while lfiber.time() <= tend do
+ while clock() <= tend do
lfiber.sleep(0.05)
replicaset = router.replicasets[err.destination]
if replicaset then
@@ -598,7 +599,7 @@ local function router_call_impl(router, bucket_id, mode, prefer_replica,
-- case of broken cluster, when a bucket
-- is sent on two replicasets to each
-- other.
- if replicaset and lfiber.time() <= tend then
+ if replicaset and clock() <= tend then
goto replicaset_is_found
end
end
@@ -623,7 +624,7 @@ local function router_call_impl(router, bucket_id, mode, prefer_replica,
end
end
lfiber.yield()
- until lfiber.time() > tend
+ until clock() > tend
if err then
return nil, err
else
@@ -749,7 +750,7 @@ end
-- connections must be updated.
--
local function failover_collect_to_update(router)
- local ts = lfiber.time()
+ local ts = clock()
local uuid_to_update = {}
for uuid, rs in pairs(router.replicasets) do
if failover_need_down_priority(rs, ts) or
@@ -772,7 +773,7 @@ local function failover_step(router)
if #uuid_to_update == 0 then
return false
end
- local curr_ts = lfiber.time()
+ local curr_ts = clock()
local replica_is_changed = false
for _, uuid in pairs(uuid_to_update) do
local rs = router.replicasets[uuid]
@@ -1230,7 +1231,6 @@ local function router_sync(router, timeout)
timeout = router.sync_timeout
end
local arg = {timeout}
- local clock = lfiber.clock
local deadline = timeout and (clock() + timeout)
local opts = {timeout = timeout}
for rs_uuid, replicaset in pairs(router.replicasets) do
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index 1b48bf1..c7335fc 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -5,6 +5,7 @@ local netbox = require('net.box') -- for net.box:self()
local trigger = require('internal.trigger')
local ffi = require('ffi')
local yaml_encode = require('yaml').encode
+local clock = lfiber.clock
local MODULE_INTERNALS = '__module_vshard_storage'
-- Reload requirements, in case this module is reloaded manually.
@@ -695,7 +696,7 @@ local function sync(timeout)
log.debug("Synchronizing replicaset...")
timeout = timeout or M.sync_timeout
local vclock = box.info.vclock
- local tstart = lfiber.time()
+ local tstart = clock()
repeat
local done = true
for _, replica in ipairs(box.info.replication) do
@@ -711,7 +712,7 @@ local function sync(timeout)
return true
end
lfiber.sleep(0.001)
- until not (lfiber.time() <= tstart + timeout)
+ until not (clock() <= tstart + timeout)
log.warn("Timed out during synchronizing replicaset")
local ok, err = pcall(box.error, box.error.TIMEOUT)
return nil, lerror.make(err)
@@ -1280,10 +1281,9 @@ local function bucket_send_xc(bucket_id, destination, opts, exception_guard)
ref.rw_lock = true
exception_guard.ref = ref
exception_guard.drop_rw_lock = true
- local deadline = lfiber.clock() + (opts and opts.timeout or 10)
+ local deadline = clock() + (opts and opts.timeout or 10)
while ref.rw ~= 0 do
- if not M.bucket_rw_lock_is_ready_cond:wait(deadline -
- lfiber.clock()) then
+ if not M.bucket_rw_lock_is_ready_cond:wait(deadline - clock()) then
status, err = pcall(box.error, box.error.TIMEOUT)
return nil, lerror.make(err)
end
@@ -1579,7 +1579,7 @@ function gc_bucket_f()
-- specified time interval the buckets are deleted both from
-- this array and from _bucket space.
local buckets_for_redirect = {}
- local buckets_for_redirect_ts = lfiber.time()
+ local buckets_for_redirect_ts = clock()
-- Empty sent buckets, updated after each step, and when
-- buckets_for_redirect is deleted, it gets empty_sent_buckets
-- for next deletion.
@@ -1614,7 +1614,7 @@ function gc_bucket_f()
end
end
- if lfiber.time() - buckets_for_redirect_ts >=
+ if clock() - buckets_for_redirect_ts >=
consts.BUCKET_SENT_GARBAGE_DELAY then
status, err = gc_bucket_drop(buckets_for_redirect,
consts.BUCKET.SENT)
@@ -1629,7 +1629,7 @@ function gc_bucket_f()
else
buckets_for_redirect = empty_sent_buckets or {}
empty_sent_buckets = nil
- buckets_for_redirect_ts = lfiber.time()
+ buckets_for_redirect_ts = clock()
end
end
::continue::
--
2.24.3 (Apple Git-128)
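The deadline pattern this patch converts can be illustrated with a self-contained sketch. A monotonic counter stands in for fiber.clock so the snippet runs in plain Lua; under Tarantool the real fiber.clock and fiber.sleep would be used, and the names `retry_until` and `attempt` are illustrative, not from the patch.

```lua
-- Stand-ins for fiber.clock()/fiber.sleep() so the sketch runs in
-- plain Lua. fiber.clock() is monotonic, which is what makes the
-- deadline arithmetic below safe against system time corrections.
local now = 0
local function fiber_clock() return now end
local function sleep(dt) now = now + dt end

-- The retry-loop shape used in router_call_impl: compute an absolute
-- deadline once, then shrink the per-attempt timeout as time passes.
local function retry_until(timeout, attempt)
    local deadline = fiber_clock() + timeout
    local attempts = 0
    repeat
        attempts = attempts + 1
        if attempt(deadline - fiber_clock()) then
            return true, attempts
        end
        sleep(0.05)
    until fiber_clock() > deadline
    return false, attempts
end

local calls = 0
local ok, n = retry_until(0.5, function(remaining)
    calls = calls + 1
    return calls == 3
end)
assert(ok and n == 3)

-- With fiber.time(), a backward time correction could push the clock
-- below the deadline again and extend the loop arbitrarily; with a
-- monotonic clock the loop is bounded by the requested timeout.
```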
* Re: [Tarantool-patches] [PATCH 2/9] Use fiber.clock() instead of .time() everywhere
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 2/9] Use fiber.clock() instead of .time() everywhere Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-10 8:57 ` Oleg Babin via Tarantool-patches
2021-02-10 22:33 ` Vladislav Shpilevoy via Tarantool-patches
0 siblings, 1 reply; 36+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-02-10 8:57 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov
Thanks for your patch. LGTM except two nits:
- It seems you need to add "Closes #246".
- Tarantool has a "clock" module. I suggest using "fiber_clock()" instead
of plain "clock" to avoid possible confusion.
On 10/02/2021 02:46, Vladislav Shpilevoy wrote:
> fiber.time() returns wall-clock time. It is affected by time
> corrections in the system and can be non-monotonic.
>
> The patch makes everything in vshard use fiber.clock() instead of
> fiber.time(). Also, the fiber.clock function is saved as an upvalue
> in every module that uses it. This makes the code a bit shorter and
> saves one indexing of the 'fiber' table.
>
> The main reason is that the future map-reduce feature will use the
> current time quite often. In some places it will probably be the
> slowest action (given how slow FFI can be when not compiled by the
> JIT).
>
> Needed for #147
> ---
> test/failover/failover.result | 4 ++--
> test/failover/failover.test.lua | 4 ++--
> vshard/replicaset.lua | 13 +++++++------
> vshard/router/init.lua | 16 ++++++++--------
> vshard/storage/init.lua | 16 ++++++++--------
> 5 files changed, 27 insertions(+), 26 deletions(-)
>
> diff --git a/test/failover/failover.result b/test/failover/failover.result
> index 452694c..bae57fa 100644
> --- a/test/failover/failover.result
> +++ b/test/failover/failover.result
> @@ -261,13 +261,13 @@ test_run:cmd('start server box_1_d')
> ---
> - true
> ...
> -ts1 = fiber.time()
> +ts1 = fiber.clock()
> ---
> ...
> while rs1.replica.name ~= 'box_1_d' do fiber.sleep(0.1) end
> ---
> ...
> -ts2 = fiber.time()
> +ts2 = fiber.clock()
> ---
> ...
> ts2 - ts1 < vshard.consts.FAILOVER_UP_TIMEOUT
> diff --git a/test/failover/failover.test.lua b/test/failover/failover.test.lua
> index 13c517b..a969e0e 100644
> --- a/test/failover/failover.test.lua
> +++ b/test/failover/failover.test.lua
> @@ -109,9 +109,9 @@ test_run:switch('router_1')
> -- Revive the best replica. A router must reconnect to it in
> -- FAILOVER_UP_TIMEOUT seconds.
> test_run:cmd('start server box_1_d')
> -ts1 = fiber.time()
> +ts1 = fiber.clock()
> while rs1.replica.name ~= 'box_1_d' do fiber.sleep(0.1) end
> -ts2 = fiber.time()
> +ts2 = fiber.clock()
> ts2 - ts1 < vshard.consts.FAILOVER_UP_TIMEOUT
> test_run:grep_log('router_1', 'New replica box_1_d%(storage%@')
>
> diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
> index b13d05e..a74c0f8 100644
> --- a/vshard/replicaset.lua
> +++ b/vshard/replicaset.lua
> @@ -54,6 +54,7 @@ local luri = require('uri')
> local luuid = require('uuid')
> local ffi = require('ffi')
> local util = require('vshard.util')
> +local clock = fiber.clock
> local gsc = util.generate_self_checker
>
> --
> @@ -88,7 +89,7 @@ local function netbox_on_connect(conn)
> -- biggest priority. Really, it is not neccessary to
> -- increase replica connection priority, if the current
> -- one already has the biggest priority. (See failover_f).
> - rs.replica_up_ts = fiber.time()
> + rs.replica_up_ts = clock()
> end
> end
>
> @@ -100,7 +101,7 @@ local function netbox_on_disconnect(conn)
> assert(conn.replica)
> -- Replica is down - remember this time to decrease replica
> -- priority after FAILOVER_DOWN_TIMEOUT seconds.
> - conn.replica.down_ts = fiber.time()
> + conn.replica.down_ts = clock()
> end
>
> --
> @@ -174,7 +175,7 @@ local function replicaset_up_replica_priority(replicaset)
> local old_replica = replicaset.replica
> if old_replica == replicaset.priority_list[1] and
> old_replica:is_connected() then
> - replicaset.replica_up_ts = fiber.time()
> + replicaset.replica_up_ts = clock()
> return
> end
> for _, replica in pairs(replicaset.priority_list) do
> @@ -403,7 +404,7 @@ local function replicaset_template_multicallro(prefer_replica, balance)
> net_status, err = pcall(box.error, box.error.TIMEOUT)
> return nil, lerror.make(err)
> end
> - local end_time = fiber.time() + timeout
> + local end_time = clock() + timeout
> while not net_status and timeout > 0 do
> replica, err = pick_next_replica(replicaset)
> if not replica then
> @@ -412,7 +413,7 @@ local function replicaset_template_multicallro(prefer_replica, balance)
> opts.timeout = timeout
> net_status, storage_status, retval, err =
> replica_call(replica, func, args, opts)
> - timeout = end_time - fiber.time()
> + timeout = end_time - clock()
> if not net_status and not storage_status and
> not can_retry_after_error(retval) then
> -- There is no sense to retry LuaJit errors, such as
> @@ -680,7 +681,7 @@ local function buildall(sharding_cfg)
> else
> zone_weights = {}
> end
> - local curr_ts = fiber.time()
> + local curr_ts = clock()
> for replicaset_uuid, replicaset in pairs(sharding_cfg.sharding) do
> local new_replicaset = setmetatable({
> replicas = {},
> diff --git a/vshard/router/init.lua b/vshard/router/init.lua
> index ba1f863..a530c29 100644
> --- a/vshard/router/init.lua
> +++ b/vshard/router/init.lua
> @@ -1,6 +1,7 @@
> local log = require('log')
> local lfiber = require('fiber')
> local table_new = require('table.new')
> +local clock = lfiber.clock
>
> local MODULE_INTERNALS = '__module_vshard_router'
> -- Reload requirements, in case this module is reloaded manually.
> @@ -527,7 +528,7 @@ local function router_call_impl(router, bucket_id, mode, prefer_replica,
> end
> local timeout = opts.timeout or consts.CALL_TIMEOUT_MIN
> local replicaset, err
> - local tend = lfiber.time() + timeout
> + local tend = clock() + timeout
> if bucket_id > router.total_bucket_count or bucket_id <= 0 then
> error('Bucket is unreachable: bucket id is out of range')
> end
> @@ -551,7 +552,7 @@ local function router_call_impl(router, bucket_id, mode, prefer_replica,
> replicaset, err = bucket_resolve(router, bucket_id)
> if replicaset then
> ::replicaset_is_found::
> - opts.timeout = tend - lfiber.time()
> + opts.timeout = tend - clock()
> local storage_call_status, call_status, call_error =
> replicaset[call](replicaset, 'vshard.storage.call',
> {bucket_id, mode, func, args}, opts)
> @@ -583,7 +584,7 @@ local function router_call_impl(router, bucket_id, mode, prefer_replica,
> -- if reconfiguration had been started,
> -- and while is not executed on router,
> -- but already is executed on storages.
> - while lfiber.time() <= tend do
> + while clock() <= tend do
> lfiber.sleep(0.05)
> replicaset = router.replicasets[err.destination]
> if replicaset then
> @@ -598,7 +599,7 @@ local function router_call_impl(router, bucket_id, mode, prefer_replica,
> -- case of broken cluster, when a bucket
> -- is sent on two replicasets to each
> -- other.
> - if replicaset and lfiber.time() <= tend then
> + if replicaset and clock() <= tend then
> goto replicaset_is_found
> end
> end
> @@ -623,7 +624,7 @@ local function router_call_impl(router, bucket_id, mode, prefer_replica,
> end
> end
> lfiber.yield()
> - until lfiber.time() > tend
> + until clock() > tend
> if err then
> return nil, err
> else
> @@ -749,7 +750,7 @@ end
> -- connections must be updated.
> --
> local function failover_collect_to_update(router)
> - local ts = lfiber.time()
> + local ts = clock()
> local uuid_to_update = {}
> for uuid, rs in pairs(router.replicasets) do
> if failover_need_down_priority(rs, ts) or
> @@ -772,7 +773,7 @@ local function failover_step(router)
> if #uuid_to_update == 0 then
> return false
> end
> - local curr_ts = lfiber.time()
> + local curr_ts = clock()
> local replica_is_changed = false
> for _, uuid in pairs(uuid_to_update) do
> local rs = router.replicasets[uuid]
> @@ -1230,7 +1231,6 @@ local function router_sync(router, timeout)
> timeout = router.sync_timeout
> end
> local arg = {timeout}
> - local clock = lfiber.clock
> local deadline = timeout and (clock() + timeout)
> local opts = {timeout = timeout}
> for rs_uuid, replicaset in pairs(router.replicasets) do
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index 1b48bf1..c7335fc 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -5,6 +5,7 @@ local netbox = require('net.box') -- for net.box:self()
> local trigger = require('internal.trigger')
> local ffi = require('ffi')
> local yaml_encode = require('yaml').encode
> +local clock = lfiber.clock
>
> local MODULE_INTERNALS = '__module_vshard_storage'
> -- Reload requirements, in case this module is reloaded manually.
> @@ -695,7 +696,7 @@ local function sync(timeout)
> log.debug("Synchronizing replicaset...")
> timeout = timeout or M.sync_timeout
> local vclock = box.info.vclock
> - local tstart = lfiber.time()
> + local tstart = clock()
> repeat
> local done = true
> for _, replica in ipairs(box.info.replication) do
> @@ -711,7 +712,7 @@ local function sync(timeout)
> return true
> end
> lfiber.sleep(0.001)
> - until not (lfiber.time() <= tstart + timeout)
> + until not (clock() <= tstart + timeout)
> log.warn("Timed out during synchronizing replicaset")
> local ok, err = pcall(box.error, box.error.TIMEOUT)
> return nil, lerror.make(err)
> @@ -1280,10 +1281,9 @@ local function bucket_send_xc(bucket_id, destination, opts, exception_guard)
> ref.rw_lock = true
> exception_guard.ref = ref
> exception_guard.drop_rw_lock = true
> - local deadline = lfiber.clock() + (opts and opts.timeout or 10)
> + local deadline = clock() + (opts and opts.timeout or 10)
> while ref.rw ~= 0 do
> - if not M.bucket_rw_lock_is_ready_cond:wait(deadline -
> - lfiber.clock()) then
> + if not M.bucket_rw_lock_is_ready_cond:wait(deadline - clock()) then
> status, err = pcall(box.error, box.error.TIMEOUT)
> return nil, lerror.make(err)
> end
> @@ -1579,7 +1579,7 @@ function gc_bucket_f()
> -- specified time interval the buckets are deleted both from
> -- this array and from _bucket space.
> local buckets_for_redirect = {}
> - local buckets_for_redirect_ts = lfiber.time()
> + local buckets_for_redirect_ts = clock()
> -- Empty sent buckets, updated after each step, and when
> -- buckets_for_redirect is deleted, it gets empty_sent_buckets
> -- for next deletion.
> @@ -1614,7 +1614,7 @@ function gc_bucket_f()
> end
> end
>
> - if lfiber.time() - buckets_for_redirect_ts >=
> + if clock() - buckets_for_redirect_ts >=
> consts.BUCKET_SENT_GARBAGE_DELAY then
> status, err = gc_bucket_drop(buckets_for_redirect,
> consts.BUCKET.SENT)
> @@ -1629,7 +1629,7 @@ function gc_bucket_f()
> else
> buckets_for_redirect = empty_sent_buckets or {}
> empty_sent_buckets = nil
> - buckets_for_redirect_ts = lfiber.time()
> + buckets_for_redirect_ts = clock()
> end
> end
> ::continue::
* Re: [Tarantool-patches] [PATCH 2/9] Use fiber.clock() instead of .time() everywhere
2021-02-10 8:57 ` Oleg Babin via Tarantool-patches
@ 2021-02-10 22:33 ` Vladislav Shpilevoy via Tarantool-patches
0 siblings, 0 replies; 36+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-10 22:33 UTC (permalink / raw)
To: Oleg Babin, tarantool-patches, yaroslav.dynnikov
Hi! Thanks for the review!
On 10.02.2021 09:57, Oleg Babin via Tarantool-patches wrote:
> Thanks for your patch. LGTM except two nits:
>
> - It seems you need to add "Closes #246".
Indeed. I had a feeling that I saw this clock task somewhere.
> - Tarantool has a "clock" module. I suggest using "fiber_clock()" instead of plain "clock" to avoid possible confusion.
Both comments fixed. The new patch below. No diff because it
is big and obvious - a plain rename.
====================
Use fiber.clock() instead of .time() everywhere
fiber.time() returns wall-clock time. It is affected by time
corrections in the system and can be non-monotonic.
The patch makes everything in vshard use fiber.clock() instead of
fiber.time(). Also, the fiber.clock function is saved as an upvalue
in every module that uses it. This makes the code a bit shorter and
saves one indexing of the 'fiber' table.
The main reason is that the future map-reduce feature will use the
current time quite often. In some places it will probably be the
slowest action (given how slow FFI can be when not compiled by the
JIT).
Needed for #147
Closes #246
diff --git a/test/failover/failover.result b/test/failover/failover.result
index 452694c..bae57fa 100644
--- a/test/failover/failover.result
+++ b/test/failover/failover.result
@@ -261,13 +261,13 @@ test_run:cmd('start server box_1_d')
---
- true
...
-ts1 = fiber.time()
+ts1 = fiber.clock()
---
...
while rs1.replica.name ~= 'box_1_d' do fiber.sleep(0.1) end
---
...
-ts2 = fiber.time()
+ts2 = fiber.clock()
---
...
ts2 - ts1 < vshard.consts.FAILOVER_UP_TIMEOUT
diff --git a/test/failover/failover.test.lua b/test/failover/failover.test.lua
index 13c517b..a969e0e 100644
--- a/test/failover/failover.test.lua
+++ b/test/failover/failover.test.lua
@@ -109,9 +109,9 @@ test_run:switch('router_1')
-- Revive the best replica. A router must reconnect to it in
-- FAILOVER_UP_TIMEOUT seconds.
test_run:cmd('start server box_1_d')
-ts1 = fiber.time()
+ts1 = fiber.clock()
while rs1.replica.name ~= 'box_1_d' do fiber.sleep(0.1) end
-ts2 = fiber.time()
+ts2 = fiber.clock()
ts2 - ts1 < vshard.consts.FAILOVER_UP_TIMEOUT
test_run:grep_log('router_1', 'New replica box_1_d%(storage%@')
diff --git a/vshard/replicaset.lua b/vshard/replicaset.lua
index b13d05e..9c792b3 100644
--- a/vshard/replicaset.lua
+++ b/vshard/replicaset.lua
@@ -54,6 +54,7 @@ local luri = require('uri')
local luuid = require('uuid')
local ffi = require('ffi')
local util = require('vshard.util')
+local fiber_clock = fiber.clock
local gsc = util.generate_self_checker
--
@@ -88,7 +89,7 @@ local function netbox_on_connect(conn)
-- biggest priority. Really, it is not neccessary to
-- increase replica connection priority, if the current
-- one already has the biggest priority. (See failover_f).
- rs.replica_up_ts = fiber.time()
+ rs.replica_up_ts = fiber_clock()
end
end
@@ -100,7 +101,7 @@ local function netbox_on_disconnect(conn)
assert(conn.replica)
-- Replica is down - remember this time to decrease replica
-- priority after FAILOVER_DOWN_TIMEOUT seconds.
- conn.replica.down_ts = fiber.time()
+ conn.replica.down_ts = fiber_clock()
end
--
@@ -174,7 +175,7 @@ local function replicaset_up_replica_priority(replicaset)
local old_replica = replicaset.replica
if old_replica == replicaset.priority_list[1] and
old_replica:is_connected() then
- replicaset.replica_up_ts = fiber.time()
+ replicaset.replica_up_ts = fiber_clock()
return
end
for _, replica in pairs(replicaset.priority_list) do
@@ -403,7 +404,7 @@ local function replicaset_template_multicallro(prefer_replica, balance)
net_status, err = pcall(box.error, box.error.TIMEOUT)
return nil, lerror.make(err)
end
- local end_time = fiber.time() + timeout
+ local end_time = fiber_clock() + timeout
while not net_status and timeout > 0 do
replica, err = pick_next_replica(replicaset)
if not replica then
@@ -412,7 +413,7 @@ local function replicaset_template_multicallro(prefer_replica, balance)
opts.timeout = timeout
net_status, storage_status, retval, err =
replica_call(replica, func, args, opts)
- timeout = end_time - fiber.time()
+ timeout = end_time - fiber_clock()
if not net_status and not storage_status and
not can_retry_after_error(retval) then
-- There is no sense to retry LuaJit errors, such as
@@ -680,7 +681,7 @@ local function buildall(sharding_cfg)
else
zone_weights = {}
end
- local curr_ts = fiber.time()
+ local curr_ts = fiber_clock()
for replicaset_uuid, replicaset in pairs(sharding_cfg.sharding) do
local new_replicaset = setmetatable({
replicas = {},
diff --git a/vshard/router/init.lua b/vshard/router/init.lua
index ba1f863..eeb7515 100644
--- a/vshard/router/init.lua
+++ b/vshard/router/init.lua
@@ -1,6 +1,7 @@
local log = require('log')
local lfiber = require('fiber')
local table_new = require('table.new')
+local fiber_clock = lfiber.clock
local MODULE_INTERNALS = '__module_vshard_router'
-- Reload requirements, in case this module is reloaded manually.
@@ -527,7 +528,7 @@ local function router_call_impl(router, bucket_id, mode, prefer_replica,
end
local timeout = opts.timeout or consts.CALL_TIMEOUT_MIN
local replicaset, err
- local tend = lfiber.time() + timeout
+ local tend = fiber_clock() + timeout
if bucket_id > router.total_bucket_count or bucket_id <= 0 then
error('Bucket is unreachable: bucket id is out of range')
end
@@ -551,7 +552,7 @@ local function router_call_impl(router, bucket_id, mode, prefer_replica,
replicaset, err = bucket_resolve(router, bucket_id)
if replicaset then
::replicaset_is_found::
- opts.timeout = tend - lfiber.time()
+ opts.timeout = tend - fiber_clock()
local storage_call_status, call_status, call_error =
replicaset[call](replicaset, 'vshard.storage.call',
{bucket_id, mode, func, args}, opts)
@@ -583,7 +584,7 @@ local function router_call_impl(router, bucket_id, mode, prefer_replica,
-- if reconfiguration had been started,
-- and while is not executed on router,
-- but already is executed on storages.
- while lfiber.time() <= tend do
+ while fiber_clock() <= tend do
lfiber.sleep(0.05)
replicaset = router.replicasets[err.destination]
if replicaset then
@@ -598,7 +599,7 @@ local function router_call_impl(router, bucket_id, mode, prefer_replica,
-- case of broken cluster, when a bucket
-- is sent on two replicasets to each
-- other.
- if replicaset and lfiber.time() <= tend then
+ if replicaset and fiber_clock() <= tend then
goto replicaset_is_found
end
end
@@ -623,7 +624,7 @@ local function router_call_impl(router, bucket_id, mode, prefer_replica,
end
end
lfiber.yield()
- until lfiber.time() > tend
+ until fiber_clock() > tend
if err then
return nil, err
else
@@ -749,7 +750,7 @@ end
-- connections must be updated.
--
local function failover_collect_to_update(router)
- local ts = lfiber.time()
+ local ts = fiber_clock()
local uuid_to_update = {}
for uuid, rs in pairs(router.replicasets) do
if failover_need_down_priority(rs, ts) or
@@ -772,7 +773,7 @@ local function failover_step(router)
if #uuid_to_update == 0 then
return false
end
- local curr_ts = lfiber.time()
+ local curr_ts = fiber_clock()
local replica_is_changed = false
for _, uuid in pairs(uuid_to_update) do
local rs = router.replicasets[uuid]
@@ -1230,8 +1231,7 @@ local function router_sync(router, timeout)
timeout = router.sync_timeout
end
local arg = {timeout}
- local clock = lfiber.clock
- local deadline = timeout and (clock() + timeout)
+ local deadline = timeout and (fiber_clock() + timeout)
local opts = {timeout = timeout}
for rs_uuid, replicaset in pairs(router.replicasets) do
if timeout < 0 then
@@ -1244,7 +1244,7 @@ local function router_sync(router, timeout)
err.replicaset = rs_uuid
return nil, err
end
- timeout = deadline - clock()
+ timeout = deadline - fiber_clock()
arg[1] = timeout
opts.timeout = timeout
end
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index 1b48bf1..38cdf19 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -5,6 +5,7 @@ local netbox = require('net.box') -- for net.box:self()
local trigger = require('internal.trigger')
local ffi = require('ffi')
local yaml_encode = require('yaml').encode
+local fiber_clock = lfiber.clock
local MODULE_INTERNALS = '__module_vshard_storage'
-- Reload requirements, in case this module is reloaded manually.
@@ -695,7 +696,7 @@ local function sync(timeout)
log.debug("Synchronizing replicaset...")
timeout = timeout or M.sync_timeout
local vclock = box.info.vclock
- local tstart = lfiber.time()
+ local tstart = fiber_clock()
repeat
local done = true
for _, replica in ipairs(box.info.replication) do
@@ -711,7 +712,7 @@ local function sync(timeout)
return true
end
lfiber.sleep(0.001)
- until not (lfiber.time() <= tstart + timeout)
+ until fiber_clock() > tstart + timeout
log.warn("Timed out during synchronizing replicaset")
local ok, err = pcall(box.error, box.error.TIMEOUT)
return nil, lerror.make(err)
@@ -1280,10 +1281,11 @@ local function bucket_send_xc(bucket_id, destination, opts, exception_guard)
ref.rw_lock = true
exception_guard.ref = ref
exception_guard.drop_rw_lock = true
- local deadline = lfiber.clock() + (opts and opts.timeout or 10)
+ local timeout = opts and opts.timeout or 10
+ local deadline = fiber_clock() + timeout
while ref.rw ~= 0 do
- if not M.bucket_rw_lock_is_ready_cond:wait(deadline -
- lfiber.clock()) then
+ timeout = deadline - fiber_clock()
+ if not M.bucket_rw_lock_is_ready_cond:wait(timeout) then
status, err = pcall(box.error, box.error.TIMEOUT)
return nil, lerror.make(err)
end
@@ -1579,7 +1581,7 @@ function gc_bucket_f()
-- specified time interval the buckets are deleted both from
-- this array and from _bucket space.
local buckets_for_redirect = {}
- local buckets_for_redirect_ts = lfiber.time()
+ local buckets_for_redirect_ts = fiber_clock()
-- Empty sent buckets, updated after each step, and when
-- buckets_for_redirect is deleted, it gets empty_sent_buckets
-- for next deletion.
@@ -1614,7 +1616,7 @@ function gc_bucket_f()
end
end
- if lfiber.time() - buckets_for_redirect_ts >=
+ if fiber_clock() - buckets_for_redirect_ts >=
consts.BUCKET_SENT_GARBAGE_DELAY then
status, err = gc_bucket_drop(buckets_for_redirect,
consts.BUCKET.SENT)
@@ -1629,7 +1631,7 @@ function gc_bucket_f()
else
buckets_for_redirect = empty_sent_buckets or {}
empty_sent_buckets = nil
- buckets_for_redirect_ts = lfiber.time()
+ buckets_for_redirect_ts = fiber_clock()
end
end
::continue::
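The hunks above all follow the same deadline pattern: fiber.clock() is monotonic, while fiber.time() tracks the system clock and can jump on time adjustments, which makes it unsuitable for timeouts. A minimal sketch of the pattern (done(), step(), and the error value are placeholders, not vshard APIs):

```lua
local fiber_clock = require('fiber').clock

-- Compute a monotonic deadline once, then derive the remaining
-- timeout from it on every retry.
local function call_with_timeout(timeout)
    local deadline = fiber_clock() + timeout
    while not done() do            -- done() is a placeholder
        timeout = deadline - fiber_clock()
        if timeout <= 0 then
            return nil, 'timeout'  -- placeholder error handling
        end
        step(timeout)              -- step() is a placeholder
    end
    return true
end
```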
* [Tarantool-patches] [PATCH 3/9] test: introduce a helper to wait for bucket GC
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-09 23:46 UTC (permalink / raw)
To: tarantool-patches, olegrok, yaroslav.dynnikov
In the tests, waiting for bucket deletion by GC required a long
loop expression which checks the _bucket space and wakes up the GC
fiber if the bucket is not deleted yet.
Soon the GC wakeup won't be necessary, as the GC algorithm will
become reactive instead of proactive.
In order not to remove the wakeup from all these places in the
main patch, and to simplify the waiting, this patch introduces a
function wait_bucket_is_collected().
The reactive GC will drop the GC wakeup from this function, and
all the tests will still pass in time.
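In code, the simplification looks roughly like this (a sketch distilled from the hunks below; `id` stands for any bucket id):

```lua
-- Before: every test open-coded the polling loop and had to keep
-- nudging the background fibers.
while box.space._bucket:get{id} do
    vshard.storage.garbage_collector_wakeup()
    fiber.sleep(0.01)
end

-- After: one helper hides the polling and the wakeups, so the wakeup
-- calls can later be removed in a single place.
wait_bucket_is_collected(id)
```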
---
test/lua_libs/storage_template.lua | 10 ++++++++++
test/rebalancer/bucket_ref.result | 7 ++-----
test/rebalancer/bucket_ref.test.lua | 5 ++---
test/rebalancer/errinj.result | 13 +++++--------
test/rebalancer/errinj.test.lua | 7 +++----
test/rebalancer/rebalancer.result | 5 +----
test/rebalancer/rebalancer.test.lua | 3 +--
test/rebalancer/receiving_bucket.result | 2 +-
test/rebalancer/receiving_bucket.test.lua | 2 +-
test/reload_evolution/storage.result | 5 +----
test/reload_evolution/storage.test.lua | 3 +--
11 files changed, 28 insertions(+), 34 deletions(-)
diff --git a/test/lua_libs/storage_template.lua b/test/lua_libs/storage_template.lua
index 84e4180..21409bd 100644
--- a/test/lua_libs/storage_template.lua
+++ b/test/lua_libs/storage_template.lua
@@ -165,3 +165,13 @@ function wait_rebalancer_state(state, test_run)
vshard.storage.rebalancer_wakeup()
end
end
+
+function wait_bucket_is_collected(id)
+ test_run:wait_cond(function()
+ if not box.space._bucket:get{id} then
+ return true
+ end
+ vshard.storage.recovery_wakeup()
+ vshard.storage.garbage_collector_wakeup()
+ end)
+end
diff --git a/test/rebalancer/bucket_ref.result b/test/rebalancer/bucket_ref.result
index b66e449..b8fc7ff 100644
--- a/test/rebalancer/bucket_ref.result
+++ b/test/rebalancer/bucket_ref.result
@@ -243,7 +243,7 @@ vshard.storage.buckets_info(1)
destination: <replicaset_2>
id: 1
...
-while box.space._bucket:get{1} do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.01) end
+wait_bucket_is_collected(1)
---
...
_ = test_run:switch('box_2_a')
@@ -292,10 +292,7 @@ vshard.storage.buckets_info(1)
finish_refs = true
---
...
-while vshard.storage.buckets_info(1)[1].rw_lock do fiber.sleep(0.01) end
----
-...
-while box.space._bucket:get{1} do fiber.sleep(0.01) end
+wait_bucket_is_collected(1)
---
...
_ = test_run:switch('box_1_a')
diff --git a/test/rebalancer/bucket_ref.test.lua b/test/rebalancer/bucket_ref.test.lua
index 49ba583..213ced3 100644
--- a/test/rebalancer/bucket_ref.test.lua
+++ b/test/rebalancer/bucket_ref.test.lua
@@ -73,7 +73,7 @@ vshard.storage.bucket_refro(1)
finish_refs = true
while f1:status() ~= 'dead' do fiber.sleep(0.01) end
vshard.storage.buckets_info(1)
-while box.space._bucket:get{1} do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.01) end
+wait_bucket_is_collected(1)
_ = test_run:switch('box_2_a')
vshard.storage.buckets_info(1)
vshard.storage.internal.errinj.ERRINJ_LONG_RECEIVE = false
@@ -89,8 +89,7 @@ while not vshard.storage.buckets_info(1)[1].rw_lock do fiber.sleep(0.01) end
fiber.sleep(0.2)
vshard.storage.buckets_info(1)
finish_refs = true
-while vshard.storage.buckets_info(1)[1].rw_lock do fiber.sleep(0.01) end
-while box.space._bucket:get{1} do fiber.sleep(0.01) end
+wait_bucket_is_collected(1)
_ = test_run:switch('box_1_a')
vshard.storage.buckets_info(1)
diff --git a/test/rebalancer/errinj.result b/test/rebalancer/errinj.result
index 214e7d8..e50eb72 100644
--- a/test/rebalancer/errinj.result
+++ b/test/rebalancer/errinj.result
@@ -237,7 +237,10 @@ _bucket:get{36}
-- Buckets became 'active' on box_2_a, but still are sending on
-- box_1_a. Wait until it is marked as garbage on box_1_a by the
-- recovery fiber.
-while _bucket:get{35} ~= nil or _bucket:get{36} ~= nil do vshard.storage.recovery_wakeup() fiber.sleep(0.001) end
+wait_bucket_is_collected(35)
+---
+...
+wait_bucket_is_collected(36)
---
...
_ = test_run:switch('box_2_a')
@@ -278,7 +281,7 @@ while not _bucket:get{36} do fiber.sleep(0.0001) end
_ = test_run:switch('box_1_a')
---
...
-while _bucket:get{36} do vshard.storage.recovery_wakeup() vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+wait_bucket_is_collected(36)
---
...
_bucket:get{36}
@@ -295,12 +298,6 @@ box.error.injection.set('ERRINJ_WAL_DELAY', false)
---
- ok
...
-_ = test_run:switch('box_1_a')
----
-...
-while _bucket:get{36} and _bucket:get{36}.status == vshard.consts.BUCKET.ACTIVE do fiber.sleep(0.001) end
----
-...
test_run:switch('default')
---
- true
diff --git a/test/rebalancer/errinj.test.lua b/test/rebalancer/errinj.test.lua
index 66fbe5e..2cc4a69 100644
--- a/test/rebalancer/errinj.test.lua
+++ b/test/rebalancer/errinj.test.lua
@@ -107,7 +107,8 @@ _bucket:get{36}
-- Buckets became 'active' on box_2_a, but still are sending on
-- box_1_a. Wait until it is marked as garbage on box_1_a by the
-- recovery fiber.
-while _bucket:get{35} ~= nil or _bucket:get{36} ~= nil do vshard.storage.recovery_wakeup() fiber.sleep(0.001) end
+wait_bucket_is_collected(35)
+wait_bucket_is_collected(36)
_ = test_run:switch('box_2_a')
_bucket:get{35}
_bucket:get{36}
@@ -124,13 +125,11 @@ f1 = fiber.create(function() ret1, err1 = vshard.storage.bucket_send(36, util.re
_ = test_run:switch('box_2_a')
while not _bucket:get{36} do fiber.sleep(0.0001) end
_ = test_run:switch('box_1_a')
-while _bucket:get{36} do vshard.storage.recovery_wakeup() vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+wait_bucket_is_collected(36)
_bucket:get{36}
_ = test_run:switch('box_2_a')
_bucket:get{36}
box.error.injection.set('ERRINJ_WAL_DELAY', false)
-_ = test_run:switch('box_1_a')
-while _bucket:get{36} and _bucket:get{36}.status == vshard.consts.BUCKET.ACTIVE do fiber.sleep(0.001) end
test_run:switch('default')
test_run:drop_cluster(REPLICASET_2)
diff --git a/test/rebalancer/rebalancer.result b/test/rebalancer/rebalancer.result
index 3607e93..098b845 100644
--- a/test/rebalancer/rebalancer.result
+++ b/test/rebalancer/rebalancer.result
@@ -334,10 +334,7 @@ vshard.storage.rebalancer_wakeup()
-- Now rebalancer makes a bucket SENT. After it the garbage
-- collector cleans it and deletes after a timeout.
--
-while _bucket:get{91}.status ~= vshard.consts.BUCKET.SENT do fiber.sleep(0.01) end
----
-...
-while _bucket:get{91} ~= nil do fiber.sleep(0.1) end
+wait_bucket_is_collected(91)
---
...
wait_rebalancer_state("The cluster is balanced ok", test_run)
diff --git a/test/rebalancer/rebalancer.test.lua b/test/rebalancer/rebalancer.test.lua
index 63e690f..308e66d 100644
--- a/test/rebalancer/rebalancer.test.lua
+++ b/test/rebalancer/rebalancer.test.lua
@@ -162,8 +162,7 @@ vshard.storage.rebalancer_wakeup()
-- Now rebalancer makes a bucket SENT. After it the garbage
-- collector cleans it and deletes after a timeout.
--
-while _bucket:get{91}.status ~= vshard.consts.BUCKET.SENT do fiber.sleep(0.01) end
-while _bucket:get{91} ~= nil do fiber.sleep(0.1) end
+wait_bucket_is_collected(91)
wait_rebalancer_state("The cluster is balanced ok", test_run)
_bucket.index.status:count({vshard.consts.BUCKET.ACTIVE})
_bucket.index.status:min({vshard.consts.BUCKET.ACTIVE})
diff --git a/test/rebalancer/receiving_bucket.result b/test/rebalancer/receiving_bucket.result
index db6a67f..7d3612b 100644
--- a/test/rebalancer/receiving_bucket.result
+++ b/test/rebalancer/receiving_bucket.result
@@ -374,7 +374,7 @@ vshard.storage.buckets_info(1)
destination: <replicaset_1>
id: 1
...
-while box.space._bucket:get{1} do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.01) end
+wait_bucket_is_collected(1)
---
...
vshard.storage.buckets_info(1)
diff --git a/test/rebalancer/receiving_bucket.test.lua b/test/rebalancer/receiving_bucket.test.lua
index 1819cbb..24534b3 100644
--- a/test/rebalancer/receiving_bucket.test.lua
+++ b/test/rebalancer/receiving_bucket.test.lua
@@ -137,7 +137,7 @@ box.space.test3:select{100}
_ = test_run:switch('box_2_a')
vshard.storage.bucket_send(1, util.replicasets[1], {timeout = 0.3})
vshard.storage.buckets_info(1)
-while box.space._bucket:get{1} do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.01) end
+wait_bucket_is_collected(1)
vshard.storage.buckets_info(1)
_ = test_run:switch('box_1_a')
box.space._bucket:get{1}
diff --git a/test/reload_evolution/storage.result b/test/reload_evolution/storage.result
index 4652c4f..753687f 100644
--- a/test/reload_evolution/storage.result
+++ b/test/reload_evolution/storage.result
@@ -129,10 +129,7 @@ vshard.storage.bucket_send(bucket_id_to_move, util.replicasets[1])
---
- true
...
-vshard.storage.garbage_collector_wakeup()
----
-...
-while box.space._bucket:get({bucket_id_to_move}) do fiber.sleep(0.01) end
+wait_bucket_is_collected(bucket_id_to_move)
---
...
test_run:switch('storage_1_a')
diff --git a/test/reload_evolution/storage.test.lua b/test/reload_evolution/storage.test.lua
index 06f7117..639553e 100644
--- a/test/reload_evolution/storage.test.lua
+++ b/test/reload_evolution/storage.test.lua
@@ -51,8 +51,7 @@ vshard.storage.bucket_force_create(2000)
vshard.storage.buckets_info()[2000]
vshard.storage.call(bucket_id_to_move, 'read', 'do_select', {42})
vshard.storage.bucket_send(bucket_id_to_move, util.replicasets[1])
-vshard.storage.garbage_collector_wakeup()
-while box.space._bucket:get({bucket_id_to_move}) do fiber.sleep(0.01) end
+wait_bucket_is_collected(bucket_id_to_move)
test_run:switch('storage_1_a')
while box.space._bucket:get{bucket_id_to_move}.status ~= vshard.consts.BUCKET.ACTIVE do vshard.storage.recovery_wakeup() fiber.sleep(0.01) end
vshard.storage.bucket_send(bucket_id_to_move, util.replicasets[2])
--
2.24.3 (Apple Git-128)
* Re: [Tarantool-patches] [PATCH 3/9] test: introduce a helper to wait for bucket GC
From: Oleg Babin via Tarantool-patches @ 2021-02-10 8:57 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov
Hi! Thanks for your patch! LGTM but I have one question.
Maybe it's reasonable to add some timeout in this function?
AFAIK test-run terminates tests after 120 seconds of inactivity; that seems
too long for such a simple case.
But anyway it's up to you.
On 10/02/2021 02:46, Vladislav Shpilevoy wrote:
> In the tests to wait for bucket deletion by GC it was necessary
> to have a long loop expression which checks _bucket space and
> wakes up GC fiber if the bucket is not deleted yet.
>
> Soon the GC wakeup won't be necessary as GC algorithm will become
> reactive instead of proactive.
>
> In order not to remove the wakeup from all places in the main
> patch, and to simplify the waiting the patch introduces a function
> wait_bucket_is_collected().
>
> The reactive GC will delete GC wakeup from this function and all
> the tests still will pass in time.
> ---
> test/lua_libs/storage_template.lua | 10 ++++++++++
> test/rebalancer/bucket_ref.result | 7 ++-----
> test/rebalancer/bucket_ref.test.lua | 5 ++---
> test/rebalancer/errinj.result | 13 +++++--------
> test/rebalancer/errinj.test.lua | 7 +++----
> test/rebalancer/rebalancer.result | 5 +----
> test/rebalancer/rebalancer.test.lua | 3 +--
> test/rebalancer/receiving_bucket.result | 2 +-
> test/rebalancer/receiving_bucket.test.lua | 2 +-
> test/reload_evolution/storage.result | 5 +----
> test/reload_evolution/storage.test.lua | 3 +--
> 11 files changed, 28 insertions(+), 34 deletions(-)
>
> diff --git a/test/lua_libs/storage_template.lua b/test/lua_libs/storage_template.lua
> index 84e4180..21409bd 100644
> --- a/test/lua_libs/storage_template.lua
> +++ b/test/lua_libs/storage_template.lua
> @@ -165,3 +165,13 @@ function wait_rebalancer_state(state, test_run)
> vshard.storage.rebalancer_wakeup()
> end
> end
> +
> +function wait_bucket_is_collected(id)
> + test_run:wait_cond(function()
> + if not box.space._bucket:get{id} then
> + return true
> + end
> + vshard.storage.recovery_wakeup()
> + vshard.storage.garbage_collector_wakeup()
> + end)
> +end
> diff --git a/test/rebalancer/bucket_ref.result b/test/rebalancer/bucket_ref.result
> index b66e449..b8fc7ff 100644
> --- a/test/rebalancer/bucket_ref.result
> +++ b/test/rebalancer/bucket_ref.result
> @@ -243,7 +243,7 @@ vshard.storage.buckets_info(1)
> destination: <replicaset_2>
> id: 1
> ...
> -while box.space._bucket:get{1} do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.01) end
> +wait_bucket_is_collected(1)
> ---
> ...
> _ = test_run:switch('box_2_a')
> @@ -292,10 +292,7 @@ vshard.storage.buckets_info(1)
> finish_refs = true
> ---
> ...
> -while vshard.storage.buckets_info(1)[1].rw_lock do fiber.sleep(0.01) end
> ----
> -...
> -while box.space._bucket:get{1} do fiber.sleep(0.01) end
> +wait_bucket_is_collected(1)
> ---
> ...
> _ = test_run:switch('box_1_a')
> diff --git a/test/rebalancer/bucket_ref.test.lua b/test/rebalancer/bucket_ref.test.lua
> index 49ba583..213ced3 100644
> --- a/test/rebalancer/bucket_ref.test.lua
> +++ b/test/rebalancer/bucket_ref.test.lua
> @@ -73,7 +73,7 @@ vshard.storage.bucket_refro(1)
> finish_refs = true
> while f1:status() ~= 'dead' do fiber.sleep(0.01) end
> vshard.storage.buckets_info(1)
> -while box.space._bucket:get{1} do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.01) end
> +wait_bucket_is_collected(1)
> _ = test_run:switch('box_2_a')
> vshard.storage.buckets_info(1)
> vshard.storage.internal.errinj.ERRINJ_LONG_RECEIVE = false
> @@ -89,8 +89,7 @@ while not vshard.storage.buckets_info(1)[1].rw_lock do fiber.sleep(0.01) end
> fiber.sleep(0.2)
> vshard.storage.buckets_info(1)
> finish_refs = true
> -while vshard.storage.buckets_info(1)[1].rw_lock do fiber.sleep(0.01) end
> -while box.space._bucket:get{1} do fiber.sleep(0.01) end
> +wait_bucket_is_collected(1)
> _ = test_run:switch('box_1_a')
> vshard.storage.buckets_info(1)
>
> diff --git a/test/rebalancer/errinj.result b/test/rebalancer/errinj.result
> index 214e7d8..e50eb72 100644
> --- a/test/rebalancer/errinj.result
> +++ b/test/rebalancer/errinj.result
> @@ -237,7 +237,10 @@ _bucket:get{36}
> -- Buckets became 'active' on box_2_a, but still are sending on
> -- box_1_a. Wait until it is marked as garbage on box_1_a by the
> -- recovery fiber.
> -while _bucket:get{35} ~= nil or _bucket:get{36} ~= nil do vshard.storage.recovery_wakeup() fiber.sleep(0.001) end
> +wait_bucket_is_collected(35)
> +---
> +...
> +wait_bucket_is_collected(36)
> ---
> ...
> _ = test_run:switch('box_2_a')
> @@ -278,7 +281,7 @@ while not _bucket:get{36} do fiber.sleep(0.0001) end
> _ = test_run:switch('box_1_a')
> ---
> ...
> -while _bucket:get{36} do vshard.storage.recovery_wakeup() vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +wait_bucket_is_collected(36)
> ---
> ...
> _bucket:get{36}
> @@ -295,12 +298,6 @@ box.error.injection.set('ERRINJ_WAL_DELAY', false)
> ---
> - ok
> ...
> -_ = test_run:switch('box_1_a')
> ----
> -...
> -while _bucket:get{36} and _bucket:get{36}.status == vshard.consts.BUCKET.ACTIVE do fiber.sleep(0.001) end
> ----
> -...
> test_run:switch('default')
> ---
> - true
> diff --git a/test/rebalancer/errinj.test.lua b/test/rebalancer/errinj.test.lua
> index 66fbe5e..2cc4a69 100644
> --- a/test/rebalancer/errinj.test.lua
> +++ b/test/rebalancer/errinj.test.lua
> @@ -107,7 +107,8 @@ _bucket:get{36}
> -- Buckets became 'active' on box_2_a, but still are sending on
> -- box_1_a. Wait until it is marked as garbage on box_1_a by the
> -- recovery fiber.
> -while _bucket:get{35} ~= nil or _bucket:get{36} ~= nil do vshard.storage.recovery_wakeup() fiber.sleep(0.001) end
> +wait_bucket_is_collected(35)
> +wait_bucket_is_collected(36)
> _ = test_run:switch('box_2_a')
> _bucket:get{35}
> _bucket:get{36}
> @@ -124,13 +125,11 @@ f1 = fiber.create(function() ret1, err1 = vshard.storage.bucket_send(36, util.re
> _ = test_run:switch('box_2_a')
> while not _bucket:get{36} do fiber.sleep(0.0001) end
> _ = test_run:switch('box_1_a')
> -while _bucket:get{36} do vshard.storage.recovery_wakeup() vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +wait_bucket_is_collected(36)
> _bucket:get{36}
> _ = test_run:switch('box_2_a')
> _bucket:get{36}
> box.error.injection.set('ERRINJ_WAL_DELAY', false)
> -_ = test_run:switch('box_1_a')
> -while _bucket:get{36} and _bucket:get{36}.status == vshard.consts.BUCKET.ACTIVE do fiber.sleep(0.001) end
>
> test_run:switch('default')
> test_run:drop_cluster(REPLICASET_2)
> diff --git a/test/rebalancer/rebalancer.result b/test/rebalancer/rebalancer.result
> index 3607e93..098b845 100644
> --- a/test/rebalancer/rebalancer.result
> +++ b/test/rebalancer/rebalancer.result
> @@ -334,10 +334,7 @@ vshard.storage.rebalancer_wakeup()
> -- Now rebalancer makes a bucket SENT. After it the garbage
> -- collector cleans it and deletes after a timeout.
> --
> -while _bucket:get{91}.status ~= vshard.consts.BUCKET.SENT do fiber.sleep(0.01) end
> ----
> -...
> -while _bucket:get{91} ~= nil do fiber.sleep(0.1) end
> +wait_bucket_is_collected(91)
> ---
> ...
> wait_rebalancer_state("The cluster is balanced ok", test_run)
> diff --git a/test/rebalancer/rebalancer.test.lua b/test/rebalancer/rebalancer.test.lua
> index 63e690f..308e66d 100644
> --- a/test/rebalancer/rebalancer.test.lua
> +++ b/test/rebalancer/rebalancer.test.lua
> @@ -162,8 +162,7 @@ vshard.storage.rebalancer_wakeup()
> -- Now rebalancer makes a bucket SENT. After it the garbage
> -- collector cleans it and deletes after a timeout.
> --
> -while _bucket:get{91}.status ~= vshard.consts.BUCKET.SENT do fiber.sleep(0.01) end
> -while _bucket:get{91} ~= nil do fiber.sleep(0.1) end
> +wait_bucket_is_collected(91)
> wait_rebalancer_state("The cluster is balanced ok", test_run)
> _bucket.index.status:count({vshard.consts.BUCKET.ACTIVE})
> _bucket.index.status:min({vshard.consts.BUCKET.ACTIVE})
> diff --git a/test/rebalancer/receiving_bucket.result b/test/rebalancer/receiving_bucket.result
> index db6a67f..7d3612b 100644
> --- a/test/rebalancer/receiving_bucket.result
> +++ b/test/rebalancer/receiving_bucket.result
> @@ -374,7 +374,7 @@ vshard.storage.buckets_info(1)
> destination: <replicaset_1>
> id: 1
> ...
> -while box.space._bucket:get{1} do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.01) end
> +wait_bucket_is_collected(1)
> ---
> ...
> vshard.storage.buckets_info(1)
> diff --git a/test/rebalancer/receiving_bucket.test.lua b/test/rebalancer/receiving_bucket.test.lua
> index 1819cbb..24534b3 100644
> --- a/test/rebalancer/receiving_bucket.test.lua
> +++ b/test/rebalancer/receiving_bucket.test.lua
> @@ -137,7 +137,7 @@ box.space.test3:select{100}
> _ = test_run:switch('box_2_a')
> vshard.storage.bucket_send(1, util.replicasets[1], {timeout = 0.3})
> vshard.storage.buckets_info(1)
> -while box.space._bucket:get{1} do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.01) end
> +wait_bucket_is_collected(1)
> vshard.storage.buckets_info(1)
> _ = test_run:switch('box_1_a')
> box.space._bucket:get{1}
> diff --git a/test/reload_evolution/storage.result b/test/reload_evolution/storage.result
> index 4652c4f..753687f 100644
> --- a/test/reload_evolution/storage.result
> +++ b/test/reload_evolution/storage.result
> @@ -129,10 +129,7 @@ vshard.storage.bucket_send(bucket_id_to_move, util.replicasets[1])
> ---
> - true
> ...
> -vshard.storage.garbage_collector_wakeup()
> ----
> -...
> -while box.space._bucket:get({bucket_id_to_move}) do fiber.sleep(0.01) end
> +wait_bucket_is_collected(bucket_id_to_move)
> ---
> ...
> test_run:switch('storage_1_a')
> diff --git a/test/reload_evolution/storage.test.lua b/test/reload_evolution/storage.test.lua
> index 06f7117..639553e 100644
> --- a/test/reload_evolution/storage.test.lua
> +++ b/test/reload_evolution/storage.test.lua
> @@ -51,8 +51,7 @@ vshard.storage.bucket_force_create(2000)
> vshard.storage.buckets_info()[2000]
> vshard.storage.call(bucket_id_to_move, 'read', 'do_select', {42})
> vshard.storage.bucket_send(bucket_id_to_move, util.replicasets[1])
> -vshard.storage.garbage_collector_wakeup()
> -while box.space._bucket:get({bucket_id_to_move}) do fiber.sleep(0.01) end
> +wait_bucket_is_collected(bucket_id_to_move)
> test_run:switch('storage_1_a')
> while box.space._bucket:get{bucket_id_to_move}.status ~= vshard.consts.BUCKET.ACTIVE do vshard.storage.recovery_wakeup() fiber.sleep(0.01) end
> vshard.storage.bucket_send(bucket_id_to_move, util.replicasets[2])
* Re: [Tarantool-patches] [PATCH 3/9] test: introduce a helper to wait for bucket GC
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-10 22:33 UTC (permalink / raw)
To: Oleg Babin, tarantool-patches, yaroslav.dynnikov
Thanks for the review!
On 10.02.2021 09:57, Oleg Babin wrote:
> Hi! Thanks for your patch! LGTM but I have one question.
>
> Maybe it's reasonable to add some timeout in this function?
>
> AFAIK test-run terminates tests after 120 seconds of inactivity it seems too long for such simple case.
>
> But anyway it's up to you.
test_run:wait_cond() has a default timeout of 1 minute. I decided it
is fine.
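For reference, a shorter limit could be set explicitly, since test-run's wait_cond() accepts a timeout as its second argument (a hypothetical variant, not part of the patch):

```lua
function wait_bucket_is_collected(id)
    -- Same helper as in the patch, but with an explicit 10-second
    -- limit instead of wait_cond()'s default one-minute timeout.
    test_run:wait_cond(function()
        if not box.space._bucket:get{id} then
            return true
        end
        vshard.storage.recovery_wakeup()
        vshard.storage.garbage_collector_wakeup()
    end, 10)
end
```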
* Re: [Tarantool-patches] [PATCH 3/9] test: introduce a helper to wait for bucket GC
From: Oleg Babin via Tarantool-patches @ 2021-02-11 6:50 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov
Thanks for your answer. Yes, it's fine. LGTM.
On 11/02/2021 01:33, Vladislav Shpilevoy wrote:
> Thanks for the review!
>
> On 10.02.2021 09:57, Oleg Babin wrote:
>> Hi! Thanks for your patch! LGTM but I have one question.
>>
>> Maybe it's reasonable to add some timeout in this function?
>>
>> AFAIK test-run terminates tests after 120 seconds of inactivity it seems too long for such simple case.
>>
>> But anyway it's up to you.
> test_run:wait_cond() has default timeout 1 minute. I decided it
> is fine.
* [Tarantool-patches] [PATCH 4/9] storage: bucket_recv() should check rs lock
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-09 23:46 UTC (permalink / raw)
To: tarantool-patches, olegrok, yaroslav.dynnikov
A replicaset locked via config should not allow any bucket moves
from or to it.
But the lock check was only done by bucket_send(). bucket_recv()
allowed receiving a bucket even if the replicaset was locked. The
patch fixes it.
It didn't affect automatic bucket sends, because the lock is
taken into account by the rebalancer from the config. Only manual
bucket moves could have this bug.
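From the caller's perspective the fix behaves like this (a sketch; `destination_uuid` is a placeholder, and the error fields follow the test output below):

```lua
-- Executed on a storage of a non-locked replicaset, sending a bucket
-- to a replicaset that has `lock = true` in its config. Before the
-- patch the receiver accepted the bucket; now it refuses.
local ok, err = vshard.storage.bucket_send(101, destination_uuid)
assert(ok == nil)
assert(err.name == 'REPLICASET_IS_LOCKED')
```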
---
test/rebalancer/rebalancer_lock_and_pin.result | 14 ++++++++++++++
test/rebalancer/rebalancer_lock_and_pin.test.lua | 4 ++++
vshard/storage/init.lua | 3 +++
3 files changed, 21 insertions(+)
diff --git a/test/rebalancer/rebalancer_lock_and_pin.result b/test/rebalancer/rebalancer_lock_and_pin.result
index 51dd36e..0bb4f45 100644
--- a/test/rebalancer/rebalancer_lock_and_pin.result
+++ b/test/rebalancer/rebalancer_lock_and_pin.result
@@ -156,6 +156,20 @@ vshard.storage.bucket_send(1, util.replicasets[2])
message: Replicaset is locked
code: 19
...
+test_run:switch('box_2_a')
+---
+- true
+...
+-- Does not allow to receive either. Send from a non-locked replicaset to a
+-- locked one fails.
+vshard.storage.bucket_send(101, util.replicasets[1])
+---
+- null
+- type: ShardingError
+ code: 19
+ name: REPLICASET_IS_LOCKED
+ message: Replicaset is locked
+...
--
-- Vshard ensures that if a replicaset is locked, then it will not
-- allow to change its bucket set even if a rebalancer does not
diff --git a/test/rebalancer/rebalancer_lock_and_pin.test.lua b/test/rebalancer/rebalancer_lock_and_pin.test.lua
index c3412c1..7b87004 100644
--- a/test/rebalancer/rebalancer_lock_and_pin.test.lua
+++ b/test/rebalancer/rebalancer_lock_and_pin.test.lua
@@ -69,6 +69,10 @@ info.lock
-- explicitly.
--
vshard.storage.bucket_send(1, util.replicasets[2])
+test_run:switch('box_2_a')
+-- Does not allow to receive either. Send from a non-locked replicaset to a
+-- locked one fails.
+vshard.storage.bucket_send(101, util.replicasets[1])
--
-- Vshard ensures that if a replicaset is locked, then it will not
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index c7335fc..298df71 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -995,6 +995,9 @@ local function bucket_recv_xc(bucket_id, from, data, opts)
return nil, lerror.vshard(lerror.code.WRONG_BUCKET, bucket_id, msg,
from)
end
+ if is_this_replicaset_locked() then
+ return nil, lerror.vshard(lerror.code.REPLICASET_IS_LOCKED)
+ end
if not bucket_receiving_quota_add(-1) then
return nil, lerror.vshard(lerror.code.TOO_MANY_RECEIVING)
end
--
2.24.3 (Apple Git-128)
* Re: [Tarantool-patches] [PATCH 4/9] storage: bucket_recv() should check rs lock
From: Oleg Babin via Tarantool-patches @ 2021-02-10 8:59 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov
Thanks for your patch. LGTM.
On 10/02/2021 02:46, Vladislav Shpilevoy wrote:
> Locked replicaset (via config) should not allow any bucket moves
> from or to the replicaset.
>
> But the lock check was only done by bucket_send(). Bucket_recv()
> allowed a bucket to be received even if the replicaset was locked.
> The patch fixes that.
>
> It didn't affect automatic bucket sends, because the lock is
> taken into account by the rebalancer from the config. Only manual
> bucket moves could hit this bug.
> ---
> test/rebalancer/rebalancer_lock_and_pin.result | 14 ++++++++++++++
> test/rebalancer/rebalancer_lock_and_pin.test.lua | 4 ++++
> vshard/storage/init.lua | 3 +++
> 3 files changed, 21 insertions(+)
>
> diff --git a/test/rebalancer/rebalancer_lock_and_pin.result b/test/rebalancer/rebalancer_lock_and_pin.result
> index 51dd36e..0bb4f45 100644
> --- a/test/rebalancer/rebalancer_lock_and_pin.result
> +++ b/test/rebalancer/rebalancer_lock_and_pin.result
> @@ -156,6 +156,20 @@ vshard.storage.bucket_send(1, util.replicasets[2])
> message: Replicaset is locked
> code: 19
> ...
> +test_run:switch('box_2_a')
> +---
> +- true
> +...
> +-- Does not allow to receive either. Send from a non-locked replicaset to a
> +-- locked one fails.
> +vshard.storage.bucket_send(101, util.replicasets[1])
> +---
> +- null
> +- type: ShardingError
> + code: 19
> + name: REPLICASET_IS_LOCKED
> + message: Replicaset is locked
> +...
> --
> -- Vshard ensures that if a replicaset is locked, then it will not
> -- allow to change its bucket set even if a rebalancer does not
> diff --git a/test/rebalancer/rebalancer_lock_and_pin.test.lua b/test/rebalancer/rebalancer_lock_and_pin.test.lua
> index c3412c1..7b87004 100644
> --- a/test/rebalancer/rebalancer_lock_and_pin.test.lua
> +++ b/test/rebalancer/rebalancer_lock_and_pin.test.lua
> @@ -69,6 +69,10 @@ info.lock
> -- explicitly.
> --
> vshard.storage.bucket_send(1, util.replicasets[2])
> +test_run:switch('box_2_a')
> +-- Does not allow to receive either. Send from a non-locked replicaset to a
> +-- locked one fails.
> +vshard.storage.bucket_send(101, util.replicasets[1])
>
> --
> -- Vshard ensures that if a replicaset is locked, then it will not
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index c7335fc..298df71 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -995,6 +995,9 @@ local function bucket_recv_xc(bucket_id, from, data, opts)
> return nil, lerror.vshard(lerror.code.WRONG_BUCKET, bucket_id, msg,
> from)
> end
> + if is_this_replicaset_locked() then
> + return nil, lerror.vshard(lerror.code.REPLICASET_IS_LOCKED)
> + end
> if not bucket_receiving_quota_add(-1) then
> return nil, lerror.vshard(lerror.code.TOO_MANY_RECEIVING)
> end
^ permalink raw reply [flat|nested] 36+ messages in thread
* [Tarantool-patches] [PATCH 5/9] util: introduce yielding table functions
2021-02-09 23:46 [Tarantool-patches] [PATCH 0/9] VShard Map-Reduce, part 1, preparations Vladislav Shpilevoy via Tarantool-patches
` (3 preceding siblings ...)
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 4/9] storage: bucket_recv() should check rs lock Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-09 23:46 ` Vladislav Shpilevoy via Tarantool-patches
2021-02-10 8:59 ` Oleg Babin via Tarantool-patches
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 6/9] cfg: introduce 'deprecated option' feature Vladislav Shpilevoy via Tarantool-patches
` (4 subsequent siblings)
9 siblings, 1 reply; 36+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-09 23:46 UTC (permalink / raw)
To: tarantool-patches, olegrok, yaroslav.dynnikov
The patch adds functions table_copy_yield and table_minus_yield.
Yielding copy creates a duplicate of a table but yields every
specified number of keys copied.
Yielding minus removes matching key-value pairs specified in one
table from another table. It yields after every specified number of
keys processed.
The functions should help to process huge Lua tables (millions of
elements and more). These are going to be used on the storage in
the new GC algorithm.
The algorithm will need to keep a route table on the storage, just
like on the router, but with an expiration time for the routes. Since
the bucket count can be millions, GC will potentially operate on a
huge Lua table and could use some yields so as not to block the TX
thread for long.
Needed for #147
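The yield-every-N-keys pattern can be sketched in standalone Lua, with
coroutine.yield() standing in for Tarantool's fiber.yield(). This is only an
illustration of the idea, not the patch code itself (which lives in
vshard/util.lua and runs inside a fiber):

```lua
-- Illustrative sketch of the yield-interval pattern. coroutine.yield()
-- stands in for fiber.yield(); the real function yields the current
-- fiber so the TX thread is not blocked for long.
local function table_copy_yield(src, interval)
    local res = {}
    -- Time-To-Yield counter, reset after every yield.
    local tty = interval
    for k, v in pairs(src) do
        res[k] = v
        tty = tty - 1
        if tty <= 0 then
            coroutine.yield()
            tty = interval
        end
    end
    return res
end

-- Drive the copy inside a coroutine and count how often it yields.
local t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4}
local res
local co = coroutine.create(function() res = table_copy_yield(t, 2) end)
local yields = -1
repeat
    yields = yields + 1
    coroutine.resume(co)
until coroutine.status(co) == 'dead'
print(yields) -- 2: copying 4 keys with interval 2 yields twice
```

The driver loop mirrors the unit test in the patch: a 4-key table copied
with interval 2 produces exactly two yields, and the result is a distinct
table with the same key-value pairs.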
---
test/unit/util.result | 113 ++++++++++++++++++++++++++++++++++++++++
test/unit/util.test.lua | 49 +++++++++++++++++
vshard/util.lua | 40 ++++++++++++++
3 files changed, 202 insertions(+)
diff --git a/test/unit/util.result b/test/unit/util.result
index 096e36f..c4fd84d 100644
--- a/test/unit/util.result
+++ b/test/unit/util.result
@@ -71,3 +71,116 @@ test_run:grep_log('default', 'reloadable_function has been started', 1000)
fib:cancel()
---
...
+-- Yielding table minus.
+minus_yield = util.table_minus_yield
+---
+...
+minus_yield({}, {}, 1)
+---
+- []
+...
+minus_yield({}, {k = 1}, 1)
+---
+- []
+...
+minus_yield({}, {k = 1}, 0)
+---
+- []
+...
+minus_yield({k = 1}, {k = 1}, 0)
+---
+- []
+...
+minus_yield({k1 = 1, k2 = 2}, {k1 = 1, k3 = 3}, 10)
+---
+- k2: 2
+...
+minus_yield({k1 = 1, k2 = 2}, {k1 = 1, k2 = 2}, 10)
+---
+- []
+...
+-- Mismatching values are not deleted.
+minus_yield({k1 = 1}, {k1 = 2}, 10)
+---
+- k1: 1
+...
+minus_yield({k1 = 1, k2 = 2, k3 = 3}, {k1 = 1, k2 = 222}, 10)
+---
+- k3: 3
+ k2: 2
+...
+do \
+ t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
+ f = fiber.create(function() \
+ minus_yield(t, {k2 = 2, k3 = 3, k5 = 5, k4 = 444}, 2) \
+ end) \
+ yield_count = 0 \
+ while f:status() ~= 'dead' do \
+ yield_count = yield_count + 1 \
+ fiber.yield() \
+ end \
+end
+---
+...
+yield_count
+---
+- 2
+...
+t
+---
+- k4: 4
+ k1: 1
+...
+-- Yielding table copy.
+copy_yield = util.table_copy_yield
+---
+...
+copy_yield({}, 1)
+---
+- []
+...
+copy_yield({k = 1}, 1)
+---
+- k: 1
+...
+copy_yield({k1 = 1, k2 = 2}, 1)
+---
+- k1: 1
+ k2: 2
+...
+do \
+ t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
+ res = nil \
+ f = fiber.create(function() \
+ res = copy_yield(t, 2) \
+ end) \
+ yield_count = 0 \
+ while f:status() ~= 'dead' do \
+ yield_count = yield_count + 1 \
+ fiber.yield() \
+ end \
+end
+---
+...
+yield_count
+---
+- 2
+...
+t
+---
+- k3: 3
+ k4: 4
+ k1: 1
+ k2: 2
+...
+res
+---
+- k3: 3
+ k4: 4
+ k1: 1
+ k2: 2
+...
+t ~= res
+---
+- true
+...
diff --git a/test/unit/util.test.lua b/test/unit/util.test.lua
index 5f39e06..4d6cbe9 100644
--- a/test/unit/util.test.lua
+++ b/test/unit/util.test.lua
@@ -27,3 +27,52 @@ fib = util.reloadable_fiber_create('Worker_name', fake_M, 'reloadable_function')
while not test_run:grep_log('default', 'module is reloaded, restarting') do fiber.sleep(0.01) end
test_run:grep_log('default', 'reloadable_function has been started', 1000)
fib:cancel()
+
+-- Yielding table minus.
+minus_yield = util.table_minus_yield
+minus_yield({}, {}, 1)
+minus_yield({}, {k = 1}, 1)
+minus_yield({}, {k = 1}, 0)
+minus_yield({k = 1}, {k = 1}, 0)
+minus_yield({k1 = 1, k2 = 2}, {k1 = 1, k3 = 3}, 10)
+minus_yield({k1 = 1, k2 = 2}, {k1 = 1, k2 = 2}, 10)
+-- Mismatching values are not deleted.
+minus_yield({k1 = 1}, {k1 = 2}, 10)
+minus_yield({k1 = 1, k2 = 2, k3 = 3}, {k1 = 1, k2 = 222}, 10)
+
+do \
+ t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
+ f = fiber.create(function() \
+ minus_yield(t, {k2 = 2, k3 = 3, k5 = 5, k4 = 444}, 2) \
+ end) \
+ yield_count = 0 \
+ while f:status() ~= 'dead' do \
+ yield_count = yield_count + 1 \
+ fiber.yield() \
+ end \
+end
+yield_count
+t
+
+-- Yielding table copy.
+copy_yield = util.table_copy_yield
+copy_yield({}, 1)
+copy_yield({k = 1}, 1)
+copy_yield({k1 = 1, k2 = 2}, 1)
+
+do \
+ t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
+ res = nil \
+ f = fiber.create(function() \
+ res = copy_yield(t, 2) \
+ end) \
+ yield_count = 0 \
+ while f:status() ~= 'dead' do \
+ yield_count = yield_count + 1 \
+ fiber.yield() \
+ end \
+end
+yield_count
+t
+res
+t ~= res
diff --git a/vshard/util.lua b/vshard/util.lua
index d3b4e67..2362607 100644
--- a/vshard/util.lua
+++ b/vshard/util.lua
@@ -153,6 +153,44 @@ local function version_is_at_least(major_need, middle_need, minor_need)
return minor >= minor_need
end
+--
+-- Copy @a src table. Fiber yields every @a interval keys copied.
+--
+local function table_copy_yield(src, interval)
+ local res = {}
+ -- Time-To-Yield.
+ local tty = interval
+ for k, v in pairs(src) do
+ res[k] = v
+ tty = tty - 1
+ if tty <= 0 then
+ fiber.yield()
+ tty = interval
+ end
+ end
+ return res
+end
+
+--
+-- Remove @a src keys from @a dst if their values match. Fiber yields every
+-- @a interval iterations.
+--
+local function table_minus_yield(dst, src, interval)
+ -- Time-To-Yield.
+ local tty = interval
+ for k, srcv in pairs(src) do
+ if dst[k] == srcv then
+ dst[k] = nil
+ end
+ tty = tty - 1
+ if tty <= 0 then
+ fiber.yield()
+ tty = interval
+ end
+ end
+ return dst
+end
+
return {
tuple_extract_key = tuple_extract_key,
reloadable_fiber_create = reloadable_fiber_create,
@@ -160,4 +198,6 @@ return {
async_task = async_task,
internal = M,
version_is_at_least = version_is_at_least,
+ table_copy_yield = table_copy_yield,
+ table_minus_yield = table_minus_yield,
}
--
2.24.3 (Apple Git-128)
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH 5/9] util: introduce yielding table functions
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 5/9] util: introduce yielding table functions Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-10 8:59 ` Oleg Babin via Tarantool-patches
2021-02-10 22:34 ` Vladislav Shpilevoy via Tarantool-patches
0 siblings, 1 reply; 36+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-02-10 8:59 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov
Thanks for your patch! One comment below.
On 10/02/2021 02:46, Vladislav Shpilevoy wrote:
> The patch adds functions table_copy_yield and table_minus_yield.
>
> Yielding copy creates a duplicate of a table but yields every
> specified number of keys copied.
>
> Yielding minus removes matching key-value pairs specified in one
> table from another table. It yields after every specified number of
> keys processed.
>
> The functions should help to process huge Lua tables (millions of
> elements and more). These are going to be used on the storage in
> the new GC algorithm.
>
> The algorithm will need to keep a route table on the storage, just
> like on the router, but with an expiration time for the routes. Since
> the bucket count can be millions, GC will potentially operate on a
> huge Lua table and could use some yields so as not to block the TX
> thread for long.
>
> Needed for #147
> ---
> test/unit/util.result | 113 ++++++++++++++++++++++++++++++++++++++++
> test/unit/util.test.lua | 49 +++++++++++++++++
> vshard/util.lua | 40 ++++++++++++++
> 3 files changed, 202 insertions(+)
>
> diff --git a/test/unit/util.result b/test/unit/util.result
> index 096e36f..c4fd84d 100644
> --- a/test/unit/util.result
> +++ b/test/unit/util.result
> @@ -71,3 +71,116 @@ test_run:grep_log('default', 'reloadable_function has been started', 1000)
> fib:cancel()
> ---
> ...
> +-- Yielding table minus.
> +minus_yield = util.table_minus_yield
> +---
> +...
> +minus_yield({}, {}, 1)
> +---
> +- []
> +...
> +minus_yield({}, {k = 1}, 1)
> +---
> +- []
> +...
> +minus_yield({}, {k = 1}, 0)
> +---
> +- []
> +...
> +minus_yield({k = 1}, {k = 1}, 0)
> +---
> +- []
> +...
> +minus_yield({k1 = 1, k2 = 2}, {k1 = 1, k3 = 3}, 10)
> +---
> +- k2: 2
> +...
> +minus_yield({k1 = 1, k2 = 2}, {k1 = 1, k2 = 2}, 10)
> +---
> +- []
> +...
> +-- Mismatching values are not deleted.
> +minus_yield({k1 = 1}, {k1 = 2}, 10)
> +---
> +- k1: 1
> +...
> +minus_yield({k1 = 1, k2 = 2, k3 = 3}, {k1 = 1, k2 = 222}, 10)
> +---
> +- k3: 3
> + k2: 2
> +...
> +do \
> + t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
> + f = fiber.create(function() \
> + minus_yield(t, {k2 = 2, k3 = 3, k5 = 5, k4 = 444}, 2) \
> + end) \
> + yield_count = 0 \
> + while f:status() ~= 'dead' do \
> + yield_count = yield_count + 1 \
> + fiber.yield() \
> + end \
> +end
> +---
Why can't you use "csw" of fiber.self() instead? Also, is it reliable
enough to simply count yields?
Could the scheduler skip this fiber at some loop iteration? In other words,
won't this test be flaky?
> +...
> +yield_count
> +---
> +- 2
> +...
> +t
> +---
> +- k4: 4
> + k1: 1
> +...
> +-- Yielding table copy.
> +copy_yield = util.table_copy_yield
> +---
> +...
> +copy_yield({}, 1)
> +---
> +- []
> +...
> +copy_yield({k = 1}, 1)
> +---
> +- k: 1
> +...
> +copy_yield({k1 = 1, k2 = 2}, 1)
> +---
> +- k1: 1
> + k2: 2
> +...
> +do \
> + t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
> + res = nil \
> + f = fiber.create(function() \
> + res = copy_yield(t, 2) \
> + end) \
> + yield_count = 0 \
> + while f:status() ~= 'dead' do \
> + yield_count = yield_count + 1 \
> + fiber.yield() \
> + end \
> +end
> +---
> +...
> +yield_count
> +---
> +- 2
> +...
> +t
> +---
> +- k3: 3
> + k4: 4
> + k1: 1
> + k2: 2
> +...
> +res
> +---
> +- k3: 3
> + k4: 4
> + k1: 1
> + k2: 2
> +...
> +t ~= res
> +---
> +- true
> +...
> diff --git a/test/unit/util.test.lua b/test/unit/util.test.lua
> index 5f39e06..4d6cbe9 100644
> --- a/test/unit/util.test.lua
> +++ b/test/unit/util.test.lua
> @@ -27,3 +27,52 @@ fib = util.reloadable_fiber_create('Worker_name', fake_M, 'reloadable_function')
> while not test_run:grep_log('default', 'module is reloaded, restarting') do fiber.sleep(0.01) end
> test_run:grep_log('default', 'reloadable_function has been started', 1000)
> fib:cancel()
> +
> +-- Yielding table minus.
> +minus_yield = util.table_minus_yield
> +minus_yield({}, {}, 1)
> +minus_yield({}, {k = 1}, 1)
> +minus_yield({}, {k = 1}, 0)
> +minus_yield({k = 1}, {k = 1}, 0)
> +minus_yield({k1 = 1, k2 = 2}, {k1 = 1, k3 = 3}, 10)
> +minus_yield({k1 = 1, k2 = 2}, {k1 = 1, k2 = 2}, 10)
> +-- Mismatching values are not deleted.
> +minus_yield({k1 = 1}, {k1 = 2}, 10)
> +minus_yield({k1 = 1, k2 = 2, k3 = 3}, {k1 = 1, k2 = 222}, 10)
> +
> +do \
> + t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
> + f = fiber.create(function() \
> + minus_yield(t, {k2 = 2, k3 = 3, k5 = 5, k4 = 444}, 2) \
> + end) \
> + yield_count = 0 \
> + while f:status() ~= 'dead' do \
> + yield_count = yield_count + 1 \
> + fiber.yield() \
> + end \
> +end
> +yield_count
> +t
> +
> +-- Yielding table copy.
> +copy_yield = util.table_copy_yield
> +copy_yield({}, 1)
> +copy_yield({k = 1}, 1)
> +copy_yield({k1 = 1, k2 = 2}, 1)
> +
> +do \
> + t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
> + res = nil \
> + f = fiber.create(function() \
> + res = copy_yield(t, 2) \
> + end) \
> + yield_count = 0 \
> + while f:status() ~= 'dead' do \
> + yield_count = yield_count + 1 \
> + fiber.yield() \
> + end \
> +end
> +yield_count
> +t
> +res
> +t ~= res
> diff --git a/vshard/util.lua b/vshard/util.lua
> index d3b4e67..2362607 100644
> --- a/vshard/util.lua
> +++ b/vshard/util.lua
> @@ -153,6 +153,44 @@ local function version_is_at_least(major_need, middle_need, minor_need)
> return minor >= minor_need
> end
>
> +--
> +-- Copy @a src table. Fiber yields every @a interval keys copied.
> +--
> +local function table_copy_yield(src, interval)
> + local res = {}
> + -- Time-To-Yield.
> + local tty = interval
> + for k, v in pairs(src) do
> + res[k] = v
> + tty = tty - 1
> + if tty <= 0 then
> + fiber.yield()
> + tty = interval
> + end
> + end
> + return res
> +end
> +
> +--
> +-- Remove @a src keys from @a dst if their values match. Fiber yields every
> +-- @a interval iterations.
> +--
> +local function table_minus_yield(dst, src, interval)
> + -- Time-To-Yield.
> + local tty = interval
> + for k, srcv in pairs(src) do
> + if dst[k] == srcv then
> + dst[k] = nil
> + end
> + tty = tty - 1
> + if tty <= 0 then
> + fiber.yield()
> + tty = interval
> + end
> + end
> + return dst
> +end
> +
> return {
> tuple_extract_key = tuple_extract_key,
> reloadable_fiber_create = reloadable_fiber_create,
> @@ -160,4 +198,6 @@ return {
> async_task = async_task,
> internal = M,
> version_is_at_least = version_is_at_least,
> + table_copy_yield = table_copy_yield,
> + table_minus_yield = table_minus_yield,
> }
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH 5/9] util: introduce yielding table functions
2021-02-10 8:59 ` Oleg Babin via Tarantool-patches
@ 2021-02-10 22:34 ` Vladislav Shpilevoy via Tarantool-patches
2021-02-11 6:50 ` Oleg Babin via Tarantool-patches
0 siblings, 1 reply; 36+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-10 22:34 UTC (permalink / raw)
To: Oleg Babin, tarantool-patches, yaroslav.dynnikov
Thanks for the review!
>> diff --git a/test/unit/util.result b/test/unit/util.result
>> index 096e36f..c4fd84d 100644
>> --- a/test/unit/util.result
>> +++ b/test/unit/util.result
>> @@ -71,3 +71,116 @@ test_run:grep_log('default', 'reloadable_function has been started', 1000)
>> +do \
>> + t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
>> + f = fiber.create(function() \
>> + minus_yield(t, {k2 = 2, k3 = 3, k5 = 5, k4 = 444}, 2) \
>> + end) \
>> + yield_count = 0 \
>> + while f:status() ~= 'dead' do \
>> + yield_count = yield_count + 1 \
>> + fiber.yield() \
>> + end \
>> +end
>> +---
>
>
> Why can't you use "csw" of fiber.self() instead? Also, is it reliable enough to simply count yields?
Yup, will work too. See the diff below.
====================
diff --git a/test/unit/util.result b/test/unit/util.result
index c4fd84d..42a361a 100644
--- a/test/unit/util.result
+++ b/test/unit/util.result
@@ -111,14 +111,14 @@ minus_yield({k1 = 1, k2 = 2, k3 = 3}, {k1 = 1, k2 = 222}, 10)
...
do \
t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
+ yield_count = 0 \
f = fiber.create(function() \
+ local csw1 = fiber.info()[fiber.id()].csw \
minus_yield(t, {k2 = 2, k3 = 3, k5 = 5, k4 = 444}, 2) \
+ local csw2 = fiber.info()[fiber.id()].csw \
+ yield_count = csw2 - csw1 \
end) \
- yield_count = 0 \
- while f:status() ~= 'dead' do \
- yield_count = yield_count + 1 \
- fiber.yield() \
- end \
+ test_run:wait_cond(function() return f:status() == 'dead' end) \
end
---
...
@@ -151,14 +151,14 @@ copy_yield({k1 = 1, k2 = 2}, 1)
do \
t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
res = nil \
+ yield_count = 0 \
f = fiber.create(function() \
+ local csw1 = fiber.info()[fiber.id()].csw \
res = copy_yield(t, 2) \
+ local csw2 = fiber.info()[fiber.id()].csw \
+ yield_count = csw2 - csw1 \
end) \
- yield_count = 0 \
- while f:status() ~= 'dead' do \
- yield_count = yield_count + 1 \
- fiber.yield() \
- end \
+ test_run:wait_cond(function() return f:status() == 'dead' end) \
end
---
...
diff --git a/test/unit/util.test.lua b/test/unit/util.test.lua
index 4d6cbe9..9550a95 100644
--- a/test/unit/util.test.lua
+++ b/test/unit/util.test.lua
@@ -42,14 +42,14 @@ minus_yield({k1 = 1, k2 = 2, k3 = 3}, {k1 = 1, k2 = 222}, 10)
do \
t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
+ yield_count = 0 \
f = fiber.create(function() \
+ local csw1 = fiber.info()[fiber.id()].csw \
minus_yield(t, {k2 = 2, k3 = 3, k5 = 5, k4 = 444}, 2) \
+ local csw2 = fiber.info()[fiber.id()].csw \
+ yield_count = csw2 - csw1 \
end) \
- yield_count = 0 \
- while f:status() ~= 'dead' do \
- yield_count = yield_count + 1 \
- fiber.yield() \
- end \
+ test_run:wait_cond(function() return f:status() == 'dead' end) \
end
yield_count
t
@@ -63,14 +63,14 @@ copy_yield({k1 = 1, k2 = 2}, 1)
do \
t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
res = nil \
+ yield_count = 0 \
f = fiber.create(function() \
+ local csw1 = fiber.info()[fiber.id()].csw \
res = copy_yield(t, 2) \
+ local csw2 = fiber.info()[fiber.id()].csw \
+ yield_count = csw2 - csw1 \
end) \
- yield_count = 0 \
- while f:status() ~= 'dead' do \
- yield_count = yield_count + 1 \
- fiber.yield() \
- end \
+ test_run:wait_cond(function() return f:status() == 'dead' end) \
end
yield_count
t
====================
> Could the scheduler skip this fiber at some loop iteration? In other words, won't this test be flaky?
Nope. Unless the fiber is sleeping on some condition or for a timeout, a plain
sleep(0), also known as fiber.yield(), won't skip this fiber on the next
iteration of the loop. But it does not matter if csw is used to count the yields.
Full new patch below.
====================
util: introduce yielding table functions
The patch adds functions table_copy_yield and table_minus_yield.
Yielding copy creates a duplicate of a table but yields every
specified number of keys copied.
Yielding minus removes matching key-value pairs specified in one
table from another table. It yields after every specified number of
keys processed.
The functions should help to process huge Lua tables (millions of
elements and more). These are going to be used on the storage in
the new GC algorithm.
The algorithm will need to keep a route table on the storage, just
like on the router, but with an expiration time for the routes. Since
the bucket count can be millions, GC will potentially operate on a
huge Lua table and could use some yields so as not to block the TX
thread for long.
Needed for #147
diff --git a/test/unit/util.result b/test/unit/util.result
index 096e36f..42a361a 100644
--- a/test/unit/util.result
+++ b/test/unit/util.result
@@ -71,3 +71,116 @@ test_run:grep_log('default', 'reloadable_function has been started', 1000)
fib:cancel()
---
...
+-- Yielding table minus.
+minus_yield = util.table_minus_yield
+---
+...
+minus_yield({}, {}, 1)
+---
+- []
+...
+minus_yield({}, {k = 1}, 1)
+---
+- []
+...
+minus_yield({}, {k = 1}, 0)
+---
+- []
+...
+minus_yield({k = 1}, {k = 1}, 0)
+---
+- []
+...
+minus_yield({k1 = 1, k2 = 2}, {k1 = 1, k3 = 3}, 10)
+---
+- k2: 2
+...
+minus_yield({k1 = 1, k2 = 2}, {k1 = 1, k2 = 2}, 10)
+---
+- []
+...
+-- Mismatching values are not deleted.
+minus_yield({k1 = 1}, {k1 = 2}, 10)
+---
+- k1: 1
+...
+minus_yield({k1 = 1, k2 = 2, k3 = 3}, {k1 = 1, k2 = 222}, 10)
+---
+- k3: 3
+ k2: 2
+...
+do \
+ t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
+ yield_count = 0 \
+ f = fiber.create(function() \
+ local csw1 = fiber.info()[fiber.id()].csw \
+ minus_yield(t, {k2 = 2, k3 = 3, k5 = 5, k4 = 444}, 2) \
+ local csw2 = fiber.info()[fiber.id()].csw \
+ yield_count = csw2 - csw1 \
+ end) \
+ test_run:wait_cond(function() return f:status() == 'dead' end) \
+end
+---
+...
+yield_count
+---
+- 2
+...
+t
+---
+- k4: 4
+ k1: 1
+...
+-- Yielding table copy.
+copy_yield = util.table_copy_yield
+---
+...
+copy_yield({}, 1)
+---
+- []
+...
+copy_yield({k = 1}, 1)
+---
+- k: 1
+...
+copy_yield({k1 = 1, k2 = 2}, 1)
+---
+- k1: 1
+ k2: 2
+...
+do \
+ t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
+ res = nil \
+ yield_count = 0 \
+ f = fiber.create(function() \
+ local csw1 = fiber.info()[fiber.id()].csw \
+ res = copy_yield(t, 2) \
+ local csw2 = fiber.info()[fiber.id()].csw \
+ yield_count = csw2 - csw1 \
+ end) \
+ test_run:wait_cond(function() return f:status() == 'dead' end) \
+end
+---
+...
+yield_count
+---
+- 2
+...
+t
+---
+- k3: 3
+ k4: 4
+ k1: 1
+ k2: 2
+...
+res
+---
+- k3: 3
+ k4: 4
+ k1: 1
+ k2: 2
+...
+t ~= res
+---
+- true
+...
diff --git a/test/unit/util.test.lua b/test/unit/util.test.lua
index 5f39e06..9550a95 100644
--- a/test/unit/util.test.lua
+++ b/test/unit/util.test.lua
@@ -27,3 +27,52 @@ fib = util.reloadable_fiber_create('Worker_name', fake_M, 'reloadable_function')
while not test_run:grep_log('default', 'module is reloaded, restarting') do fiber.sleep(0.01) end
test_run:grep_log('default', 'reloadable_function has been started', 1000)
fib:cancel()
+
+-- Yielding table minus.
+minus_yield = util.table_minus_yield
+minus_yield({}, {}, 1)
+minus_yield({}, {k = 1}, 1)
+minus_yield({}, {k = 1}, 0)
+minus_yield({k = 1}, {k = 1}, 0)
+minus_yield({k1 = 1, k2 = 2}, {k1 = 1, k3 = 3}, 10)
+minus_yield({k1 = 1, k2 = 2}, {k1 = 1, k2 = 2}, 10)
+-- Mismatching values are not deleted.
+minus_yield({k1 = 1}, {k1 = 2}, 10)
+minus_yield({k1 = 1, k2 = 2, k3 = 3}, {k1 = 1, k2 = 222}, 10)
+
+do \
+ t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
+ yield_count = 0 \
+ f = fiber.create(function() \
+ local csw1 = fiber.info()[fiber.id()].csw \
+ minus_yield(t, {k2 = 2, k3 = 3, k5 = 5, k4 = 444}, 2) \
+ local csw2 = fiber.info()[fiber.id()].csw \
+ yield_count = csw2 - csw1 \
+ end) \
+ test_run:wait_cond(function() return f:status() == 'dead' end) \
+end
+yield_count
+t
+
+-- Yielding table copy.
+copy_yield = util.table_copy_yield
+copy_yield({}, 1)
+copy_yield({k = 1}, 1)
+copy_yield({k1 = 1, k2 = 2}, 1)
+
+do \
+ t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
+ res = nil \
+ yield_count = 0 \
+ f = fiber.create(function() \
+ local csw1 = fiber.info()[fiber.id()].csw \
+ res = copy_yield(t, 2) \
+ local csw2 = fiber.info()[fiber.id()].csw \
+ yield_count = csw2 - csw1 \
+ end) \
+ test_run:wait_cond(function() return f:status() == 'dead' end) \
+end
+yield_count
+t
+res
+t ~= res
diff --git a/vshard/util.lua b/vshard/util.lua
index d3b4e67..2362607 100644
--- a/vshard/util.lua
+++ b/vshard/util.lua
@@ -153,6 +153,44 @@ local function version_is_at_least(major_need, middle_need, minor_need)
return minor >= minor_need
end
+--
+-- Copy @a src table. Fiber yields every @a interval keys copied.
+--
+local function table_copy_yield(src, interval)
+ local res = {}
+ -- Time-To-Yield.
+ local tty = interval
+ for k, v in pairs(src) do
+ res[k] = v
+ tty = tty - 1
+ if tty <= 0 then
+ fiber.yield()
+ tty = interval
+ end
+ end
+ return res
+end
+
+--
+-- Remove @a src keys from @a dst if their values match. Fiber yields every
+-- @a interval iterations.
+--
+local function table_minus_yield(dst, src, interval)
+ -- Time-To-Yield.
+ local tty = interval
+ for k, srcv in pairs(src) do
+ if dst[k] == srcv then
+ dst[k] = nil
+ end
+ tty = tty - 1
+ if tty <= 0 then
+ fiber.yield()
+ tty = interval
+ end
+ end
+ return dst
+end
+
return {
tuple_extract_key = tuple_extract_key,
reloadable_fiber_create = reloadable_fiber_create,
@@ -160,4 +198,6 @@ return {
async_task = async_task,
internal = M,
version_is_at_least = version_is_at_least,
+ table_copy_yield = table_copy_yield,
+ table_minus_yield = table_minus_yield,
}
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH 5/9] util: introduce yielding table functions
2021-02-10 22:34 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-11 6:50 ` Oleg Babin via Tarantool-patches
0 siblings, 0 replies; 36+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-02-11 6:50 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov
Thanks for your fixes! LGTM.
On 11/02/2021 01:34, Vladislav Shpilevoy wrote:
> Thanks for the review!
>
>>> diff --git a/test/unit/util.result b/test/unit/util.result
>>> index 096e36f..c4fd84d 100644
>>> --- a/test/unit/util.result
>>> +++ b/test/unit/util.result
>>> @@ -71,3 +71,116 @@ test_run:grep_log('default', 'reloadable_function has been started', 1000)
>>> +do \
>>> + t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
>>> + f = fiber.create(function() \
>>> + minus_yield(t, {k2 = 2, k3 = 3, k5 = 5, k4 = 444}, 2) \
>>> + end) \
>>> + yield_count = 0 \
>>> + while f:status() ~= 'dead' do \
>>> + yield_count = yield_count + 1 \
>>> + fiber.yield() \
>>> + end \
>>> +end
>>> +---
>> Why can't you use "csw" of fiber.self() instead? Also, is it reliable enough to simply count yields?
> Yup, will work too. See the diff below.
>
> ====================
> diff --git a/test/unit/util.result b/test/unit/util.result
> index c4fd84d..42a361a 100644
> --- a/test/unit/util.result
> +++ b/test/unit/util.result
> @@ -111,14 +111,14 @@ minus_yield({k1 = 1, k2 = 2, k3 = 3}, {k1 = 1, k2 = 222}, 10)
> ...
> do \
> t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
> + yield_count = 0 \
> f = fiber.create(function() \
> + local csw1 = fiber.info()[fiber.id()].csw \
> minus_yield(t, {k2 = 2, k3 = 3, k5 = 5, k4 = 444}, 2) \
> + local csw2 = fiber.info()[fiber.id()].csw \
> + yield_count = csw2 - csw1 \
> end) \
> - yield_count = 0 \
> - while f:status() ~= 'dead' do \
> - yield_count = yield_count + 1 \
> - fiber.yield() \
> - end \
> + test_run:wait_cond(function() return f:status() == 'dead' end) \
> end
> ---
> ...
> @@ -151,14 +151,14 @@ copy_yield({k1 = 1, k2 = 2}, 1)
> do \
> t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
> res = nil \
> + yield_count = 0 \
> f = fiber.create(function() \
> + local csw1 = fiber.info()[fiber.id()].csw \
> res = copy_yield(t, 2) \
> + local csw2 = fiber.info()[fiber.id()].csw \
> + yield_count = csw2 - csw1 \
> end) \
> - yield_count = 0 \
> - while f:status() ~= 'dead' do \
> - yield_count = yield_count + 1 \
> - fiber.yield() \
> - end \
> + test_run:wait_cond(function() return f:status() == 'dead' end) \
> end
> ---
> ...
> diff --git a/test/unit/util.test.lua b/test/unit/util.test.lua
> index 4d6cbe9..9550a95 100644
> --- a/test/unit/util.test.lua
> +++ b/test/unit/util.test.lua
> @@ -42,14 +42,14 @@ minus_yield({k1 = 1, k2 = 2, k3 = 3}, {k1 = 1, k2 = 222}, 10)
>
> do \
> t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
> + yield_count = 0 \
> f = fiber.create(function() \
> + local csw1 = fiber.info()[fiber.id()].csw \
> minus_yield(t, {k2 = 2, k3 = 3, k5 = 5, k4 = 444}, 2) \
> + local csw2 = fiber.info()[fiber.id()].csw \
> + yield_count = csw2 - csw1 \
> end) \
> - yield_count = 0 \
> - while f:status() ~= 'dead' do \
> - yield_count = yield_count + 1 \
> - fiber.yield() \
> - end \
> + test_run:wait_cond(function() return f:status() == 'dead' end) \
> end
> yield_count
> t
> @@ -63,14 +63,14 @@ copy_yield({k1 = 1, k2 = 2}, 1)
> do \
> t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
> res = nil \
> + yield_count = 0 \
> f = fiber.create(function() \
> + local csw1 = fiber.info()[fiber.id()].csw \
> res = copy_yield(t, 2) \
> + local csw2 = fiber.info()[fiber.id()].csw \
> + yield_count = csw2 - csw1 \
> end) \
> - yield_count = 0 \
> - while f:status() ~= 'dead' do \
> - yield_count = yield_count + 1 \
> - fiber.yield() \
> - end \
> + test_run:wait_cond(function() return f:status() == 'dead' end) \
> end
> yield_count
> t
> ====================
>
>> Could the scheduler skip this fiber at some loop iteration? In other words, won't this test be flaky?
> Nope. Unless the fiber is sleeping on some condition or for a timeout, a plain
> sleep(0), also known as fiber.yield(), won't skip this fiber on the next
> iteration of the loop. But it does not matter if csw is used to count the yields.
>
> Full new patch below.
>
> ====================
> util: introduce yielding table functions
>
> The patch adds functions table_copy_yield and table_minus_yield.
>
> Yielding copy creates a duplicate of a table but yields every
> specified number of keys copied.
>
> Yielding minus removes matching key-value pairs specified in one
> table from another table. It yields after every specified number of
> keys processed.
>
> The functions should help to process huge Lua tables (millions of
> elements and more). These are going to be used on the storage in
> the new GC algorithm.
>
> The algorithm will need to keep a route table on the storage, just
> like on the router, but with an expiration time for the routes. Since
> the bucket count can be millions, GC will potentially operate on a
> huge Lua table and could use some yields so as not to block the TX
> thread for long.
>
> Needed for #147
>
> diff --git a/test/unit/util.result b/test/unit/util.result
> index 096e36f..42a361a 100644
> --- a/test/unit/util.result
> +++ b/test/unit/util.result
> @@ -71,3 +71,116 @@ test_run:grep_log('default', 'reloadable_function has been started', 1000)
> fib:cancel()
> ---
> ...
> +-- Yielding table minus.
> +minus_yield = util.table_minus_yield
> +---
> +...
> +minus_yield({}, {}, 1)
> +---
> +- []
> +...
> +minus_yield({}, {k = 1}, 1)
> +---
> +- []
> +...
> +minus_yield({}, {k = 1}, 0)
> +---
> +- []
> +...
> +minus_yield({k = 1}, {k = 1}, 0)
> +---
> +- []
> +...
> +minus_yield({k1 = 1, k2 = 2}, {k1 = 1, k3 = 3}, 10)
> +---
> +- k2: 2
> +...
> +minus_yield({k1 = 1, k2 = 2}, {k1 = 1, k2 = 2}, 10)
> +---
> +- []
> +...
> +-- Mismatching values are not deleted.
> +minus_yield({k1 = 1}, {k1 = 2}, 10)
> +---
> +- k1: 1
> +...
> +minus_yield({k1 = 1, k2 = 2, k3 = 3}, {k1 = 1, k2 = 222}, 10)
> +---
> +- k3: 3
> + k2: 2
> +...
> +do \
> + t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
> + yield_count = 0 \
> + f = fiber.create(function() \
> + local csw1 = fiber.info()[fiber.id()].csw \
> + minus_yield(t, {k2 = 2, k3 = 3, k5 = 5, k4 = 444}, 2) \
> + local csw2 = fiber.info()[fiber.id()].csw \
> + yield_count = csw2 - csw1 \
> + end) \
> + test_run:wait_cond(function() return f:status() == 'dead' end) \
> +end
> +---
> +...
> +yield_count
> +---
> +- 2
> +...
> +t
> +---
> +- k4: 4
> + k1: 1
> +...
> +-- Yielding table copy.
> +copy_yield = util.table_copy_yield
> +---
> +...
> +copy_yield({}, 1)
> +---
> +- []
> +...
> +copy_yield({k = 1}, 1)
> +---
> +- k: 1
> +...
> +copy_yield({k1 = 1, k2 = 2}, 1)
> +---
> +- k1: 1
> + k2: 2
> +...
> +do \
> + t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
> + res = nil \
> + yield_count = 0 \
> + f = fiber.create(function() \
> + local csw1 = fiber.info()[fiber.id()].csw \
> + res = copy_yield(t, 2) \
> + local csw2 = fiber.info()[fiber.id()].csw \
> + yield_count = csw2 - csw1 \
> + end) \
> + test_run:wait_cond(function() return f:status() == 'dead' end) \
> +end
> +---
> +...
> +yield_count
> +---
> +- 2
> +...
> +t
> +---
> +- k3: 3
> + k4: 4
> + k1: 1
> + k2: 2
> +...
> +res
> +---
> +- k3: 3
> + k4: 4
> + k1: 1
> + k2: 2
> +...
> +t ~= res
> +---
> +- true
> +...
> diff --git a/test/unit/util.test.lua b/test/unit/util.test.lua
> index 5f39e06..9550a95 100644
> --- a/test/unit/util.test.lua
> +++ b/test/unit/util.test.lua
> @@ -27,3 +27,52 @@ fib = util.reloadable_fiber_create('Worker_name', fake_M, 'reloadable_function')
> while not test_run:grep_log('default', 'module is reloaded, restarting') do fiber.sleep(0.01) end
> test_run:grep_log('default', 'reloadable_function has been started', 1000)
> fib:cancel()
> +
> +-- Yielding table minus.
> +minus_yield = util.table_minus_yield
> +minus_yield({}, {}, 1)
> +minus_yield({}, {k = 1}, 1)
> +minus_yield({}, {k = 1}, 0)
> +minus_yield({k = 1}, {k = 1}, 0)
> +minus_yield({k1 = 1, k2 = 2}, {k1 = 1, k3 = 3}, 10)
> +minus_yield({k1 = 1, k2 = 2}, {k1 = 1, k2 = 2}, 10)
> +-- Mismatching values are not deleted.
> +minus_yield({k1 = 1}, {k1 = 2}, 10)
> +minus_yield({k1 = 1, k2 = 2, k3 = 3}, {k1 = 1, k2 = 222}, 10)
> +
> +do \
> + t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
> + yield_count = 0 \
> + f = fiber.create(function() \
> + local csw1 = fiber.info()[fiber.id()].csw \
> + minus_yield(t, {k2 = 2, k3 = 3, k5 = 5, k4 = 444}, 2) \
> + local csw2 = fiber.info()[fiber.id()].csw \
> + yield_count = csw2 - csw1 \
> + end) \
> + test_run:wait_cond(function() return f:status() == 'dead' end) \
> +end
> +yield_count
> +t
> +
> +-- Yielding table copy.
> +copy_yield = util.table_copy_yield
> +copy_yield({}, 1)
> +copy_yield({k = 1}, 1)
> +copy_yield({k1 = 1, k2 = 2}, 1)
> +
> +do \
> + t = {k1 = 1, k2 = 2, k3 = 3, k4 = 4} \
> + res = nil \
> + yield_count = 0 \
> + f = fiber.create(function() \
> + local csw1 = fiber.info()[fiber.id()].csw \
> + res = copy_yield(t, 2) \
> + local csw2 = fiber.info()[fiber.id()].csw \
> + yield_count = csw2 - csw1 \
> + end) \
> + test_run:wait_cond(function() return f:status() == 'dead' end) \
> +end
> +yield_count
> +t
> +res
> +t ~= res
> diff --git a/vshard/util.lua b/vshard/util.lua
> index d3b4e67..2362607 100644
> --- a/vshard/util.lua
> +++ b/vshard/util.lua
> @@ -153,6 +153,44 @@ local function version_is_at_least(major_need, middle_need, minor_need)
> return minor >= minor_need
> end
>
> +--
> +-- Copy @a src table. Fiber yields every @a interval keys copied.
> +--
> +local function table_copy_yield(src, interval)
> + local res = {}
> + -- Time-To-Yield.
> + local tty = interval
> + for k, v in pairs(src) do
> + res[k] = v
> + tty = tty - 1
> + if tty <= 0 then
> + fiber.yield()
> + tty = interval
> + end
> + end
> + return res
> +end
> +
> +--
> +-- Remove @a src keys from @a dst if their values match. Fiber yields every
> +-- @a interval iterations.
> +--
> +local function table_minus_yield(dst, src, interval)
> + -- Time-To-Yield.
> + local tty = interval
> + for k, srcv in pairs(src) do
> + if dst[k] == srcv then
> + dst[k] = nil
> + end
> + tty = tty - 1
> + if tty <= 0 then
> + fiber.yield()
> + tty = interval
> + end
> + end
> + return dst
> +end
> +
> return {
> tuple_extract_key = tuple_extract_key,
> reloadable_fiber_create = reloadable_fiber_create,
> @@ -160,4 +198,6 @@ return {
> async_task = async_task,
> internal = M,
> version_is_at_least = version_is_at_least,
> + table_copy_yield = table_copy_yield,
> + table_minus_yield = table_minus_yield,
> }
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* [Tarantool-patches] [PATCH 6/9] cfg: introduce 'deprecated option' feature
2021-02-09 23:46 [Tarantool-patches] [PATCH 0/9] VShard Map-Reduce, part 1, preparations Vladislav Shpilevoy via Tarantool-patches
` (4 preceding siblings ...)
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 5/9] util: introduce yielding table functions Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-09 23:46 ` Vladislav Shpilevoy via Tarantool-patches
2021-02-10 8:59 ` Oleg Babin via Tarantool-patches
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 7/9] gc: introduce reactive garbage collector Vladislav Shpilevoy via Tarantool-patches
` (3 subsequent siblings)
9 siblings, 1 reply; 36+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-09 23:46 UTC (permalink / raw)
To: tarantool-patches, olegrok, yaroslav.dynnikov
Some options in vshard are going to be eventually deprecated. For
instance, 'weigts' will be renamed, 'collect_lua_garbage' may be
deleted since it appears not to be so useful, 'sync_timeout' is
totally unnecessary since any 'sync' can take a timeout per-call.
But the patch is motivated by 'collect_bucket_garbage_interval'
which is going to become unused in the new GC algorithm.
New GC will be reactive instead of proactive. Instead of periodically
polling the _bucket space, it will react to the relevant events
immediately. This will make the 'collect interval' unused.
The option will be deprecated, and in some distant future release
its usage will become an error.
Needed for #147
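The validation change in this patch is small: a template entry flagged `is_deprecated` short-circuits the normal type/presence checks and only logs a warning when the option is actually set. A hedged Python sketch of that control flow (the template contents are illustrative, not the real cfg.lua template):

```python
import warnings

# Hypothetical template mirroring the cfg.lua structure: deprecated
# options are only warned about; other options are validated as before.
CFG_TEMPLATE = {
    'collect_bucket_garbage_interval': {
        'name': 'Garbage bucket collect interval', 'is_deprecated': True,
    },
    'bucket_count': {'name': 'Bucket count', 'is_optional': False},
}

def validate_config(config):
    for key, tmpl in CFG_TEMPLATE.items():
        value = config.get(key)
        if tmpl.get('is_deprecated'):
            # A deprecated option is never an error, set or not.
            if value is not None:
                warnings.warn('Option "%s" is deprecated' % tmpl['name'])
        elif value is None:
            if not tmpl.get('is_optional'):
                raise ValueError('%s must be specified' % tmpl['name'])
```

The key design point matches the diff below: using a deprecated option does not break reconfiguration, it only produces a warning.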
---
vshard/cfg.lua | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/vshard/cfg.lua b/vshard/cfg.lua
index 1ef1899..28c3400 100644
--- a/vshard/cfg.lua
+++ b/vshard/cfg.lua
@@ -59,7 +59,11 @@ local function validate_config(config, template, check_arg)
local value = config[key]
local name = template_value.name
local expected_type = template_value.type
- if value == nil then
+ if template_value.is_deprecated then
+ if value ~= nil then
+ log.warn('Option "%s" is deprecated', name)
+ end
+ elseif value == nil then
if not template_value.is_optional then
error(string.format('%s must be specified', name))
else
--
2.24.3 (Apple Git-128)
* Re: [Tarantool-patches] [PATCH 6/9] cfg: introduce 'deprecated option' feature
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 6/9] cfg: introduce 'deprecated option' feature Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-10 8:59 ` Oleg Babin via Tarantool-patches
2021-02-10 22:34 ` Vladislav Shpilevoy via Tarantool-patches
0 siblings, 1 reply; 36+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-02-10 8:59 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov
Thanks for your patch!
Is it possible to extend the log message to "Option is deprecated and has no
effect anymore"?
Also, for some options a message like "Option is deprecated, use ...
instead" could be useful (e.g. for "weights").
It seems this should be more configurable and give the user a hint about what to do.
On 10/02/2021 02:46, Vladislav Shpilevoy wrote:
> Some options in vshard are going to be eventually deprecated. For
> instance, 'weigts' will be renamed, 'collect_lua_garbage' may be
typo: weigts -> weights
> deleted since it appears not to be so useful, 'sync_timeout' is
> totally unnecessary since any 'sync' can take a timeout per-call.
>
> But the patch is motivated by 'collect_bucket_garbage_interval'
> which is going to become unused in the new GC algorithm.
>
> New GC will be reactive instead of proactive. Instead of periodic
> polling of _bucket space it will react on needed events
> immediately. This will make the 'collect interval' unused.
>
> The option will be deprecated and eventually in some far future
> release its usage will lead to an error.
>
> Needed for #147
> ---
> vshard/cfg.lua | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/vshard/cfg.lua b/vshard/cfg.lua
> index 1ef1899..28c3400 100644
> --- a/vshard/cfg.lua
> +++ b/vshard/cfg.lua
> @@ -59,7 +59,11 @@ local function validate_config(config, template, check_arg)
> local value = config[key]
> local name = template_value.name
> local expected_type = template_value.type
> - if value == nil then
> + if template_value.is_deprecated then
> + if value ~= nil then
> + log.warn('Option "%s" is deprecated', name)
> + end
> + elseif value == nil then
> if not template_value.is_optional then
> error(string.format('%s must be specified', name))
> else
* Re: [Tarantool-patches] [PATCH 6/9] cfg: introduce 'deprecated option' feature
2021-02-10 8:59 ` Oleg Babin via Tarantool-patches
@ 2021-02-10 22:34 ` Vladislav Shpilevoy via Tarantool-patches
2021-02-11 6:50 ` Oleg Babin via Tarantool-patches
0 siblings, 1 reply; 36+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-10 22:34 UTC (permalink / raw)
To: Oleg Babin, tarantool-patches, yaroslav.dynnikov
Thanks for the review!
On 10.02.2021 09:59, Oleg Babin wrote:
> Thanks for your patch!
>
> Is it possible to extend log message to "Option is deprecated and has no effect anymore"?
Good idea. See the diff in this commit.
====================
diff --git a/vshard/cfg.lua b/vshard/cfg.lua
index 28c3400..f7d5dbc 100644
--- a/vshard/cfg.lua
+++ b/vshard/cfg.lua
@@ -61,7 +61,13 @@ local function validate_config(config, template, check_arg)
local expected_type = template_value.type
if template_value.is_deprecated then
if value ~= nil then
- log.warn('Option "%s" is deprecated', name)
+ local reason = template_value.reason
+ if reason then
+ reason = '. '..reason
+ else
+ reason = ''
+ end
+ log.warn('Option "%s" is deprecated'..reason, name)
end
elseif value == nil then
if not template_value.is_optional then
====================
And in the next commit:
====================
diff --git a/vshard/cfg.lua b/vshard/cfg.lua
index f7dd4c1..63d5414 100644
--- a/vshard/cfg.lua
+++ b/vshard/cfg.lua
@@ -252,6 +252,7 @@ local cfg_template = {
},
collect_bucket_garbage_interval = {
name = 'Garbage bucket collect interval', is_deprecated = true,
+ reason = 'Has no effect anymore'
},
collect_lua_garbage = {
type = 'boolean', name = 'Garbage Lua collect necessity',
====================
> Also for some options could be useful: "Option is deprecated, use ... instead" (e.g. for "weights").
With the updated version I can specify any 'reason', such as
'has no effect', 'use ... instead', etc.
> Seems it should be more configurable and gives some hint for user to do.
>
>
> On 10/02/2021 02:46, Vladislav Shpilevoy wrote:
>> Some options in vshard are going to be eventually deprecated. For
>> instance, 'weigts' will be renamed, 'collect_lua_garbage' may be
>
>
> typo: weigts -> weights
Fixed. See the full new patch below.
====================
cfg: introduce 'deprecated option' feature
Some options in vshard are going to be eventually deprecated. For
instance, 'weights' will be renamed, 'collect_lua_garbage' may be
deleted since it appears not to be so useful, 'sync_timeout' is
totally unnecessary since any 'sync' can take a timeout per-call.
But the patch is motivated by 'collect_bucket_garbage_interval'
which is going to become unused in the new GC algorithm.
New GC will be reactive instead of proactive. Instead of periodically
polling the _bucket space, it will react to the relevant events
immediately. This will make the 'collect interval' unused.
The option will be deprecated and eventually in some far future
release its usage will lead to an error.
Needed for #147
diff --git a/vshard/cfg.lua b/vshard/cfg.lua
index 1ef1899..f7d5dbc 100644
--- a/vshard/cfg.lua
+++ b/vshard/cfg.lua
@@ -59,7 +59,17 @@ local function validate_config(config, template, check_arg)
local value = config[key]
local name = template_value.name
local expected_type = template_value.type
- if value == nil then
+ if template_value.is_deprecated then
+ if value ~= nil then
+ local reason = template_value.reason
+ if reason then
+ reason = '. '..reason
+ else
+ reason = ''
+ end
+ log.warn('Option "%s" is deprecated'..reason, name)
+ end
+ elseif value == nil then
if not template_value.is_optional then
error(string.format('%s must be specified', name))
else
====================
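The reason-suffix handling in the final patch above is just careful string building: the format placeholder must stay before the appended reason so `log.warn` still substitutes the option name. A small Python rendering of that message construction:

```python
def deprecation_message(name, reason=None):
    # Mirrors the string building in the diff: the optional reason is
    # appended after a period, and the option name fills the placeholder.
    msg = 'Option "%s" is deprecated' % name
    if reason:
        msg += '. ' + reason
    return msg
```

So `collect_bucket_garbage_interval` with reason `'Has no effect anymore'` yields the full warning text, while options without a reason keep the short form.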
* Re: [Tarantool-patches] [PATCH 6/9] cfg: introduce 'deprecated option' feature
2021-02-10 22:34 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-11 6:50 ` Oleg Babin via Tarantool-patches
0 siblings, 0 replies; 36+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-02-11 6:50 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov
Thanks for your fixes. LGTM!
On 11/02/2021 01:34, Vladislav Shpilevoy wrote:
> Thanks for the review!
>
> On 10.02.2021 09:59, Oleg Babin wrote:
>> Thanks for your patch!
>>
>> Is it possible to extend log message to "Option is deprecated and has no effect anymore"?
> Good idea. See the diff in this commit.
>
> ====================
> diff --git a/vshard/cfg.lua b/vshard/cfg.lua
> index 28c3400..f7d5dbc 100644
> --- a/vshard/cfg.lua
> +++ b/vshard/cfg.lua
> @@ -61,7 +61,13 @@ local function validate_config(config, template, check_arg)
> local expected_type = template_value.type
> if template_value.is_deprecated then
> if value ~= nil then
> - log.warn('Option "%s" is deprecated', name)
> + local reason = template_value.reason
> + if reason then
> + reason = '. '..reason
> + else
> + reason = ''
> + end
> + log.warn('Option "%s" is deprecated'..reason, name)
> end
> elseif value == nil then
> if not template_value.is_optional then
> ====================
>
> And in the next commit:
>
> ====================
> diff --git a/vshard/cfg.lua b/vshard/cfg.lua
> index f7dd4c1..63d5414 100644
> --- a/vshard/cfg.lua
> +++ b/vshard/cfg.lua
> @@ -252,6 +252,7 @@ local cfg_template = {
> },
> collect_bucket_garbage_interval = {
> name = 'Garbage bucket collect interval', is_deprecated = true,
> + reason = 'Has no effect anymore'
> },
> collect_lua_garbage = {
> type = 'boolean', name = 'Garbage Lua collect necessity',
>
> ====================
>
>> Also for some options could be useful: "Option is deprecated, use ... instead" (e.g. for "weights").
> With the updated version I can specify any 'reason'. Such as
> 'has no affect', 'use ... instead', etc.
>
>> Seems it should be more configurable and gives some hint for user to do.
>>
>>
>> On 10/02/2021 02:46, Vladislav Shpilevoy wrote:
>>> Some options in vshard are going to be eventually deprecated. For
>>> instance, 'weigts' will be renamed, 'collect_lua_garbage' may be
>> typo: weigts -> weights
> Fixed. See the full new patch below.
>
> ====================
> cfg: introduce 'deprecated option' feature
>
> Some options in vshard are going to be eventually deprecated. For
> instance, 'weights' will be renamed, 'collect_lua_garbage' may be
> deleted since it appears not to be so useful, 'sync_timeout' is
> totally unnecessary since any 'sync' can take a timeout per-call.
>
> But the patch is motivated by 'collect_bucket_garbage_interval'
> which is going to become unused in the new GC algorithm.
>
> New GC will be reactive instead of proactive. Instead of periodic
> polling of _bucket space it will react on needed events
> immediately. This will make the 'collect interval' unused.
>
> The option will be deprecated and eventually in some far future
> release its usage will lead to an error.
>
> Needed for #147
>
> diff --git a/vshard/cfg.lua b/vshard/cfg.lua
> index 1ef1899..f7d5dbc 100644
> --- a/vshard/cfg.lua
> +++ b/vshard/cfg.lua
> @@ -59,7 +59,17 @@ local function validate_config(config, template, check_arg)
> local value = config[key]
> local name = template_value.name
> local expected_type = template_value.type
> - if value == nil then
> + if template_value.is_deprecated then
> + if value ~= nil then
> + local reason = template_value.reason
> + if reason then
> + reason = '. '..reason
> + else
> + reason = ''
> + end
> + log.warn('Option "%s" is deprecated'..reason, name)
> + end
> + elseif value == nil then
> if not template_value.is_optional then
> error(string.format('%s must be specified', name))
> else
>
> ====================
* [Tarantool-patches] [PATCH 7/9] gc: introduce reactive garbage collector
2021-02-09 23:46 [Tarantool-patches] [PATCH 0/9] VShard Map-Reduce, part 1, preparations Vladislav Shpilevoy via Tarantool-patches
` (5 preceding siblings ...)
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 6/9] cfg: introduce 'deprecated option' feature Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-09 23:46 ` Vladislav Shpilevoy via Tarantool-patches
2021-02-10 9:00 ` Oleg Babin via Tarantool-patches
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 8/9] recovery: introduce reactive recovery Vladislav Shpilevoy via Tarantool-patches
` (2 subsequent siblings)
9 siblings, 1 reply; 36+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-09 23:46 UTC (permalink / raw)
To: tarantool-patches, olegrok, yaroslav.dynnikov
The garbage collector is a fiber on a master node which deletes
GARBAGE and SENT buckets along with their data.
It was proactive: it used to wake up at a constant interval to
find and delete such buckets.
But this won't work with the future 'map-reduce' feature.
As a preparation stage, map-reduce will need to ensure that all
buckets on a storage are readable and writable. With the current
GC algorithm, if a bucket is sent, it won't be deleted for the next
5 seconds by default. During this time no new map-reduce requests
can execute.
This is not acceptable. Neither is a too frequent wakeup of the GC
fiber, because it would waste TX thread time.
The patch makes the GC fiber wake up not on a timeout but on events
happening in the _bucket space. The GC fiber sleeps on a condition
variable which is signaled when _bucket is changed.
Once GC sees work to do, it won't sleep until the work is done. It
will only yield.
This makes GC delete SENT and GARBAGE buckets as soon as possible,
reducing the waiting time for incoming map-reduce requests.
Needed for #147
@TarantoolBot document
Title: VShard: deprecate cfg option 'collect_bucket_garbage_interval'
It was used to specify the interval between bucket garbage
collection steps. It was needed because garbage collection in
vshard was proactive. It didn't react to newly appeared garbage
buckets immediately.
Starting from 0.1.17, garbage collection is reactive. It starts
processing garbage buckets immediately as they appear and sleeps
the rest of the time. The option is no longer used and does not
affect anything.
I suppose it can be deleted from the documentation. Or left with
a big label 'deprecated' + the explanation above.
An attempt to use the option does not cause an error, but logs a
warning.
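The reactive scheme described above — sleep on a condition variable, get signaled by a `_bucket` trigger, and drain the whole backlog before sleeping again — can be sketched with a thread standing in for the fiber and `threading.Condition` standing in for `fiber.cond`. Names like `BucketGC` and `on_bucket_change` are illustrative, not the vshard API:

```python
import threading

class BucketGC:
    """Event-driven collector: sleeps on a condition variable instead of polling."""
    def __init__(self):
        self.cond = threading.Condition()
        self.garbage = []    # ids of buckets that became SENT/GARBAGE
        self.collected = []
        self.stopped = False
        self.worker = threading.Thread(target=self._run)
        self.worker.start()

    def on_bucket_change(self, bucket_id):
        # Analogue of the _bucket trigger: record the event and signal the GC.
        with self.cond:
            self.garbage.append(bucket_id)
            self.cond.notify()

    def stop(self):
        with self.cond:
            self.stopped = True
            self.cond.notify()
        self.worker.join()

    def _run(self):
        while True:
            with self.cond:
                # Sleep only while there is nothing to do.
                while not self.garbage and not self.stopped:
                    self.cond.wait()
                if self.stopped and not self.garbage:
                    return
                batch, self.garbage = self.garbage, []
            # Drop the whole backlog before sleeping again (the real GC
            # only yields here, it never sleeps mid-work).
            self.collected.extend(batch)
```

This is why the test helper below no longer needs `garbage_collector_wakeup()`: a sent bucket is collected as soon as the trigger fires, with no fixed-interval delay.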
---
test/lua_libs/storage_template.lua | 1 -
test/misc/reconfigure.result | 10 -
test/misc/reconfigure.test.lua | 3 -
test/rebalancer/bucket_ref.result | 12 --
test/rebalancer/bucket_ref.test.lua | 3 -
test/rebalancer/errinj.result | 11 --
test/rebalancer/errinj.test.lua | 5 -
test/rebalancer/receiving_bucket.result | 8 -
test/rebalancer/receiving_bucket.test.lua | 1 -
test/reload_evolution/storage.result | 2 +-
test/router/reroute_wrong_bucket.result | 8 +-
test/router/reroute_wrong_bucket.test.lua | 4 +-
test/storage/recovery.result | 3 +-
test/storage/storage.result | 10 +-
test/storage/storage.test.lua | 1 +
test/unit/config.result | 35 +---
test/unit/config.test.lua | 16 +-
test/unit/garbage.result | 106 ++++++----
test/unit/garbage.test.lua | 47 +++--
test/unit/garbage_errinj.result | 223 ----------------------
test/unit/garbage_errinj.test.lua | 73 -------
vshard/cfg.lua | 4 +-
vshard/consts.lua | 5 +-
vshard/storage/init.lua | 207 ++++++++++----------
vshard/storage/reload_evolution.lua | 8 +
25 files changed, 233 insertions(+), 573 deletions(-)
delete mode 100644 test/unit/garbage_errinj.result
delete mode 100644 test/unit/garbage_errinj.test.lua
diff --git a/test/lua_libs/storage_template.lua b/test/lua_libs/storage_template.lua
index 21409bd..8df89f6 100644
--- a/test/lua_libs/storage_template.lua
+++ b/test/lua_libs/storage_template.lua
@@ -172,6 +172,5 @@ function wait_bucket_is_collected(id)
return true
end
vshard.storage.recovery_wakeup()
- vshard.storage.garbage_collector_wakeup()
end)
end
diff --git a/test/misc/reconfigure.result b/test/misc/reconfigure.result
index 168be5d..3b34841 100644
--- a/test/misc/reconfigure.result
+++ b/test/misc/reconfigure.result
@@ -83,9 +83,6 @@ cfg.collect_lua_garbage = true
cfg.rebalancer_max_receiving = 1000
---
...
-cfg.collect_bucket_garbage_interval = 100
----
-...
cfg.invalid_option = 'kek'
---
...
@@ -105,10 +102,6 @@ vshard.storage.internal.rebalancer_max_receiving ~= 1000
---
- true
...
-vshard.storage.internal.collect_bucket_garbage_interval ~= 100
----
-- true
-...
cfg.sync_timeout = nil
---
...
@@ -118,9 +111,6 @@ cfg.collect_lua_garbage = nil
cfg.rebalancer_max_receiving = nil
---
...
-cfg.collect_bucket_garbage_interval = nil
----
-...
cfg.invalid_option = nil
---
...
diff --git a/test/misc/reconfigure.test.lua b/test/misc/reconfigure.test.lua
index e891010..348628c 100644
--- a/test/misc/reconfigure.test.lua
+++ b/test/misc/reconfigure.test.lua
@@ -33,17 +33,14 @@ vshard.storage.internal.sync_timeout
cfg.sync_timeout = 100
cfg.collect_lua_garbage = true
cfg.rebalancer_max_receiving = 1000
-cfg.collect_bucket_garbage_interval = 100
cfg.invalid_option = 'kek'
vshard.storage.cfg(cfg, util.name_to_uuid.storage_1_a)
not vshard.storage.internal.collect_lua_garbage
vshard.storage.internal.sync_timeout
vshard.storage.internal.rebalancer_max_receiving ~= 1000
-vshard.storage.internal.collect_bucket_garbage_interval ~= 100
cfg.sync_timeout = nil
cfg.collect_lua_garbage = nil
cfg.rebalancer_max_receiving = nil
-cfg.collect_bucket_garbage_interval = nil
cfg.invalid_option = nil
--
diff --git a/test/rebalancer/bucket_ref.result b/test/rebalancer/bucket_ref.result
index b8fc7ff..9df7480 100644
--- a/test/rebalancer/bucket_ref.result
+++ b/test/rebalancer/bucket_ref.result
@@ -184,9 +184,6 @@ vshard.storage.bucket_unref(1, 'read')
- true
...
-- Force GC to take an RO lock on the bucket now.
-vshard.storage.garbage_collector_wakeup()
----
-...
vshard.storage.buckets_info(1)
---
- 1:
@@ -203,7 +200,6 @@ while true do
if i.status == vshard.consts.BUCKET.GARBAGE and i.ro_lock then
break
end
- vshard.storage.garbage_collector_wakeup()
fiber.sleep(0.01)
end;
---
@@ -235,14 +231,6 @@ finish_refs = true
while f1:status() ~= 'dead' do fiber.sleep(0.01) end
---
...
-vshard.storage.buckets_info(1)
----
-- 1:
- status: garbage
- ro_lock: true
- destination: <replicaset_2>
- id: 1
-...
wait_bucket_is_collected(1)
---
...
diff --git a/test/rebalancer/bucket_ref.test.lua b/test/rebalancer/bucket_ref.test.lua
index 213ced3..1b032ff 100644
--- a/test/rebalancer/bucket_ref.test.lua
+++ b/test/rebalancer/bucket_ref.test.lua
@@ -56,7 +56,6 @@ vshard.storage.bucket_unref(1, 'write') -- Error, no refs.
vshard.storage.bucket_ref(1, 'read')
vshard.storage.bucket_unref(1, 'read')
-- Force GC to take an RO lock on the bucket now.
-vshard.storage.garbage_collector_wakeup()
vshard.storage.buckets_info(1)
_ = test_run:cmd("setopt delimiter ';'")
while true do
@@ -64,7 +63,6 @@ while true do
if i.status == vshard.consts.BUCKET.GARBAGE and i.ro_lock then
break
end
- vshard.storage.garbage_collector_wakeup()
fiber.sleep(0.01)
end;
_ = test_run:cmd("setopt delimiter ''");
@@ -72,7 +70,6 @@ vshard.storage.buckets_info(1)
vshard.storage.bucket_refro(1)
finish_refs = true
while f1:status() ~= 'dead' do fiber.sleep(0.01) end
-vshard.storage.buckets_info(1)
wait_bucket_is_collected(1)
_ = test_run:switch('box_2_a')
vshard.storage.buckets_info(1)
diff --git a/test/rebalancer/errinj.result b/test/rebalancer/errinj.result
index e50eb72..0ddb1c9 100644
--- a/test/rebalancer/errinj.result
+++ b/test/rebalancer/errinj.result
@@ -226,17 +226,6 @@ ret2, err2
- true
- null
...
-_bucket:get{35}
----
-- [35, 'sent', '<replicaset_2>']
-...
-_bucket:get{36}
----
-- [36, 'sent', '<replicaset_2>']
-...
--- Buckets became 'active' on box_2_a, but still are sending on
--- box_1_a. Wait until it is marked as garbage on box_1_a by the
--- recovery fiber.
wait_bucket_is_collected(35)
---
...
diff --git a/test/rebalancer/errinj.test.lua b/test/rebalancer/errinj.test.lua
index 2cc4a69..a60f3d7 100644
--- a/test/rebalancer/errinj.test.lua
+++ b/test/rebalancer/errinj.test.lua
@@ -102,11 +102,6 @@ _ = test_run:switch('box_1_a')
while f1:status() ~= 'dead' or f2:status() ~= 'dead' do fiber.sleep(0.001) end
ret1, err1
ret2, err2
-_bucket:get{35}
-_bucket:get{36}
--- Buckets became 'active' on box_2_a, but still are sending on
--- box_1_a. Wait until it is marked as garbage on box_1_a by the
--- recovery fiber.
wait_bucket_is_collected(35)
wait_bucket_is_collected(36)
_ = test_run:switch('box_2_a')
diff --git a/test/rebalancer/receiving_bucket.result b/test/rebalancer/receiving_bucket.result
index 7d3612b..ad93445 100644
--- a/test/rebalancer/receiving_bucket.result
+++ b/test/rebalancer/receiving_bucket.result
@@ -366,14 +366,6 @@ vshard.storage.bucket_send(1, util.replicasets[1], {timeout = 0.3})
---
- true
...
-vshard.storage.buckets_info(1)
----
-- 1:
- status: sent
- ro_lock: true
- destination: <replicaset_1>
- id: 1
-...
wait_bucket_is_collected(1)
---
...
diff --git a/test/rebalancer/receiving_bucket.test.lua b/test/rebalancer/receiving_bucket.test.lua
index 24534b3..2cf6382 100644
--- a/test/rebalancer/receiving_bucket.test.lua
+++ b/test/rebalancer/receiving_bucket.test.lua
@@ -136,7 +136,6 @@ box.space.test3:select{100}
-- Now the bucket is unreferenced and can be transferred.
_ = test_run:switch('box_2_a')
vshard.storage.bucket_send(1, util.replicasets[1], {timeout = 0.3})
-vshard.storage.buckets_info(1)
wait_bucket_is_collected(1)
vshard.storage.buckets_info(1)
_ = test_run:switch('box_1_a')
diff --git a/test/reload_evolution/storage.result b/test/reload_evolution/storage.result
index 753687f..9d30a04 100644
--- a/test/reload_evolution/storage.result
+++ b/test/reload_evolution/storage.result
@@ -92,7 +92,7 @@ test_run:grep_log('storage_2_a', 'vshard.storage.reload_evolution: upgraded to')
...
vshard.storage.internal.reload_version
---
-- 2
+- 3
...
--
-- gh-237: should be only one trigger. During gh-237 the trigger installation
diff --git a/test/router/reroute_wrong_bucket.result b/test/router/reroute_wrong_bucket.result
index 049bdef..ac340eb 100644
--- a/test/router/reroute_wrong_bucket.result
+++ b/test/router/reroute_wrong_bucket.result
@@ -37,7 +37,7 @@ test_run:switch('storage_1_a')
---
- true
...
-cfg.collect_bucket_garbage_interval = 100
+vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
---
...
vshard.storage.cfg(cfg, util.name_to_uuid.storage_1_a)
@@ -53,7 +53,7 @@ test_run:switch('storage_2_a')
---
- true
...
-cfg.collect_bucket_garbage_interval = 100
+vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
---
...
vshard.storage.cfg(cfg, util.name_to_uuid.storage_2_a)
@@ -202,12 +202,12 @@ test_run:grep_log('router_1', 'please update configuration')
err
---
- bucket_id: 100
- reason: write is prohibited
+ reason: Not found
code: 1
destination: ac522f65-aa94-4134-9f64-51ee384f1a54
type: ShardingError
name: WRONG_BUCKET
- message: 'Cannot perform action with bucket 100, reason: write is prohibited'
+ message: 'Cannot perform action with bucket 100, reason: Not found'
...
--
-- Now try again, but update configuration during call(). It must
diff --git a/test/router/reroute_wrong_bucket.test.lua b/test/router/reroute_wrong_bucket.test.lua
index 9e6e804..207aac3 100644
--- a/test/router/reroute_wrong_bucket.test.lua
+++ b/test/router/reroute_wrong_bucket.test.lua
@@ -11,13 +11,13 @@ util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, 'bootstrap_storage(\'memt
test_run:cmd('create server router_1 with script="router/router_1.lua"')
test_run:cmd('start server router_1')
test_run:switch('storage_1_a')
-cfg.collect_bucket_garbage_interval = 100
+vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
vshard.storage.cfg(cfg, util.name_to_uuid.storage_1_a)
vshard.storage.rebalancer_disable()
for i = 1, 100 do box.space._bucket:replace{i, vshard.consts.BUCKET.ACTIVE} end
test_run:switch('storage_2_a')
-cfg.collect_bucket_garbage_interval = 100
+vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
vshard.storage.cfg(cfg, util.name_to_uuid.storage_2_a)
vshard.storage.rebalancer_disable()
for i = 101, 200 do box.space._bucket:replace{i, vshard.consts.BUCKET.ACTIVE} end
diff --git a/test/storage/recovery.result b/test/storage/recovery.result
index f833fe7..8ccb0b9 100644
--- a/test/storage/recovery.result
+++ b/test/storage/recovery.result
@@ -79,8 +79,7 @@ _bucket = box.space._bucket
...
_bucket:select{}
---
-- - [2, 'garbage', '<replicaset_2>']
- - [3, 'garbage', '<replicaset_2>']
+- []
...
_ = test_run:switch('storage_2_a')
---
diff --git a/test/storage/storage.result b/test/storage/storage.result
index 424bc4c..0550ad1 100644
--- a/test/storage/storage.result
+++ b/test/storage/storage.result
@@ -547,6 +547,9 @@ vshard.storage.bucket_send(1, util.replicasets[2])
---
- true
...
+wait_bucket_is_collected(1)
+---
+...
_ = test_run:switch("storage_2_a")
---
...
@@ -567,12 +570,7 @@ _ = test_run:switch("storage_1_a")
...
vshard.storage.buckets_info()
---
-- 1:
- status: sent
- ro_lock: true
- destination: <replicaset_2>
- id: 1
- 2:
+- 2:
status: active
id: 2
...
diff --git a/test/storage/storage.test.lua b/test/storage/storage.test.lua
index d631b51..d8fbd94 100644
--- a/test/storage/storage.test.lua
+++ b/test/storage/storage.test.lua
@@ -136,6 +136,7 @@ vshard.storage.bucket_send(1, util.replicasets[1])
-- Successful transfer.
vshard.storage.bucket_send(1, util.replicasets[2])
+wait_bucket_is_collected(1)
_ = test_run:switch("storage_2_a")
vshard.storage.buckets_info()
_ = test_run:switch("storage_1_a")
diff --git a/test/unit/config.result b/test/unit/config.result
index dfd0219..e0b2482 100644
--- a/test/unit/config.result
+++ b/test/unit/config.result
@@ -428,33 +428,6 @@ _ = lcfg.check(cfg)
--
-- gh-77: garbage collection options.
--
-cfg.collect_bucket_garbage_interval = 'str'
----
-...
-check(cfg)
----
-- Garbage bucket collect interval must be positive number
-...
-cfg.collect_bucket_garbage_interval = 0
----
-...
-check(cfg)
----
-- Garbage bucket collect interval must be positive number
-...
-cfg.collect_bucket_garbage_interval = -1
----
-...
-check(cfg)
----
-- Garbage bucket collect interval must be positive number
-...
-cfg.collect_bucket_garbage_interval = 100.5
----
-...
-_ = lcfg.check(cfg)
----
-...
cfg.collect_lua_garbage = 100
---
...
@@ -615,6 +588,12 @@ lcfg.check(cfg).rebalancer_max_sending
cfg.rebalancer_max_sending = nil
---
...
-cfg.sharding = nil
+--
+-- Deprecated option does not break anything.
+--
+cfg.collect_bucket_garbage_interval = 100
+---
+...
+_ = lcfg.check(cfg)
---
...
diff --git a/test/unit/config.test.lua b/test/unit/config.test.lua
index ada43db..a1c9f07 100644
--- a/test/unit/config.test.lua
+++ b/test/unit/config.test.lua
@@ -175,15 +175,6 @@ _ = lcfg.check(cfg)
--
-- gh-77: garbage collection options.
--
-cfg.collect_bucket_garbage_interval = 'str'
-check(cfg)
-cfg.collect_bucket_garbage_interval = 0
-check(cfg)
-cfg.collect_bucket_garbage_interval = -1
-check(cfg)
-cfg.collect_bucket_garbage_interval = 100.5
-_ = lcfg.check(cfg)
-
cfg.collect_lua_garbage = 100
check(cfg)
cfg.collect_lua_garbage = true
@@ -244,4 +235,9 @@ util.check_error(lcfg.check, cfg)
cfg.rebalancer_max_sending = 15
lcfg.check(cfg).rebalancer_max_sending
cfg.rebalancer_max_sending = nil
-cfg.sharding = nil
+
+--
+-- Deprecated option does not break anything.
+--
+cfg.collect_bucket_garbage_interval = 100
+_ = lcfg.check(cfg)
diff --git a/test/unit/garbage.result b/test/unit/garbage.result
index 74d9ccf..a530496 100644
--- a/test/unit/garbage.result
+++ b/test/unit/garbage.result
@@ -31,9 +31,6 @@ test_run:cmd("setopt delimiter ''");
vshard.storage.internal.shard_index = 'bucket_id'
---
...
-vshard.storage.internal.collect_bucket_garbage_interval = vshard.consts.DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL
----
-...
--
-- Find nothing if no bucket_id anywhere, or there is no index
-- by it, or bucket_id is not unsigned.
@@ -151,6 +148,9 @@ format[1] = {name = 'id', type = 'unsigned'}
format[2] = {name = 'status', type = 'string'}
---
...
+format[3] = {name = 'destination', type = 'string', is_nullable = true}
+---
+...
_bucket = box.schema.create_space('_bucket', {format = format})
---
...
@@ -172,22 +172,6 @@ _bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
---
- [3, 'active']
...
-_bucket:replace{4, vshard.consts.BUCKET.SENT}
----
-- [4, 'sent']
-...
-_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
----
-- [5, 'garbage']
-...
-_bucket:replace{6, vshard.consts.BUCKET.GARBAGE}
----
-- [6, 'garbage']
-...
-_bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
----
-- [200, 'garbage']
-...
s = box.schema.create_space('test', {engine = engine})
---
...
@@ -213,7 +197,7 @@ s:replace{4, 2}
---
- [4, 2]
...
-gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
+gc_bucket_drop = vshard.storage.internal.gc_bucket_drop
---
...
s2 = box.schema.create_space('test2', {engine = engine})
@@ -249,6 +233,10 @@ function fill_spaces_with_garbage()
s2:replace{6, 4}
s2:replace{7, 5}
s2:replace{7, 6}
+ _bucket:replace{4, vshard.consts.BUCKET.SENT, 'destination1'}
+ _bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
+ _bucket:replace{6, vshard.consts.BUCKET.GARBAGE, 'destination2'}
+ _bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
end;
---
...
@@ -267,12 +255,22 @@ fill_spaces_with_garbage()
---
- 1107
...
-gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
+route_map = {}
+---
+...
+gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
---
-- - 5
- - 6
- - 200
- true
+- null
+...
+route_map
+---
+- - null
+ - null
+ - null
+ - null
+ - null
+ - destination2
...
#s2:select{}
---
@@ -282,10 +280,20 @@ gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
---
- 7
...
-gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
+route_map = {}
+---
+...
+gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
---
-- - 4
- true
+- null
+...
+route_map
+---
+- - null
+ - null
+ - null
+ - destination1
...
s2:select{}
---
@@ -303,17 +311,22 @@ s:select{}
- [6, 100]
...
-- Nothing deleted - update collected generation.
-gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
+route_map = {}
+---
+...
+gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
---
-- - 5
- - 6
- - 200
- true
+- null
...
-gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
+gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
---
-- - 4
- true
+- null
+...
+route_map
+---
+- []
...
#s2:select{}
---
@@ -329,15 +342,20 @@ gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
fill_spaces_with_garbage()
---
...
-_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
+_ = _bucket:on_replace(function() \
+ local gen = vshard.storage.internal.bucket_generation \
+ vshard.storage.internal.bucket_generation = gen + 1 \
+ vshard.storage.internal.bucket_generation_cond:broadcast() \
+end)
---
...
f = fiber.create(vshard.storage.internal.gc_bucket_f)
---
...
-- Wait until garbage collection is finished.
-while s2:count() ~= 3 or s:count() ~= 6 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return s2:count() == 3 and s:count() == 6 end)
---
+- true
...
s:select{}
---
@@ -360,7 +378,6 @@ _bucket:select{}
- - [1, 'active']
- [2, 'receiving']
- [3, 'active']
- - [4, 'sent']
...
--
-- Test deletion of 'sent' buckets after a specified timeout.
@@ -370,8 +387,9 @@ _bucket:replace{2, vshard.consts.BUCKET.SENT}
- [2, 'sent']
...
-- Wait deletion after a while.
-while _bucket:get{2} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return not _bucket:get{2} end)
---
+- true
...
_bucket:select{}
---
@@ -410,8 +428,9 @@ _bucket:replace{4, vshard.consts.BUCKET.SENT}
---
- [4, 'sent']
...
-while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return not _bucket:get{4} end)
---
+- true
...
--
-- Test WAL errors during deletion from _bucket.
@@ -434,11 +453,14 @@ s:replace{6, 4}
---
- [6, 4]
...
-while not test_run:grep_log("default", "Error during deletion of empty sent buckets") do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_log('default', 'Error during garbage collection step', \
+ 65536, 10)
---
+- Error during garbage collection step
...
-while #sk:select{4} ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return #sk:select{4} == 0 end)
---
+- true
...
s:select{}
---
@@ -454,8 +476,9 @@ _bucket:select{}
_ = _bucket:on_replace(nil, rollback_on_delete)
---
...
-while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return not _bucket:get{4} end)
---
+- true
...
f:cancel()
---
@@ -562,8 +585,9 @@ for i = 1, 2000 do _bucket:replace{i, vshard.consts.BUCKET.GARBAGE} s:replace{i,
f = fiber.create(vshard.storage.internal.gc_bucket_f)
---
...
-while _bucket:count() ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return _bucket:count() == 0 end)
---
+- true
...
_bucket:select{}
---
diff --git a/test/unit/garbage.test.lua b/test/unit/garbage.test.lua
index 30079fa..250afb0 100644
--- a/test/unit/garbage.test.lua
+++ b/test/unit/garbage.test.lua
@@ -15,7 +15,6 @@ end;
test_run:cmd("setopt delimiter ''");
vshard.storage.internal.shard_index = 'bucket_id'
-vshard.storage.internal.collect_bucket_garbage_interval = vshard.consts.DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL
--
-- Find nothing if no bucket_id anywhere, or there is no index
@@ -75,16 +74,13 @@ s:drop()
format = {}
format[1] = {name = 'id', type = 'unsigned'}
format[2] = {name = 'status', type = 'string'}
+format[3] = {name = 'destination', type = 'string', is_nullable = true}
_bucket = box.schema.create_space('_bucket', {format = format})
_ = _bucket:create_index('pk')
_ = _bucket:create_index('status', {parts = {{2, 'string'}}, unique = false})
_bucket:replace{1, vshard.consts.BUCKET.ACTIVE}
_bucket:replace{2, vshard.consts.BUCKET.RECEIVING}
_bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
-_bucket:replace{4, vshard.consts.BUCKET.SENT}
-_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
-_bucket:replace{6, vshard.consts.BUCKET.GARBAGE}
-_bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
s = box.schema.create_space('test', {engine = engine})
pk = s:create_index('pk')
@@ -94,7 +90,7 @@ s:replace{2, 1}
s:replace{3, 2}
s:replace{4, 2}
-gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
+gc_bucket_drop = vshard.storage.internal.gc_bucket_drop
s2 = box.schema.create_space('test2', {engine = engine})
pk2 = s2:create_index('pk')
sk2 = s2:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
@@ -114,6 +110,10 @@ function fill_spaces_with_garbage()
s2:replace{6, 4}
s2:replace{7, 5}
s2:replace{7, 6}
+ _bucket:replace{4, vshard.consts.BUCKET.SENT, 'destination1'}
+ _bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
+ _bucket:replace{6, vshard.consts.BUCKET.GARBAGE, 'destination2'}
+ _bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
end;
test_run:cmd("setopt delimiter ''");
@@ -121,15 +121,21 @@ fill_spaces_with_garbage()
#s2:select{}
#s:select{}
-gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
+route_map = {}
+gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
+route_map
#s2:select{}
#s:select{}
-gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
+route_map = {}
+gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
+route_map
s2:select{}
s:select{}
-- Nothing deleted - update collected generation.
-gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
-gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
+route_map = {}
+gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
+gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
+route_map
#s2:select{}
#s:select{}
@@ -137,10 +143,14 @@ gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
-- Test continuous garbage collection via background fiber.
--
fill_spaces_with_garbage()
-_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
+_ = _bucket:on_replace(function() \
+ local gen = vshard.storage.internal.bucket_generation \
+ vshard.storage.internal.bucket_generation = gen + 1 \
+ vshard.storage.internal.bucket_generation_cond:broadcast() \
+end)
f = fiber.create(vshard.storage.internal.gc_bucket_f)
-- Wait until garbage collection is finished.
-while s2:count() ~= 3 or s:count() ~= 6 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return s2:count() == 3 and s:count() == 6 end)
s:select{}
s2:select{}
-- Check garbage bucket is deleted by background fiber.
@@ -150,7 +160,7 @@ _bucket:select{}
--
_bucket:replace{2, vshard.consts.BUCKET.SENT}
-- Wait deletion after a while.
-while _bucket:get{2} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return not _bucket:get{2} end)
_bucket:select{}
s:select{}
s2:select{}
@@ -162,7 +172,7 @@ _bucket:replace{4, vshard.consts.BUCKET.ACTIVE}
s:replace{5, 4}
s:replace{6, 4}
_bucket:replace{4, vshard.consts.BUCKET.SENT}
-while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return not _bucket:get{4} end)
--
-- Test WAL errors during deletion from _bucket.
@@ -172,12 +182,13 @@ _ = _bucket:on_replace(rollback_on_delete)
_bucket:replace{4, vshard.consts.BUCKET.SENT}
s:replace{5, 4}
s:replace{6, 4}
-while not test_run:grep_log("default", "Error during deletion of empty sent buckets") do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
-while #sk:select{4} ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_log('default', 'Error during garbage collection step', \
+ 65536, 10)
+test_run:wait_cond(function() return #sk:select{4} == 0 end)
s:select{}
_bucket:select{}
_ = _bucket:on_replace(nil, rollback_on_delete)
-while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return not _bucket:get{4} end)
f:cancel()
@@ -220,7 +231,7 @@ for i = 1, 2000 do _bucket:replace{i, vshard.consts.BUCKET.GARBAGE} s:replace{i,
#s:select{}
#s2:select{}
f = fiber.create(vshard.storage.internal.gc_bucket_f)
-while _bucket:count() ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return _bucket:count() == 0 end)
_bucket:select{}
s:select{}
s2:select{}
diff --git a/test/unit/garbage_errinj.result b/test/unit/garbage_errinj.result
deleted file mode 100644
index 92c8039..0000000
--- a/test/unit/garbage_errinj.result
+++ /dev/null
@@ -1,223 +0,0 @@
-test_run = require('test_run').new()
----
-...
-vshard = require('vshard')
----
-...
-fiber = require('fiber')
----
-...
-engine = test_run:get_cfg('engine')
----
-...
-vshard.storage.internal.shard_index = 'bucket_id'
----
-...
-format = {}
----
-...
-format[1] = {name = 'id', type = 'unsigned'}
----
-...
-format[2] = {name = 'status', type = 'string', is_nullable = true}
----
-...
-_bucket = box.schema.create_space('_bucket', {format = format})
----
-...
-_ = _bucket:create_index('pk')
----
-...
-_ = _bucket:create_index('status', {parts = {{2, 'string'}}, unique = false})
----
-...
-_bucket:replace{1, vshard.consts.BUCKET.ACTIVE}
----
-- [1, 'active']
-...
-_bucket:replace{2, vshard.consts.BUCKET.RECEIVING}
----
-- [2, 'receiving']
-...
-_bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
----
-- [3, 'active']
-...
-_bucket:replace{4, vshard.consts.BUCKET.SENT}
----
-- [4, 'sent']
-...
-_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
----
-- [5, 'garbage']
-...
-s = box.schema.create_space('test', {engine = engine})
----
-...
-pk = s:create_index('pk')
----
-...
-sk = s:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
----
-...
-s:replace{1, 1}
----
-- [1, 1]
-...
-s:replace{2, 1}
----
-- [2, 1]
-...
-s:replace{3, 2}
----
-- [3, 2]
-...
-s:replace{4, 2}
----
-- [4, 2]
-...
-s:replace{5, 100}
----
-- [5, 100]
-...
-s:replace{6, 100}
----
-- [6, 100]
-...
-s:replace{7, 4}
----
-- [7, 4]
-...
-s:replace{8, 5}
----
-- [8, 5]
-...
-s2 = box.schema.create_space('test2', {engine = engine})
----
-...
-pk2 = s2:create_index('pk')
----
-...
-sk2 = s2:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
----
-...
-s2:replace{1, 1}
----
-- [1, 1]
-...
-s2:replace{3, 3}
----
-- [3, 3]
-...
-for i = 7, 1107 do s:replace{i, 200} end
----
-...
-s2:replace{4, 200}
----
-- [4, 200]
-...
-s2:replace{5, 100}
----
-- [5, 100]
-...
-s2:replace{5, 300}
----
-- [5, 300]
-...
-s2:replace{6, 4}
----
-- [6, 4]
-...
-s2:replace{7, 5}
----
-- [7, 5]
-...
-gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
----
-...
-gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
----
-- - 4
-- true
-...
-gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
----
-- - 5
-- true
-...
---
--- Test _bucket generation change during garbage buckets search.
---
-s:truncate()
----
-...
-_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
----
-...
-vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = true
----
-...
-f = fiber.create(function() gc_bucket_step_by_type(vshard.consts.BUCKET.SENT) gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE) end)
----
-...
-_bucket:replace{4, vshard.consts.BUCKET.GARBAGE}
----
-- [4, 'garbage']
-...
-s:replace{5, 4}
----
-- [5, 4]
-...
-s:replace{6, 4}
----
-- [6, 4]
-...
-#s:select{}
----
-- 2
-...
-vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = false
----
-...
-while f:status() ~= 'dead' do fiber.sleep(0.1) end
----
-...
--- Nothing is deleted - _bucket:replace() has changed _bucket
--- generation during search of garbage buckets.
-#s:select{}
----
-- 2
-...
-_bucket:select{4}
----
-- - [4, 'garbage']
-...
--- Next step deletes garbage ok.
-gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
----
-- []
-- true
-...
-gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
----
-- - 4
- - 5
-- true
-...
-#s:select{}
----
-- 0
-...
-_bucket:delete{4}
----
-- [4, 'garbage']
-...
-s2:drop()
----
-...
-s:drop()
----
-...
-_bucket:drop()
----
-...
diff --git a/test/unit/garbage_errinj.test.lua b/test/unit/garbage_errinj.test.lua
deleted file mode 100644
index 31184b9..0000000
--- a/test/unit/garbage_errinj.test.lua
+++ /dev/null
@@ -1,73 +0,0 @@
-test_run = require('test_run').new()
-vshard = require('vshard')
-fiber = require('fiber')
-
-engine = test_run:get_cfg('engine')
-vshard.storage.internal.shard_index = 'bucket_id'
-
-format = {}
-format[1] = {name = 'id', type = 'unsigned'}
-format[2] = {name = 'status', type = 'string', is_nullable = true}
-_bucket = box.schema.create_space('_bucket', {format = format})
-_ = _bucket:create_index('pk')
-_ = _bucket:create_index('status', {parts = {{2, 'string'}}, unique = false})
-_bucket:replace{1, vshard.consts.BUCKET.ACTIVE}
-_bucket:replace{2, vshard.consts.BUCKET.RECEIVING}
-_bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
-_bucket:replace{4, vshard.consts.BUCKET.SENT}
-_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
-
-s = box.schema.create_space('test', {engine = engine})
-pk = s:create_index('pk')
-sk = s:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
-s:replace{1, 1}
-s:replace{2, 1}
-s:replace{3, 2}
-s:replace{4, 2}
-s:replace{5, 100}
-s:replace{6, 100}
-s:replace{7, 4}
-s:replace{8, 5}
-
-s2 = box.schema.create_space('test2', {engine = engine})
-pk2 = s2:create_index('pk')
-sk2 = s2:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
-s2:replace{1, 1}
-s2:replace{3, 3}
-for i = 7, 1107 do s:replace{i, 200} end
-s2:replace{4, 200}
-s2:replace{5, 100}
-s2:replace{5, 300}
-s2:replace{6, 4}
-s2:replace{7, 5}
-
-gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
-gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
-gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
-
---
--- Test _bucket generation change during garbage buckets search.
---
-s:truncate()
-_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
-vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = true
-f = fiber.create(function() gc_bucket_step_by_type(vshard.consts.BUCKET.SENT) gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE) end)
-_bucket:replace{4, vshard.consts.BUCKET.GARBAGE}
-s:replace{5, 4}
-s:replace{6, 4}
-#s:select{}
-vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = false
-while f:status() ~= 'dead' do fiber.sleep(0.1) end
--- Nothing is deleted - _bucket:replace() has changed _bucket
--- generation during search of garbage buckets.
-#s:select{}
-_bucket:select{4}
--- Next step deletes garbage ok.
-gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
-gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
-#s:select{}
-_bucket:delete{4}
-
-s2:drop()
-s:drop()
-_bucket:drop()
diff --git a/vshard/cfg.lua b/vshard/cfg.lua
index 28c3400..1345058 100644
--- a/vshard/cfg.lua
+++ b/vshard/cfg.lua
@@ -245,9 +245,7 @@ local cfg_template = {
max = consts.REBALANCER_MAX_SENDING_MAX
},
collect_bucket_garbage_interval = {
- type = 'positive number', name = 'Garbage bucket collect interval',
- is_optional = true,
- default = consts.DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL
+ name = 'Garbage bucket collect interval', is_deprecated = true,
},
collect_lua_garbage = {
type = 'boolean', name = 'Garbage Lua collect necessity',
diff --git a/vshard/consts.lua b/vshard/consts.lua
index 8c2a8b0..3f1585a 100644
--- a/vshard/consts.lua
+++ b/vshard/consts.lua
@@ -23,6 +23,7 @@ return {
DEFAULT_BUCKET_COUNT = 3000;
BUCKET_SENT_GARBAGE_DELAY = 0.5;
BUCKET_CHUNK_SIZE = 1000;
+ LUA_CHUNK_SIZE = 100000,
DEFAULT_REBALANCER_DISBALANCE_THRESHOLD = 1;
REBALANCER_IDLE_INTERVAL = 60 * 60;
REBALANCER_WORK_INTERVAL = 10;
@@ -37,7 +38,7 @@ return {
DEFAULT_FAILOVER_PING_TIMEOUT = 5;
DEFAULT_SYNC_TIMEOUT = 1;
RECONNECT_TIMEOUT = 0.5;
- DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL = 0.5;
+ GC_BACKOFF_INTERVAL = 5,
RECOVERY_INTERVAL = 5;
COLLECT_LUA_GARBAGE_INTERVAL = 100;
@@ -45,4 +46,6 @@ return {
DISCOVERY_WORK_INTERVAL = 1,
DISCOVERY_WORK_STEP = 0.01,
DISCOVERY_TIMEOUT = 10,
+
+ TIMEOUT_INFINITY = 500 * 365 * 86400,
}
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index 298df71..31a6fc7 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -69,7 +69,6 @@ if not M then
total_bucket_count = 0,
errinj = {
ERRINJ_CFG = false,
- ERRINJ_BUCKET_FIND_GARBAGE_DELAY = false,
ERRINJ_RELOAD = false,
ERRINJ_CFG_DELAY = false,
ERRINJ_LONG_RECEIVE = false,
@@ -96,6 +95,8 @@ if not M then
-- detect that _bucket was not changed between yields.
--
bucket_generation = 0,
+ -- Condition variable fired on generation update.
+ bucket_generation_cond = lfiber.cond(),
--
-- Reference to the function used as on_replace trigger on
-- _bucket space. It is used to replace the trigger with
@@ -107,12 +108,14 @@ if not M then
-- replace the old function is to keep its reference.
--
bucket_on_replace = nil,
+ -- Redirects for recently sent buckets. They are kept for a while to
+ -- help routers find the new location of sent and deleted buckets
+ -- without a whole cluster scan.
+ route_map = {},
------------------- Garbage collection -------------------
-- Fiber to remove garbage buckets data.
collect_bucket_garbage_fiber = nil,
- -- Do buckets garbage collection once per this time.
- collect_bucket_garbage_interval = nil,
-- Boolean lua_gc state (create periodic gc task).
collect_lua_garbage = nil,
@@ -173,6 +176,7 @@ end
--
local function bucket_generation_increment()
M.bucket_generation = M.bucket_generation + 1
+ M.bucket_generation_cond:broadcast()
end
--
@@ -758,8 +762,9 @@ local function bucket_check_state(bucket_id, mode)
else
return bucket
end
+ local dst = bucket and bucket.destination or M.route_map[bucket_id]
return bucket, lerror.vshard(lerror.code.WRONG_BUCKET, bucket_id, reason,
- bucket and bucket.destination)
+ dst)
end
--
@@ -804,11 +809,23 @@ end
--
local function bucket_unrefro(bucket_id)
local ref = M.bucket_refs[bucket_id]
- if not ref or ref.ro == 0 then
+ local count = ref and ref.ro or 0
+ if count == 0 then
return nil, lerror.vshard(lerror.code.WRONG_BUCKET, bucket_id,
"no refs", nil)
end
- ref.ro = ref.ro - 1
+ if count == 1 then
+ ref.ro = 0
+ if ref.ro_lock then
+ -- Garbage collector is waiting for the bucket if RO
+ -- is locked. Let it know it has one more bucket to
+ -- collect. It relies on generation, so its increment
+ -- is enough.
+ bucket_generation_increment()
+ end
+ return true
+ end
+ ref.ro = count - 1
return true
end
@@ -1479,79 +1496,44 @@ local function gc_bucket_in_space(space, bucket_id, status)
end
--
--- Remove tuples from buckets of a specified type.
--- @param type Type of buckets to gc.
--- @retval List of ids of empty buckets of the type.
+-- Drop buckets with the given status along with their data in all spaces.
+-- @param status Status of target buckets.
+-- @param route_map Destinations of deleted buckets are saved into this table.
--
-local function gc_bucket_step_by_type(type)
- local sharded_spaces = find_sharded_spaces()
- local empty_buckets = {}
+local function gc_bucket_drop_xc(status, route_map)
local limit = consts.BUCKET_CHUNK_SIZE
- local is_all_collected = true
- for _, bucket in box.space._bucket.index.status:pairs(type) do
- local bucket_id = bucket.id
- local ref = M.bucket_refs[bucket_id]
+ local _bucket = box.space._bucket
+ local sharded_spaces = find_sharded_spaces()
+ for _, b in _bucket.index.status:pairs(status) do
+ local id = b.id
+ local ref = M.bucket_refs[id]
if ref then
assert(ref.rw == 0)
if ref.ro ~= 0 then
ref.ro_lock = true
- is_all_collected = false
goto continue
end
- M.bucket_refs[bucket_id] = nil
+ M.bucket_refs[id] = nil
end
for _, space in pairs(sharded_spaces) do
- gc_bucket_in_space_xc(space, bucket_id, type)
+ gc_bucket_in_space_xc(space, id, status)
limit = limit - 1
if limit == 0 then
lfiber.sleep(0)
limit = consts.BUCKET_CHUNK_SIZE
end
end
- table.insert(empty_buckets, bucket.id)
-::continue::
+ route_map[id] = b.destination
+ _bucket:delete{id}
+ ::continue::
end
- return empty_buckets, is_all_collected
-end
-
---
--- Drop buckets with ids in the list.
--- @param bucket_ids Bucket ids to drop.
--- @param status Expected bucket status.
---
-local function gc_bucket_drop_xc(bucket_ids, status)
- if #bucket_ids == 0 then
- return
- end
- local limit = consts.BUCKET_CHUNK_SIZE
- box.begin()
- local _bucket = box.space._bucket
- for _, id in pairs(bucket_ids) do
- local bucket_exists = _bucket:get{id} ~= nil
- local b = _bucket:get{id}
- if b then
- if b.status ~= status then
- return error(string.format('Bucket %d status is changed. Was '..
- '%s, became %s', id, status,
- b.status))
- end
- _bucket:delete{id}
- end
- limit = limit - 1
- if limit == 0 then
- box.commit()
- box.begin()
- limit = consts.BUCKET_CHUNK_SIZE
- end
- end
- box.commit()
end
--
-- Exception safe version of gc_bucket_drop_xc.
--
-local function gc_bucket_drop(bucket_ids, status)
- local status, err = pcall(gc_bucket_drop_xc, bucket_ids, status)
+local function gc_bucket_drop(status, route_map)
+ local status, err = pcall(gc_bucket_drop_xc, status, route_map)
if not status then
box.rollback()
end
@@ -1578,65 +1560,75 @@ function gc_bucket_f()
-- generation == bucket generation. In such a case the fiber
-- does nothing until next _bucket change.
local bucket_generation_collected = -1
- -- Empty sent buckets are collected into an array. After a
- -- specified time interval the buckets are deleted both from
- -- this array and from _bucket space.
- local buckets_for_redirect = {}
- local buckets_for_redirect_ts = clock()
- -- Empty sent buckets, updated after each step, and when
- -- buckets_for_redirect is deleted, it gets empty_sent_buckets
- -- for next deletion.
- local empty_garbage_buckets, empty_sent_buckets, status, err
+ local bucket_generation_current = M.bucket_generation
+ -- Deleted buckets are saved into a route map to redirect routers if they
+ -- didn't discover new location of the buckets yet. However route map does
+ -- not grow infinitely. Otherwise it would end up storing redirects for all
+ -- buckets in the cluster. Which could also be outdated.
+ -- Garbage collector periodically drops old routes from the map. For that it
+ -- remembers state of route map in one moment, and after a while clears the
+ -- remembered routes from the global route map.
+ local route_map = M.route_map
+ local route_map_old = {}
+ local route_map_deadline = 0
+ local status, err
while M.module_version == module_version do
- -- Check if no changes in buckets configuration.
- if bucket_generation_collected ~= M.bucket_generation then
- local bucket_generation = M.bucket_generation
- local is_sent_collected, is_garbage_collected
- status, empty_garbage_buckets, is_garbage_collected =
- pcall(gc_bucket_step_by_type, consts.BUCKET.GARBAGE)
- if not status then
- err = empty_garbage_buckets
- goto check_error
- end
- status, empty_sent_buckets, is_sent_collected =
- pcall(gc_bucket_step_by_type, consts.BUCKET.SENT)
- if not status then
- err = empty_sent_buckets
- goto check_error
+ if bucket_generation_collected ~= bucket_generation_current then
+ status, err = gc_bucket_drop(consts.BUCKET.GARBAGE, route_map)
+ if status then
+ status, err = gc_bucket_drop(consts.BUCKET.SENT, route_map)
end
- status, err = gc_bucket_drop(empty_garbage_buckets,
- consts.BUCKET.GARBAGE)
-::check_error::
if not status then
box.rollback()
log.error('Error during garbage collection step: %s', err)
- goto continue
+ else
+ -- Don't use global generation. During the collection it could
+ -- already change. Instead, remember the generation known before
+ -- the collection has started.
+ -- Since the collection also changes the generation, it makes
+ -- the GC happen always at least twice. But typically on the
+ -- second iteration it should not find any buckets to collect,
+ -- and then the collected generation matches the global one.
+ bucket_generation_collected = bucket_generation_current
end
- if is_sent_collected and is_garbage_collected then
- bucket_generation_collected = bucket_generation
+ else
+ status = true
+ end
+
+ local sleep_time = route_map_deadline - clock()
+ if sleep_time <= 0 then
+ local chunk = consts.LUA_CHUNK_SIZE
+ util.table_minus_yield(route_map, route_map_old, chunk)
+ route_map_old = util.table_copy_yield(route_map, chunk)
+ if next(route_map_old) then
+ sleep_time = consts.BUCKET_SENT_GARBAGE_DELAY
+ else
+ sleep_time = consts.TIMEOUT_INFINITY
end
+ route_map_deadline = clock() + sleep_time
end
+ bucket_generation_current = M.bucket_generation
- if clock() - buckets_for_redirect_ts >=
- consts.BUCKET_SENT_GARBAGE_DELAY then
- status, err = gc_bucket_drop(buckets_for_redirect,
- consts.BUCKET.SENT)
- if not status then
- buckets_for_redirect = {}
- empty_sent_buckets = {}
- bucket_generation_collected = -1
- log.error('Error during deletion of empty sent buckets: %s',
- err)
- elseif M.module_version ~= module_version then
- return
+ if bucket_generation_current ~= bucket_generation_collected then
+ -- Generation was changed during collection. Or *by* collection.
+ if status then
+ -- Retry immediately. If the generation was changed by the
+ -- collection itself, it will notice it next iteration, and go
+ -- to proper sleep.
+ sleep_time = 0
else
- buckets_for_redirect = empty_sent_buckets or {}
- empty_sent_buckets = nil
- buckets_for_redirect_ts = clock()
+ -- An error happened during the collection. Does not make sense
+ -- to retry on each iteration of the event loop. The most likely
+ -- errors are either a WAL error or a transaction abort - both
+ -- look like an issue in the user's code and can't be fixed
+ -- quickly anyway. Backoff.
+ sleep_time = consts.GC_BACKOFF_INTERVAL
end
end
-::continue::
- lfiber.sleep(M.collect_bucket_garbage_interval)
+
+ if M.module_version == module_version then
+ M.bucket_generation_cond:wait(sleep_time)
+ end
end
end
@@ -2421,8 +2413,6 @@ local function storage_cfg(cfg, this_replica_uuid, is_reload)
vshard_cfg.rebalancer_disbalance_threshold
M.rebalancer_receiving_quota = vshard_cfg.rebalancer_max_receiving
M.shard_index = vshard_cfg.shard_index
- M.collect_bucket_garbage_interval =
- vshard_cfg.collect_bucket_garbage_interval
M.collect_lua_garbage = vshard_cfg.collect_lua_garbage
M.rebalancer_worker_count = vshard_cfg.rebalancer_max_sending
M.current_cfg = cfg
@@ -2676,6 +2666,9 @@ else
storage_cfg(M.current_cfg, M.this_replica.uuid, true)
end
M.module_version = M.module_version + 1
+ -- Background fibers could sleep waiting for bucket changes.
+ -- Let them know it is time to reload.
+ bucket_generation_increment()
end
M.recovery_f = recovery_f
@@ -2686,7 +2679,7 @@ M.gc_bucket_f = gc_bucket_f
-- These functions are saved in M not for atomic reload, but for
-- unit testing.
--
-M.gc_bucket_step_by_type = gc_bucket_step_by_type
+M.gc_bucket_drop = gc_bucket_drop
M.rebalancer_build_routes = rebalancer_build_routes
M.rebalancer_calculate_metrics = rebalancer_calculate_metrics
M.cached_find_sharded_spaces = find_sharded_spaces
diff --git a/vshard/storage/reload_evolution.lua b/vshard/storage/reload_evolution.lua
index f38af74..484f499 100644
--- a/vshard/storage/reload_evolution.lua
+++ b/vshard/storage/reload_evolution.lua
@@ -4,6 +4,7 @@
-- in a commit.
--
local log = require('log')
+local fiber = require('fiber')
--
-- Array of upgrade functions.
@@ -25,6 +26,13 @@ migrations[#migrations + 1] = function(M)
end
end
+migrations[#migrations + 1] = function(M)
+ if not M.route_map then
+ M.bucket_generation_cond = fiber.cond()
+ M.route_map = {}
+ end
+end
+
--
-- Perform an update based on a version stored in `M` (internals).
-- @param M Old module internals which should be updated.
--
2.24.3 (Apple Git-128)
* Re: [Tarantool-patches] [PATCH 7/9] gc: introduce reactive garbage collector
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 7/9] gc: introduce reactive garbage collector Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-10 9:00 ` Oleg Babin via Tarantool-patches
2021-02-10 22:35 ` Vladislav Shpilevoy via Tarantool-patches
0 siblings, 1 reply; 36+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-02-10 9:00 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov
Thanks for your patch.
As I see, you've introduced some new parameters: "LUA_CHUNK_SIZE" and
"GC_BACKOFF_INTERVAL".
I think it's better to describe them in the commit message to make it
clearer how the new algorithm works.
I see that you didn't update the comment above the "gc_bucket_f"
function. Is it still relevant?
In general patch LGTM.
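For other readers of the thread, here is a minimal sketch (not the actual
vshard code, names simplified) of the reactive wait loop as I understand it
from the patch: the GC fiber sleeps on a condition variable, and the
`_bucket` on_replace trigger bumps a generation counter and broadcasts,
which wakes the fiber only when there is real work to do.

```lua
local fiber = require('fiber')

local M = {
    bucket_generation = 0,
    bucket_generation_cond = fiber.cond(),
}

-- Called from the _bucket:on_replace() trigger on any bucket change.
local function bucket_generation_increment()
    M.bucket_generation = M.bucket_generation + 1
    M.bucket_generation_cond:broadcast()
end

-- collect_step is assumed to drop GARBAGE/SENT buckets (one GC pass).
local function gc_loop(collect_step)
    local collected = -1
    while true do
        local current = M.bucket_generation
        if collected ~= current then
            collect_step()
            collected = current
        end
        if M.bucket_generation == collected then
            -- Nothing new appeared during the pass - sleep until
            -- the next _bucket change instead of polling on a timer.
            M.bucket_generation_cond:wait()
        end
    end
end
```

The backoff and route-map trimming from the real `gc_bucket_f` are omitted
here; the point is only the event-driven sleep replacing the old
`collect_bucket_garbage_interval` polling.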
On 10/02/2021 02:46, Vladislav Shpilevoy wrote:
> Garbage collector is a fiber on a master node which deletes
> GARBAGE and SENT buckets along with their data.
>
> It was proactive. It used to wakeup with a constant period to
> find and delete the needed buckets.
>
> But this won't work with the future feature called 'map-reduce'.
> Map-reduce as a preparation stage will need to ensure that all
> buckets on a storage are readable and writable. With the current
> GC algorithm if a bucket is sent, it won't be deleted for the next
> 5 seconds by default. During this time all new map-reduce requests
> can't execute.
>
> This is not acceptable, and neither is too frequent wakeup of the
> GC fiber, because it would waste TX thread time.
>
> The patch makes the GC fiber wake up not on a timeout but on
> events happening with the _bucket space. The GC fiber sleeps on a
> condition variable which is signaled when _bucket is changed.
>
> Once GC sees work to do, it won't sleep until it is done. It will
> only yield.
>
> This makes GC delete SENT and GARBAGE buckets as soon as possible
> reducing the waiting time for the incoming map-reduce requests.
>
> Needed for #147
>
> @TarantoolBot document
> Title: VShard: deprecate cfg option 'collect_bucket_garbage_interval'
> It was used to specify the interval between bucket garbage
> collection steps. It was needed because garbage collection in
> vshard was proactive. It didn't react to newly appeared garbage
> buckets immediately.
>
> Since 0.1.17 garbage collection is reactive. It starts working on
> garbage buckets immediately as they appear, and sleeps the rest of
> the time. The option is not used anymore and does not affect any
> behaviour.
>
> I suppose it can be deleted from the documentation. Or left with
> a big label 'deprecated' + the explanation above.
>
> An attempt to use the option does not cause an error, but logs a
> warning.
> ---
> test/lua_libs/storage_template.lua | 1 -
> test/misc/reconfigure.result | 10 -
> test/misc/reconfigure.test.lua | 3 -
> test/rebalancer/bucket_ref.result | 12 --
> test/rebalancer/bucket_ref.test.lua | 3 -
> test/rebalancer/errinj.result | 11 --
> test/rebalancer/errinj.test.lua | 5 -
> test/rebalancer/receiving_bucket.result | 8 -
> test/rebalancer/receiving_bucket.test.lua | 1 -
> test/reload_evolution/storage.result | 2 +-
> test/router/reroute_wrong_bucket.result | 8 +-
> test/router/reroute_wrong_bucket.test.lua | 4 +-
> test/storage/recovery.result | 3 +-
> test/storage/storage.result | 10 +-
> test/storage/storage.test.lua | 1 +
> test/unit/config.result | 35 +---
> test/unit/config.test.lua | 16 +-
> test/unit/garbage.result | 106 ++++++----
> test/unit/garbage.test.lua | 47 +++--
> test/unit/garbage_errinj.result | 223 ----------------------
> test/unit/garbage_errinj.test.lua | 73 -------
> vshard/cfg.lua | 4 +-
> vshard/consts.lua | 5 +-
> vshard/storage/init.lua | 207 ++++++++++----------
> vshard/storage/reload_evolution.lua | 8 +
> 25 files changed, 233 insertions(+), 573 deletions(-)
> delete mode 100644 test/unit/garbage_errinj.result
> delete mode 100644 test/unit/garbage_errinj.test.lua
>
> diff --git a/test/lua_libs/storage_template.lua b/test/lua_libs/storage_template.lua
> index 21409bd..8df89f6 100644
> --- a/test/lua_libs/storage_template.lua
> +++ b/test/lua_libs/storage_template.lua
> @@ -172,6 +172,5 @@ function wait_bucket_is_collected(id)
> return true
> end
> vshard.storage.recovery_wakeup()
> - vshard.storage.garbage_collector_wakeup()
> end)
> end
> diff --git a/test/misc/reconfigure.result b/test/misc/reconfigure.result
> index 168be5d..3b34841 100644
> --- a/test/misc/reconfigure.result
> +++ b/test/misc/reconfigure.result
> @@ -83,9 +83,6 @@ cfg.collect_lua_garbage = true
> cfg.rebalancer_max_receiving = 1000
> ---
> ...
> -cfg.collect_bucket_garbage_interval = 100
> ----
> -...
> cfg.invalid_option = 'kek'
> ---
> ...
> @@ -105,10 +102,6 @@ vshard.storage.internal.rebalancer_max_receiving ~= 1000
> ---
> - true
> ...
> -vshard.storage.internal.collect_bucket_garbage_interval ~= 100
> ----
> -- true
> -...
> cfg.sync_timeout = nil
> ---
> ...
> @@ -118,9 +111,6 @@ cfg.collect_lua_garbage = nil
> cfg.rebalancer_max_receiving = nil
> ---
> ...
> -cfg.collect_bucket_garbage_interval = nil
> ----
> -...
> cfg.invalid_option = nil
> ---
> ...
> diff --git a/test/misc/reconfigure.test.lua b/test/misc/reconfigure.test.lua
> index e891010..348628c 100644
> --- a/test/misc/reconfigure.test.lua
> +++ b/test/misc/reconfigure.test.lua
> @@ -33,17 +33,14 @@ vshard.storage.internal.sync_timeout
> cfg.sync_timeout = 100
> cfg.collect_lua_garbage = true
> cfg.rebalancer_max_receiving = 1000
> -cfg.collect_bucket_garbage_interval = 100
> cfg.invalid_option = 'kek'
> vshard.storage.cfg(cfg, util.name_to_uuid.storage_1_a)
> not vshard.storage.internal.collect_lua_garbage
> vshard.storage.internal.sync_timeout
> vshard.storage.internal.rebalancer_max_receiving ~= 1000
> -vshard.storage.internal.collect_bucket_garbage_interval ~= 100
> cfg.sync_timeout = nil
> cfg.collect_lua_garbage = nil
> cfg.rebalancer_max_receiving = nil
> -cfg.collect_bucket_garbage_interval = nil
> cfg.invalid_option = nil
>
> --
> diff --git a/test/rebalancer/bucket_ref.result b/test/rebalancer/bucket_ref.result
> index b8fc7ff..9df7480 100644
> --- a/test/rebalancer/bucket_ref.result
> +++ b/test/rebalancer/bucket_ref.result
> @@ -184,9 +184,6 @@ vshard.storage.bucket_unref(1, 'read')
> - true
> ...
> -- Force GC to take an RO lock on the bucket now.
> -vshard.storage.garbage_collector_wakeup()
> ----
> -...
> vshard.storage.buckets_info(1)
> ---
> - 1:
> @@ -203,7 +200,6 @@ while true do
> if i.status == vshard.consts.BUCKET.GARBAGE and i.ro_lock then
> break
> end
> - vshard.storage.garbage_collector_wakeup()
> fiber.sleep(0.01)
> end;
> ---
> @@ -235,14 +231,6 @@ finish_refs = true
> while f1:status() ~= 'dead' do fiber.sleep(0.01) end
> ---
> ...
> -vshard.storage.buckets_info(1)
> ----
> -- 1:
> - status: garbage
> - ro_lock: true
> - destination: <replicaset_2>
> - id: 1
> -...
> wait_bucket_is_collected(1)
> ---
> ...
> diff --git a/test/rebalancer/bucket_ref.test.lua b/test/rebalancer/bucket_ref.test.lua
> index 213ced3..1b032ff 100644
> --- a/test/rebalancer/bucket_ref.test.lua
> +++ b/test/rebalancer/bucket_ref.test.lua
> @@ -56,7 +56,6 @@ vshard.storage.bucket_unref(1, 'write') -- Error, no refs.
> vshard.storage.bucket_ref(1, 'read')
> vshard.storage.bucket_unref(1, 'read')
> -- Force GC to take an RO lock on the bucket now.
> -vshard.storage.garbage_collector_wakeup()
> vshard.storage.buckets_info(1)
> _ = test_run:cmd("setopt delimiter ';'")
> while true do
> @@ -64,7 +63,6 @@ while true do
> if i.status == vshard.consts.BUCKET.GARBAGE and i.ro_lock then
> break
> end
> - vshard.storage.garbage_collector_wakeup()
> fiber.sleep(0.01)
> end;
> _ = test_run:cmd("setopt delimiter ''");
> @@ -72,7 +70,6 @@ vshard.storage.buckets_info(1)
> vshard.storage.bucket_refro(1)
> finish_refs = true
> while f1:status() ~= 'dead' do fiber.sleep(0.01) end
> -vshard.storage.buckets_info(1)
> wait_bucket_is_collected(1)
> _ = test_run:switch('box_2_a')
> vshard.storage.buckets_info(1)
> diff --git a/test/rebalancer/errinj.result b/test/rebalancer/errinj.result
> index e50eb72..0ddb1c9 100644
> --- a/test/rebalancer/errinj.result
> +++ b/test/rebalancer/errinj.result
> @@ -226,17 +226,6 @@ ret2, err2
> - true
> - null
> ...
> -_bucket:get{35}
> ----
> -- [35, 'sent', '<replicaset_2>']
> -...
> -_bucket:get{36}
> ----
> -- [36, 'sent', '<replicaset_2>']
> -...
> --- Buckets became 'active' on box_2_a, but still are sending on
> --- box_1_a. Wait until it is marked as garbage on box_1_a by the
> --- recovery fiber.
> wait_bucket_is_collected(35)
> ---
> ...
> diff --git a/test/rebalancer/errinj.test.lua b/test/rebalancer/errinj.test.lua
> index 2cc4a69..a60f3d7 100644
> --- a/test/rebalancer/errinj.test.lua
> +++ b/test/rebalancer/errinj.test.lua
> @@ -102,11 +102,6 @@ _ = test_run:switch('box_1_a')
> while f1:status() ~= 'dead' or f2:status() ~= 'dead' do fiber.sleep(0.001) end
> ret1, err1
> ret2, err2
> -_bucket:get{35}
> -_bucket:get{36}
> --- Buckets became 'active' on box_2_a, but still are sending on
> --- box_1_a. Wait until it is marked as garbage on box_1_a by the
> --- recovery fiber.
> wait_bucket_is_collected(35)
> wait_bucket_is_collected(36)
> _ = test_run:switch('box_2_a')
> diff --git a/test/rebalancer/receiving_bucket.result b/test/rebalancer/receiving_bucket.result
> index 7d3612b..ad93445 100644
> --- a/test/rebalancer/receiving_bucket.result
> +++ b/test/rebalancer/receiving_bucket.result
> @@ -366,14 +366,6 @@ vshard.storage.bucket_send(1, util.replicasets[1], {timeout = 0.3})
> ---
> - true
> ...
> -vshard.storage.buckets_info(1)
> ----
> -- 1:
> - status: sent
> - ro_lock: true
> - destination: <replicaset_1>
> - id: 1
> -...
> wait_bucket_is_collected(1)
> ---
> ...
> diff --git a/test/rebalancer/receiving_bucket.test.lua b/test/rebalancer/receiving_bucket.test.lua
> index 24534b3..2cf6382 100644
> --- a/test/rebalancer/receiving_bucket.test.lua
> +++ b/test/rebalancer/receiving_bucket.test.lua
> @@ -136,7 +136,6 @@ box.space.test3:select{100}
> -- Now the bucket is unreferenced and can be transferred.
> _ = test_run:switch('box_2_a')
> vshard.storage.bucket_send(1, util.replicasets[1], {timeout = 0.3})
> -vshard.storage.buckets_info(1)
> wait_bucket_is_collected(1)
> vshard.storage.buckets_info(1)
> _ = test_run:switch('box_1_a')
> diff --git a/test/reload_evolution/storage.result b/test/reload_evolution/storage.result
> index 753687f..9d30a04 100644
> --- a/test/reload_evolution/storage.result
> +++ b/test/reload_evolution/storage.result
> @@ -92,7 +92,7 @@ test_run:grep_log('storage_2_a', 'vshard.storage.reload_evolution: upgraded to')
> ...
> vshard.storage.internal.reload_version
> ---
> -- 2
> +- 3
> ...
> --
> -- gh-237: should be only one trigger. During gh-237 the trigger installation
> diff --git a/test/router/reroute_wrong_bucket.result b/test/router/reroute_wrong_bucket.result
> index 049bdef..ac340eb 100644
> --- a/test/router/reroute_wrong_bucket.result
> +++ b/test/router/reroute_wrong_bucket.result
> @@ -37,7 +37,7 @@ test_run:switch('storage_1_a')
> ---
> - true
> ...
> -cfg.collect_bucket_garbage_interval = 100
> +vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
> ---
> ...
> vshard.storage.cfg(cfg, util.name_to_uuid.storage_1_a)
> @@ -53,7 +53,7 @@ test_run:switch('storage_2_a')
> ---
> - true
> ...
> -cfg.collect_bucket_garbage_interval = 100
> +vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
> ---
> ...
> vshard.storage.cfg(cfg, util.name_to_uuid.storage_2_a)
> @@ -202,12 +202,12 @@ test_run:grep_log('router_1', 'please update configuration')
> err
> ---
> - bucket_id: 100
> - reason: write is prohibited
> + reason: Not found
> code: 1
> destination: ac522f65-aa94-4134-9f64-51ee384f1a54
> type: ShardingError
> name: WRONG_BUCKET
> - message: 'Cannot perform action with bucket 100, reason: write is prohibited'
> + message: 'Cannot perform action with bucket 100, reason: Not found'
> ...
> --
> -- Now try again, but update configuration during call(). It must
> diff --git a/test/router/reroute_wrong_bucket.test.lua b/test/router/reroute_wrong_bucket.test.lua
> index 9e6e804..207aac3 100644
> --- a/test/router/reroute_wrong_bucket.test.lua
> +++ b/test/router/reroute_wrong_bucket.test.lua
> @@ -11,13 +11,13 @@ util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, 'bootstrap_storage(\'memt
> test_run:cmd('create server router_1 with script="router/router_1.lua"')
> test_run:cmd('start server router_1')
> test_run:switch('storage_1_a')
> -cfg.collect_bucket_garbage_interval = 100
> +vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
> vshard.storage.cfg(cfg, util.name_to_uuid.storage_1_a)
> vshard.storage.rebalancer_disable()
> for i = 1, 100 do box.space._bucket:replace{i, vshard.consts.BUCKET.ACTIVE} end
>
> test_run:switch('storage_2_a')
> -cfg.collect_bucket_garbage_interval = 100
> +vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
> vshard.storage.cfg(cfg, util.name_to_uuid.storage_2_a)
> vshard.storage.rebalancer_disable()
> for i = 101, 200 do box.space._bucket:replace{i, vshard.consts.BUCKET.ACTIVE} end
> diff --git a/test/storage/recovery.result b/test/storage/recovery.result
> index f833fe7..8ccb0b9 100644
> --- a/test/storage/recovery.result
> +++ b/test/storage/recovery.result
> @@ -79,8 +79,7 @@ _bucket = box.space._bucket
> ...
> _bucket:select{}
> ---
> -- - [2, 'garbage', '<replicaset_2>']
> - - [3, 'garbage', '<replicaset_2>']
> +- []
> ...
> _ = test_run:switch('storage_2_a')
> ---
> diff --git a/test/storage/storage.result b/test/storage/storage.result
> index 424bc4c..0550ad1 100644
> --- a/test/storage/storage.result
> +++ b/test/storage/storage.result
> @@ -547,6 +547,9 @@ vshard.storage.bucket_send(1, util.replicasets[2])
> ---
> - true
> ...
> +wait_bucket_is_collected(1)
> +---
> +...
> _ = test_run:switch("storage_2_a")
> ---
> ...
> @@ -567,12 +570,7 @@ _ = test_run:switch("storage_1_a")
> ...
> vshard.storage.buckets_info()
> ---
> -- 1:
> - status: sent
> - ro_lock: true
> - destination: <replicaset_2>
> - id: 1
> - 2:
> +- 2:
> status: active
> id: 2
> ...
> diff --git a/test/storage/storage.test.lua b/test/storage/storage.test.lua
> index d631b51..d8fbd94 100644
> --- a/test/storage/storage.test.lua
> +++ b/test/storage/storage.test.lua
> @@ -136,6 +136,7 @@ vshard.storage.bucket_send(1, util.replicasets[1])
>
> -- Successful transfer.
> vshard.storage.bucket_send(1, util.replicasets[2])
> +wait_bucket_is_collected(1)
> _ = test_run:switch("storage_2_a")
> vshard.storage.buckets_info()
> _ = test_run:switch("storage_1_a")
> diff --git a/test/unit/config.result b/test/unit/config.result
> index dfd0219..e0b2482 100644
> --- a/test/unit/config.result
> +++ b/test/unit/config.result
> @@ -428,33 +428,6 @@ _ = lcfg.check(cfg)
> --
> -- gh-77: garbage collection options.
> --
> -cfg.collect_bucket_garbage_interval = 'str'
> ----
> -...
> -check(cfg)
> ----
> -- Garbage bucket collect interval must be positive number
> -...
> -cfg.collect_bucket_garbage_interval = 0
> ----
> -...
> -check(cfg)
> ----
> -- Garbage bucket collect interval must be positive number
> -...
> -cfg.collect_bucket_garbage_interval = -1
> ----
> -...
> -check(cfg)
> ----
> -- Garbage bucket collect interval must be positive number
> -...
> -cfg.collect_bucket_garbage_interval = 100.5
> ----
> -...
> -_ = lcfg.check(cfg)
> ----
> -...
> cfg.collect_lua_garbage = 100
> ---
> ...
> @@ -615,6 +588,12 @@ lcfg.check(cfg).rebalancer_max_sending
> cfg.rebalancer_max_sending = nil
> ---
> ...
> -cfg.sharding = nil
> +--
> +-- Deprecated option does not break anything.
> +--
> +cfg.collect_bucket_garbage_interval = 100
> +---
> +...
> +_ = lcfg.check(cfg)
> ---
> ...
> diff --git a/test/unit/config.test.lua b/test/unit/config.test.lua
> index ada43db..a1c9f07 100644
> --- a/test/unit/config.test.lua
> +++ b/test/unit/config.test.lua
> @@ -175,15 +175,6 @@ _ = lcfg.check(cfg)
> --
> -- gh-77: garbage collection options.
> --
> -cfg.collect_bucket_garbage_interval = 'str'
> -check(cfg)
> -cfg.collect_bucket_garbage_interval = 0
> -check(cfg)
> -cfg.collect_bucket_garbage_interval = -1
> -check(cfg)
> -cfg.collect_bucket_garbage_interval = 100.5
> -_ = lcfg.check(cfg)
> -
> cfg.collect_lua_garbage = 100
> check(cfg)
> cfg.collect_lua_garbage = true
> @@ -244,4 +235,9 @@ util.check_error(lcfg.check, cfg)
> cfg.rebalancer_max_sending = 15
> lcfg.check(cfg).rebalancer_max_sending
> cfg.rebalancer_max_sending = nil
> -cfg.sharding = nil
> +
> +--
> +-- Deprecated option does not break anything.
> +--
> +cfg.collect_bucket_garbage_interval = 100
> +_ = lcfg.check(cfg)
> diff --git a/test/unit/garbage.result b/test/unit/garbage.result
> index 74d9ccf..a530496 100644
> --- a/test/unit/garbage.result
> +++ b/test/unit/garbage.result
> @@ -31,9 +31,6 @@ test_run:cmd("setopt delimiter ''");
> vshard.storage.internal.shard_index = 'bucket_id'
> ---
> ...
> -vshard.storage.internal.collect_bucket_garbage_interval = vshard.consts.DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL
> ----
> -...
> --
> -- Find nothing if no bucket_id anywhere, or there is no index
> -- by it, or bucket_id is not unsigned.
> @@ -151,6 +148,9 @@ format[1] = {name = 'id', type = 'unsigned'}
> format[2] = {name = 'status', type = 'string'}
> ---
> ...
> +format[3] = {name = 'destination', type = 'string', is_nullable = true}
> +---
> +...
> _bucket = box.schema.create_space('_bucket', {format = format})
> ---
> ...
> @@ -172,22 +172,6 @@ _bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
> ---
> - [3, 'active']
> ...
> -_bucket:replace{4, vshard.consts.BUCKET.SENT}
> ----
> -- [4, 'sent']
> -...
> -_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> ----
> -- [5, 'garbage']
> -...
> -_bucket:replace{6, vshard.consts.BUCKET.GARBAGE}
> ----
> -- [6, 'garbage']
> -...
> -_bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
> ----
> -- [200, 'garbage']
> -...
> s = box.schema.create_space('test', {engine = engine})
> ---
> ...
> @@ -213,7 +197,7 @@ s:replace{4, 2}
> ---
> - [4, 2]
> ...
> -gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
> +gc_bucket_drop = vshard.storage.internal.gc_bucket_drop
> ---
> ...
> s2 = box.schema.create_space('test2', {engine = engine})
> @@ -249,6 +233,10 @@ function fill_spaces_with_garbage()
> s2:replace{6, 4}
> s2:replace{7, 5}
> s2:replace{7, 6}
> + _bucket:replace{4, vshard.consts.BUCKET.SENT, 'destination1'}
> + _bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> + _bucket:replace{6, vshard.consts.BUCKET.GARBAGE, 'destination2'}
> + _bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
> end;
> ---
> ...
> @@ -267,12 +255,22 @@ fill_spaces_with_garbage()
> ---
> - 1107
> ...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> +route_map = {}
> +---
> +...
> +gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
> ---
> -- - 5
> - - 6
> - - 200
> - true
> +- null
> +...
> +route_map
> +---
> +- - null
> + - null
> + - null
> + - null
> + - null
> + - destination2
> ...
> #s2:select{}
> ---
> @@ -282,10 +280,20 @@ gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> ---
> - 7
> ...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> +route_map = {}
> +---
> +...
> +gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
> ---
> -- - 4
> - true
> +- null
> +...
> +route_map
> +---
> +- - null
> + - null
> + - null
> + - destination1
> ...
> s2:select{}
> ---
> @@ -303,17 +311,22 @@ s:select{}
> - [6, 100]
> ...
> -- Nothing deleted - update collected generation.
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> +route_map = {}
> +---
> +...
> +gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
> ---
> -- - 5
> - - 6
> - - 200
> - true
> +- null
> ...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> +gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
> ---
> -- - 4
> - true
> +- null
> +...
> +route_map
> +---
> +- []
> ...
> #s2:select{}
> ---
> @@ -329,15 +342,20 @@ gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> fill_spaces_with_garbage()
> ---
> ...
> -_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
> +_ = _bucket:on_replace(function() \
> + local gen = vshard.storage.internal.bucket_generation \
> + vshard.storage.internal.bucket_generation = gen + 1 \
> + vshard.storage.internal.bucket_generation_cond:broadcast() \
> +end)
> ---
> ...
> f = fiber.create(vshard.storage.internal.gc_bucket_f)
> ---
> ...
> -- Wait until garbage collection is finished.
> -while s2:count() ~= 3 or s:count() ~= 6 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return s2:count() == 3 and s:count() == 6 end)
> ---
> +- true
> ...
> s:select{}
> ---
> @@ -360,7 +378,6 @@ _bucket:select{}
> - - [1, 'active']
> - [2, 'receiving']
> - [3, 'active']
> - - [4, 'sent']
> ...
> --
> -- Test deletion of 'sent' buckets after a specified timeout.
> @@ -370,8 +387,9 @@ _bucket:replace{2, vshard.consts.BUCKET.SENT}
> - [2, 'sent']
> ...
> -- Wait deletion after a while.
> -while _bucket:get{2} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{2} end)
> ---
> +- true
> ...
> _bucket:select{}
> ---
> @@ -410,8 +428,9 @@ _bucket:replace{4, vshard.consts.BUCKET.SENT}
> ---
> - [4, 'sent']
> ...
> -while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{4} end)
> ---
> +- true
> ...
> --
> -- Test WAL errors during deletion from _bucket.
> @@ -434,11 +453,14 @@ s:replace{6, 4}
> ---
> - [6, 4]
> ...
> -while not test_run:grep_log("default", "Error during deletion of empty sent buckets") do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_log('default', 'Error during garbage collection step', \
> + 65536, 10)
> ---
> +- Error during garbage collection step
> ...
> -while #sk:select{4} ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return #sk:select{4} == 0 end)
> ---
> +- true
> ...
> s:select{}
> ---
> @@ -454,8 +476,9 @@ _bucket:select{}
> _ = _bucket:on_replace(nil, rollback_on_delete)
> ---
> ...
> -while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{4} end)
> ---
> +- true
> ...
> f:cancel()
> ---
> @@ -562,8 +585,9 @@ for i = 1, 2000 do _bucket:replace{i, vshard.consts.BUCKET.GARBAGE} s:replace{i,
> f = fiber.create(vshard.storage.internal.gc_bucket_f)
> ---
> ...
> -while _bucket:count() ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return _bucket:count() == 0 end)
> ---
> +- true
> ...
> _bucket:select{}
> ---
> diff --git a/test/unit/garbage.test.lua b/test/unit/garbage.test.lua
> index 30079fa..250afb0 100644
> --- a/test/unit/garbage.test.lua
> +++ b/test/unit/garbage.test.lua
> @@ -15,7 +15,6 @@ end;
> test_run:cmd("setopt delimiter ''");
>
> vshard.storage.internal.shard_index = 'bucket_id'
> -vshard.storage.internal.collect_bucket_garbage_interval = vshard.consts.DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL
>
> --
> -- Find nothing if no bucket_id anywhere, or there is no index
> @@ -75,16 +74,13 @@ s:drop()
> format = {}
> format[1] = {name = 'id', type = 'unsigned'}
> format[2] = {name = 'status', type = 'string'}
> +format[3] = {name = 'destination', type = 'string', is_nullable = true}
> _bucket = box.schema.create_space('_bucket', {format = format})
> _ = _bucket:create_index('pk')
> _ = _bucket:create_index('status', {parts = {{2, 'string'}}, unique = false})
> _bucket:replace{1, vshard.consts.BUCKET.ACTIVE}
> _bucket:replace{2, vshard.consts.BUCKET.RECEIVING}
> _bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
> -_bucket:replace{4, vshard.consts.BUCKET.SENT}
> -_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> -_bucket:replace{6, vshard.consts.BUCKET.GARBAGE}
> -_bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
>
> s = box.schema.create_space('test', {engine = engine})
> pk = s:create_index('pk')
> @@ -94,7 +90,7 @@ s:replace{2, 1}
> s:replace{3, 2}
> s:replace{4, 2}
>
> -gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
> +gc_bucket_drop = vshard.storage.internal.gc_bucket_drop
> s2 = box.schema.create_space('test2', {engine = engine})
> pk2 = s2:create_index('pk')
> sk2 = s2:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
> @@ -114,6 +110,10 @@ function fill_spaces_with_garbage()
> s2:replace{6, 4}
> s2:replace{7, 5}
> s2:replace{7, 6}
> + _bucket:replace{4, vshard.consts.BUCKET.SENT, 'destination1'}
> + _bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> + _bucket:replace{6, vshard.consts.BUCKET.GARBAGE, 'destination2'}
> + _bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
> end;
> test_run:cmd("setopt delimiter ''");
>
> @@ -121,15 +121,21 @@ fill_spaces_with_garbage()
>
> #s2:select{}
> #s:select{}
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> +route_map = {}
> +gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
> +route_map
> #s2:select{}
> #s:select{}
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> +route_map = {}
> +gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
> +route_map
> s2:select{}
> s:select{}
> -- Nothing deleted - update collected generation.
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> +route_map = {}
> +gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
> +gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
> +route_map
> #s2:select{}
> #s:select{}
>
> @@ -137,10 +143,14 @@ gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> -- Test continuous garbage collection via background fiber.
> --
> fill_spaces_with_garbage()
> -_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
> +_ = _bucket:on_replace(function() \
> + local gen = vshard.storage.internal.bucket_generation \
> + vshard.storage.internal.bucket_generation = gen + 1 \
> + vshard.storage.internal.bucket_generation_cond:broadcast() \
> +end)
> f = fiber.create(vshard.storage.internal.gc_bucket_f)
> -- Wait until garbage collection is finished.
> -while s2:count() ~= 3 or s:count() ~= 6 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return s2:count() == 3 and s:count() == 6 end)
> s:select{}
> s2:select{}
> -- Check garbage bucket is deleted by background fiber.
> @@ -150,7 +160,7 @@ _bucket:select{}
> --
> _bucket:replace{2, vshard.consts.BUCKET.SENT}
> -- Wait deletion after a while.
> -while _bucket:get{2} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{2} end)
> _bucket:select{}
> s:select{}
> s2:select{}
> @@ -162,7 +172,7 @@ _bucket:replace{4, vshard.consts.BUCKET.ACTIVE}
> s:replace{5, 4}
> s:replace{6, 4}
> _bucket:replace{4, vshard.consts.BUCKET.SENT}
> -while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{4} end)
>
> --
> -- Test WAL errors during deletion from _bucket.
> @@ -172,12 +182,13 @@ _ = _bucket:on_replace(rollback_on_delete)
> _bucket:replace{4, vshard.consts.BUCKET.SENT}
> s:replace{5, 4}
> s:replace{6, 4}
> -while not test_run:grep_log("default", "Error during deletion of empty sent buckets") do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> -while #sk:select{4} ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_log('default', 'Error during garbage collection step', \
> + 65536, 10)
> +test_run:wait_cond(function() return #sk:select{4} == 0 end)
> s:select{}
> _bucket:select{}
> _ = _bucket:on_replace(nil, rollback_on_delete)
> -while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{4} end)
>
> f:cancel()
>
> @@ -220,7 +231,7 @@ for i = 1, 2000 do _bucket:replace{i, vshard.consts.BUCKET.GARBAGE} s:replace{i,
> #s:select{}
> #s2:select{}
> f = fiber.create(vshard.storage.internal.gc_bucket_f)
> -while _bucket:count() ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return _bucket:count() == 0 end)
> _bucket:select{}
> s:select{}
> s2:select{}
> diff --git a/test/unit/garbage_errinj.result b/test/unit/garbage_errinj.result
> deleted file mode 100644
> index 92c8039..0000000
> --- a/test/unit/garbage_errinj.result
> +++ /dev/null
> @@ -1,223 +0,0 @@
> -test_run = require('test_run').new()
> ----
> -...
> -vshard = require('vshard')
> ----
> -...
> -fiber = require('fiber')
> ----
> -...
> -engine = test_run:get_cfg('engine')
> ----
> -...
> -vshard.storage.internal.shard_index = 'bucket_id'
> ----
> -...
> -format = {}
> ----
> -...
> -format[1] = {name = 'id', type = 'unsigned'}
> ----
> -...
> -format[2] = {name = 'status', type = 'string', is_nullable = true}
> ----
> -...
> -_bucket = box.schema.create_space('_bucket', {format = format})
> ----
> -...
> -_ = _bucket:create_index('pk')
> ----
> -...
> -_ = _bucket:create_index('status', {parts = {{2, 'string'}}, unique = false})
> ----
> -...
> -_bucket:replace{1, vshard.consts.BUCKET.ACTIVE}
> ----
> -- [1, 'active']
> -...
> -_bucket:replace{2, vshard.consts.BUCKET.RECEIVING}
> ----
> -- [2, 'receiving']
> -...
> -_bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
> ----
> -- [3, 'active']
> -...
> -_bucket:replace{4, vshard.consts.BUCKET.SENT}
> ----
> -- [4, 'sent']
> -...
> -_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> ----
> -- [5, 'garbage']
> -...
> -s = box.schema.create_space('test', {engine = engine})
> ----
> -...
> -pk = s:create_index('pk')
> ----
> -...
> -sk = s:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
> ----
> -...
> -s:replace{1, 1}
> ----
> -- [1, 1]
> -...
> -s:replace{2, 1}
> ----
> -- [2, 1]
> -...
> -s:replace{3, 2}
> ----
> -- [3, 2]
> -...
> -s:replace{4, 2}
> ----
> -- [4, 2]
> -...
> -s:replace{5, 100}
> ----
> -- [5, 100]
> -...
> -s:replace{6, 100}
> ----
> -- [6, 100]
> -...
> -s:replace{7, 4}
> ----
> -- [7, 4]
> -...
> -s:replace{8, 5}
> ----
> -- [8, 5]
> -...
> -s2 = box.schema.create_space('test2', {engine = engine})
> ----
> -...
> -pk2 = s2:create_index('pk')
> ----
> -...
> -sk2 = s2:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
> ----
> -...
> -s2:replace{1, 1}
> ----
> -- [1, 1]
> -...
> -s2:replace{3, 3}
> ----
> -- [3, 3]
> -...
> -for i = 7, 1107 do s:replace{i, 200} end
> ----
> -...
> -s2:replace{4, 200}
> ----
> -- [4, 200]
> -...
> -s2:replace{5, 100}
> ----
> -- [5, 100]
> -...
> -s2:replace{5, 300}
> ----
> -- [5, 300]
> -...
> -s2:replace{6, 4}
> ----
> -- [6, 4]
> -...
> -s2:replace{7, 5}
> ----
> -- [7, 5]
> -...
> -gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
> ----
> -...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> ----
> -- - 4
> -- true
> -...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> ----
> -- - 5
> -- true
> -...
> ---
> --- Test _bucket generation change during garbage buckets search.
> ---
> -s:truncate()
> ----
> -...
> -_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
> ----
> -...
> -vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = true
> ----
> -...
> -f = fiber.create(function() gc_bucket_step_by_type(vshard.consts.BUCKET.SENT) gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE) end)
> ----
> -...
> -_bucket:replace{4, vshard.consts.BUCKET.GARBAGE}
> ----
> -- [4, 'garbage']
> -...
> -s:replace{5, 4}
> ----
> -- [5, 4]
> -...
> -s:replace{6, 4}
> ----
> -- [6, 4]
> -...
> -#s:select{}
> ----
> -- 2
> -...
> -vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = false
> ----
> -...
> -while f:status() ~= 'dead' do fiber.sleep(0.1) end
> ----
> -...
> --- Nothing is deleted - _bucket:replace() has changed _bucket
> --- generation during search of garbage buckets.
> -#s:select{}
> ----
> -- 2
> -...
> -_bucket:select{4}
> ----
> -- - [4, 'garbage']
> -...
> --- Next step deletes garbage ok.
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> ----
> -- []
> -- true
> -...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> ----
> -- - 4
> - - 5
> -- true
> -...
> -#s:select{}
> ----
> -- 0
> -...
> -_bucket:delete{4}
> ----
> -- [4, 'garbage']
> -...
> -s2:drop()
> ----
> -...
> -s:drop()
> ----
> -...
> -_bucket:drop()
> ----
> -...
> diff --git a/test/unit/garbage_errinj.test.lua b/test/unit/garbage_errinj.test.lua
> deleted file mode 100644
> index 31184b9..0000000
> --- a/test/unit/garbage_errinj.test.lua
> +++ /dev/null
> @@ -1,73 +0,0 @@
> -test_run = require('test_run').new()
> -vshard = require('vshard')
> -fiber = require('fiber')
> -
> -engine = test_run:get_cfg('engine')
> -vshard.storage.internal.shard_index = 'bucket_id'
> -
> -format = {}
> -format[1] = {name = 'id', type = 'unsigned'}
> -format[2] = {name = 'status', type = 'string', is_nullable = true}
> -_bucket = box.schema.create_space('_bucket', {format = format})
> -_ = _bucket:create_index('pk')
> -_ = _bucket:create_index('status', {parts = {{2, 'string'}}, unique = false})
> -_bucket:replace{1, vshard.consts.BUCKET.ACTIVE}
> -_bucket:replace{2, vshard.consts.BUCKET.RECEIVING}
> -_bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
> -_bucket:replace{4, vshard.consts.BUCKET.SENT}
> -_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> -
> -s = box.schema.create_space('test', {engine = engine})
> -pk = s:create_index('pk')
> -sk = s:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
> -s:replace{1, 1}
> -s:replace{2, 1}
> -s:replace{3, 2}
> -s:replace{4, 2}
> -s:replace{5, 100}
> -s:replace{6, 100}
> -s:replace{7, 4}
> -s:replace{8, 5}
> -
> -s2 = box.schema.create_space('test2', {engine = engine})
> -pk2 = s2:create_index('pk')
> -sk2 = s2:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
> -s2:replace{1, 1}
> -s2:replace{3, 3}
> -for i = 7, 1107 do s:replace{i, 200} end
> -s2:replace{4, 200}
> -s2:replace{5, 100}
> -s2:replace{5, 300}
> -s2:replace{6, 4}
> -s2:replace{7, 5}
> -
> -gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> -
> ---
> --- Test _bucket generation change during garbage buckets search.
> ---
> -s:truncate()
> -_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
> -vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = true
> -f = fiber.create(function() gc_bucket_step_by_type(vshard.consts.BUCKET.SENT) gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE) end)
> -_bucket:replace{4, vshard.consts.BUCKET.GARBAGE}
> -s:replace{5, 4}
> -s:replace{6, 4}
> -#s:select{}
> -vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = false
> -while f:status() ~= 'dead' do fiber.sleep(0.1) end
> --- Nothing is deleted - _bucket:replace() has changed _bucket
> --- generation during search of garbage buckets.
> -#s:select{}
> -_bucket:select{4}
> --- Next step deletes garbage ok.
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> -#s:select{}
> -_bucket:delete{4}
> -
> -s2:drop()
> -s:drop()
> -_bucket:drop()
> diff --git a/vshard/cfg.lua b/vshard/cfg.lua
> index 28c3400..1345058 100644
> --- a/vshard/cfg.lua
> +++ b/vshard/cfg.lua
> @@ -245,9 +245,7 @@ local cfg_template = {
> max = consts.REBALANCER_MAX_SENDING_MAX
> },
> collect_bucket_garbage_interval = {
> - type = 'positive number', name = 'Garbage bucket collect interval',
> - is_optional = true,
> - default = consts.DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL
> + name = 'Garbage bucket collect interval', is_deprecated = true,
> },
> collect_lua_garbage = {
> type = 'boolean', name = 'Garbage Lua collect necessity',
> diff --git a/vshard/consts.lua b/vshard/consts.lua
> index 8c2a8b0..3f1585a 100644
> --- a/vshard/consts.lua
> +++ b/vshard/consts.lua
> @@ -23,6 +23,7 @@ return {
> DEFAULT_BUCKET_COUNT = 3000;
> BUCKET_SENT_GARBAGE_DELAY = 0.5;
> BUCKET_CHUNK_SIZE = 1000;
> + LUA_CHUNK_SIZE = 100000,
> DEFAULT_REBALANCER_DISBALANCE_THRESHOLD = 1;
> REBALANCER_IDLE_INTERVAL = 60 * 60;
> REBALANCER_WORK_INTERVAL = 10;
> @@ -37,7 +38,7 @@ return {
> DEFAULT_FAILOVER_PING_TIMEOUT = 5;
> DEFAULT_SYNC_TIMEOUT = 1;
> RECONNECT_TIMEOUT = 0.5;
> - DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL = 0.5;
> + GC_BACKOFF_INTERVAL = 5,
> RECOVERY_INTERVAL = 5;
> COLLECT_LUA_GARBAGE_INTERVAL = 100;
>
> @@ -45,4 +46,6 @@ return {
> DISCOVERY_WORK_INTERVAL = 1,
> DISCOVERY_WORK_STEP = 0.01,
> DISCOVERY_TIMEOUT = 10,
> +
> + TIMEOUT_INFINITY = 500 * 365 * 86400,
> }
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index 298df71..31a6fc7 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -69,7 +69,6 @@ if not M then
> total_bucket_count = 0,
> errinj = {
> ERRINJ_CFG = false,
> - ERRINJ_BUCKET_FIND_GARBAGE_DELAY = false,
> ERRINJ_RELOAD = false,
> ERRINJ_CFG_DELAY = false,
> ERRINJ_LONG_RECEIVE = false,
> @@ -96,6 +95,8 @@ if not M then
> -- detect that _bucket was not changed between yields.
> --
> bucket_generation = 0,
> + -- Condition variable fired on generation update.
> + bucket_generation_cond = lfiber.cond(),
> --
> -- Reference to the function used as on_replace trigger on
> -- _bucket space. It is used to replace the trigger with
> @@ -107,12 +108,14 @@ if not M then
> -- replace the old function is to keep its reference.
> --
> bucket_on_replace = nil,
> + -- Redirects for recently sent buckets. They are kept for a while to
> + -- help routers find a new location for sent and deleted buckets
> + -- without a whole cluster scan.
> + route_map = {},
>
> ------------------- Garbage collection -------------------
> -- Fiber to remove garbage buckets data.
> collect_bucket_garbage_fiber = nil,
> - -- Do buckets garbage collection once per this time.
> - collect_bucket_garbage_interval = nil,
> -- Boolean lua_gc state (create periodic gc task).
> collect_lua_garbage = nil,
>
> @@ -173,6 +176,7 @@ end
> --
> local function bucket_generation_increment()
> M.bucket_generation = M.bucket_generation + 1
> + M.bucket_generation_cond:broadcast()
> end
>
> --
> @@ -758,8 +762,9 @@ local function bucket_check_state(bucket_id, mode)
> else
> return bucket
> end
> + local dst = bucket and bucket.destination or M.route_map[bucket_id]
> return bucket, lerror.vshard(lerror.code.WRONG_BUCKET, bucket_id, reason,
> - bucket and bucket.destination)
> + dst)
> end
>
> --
> @@ -804,11 +809,23 @@ end
> --
> local function bucket_unrefro(bucket_id)
> local ref = M.bucket_refs[bucket_id]
> - if not ref or ref.ro == 0 then
> + local count = ref and ref.ro or 0
> + if count == 0 then
> return nil, lerror.vshard(lerror.code.WRONG_BUCKET, bucket_id,
> "no refs", nil)
> end
> - ref.ro = ref.ro - 1
> + if count == 1 then
> + ref.ro = 0
> + if ref.ro_lock then
> + -- Garbage collector is waiting for the bucket if RO
> + -- is locked. Let it know it has one more bucket to
> + -- collect. It relies on the generation, so its increment
> + -- is enough.
> + bucket_generation_increment()
> + end
> + return true
> + end
> + ref.ro = count - 1
> return true
> end
>
> @@ -1479,79 +1496,44 @@ local function gc_bucket_in_space(space, bucket_id, status)
> end
>
> --
> --- Remove tuples from buckets of a specified type.
> --- @param type Type of buckets to gc.
> --- @retval List of ids of empty buckets of the type.
> +-- Drop buckets with the given status along with their data in all spaces.
> +-- @param status Status of target buckets.
> +-- @param route_map Destinations of deleted buckets are saved into this table.
> --
> -local function gc_bucket_step_by_type(type)
> - local sharded_spaces = find_sharded_spaces()
> - local empty_buckets = {}
> +local function gc_bucket_drop_xc(status, route_map)
> local limit = consts.BUCKET_CHUNK_SIZE
> - local is_all_collected = true
> - for _, bucket in box.space._bucket.index.status:pairs(type) do
> - local bucket_id = bucket.id
> - local ref = M.bucket_refs[bucket_id]
> + local _bucket = box.space._bucket
> + local sharded_spaces = find_sharded_spaces()
> + for _, b in _bucket.index.status:pairs(status) do
> + local id = b.id
> + local ref = M.bucket_refs[id]
> if ref then
> assert(ref.rw == 0)
> if ref.ro ~= 0 then
> ref.ro_lock = true
> - is_all_collected = false
> goto continue
> end
> - M.bucket_refs[bucket_id] = nil
> + M.bucket_refs[id] = nil
> end
> for _, space in pairs(sharded_spaces) do
> - gc_bucket_in_space_xc(space, bucket_id, type)
> + gc_bucket_in_space_xc(space, id, status)
> limit = limit - 1
> if limit == 0 then
> lfiber.sleep(0)
> limit = consts.BUCKET_CHUNK_SIZE
> end
> end
> - table.insert(empty_buckets, bucket.id)
> -::continue::
> + route_map[id] = b.destination
> + _bucket:delete{id}
> + ::continue::
> end
> - return empty_buckets, is_all_collected
> -end
> -
> ---
> --- Drop buckets with ids in the list.
> --- @param bucket_ids Bucket ids to drop.
> --- @param status Expected bucket status.
> ---
> -local function gc_bucket_drop_xc(bucket_ids, status)
> - if #bucket_ids == 0 then
> - return
> - end
> - local limit = consts.BUCKET_CHUNK_SIZE
> - box.begin()
> - local _bucket = box.space._bucket
> - for _, id in pairs(bucket_ids) do
> - local bucket_exists = _bucket:get{id} ~= nil
> - local b = _bucket:get{id}
> - if b then
> - if b.status ~= status then
> - return error(string.format('Bucket %d status is changed. Was '..
> - '%s, became %s', id, status,
> - b.status))
> - end
> - _bucket:delete{id}
> - end
> - limit = limit - 1
> - if limit == 0 then
> - box.commit()
> - box.begin()
> - limit = consts.BUCKET_CHUNK_SIZE
> - end
> - end
> - box.commit()
> end
>
> --
> -- Exception safe version of gc_bucket_drop_xc.
> --
> -local function gc_bucket_drop(bucket_ids, status)
> - local status, err = pcall(gc_bucket_drop_xc, bucket_ids, status)
> +local function gc_bucket_drop(status, route_map)
> + local status, err = pcall(gc_bucket_drop_xc, status, route_map)
> if not status then
> box.rollback()
> end
> @@ -1578,65 +1560,75 @@ function gc_bucket_f()
> -- generation == bucket generation. In such a case the fiber
> -- does nothing until next _bucket change.
> local bucket_generation_collected = -1
> - -- Empty sent buckets are collected into an array. After a
> - -- specified time interval the buckets are deleted both from
> - -- this array and from _bucket space.
> - local buckets_for_redirect = {}
> - local buckets_for_redirect_ts = clock()
> - -- Empty sent buckets, updated after each step, and when
> - -- buckets_for_redirect is deleted, it gets empty_sent_buckets
> - -- for next deletion.
> - local empty_garbage_buckets, empty_sent_buckets, status, err
> + local bucket_generation_current = M.bucket_generation
> + -- Deleted buckets are saved into a route map to redirect routers if they
> + -- didn't discover the new location of the buckets yet. However, the route
> + -- map does not grow infinitely. Otherwise it would end up storing redirects
> + -- for all buckets in the cluster, which could also be outdated.
> + -- Garbage collector periodically drops old routes from the map. For that it
> + -- remembers state of route map in one moment, and after a while clears the
> + -- remembered routes from the global route map.
> + local route_map = M.route_map
> + local route_map_old = {}
> + local route_map_deadline = 0
> + local status, err
> while M.module_version == module_version do
> - -- Check if no changes in buckets configuration.
> - if bucket_generation_collected ~= M.bucket_generation then
> - local bucket_generation = M.bucket_generation
> - local is_sent_collected, is_garbage_collected
> - status, empty_garbage_buckets, is_garbage_collected =
> - pcall(gc_bucket_step_by_type, consts.BUCKET.GARBAGE)
> - if not status then
> - err = empty_garbage_buckets
> - goto check_error
> - end
> - status, empty_sent_buckets, is_sent_collected =
> - pcall(gc_bucket_step_by_type, consts.BUCKET.SENT)
> - if not status then
> - err = empty_sent_buckets
> - goto check_error
> + if bucket_generation_collected ~= bucket_generation_current then
> + status, err = gc_bucket_drop(consts.BUCKET.GARBAGE, route_map)
> + if status then
> + status, err = gc_bucket_drop(consts.BUCKET.SENT, route_map)
> end
> - status, err = gc_bucket_drop(empty_garbage_buckets,
> - consts.BUCKET.GARBAGE)
> -::check_error::
> if not status then
> box.rollback()
> log.error('Error during garbage collection step: %s', err)
> - goto continue
> + else
> + -- Don't use global generation. During the collection it could
> + -- already change. Instead, remember the generation known before
> + -- the collection has started.
> + -- Since the collection also changes the generation, it makes
> + -- the GC happen always at least twice. But typically on the
> + -- second iteration it should not find any buckets to collect,
> + -- and then the collected generation matches the global one.
> + bucket_generation_collected = bucket_generation_current
> end
> - if is_sent_collected and is_garbage_collected then
> - bucket_generation_collected = bucket_generation
> + else
> + status = true
> + end
> +
> + local sleep_time = route_map_deadline - clock()
> + if sleep_time <= 0 then
> + local chunk = consts.LUA_CHUNK_SIZE
> + util.table_minus_yield(route_map, route_map_old, chunk)
> + route_map_old = util.table_copy_yield(route_map, chunk)
> + if next(route_map_old) then
> + sleep_time = consts.BUCKET_SENT_GARBAGE_DELAY
> + else
> + sleep_time = consts.TIMEOUT_INFINITY
> end
> + route_map_deadline = clock() + sleep_time
> end
> + bucket_generation_current = M.bucket_generation
>
> - if clock() - buckets_for_redirect_ts >=
> - consts.BUCKET_SENT_GARBAGE_DELAY then
> - status, err = gc_bucket_drop(buckets_for_redirect,
> - consts.BUCKET.SENT)
> - if not status then
> - buckets_for_redirect = {}
> - empty_sent_buckets = {}
> - bucket_generation_collected = -1
> - log.error('Error during deletion of empty sent buckets: %s',
> - err)
> - elseif M.module_version ~= module_version then
> - return
> + if bucket_generation_current ~= bucket_generation_collected then
> + -- Generation was changed during collection. Or *by* collection.
> + if status then
> + -- Retry immediately. If the generation was changed by the
> + -- collection itself, it will notice it next iteration, and go
> + -- to proper sleep.
> + sleep_time = 0
> else
> - buckets_for_redirect = empty_sent_buckets or {}
> - empty_sent_buckets = nil
> - buckets_for_redirect_ts = clock()
> + -- An error happened during the collection. It makes no sense
> + -- to retry on each iteration of the event loop. The most likely
> + -- errors are either a WAL error or a transaction abort - both
> + -- look like an issue in the user's code and can't be fixed
> + -- quickly anyway. Backoff.
> + sleep_time = consts.GC_BACKOFF_INTERVAL
> end
> end
> -::continue::
> - lfiber.sleep(M.collect_bucket_garbage_interval)
> +
> + if M.module_version == module_version then
> + M.bucket_generation_cond:wait(sleep_time)
> + end
> end
> end
>
> @@ -2421,8 +2413,6 @@ local function storage_cfg(cfg, this_replica_uuid, is_reload)
> vshard_cfg.rebalancer_disbalance_threshold
> M.rebalancer_receiving_quota = vshard_cfg.rebalancer_max_receiving
> M.shard_index = vshard_cfg.shard_index
> - M.collect_bucket_garbage_interval =
> - vshard_cfg.collect_bucket_garbage_interval
> M.collect_lua_garbage = vshard_cfg.collect_lua_garbage
> M.rebalancer_worker_count = vshard_cfg.rebalancer_max_sending
> M.current_cfg = cfg
> @@ -2676,6 +2666,9 @@ else
> storage_cfg(M.current_cfg, M.this_replica.uuid, true)
> end
> M.module_version = M.module_version + 1
> + -- Background fibers could sleep waiting for bucket changes.
> + -- Let them know it is time to reload.
> + bucket_generation_increment()
> end
>
> M.recovery_f = recovery_f
> @@ -2686,7 +2679,7 @@ M.gc_bucket_f = gc_bucket_f
> -- These functions are saved in M not for atomic reload, but for
> -- unit testing.
> --
> -M.gc_bucket_step_by_type = gc_bucket_step_by_type
> +M.gc_bucket_drop = gc_bucket_drop
> M.rebalancer_build_routes = rebalancer_build_routes
> M.rebalancer_calculate_metrics = rebalancer_calculate_metrics
> M.cached_find_sharded_spaces = find_sharded_spaces
> diff --git a/vshard/storage/reload_evolution.lua b/vshard/storage/reload_evolution.lua
> index f38af74..484f499 100644
> --- a/vshard/storage/reload_evolution.lua
> +++ b/vshard/storage/reload_evolution.lua
> @@ -4,6 +4,7 @@
> -- in a commit.
> --
> local log = require('log')
> +local fiber = require('fiber')
>
> --
> -- Array of upgrade functions.
> @@ -25,6 +26,13 @@ migrations[#migrations + 1] = function(M)
> end
> end
>
> +migrations[#migrations + 1] = function(M)
> + if not M.route_map then
> + M.bucket_generation_cond = fiber.cond()
> + M.route_map = {}
> + end
> +end
> +
> --
> -- Perform an update based on a version stored in `M` (internals).
> -- @param M Old module internals which should be updated.
* Re: [Tarantool-patches] [PATCH 7/9] gc: introduce reactive garbage collector
2021-02-10 9:00 ` Oleg Babin via Tarantool-patches
@ 2021-02-10 22:35 ` Vladislav Shpilevoy via Tarantool-patches
2021-02-11 6:50 ` Oleg Babin via Tarantool-patches
0 siblings, 1 reply; 36+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-10 22:35 UTC (permalink / raw)
To: Oleg Babin, tarantool-patches, yaroslav.dynnikov
Thanks for the review!
On 10.02.2021 10:00, Oleg Babin wrote:
> Thanks for your patch.
>
> As I see you've introduced some new parameters: "LUA_CHUNK_SIZE" and "GC_BACKOFF_INTERVAL".
I decided not to go into too much detail and not to describe private
constants in the commit message. GC_BACKOFF_INTERVAL is explained
in the place where it is used. LUA_CHUNK_SIZE is quite obvious if
you look at its usage.
> I think it's better to describe them in the commit message to understand more clearly how the new algorithm works.
These constants are not really relevant to the algorithm's core
idea. It does not matter much for the reactive GC concept whether I
yield in the table utility functions, or whether I have a backoff
timeout. These could be considered 'optimizations' or 'amendments'.
I would consider them small details not worth mentioning in the
commit message.
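For context, the table utilities mentioned here could look roughly like the sketch below. This is an assumption about their behaviour (that `util.table_copy_yield` and `util.table_minus_yield` process `chunk` keys between yields); the real vshard.util implementations may differ:

```lua
local fiber = require('fiber')

-- Copy a table, yielding every `chunk` processed keys so the TX
-- thread is not blocked on huge tables.
local function table_copy_yield(src, chunk)
    local res = {}
    local i = 0
    for k, v in pairs(src) do
        res[k] = v
        i = i + 1
        if i % chunk == 0 then
            fiber.yield()
        end
    end
    return res
end

-- Remove from `dst` the keys of `minus` whose values did not change
-- since the copy, yielding every `chunk` processed keys.
local function table_minus_yield(dst, minus, chunk)
    local i = 0
    for k, v in pairs(minus) do
        if dst[k] == v then
            dst[k] = nil
        end
        i = i + 1
        if i % chunk == 0 then
            fiber.yield()
        end
    end
end
```

With LUA_CHUNK_SIZE = 100000 such a copy yields rarely enough to stay cheap, but never monopolizes the event loop on a pathologically large route map.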
> I see that you didn't update the comment above the "gc_bucket_f" function. Is it still relevant?
No, irrelevant, thanks for noticing. Here is the diff:
====================
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index 99f92a0..1ea8069 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -1543,14 +1543,16 @@ local function gc_bucket_drop(status, route_map)
end
--
--- Garbage collector. Works on masters. The garbage collector
--- wakes up once per specified time.
+-- Garbage collector. Works on masters. The garbage collector wakes up when
+-- state of any bucket changes.
-- After wakeup it follows the plan:
--- 1) Check if _bucket has changed. If not, then sleep again;
--- 2) Scan user spaces for sent and garbage buckets, delete
--- garbage data in batches of limited size;
--- 3) Delete GARBAGE buckets from _bucket immediately, and
--- schedule SENT buckets for deletion after a timeout;
+-- 1) Check if state of any bucket has really changed. If not, then sleep again;
+-- 2) Delete all GARBAGE and SENT buckets along with their data in chunks of
+-- limited size.
+-- 3) Bucket destinations are saved into a global route_map to reroute incoming
+-- requests from routers in case they didn't notice the buckets being moved.
+-- The saved routes are scheduled for deletion after a timeout, which is
+-- checked on each iteration of this loop.
-- 4) Sleep, go to (1).
-- For each step details see comments in the code.
--
====================
The full new patch below.
====================
gc: introduce reactive garbage collector
The garbage collector is a fiber on a master node which deletes
GARBAGE and SENT buckets along with their data.
It was proactive: it used to wake up with a constant period to
find and delete the needed buckets.
But this won't work with the future feature called 'map-reduce'.
Map-reduce as a preparation stage will need to ensure that all
buckets on a storage are readable and writable. With the current
GC algorithm if a bucket is sent, it won't be deleted for the next
5 seconds by default. During this time all new map-reduce requests
can't execute.
This is not acceptable, and neither is a too frequent wakeup of
the GC fiber, because it would waste TX thread time.
The patch makes the GC fiber wake up not on a timeout but on
events happening with the _bucket space. The GC fiber sleeps on a
condition variable which is signaled when _bucket is changed.
Once GC sees work to do, it won't sleep until it is done. It will
only yield.
This makes GC delete SENT and GARBAGE buckets as soon as possible
reducing the waiting time for the incoming map-reduce requests.
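The event-driven scheme described above boils down to a condition variable shared between the _bucket trigger and the GC fiber. A minimal sketch, simplified from the actual patch (`gc_step()` and `sleep_time` are placeholders, not real vshard names):

```lua
local fiber = require('fiber')

local M = {
    bucket_generation = 0,
    bucket_generation_cond = fiber.cond(),
}

-- Called from the _bucket on_replace trigger.
local function bucket_generation_increment()
    M.bucket_generation = M.bucket_generation + 1
    M.bucket_generation_cond:broadcast()
end

-- GC fiber body. Collects while there is work, then sleeps on the
-- condition variable until the next _bucket change or a timeout
-- (used for route map expiration and error backoff).
local function gc_f()
    local collected_generation = -1
    while true do
        if collected_generation ~= M.bucket_generation then
            local gen = M.bucket_generation
            gc_step() -- Placeholder: drop GARBAGE and SENT buckets.
            collected_generation = gen
        end
        local sleep_time = 10 -- Placeholder timeout.
        M.bucket_generation_cond:wait(sleep_time)
    end
end
```

Since the collection itself changes _bucket, the fiber compares a remembered generation with the current one to decide whether the wakeup brought new work or was caused by its own deletions.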
Needed for #147
@TarantoolBot document
Title: VShard: deprecate cfg option 'collect_bucket_garbage_interval'
It was used to specify the interval between bucket garbage
collection steps. It was needed because garbage collection in
vshard was proactive. It didn't react to newly appeared garbage
buckets immediately.
Since 0.1.17 garbage collection is reactive. It starts working
on garbage buckets immediately as they appear, and sleeps the
rest of the time. The option is not used anymore and does not
affect any behaviour.
I suppose it can be deleted from the documentation, or left with
a big 'deprecated' label plus the explanation above.
An attempt to use the option does not cause an error, but logs a
warning.
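The deprecation is backward compatible: a config still containing the option passes validation, as the config unit test in the patch checks. For instance (assuming `cfg` is an otherwise valid vshard configuration table):

```lua
local lcfg = require('vshard.cfg')

cfg.collect_bucket_garbage_interval = 100
-- Does not raise an error; the option is ignored and only a
-- deprecation warning is logged.
lcfg.check(cfg)
```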
diff --git a/test/lua_libs/storage_template.lua b/test/lua_libs/storage_template.lua
index 21409bd..8df89f6 100644
--- a/test/lua_libs/storage_template.lua
+++ b/test/lua_libs/storage_template.lua
@@ -172,6 +172,5 @@ function wait_bucket_is_collected(id)
return true
end
vshard.storage.recovery_wakeup()
- vshard.storage.garbage_collector_wakeup()
end)
end
diff --git a/test/misc/reconfigure.result b/test/misc/reconfigure.result
index 168be5d..3b34841 100644
--- a/test/misc/reconfigure.result
+++ b/test/misc/reconfigure.result
@@ -83,9 +83,6 @@ cfg.collect_lua_garbage = true
cfg.rebalancer_max_receiving = 1000
---
...
-cfg.collect_bucket_garbage_interval = 100
----
-...
cfg.invalid_option = 'kek'
---
...
@@ -105,10 +102,6 @@ vshard.storage.internal.rebalancer_max_receiving ~= 1000
---
- true
...
-vshard.storage.internal.collect_bucket_garbage_interval ~= 100
----
-- true
-...
cfg.sync_timeout = nil
---
...
@@ -118,9 +111,6 @@ cfg.collect_lua_garbage = nil
cfg.rebalancer_max_receiving = nil
---
...
-cfg.collect_bucket_garbage_interval = nil
----
-...
cfg.invalid_option = nil
---
...
diff --git a/test/misc/reconfigure.test.lua b/test/misc/reconfigure.test.lua
index e891010..348628c 100644
--- a/test/misc/reconfigure.test.lua
+++ b/test/misc/reconfigure.test.lua
@@ -33,17 +33,14 @@ vshard.storage.internal.sync_timeout
cfg.sync_timeout = 100
cfg.collect_lua_garbage = true
cfg.rebalancer_max_receiving = 1000
-cfg.collect_bucket_garbage_interval = 100
cfg.invalid_option = 'kek'
vshard.storage.cfg(cfg, util.name_to_uuid.storage_1_a)
not vshard.storage.internal.collect_lua_garbage
vshard.storage.internal.sync_timeout
vshard.storage.internal.rebalancer_max_receiving ~= 1000
-vshard.storage.internal.collect_bucket_garbage_interval ~= 100
cfg.sync_timeout = nil
cfg.collect_lua_garbage = nil
cfg.rebalancer_max_receiving = nil
-cfg.collect_bucket_garbage_interval = nil
cfg.invalid_option = nil
--
diff --git a/test/rebalancer/bucket_ref.result b/test/rebalancer/bucket_ref.result
index b8fc7ff..9df7480 100644
--- a/test/rebalancer/bucket_ref.result
+++ b/test/rebalancer/bucket_ref.result
@@ -184,9 +184,6 @@ vshard.storage.bucket_unref(1, 'read')
- true
...
-- Force GC to take an RO lock on the bucket now.
-vshard.storage.garbage_collector_wakeup()
----
-...
vshard.storage.buckets_info(1)
---
- 1:
@@ -203,7 +200,6 @@ while true do
if i.status == vshard.consts.BUCKET.GARBAGE and i.ro_lock then
break
end
- vshard.storage.garbage_collector_wakeup()
fiber.sleep(0.01)
end;
---
@@ -235,14 +231,6 @@ finish_refs = true
while f1:status() ~= 'dead' do fiber.sleep(0.01) end
---
...
-vshard.storage.buckets_info(1)
----
-- 1:
- status: garbage
- ro_lock: true
- destination: <replicaset_2>
- id: 1
-...
wait_bucket_is_collected(1)
---
...
diff --git a/test/rebalancer/bucket_ref.test.lua b/test/rebalancer/bucket_ref.test.lua
index 213ced3..1b032ff 100644
--- a/test/rebalancer/bucket_ref.test.lua
+++ b/test/rebalancer/bucket_ref.test.lua
@@ -56,7 +56,6 @@ vshard.storage.bucket_unref(1, 'write') -- Error, no refs.
vshard.storage.bucket_ref(1, 'read')
vshard.storage.bucket_unref(1, 'read')
-- Force GC to take an RO lock on the bucket now.
-vshard.storage.garbage_collector_wakeup()
vshard.storage.buckets_info(1)
_ = test_run:cmd("setopt delimiter ';'")
while true do
@@ -64,7 +63,6 @@ while true do
if i.status == vshard.consts.BUCKET.GARBAGE and i.ro_lock then
break
end
- vshard.storage.garbage_collector_wakeup()
fiber.sleep(0.01)
end;
_ = test_run:cmd("setopt delimiter ''");
@@ -72,7 +70,6 @@ vshard.storage.buckets_info(1)
vshard.storage.bucket_refro(1)
finish_refs = true
while f1:status() ~= 'dead' do fiber.sleep(0.01) end
-vshard.storage.buckets_info(1)
wait_bucket_is_collected(1)
_ = test_run:switch('box_2_a')
vshard.storage.buckets_info(1)
diff --git a/test/rebalancer/errinj.result b/test/rebalancer/errinj.result
index e50eb72..0ddb1c9 100644
--- a/test/rebalancer/errinj.result
+++ b/test/rebalancer/errinj.result
@@ -226,17 +226,6 @@ ret2, err2
- true
- null
...
-_bucket:get{35}
----
-- [35, 'sent', '<replicaset_2>']
-...
-_bucket:get{36}
----
-- [36, 'sent', '<replicaset_2>']
-...
--- Buckets became 'active' on box_2_a, but still are sending on
--- box_1_a. Wait until it is marked as garbage on box_1_a by the
--- recovery fiber.
wait_bucket_is_collected(35)
---
...
diff --git a/test/rebalancer/errinj.test.lua b/test/rebalancer/errinj.test.lua
index 2cc4a69..a60f3d7 100644
--- a/test/rebalancer/errinj.test.lua
+++ b/test/rebalancer/errinj.test.lua
@@ -102,11 +102,6 @@ _ = test_run:switch('box_1_a')
while f1:status() ~= 'dead' or f2:status() ~= 'dead' do fiber.sleep(0.001) end
ret1, err1
ret2, err2
-_bucket:get{35}
-_bucket:get{36}
--- Buckets became 'active' on box_2_a, but still are sending on
--- box_1_a. Wait until it is marked as garbage on box_1_a by the
--- recovery fiber.
wait_bucket_is_collected(35)
wait_bucket_is_collected(36)
_ = test_run:switch('box_2_a')
diff --git a/test/rebalancer/receiving_bucket.result b/test/rebalancer/receiving_bucket.result
index 7d3612b..ad93445 100644
--- a/test/rebalancer/receiving_bucket.result
+++ b/test/rebalancer/receiving_bucket.result
@@ -366,14 +366,6 @@ vshard.storage.bucket_send(1, util.replicasets[1], {timeout = 0.3})
---
- true
...
-vshard.storage.buckets_info(1)
----
-- 1:
- status: sent
- ro_lock: true
- destination: <replicaset_1>
- id: 1
-...
wait_bucket_is_collected(1)
---
...
diff --git a/test/rebalancer/receiving_bucket.test.lua b/test/rebalancer/receiving_bucket.test.lua
index 24534b3..2cf6382 100644
--- a/test/rebalancer/receiving_bucket.test.lua
+++ b/test/rebalancer/receiving_bucket.test.lua
@@ -136,7 +136,6 @@ box.space.test3:select{100}
-- Now the bucket is unreferenced and can be transferred.
_ = test_run:switch('box_2_a')
vshard.storage.bucket_send(1, util.replicasets[1], {timeout = 0.3})
-vshard.storage.buckets_info(1)
wait_bucket_is_collected(1)
vshard.storage.buckets_info(1)
_ = test_run:switch('box_1_a')
diff --git a/test/reload_evolution/storage.result b/test/reload_evolution/storage.result
index 753687f..9d30a04 100644
--- a/test/reload_evolution/storage.result
+++ b/test/reload_evolution/storage.result
@@ -92,7 +92,7 @@ test_run:grep_log('storage_2_a', 'vshard.storage.reload_evolution: upgraded to')
...
vshard.storage.internal.reload_version
---
-- 2
+- 3
...
--
-- gh-237: should be only one trigger. During gh-237 the trigger installation
diff --git a/test/router/reroute_wrong_bucket.result b/test/router/reroute_wrong_bucket.result
index 049bdef..ac340eb 100644
--- a/test/router/reroute_wrong_bucket.result
+++ b/test/router/reroute_wrong_bucket.result
@@ -37,7 +37,7 @@ test_run:switch('storage_1_a')
---
- true
...
-cfg.collect_bucket_garbage_interval = 100
+vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
---
...
vshard.storage.cfg(cfg, util.name_to_uuid.storage_1_a)
@@ -53,7 +53,7 @@ test_run:switch('storage_2_a')
---
- true
...
-cfg.collect_bucket_garbage_interval = 100
+vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
---
...
vshard.storage.cfg(cfg, util.name_to_uuid.storage_2_a)
@@ -202,12 +202,12 @@ test_run:grep_log('router_1', 'please update configuration')
err
---
- bucket_id: 100
- reason: write is prohibited
+ reason: Not found
code: 1
destination: ac522f65-aa94-4134-9f64-51ee384f1a54
type: ShardingError
name: WRONG_BUCKET
- message: 'Cannot perform action with bucket 100, reason: write is prohibited'
+ message: 'Cannot perform action with bucket 100, reason: Not found'
...
--
-- Now try again, but update configuration during call(). It must
diff --git a/test/router/reroute_wrong_bucket.test.lua b/test/router/reroute_wrong_bucket.test.lua
index 9e6e804..207aac3 100644
--- a/test/router/reroute_wrong_bucket.test.lua
+++ b/test/router/reroute_wrong_bucket.test.lua
@@ -11,13 +11,13 @@ util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, 'bootstrap_storage(\'memt
test_run:cmd('create server router_1 with script="router/router_1.lua"')
test_run:cmd('start server router_1')
test_run:switch('storage_1_a')
-cfg.collect_bucket_garbage_interval = 100
+vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
vshard.storage.cfg(cfg, util.name_to_uuid.storage_1_a)
vshard.storage.rebalancer_disable()
for i = 1, 100 do box.space._bucket:replace{i, vshard.consts.BUCKET.ACTIVE} end
test_run:switch('storage_2_a')
-cfg.collect_bucket_garbage_interval = 100
+vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
vshard.storage.cfg(cfg, util.name_to_uuid.storage_2_a)
vshard.storage.rebalancer_disable()
for i = 101, 200 do box.space._bucket:replace{i, vshard.consts.BUCKET.ACTIVE} end
diff --git a/test/storage/recovery.result b/test/storage/recovery.result
index f833fe7..8ccb0b9 100644
--- a/test/storage/recovery.result
+++ b/test/storage/recovery.result
@@ -79,8 +79,7 @@ _bucket = box.space._bucket
...
_bucket:select{}
---
-- - [2, 'garbage', '<replicaset_2>']
- - [3, 'garbage', '<replicaset_2>']
+- []
...
_ = test_run:switch('storage_2_a')
---
diff --git a/test/storage/storage.result b/test/storage/storage.result
index 424bc4c..0550ad1 100644
--- a/test/storage/storage.result
+++ b/test/storage/storage.result
@@ -547,6 +547,9 @@ vshard.storage.bucket_send(1, util.replicasets[2])
---
- true
...
+wait_bucket_is_collected(1)
+---
+...
_ = test_run:switch("storage_2_a")
---
...
@@ -567,12 +570,7 @@ _ = test_run:switch("storage_1_a")
...
vshard.storage.buckets_info()
---
-- 1:
- status: sent
- ro_lock: true
- destination: <replicaset_2>
- id: 1
- 2:
+- 2:
status: active
id: 2
...
diff --git a/test/storage/storage.test.lua b/test/storage/storage.test.lua
index d631b51..d8fbd94 100644
--- a/test/storage/storage.test.lua
+++ b/test/storage/storage.test.lua
@@ -136,6 +136,7 @@ vshard.storage.bucket_send(1, util.replicasets[1])
-- Successful transfer.
vshard.storage.bucket_send(1, util.replicasets[2])
+wait_bucket_is_collected(1)
_ = test_run:switch("storage_2_a")
vshard.storage.buckets_info()
_ = test_run:switch("storage_1_a")
diff --git a/test/unit/config.result b/test/unit/config.result
index dfd0219..e0b2482 100644
--- a/test/unit/config.result
+++ b/test/unit/config.result
@@ -428,33 +428,6 @@ _ = lcfg.check(cfg)
--
-- gh-77: garbage collection options.
--
-cfg.collect_bucket_garbage_interval = 'str'
----
-...
-check(cfg)
----
-- Garbage bucket collect interval must be positive number
-...
-cfg.collect_bucket_garbage_interval = 0
----
-...
-check(cfg)
----
-- Garbage bucket collect interval must be positive number
-...
-cfg.collect_bucket_garbage_interval = -1
----
-...
-check(cfg)
----
-- Garbage bucket collect interval must be positive number
-...
-cfg.collect_bucket_garbage_interval = 100.5
----
-...
-_ = lcfg.check(cfg)
----
-...
cfg.collect_lua_garbage = 100
---
...
@@ -615,6 +588,12 @@ lcfg.check(cfg).rebalancer_max_sending
cfg.rebalancer_max_sending = nil
---
...
-cfg.sharding = nil
+--
+-- Deprecated option does not break anything.
+--
+cfg.collect_bucket_garbage_interval = 100
+---
+...
+_ = lcfg.check(cfg)
---
...
diff --git a/test/unit/config.test.lua b/test/unit/config.test.lua
index ada43db..a1c9f07 100644
--- a/test/unit/config.test.lua
+++ b/test/unit/config.test.lua
@@ -175,15 +175,6 @@ _ = lcfg.check(cfg)
--
-- gh-77: garbage collection options.
--
-cfg.collect_bucket_garbage_interval = 'str'
-check(cfg)
-cfg.collect_bucket_garbage_interval = 0
-check(cfg)
-cfg.collect_bucket_garbage_interval = -1
-check(cfg)
-cfg.collect_bucket_garbage_interval = 100.5
-_ = lcfg.check(cfg)
-
cfg.collect_lua_garbage = 100
check(cfg)
cfg.collect_lua_garbage = true
@@ -244,4 +235,9 @@ util.check_error(lcfg.check, cfg)
cfg.rebalancer_max_sending = 15
lcfg.check(cfg).rebalancer_max_sending
cfg.rebalancer_max_sending = nil
-cfg.sharding = nil
+
+--
+-- Deprecated option does not break anything.
+--
+cfg.collect_bucket_garbage_interval = 100
+_ = lcfg.check(cfg)
diff --git a/test/unit/garbage.result b/test/unit/garbage.result
index 74d9ccf..a530496 100644
--- a/test/unit/garbage.result
+++ b/test/unit/garbage.result
@@ -31,9 +31,6 @@ test_run:cmd("setopt delimiter ''");
vshard.storage.internal.shard_index = 'bucket_id'
---
...
-vshard.storage.internal.collect_bucket_garbage_interval = vshard.consts.DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL
----
-...
--
-- Find nothing if no bucket_id anywhere, or there is no index
-- by it, or bucket_id is not unsigned.
@@ -151,6 +148,9 @@ format[1] = {name = 'id', type = 'unsigned'}
format[2] = {name = 'status', type = 'string'}
---
...
+format[3] = {name = 'destination', type = 'string', is_nullable = true}
+---
+...
_bucket = box.schema.create_space('_bucket', {format = format})
---
...
@@ -172,22 +172,6 @@ _bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
---
- [3, 'active']
...
-_bucket:replace{4, vshard.consts.BUCKET.SENT}
----
-- [4, 'sent']
-...
-_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
----
-- [5, 'garbage']
-...
-_bucket:replace{6, vshard.consts.BUCKET.GARBAGE}
----
-- [6, 'garbage']
-...
-_bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
----
-- [200, 'garbage']
-...
s = box.schema.create_space('test', {engine = engine})
---
...
@@ -213,7 +197,7 @@ s:replace{4, 2}
---
- [4, 2]
...
-gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
+gc_bucket_drop = vshard.storage.internal.gc_bucket_drop
---
...
s2 = box.schema.create_space('test2', {engine = engine})
@@ -249,6 +233,10 @@ function fill_spaces_with_garbage()
s2:replace{6, 4}
s2:replace{7, 5}
s2:replace{7, 6}
+ _bucket:replace{4, vshard.consts.BUCKET.SENT, 'destination1'}
+ _bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
+ _bucket:replace{6, vshard.consts.BUCKET.GARBAGE, 'destination2'}
+ _bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
end;
---
...
@@ -267,12 +255,22 @@ fill_spaces_with_garbage()
---
- 1107
...
-gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
+route_map = {}
+---
+...
+gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
---
-- - 5
- - 6
- - 200
- true
+- null
+...
+route_map
+---
+- - null
+ - null
+ - null
+ - null
+ - null
+ - destination2
...
#s2:select{}
---
@@ -282,10 +280,20 @@ gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
---
- 7
...
-gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
+route_map = {}
+---
+...
+gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
---
-- - 4
- true
+- null
+...
+route_map
+---
+- - null
+ - null
+ - null
+ - destination1
...
s2:select{}
---
@@ -303,17 +311,22 @@ s:select{}
- [6, 100]
...
-- Nothing deleted - update collected generation.
-gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
+route_map = {}
+---
+...
+gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
---
-- - 5
- - 6
- - 200
- true
+- null
...
-gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
+gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
---
-- - 4
- true
+- null
+...
+route_map
+---
+- []
...
#s2:select{}
---
@@ -329,15 +342,20 @@ gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
fill_spaces_with_garbage()
---
...
-_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
+_ = _bucket:on_replace(function() \
+ local gen = vshard.storage.internal.bucket_generation \
+ vshard.storage.internal.bucket_generation = gen + 1 \
+ vshard.storage.internal.bucket_generation_cond:broadcast() \
+end)
---
...
f = fiber.create(vshard.storage.internal.gc_bucket_f)
---
...
-- Wait until garbage collection is finished.
-while s2:count() ~= 3 or s:count() ~= 6 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return s2:count() == 3 and s:count() == 6 end)
---
+- true
...
s:select{}
---
@@ -360,7 +378,6 @@ _bucket:select{}
- - [1, 'active']
- [2, 'receiving']
- [3, 'active']
- - [4, 'sent']
...
--
-- Test deletion of 'sent' buckets after a specified timeout.
@@ -370,8 +387,9 @@ _bucket:replace{2, vshard.consts.BUCKET.SENT}
- [2, 'sent']
...
-- Wait deletion after a while.
-while _bucket:get{2} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return not _bucket:get{2} end)
---
+- true
...
_bucket:select{}
---
@@ -410,8 +428,9 @@ _bucket:replace{4, vshard.consts.BUCKET.SENT}
---
- [4, 'sent']
...
-while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return not _bucket:get{4} end)
---
+- true
...
--
-- Test WAL errors during deletion from _bucket.
@@ -434,11 +453,14 @@ s:replace{6, 4}
---
- [6, 4]
...
-while not test_run:grep_log("default", "Error during deletion of empty sent buckets") do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_log('default', 'Error during garbage collection step', \
+ 65536, 10)
---
+- Error during garbage collection step
...
-while #sk:select{4} ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return #sk:select{4} == 0 end)
---
+- true
...
s:select{}
---
@@ -454,8 +476,9 @@ _bucket:select{}
_ = _bucket:on_replace(nil, rollback_on_delete)
---
...
-while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return not _bucket:get{4} end)
---
+- true
...
f:cancel()
---
@@ -562,8 +585,9 @@ for i = 1, 2000 do _bucket:replace{i, vshard.consts.BUCKET.GARBAGE} s:replace{i,
f = fiber.create(vshard.storage.internal.gc_bucket_f)
---
...
-while _bucket:count() ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return _bucket:count() == 0 end)
---
+- true
...
_bucket:select{}
---
diff --git a/test/unit/garbage.test.lua b/test/unit/garbage.test.lua
index 30079fa..250afb0 100644
--- a/test/unit/garbage.test.lua
+++ b/test/unit/garbage.test.lua
@@ -15,7 +15,6 @@ end;
test_run:cmd("setopt delimiter ''");
vshard.storage.internal.shard_index = 'bucket_id'
-vshard.storage.internal.collect_bucket_garbage_interval = vshard.consts.DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL
--
-- Find nothing if no bucket_id anywhere, or there is no index
@@ -75,16 +74,13 @@ s:drop()
format = {}
format[1] = {name = 'id', type = 'unsigned'}
format[2] = {name = 'status', type = 'string'}
+format[3] = {name = 'destination', type = 'string', is_nullable = true}
_bucket = box.schema.create_space('_bucket', {format = format})
_ = _bucket:create_index('pk')
_ = _bucket:create_index('status', {parts = {{2, 'string'}}, unique = false})
_bucket:replace{1, vshard.consts.BUCKET.ACTIVE}
_bucket:replace{2, vshard.consts.BUCKET.RECEIVING}
_bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
-_bucket:replace{4, vshard.consts.BUCKET.SENT}
-_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
-_bucket:replace{6, vshard.consts.BUCKET.GARBAGE}
-_bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
s = box.schema.create_space('test', {engine = engine})
pk = s:create_index('pk')
@@ -94,7 +90,7 @@ s:replace{2, 1}
s:replace{3, 2}
s:replace{4, 2}
-gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
+gc_bucket_drop = vshard.storage.internal.gc_bucket_drop
s2 = box.schema.create_space('test2', {engine = engine})
pk2 = s2:create_index('pk')
sk2 = s2:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
@@ -114,6 +110,10 @@ function fill_spaces_with_garbage()
s2:replace{6, 4}
s2:replace{7, 5}
s2:replace{7, 6}
+ _bucket:replace{4, vshard.consts.BUCKET.SENT, 'destination1'}
+ _bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
+ _bucket:replace{6, vshard.consts.BUCKET.GARBAGE, 'destination2'}
+ _bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
end;
test_run:cmd("setopt delimiter ''");
@@ -121,15 +121,21 @@ fill_spaces_with_garbage()
#s2:select{}
#s:select{}
-gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
+route_map = {}
+gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
+route_map
#s2:select{}
#s:select{}
-gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
+route_map = {}
+gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
+route_map
s2:select{}
s:select{}
-- Nothing deleted - update collected generation.
-gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
-gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
+route_map = {}
+gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
+gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
+route_map
#s2:select{}
#s:select{}
@@ -137,10 +143,14 @@ gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
-- Test continuous garbage collection via background fiber.
--
fill_spaces_with_garbage()
-_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
+_ = _bucket:on_replace(function() \
+ local gen = vshard.storage.internal.bucket_generation \
+ vshard.storage.internal.bucket_generation = gen + 1 \
+ vshard.storage.internal.bucket_generation_cond:broadcast() \
+end)
f = fiber.create(vshard.storage.internal.gc_bucket_f)
-- Wait until garbage collection is finished.
-while s2:count() ~= 3 or s:count() ~= 6 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return s2:count() == 3 and s:count() == 6 end)
s:select{}
s2:select{}
-- Check garbage bucket is deleted by background fiber.
@@ -150,7 +160,7 @@ _bucket:select{}
--
_bucket:replace{2, vshard.consts.BUCKET.SENT}
-- Wait deletion after a while.
-while _bucket:get{2} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return not _bucket:get{2} end)
_bucket:select{}
s:select{}
s2:select{}
@@ -162,7 +172,7 @@ _bucket:replace{4, vshard.consts.BUCKET.ACTIVE}
s:replace{5, 4}
s:replace{6, 4}
_bucket:replace{4, vshard.consts.BUCKET.SENT}
-while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return not _bucket:get{4} end)
--
-- Test WAL errors during deletion from _bucket.
@@ -172,12 +182,13 @@ _ = _bucket:on_replace(rollback_on_delete)
_bucket:replace{4, vshard.consts.BUCKET.SENT}
s:replace{5, 4}
s:replace{6, 4}
-while not test_run:grep_log("default", "Error during deletion of empty sent buckets") do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
-while #sk:select{4} ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_log('default', 'Error during garbage collection step', \
+ 65536, 10)
+test_run:wait_cond(function() return #sk:select{4} == 0 end)
s:select{}
_bucket:select{}
_ = _bucket:on_replace(nil, rollback_on_delete)
-while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return not _bucket:get{4} end)
f:cancel()
@@ -220,7 +231,7 @@ for i = 1, 2000 do _bucket:replace{i, vshard.consts.BUCKET.GARBAGE} s:replace{i,
#s:select{}
#s2:select{}
f = fiber.create(vshard.storage.internal.gc_bucket_f)
-while _bucket:count() ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
+test_run:wait_cond(function() return _bucket:count() == 0 end)
_bucket:select{}
s:select{}
s2:select{}
diff --git a/test/unit/garbage_errinj.result b/test/unit/garbage_errinj.result
deleted file mode 100644
index 92c8039..0000000
--- a/test/unit/garbage_errinj.result
+++ /dev/null
@@ -1,223 +0,0 @@
-test_run = require('test_run').new()
----
-...
-vshard = require('vshard')
----
-...
-fiber = require('fiber')
----
-...
-engine = test_run:get_cfg('engine')
----
-...
-vshard.storage.internal.shard_index = 'bucket_id'
----
-...
-format = {}
----
-...
-format[1] = {name = 'id', type = 'unsigned'}
----
-...
-format[2] = {name = 'status', type = 'string', is_nullable = true}
----
-...
-_bucket = box.schema.create_space('_bucket', {format = format})
----
-...
-_ = _bucket:create_index('pk')
----
-...
-_ = _bucket:create_index('status', {parts = {{2, 'string'}}, unique = false})
----
-...
-_bucket:replace{1, vshard.consts.BUCKET.ACTIVE}
----
-- [1, 'active']
-...
-_bucket:replace{2, vshard.consts.BUCKET.RECEIVING}
----
-- [2, 'receiving']
-...
-_bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
----
-- [3, 'active']
-...
-_bucket:replace{4, vshard.consts.BUCKET.SENT}
----
-- [4, 'sent']
-...
-_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
----
-- [5, 'garbage']
-...
-s = box.schema.create_space('test', {engine = engine})
----
-...
-pk = s:create_index('pk')
----
-...
-sk = s:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
----
-...
-s:replace{1, 1}
----
-- [1, 1]
-...
-s:replace{2, 1}
----
-- [2, 1]
-...
-s:replace{3, 2}
----
-- [3, 2]
-...
-s:replace{4, 2}
----
-- [4, 2]
-...
-s:replace{5, 100}
----
-- [5, 100]
-...
-s:replace{6, 100}
----
-- [6, 100]
-...
-s:replace{7, 4}
----
-- [7, 4]
-...
-s:replace{8, 5}
----
-- [8, 5]
-...
-s2 = box.schema.create_space('test2', {engine = engine})
----
-...
-pk2 = s2:create_index('pk')
----
-...
-sk2 = s2:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
----
-...
-s2:replace{1, 1}
----
-- [1, 1]
-...
-s2:replace{3, 3}
----
-- [3, 3]
-...
-for i = 7, 1107 do s:replace{i, 200} end
----
-...
-s2:replace{4, 200}
----
-- [4, 200]
-...
-s2:replace{5, 100}
----
-- [5, 100]
-...
-s2:replace{5, 300}
----
-- [5, 300]
-...
-s2:replace{6, 4}
----
-- [6, 4]
-...
-s2:replace{7, 5}
----
-- [7, 5]
-...
-gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
----
-...
-gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
----
-- - 4
-- true
-...
-gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
----
-- - 5
-- true
-...
---
--- Test _bucket generation change during garbage buckets search.
---
-s:truncate()
----
-...
-_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
----
-...
-vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = true
----
-...
-f = fiber.create(function() gc_bucket_step_by_type(vshard.consts.BUCKET.SENT) gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE) end)
----
-...
-_bucket:replace{4, vshard.consts.BUCKET.GARBAGE}
----
-- [4, 'garbage']
-...
-s:replace{5, 4}
----
-- [5, 4]
-...
-s:replace{6, 4}
----
-- [6, 4]
-...
-#s:select{}
----
-- 2
-...
-vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = false
----
-...
-while f:status() ~= 'dead' do fiber.sleep(0.1) end
----
-...
--- Nothing is deleted - _bucket:replace() has changed _bucket
--- generation during search of garbage buckets.
-#s:select{}
----
-- 2
-...
-_bucket:select{4}
----
-- - [4, 'garbage']
-...
--- Next step deletes garbage ok.
-gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
----
-- []
-- true
-...
-gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
----
-- - 4
- - 5
-- true
-...
-#s:select{}
----
-- 0
-...
-_bucket:delete{4}
----
-- [4, 'garbage']
-...
-s2:drop()
----
-...
-s:drop()
----
-...
-_bucket:drop()
----
-...
diff --git a/test/unit/garbage_errinj.test.lua b/test/unit/garbage_errinj.test.lua
deleted file mode 100644
index 31184b9..0000000
--- a/test/unit/garbage_errinj.test.lua
+++ /dev/null
@@ -1,73 +0,0 @@
-test_run = require('test_run').new()
-vshard = require('vshard')
-fiber = require('fiber')
-
-engine = test_run:get_cfg('engine')
-vshard.storage.internal.shard_index = 'bucket_id'
-
-format = {}
-format[1] = {name = 'id', type = 'unsigned'}
-format[2] = {name = 'status', type = 'string', is_nullable = true}
-_bucket = box.schema.create_space('_bucket', {format = format})
-_ = _bucket:create_index('pk')
-_ = _bucket:create_index('status', {parts = {{2, 'string'}}, unique = false})
-_bucket:replace{1, vshard.consts.BUCKET.ACTIVE}
-_bucket:replace{2, vshard.consts.BUCKET.RECEIVING}
-_bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
-_bucket:replace{4, vshard.consts.BUCKET.SENT}
-_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
-
-s = box.schema.create_space('test', {engine = engine})
-pk = s:create_index('pk')
-sk = s:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
-s:replace{1, 1}
-s:replace{2, 1}
-s:replace{3, 2}
-s:replace{4, 2}
-s:replace{5, 100}
-s:replace{6, 100}
-s:replace{7, 4}
-s:replace{8, 5}
-
-s2 = box.schema.create_space('test2', {engine = engine})
-pk2 = s2:create_index('pk')
-sk2 = s2:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
-s2:replace{1, 1}
-s2:replace{3, 3}
-for i = 7, 1107 do s:replace{i, 200} end
-s2:replace{4, 200}
-s2:replace{5, 100}
-s2:replace{5, 300}
-s2:replace{6, 4}
-s2:replace{7, 5}
-
-gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
-gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
-gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
-
---
--- Test _bucket generation change during garbage buckets search.
---
-s:truncate()
-_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
-vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = true
-f = fiber.create(function() gc_bucket_step_by_type(vshard.consts.BUCKET.SENT) gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE) end)
-_bucket:replace{4, vshard.consts.BUCKET.GARBAGE}
-s:replace{5, 4}
-s:replace{6, 4}
-#s:select{}
-vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = false
-while f:status() ~= 'dead' do fiber.sleep(0.1) end
--- Nothing is deleted - _bucket:replace() has changed _bucket
--- generation during search of garbage buckets.
-#s:select{}
-_bucket:select{4}
--- Next step deletes garbage ok.
-gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
-gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
-#s:select{}
-_bucket:delete{4}
-
-s2:drop()
-s:drop()
-_bucket:drop()
diff --git a/vshard/cfg.lua b/vshard/cfg.lua
index f7d5dbc..63d5414 100644
--- a/vshard/cfg.lua
+++ b/vshard/cfg.lua
@@ -251,9 +251,8 @@ local cfg_template = {
max = consts.REBALANCER_MAX_SENDING_MAX
},
collect_bucket_garbage_interval = {
- type = 'positive number', name = 'Garbage bucket collect interval',
- is_optional = true,
- default = consts.DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL
+ name = 'Garbage bucket collect interval', is_deprecated = true,
+ reason = 'Has no effect anymore'
},
collect_lua_garbage = {
type = 'boolean', name = 'Garbage Lua collect necessity',
diff --git a/vshard/consts.lua b/vshard/consts.lua
index 8c2a8b0..3f1585a 100644
--- a/vshard/consts.lua
+++ b/vshard/consts.lua
@@ -23,6 +23,7 @@ return {
DEFAULT_BUCKET_COUNT = 3000;
BUCKET_SENT_GARBAGE_DELAY = 0.5;
BUCKET_CHUNK_SIZE = 1000;
+ LUA_CHUNK_SIZE = 100000,
DEFAULT_REBALANCER_DISBALANCE_THRESHOLD = 1;
REBALANCER_IDLE_INTERVAL = 60 * 60;
REBALANCER_WORK_INTERVAL = 10;
@@ -37,7 +38,7 @@ return {
DEFAULT_FAILOVER_PING_TIMEOUT = 5;
DEFAULT_SYNC_TIMEOUT = 1;
RECONNECT_TIMEOUT = 0.5;
- DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL = 0.5;
+ GC_BACKOFF_INTERVAL = 5,
RECOVERY_INTERVAL = 5;
COLLECT_LUA_GARBAGE_INTERVAL = 100;
@@ -45,4 +46,6 @@ return {
DISCOVERY_WORK_INTERVAL = 1,
DISCOVERY_WORK_STEP = 0.01,
DISCOVERY_TIMEOUT = 10,
+
+ TIMEOUT_INFINITY = 500 * 365 * 86400,
}
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index adf1c20..1ea8069 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -69,7 +69,6 @@ if not M then
total_bucket_count = 0,
errinj = {
ERRINJ_CFG = false,
- ERRINJ_BUCKET_FIND_GARBAGE_DELAY = false,
ERRINJ_RELOAD = false,
ERRINJ_CFG_DELAY = false,
ERRINJ_LONG_RECEIVE = false,
@@ -96,6 +95,8 @@ if not M then
-- detect that _bucket was not changed between yields.
--
bucket_generation = 0,
+ -- Condition variable fired on generation update.
+ bucket_generation_cond = lfiber.cond(),
--
-- Reference to the function used as on_replace trigger on
-- _bucket space. It is used to replace the trigger with
@@ -107,12 +108,14 @@ if not M then
-- replace the old function is to keep its reference.
--
bucket_on_replace = nil,
+ -- Redirects for recently sent buckets. They are kept for a while to
+ -- help routers to find a new location for sent and deleted buckets
+ -- without a full cluster scan.
+ route_map = {},
------------------- Garbage collection -------------------
-- Fiber to remove garbage buckets data.
collect_bucket_garbage_fiber = nil,
- -- Do buckets garbage collection once per this time.
- collect_bucket_garbage_interval = nil,
-- Boolean lua_gc state (create periodic gc task).
collect_lua_garbage = nil,
@@ -173,6 +176,7 @@ end
--
local function bucket_generation_increment()
M.bucket_generation = M.bucket_generation + 1
+ M.bucket_generation_cond:broadcast()
end
--
@@ -758,8 +762,9 @@ local function bucket_check_state(bucket_id, mode)
else
return bucket
end
+ local dst = bucket and bucket.destination or M.route_map[bucket_id]
return bucket, lerror.vshard(lerror.code.WRONG_BUCKET, bucket_id, reason,
- bucket and bucket.destination)
+ dst)
end
--
@@ -804,11 +809,23 @@ end
--
local function bucket_unrefro(bucket_id)
local ref = M.bucket_refs[bucket_id]
- if not ref or ref.ro == 0 then
+ local count = ref and ref.ro or 0
+ if count == 0 then
return nil, lerror.vshard(lerror.code.WRONG_BUCKET, bucket_id,
"no refs", nil)
end
- ref.ro = ref.ro - 1
+ if count == 1 then
+ ref.ro = 0
+ if ref.ro_lock then
+ -- Garbage collector is waiting for the bucket if RO
+ -- is locked. Let it know it has one more bucket to
+ -- collect. It relies on generation, so its increment
+ -- is enough.
+ bucket_generation_increment()
+ end
+ return true
+ end
+ ref.ro = count - 1
return true
end
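[Editor's note: the hunk above makes `bucket_unrefro()` bump the bucket generation when the *last* read-only reference of an `ro_lock`-ed bucket is dropped, which wakes the waiting garbage collector. A minimal Python model of that logic — `Gen`, `unrefro`, and the dict-based ref layout are illustrative stand-ins, not vshard's actual API:]

```python
class Gen:
    """Models M.bucket_generation plus its condition variable."""
    def __init__(self):
        self.value = 0
        self.woken = False

    def increment(self):
        self.value += 1
        self.woken = True  # stands in for bucket_generation_cond:broadcast()

def unrefro(refs, gen, bucket_id):
    """Release one read-only bucket reference (sketch of bucket_unrefro)."""
    ref = refs.get(bucket_id)
    count = ref["ro"] if ref else 0
    if count == 0:
        return None, "no refs"
    if count == 1:
        ref["ro"] = 0
        if ref["ro_lock"]:
            # GC waits for ro-locked buckets; waking it only on the last
            # unref avoids useless wakeups while readers remain.
            gen.increment()
        return True, None
    ref["ro"] = count - 1
    return True, None

refs = {7: {"ro": 2, "ro_lock": True}}
gen = Gen()
unrefro(refs, gen, 7)   # 2 -> 1: GC not woken yet
assert not gen.woken
unrefro(refs, gen, 7)   # last reference gone: GC woken via generation bump
assert gen.woken and gen.value == 1
```

The point of the design choice is that readers, not a polling timer, tell the collector exactly when a locked bucket becomes collectable.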
@@ -1481,79 +1498,44 @@ local function gc_bucket_in_space(space, bucket_id, status)
end
--
--- Remove tuples from buckets of a specified type.
--- @param type Type of buckets to gc.
--- @retval List of ids of empty buckets of the type.
+-- Drop buckets with the given status along with their data in all spaces.
+-- @param status Status of target buckets.
+-- @param route_map Destinations of deleted buckets are saved into this table.
--
-local function gc_bucket_step_by_type(type)
- local sharded_spaces = find_sharded_spaces()
- local empty_buckets = {}
+local function gc_bucket_drop_xc(status, route_map)
local limit = consts.BUCKET_CHUNK_SIZE
- local is_all_collected = true
- for _, bucket in box.space._bucket.index.status:pairs(type) do
- local bucket_id = bucket.id
- local ref = M.bucket_refs[bucket_id]
+ local _bucket = box.space._bucket
+ local sharded_spaces = find_sharded_spaces()
+ for _, b in _bucket.index.status:pairs(status) do
+ local id = b.id
+ local ref = M.bucket_refs[id]
if ref then
assert(ref.rw == 0)
if ref.ro ~= 0 then
ref.ro_lock = true
- is_all_collected = false
goto continue
end
- M.bucket_refs[bucket_id] = nil
+ M.bucket_refs[id] = nil
end
for _, space in pairs(sharded_spaces) do
- gc_bucket_in_space_xc(space, bucket_id, type)
+ gc_bucket_in_space_xc(space, id, status)
limit = limit - 1
if limit == 0 then
lfiber.sleep(0)
limit = consts.BUCKET_CHUNK_SIZE
end
end
- table.insert(empty_buckets, bucket.id)
-::continue::
+ route_map[id] = b.destination
+ _bucket:delete{id}
+ ::continue::
end
- return empty_buckets, is_all_collected
-end
-
---
--- Drop buckets with ids in the list.
--- @param bucket_ids Bucket ids to drop.
--- @param status Expected bucket status.
---
-local function gc_bucket_drop_xc(bucket_ids, status)
- if #bucket_ids == 0 then
- return
- end
- local limit = consts.BUCKET_CHUNK_SIZE
- box.begin()
- local _bucket = box.space._bucket
- for _, id in pairs(bucket_ids) do
- local bucket_exists = _bucket:get{id} ~= nil
- local b = _bucket:get{id}
- if b then
- if b.status ~= status then
- return error(string.format('Bucket %d status is changed. Was '..
- '%s, became %s', id, status,
- b.status))
- end
- _bucket:delete{id}
- end
- limit = limit - 1
- if limit == 0 then
- box.commit()
- box.begin()
- limit = consts.BUCKET_CHUNK_SIZE
- end
- end
- box.commit()
end
--
-- Exception safe version of gc_bucket_drop_xc.
--
-local function gc_bucket_drop(bucket_ids, status)
- local status, err = pcall(gc_bucket_drop_xc, bucket_ids, status)
+local function gc_bucket_drop(status, route_map)
+ local status, err = pcall(gc_bucket_drop_xc, status, route_map)
if not status then
box.rollback()
end
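[Editor's note: the rewritten `gc_bucket_drop_xc()` above merges the old two-phase "collect ids, then delete" scheme into one pass: skip buckets that still have readers (locking them), wipe their data, record the destination into `route_map`, and drop the `_bucket` tuple. A hedged Python sketch of that flow, with dicts standing in for spaces and `_bucket`:]

```python
def gc_bucket_drop(buckets, refs, spaces, status, route_map):
    """Sketch of the reworked GC step: drop all buckets of one status."""
    for b in [b for b in buckets.values() if b["status"] == status]:
        ref = refs.get(b["id"])
        if ref is not None and ref["ro"] != 0:
            # Bucket still has readers: lock new RO refs and revisit later.
            ref["ro_lock"] = True
            continue
        refs.pop(b["id"], None)
        for space in spaces:            # the real code yields every
            space.pop(b["id"], None)    # BUCKET_CHUNK_SIZE deletions
        # Remember where the bucket went so routers can be redirected.
        route_map[b["id"]] = b.get("destination")
        del buckets[b["id"]]

buckets = {
    4: {"id": 4, "status": "sent", "destination": "rs2"},
    5: {"id": 5, "status": "garbage"},
}
route_map = {}
gc_bucket_drop(buckets, {}, [{}], "sent", route_map)
gc_bucket_drop(buckets, {}, [{}], "garbage", route_map)
assert 4 not in buckets and 5 not in buckets
assert route_map == {4: "rs2", 5: None}
```

GARBAGE buckets may have no destination, so their `route_map` entry is empty — matching the `null` entries in the test output above.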
@@ -1561,14 +1543,16 @@ local function gc_bucket_drop(bucket_ids, status)
end
--
--- Garbage collector. Works on masters. The garbage collector
--- wakes up once per specified time.
+-- Garbage collector. Works on masters. The garbage collector wakes up when
+-- state of any bucket changes.
-- After wakeup it follows the plan:
--- 1) Check if _bucket has changed. If not, then sleep again;
--- 2) Scan user spaces for sent and garbage buckets, delete
--- garbage data in batches of limited size;
--- 3) Delete GARBAGE buckets from _bucket immediately, and
--- schedule SENT buckets for deletion after a timeout;
+-- 1) Check if state of any bucket has really changed. If not, then sleep again;
+-- 2) Delete all GARBAGE and SENT buckets along with their data in chunks of
+-- limited size.
+-- 3) Bucket destinations are saved into a global route_map to reroute incoming
+-- requests from routers in case they didn't notice the buckets being moved.
+-- The saved routes are scheduled for deletion after a timeout, which is
+-- checked on each iteration of this loop.
-- 4) Sleep, go to (1).
-- For each step details see comments in the code.
--
@@ -1580,65 +1564,75 @@ function gc_bucket_f()
-- generation == bucket generation. In such a case the fiber
-- does nothing until next _bucket change.
local bucket_generation_collected = -1
- -- Empty sent buckets are collected into an array. After a
- -- specified time interval the buckets are deleted both from
- -- this array and from _bucket space.
- local buckets_for_redirect = {}
- local buckets_for_redirect_ts = fiber_clock()
- -- Empty sent buckets, updated after each step, and when
- -- buckets_for_redirect is deleted, it gets empty_sent_buckets
- -- for next deletion.
- local empty_garbage_buckets, empty_sent_buckets, status, err
+ local bucket_generation_current = M.bucket_generation
+ -- Deleted buckets are saved into a route map to redirect routers if they
+ -- didn't discover new location of the buckets yet. However route map does
+ -- not grow infinitely. Otherwise it would end up storing redirects for all
+ -- buckets in the cluster, which could also become outdated.
+ -- Garbage collector periodically drops old routes from the map. For that it
+ -- remembers state of route map in one moment, and after a while clears the
+ -- remembered routes from the global route map.
+ local route_map = M.route_map
+ local route_map_old = {}
+ local route_map_deadline = 0
+ local status, err
while M.module_version == module_version do
- -- Check if no changes in buckets configuration.
- if bucket_generation_collected ~= M.bucket_generation then
- local bucket_generation = M.bucket_generation
- local is_sent_collected, is_garbage_collected
- status, empty_garbage_buckets, is_garbage_collected =
- pcall(gc_bucket_step_by_type, consts.BUCKET.GARBAGE)
- if not status then
- err = empty_garbage_buckets
- goto check_error
- end
- status, empty_sent_buckets, is_sent_collected =
- pcall(gc_bucket_step_by_type, consts.BUCKET.SENT)
- if not status then
- err = empty_sent_buckets
- goto check_error
+ if bucket_generation_collected ~= bucket_generation_current then
+ status, err = gc_bucket_drop(consts.BUCKET.GARBAGE, route_map)
+ if status then
+ status, err = gc_bucket_drop(consts.BUCKET.SENT, route_map)
end
- status, err = gc_bucket_drop(empty_garbage_buckets,
- consts.BUCKET.GARBAGE)
-::check_error::
if not status then
box.rollback()
log.error('Error during garbage collection step: %s', err)
- goto continue
+ else
+ -- Don't use global generation. During the collection it could
+ -- already change. Instead, remember the generation known before
+ -- the collection has started.
+ -- Since the collection also changes the generation, it makes
+ -- the GC happen always at least twice. But typically on the
+ -- second iteration it should not find any buckets to collect,
+ -- and then the collected generation matches the global one.
+ bucket_generation_collected = bucket_generation_current
end
- if is_sent_collected and is_garbage_collected then
- bucket_generation_collected = bucket_generation
+ else
+ status = true
+ end
+
+ local sleep_time = route_map_deadline - fiber_clock()
+ if sleep_time <= 0 then
+ local chunk = consts.LUA_CHUNK_SIZE
+ util.table_minus_yield(route_map, route_map_old, chunk)
+ route_map_old = util.table_copy_yield(route_map, chunk)
+ if next(route_map_old) then
+ sleep_time = consts.BUCKET_SENT_GARBAGE_DELAY
+ else
+ sleep_time = consts.TIMEOUT_INFINITY
end
+ route_map_deadline = fiber_clock() + sleep_time
end
+ bucket_generation_current = M.bucket_generation
- if fiber_clock() - buckets_for_redirect_ts >=
- consts.BUCKET_SENT_GARBAGE_DELAY then
- status, err = gc_bucket_drop(buckets_for_redirect,
- consts.BUCKET.SENT)
- if not status then
- buckets_for_redirect = {}
- empty_sent_buckets = {}
- bucket_generation_collected = -1
- log.error('Error during deletion of empty sent buckets: %s',
- err)
- elseif M.module_version ~= module_version then
- return
+ if bucket_generation_current ~= bucket_generation_collected then
+ -- Generation was changed during collection. Or *by* collection.
+ if status then
+ -- Retry immediately. If the generation was changed by the
+ -- collection itself, it will notice it next iteration, and go
+ -- to proper sleep.
+ sleep_time = 0
else
- buckets_for_redirect = empty_sent_buckets or {}
- empty_sent_buckets = nil
- buckets_for_redirect_ts = fiber_clock()
+ -- An error happened during the collection. Does not make sense
+ -- to retry on each iteration of the event loop. The most likely
+ -- errors are either a WAL error or a transaction abort - both
+ -- look like an issue in the user's code and can't be fixed
+ -- quickly anyway. Backoff.
+ sleep_time = consts.GC_BACKOFF_INTERVAL
end
end
-::continue::
- lfiber.sleep(M.collect_bucket_garbage_interval)
+
+ if M.module_version == module_version then
+ M.bucket_generation_cond:wait(sleep_time)
+ end
end
end
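[Editor's note: the `gc_bucket_f` loop above ages the redirect map with a two-snapshot scheme: each tick it subtracts the previous snapshot from the live map (`table_minus_yield`) and then takes a new snapshot (`table_copy_yield`), so every route survives at least one full `BUCKET_SENT_GARBAGE_DELAY` interval. A Python model of one aging tick — the helper name and dict representation are illustrative:]

```python
def age_route_map(route_map, route_map_old):
    """One aging tick over the redirect map (sketch of the GC loop's
    table_minus_yield + table_copy_yield sequence)."""
    # Drop routes that already existed, unchanged, at the previous tick...
    for k in [k for k in route_map
              if k in route_map_old and route_map[k] == route_map_old[k]]:
        del route_map[k]
    # ...and snapshot what is left; it will expire on the next tick.
    return dict(route_map)

route_map = {1: "rs2", 2: "rs3"}
old = age_route_map(route_map, {})    # first tick: nothing expires yet
route_map[3] = "rs1"                  # a bucket sent after the snapshot
old = age_route_map(route_map, old)   # routes 1 and 2 expire; 3 survives
assert route_map == {3: "rs1"}
old = age_route_map(route_map, old)   # next tick: route 3 expires too
assert route_map == {}
```

When the snapshot comes out empty, the loop has nothing left to expire and can sleep with `TIMEOUT_INFINITY` until the generation condition fires again.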
@@ -2423,8 +2417,6 @@ local function storage_cfg(cfg, this_replica_uuid, is_reload)
vshard_cfg.rebalancer_disbalance_threshold
M.rebalancer_receiving_quota = vshard_cfg.rebalancer_max_receiving
M.shard_index = vshard_cfg.shard_index
- M.collect_bucket_garbage_interval =
- vshard_cfg.collect_bucket_garbage_interval
M.collect_lua_garbage = vshard_cfg.collect_lua_garbage
M.rebalancer_worker_count = vshard_cfg.rebalancer_max_sending
M.current_cfg = cfg
@@ -2678,6 +2670,9 @@ else
storage_cfg(M.current_cfg, M.this_replica.uuid, true)
end
M.module_version = M.module_version + 1
+ -- Background fibers could sleep waiting for bucket changes.
+ -- Let them know it is time to reload.
+ bucket_generation_increment()
end
M.recovery_f = recovery_f
@@ -2688,7 +2683,7 @@ M.gc_bucket_f = gc_bucket_f
-- These functions are saved in M not for atomic reload, but for
-- unit testing.
--
-M.gc_bucket_step_by_type = gc_bucket_step_by_type
+M.gc_bucket_drop = gc_bucket_drop
M.rebalancer_build_routes = rebalancer_build_routes
M.rebalancer_calculate_metrics = rebalancer_calculate_metrics
M.cached_find_sharded_spaces = find_sharded_spaces
diff --git a/vshard/storage/reload_evolution.lua b/vshard/storage/reload_evolution.lua
index f38af74..484f499 100644
--- a/vshard/storage/reload_evolution.lua
+++ b/vshard/storage/reload_evolution.lua
@@ -4,6 +4,7 @@
-- in a commit.
--
local log = require('log')
+local fiber = require('fiber')
--
-- Array of upgrade functions.
@@ -25,6 +26,13 @@ migrations[#migrations + 1] = function(M)
end
end
+migrations[#migrations + 1] = function(M)
+ if not M.route_map then
+ M.bucket_generation_cond = fiber.cond()
+ M.route_map = {}
+ end
+end
+
--
-- Perform an update based on a version stored in `M` (internals).
-- @param M Old module internals which should be updated.
* Re: [Tarantool-patches] [PATCH 7/9] gc: introduce reactive garbage collector
2021-02-10 22:35 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-11 6:50 ` Oleg Babin via Tarantool-patches
0 siblings, 0 replies; 36+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-02-11 6:50 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov
Thanks for your fixes! LGTM.
On 11/02/2021 01:35, Vladislav Shpilevoy wrote:
> Thanks for the review!
>
> On 10.02.2021 10:00, Oleg Babin wrote:
>> Thanks for your patch.
>>
>> As I see you've introduced some new parameters: "LUA_CHUNK_SIZE" and "GC_BACKOFF_INTERVAL".
> I decided not to go into too much detail and not to describe private
> constants in the commit message. GC_BACKOFF_INTERVAL is explained
> in the place where it is used. LUA_CHUNK_SIZE is quite obvious if
> you look at its usage.
>
>> I think it's better to describe them in the commit message to make it clearer how the new algorithm works.
> These constants are not super relevant to the algorithm's core
> idea. It does not matter much for the reactive GC concept if I
> yield in table utility functions, or if I have a backoff timeout.
> These could be considered 'optimizations' or 'amendments'. I would
> consider them small details not worth mentioning in the commit
> message.
>
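[Editorial note: the chunked deletion mentioned above (LUA_CHUNK_SIZE) can be sketched roughly as follows. This is an illustrative model only, not the exact vshard code; the constant value, the `bucket_id` index name, and the assumption that the primary key is the first tuple field are all placeholders.]

```lua
local fiber = require('fiber')

-- Illustrative constant: how many tuples to process before yielding.
local LUA_CHUNK_SIZE = 100000

-- Delete all tuples of a given bucket in chunks, yielding between
-- chunks so one long deletion does not block the TX thread.
local function delete_bucket_data(space, bucket_id)
    local index = space.index.bucket_id
    while true do
        local tuples = index:select(bucket_id, {limit = LUA_CHUNK_SIZE})
        if #tuples == 0 then
            break
        end
        for _, tuple in ipairs(tuples) do
            -- Assumes the primary key is the first tuple field.
            space:delete(tuple[1])
        end
        -- Let other fibers run before taking the next chunk.
        fiber.yield()
    end
end
```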
>> I see that you didn't update the comment above the "gc_bucket_f" function. Is it still relevant?
> No, irrelevant, thanks for noticing. Here is the diff:
>
> ====================
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index 99f92a0..1ea8069 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -1543,14 +1543,16 @@ local function gc_bucket_drop(status, route_map)
> end
>
> --
> --- Garbage collector. Works on masters. The garbage collector
> --- wakes up once per specified time.
> +-- Garbage collector. Works on masters. The garbage collector wakes up when
> +-- state of any bucket changes.
> -- After wakeup it follows the plan:
> --- 1) Check if _bucket has changed. If not, then sleep again;
> --- 2) Scan user spaces for sent and garbage buckets, delete
> --- garbage data in batches of limited size;
> --- 3) Delete GARBAGE buckets from _bucket immediately, and
> --- schedule SENT buckets for deletion after a timeout;
> +-- 1) Check if state of any bucket has really changed. If not, then sleep again;
> +-- 2) Delete all GARBAGE and SENT buckets along with their data in chunks of
> +-- limited size.
> +-- 3) Bucket destinations are saved into a global route_map to reroute incoming
> +-- requests from routers in case they didn't notice the buckets being moved.
> +-- The saved routes are scheduled for deletion after a timeout, which is
> +-- checked on each iteration of this loop.
> -- 4) Sleep, go to (1).
> -- For each step details see comments in the code.
> --
> ====================
>
> The full new patch below.
>
> ====================
> gc: introduce reactive garbage collector
>
> Garbage collector is a fiber on a master node which deletes
> GARBAGE and SENT buckets along with their data.
>
> It was proactive. It used to wake up with a constant period to
> find and delete the needed buckets.
>
> But this won't work with the future feature called 'map-reduce'.
> Map-reduce as a preparation stage will need to ensure that all
> buckets on a storage are readable and writable. With the current
> GC algorithm if a bucket is sent, it won't be deleted for the next
> 5 seconds by default. During this time all new map-reduce requests
> can't execute.
>
> This is not acceptable. Neither is too frequent wakeup of the GC
> fiber, because it would waste TX thread time.
>
> The patch makes the GC fiber wake up not on a timeout but on
> events happening with the _bucket space. The GC fiber sleeps on a
> condition variable which is signaled when _bucket is changed.
>
> Once GC sees work to do, it won't sleep until it is done. It will
> only yield.
>
> This makes GC delete SENT and GARBAGE buckets as soon as possible
> reducing the waiting time for the incoming map-reduce requests.
>
> Needed for #147
>
> @TarantoolBot document
> Title: VShard: deprecate cfg option 'collect_bucket_garbage_interval'
> It was used to specify the interval between bucket garbage
> collection steps. It was needed because garbage collection in
> vshard was proactive. It didn't react to newly appeared garbage
> buckets immediately.
>
> Since 0.1.17 garbage collection is reactive. It starts working on
> garbage buckets immediately as they appear and sleeps the rest of
> the time. The option is no longer used and does not affect
> anything.
>
> I suppose it can be deleted from the documentation, or left with
> a big 'deprecated' label plus the explanation above.
>
> An attempt to use the option does not cause an error, but logs a
> warning.
>
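[Editorial note: the event-driven loop described in the commit message can be sketched as a minimal model. This is illustrative only; the real `gc_bucket_f` lives in `vshard/storage/init.lua`, and the names below mirror it loosely rather than reproduce it.]

```lua
local fiber = require('fiber')

-- Simplified model of the reactive GC described above.
local M = {
    bucket_generation = 0,
    bucket_generation_cond = fiber.cond(),
}

-- Any change to _bucket (via an on_replace trigger in the real code)
-- bumps the generation and wakes the GC fiber.
local function bucket_generation_increment()
    M.bucket_generation = M.bucket_generation + 1
    M.bucket_generation_cond:broadcast()
end

local function gc_f()
    local seen_generation = -1
    while true do
        if M.bucket_generation ~= seen_generation then
            seen_generation = M.bucket_generation
            -- ... collect SENT and GARBAGE buckets here, yielding
            -- between chunks but not sleeping until done ...
        end
        -- Sleep until the next _bucket change instead of polling
        -- on a fixed interval.
        M.bucket_generation_cond:wait()
    end
end
```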
> diff --git a/test/lua_libs/storage_template.lua b/test/lua_libs/storage_template.lua
> index 21409bd..8df89f6 100644
> --- a/test/lua_libs/storage_template.lua
> +++ b/test/lua_libs/storage_template.lua
> @@ -172,6 +172,5 @@ function wait_bucket_is_collected(id)
> return true
> end
> vshard.storage.recovery_wakeup()
> - vshard.storage.garbage_collector_wakeup()
> end)
> end
> diff --git a/test/misc/reconfigure.result b/test/misc/reconfigure.result
> index 168be5d..3b34841 100644
> --- a/test/misc/reconfigure.result
> +++ b/test/misc/reconfigure.result
> @@ -83,9 +83,6 @@ cfg.collect_lua_garbage = true
> cfg.rebalancer_max_receiving = 1000
> ---
> ...
> -cfg.collect_bucket_garbage_interval = 100
> ----
> -...
> cfg.invalid_option = 'kek'
> ---
> ...
> @@ -105,10 +102,6 @@ vshard.storage.internal.rebalancer_max_receiving ~= 1000
> ---
> - true
> ...
> -vshard.storage.internal.collect_bucket_garbage_interval ~= 100
> ----
> -- true
> -...
> cfg.sync_timeout = nil
> ---
> ...
> @@ -118,9 +111,6 @@ cfg.collect_lua_garbage = nil
> cfg.rebalancer_max_receiving = nil
> ---
> ...
> -cfg.collect_bucket_garbage_interval = nil
> ----
> -...
> cfg.invalid_option = nil
> ---
> ...
> diff --git a/test/misc/reconfigure.test.lua b/test/misc/reconfigure.test.lua
> index e891010..348628c 100644
> --- a/test/misc/reconfigure.test.lua
> +++ b/test/misc/reconfigure.test.lua
> @@ -33,17 +33,14 @@ vshard.storage.internal.sync_timeout
> cfg.sync_timeout = 100
> cfg.collect_lua_garbage = true
> cfg.rebalancer_max_receiving = 1000
> -cfg.collect_bucket_garbage_interval = 100
> cfg.invalid_option = 'kek'
> vshard.storage.cfg(cfg, util.name_to_uuid.storage_1_a)
> not vshard.storage.internal.collect_lua_garbage
> vshard.storage.internal.sync_timeout
> vshard.storage.internal.rebalancer_max_receiving ~= 1000
> -vshard.storage.internal.collect_bucket_garbage_interval ~= 100
> cfg.sync_timeout = nil
> cfg.collect_lua_garbage = nil
> cfg.rebalancer_max_receiving = nil
> -cfg.collect_bucket_garbage_interval = nil
> cfg.invalid_option = nil
>
> --
> diff --git a/test/rebalancer/bucket_ref.result b/test/rebalancer/bucket_ref.result
> index b8fc7ff..9df7480 100644
> --- a/test/rebalancer/bucket_ref.result
> +++ b/test/rebalancer/bucket_ref.result
> @@ -184,9 +184,6 @@ vshard.storage.bucket_unref(1, 'read')
> - true
> ...
> -- Force GC to take an RO lock on the bucket now.
> -vshard.storage.garbage_collector_wakeup()
> ----
> -...
> vshard.storage.buckets_info(1)
> ---
> - 1:
> @@ -203,7 +200,6 @@ while true do
> if i.status == vshard.consts.BUCKET.GARBAGE and i.ro_lock then
> break
> end
> - vshard.storage.garbage_collector_wakeup()
> fiber.sleep(0.01)
> end;
> ---
> @@ -235,14 +231,6 @@ finish_refs = true
> while f1:status() ~= 'dead' do fiber.sleep(0.01) end
> ---
> ...
> -vshard.storage.buckets_info(1)
> ----
> -- 1:
> - status: garbage
> - ro_lock: true
> - destination: <replicaset_2>
> - id: 1
> -...
> wait_bucket_is_collected(1)
> ---
> ...
> diff --git a/test/rebalancer/bucket_ref.test.lua b/test/rebalancer/bucket_ref.test.lua
> index 213ced3..1b032ff 100644
> --- a/test/rebalancer/bucket_ref.test.lua
> +++ b/test/rebalancer/bucket_ref.test.lua
> @@ -56,7 +56,6 @@ vshard.storage.bucket_unref(1, 'write') -- Error, no refs.
> vshard.storage.bucket_ref(1, 'read')
> vshard.storage.bucket_unref(1, 'read')
> -- Force GC to take an RO lock on the bucket now.
> -vshard.storage.garbage_collector_wakeup()
> vshard.storage.buckets_info(1)
> _ = test_run:cmd("setopt delimiter ';'")
> while true do
> @@ -64,7 +63,6 @@ while true do
> if i.status == vshard.consts.BUCKET.GARBAGE and i.ro_lock then
> break
> end
> - vshard.storage.garbage_collector_wakeup()
> fiber.sleep(0.01)
> end;
> _ = test_run:cmd("setopt delimiter ''");
> @@ -72,7 +70,6 @@ vshard.storage.buckets_info(1)
> vshard.storage.bucket_refro(1)
> finish_refs = true
> while f1:status() ~= 'dead' do fiber.sleep(0.01) end
> -vshard.storage.buckets_info(1)
> wait_bucket_is_collected(1)
> _ = test_run:switch('box_2_a')
> vshard.storage.buckets_info(1)
> diff --git a/test/rebalancer/errinj.result b/test/rebalancer/errinj.result
> index e50eb72..0ddb1c9 100644
> --- a/test/rebalancer/errinj.result
> +++ b/test/rebalancer/errinj.result
> @@ -226,17 +226,6 @@ ret2, err2
> - true
> - null
> ...
> -_bucket:get{35}
> ----
> -- [35, 'sent', '<replicaset_2>']
> -...
> -_bucket:get{36}
> ----
> -- [36, 'sent', '<replicaset_2>']
> -...
> --- Buckets became 'active' on box_2_a, but still are sending on
> --- box_1_a. Wait until it is marked as garbage on box_1_a by the
> --- recovery fiber.
> wait_bucket_is_collected(35)
> ---
> ...
> diff --git a/test/rebalancer/errinj.test.lua b/test/rebalancer/errinj.test.lua
> index 2cc4a69..a60f3d7 100644
> --- a/test/rebalancer/errinj.test.lua
> +++ b/test/rebalancer/errinj.test.lua
> @@ -102,11 +102,6 @@ _ = test_run:switch('box_1_a')
> while f1:status() ~= 'dead' or f2:status() ~= 'dead' do fiber.sleep(0.001) end
> ret1, err1
> ret2, err2
> -_bucket:get{35}
> -_bucket:get{36}
> --- Buckets became 'active' on box_2_a, but still are sending on
> --- box_1_a. Wait until it is marked as garbage on box_1_a by the
> --- recovery fiber.
> wait_bucket_is_collected(35)
> wait_bucket_is_collected(36)
> _ = test_run:switch('box_2_a')
> diff --git a/test/rebalancer/receiving_bucket.result b/test/rebalancer/receiving_bucket.result
> index 7d3612b..ad93445 100644
> --- a/test/rebalancer/receiving_bucket.result
> +++ b/test/rebalancer/receiving_bucket.result
> @@ -366,14 +366,6 @@ vshard.storage.bucket_send(1, util.replicasets[1], {timeout = 0.3})
> ---
> - true
> ...
> -vshard.storage.buckets_info(1)
> ----
> -- 1:
> - status: sent
> - ro_lock: true
> - destination: <replicaset_1>
> - id: 1
> -...
> wait_bucket_is_collected(1)
> ---
> ...
> diff --git a/test/rebalancer/receiving_bucket.test.lua b/test/rebalancer/receiving_bucket.test.lua
> index 24534b3..2cf6382 100644
> --- a/test/rebalancer/receiving_bucket.test.lua
> +++ b/test/rebalancer/receiving_bucket.test.lua
> @@ -136,7 +136,6 @@ box.space.test3:select{100}
> -- Now the bucket is unreferenced and can be transferred.
> _ = test_run:switch('box_2_a')
> vshard.storage.bucket_send(1, util.replicasets[1], {timeout = 0.3})
> -vshard.storage.buckets_info(1)
> wait_bucket_is_collected(1)
> vshard.storage.buckets_info(1)
> _ = test_run:switch('box_1_a')
> diff --git a/test/reload_evolution/storage.result b/test/reload_evolution/storage.result
> index 753687f..9d30a04 100644
> --- a/test/reload_evolution/storage.result
> +++ b/test/reload_evolution/storage.result
> @@ -92,7 +92,7 @@ test_run:grep_log('storage_2_a', 'vshard.storage.reload_evolution: upgraded to')
> ...
> vshard.storage.internal.reload_version
> ---
> -- 2
> +- 3
> ...
> --
> -- gh-237: should be only one trigger. During gh-237 the trigger installation
> diff --git a/test/router/reroute_wrong_bucket.result b/test/router/reroute_wrong_bucket.result
> index 049bdef..ac340eb 100644
> --- a/test/router/reroute_wrong_bucket.result
> +++ b/test/router/reroute_wrong_bucket.result
> @@ -37,7 +37,7 @@ test_run:switch('storage_1_a')
> ---
> - true
> ...
> -cfg.collect_bucket_garbage_interval = 100
> +vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
> ---
> ...
> vshard.storage.cfg(cfg, util.name_to_uuid.storage_1_a)
> @@ -53,7 +53,7 @@ test_run:switch('storage_2_a')
> ---
> - true
> ...
> -cfg.collect_bucket_garbage_interval = 100
> +vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
> ---
> ...
> vshard.storage.cfg(cfg, util.name_to_uuid.storage_2_a)
> @@ -202,12 +202,12 @@ test_run:grep_log('router_1', 'please update configuration')
> err
> ---
> - bucket_id: 100
> - reason: write is prohibited
> + reason: Not found
> code: 1
> destination: ac522f65-aa94-4134-9f64-51ee384f1a54
> type: ShardingError
> name: WRONG_BUCKET
> - message: 'Cannot perform action with bucket 100, reason: write is prohibited'
> + message: 'Cannot perform action with bucket 100, reason: Not found'
> ...
> --
> -- Now try again, but update configuration during call(). It must
> diff --git a/test/router/reroute_wrong_bucket.test.lua b/test/router/reroute_wrong_bucket.test.lua
> index 9e6e804..207aac3 100644
> --- a/test/router/reroute_wrong_bucket.test.lua
> +++ b/test/router/reroute_wrong_bucket.test.lua
> @@ -11,13 +11,13 @@ util.map_evals(test_run, {REPLICASET_1, REPLICASET_2}, 'bootstrap_storage(\'memt
> test_run:cmd('create server router_1 with script="router/router_1.lua"')
> test_run:cmd('start server router_1')
> test_run:switch('storage_1_a')
> -cfg.collect_bucket_garbage_interval = 100
> +vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
> vshard.storage.cfg(cfg, util.name_to_uuid.storage_1_a)
> vshard.storage.rebalancer_disable()
> for i = 1, 100 do box.space._bucket:replace{i, vshard.consts.BUCKET.ACTIVE} end
>
> test_run:switch('storage_2_a')
> -cfg.collect_bucket_garbage_interval = 100
> +vshard.consts.BUCKET_SENT_GARBAGE_DELAY = 100
> vshard.storage.cfg(cfg, util.name_to_uuid.storage_2_a)
> vshard.storage.rebalancer_disable()
> for i = 101, 200 do box.space._bucket:replace{i, vshard.consts.BUCKET.ACTIVE} end
> diff --git a/test/storage/recovery.result b/test/storage/recovery.result
> index f833fe7..8ccb0b9 100644
> --- a/test/storage/recovery.result
> +++ b/test/storage/recovery.result
> @@ -79,8 +79,7 @@ _bucket = box.space._bucket
> ...
> _bucket:select{}
> ---
> -- - [2, 'garbage', '<replicaset_2>']
> - - [3, 'garbage', '<replicaset_2>']
> +- []
> ...
> _ = test_run:switch('storage_2_a')
> ---
> diff --git a/test/storage/storage.result b/test/storage/storage.result
> index 424bc4c..0550ad1 100644
> --- a/test/storage/storage.result
> +++ b/test/storage/storage.result
> @@ -547,6 +547,9 @@ vshard.storage.bucket_send(1, util.replicasets[2])
> ---
> - true
> ...
> +wait_bucket_is_collected(1)
> +---
> +...
> _ = test_run:switch("storage_2_a")
> ---
> ...
> @@ -567,12 +570,7 @@ _ = test_run:switch("storage_1_a")
> ...
> vshard.storage.buckets_info()
> ---
> -- 1:
> - status: sent
> - ro_lock: true
> - destination: <replicaset_2>
> - id: 1
> - 2:
> +- 2:
> status: active
> id: 2
> ...
> diff --git a/test/storage/storage.test.lua b/test/storage/storage.test.lua
> index d631b51..d8fbd94 100644
> --- a/test/storage/storage.test.lua
> +++ b/test/storage/storage.test.lua
> @@ -136,6 +136,7 @@ vshard.storage.bucket_send(1, util.replicasets[1])
>
> -- Successful transfer.
> vshard.storage.bucket_send(1, util.replicasets[2])
> +wait_bucket_is_collected(1)
> _ = test_run:switch("storage_2_a")
> vshard.storage.buckets_info()
> _ = test_run:switch("storage_1_a")
> diff --git a/test/unit/config.result b/test/unit/config.result
> index dfd0219..e0b2482 100644
> --- a/test/unit/config.result
> +++ b/test/unit/config.result
> @@ -428,33 +428,6 @@ _ = lcfg.check(cfg)
> --
> -- gh-77: garbage collection options.
> --
> -cfg.collect_bucket_garbage_interval = 'str'
> ----
> -...
> -check(cfg)
> ----
> -- Garbage bucket collect interval must be positive number
> -...
> -cfg.collect_bucket_garbage_interval = 0
> ----
> -...
> -check(cfg)
> ----
> -- Garbage bucket collect interval must be positive number
> -...
> -cfg.collect_bucket_garbage_interval = -1
> ----
> -...
> -check(cfg)
> ----
> -- Garbage bucket collect interval must be positive number
> -...
> -cfg.collect_bucket_garbage_interval = 100.5
> ----
> -...
> -_ = lcfg.check(cfg)
> ----
> -...
> cfg.collect_lua_garbage = 100
> ---
> ...
> @@ -615,6 +588,12 @@ lcfg.check(cfg).rebalancer_max_sending
> cfg.rebalancer_max_sending = nil
> ---
> ...
> -cfg.sharding = nil
> +--
> +-- Deprecated option does not break anything.
> +--
> +cfg.collect_bucket_garbage_interval = 100
> +---
> +...
> +_ = lcfg.check(cfg)
> ---
> ...
> diff --git a/test/unit/config.test.lua b/test/unit/config.test.lua
> index ada43db..a1c9f07 100644
> --- a/test/unit/config.test.lua
> +++ b/test/unit/config.test.lua
> @@ -175,15 +175,6 @@ _ = lcfg.check(cfg)
> --
> -- gh-77: garbage collection options.
> --
> -cfg.collect_bucket_garbage_interval = 'str'
> -check(cfg)
> -cfg.collect_bucket_garbage_interval = 0
> -check(cfg)
> -cfg.collect_bucket_garbage_interval = -1
> -check(cfg)
> -cfg.collect_bucket_garbage_interval = 100.5
> -_ = lcfg.check(cfg)
> -
> cfg.collect_lua_garbage = 100
> check(cfg)
> cfg.collect_lua_garbage = true
> @@ -244,4 +235,9 @@ util.check_error(lcfg.check, cfg)
> cfg.rebalancer_max_sending = 15
> lcfg.check(cfg).rebalancer_max_sending
> cfg.rebalancer_max_sending = nil
> -cfg.sharding = nil
> +
> +--
> +-- Deprecated option does not break anything.
> +--
> +cfg.collect_bucket_garbage_interval = 100
> +_ = lcfg.check(cfg)
> diff --git a/test/unit/garbage.result b/test/unit/garbage.result
> index 74d9ccf..a530496 100644
> --- a/test/unit/garbage.result
> +++ b/test/unit/garbage.result
> @@ -31,9 +31,6 @@ test_run:cmd("setopt delimiter ''");
> vshard.storage.internal.shard_index = 'bucket_id'
> ---
> ...
> -vshard.storage.internal.collect_bucket_garbage_interval = vshard.consts.DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL
> ----
> -...
> --
> -- Find nothing if no bucket_id anywhere, or there is no index
> -- by it, or bucket_id is not unsigned.
> @@ -151,6 +148,9 @@ format[1] = {name = 'id', type = 'unsigned'}
> format[2] = {name = 'status', type = 'string'}
> ---
> ...
> +format[3] = {name = 'destination', type = 'string', is_nullable = true}
> +---
> +...
> _bucket = box.schema.create_space('_bucket', {format = format})
> ---
> ...
> @@ -172,22 +172,6 @@ _bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
> ---
> - [3, 'active']
> ...
> -_bucket:replace{4, vshard.consts.BUCKET.SENT}
> ----
> -- [4, 'sent']
> -...
> -_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> ----
> -- [5, 'garbage']
> -...
> -_bucket:replace{6, vshard.consts.BUCKET.GARBAGE}
> ----
> -- [6, 'garbage']
> -...
> -_bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
> ----
> -- [200, 'garbage']
> -...
> s = box.schema.create_space('test', {engine = engine})
> ---
> ...
> @@ -213,7 +197,7 @@ s:replace{4, 2}
> ---
> - [4, 2]
> ...
> -gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
> +gc_bucket_drop = vshard.storage.internal.gc_bucket_drop
> ---
> ...
> s2 = box.schema.create_space('test2', {engine = engine})
> @@ -249,6 +233,10 @@ function fill_spaces_with_garbage()
> s2:replace{6, 4}
> s2:replace{7, 5}
> s2:replace{7, 6}
> + _bucket:replace{4, vshard.consts.BUCKET.SENT, 'destination1'}
> + _bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> + _bucket:replace{6, vshard.consts.BUCKET.GARBAGE, 'destination2'}
> + _bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
> end;
> ---
> ...
> @@ -267,12 +255,22 @@ fill_spaces_with_garbage()
> ---
> - 1107
> ...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> +route_map = {}
> +---
> +...
> +gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
> ---
> -- - 5
> - - 6
> - - 200
> - true
> +- null
> +...
> +route_map
> +---
> +- - null
> + - null
> + - null
> + - null
> + - null
> + - destination2
> ...
> #s2:select{}
> ---
> @@ -282,10 +280,20 @@ gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> ---
> - 7
> ...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> +route_map = {}
> +---
> +...
> +gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
> ---
> -- - 4
> - true
> +- null
> +...
> +route_map
> +---
> +- - null
> + - null
> + - null
> + - destination1
> ...
> s2:select{}
> ---
> @@ -303,17 +311,22 @@ s:select{}
> - [6, 100]
> ...
> -- Nothing deleted - update collected generation.
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> +route_map = {}
> +---
> +...
> +gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
> ---
> -- - 5
> - - 6
> - - 200
> - true
> +- null
> ...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> +gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
> ---
> -- - 4
> - true
> +- null
> +...
> +route_map
> +---
> +- []
> ...
> #s2:select{}
> ---
> @@ -329,15 +342,20 @@ gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> fill_spaces_with_garbage()
> ---
> ...
> -_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
> +_ = _bucket:on_replace(function() \
> + local gen = vshard.storage.internal.bucket_generation \
> + vshard.storage.internal.bucket_generation = gen + 1 \
> + vshard.storage.internal.bucket_generation_cond:broadcast() \
> +end)
> ---
> ...
> f = fiber.create(vshard.storage.internal.gc_bucket_f)
> ---
> ...
> -- Wait until garbage collection is finished.
> -while s2:count() ~= 3 or s:count() ~= 6 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return s2:count() == 3 and s:count() == 6 end)
> ---
> +- true
> ...
> s:select{}
> ---
> @@ -360,7 +378,6 @@ _bucket:select{}
> - - [1, 'active']
> - [2, 'receiving']
> - [3, 'active']
> - - [4, 'sent']
> ...
> --
> -- Test deletion of 'sent' buckets after a specified timeout.
> @@ -370,8 +387,9 @@ _bucket:replace{2, vshard.consts.BUCKET.SENT}
> - [2, 'sent']
> ...
> -- Wait deletion after a while.
> -while _bucket:get{2} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{2} end)
> ---
> +- true
> ...
> _bucket:select{}
> ---
> @@ -410,8 +428,9 @@ _bucket:replace{4, vshard.consts.BUCKET.SENT}
> ---
> - [4, 'sent']
> ...
> -while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{4} end)
> ---
> +- true
> ...
> --
> -- Test WAL errors during deletion from _bucket.
> @@ -434,11 +453,14 @@ s:replace{6, 4}
> ---
> - [6, 4]
> ...
> -while not test_run:grep_log("default", "Error during deletion of empty sent buckets") do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_log('default', 'Error during garbage collection step', \
> + 65536, 10)
> ---
> +- Error during garbage collection step
> ...
> -while #sk:select{4} ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return #sk:select{4} == 0 end)
> ---
> +- true
> ...
> s:select{}
> ---
> @@ -454,8 +476,9 @@ _bucket:select{}
> _ = _bucket:on_replace(nil, rollback_on_delete)
> ---
> ...
> -while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{4} end)
> ---
> +- true
> ...
> f:cancel()
> ---
> @@ -562,8 +585,9 @@ for i = 1, 2000 do _bucket:replace{i, vshard.consts.BUCKET.GARBAGE} s:replace{i,
> f = fiber.create(vshard.storage.internal.gc_bucket_f)
> ---
> ...
> -while _bucket:count() ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return _bucket:count() == 0 end)
> ---
> +- true
> ...
> _bucket:select{}
> ---
> diff --git a/test/unit/garbage.test.lua b/test/unit/garbage.test.lua
> index 30079fa..250afb0 100644
> --- a/test/unit/garbage.test.lua
> +++ b/test/unit/garbage.test.lua
> @@ -15,7 +15,6 @@ end;
> test_run:cmd("setopt delimiter ''");
>
> vshard.storage.internal.shard_index = 'bucket_id'
> -vshard.storage.internal.collect_bucket_garbage_interval = vshard.consts.DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL
>
> --
> -- Find nothing if no bucket_id anywhere, or there is no index
> @@ -75,16 +74,13 @@ s:drop()
> format = {}
> format[1] = {name = 'id', type = 'unsigned'}
> format[2] = {name = 'status', type = 'string'}
> +format[3] = {name = 'destination', type = 'string', is_nullable = true}
> _bucket = box.schema.create_space('_bucket', {format = format})
> _ = _bucket:create_index('pk')
> _ = _bucket:create_index('status', {parts = {{2, 'string'}}, unique = false})
> _bucket:replace{1, vshard.consts.BUCKET.ACTIVE}
> _bucket:replace{2, vshard.consts.BUCKET.RECEIVING}
> _bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
> -_bucket:replace{4, vshard.consts.BUCKET.SENT}
> -_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> -_bucket:replace{6, vshard.consts.BUCKET.GARBAGE}
> -_bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
>
> s = box.schema.create_space('test', {engine = engine})
> pk = s:create_index('pk')
> @@ -94,7 +90,7 @@ s:replace{2, 1}
> s:replace{3, 2}
> s:replace{4, 2}
>
> -gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
> +gc_bucket_drop = vshard.storage.internal.gc_bucket_drop
> s2 = box.schema.create_space('test2', {engine = engine})
> pk2 = s2:create_index('pk')
> sk2 = s2:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
> @@ -114,6 +110,10 @@ function fill_spaces_with_garbage()
> s2:replace{6, 4}
> s2:replace{7, 5}
> s2:replace{7, 6}
> + _bucket:replace{4, vshard.consts.BUCKET.SENT, 'destination1'}
> + _bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> + _bucket:replace{6, vshard.consts.BUCKET.GARBAGE, 'destination2'}
> + _bucket:replace{200, vshard.consts.BUCKET.GARBAGE}
> end;
> test_run:cmd("setopt delimiter ''");
>
> @@ -121,15 +121,21 @@ fill_spaces_with_garbage()
>
> #s2:select{}
> #s:select{}
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> +route_map = {}
> +gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
> +route_map
> #s2:select{}
> #s:select{}
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> +route_map = {}
> +gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
> +route_map
> s2:select{}
> s:select{}
> -- Nothing deleted - update collected generation.
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> +route_map = {}
> +gc_bucket_drop(vshard.consts.BUCKET.GARBAGE, route_map)
> +gc_bucket_drop(vshard.consts.BUCKET.SENT, route_map)
> +route_map
> #s2:select{}
> #s:select{}
>
> @@ -137,10 +143,14 @@ gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> -- Test continuous garbage collection via background fiber.
> --
> fill_spaces_with_garbage()
> -_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
> +_ = _bucket:on_replace(function() \
> + local gen = vshard.storage.internal.bucket_generation \
> + vshard.storage.internal.bucket_generation = gen + 1 \
> + vshard.storage.internal.bucket_generation_cond:broadcast() \
> +end)
> f = fiber.create(vshard.storage.internal.gc_bucket_f)
> -- Wait until garbage collection is finished.
> -while s2:count() ~= 3 or s:count() ~= 6 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return s2:count() == 3 and s:count() == 6 end)
> s:select{}
> s2:select{}
> -- Check garbage bucket is deleted by background fiber.
> @@ -150,7 +160,7 @@ _bucket:select{}
> --
> _bucket:replace{2, vshard.consts.BUCKET.SENT}
> -- Wait deletion after a while.
> -while _bucket:get{2} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{2} end)
> _bucket:select{}
> s:select{}
> s2:select{}
> @@ -162,7 +172,7 @@ _bucket:replace{4, vshard.consts.BUCKET.ACTIVE}
> s:replace{5, 4}
> s:replace{6, 4}
> _bucket:replace{4, vshard.consts.BUCKET.SENT}
> -while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{4} end)
>
> --
> -- Test WAL errors during deletion from _bucket.
> @@ -172,12 +182,13 @@ _ = _bucket:on_replace(rollback_on_delete)
> _bucket:replace{4, vshard.consts.BUCKET.SENT}
> s:replace{5, 4}
> s:replace{6, 4}
> -while not test_run:grep_log("default", "Error during deletion of empty sent buckets") do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> -while #sk:select{4} ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_log('default', 'Error during garbage collection step', \
> + 65536, 10)
> +test_run:wait_cond(function() return #sk:select{4} == 0 end)
> s:select{}
> _bucket:select{}
> _ = _bucket:on_replace(nil, rollback_on_delete)
> -while _bucket:get{4} ~= nil do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return not _bucket:get{4} end)
>
> f:cancel()
>
> @@ -220,7 +231,7 @@ for i = 1, 2000 do _bucket:replace{i, vshard.consts.BUCKET.GARBAGE} s:replace{i,
> #s:select{}
> #s2:select{}
> f = fiber.create(vshard.storage.internal.gc_bucket_f)
> -while _bucket:count() ~= 0 do vshard.storage.garbage_collector_wakeup() fiber.sleep(0.001) end
> +test_run:wait_cond(function() return _bucket:count() == 0 end)
> _bucket:select{}
> s:select{}
> s2:select{}
> diff --git a/test/unit/garbage_errinj.result b/test/unit/garbage_errinj.result
> deleted file mode 100644
> index 92c8039..0000000
> --- a/test/unit/garbage_errinj.result
> +++ /dev/null
> @@ -1,223 +0,0 @@
> -test_run = require('test_run').new()
> ----
> -...
> -vshard = require('vshard')
> ----
> -...
> -fiber = require('fiber')
> ----
> -...
> -engine = test_run:get_cfg('engine')
> ----
> -...
> -vshard.storage.internal.shard_index = 'bucket_id'
> ----
> -...
> -format = {}
> ----
> -...
> -format[1] = {name = 'id', type = 'unsigned'}
> ----
> -...
> -format[2] = {name = 'status', type = 'string', is_nullable = true}
> ----
> -...
> -_bucket = box.schema.create_space('_bucket', {format = format})
> ----
> -...
> -_ = _bucket:create_index('pk')
> ----
> -...
> -_ = _bucket:create_index('status', {parts = {{2, 'string'}}, unique = false})
> ----
> -...
> -_bucket:replace{1, vshard.consts.BUCKET.ACTIVE}
> ----
> -- [1, 'active']
> -...
> -_bucket:replace{2, vshard.consts.BUCKET.RECEIVING}
> ----
> -- [2, 'receiving']
> -...
> -_bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
> ----
> -- [3, 'active']
> -...
> -_bucket:replace{4, vshard.consts.BUCKET.SENT}
> ----
> -- [4, 'sent']
> -...
> -_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> ----
> -- [5, 'garbage']
> -...
> -s = box.schema.create_space('test', {engine = engine})
> ----
> -...
> -pk = s:create_index('pk')
> ----
> -...
> -sk = s:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
> ----
> -...
> -s:replace{1, 1}
> ----
> -- [1, 1]
> -...
> -s:replace{2, 1}
> ----
> -- [2, 1]
> -...
> -s:replace{3, 2}
> ----
> -- [3, 2]
> -...
> -s:replace{4, 2}
> ----
> -- [4, 2]
> -...
> -s:replace{5, 100}
> ----
> -- [5, 100]
> -...
> -s:replace{6, 100}
> ----
> -- [6, 100]
> -...
> -s:replace{7, 4}
> ----
> -- [7, 4]
> -...
> -s:replace{8, 5}
> ----
> -- [8, 5]
> -...
> -s2 = box.schema.create_space('test2', {engine = engine})
> ----
> -...
> -pk2 = s2:create_index('pk')
> ----
> -...
> -sk2 = s2:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
> ----
> -...
> -s2:replace{1, 1}
> ----
> -- [1, 1]
> -...
> -s2:replace{3, 3}
> ----
> -- [3, 3]
> -...
> -for i = 7, 1107 do s:replace{i, 200} end
> ----
> -...
> -s2:replace{4, 200}
> ----
> -- [4, 200]
> -...
> -s2:replace{5, 100}
> ----
> -- [5, 100]
> -...
> -s2:replace{5, 300}
> ----
> -- [5, 300]
> -...
> -s2:replace{6, 4}
> ----
> -- [6, 4]
> -...
> -s2:replace{7, 5}
> ----
> -- [7, 5]
> -...
> -gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
> ----
> -...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> ----
> -- - 4
> -- true
> -...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> ----
> -- - 5
> -- true
> -...
> ---
> --- Test _bucket generation change during garbage buckets search.
> ---
> -s:truncate()
> ----
> -...
> -_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
> ----
> -...
> -vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = true
> ----
> -...
> -f = fiber.create(function() gc_bucket_step_by_type(vshard.consts.BUCKET.SENT) gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE) end)
> ----
> -...
> -_bucket:replace{4, vshard.consts.BUCKET.GARBAGE}
> ----
> -- [4, 'garbage']
> -...
> -s:replace{5, 4}
> ----
> -- [5, 4]
> -...
> -s:replace{6, 4}
> ----
> -- [6, 4]
> -...
> -#s:select{}
> ----
> -- 2
> -...
> -vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = false
> ----
> -...
> -while f:status() ~= 'dead' do fiber.sleep(0.1) end
> ----
> -...
> --- Nothing is deleted - _bucket:replace() has changed _bucket
> --- generation during search of garbage buckets.
> -#s:select{}
> ----
> -- 2
> -...
> -_bucket:select{4}
> ----
> -- - [4, 'garbage']
> -...
> --- Next step deletes garbage ok.
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> ----
> -- []
> -- true
> -...
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> ----
> -- - 4
> - - 5
> -- true
> -...
> -#s:select{}
> ----
> -- 0
> -...
> -_bucket:delete{4}
> ----
> -- [4, 'garbage']
> -...
> -s2:drop()
> ----
> -...
> -s:drop()
> ----
> -...
> -_bucket:drop()
> ----
> -...
> diff --git a/test/unit/garbage_errinj.test.lua b/test/unit/garbage_errinj.test.lua
> deleted file mode 100644
> index 31184b9..0000000
> --- a/test/unit/garbage_errinj.test.lua
> +++ /dev/null
> @@ -1,73 +0,0 @@
> -test_run = require('test_run').new()
> -vshard = require('vshard')
> -fiber = require('fiber')
> -
> -engine = test_run:get_cfg('engine')
> -vshard.storage.internal.shard_index = 'bucket_id'
> -
> -format = {}
> -format[1] = {name = 'id', type = 'unsigned'}
> -format[2] = {name = 'status', type = 'string', is_nullable = true}
> -_bucket = box.schema.create_space('_bucket', {format = format})
> -_ = _bucket:create_index('pk')
> -_ = _bucket:create_index('status', {parts = {{2, 'string'}}, unique = false})
> -_bucket:replace{1, vshard.consts.BUCKET.ACTIVE}
> -_bucket:replace{2, vshard.consts.BUCKET.RECEIVING}
> -_bucket:replace{3, vshard.consts.BUCKET.ACTIVE}
> -_bucket:replace{4, vshard.consts.BUCKET.SENT}
> -_bucket:replace{5, vshard.consts.BUCKET.GARBAGE}
> -
> -s = box.schema.create_space('test', {engine = engine})
> -pk = s:create_index('pk')
> -sk = s:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
> -s:replace{1, 1}
> -s:replace{2, 1}
> -s:replace{3, 2}
> -s:replace{4, 2}
> -s:replace{5, 100}
> -s:replace{6, 100}
> -s:replace{7, 4}
> -s:replace{8, 5}
> -
> -s2 = box.schema.create_space('test2', {engine = engine})
> -pk2 = s2:create_index('pk')
> -sk2 = s2:create_index('bucket_id', {parts = {{2, 'unsigned'}}, unique = false})
> -s2:replace{1, 1}
> -s2:replace{3, 3}
> -for i = 7, 1107 do s:replace{i, 200} end
> -s2:replace{4, 200}
> -s2:replace{5, 100}
> -s2:replace{5, 300}
> -s2:replace{6, 4}
> -s2:replace{7, 5}
> -
> -gc_bucket_step_by_type = vshard.storage.internal.gc_bucket_step_by_type
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> -
> ---
> --- Test _bucket generation change during garbage buckets search.
> ---
> -s:truncate()
> -_ = _bucket:on_replace(function() vshard.storage.internal.bucket_generation = vshard.storage.internal.bucket_generation + 1 end)
> -vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = true
> -f = fiber.create(function() gc_bucket_step_by_type(vshard.consts.BUCKET.SENT) gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE) end)
> -_bucket:replace{4, vshard.consts.BUCKET.GARBAGE}
> -s:replace{5, 4}
> -s:replace{6, 4}
> -#s:select{}
> -vshard.storage.internal.errinj.ERRINJ_BUCKET_FIND_GARBAGE_DELAY = false
> -while f:status() ~= 'dead' do fiber.sleep(0.1) end
> --- Nothing is deleted - _bucket:replace() has changed _bucket
> --- generation during search of garbage buckets.
> -#s:select{}
> -_bucket:select{4}
> --- Next step deletes garbage ok.
> -gc_bucket_step_by_type(vshard.consts.BUCKET.SENT)
> -gc_bucket_step_by_type(vshard.consts.BUCKET.GARBAGE)
> -#s:select{}
> -_bucket:delete{4}
> -
> -s2:drop()
> -s:drop()
> -_bucket:drop()
> diff --git a/vshard/cfg.lua b/vshard/cfg.lua
> index f7d5dbc..63d5414 100644
> --- a/vshard/cfg.lua
> +++ b/vshard/cfg.lua
> @@ -251,9 +251,8 @@ local cfg_template = {
> max = consts.REBALANCER_MAX_SENDING_MAX
> },
> collect_bucket_garbage_interval = {
> - type = 'positive number', name = 'Garbage bucket collect interval',
> - is_optional = true,
> - default = consts.DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL
> + name = 'Garbage bucket collect interval', is_deprecated = true,
> + reason = 'Has no effect anymore'
> },
> collect_lua_garbage = {
> type = 'boolean', name = 'Garbage Lua collect necessity',
> diff --git a/vshard/consts.lua b/vshard/consts.lua
> index 8c2a8b0..3f1585a 100644
> --- a/vshard/consts.lua
> +++ b/vshard/consts.lua
> @@ -23,6 +23,7 @@ return {
> DEFAULT_BUCKET_COUNT = 3000;
> BUCKET_SENT_GARBAGE_DELAY = 0.5;
> BUCKET_CHUNK_SIZE = 1000;
> + LUA_CHUNK_SIZE = 100000,
> DEFAULT_REBALANCER_DISBALANCE_THRESHOLD = 1;
> REBALANCER_IDLE_INTERVAL = 60 * 60;
> REBALANCER_WORK_INTERVAL = 10;
> @@ -37,7 +38,7 @@ return {
> DEFAULT_FAILOVER_PING_TIMEOUT = 5;
> DEFAULT_SYNC_TIMEOUT = 1;
> RECONNECT_TIMEOUT = 0.5;
> - DEFAULT_COLLECT_BUCKET_GARBAGE_INTERVAL = 0.5;
> + GC_BACKOFF_INTERVAL = 5,
> RECOVERY_INTERVAL = 5;
> COLLECT_LUA_GARBAGE_INTERVAL = 100;
>
> @@ -45,4 +46,6 @@ return {
> DISCOVERY_WORK_INTERVAL = 1,
> DISCOVERY_WORK_STEP = 0.01,
> DISCOVERY_TIMEOUT = 10,
> +
> + TIMEOUT_INFINITY = 500 * 365 * 86400,
> }
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index adf1c20..1ea8069 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -69,7 +69,6 @@ if not M then
> total_bucket_count = 0,
> errinj = {
> ERRINJ_CFG = false,
> - ERRINJ_BUCKET_FIND_GARBAGE_DELAY = false,
> ERRINJ_RELOAD = false,
> ERRINJ_CFG_DELAY = false,
> ERRINJ_LONG_RECEIVE = false,
> @@ -96,6 +95,8 @@ if not M then
> -- detect that _bucket was not changed between yields.
> --
> bucket_generation = 0,
> + -- Condition variable fired on generation update.
> + bucket_generation_cond = lfiber.cond(),
> --
> -- Reference to the function used as on_replace trigger on
> -- _bucket space. It is used to replace the trigger with
> @@ -107,12 +108,14 @@ if not M then
> -- replace the old function is to keep its reference.
> --
> bucket_on_replace = nil,
> + -- Redirects for recently sent buckets. They are kept for a while to
> +    -- help routers find a new location for sent and deleted buckets
> +    -- without a whole cluster scan.
> + route_map = {},
>
> ------------------- Garbage collection -------------------
> -- Fiber to remove garbage buckets data.
> collect_bucket_garbage_fiber = nil,
> - -- Do buckets garbage collection once per this time.
> - collect_bucket_garbage_interval = nil,
> -- Boolean lua_gc state (create periodic gc task).
> collect_lua_garbage = nil,
>
> @@ -173,6 +176,7 @@ end
> --
> local function bucket_generation_increment()
> M.bucket_generation = M.bucket_generation + 1
> + M.bucket_generation_cond:broadcast()
> end
>
> --
> @@ -758,8 +762,9 @@ local function bucket_check_state(bucket_id, mode)
> else
> return bucket
> end
> + local dst = bucket and bucket.destination or M.route_map[bucket_id]
> return bucket, lerror.vshard(lerror.code.WRONG_BUCKET, bucket_id, reason,
> - bucket and bucket.destination)
> + dst)
> end
>
> --
> @@ -804,11 +809,23 @@ end
> --
> local function bucket_unrefro(bucket_id)
> local ref = M.bucket_refs[bucket_id]
> - if not ref or ref.ro == 0 then
> + local count = ref and ref.ro or 0
> + if count == 0 then
> return nil, lerror.vshard(lerror.code.WRONG_BUCKET, bucket_id,
> "no refs", nil)
> end
> - ref.ro = ref.ro - 1
> + if count == 1 then
> + ref.ro = 0
> + if ref.ro_lock then
> + -- Garbage collector is waiting for the bucket if RO
> + -- is locked. Let it know it has one more bucket to
> + -- collect. It relies on generation, so its increment
> +            -- is enough.
> + bucket_generation_increment()
> + end
> + return true
> + end
> + ref.ro = count - 1
> return true
> end
>
> @@ -1481,79 +1498,44 @@ local function gc_bucket_in_space(space, bucket_id, status)
> end
>
> --
> --- Remove tuples from buckets of a specified type.
> --- @param type Type of buckets to gc.
> --- @retval List of ids of empty buckets of the type.
> +-- Drop buckets with the given status along with their data in all spaces.
> +-- @param status Status of target buckets.
> +-- @param route_map Destinations of deleted buckets are saved into this table.
> --
> -local function gc_bucket_step_by_type(type)
> - local sharded_spaces = find_sharded_spaces()
> - local empty_buckets = {}
> +local function gc_bucket_drop_xc(status, route_map)
> local limit = consts.BUCKET_CHUNK_SIZE
> - local is_all_collected = true
> - for _, bucket in box.space._bucket.index.status:pairs(type) do
> - local bucket_id = bucket.id
> - local ref = M.bucket_refs[bucket_id]
> + local _bucket = box.space._bucket
> + local sharded_spaces = find_sharded_spaces()
> + for _, b in _bucket.index.status:pairs(status) do
> + local id = b.id
> + local ref = M.bucket_refs[id]
> if ref then
> assert(ref.rw == 0)
> if ref.ro ~= 0 then
> ref.ro_lock = true
> - is_all_collected = false
> goto continue
> end
> - M.bucket_refs[bucket_id] = nil
> + M.bucket_refs[id] = nil
> end
> for _, space in pairs(sharded_spaces) do
> - gc_bucket_in_space_xc(space, bucket_id, type)
> + gc_bucket_in_space_xc(space, id, status)
> limit = limit - 1
> if limit == 0 then
> lfiber.sleep(0)
> limit = consts.BUCKET_CHUNK_SIZE
> end
> end
> - table.insert(empty_buckets, bucket.id)
> -::continue::
> + route_map[id] = b.destination
> + _bucket:delete{id}
> + ::continue::
> end
> - return empty_buckets, is_all_collected
> -end
> -
> ---
> --- Drop buckets with ids in the list.
> --- @param bucket_ids Bucket ids to drop.
> --- @param status Expected bucket status.
> ---
> -local function gc_bucket_drop_xc(bucket_ids, status)
> - if #bucket_ids == 0 then
> - return
> - end
> - local limit = consts.BUCKET_CHUNK_SIZE
> - box.begin()
> - local _bucket = box.space._bucket
> - for _, id in pairs(bucket_ids) do
> - local bucket_exists = _bucket:get{id} ~= nil
> - local b = _bucket:get{id}
> - if b then
> - if b.status ~= status then
> - return error(string.format('Bucket %d status is changed. Was '..
> - '%s, became %s', id, status,
> - b.status))
> - end
> - _bucket:delete{id}
> - end
> - limit = limit - 1
> - if limit == 0 then
> - box.commit()
> - box.begin()
> - limit = consts.BUCKET_CHUNK_SIZE
> - end
> - end
> - box.commit()
> end
>
> --
> -- Exception safe version of gc_bucket_drop_xc.
> --
> -local function gc_bucket_drop(bucket_ids, status)
> - local status, err = pcall(gc_bucket_drop_xc, bucket_ids, status)
> +local function gc_bucket_drop(status, route_map)
> + local status, err = pcall(gc_bucket_drop_xc, status, route_map)
> if not status then
> box.rollback()
> end
> @@ -1561,14 +1543,16 @@ local function gc_bucket_drop(bucket_ids, status)
> end
>
> --
> --- Garbage collector. Works on masters. The garbage collector
> --- wakes up once per specified time.
> +-- Garbage collector. Works on masters. The garbage collector wakes up when
> +-- state of any bucket changes.
> -- After wakeup it follows the plan:
> --- 1) Check if _bucket has changed. If not, then sleep again;
> --- 2) Scan user spaces for sent and garbage buckets, delete
> --- garbage data in batches of limited size;
> --- 3) Delete GARBAGE buckets from _bucket immediately, and
> --- schedule SENT buckets for deletion after a timeout;
> +-- 1) Check if state of any bucket has really changed. If not, then sleep again;
> +-- 2) Delete all GARBAGE and SENT buckets along with their data in chunks of
> +-- limited size.
> +-- 3) Bucket destinations are saved into a global route_map to reroute incoming
> +-- requests from routers in case they didn't notice the buckets being moved.
> +-- The saved routes are scheduled for deletion after a timeout, which is
> +-- checked on each iteration of this loop.
> -- 4) Sleep, go to (1).
> -- For each step details see comments in the code.
> --
> @@ -1580,65 +1564,75 @@ function gc_bucket_f()
> -- generation == bucket generation. In such a case the fiber
> -- does nothing until next _bucket change.
> local bucket_generation_collected = -1
> - -- Empty sent buckets are collected into an array. After a
> - -- specified time interval the buckets are deleted both from
> - -- this array and from _bucket space.
> - local buckets_for_redirect = {}
> - local buckets_for_redirect_ts = fiber_clock()
> - -- Empty sent buckets, updated after each step, and when
> - -- buckets_for_redirect is deleted, it gets empty_sent_buckets
> - -- for next deletion.
> - local empty_garbage_buckets, empty_sent_buckets, status, err
> + local bucket_generation_current = M.bucket_generation
> + -- Deleted buckets are saved into a route map to redirect routers if they
> + -- didn't discover new location of the buckets yet. However route map does
> + -- not grow infinitely. Otherwise it would end up storing redirects for all
> +    -- buckets in the cluster, which could also be outdated.
> + -- Garbage collector periodically drops old routes from the map. For that it
> + -- remembers state of route map in one moment, and after a while clears the
> + -- remembered routes from the global route map.
> + local route_map = M.route_map
> + local route_map_old = {}
> + local route_map_deadline = 0
> + local status, err
> while M.module_version == module_version do
> - -- Check if no changes in buckets configuration.
> - if bucket_generation_collected ~= M.bucket_generation then
> - local bucket_generation = M.bucket_generation
> - local is_sent_collected, is_garbage_collected
> - status, empty_garbage_buckets, is_garbage_collected =
> - pcall(gc_bucket_step_by_type, consts.BUCKET.GARBAGE)
> - if not status then
> - err = empty_garbage_buckets
> - goto check_error
> - end
> - status, empty_sent_buckets, is_sent_collected =
> - pcall(gc_bucket_step_by_type, consts.BUCKET.SENT)
> - if not status then
> - err = empty_sent_buckets
> - goto check_error
> + if bucket_generation_collected ~= bucket_generation_current then
> + status, err = gc_bucket_drop(consts.BUCKET.GARBAGE, route_map)
> + if status then
> + status, err = gc_bucket_drop(consts.BUCKET.SENT, route_map)
> end
> - status, err = gc_bucket_drop(empty_garbage_buckets,
> - consts.BUCKET.GARBAGE)
> -::check_error::
> if not status then
> box.rollback()
> log.error('Error during garbage collection step: %s', err)
> - goto continue
> + else
> + -- Don't use global generation. During the collection it could
> + -- already change. Instead, remember the generation known before
> + -- the collection has started.
> + -- Since the collection also changes the generation, it makes
> +                -- the GC always happen at least twice. But typically on the
> + -- second iteration it should not find any buckets to collect,
> + -- and then the collected generation matches the global one.
> + bucket_generation_collected = bucket_generation_current
> end
> - if is_sent_collected and is_garbage_collected then
> - bucket_generation_collected = bucket_generation
> + else
> + status = true
> + end
> +
> + local sleep_time = route_map_deadline - fiber_clock()
> + if sleep_time <= 0 then
> + local chunk = consts.LUA_CHUNK_SIZE
> + util.table_minus_yield(route_map, route_map_old, chunk)
> + route_map_old = util.table_copy_yield(route_map, chunk)
> + if next(route_map_old) then
> + sleep_time = consts.BUCKET_SENT_GARBAGE_DELAY
> + else
> + sleep_time = consts.TIMEOUT_INFINITY
> end
> + route_map_deadline = fiber_clock() + sleep_time
> end
> + bucket_generation_current = M.bucket_generation
>
> - if fiber_clock() - buckets_for_redirect_ts >=
> - consts.BUCKET_SENT_GARBAGE_DELAY then
> - status, err = gc_bucket_drop(buckets_for_redirect,
> - consts.BUCKET.SENT)
> - if not status then
> - buckets_for_redirect = {}
> - empty_sent_buckets = {}
> - bucket_generation_collected = -1
> - log.error('Error during deletion of empty sent buckets: %s',
> - err)
> - elseif M.module_version ~= module_version then
> - return
> + if bucket_generation_current ~= bucket_generation_collected then
> + -- Generation was changed during collection. Or *by* collection.
> + if status then
> + -- Retry immediately. If the generation was changed by the
> + -- collection itself, it will notice it next iteration, and go
> + -- to proper sleep.
> + sleep_time = 0
> else
> - buckets_for_redirect = empty_sent_buckets or {}
> - empty_sent_buckets = nil
> - buckets_for_redirect_ts = fiber_clock()
> +                -- An error happened during the collection. It makes no sense
> + -- to retry on each iteration of the event loop. The most likely
> + -- errors are either a WAL error or a transaction abort - both
> + -- look like an issue in the user's code and can't be fixed
> + -- quickly anyway. Backoff.
> + sleep_time = consts.GC_BACKOFF_INTERVAL
> end
> end
> -::continue::
> - lfiber.sleep(M.collect_bucket_garbage_interval)
> +
> + if M.module_version == module_version then
> + M.bucket_generation_cond:wait(sleep_time)
> + end
> end
> end
>
> @@ -2423,8 +2417,6 @@ local function storage_cfg(cfg, this_replica_uuid, is_reload)
> vshard_cfg.rebalancer_disbalance_threshold
> M.rebalancer_receiving_quota = vshard_cfg.rebalancer_max_receiving
> M.shard_index = vshard_cfg.shard_index
> - M.collect_bucket_garbage_interval =
> - vshard_cfg.collect_bucket_garbage_interval
> M.collect_lua_garbage = vshard_cfg.collect_lua_garbage
> M.rebalancer_worker_count = vshard_cfg.rebalancer_max_sending
> M.current_cfg = cfg
> @@ -2678,6 +2670,9 @@ else
> storage_cfg(M.current_cfg, M.this_replica.uuid, true)
> end
> M.module_version = M.module_version + 1
> + -- Background fibers could sleep waiting for bucket changes.
> + -- Let them know it is time to reload.
> + bucket_generation_increment()
> end
>
> M.recovery_f = recovery_f
> @@ -2688,7 +2683,7 @@ M.gc_bucket_f = gc_bucket_f
> -- These functions are saved in M not for atomic reload, but for
> -- unit testing.
> --
> -M.gc_bucket_step_by_type = gc_bucket_step_by_type
> +M.gc_bucket_drop = gc_bucket_drop
> M.rebalancer_build_routes = rebalancer_build_routes
> M.rebalancer_calculate_metrics = rebalancer_calculate_metrics
> M.cached_find_sharded_spaces = find_sharded_spaces
> diff --git a/vshard/storage/reload_evolution.lua b/vshard/storage/reload_evolution.lua
> index f38af74..484f499 100644
> --- a/vshard/storage/reload_evolution.lua
> +++ b/vshard/storage/reload_evolution.lua
> @@ -4,6 +4,7 @@
> -- in a commit.
> --
> local log = require('log')
> +local fiber = require('fiber')
>
> --
> -- Array of upgrade functions.
> @@ -25,6 +26,13 @@ migrations[#migrations + 1] = function(M)
> end
> end
>
> +migrations[#migrations + 1] = function(M)
> + if not M.route_map then
> + M.bucket_generation_cond = fiber.cond()
> + M.route_map = {}
> + end
> +end
> +
> --
> -- Perform an update based on a version stored in `M` (internals).
> -- @param M Old module internals which should be updated.
^ permalink raw reply [flat|nested] 36+ messages in thread
* [Tarantool-patches] [PATCH 8/9] recovery: introduce reactive recovery
2021-02-09 23:46 [Tarantool-patches] [PATCH 0/9] VShard Map-Reduce, part 1, preparations Vladislav Shpilevoy via Tarantool-patches
` (6 preceding siblings ...)
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 7/9] gc: introduce reactive garbage collector Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-09 23:46 ` Vladislav Shpilevoy via Tarantool-patches
2021-02-10 9:00 ` Oleg Babin via Tarantool-patches
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 9/9] util: introduce binary heap data structure Vladislav Shpilevoy via Tarantool-patches
2021-02-09 23:51 ` [Tarantool-patches] [PATCH 0/9] VShard Map-Reduce, part 1, preparations Vladislav Shpilevoy via Tarantool-patches
9 siblings, 1 reply; 36+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-09 23:46 UTC (permalink / raw)
To: tarantool-patches, olegrok, yaroslav.dynnikov
Recovery is a fiber on a master node which tries to resolve
SENDING/RECEIVING buckets into GARBAGE or ACTIVE, in case they are
stuck. Usually it happens due to a conflict on the receiving side,
or if a restart happens during bucket send.
Recovery was proactive. It used to wake up with a constant period
to find and resolve the needed buckets.
But this won't work with the future feature called 'map-reduce'.
Map-reduce as a preparation stage will need to ensure that all
buckets on a storage are readable and writable. With the current
recovery algorithm if a bucket is broken, it won't be recovered
for the next 5 seconds by default. During this time all new
map-reduce requests can't execute.
This is not acceptable. Neither is a too frequent wakeup of the
recovery fiber, because it would waste TX thread time.
The patch makes recovery fiber wakeup not by a timeout but by
events happening with _bucket space. Recovery fiber sleeps on a
condition variable which is signaled when _bucket is changed.
This is very similar to the reactive GC feature in a previous
commit.
It is worth mentioning that the backoff happens not only when a
bucket couldn't be recovered (its transfer is still in progress,
for example), but also when a network error happened and recovery
couldn't check the state of the bucket on the other storage.
It would be a useless busy loop to retry network errors
immediately after their appearance. Recovery uses a backoff
interval for them as well.
Needed for #147
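
The reactive wakeup pattern described above (sleep on a condition
variable, with a timeout chosen depending on whether the last step
succeeded) can be sketched roughly as follows. This is a simplified
illustration using Tarantool's fiber API, not the actual vshard code;
`recovery_step()` and the constants are placeholders:

```lua
local fiber = require('fiber')

-- Incremented and signaled by the _bucket:on_replace trigger on
-- every bucket state change.
local bucket_generation = 0
local bucket_generation_cond = fiber.cond()

local BACKOFF_INTERVAL = 5
local TIMEOUT_INFINITY = 500 * 365 * 86400

local function on_bucket_change()
    bucket_generation = bucket_generation + 1
    bucket_generation_cond:broadcast()
end

local function recovery_f()
    local recovered_generation = -1
    while true do
        local current_generation = bucket_generation
        local ok = true
        if recovered_generation ~= current_generation then
            -- recovery_step() stands in for the real work:
            -- resolving stuck SENDING/RECEIVING buckets.
            ok = pcall(recovery_step)
            if ok then
                recovered_generation = current_generation
            end
        end
        local sleep_time
        if not ok then
            -- Network or WAL errors: retrying immediately would be
            -- a useless busy loop, so back off.
            sleep_time = BACKOFF_INTERVAL
        elseif bucket_generation ~= recovered_generation then
            -- _bucket changed during the step. Retry right away.
            sleep_time = 0
        else
            -- Nothing to do until the next _bucket change.
            sleep_time = TIMEOUT_INFINITY
        end
        -- Returns early when on_bucket_change() broadcasts.
        bucket_generation_cond:wait(sleep_time)
    end
end
```

The key point is that the fiber never polls on a fixed interval: it is
woken by `broadcast()` the moment `_bucket` changes, and otherwise
sleeps either "forever" or for a backoff interval after an error.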
---
test/router/router.result | 22 ++++++++---
test/router/router.test.lua | 13 ++++++-
test/storage/recovery.result | 8 ++++
test/storage/recovery.test.lua | 5 +++
test/storage/recovery_errinj.result | 16 +++++++-
test/storage/recovery_errinj.test.lua | 9 ++++-
vshard/consts.lua | 2 +-
vshard/storage/init.lua | 54 +++++++++++++++++++++++----
8 files changed, 110 insertions(+), 19 deletions(-)
diff --git a/test/router/router.result b/test/router/router.result
index b2efd6d..3c1d073 100644
--- a/test/router/router.result
+++ b/test/router/router.result
@@ -312,6 +312,11 @@ replicaset, err = vshard.router.bucket_discovery(2); return err == nil or err
_ = test_run:switch('storage_2_a')
---
...
+-- Pause recovery. It is too aggressive, and the test needs to see buckets in
+-- their intermediate states.
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
+---
+...
box.space._bucket:replace({1, vshard.consts.BUCKET.SENDING, util.replicasets[1]})
---
- [1, 'sending', '<replicaset_1>']
@@ -319,6 +324,9 @@ box.space._bucket:replace({1, vshard.consts.BUCKET.SENDING, util.replicasets[1]}
_ = test_run:switch('storage_1_a')
---
...
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
+---
+...
box.space._bucket:replace({1, vshard.consts.BUCKET.RECEIVING, util.replicasets[2]})
---
- [1, 'receiving', '<replicaset_2>']
@@ -342,19 +350,21 @@ util.check_error(vshard.router.call, 1, 'write', 'echo', {123})
name: TRANSFER_IS_IN_PROGRESS
message: Bucket 1 is transferring to replicaset <replicaset_1>
...
-_ = test_run:switch('storage_2_a')
+_ = test_run:switch('storage_1_a')
+---
+...
+box.space._bucket:delete({1})
---
+- [1, 'receiving', '<replicaset_2>']
...
-box.space._bucket:replace({1, vshard.consts.BUCKET.ACTIVE})
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
---
-- [1, 'active']
...
-_ = test_run:switch('storage_1_a')
+_ = test_run:switch('storage_2_a')
---
...
-box.space._bucket:delete({1})
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
---
-- [1, 'receiving', '<replicaset_2>']
...
_ = test_run:switch('router_1')
---
diff --git a/test/router/router.test.lua b/test/router/router.test.lua
index 154310b..aa3eb3b 100644
--- a/test/router/router.test.lua
+++ b/test/router/router.test.lua
@@ -114,19 +114,28 @@ replicaset, err = vshard.router.bucket_discovery(1); return err == nil or err
replicaset, err = vshard.router.bucket_discovery(2); return err == nil or err
_ = test_run:switch('storage_2_a')
+-- Pause recovery. It is too aggressive, and the test needs to see buckets in
+-- their intermediate states.
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
box.space._bucket:replace({1, vshard.consts.BUCKET.SENDING, util.replicasets[1]})
+
_ = test_run:switch('storage_1_a')
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
box.space._bucket:replace({1, vshard.consts.BUCKET.RECEIVING, util.replicasets[2]})
+
_ = test_run:switch('router_1')
-- Ok to read sending bucket.
vshard.router.call(1, 'read', 'echo', {123})
-- Not ok to write sending bucket.
util.check_error(vshard.router.call, 1, 'write', 'echo', {123})
-_ = test_run:switch('storage_2_a')
-box.space._bucket:replace({1, vshard.consts.BUCKET.ACTIVE})
_ = test_run:switch('storage_1_a')
box.space._bucket:delete({1})
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
+
+_ = test_run:switch('storage_2_a')
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
+
_ = test_run:switch('router_1')
-- Check unavailability of master of a replicaset.
diff --git a/test/storage/recovery.result b/test/storage/recovery.result
index 8ccb0b9..fa92bca 100644
--- a/test/storage/recovery.result
+++ b/test/storage/recovery.result
@@ -28,12 +28,20 @@ util.push_rs_filters(test_run)
_ = test_run:switch("storage_2_a")
---
...
+-- Pause until restart. Otherwise recovery does its job too fast and does not
+-- allow simulating the intermediate state.
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
+---
+...
vshard.storage.rebalancer_disable()
---
...
_ = test_run:switch("storage_1_a")
---
...
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
+---
+...
-- Create buckets sending to rs2 and restart - recovery must
-- garbage some of them and activate others. Receiving buckets
-- must be garbaged on bootstrap.
diff --git a/test/storage/recovery.test.lua b/test/storage/recovery.test.lua
index a0651e8..93cec68 100644
--- a/test/storage/recovery.test.lua
+++ b/test/storage/recovery.test.lua
@@ -10,8 +10,13 @@ util.wait_master(test_run, REPLICASET_2, 'storage_2_a')
util.push_rs_filters(test_run)
_ = test_run:switch("storage_2_a")
+-- Pause until restart. Otherwise recovery does its job too fast and does not
+-- allow simulating the intermediate state.
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
vshard.storage.rebalancer_disable()
+
_ = test_run:switch("storage_1_a")
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
-- Create buckets sending to rs2 and restart - recovery must
-- garbage some of them and activate others. Receiving buckets
diff --git a/test/storage/recovery_errinj.result b/test/storage/recovery_errinj.result
index 3e9a9bf..8c178d5 100644
--- a/test/storage/recovery_errinj.result
+++ b/test/storage/recovery_errinj.result
@@ -35,9 +35,17 @@ _ = test_run:switch('storage_2_a')
vshard.storage.internal.errinj.ERRINJ_LAST_RECEIVE_DELAY = true
---
...
+-- Pause recovery. Otherwise it does its job too fast and does not allow
+-- simulating the intermediate state.
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
+---
+...
_ = test_run:switch('storage_1_a')
---
...
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
+---
+...
_bucket = box.space._bucket
---
...
@@ -76,10 +84,16 @@ _bucket:get{1}
---
- [1, 'active']
...
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
+---
+...
_ = test_run:switch('storage_1_a')
---
...
-while _bucket:count() ~= 0 do vshard.storage.recovery_wakeup() fiber.sleep(0.1) end
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
+---
+...
+wait_bucket_is_collected(1)
---
...
_ = test_run:switch("default")
diff --git a/test/storage/recovery_errinj.test.lua b/test/storage/recovery_errinj.test.lua
index 8c1a9d2..c730560 100644
--- a/test/storage/recovery_errinj.test.lua
+++ b/test/storage/recovery_errinj.test.lua
@@ -14,7 +14,12 @@ util.push_rs_filters(test_run)
--
_ = test_run:switch('storage_2_a')
vshard.storage.internal.errinj.ERRINJ_LAST_RECEIVE_DELAY = true
+-- Pause recovery. Otherwise it does its job too fast and does not allow
+-- simulating the intermediate state.
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
+
_ = test_run:switch('storage_1_a')
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
_bucket = box.space._bucket
_bucket:replace{1, vshard.consts.BUCKET.ACTIVE, util.replicasets[2]}
ret, err = vshard.storage.bucket_send(1, util.replicasets[2], {timeout = 0.1})
@@ -27,9 +32,11 @@ vshard.storage.internal.errinj.ERRINJ_LAST_RECEIVE_DELAY = false
_bucket = box.space._bucket
while _bucket:get{1}.status ~= vshard.consts.BUCKET.ACTIVE do fiber.sleep(0.01) end
_bucket:get{1}
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
_ = test_run:switch('storage_1_a')
-while _bucket:count() ~= 0 do vshard.storage.recovery_wakeup() fiber.sleep(0.1) end
+vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
+wait_bucket_is_collected(1)
_ = test_run:switch("default")
test_run:drop_cluster(REPLICASET_2)
diff --git a/vshard/consts.lua b/vshard/consts.lua
index 3f1585a..cf3f422 100644
--- a/vshard/consts.lua
+++ b/vshard/consts.lua
@@ -39,7 +39,7 @@ return {
DEFAULT_SYNC_TIMEOUT = 1;
RECONNECT_TIMEOUT = 0.5;
GC_BACKOFF_INTERVAL = 5,
- RECOVERY_INTERVAL = 5;
+ RECOVERY_BACKOFF_INTERVAL = 5,
COLLECT_LUA_GARBAGE_INTERVAL = 100;
DISCOVERY_IDLE_INTERVAL = 10,
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index 31a6fc7..85f5024 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -634,13 +634,16 @@ end
-- Infinite function to resolve status of buckets, whose 'sending'
-- has failed due to tarantool or network problems. Restarts on
-- reload.
--- @param module_version Module version, on which the current
--- function had been started. If the actual module version
--- appears to be changed, then stop recovery. It is
--- restarted in reloadable_fiber.
--
local function recovery_f()
local module_version = M.module_version
+    -- Changes of _bucket increment the bucket generation. Recovery has its own
+    -- bucket generation which is <= the actual one. Recovery is finished when its
+ -- generation == bucket generation. In such a case the fiber does nothing
+ -- until next _bucket change.
+ local bucket_generation_recovered = -1
+ local bucket_generation_current = M.bucket_generation
+ local ok, sleep_time, is_all_recovered, total, recovered
-- Interrupt recovery if a module has been reloaded. Perhaps,
-- there was found a bug, and reload fixes it.
while module_version == M.module_version do
@@ -648,22 +651,57 @@ local function recovery_f()
lfiber.yield()
goto continue
end
- local ok, total, recovered = pcall(recovery_step_by_type,
- consts.BUCKET.SENDING)
+ is_all_recovered = true
+ if bucket_generation_recovered == bucket_generation_current then
+ goto sleep
+ end
+
+ ok, total, recovered = pcall(recovery_step_by_type,
+ consts.BUCKET.SENDING)
if not ok then
+ is_all_recovered = false
log.error('Error during sending buckets recovery: %s', total)
+ elseif total ~= recovered then
+ is_all_recovered = false
end
+
ok, total, recovered = pcall(recovery_step_by_type,
consts.BUCKET.RECEIVING)
if not ok then
+ is_all_recovered = false
log.error('Error during receiving buckets recovery: %s', total)
elseif total == 0 then
bucket_receiving_quota_reset()
else
bucket_receiving_quota_add(recovered)
+ if total ~= recovered then
+ is_all_recovered = false
+ end
+ end
+
+ ::sleep::
+ if not is_all_recovered then
+ bucket_generation_recovered = -1
+ else
+ bucket_generation_recovered = bucket_generation_current
+ end
+ bucket_generation_current = M.bucket_generation
+
+ if not is_all_recovered then
+ -- One option - some buckets are not broken. Their transmission is
+ -- still in progress. Don't need to retry immediately. Another
+ -- option - network errors when tried to repair the buckets. Also no
+ -- need to retry often. It won't help.
+ sleep_time = consts.RECOVERY_BACKOFF_INTERVAL
+ elseif bucket_generation_recovered ~= bucket_generation_current then
+ sleep_time = 0
+ else
+ sleep_time = consts.TIMEOUT_INFINITY
+ end
+ if module_version == M.module_version then
+ M.bucket_generation_cond:wait(sleep_time)
end
- lfiber.sleep(consts.RECOVERY_INTERVAL)
- ::continue::
+ ::continue::
end
end
--
2.24.3 (Apple Git-128)
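The wakeup policy implemented by the hunk above (backoff after errors, immediate retry while the generations differ, infinite wait once everything is recovered) can be sketched in isolation. This is an illustration only, not code from the patch; it assumes a Tarantool runtime and its `fiber.cond()` condition variable, and the `step` callback stands in for `recovery_step_by_type`:

```lua
-- Sketch of the reactive wakeup pattern used by recovery_f().
-- Requires Tarantool: fiber.cond() is its condition variable with
-- a wait(timeout) method, signaled here on every _bucket change.
local fiber = require('fiber')

local BACKOFF = 5
local INFINITY = 500 * 365 * 86400

local state = {
    generation = 0,      -- incremented on every _bucket change
    cond = fiber.cond(), -- broadcast on every _bucket change
}

local function recovery_loop(step)
    local recovered_gen = -1
    local current_gen = state.generation
    while true do
        local is_all_recovered = true
        if recovered_gen ~= current_gen then
            -- step() returns true when all broken buckets are resolved.
            is_all_recovered = step()
        end
        recovered_gen = is_all_recovered and current_gen or -1
        current_gen = state.generation
        local sleep_time
        if not is_all_recovered then
            sleep_time = BACKOFF   -- errors, or transfers still in progress
        elseif recovered_gen ~= current_gen then
            sleep_time = 0         -- _bucket changed while we were working
        else
            sleep_time = INFINITY  -- nothing to do until the next change
        end
        state.cond:wait(sleep_time)
    end
end
```

A writer to `_bucket` would then do `state.generation = state.generation + 1` followed by `state.cond:broadcast()` to wake the loop up immediately, which is the role the on-replace trigger on `_bucket` plays in the patch.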
* Re: [Tarantool-patches] [PATCH 8/9] recovery: introduce reactive recovery
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 8/9] recovery: introduce reactive recovery Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-10 9:00 ` Oleg Babin via Tarantool-patches
0 siblings, 0 replies; 36+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-02-10 9:00 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov
Thanks for your patch. LGTM.
On 10/02/2021 02:46, Vladislav Shpilevoy wrote:
> Recovery is a fiber on a master node which tries to resolve
> SENDING/RECEIVING buckets into GARBAGE or ACTIVE, in case they are
> stuck. Usually it happens due to a conflict on the receiving side,
> or if a restart happens during bucket send.
>
> Recovery was proactive. It used to wake up with a constant period
> to find and resolve the needed buckets.
>
> But this won't work with the future feature called 'map-reduce'.
> Map-reduce as a preparation stage will need to ensure that all
> buckets on a storage are readable and writable. With the current
> recovery algorithm if a bucket is broken, it won't be recovered
> for the next 5 seconds by default. During this time all new
> map-reduce requests can't execute.
>
> This is not acceptable. Nor is too frequent wakeup of the recovery
> fiber, because it would waste TX thread time.
>
> The patch makes recovery fiber wakeup not by a timeout but by
> events happening with _bucket space. Recovery fiber sleeps on a
> condition variable which is signaled when _bucket is changed.
>
> This is very similar to the reactive GC feature in a previous
> commit.
>
> It is worth mentioning that the backoff happens not only when a
> bucket couldn't be recovered (its transfer is still in progress,
> for example), but also when a network error happened and recovery
> couldn't check state of the bucket on the other storage.
>
> It would be a useless busy loop to retry network errors
> immediately after their appearance. Recovery uses a backoff
> interval for them as well.
>
> Needed for #147
> ---
> test/router/router.result | 22 ++++++++---
> test/router/router.test.lua | 13 ++++++-
> test/storage/recovery.result | 8 ++++
> test/storage/recovery.test.lua | 5 +++
> test/storage/recovery_errinj.result | 16 +++++++-
> test/storage/recovery_errinj.test.lua | 9 ++++-
> vshard/consts.lua | 2 +-
> vshard/storage/init.lua | 54 +++++++++++++++++++++++----
> 8 files changed, 110 insertions(+), 19 deletions(-)
>
> diff --git a/test/router/router.result b/test/router/router.result
> index b2efd6d..3c1d073 100644
> --- a/test/router/router.result
> +++ b/test/router/router.result
> @@ -312,6 +312,11 @@ replicaset, err = vshard.router.bucket_discovery(2); return err == nil or err
> _ = test_run:switch('storage_2_a')
> ---
> ...
> +-- Pause recovery. It is too aggressive, and the test needs to see buckets in
> +-- their intermediate states.
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> +---
> +...
> box.space._bucket:replace({1, vshard.consts.BUCKET.SENDING, util.replicasets[1]})
> ---
> - [1, 'sending', '<replicaset_1>']
> @@ -319,6 +324,9 @@ box.space._bucket:replace({1, vshard.consts.BUCKET.SENDING, util.replicasets[1]}
> _ = test_run:switch('storage_1_a')
> ---
> ...
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> +---
> +...
> box.space._bucket:replace({1, vshard.consts.BUCKET.RECEIVING, util.replicasets[2]})
> ---
> - [1, 'receiving', '<replicaset_2>']
> @@ -342,19 +350,21 @@ util.check_error(vshard.router.call, 1, 'write', 'echo', {123})
> name: TRANSFER_IS_IN_PROGRESS
> message: Bucket 1 is transferring to replicaset <replicaset_1>
> ...
> -_ = test_run:switch('storage_2_a')
> +_ = test_run:switch('storage_1_a')
> +---
> +...
> +box.space._bucket:delete({1})
> ---
> +- [1, 'receiving', '<replicaset_2>']
> ...
> -box.space._bucket:replace({1, vshard.consts.BUCKET.ACTIVE})
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
> ---
> -- [1, 'active']
> ...
> -_ = test_run:switch('storage_1_a')
> +_ = test_run:switch('storage_2_a')
> ---
> ...
> -box.space._bucket:delete({1})
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
> ---
> -- [1, 'receiving', '<replicaset_2>']
> ...
> _ = test_run:switch('router_1')
> ---
> diff --git a/test/router/router.test.lua b/test/router/router.test.lua
> index 154310b..aa3eb3b 100644
> --- a/test/router/router.test.lua
> +++ b/test/router/router.test.lua
> @@ -114,19 +114,28 @@ replicaset, err = vshard.router.bucket_discovery(1); return err == nil or err
> replicaset, err = vshard.router.bucket_discovery(2); return err == nil or err
>
> _ = test_run:switch('storage_2_a')
> +-- Pause recovery. It is too aggressive, and the test needs to see buckets in
> +-- their intermediate states.
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> box.space._bucket:replace({1, vshard.consts.BUCKET.SENDING, util.replicasets[1]})
> +
> _ = test_run:switch('storage_1_a')
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> box.space._bucket:replace({1, vshard.consts.BUCKET.RECEIVING, util.replicasets[2]})
> +
> _ = test_run:switch('router_1')
> -- Ok to read sending bucket.
> vshard.router.call(1, 'read', 'echo', {123})
> -- Not ok to write sending bucket.
> util.check_error(vshard.router.call, 1, 'write', 'echo', {123})
>
> -_ = test_run:switch('storage_2_a')
> -box.space._bucket:replace({1, vshard.consts.BUCKET.ACTIVE})
> _ = test_run:switch('storage_1_a')
> box.space._bucket:delete({1})
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
> +
> +_ = test_run:switch('storage_2_a')
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
> +
> _ = test_run:switch('router_1')
>
> -- Check unavailability of master of a replicaset.
> diff --git a/test/storage/recovery.result b/test/storage/recovery.result
> index 8ccb0b9..fa92bca 100644
> --- a/test/storage/recovery.result
> +++ b/test/storage/recovery.result
> @@ -28,12 +28,20 @@ util.push_rs_filters(test_run)
> _ = test_run:switch("storage_2_a")
> ---
> ...
> +-- Pause until restart. Otherwise recovery does its job too fast and does not
> +-- allow to simulate the intermediate state.
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> +---
> +...
> vshard.storage.rebalancer_disable()
> ---
> ...
> _ = test_run:switch("storage_1_a")
> ---
> ...
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> +---
> +...
> -- Create buckets sending to rs2 and restart - recovery must
> -- garbage some of them and activate others. Receiving buckets
> -- must be garbaged on bootstrap.
> diff --git a/test/storage/recovery.test.lua b/test/storage/recovery.test.lua
> index a0651e8..93cec68 100644
> --- a/test/storage/recovery.test.lua
> +++ b/test/storage/recovery.test.lua
> @@ -10,8 +10,13 @@ util.wait_master(test_run, REPLICASET_2, 'storage_2_a')
> util.push_rs_filters(test_run)
>
> _ = test_run:switch("storage_2_a")
> +-- Pause until restart. Otherwise recovery does its job too fast and does not
> +-- allow to simulate the intermediate state.
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> vshard.storage.rebalancer_disable()
> +
> _ = test_run:switch("storage_1_a")
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
>
> -- Create buckets sending to rs2 and restart - recovery must
> -- garbage some of them and activate others. Receiving buckets
> diff --git a/test/storage/recovery_errinj.result b/test/storage/recovery_errinj.result
> index 3e9a9bf..8c178d5 100644
> --- a/test/storage/recovery_errinj.result
> +++ b/test/storage/recovery_errinj.result
> @@ -35,9 +35,17 @@ _ = test_run:switch('storage_2_a')
> vshard.storage.internal.errinj.ERRINJ_LAST_RECEIVE_DELAY = true
> ---
> ...
> +-- Pause recovery. Otherwise it does its job too fast and does not allow to
> +-- simulate the intermediate state.
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> +---
> +...
> _ = test_run:switch('storage_1_a')
> ---
> ...
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> +---
> +...
> _bucket = box.space._bucket
> ---
> ...
> @@ -76,10 +84,16 @@ _bucket:get{1}
> ---
> - [1, 'active']
> ...
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
> +---
> +...
> _ = test_run:switch('storage_1_a')
> ---
> ...
> -while _bucket:count() ~= 0 do vshard.storage.recovery_wakeup() fiber.sleep(0.1) end
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
> +---
> +...
> +wait_bucket_is_collected(1)
> ---
> ...
> _ = test_run:switch("default")
> diff --git a/test/storage/recovery_errinj.test.lua b/test/storage/recovery_errinj.test.lua
> index 8c1a9d2..c730560 100644
> --- a/test/storage/recovery_errinj.test.lua
> +++ b/test/storage/recovery_errinj.test.lua
> @@ -14,7 +14,12 @@ util.push_rs_filters(test_run)
> --
> _ = test_run:switch('storage_2_a')
> vshard.storage.internal.errinj.ERRINJ_LAST_RECEIVE_DELAY = true
> +-- Pause recovery. Otherwise it does its job too fast and does not allow to
> +-- simulate the intermediate state.
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> +
> _ = test_run:switch('storage_1_a')
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = true
> _bucket = box.space._bucket
> _bucket:replace{1, vshard.consts.BUCKET.ACTIVE, util.replicasets[2]}
> ret, err = vshard.storage.bucket_send(1, util.replicasets[2], {timeout = 0.1})
> @@ -27,9 +32,11 @@ vshard.storage.internal.errinj.ERRINJ_LAST_RECEIVE_DELAY = false
> _bucket = box.space._bucket
> while _bucket:get{1}.status ~= vshard.consts.BUCKET.ACTIVE do fiber.sleep(0.01) end
> _bucket:get{1}
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
>
> _ = test_run:switch('storage_1_a')
> -while _bucket:count() ~= 0 do vshard.storage.recovery_wakeup() fiber.sleep(0.1) end
> +vshard.storage.internal.errinj.ERRINJ_NO_RECOVERY = false
> +wait_bucket_is_collected(1)
>
> _ = test_run:switch("default")
> test_run:drop_cluster(REPLICASET_2)
> diff --git a/vshard/consts.lua b/vshard/consts.lua
> index 3f1585a..cf3f422 100644
> --- a/vshard/consts.lua
> +++ b/vshard/consts.lua
> @@ -39,7 +39,7 @@ return {
> DEFAULT_SYNC_TIMEOUT = 1;
> RECONNECT_TIMEOUT = 0.5;
> GC_BACKOFF_INTERVAL = 5,
> - RECOVERY_INTERVAL = 5;
> + RECOVERY_BACKOFF_INTERVAL = 5,
> COLLECT_LUA_GARBAGE_INTERVAL = 100;
>
> DISCOVERY_IDLE_INTERVAL = 10,
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index 31a6fc7..85f5024 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -634,13 +634,16 @@ end
> -- Infinite function to resolve status of buckets, whose 'sending'
> -- has failed due to tarantool or network problems. Restarts on
> -- reload.
> --- @param module_version Module version, on which the current
> --- function had been started. If the actual module version
> --- appears to be changed, then stop recovery. It is
> --- restarted in reloadable_fiber.
> --
> local function recovery_f()
> local module_version = M.module_version
> + -- Changes of _bucket increments bucket generation. Recovery has its own
> + -- bucket generation which is <= actual. Recovery is finished, when its
> + -- generation == bucket generation. In such a case the fiber does nothing
> + -- until next _bucket change.
> + local bucket_generation_recovered = -1
> + local bucket_generation_current = M.bucket_generation
> + local ok, sleep_time, is_all_recovered, total, recovered
> -- Interrupt recovery if a module has been reloaded. Perhaps,
> -- there was found a bug, and reload fixes it.
> while module_version == M.module_version do
> @@ -648,22 +651,57 @@ local function recovery_f()
> lfiber.yield()
> goto continue
> end
> - local ok, total, recovered = pcall(recovery_step_by_type,
> - consts.BUCKET.SENDING)
> + is_all_recovered = true
> + if bucket_generation_recovered == bucket_generation_current then
> + goto sleep
> + end
> +
> + ok, total, recovered = pcall(recovery_step_by_type,
> + consts.BUCKET.SENDING)
> if not ok then
> + is_all_recovered = false
> log.error('Error during sending buckets recovery: %s', total)
> + elseif total ~= recovered then
> + is_all_recovered = false
> end
> +
> ok, total, recovered = pcall(recovery_step_by_type,
> consts.BUCKET.RECEIVING)
> if not ok then
> + is_all_recovered = false
> log.error('Error during receiving buckets recovery: %s', total)
> elseif total == 0 then
> bucket_receiving_quota_reset()
> else
> bucket_receiving_quota_add(recovered)
> + if total ~= recovered then
> + is_all_recovered = false
> + end
> + end
> +
> + ::sleep::
> + if not is_all_recovered then
> + bucket_generation_recovered = -1
> + else
> + bucket_generation_recovered = bucket_generation_current
> + end
> + bucket_generation_current = M.bucket_generation
> +
> + if not is_all_recovered then
> + -- One option - some buckets are not broken. Their transmission is
> + -- still in progress. Don't need to retry immediately. Another
> + -- option - network errors when tried to repair the buckets. Also no
> + -- need to retry often. It won't help.
> + sleep_time = consts.RECOVERY_BACKOFF_INTERVAL
> + elseif bucket_generation_recovered ~= bucket_generation_current then
> + sleep_time = 0
> + else
> + sleep_time = consts.TIMEOUT_INFINITY
> + end
> + if module_version == M.module_version then
> + M.bucket_generation_cond:wait(sleep_time)
> end
> - lfiber.sleep(consts.RECOVERY_INTERVAL)
> - ::continue::
> + ::continue::
> end
> end
>
* [Tarantool-patches] [PATCH 9/9] util: introduce binary heap data structure
2021-02-09 23:46 [Tarantool-patches] [PATCH 0/9] VShard Map-Reduce, part 1, preparations Vladislav Shpilevoy via Tarantool-patches
` (7 preceding siblings ...)
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 8/9] recovery: introduce reactive recovery Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-09 23:46 ` Vladislav Shpilevoy via Tarantool-patches
2021-02-10 9:01 ` Oleg Babin via Tarantool-patches
2021-03-05 22:03 ` Vladislav Shpilevoy via Tarantool-patches
2021-02-09 23:51 ` [Tarantool-patches] [PATCH 0/9] VShard Map-Reduce, part 1, preparations Vladislav Shpilevoy via Tarantool-patches
9 siblings, 2 replies; 36+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-09 23:46 UTC (permalink / raw)
To: tarantool-patches, olegrok, yaroslav.dynnikov
Lua does not have a built-in standard library for binary heaps
(also called priority queues). There is an implementation in
Tarantool core in libsalad, but it is in C.
A heap is a perfect storage for the upcoming map-reduce feature.
In the map-reduce algorithm it will be necessary to be able to
lock an entire storage against any bucket moves for a time <= a
specified timeout. The number of map-reduce requests can be big, and
they can have different timeouts.
So there is a pile of timeouts from different requests. It is
necessary to be able to quickly add new ones, be able to delete
random ones, and remove expired ones.
One way would be a sorted array of the deadlines. Unfortunately,
it is super slow: O(N + log(N)) to add a new element (log(N) to find
the place and N to shift all following elements), and O(N) to delete
a random one (shift all following elements one cell left/right).
Another way would be a sorted tree. But trees like RB or a dumb
binary tree require extra steps to keep them balanced and to have
access to the smallest element ASAP.
The best way is the binary heap. It is perfectly balanced by
design, meaning that all operations there have complexity at most
O(log(N)). It is possible to find the closest deadline in constant
time as it is the heap's top.
This patch implements it. The heap is intrusive: it stores the
index of each element right inside the element, as a field
'index'. Having the index along with each element allows deleting
it from the heap in O(log(N)) without having to look its place up
first.
Part of #147
---
test/unit-tap/heap.test.lua | 310 ++++++++++++++++++++++++++++++++++++
test/unit-tap/suite.ini | 4 +
vshard/heap.lua | 226 ++++++++++++++++++++++++++
3 files changed, 540 insertions(+)
create mode 100755 test/unit-tap/heap.test.lua
create mode 100644 test/unit-tap/suite.ini
create mode 100644 vshard/heap.lua
diff --git a/test/unit-tap/heap.test.lua b/test/unit-tap/heap.test.lua
new file mode 100755
index 0000000..8c3819f
--- /dev/null
+++ b/test/unit-tap/heap.test.lua
@@ -0,0 +1,310 @@
+#!/usr/bin/env tarantool
+
+local tap = require('tap')
+local test = tap.test("cfg")
+local heap = require('vshard.heap')
+
+--
+-- Max number of heap to test. Number of iterations in the test
+-- grows as a factorial of this value. At 10 the test becomes
+-- too long already.
+--
+local heap_size = 8
+
+--
+-- Type of the object stored in the intrusive heap.
+--
+local function min_heap_cmp(l, r)
+ return l.value < r.value
+end
+
+local function max_heap_cmp(l, r)
+ return l.value > r.value
+end
+
+local function new_object(value)
+ return {value = value}
+end
+
+local function heap_check_indexes(heap)
+ local count = heap:count()
+ local data = heap.data
+ for i = 1, count do
+ assert(data[i].index == i)
+ end
+end
+
+local function reverse(values, i1, i2)
+ while i1 < i2 do
+ values[i1], values[i2] = values[i2], values[i1]
+ i1 = i1 + 1
+ i2 = i2 - 1
+ end
+end
+
+--
+-- Implementation of std::next_permutation() from C++.
+--
+local function next_permutation(values)
+ local count = #values
+ if count <= 1 then
+ return false
+ end
+ local i = count
+ while true do
+ local j = i
+ i = i - 1
+ if values[i] < values[j] then
+ local k = count
+ while values[i] >= values[k] do
+ k = k - 1
+ end
+ values[i], values[k] = values[k], values[i]
+ reverse(values, j, count)
+ return true
+ end
+ if i == 1 then
+ reverse(values, 1, count)
+ return false
+ end
+ end
+end
+
+local function range(count)
+ local res = {}
+ for i = 1, count do
+ res[i] = i
+ end
+ return res
+end
+
+--
+-- Min heap fill and empty.
+--
+local function test_min_heap_basic(test)
+ test:plan(1)
+
+ local h = heap.new(min_heap_cmp)
+ assert(not h:pop())
+ assert(h:count() == 0)
+ local values = {}
+ for i = 1, heap_size do
+ values[i] = new_object(i)
+ end
+ for counti = 1, heap_size do
+ local indexes = range(counti)
+ repeat
+ for i = 1, counti do
+ h:push(values[indexes[i]])
+ end
+ heap_check_indexes(h)
+ assert(h:count() == counti)
+ for i = 1, counti do
+ assert(h:top() == values[i])
+ assert(h:pop() == values[i])
+ heap_check_indexes(h)
+ end
+ assert(not h:pop())
+ assert(h:count() == 0)
+ until not next_permutation(indexes)
+ end
+
+ test:ok(true, "no asserts")
+end
+
+--
+-- Max heap fill and empty.
+--
+local function test_max_heap_basic(test)
+ test:plan(1)
+
+ local h = heap.new(max_heap_cmp)
+ assert(not h:pop())
+ assert(h:count() == 0)
+ local values = {}
+ for i = 1, heap_size do
+ values[i] = new_object(heap_size - i + 1)
+ end
+ for counti = 1, heap_size do
+ local indexes = range(counti)
+ repeat
+ for i = 1, counti do
+ h:push(values[indexes[i]])
+ end
+ heap_check_indexes(h)
+ assert(h:count() == counti)
+ for i = 1, counti do
+ assert(h:top() == values[i])
+ assert(h:pop() == values[i])
+ heap_check_indexes(h)
+ end
+ assert(not h:pop())
+ assert(h:count() == 0)
+ until not next_permutation(indexes)
+ end
+
+ test:ok(true, "no asserts")
+end
+
+--
+-- Min heap update top element.
+--
+local function test_min_heap_update_top(test)
+ test:plan(1)
+
+ local h = heap.new(min_heap_cmp)
+ for counti = 1, heap_size do
+ local indexes = range(counti)
+ repeat
+ local values = {}
+ for i = 1, counti do
+ values[i] = new_object(0)
+ h:push(values[i])
+ end
+ heap_check_indexes(h)
+ for i = 1, counti do
+ h:top().value = indexes[i]
+ h:update_top()
+ end
+ heap_check_indexes(h)
+ assert(h:count() == counti)
+ for i = 1, counti do
+ assert(h:top().value == i)
+ assert(h:pop().value == i)
+ heap_check_indexes(h)
+ end
+ assert(not h:pop())
+ assert(h:count() == 0)
+ until not next_permutation(indexes)
+ end
+
+ test:ok(true, "no asserts")
+end
+
+--
+-- Min heap update all elements in all possible positions.
+--
+local function test_min_heap_update(test)
+ test:plan(1)
+
+ local h = heap.new(min_heap_cmp)
+ for counti = 1, heap_size do
+ for srci = 1, counti do
+ local endv = srci * 10 + 5
+ for newv = 5, endv, 5 do
+ local values = {}
+ for i = 1, counti do
+ values[i] = new_object(i * 10)
+ h:push(values[i])
+ end
+ heap_check_indexes(h)
+ local obj = values[srci]
+ obj.value = newv
+ h:update(obj)
+ assert(obj.index >= 1)
+ assert(obj.index <= counti)
+ local prev = -1
+ for i = 1, counti do
+ obj = h:pop()
+ assert(obj.index == -1)
+ assert(obj.value >= prev)
+ assert(obj.value >= 1)
+ prev = obj.value
+ obj.value = -1
+ heap_check_indexes(h)
+ end
+ assert(not h:pop())
+ assert(h:count() == 0)
+ end
+ end
+ end
+
+ test:ok(true, "no asserts")
+end
+
+--
+-- Max heap delete all elements from all possible positions.
+--
+local function test_max_heap_delete(test)
+ test:plan(1)
+
+ local h = heap.new(max_heap_cmp)
+ local inf = heap_size + 1
+ for counti = 1, heap_size do
+ for srci = 1, counti do
+ local values = {}
+ for i = 1, counti do
+ values[i] = new_object(i)
+ h:push(values[i])
+ end
+ heap_check_indexes(h)
+ local obj = values[srci]
+ obj.value = inf
+ h:remove(obj)
+ assert(obj.index == -1)
+ local prev = inf
+ for i = 2, counti do
+ obj = h:pop()
+ assert(obj.index == -1)
+ assert(obj.value < prev)
+ assert(obj.value >= 1)
+ prev = obj.value
+ obj.value = -1
+ heap_check_indexes(h)
+ end
+ assert(not h:pop())
+ assert(h:count() == 0)
+ end
+ end
+
+ test:ok(true, "no asserts")
+end
+
+local function test_min_heap_remove_top(test)
+ test:plan(1)
+
+ local h = heap.new(min_heap_cmp)
+ for i = 1, heap_size do
+ h:push(new_object(i))
+ end
+ for i = 1, heap_size do
+ assert(h:top().value == i)
+ h:remove_top()
+ end
+ assert(h:count() == 0)
+
+ test:ok(true, "no asserts")
+end
+
+local function test_max_heap_remove_try(test)
+ test:plan(1)
+
+ local h = heap.new(max_heap_cmp)
+ local obj = new_object(1)
+ assert(obj.index == nil)
+ h:remove_try(obj)
+ assert(h:count() == 0)
+
+ h:push(obj)
+ h:push(new_object(2))
+ assert(obj.index == 2)
+ h:remove(obj)
+ assert(obj.index == -1)
+ h:remove_try(obj)
+ assert(obj.index == -1)
+ assert(h:count() == 1)
+
+ test:ok(true, "no asserts")
+end
+
+test:plan(7)
+
+test:test('min_heap_basic', test_min_heap_basic)
+test:test('max_heap_basic', test_max_heap_basic)
+test:test('min_heap_update_top', test_min_heap_update_top)
+test:test('min heap update', test_min_heap_update)
+test:test('max heap delete', test_max_heap_delete)
+test:test('min heap remove top', test_min_heap_remove_top)
+test:test('max heap remove try', test_max_heap_remove_try)
+
+os.exit(test:check() and 0 or 1)
diff --git a/test/unit-tap/suite.ini b/test/unit-tap/suite.ini
new file mode 100644
index 0000000..f365b69
--- /dev/null
+++ b/test/unit-tap/suite.ini
@@ -0,0 +1,4 @@
+[default]
+core = app
+description = Unit tests TAP
+is_parallel = True
diff --git a/vshard/heap.lua b/vshard/heap.lua
new file mode 100644
index 0000000..78c600a
--- /dev/null
+++ b/vshard/heap.lua
@@ -0,0 +1,226 @@
+local math_floor = math.floor
+
+--
+-- Implementation of a typical algorithm of the binary heap.
+-- The heap is intrusive - it stores index of each element inside of it. It
+-- allows to update and delete elements in any place in the heap, not only top
+-- elements.
+--
+
+local function heap_parent_index(index)
+ return math_floor(index / 2)
+end
+
+local function heap_left_child_index(index)
+ return index * 2
+end
+
+--
+-- Generate a new heap.
+--
+-- The implementation is targeted on as few index accesses as possible.
+-- Everything what could be is stored as upvalue variables instead of as indexes
+-- in a table. What couldn't be an upvalue and is used in a function more than
+-- once is saved on the stack.
+--
+local function heap_new(is_left_above)
+ -- Having it as an upvalue allows not to do 'self.data' lookup in each
+ -- function.
+ local data = {}
+ -- Saves #data calculation. In Lua it is not just reading a number.
+ local count = 0
+
+ local function heap_update_index_up(idx)
+ if idx == 1 then
+ return false
+ end
+
+ local orig_idx = idx
+ local value = data[idx]
+ local pidx = heap_parent_index(idx)
+ local parent = data[pidx]
+ while is_left_above(value, parent) do
+ data[idx] = parent
+ parent.index = idx
+ idx = pidx
+ if idx == 1 then
+ break
+ end
+ pidx = heap_parent_index(idx)
+ parent = data[pidx]
+ end
+
+ if idx == orig_idx then
+ return false
+ end
+ data[idx] = value
+ value.index = idx
+ return true
+ end
+
+ local function heap_update_index_down(idx)
+ local left_idx = heap_left_child_index(idx)
+ if left_idx > count then
+ return false
+ end
+
+ local orig_idx = idx
+ local left
+ local right
+ local right_idx = left_idx + 1
+ local top
+ local top_idx
+ local value = data[idx]
+ repeat
+ right_idx = left_idx + 1
+ if right_idx > count then
+ top = data[left_idx]
+ if is_left_above(value, top) then
+ break
+ end
+ top_idx = left_idx
+ else
+ left = data[left_idx]
+ right = data[right_idx]
+ if is_left_above(left, right) then
+ if is_left_above(value, left) then
+ break
+ end
+ top_idx = left_idx
+ top = left
+ else
+ if is_left_above(value, right) then
+ break
+ end
+ top_idx = right_idx
+ top = right
+ end
+ end
+
+ data[idx] = top
+ top.index = idx
+ idx = top_idx
+ left_idx = heap_left_child_index(idx)
+ until left_idx > count
+
+ if idx == orig_idx then
+ return false
+ end
+ data[idx] = value
+ value.index = idx
+ return true
+ end
+
+ local function heap_update_index(idx)
+ if not heap_update_index_up(idx) then
+ heap_update_index_down(idx)
+ end
+ end
+
+ local function heap_push(self, value)
+ count = count + 1
+ data[count] = value
+ value.index = count
+ heap_update_index_up(count)
+ end
+
+ local function heap_update_top(self)
+ heap_update_index_down(1)
+ end
+
+ local function heap_update(self, value)
+ heap_update_index(value.index)
+ end
+
+ local function heap_remove_top(self)
+ if count == 0 then
+ return
+ end
+ data[1].index = -1
+ if count == 1 then
+ data[1] = nil
+ count = 0
+ return
+ end
+ local value = data[count]
+ data[count] = nil
+ data[1] = value
+ value.index = 1
+ count = count - 1
+ heap_update_index_down(1)
+ end
+
+ local function heap_remove(self, value)
+ local idx = value.index
+ value.index = -1
+ if idx == count then
+ data[count] = nil
+ count = count - 1
+ return
+ end
+ value = data[count]
+ data[idx] = value
+ data[count] = nil
+ value.index = idx
+ count = count - 1
+ heap_update_index(idx)
+ end
+
+ local function heap_remove_try(self, value)
+ local idx = value.index
+ if idx and idx > 0 then
+ heap_remove(self, value)
+ end
+ end
+
+ local function heap_pop(self)
+ if count == 0 then
+ return
+ end
+ -- Some duplication from remove_top, but allows to save a few
+ -- condition checks, index accesses, and a function call.
+ local res = data[1]
+ res.index = -1
+ if count == 1 then
+ data[1] = nil
+ count = 0
+ return res
+ end
+ local value = data[count]
+ data[count] = nil
+ data[1] = value
+ value.index = 1
+ count = count - 1
+ heap_update_index_down(1)
+ return res
+ end
+
+ local function heap_top(self)
+ return data[1]
+ end
+
+ local function heap_count(self)
+ return count
+ end
+
+ return setmetatable({
+ -- Expose the data. For testing.
+ data = data,
+ }, {
+ __index = {
+ push = heap_push,
+ update_top = heap_update_top,
+ remove_top = heap_remove_top,
+ pop = heap_pop,
+ update = heap_update,
+ remove = heap_remove,
+ remove_try = heap_remove_try,
+ top = heap_top,
+ count = heap_count,
+ }
+ })
+end
+
+return {
+ new = heap_new,
+}
--
2.24.3 (Apple Git-128)
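As a usage illustration of the API added above (push/pop/top/remove driven by the intrusive 'index' field), here is a minimal sketch of the deadline-pile use case from the commit message. It assumes `vshard.heap` is available on the package path as laid out in the patch; the request objects and field names are made up for the example:

```lua
local heap = require('vshard.heap')

-- A min-heap of request deadlines: the closest deadline is always on top.
local deadlines = heap.new(function(l, r) return l.deadline < r.deadline end)

local r1 = {deadline = 30}
local r2 = {deadline = 10}
local r3 = {deadline = 20}
deadlines:push(r1)
deadlines:push(r2)
deadlines:push(r3)

-- The smallest deadline is accessible in O(1) as the top.
assert(deadlines:top() == r2)
-- The intrusive 'index' field lets an arbitrary element be dropped in
-- O(log(N)) without searching for it first, e.g. a cancelled request.
deadlines:remove(r3)
assert(deadlines:pop() == r2)
assert(deadlines:pop() == r1)
assert(deadlines:count() == 0)
```

Note that `heap.lua` itself only uses `math.floor` and `setmetatable`, so the sketch does not need a Tarantool runtime, only the module on the path.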
* Re: [Tarantool-patches] [PATCH 9/9] util: introduce binary heap data structure
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 9/9] util: introduce binary heap data structure Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-10 9:01 ` Oleg Babin via Tarantool-patches
2021-02-10 22:36 ` Vladislav Shpilevoy via Tarantool-patches
2021-03-05 22:03 ` Vladislav Shpilevoy via Tarantool-patches
1 sibling, 1 reply; 36+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-02-10 9:01 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov
Thanks for your patch.
Shouldn't it be added to the storage "MODULE_INTERNALS"?
LGTM. One comment below.
On 10/02/2021 02:46, Vladislav Shpilevoy wrote:
> Lua does not have a built-in standard library for binary heaps
> (also called priority queues). There is an implementation in
> Tarantool core in libsalad, but it is in C.
>
> Heap is a perfect storage for the soon coming feature map-reduce.
> In the map-reduce algorithm it will be necessary to be able to
> lock an entire storage against any bucket moves for time <=
> specified timeout. Number of map-reduce requests can be big, and
> they can have different timeouts.
>
> So there is a pile of timeouts from different requests. It is
> necessary to be able to quickly add new ones, be able to delete
> random ones, and remove expired ones.
>
> One way would be a sorted array of the deadlines. Unfortunately,
> it is super slow. O(N + log(N)) to add a new element (find place
> for log(N) and move all next elements for N), O(N) to delete a
> random one (move all next elements one cell left/right).
>
> Another way would be a sorted tree. But trees like RB or a dumb
> binary tree require extra steps to keep them balanced and to have
> access to the smallest element ASAP.
>
> The best way is the binary heap. It is perfectly balanced by
> design meaning that all operations there have complexity at most
> O(log(N)). It is possible to find the closest deadline for
> constant time as it is the heap's top.
>
> This patch implements it. The heap is intrusive. It means it
> stores index of each element right inside of the element as a
> field 'index'. Having an index along with each element allows to
> delete it from the heap for O(log(N)) without necessity to look
> its place up first.
>
> Part of #147
> ---
> test/unit-tap/heap.test.lua | 310 ++++++++++++++++++++++++++++++++++++
> test/unit-tap/suite.ini | 4 +
> vshard/heap.lua | 226 ++++++++++++++++++++++++++
> 3 files changed, 540 insertions(+)
> create mode 100755 test/unit-tap/heap.test.lua
> create mode 100644 test/unit-tap/suite.ini
> create mode 100644 vshard/heap.lua
>
> diff --git a/test/unit-tap/heap.test.lua b/test/unit-tap/heap.test.lua
> new file mode 100755
> index 0000000..8c3819f
> --- /dev/null
> +++ b/test/unit-tap/heap.test.lua
> @@ -0,0 +1,310 @@
> +#!/usr/bin/env tarantool
> +
> +local tap = require('tap')
> +local test = tap.test("cfg")
> +local heap = require('vshard.heap')
> +
Maybe it's better to use single quotes everywhere: test("cfg") ->
test('cfg'). Or is there some reason for the difference?
> +--
> +-- Max size of the heap to test. The number of iterations in the
> +-- test grows as a factorial of this value. At 10 the test becomes
> +-- too long already.
> +--
> +local heap_size = 8
> +
> +--
> +-- Comparators and constructor for objects stored in the intrusive heap.
> +local function min_heap_cmp(l, r)
> + return l.value < r.value
> +end
> +
> +local function max_heap_cmp(l, r)
> + return l.value > r.value
> +end
> +
> +local function new_object(value)
> + return {value = value}
> +end
> +
> +local function heap_check_indexes(heap)
> + local count = heap:count()
> + local data = heap.data
> + for i = 1, count do
> + assert(data[i].index == i)
> + end
> +end
> +
> +local function reverse(values, i1, i2)
> + while i1 < i2 do
> + values[i1], values[i2] = values[i2], values[i1]
> + i1 = i1 + 1
> + i2 = i2 - 1
> + end
> +end
> +
> +--
> +-- Implementation of std::next_permutation() from C++.
> +--
> +local function next_permutation(values)
> + local count = #values
> + if count <= 1 then
> + return false
> + end
> + local i = count
> + while true do
> + local j = i
> + i = i - 1
> + if values[i] < values[j] then
> + local k = count
> + while values[i] >= values[k] do
> + k = k - 1
> + end
> + values[i], values[k] = values[k], values[i]
> + reverse(values, j, count)
> + return true
> + end
> + if i == 1 then
> + reverse(values, 1, count)
> + return false
> + end
> + end
> +end
> +
> +local function range(count)
> + local res = {}
> + for i = 1, count do
> + res[i] = i
> + end
> + return res
> +end
> +
> +--
> +-- Min heap fill and empty.
> +--
> +local function test_min_heap_basic(test)
> + test:plan(1)
> +
> + local h = heap.new(min_heap_cmp)
> + assert(not h:pop())
> + assert(h:count() == 0)
> + local values = {}
> + for i = 1, heap_size do
> + values[i] = new_object(i)
> + end
> + for counti = 1, heap_size do
> + local indexes = range(counti)
> + repeat
> + for i = 1, counti do
> + h:push(values[indexes[i]])
> + end
> + heap_check_indexes(h)
> + assert(h:count() == counti)
> + for i = 1, counti do
> + assert(h:top() == values[i])
> + assert(h:pop() == values[i])
> + heap_check_indexes(h)
> + end
> + assert(not h:pop())
> + assert(h:count() == 0)
> + until not next_permutation(indexes)
> + end
> +
> + test:ok(true, "no asserts")
> +end
> +
> +--
> +-- Max heap fill and empty.
> +--
> +local function test_max_heap_basic(test)
> + test:plan(1)
> +
> + local h = heap.new(max_heap_cmp)
> + assert(not h:pop())
> + assert(h:count() == 0)
> + local values = {}
> + for i = 1, heap_size do
> + values[i] = new_object(heap_size - i + 1)
> + end
> + for counti = 1, heap_size do
> + local indexes = range(counti)
> + repeat
> + for i = 1, counti do
> + h:push(values[indexes[i]])
> + end
> + heap_check_indexes(h)
> + assert(h:count() == counti)
> + for i = 1, counti do
> + assert(h:top() == values[i])
> + assert(h:pop() == values[i])
> + heap_check_indexes(h)
> + end
> + assert(not h:pop())
> + assert(h:count() == 0)
> + until not next_permutation(indexes)
> + end
> +
> + test:ok(true, "no asserts")
> +end
> +
> +--
> +-- Min heap update top element.
> +--
> +local function test_min_heap_update_top(test)
> + test:plan(1)
> +
> + local h = heap.new(min_heap_cmp)
> + for counti = 1, heap_size do
> + local indexes = range(counti)
> + repeat
> + local values = {}
> + for i = 1, counti do
> + values[i] = new_object(0)
> + h:push(values[i])
> + end
> + heap_check_indexes(h)
> + for i = 1, counti do
> + h:top().value = indexes[i]
> + h:update_top()
> + end
> + heap_check_indexes(h)
> + assert(h:count() == counti)
> + for i = 1, counti do
> + assert(h:top().value == i)
> + assert(h:pop().value == i)
> + heap_check_indexes(h)
> + end
> + assert(not h:pop())
> + assert(h:count() == 0)
> + until not next_permutation(indexes)
> + end
> +
> + test:ok(true, "no asserts")
> +end
> +
> +--
> +-- Min heap update all elements in all possible positions.
> +--
> +local function test_min_heap_update(test)
> + test:plan(1)
> +
> + local h = heap.new(min_heap_cmp)
> + for counti = 1, heap_size do
> + for srci = 1, counti do
> + local endv = srci * 10 + 5
> + for newv = 5, endv, 5 do
> + local values = {}
> + for i = 1, counti do
> + values[i] = new_object(i * 10)
> + h:push(values[i])
> + end
> + heap_check_indexes(h)
> + local obj = values[srci]
> + obj.value = newv
> + h:update(obj)
> + assert(obj.index >= 1)
> + assert(obj.index <= counti)
> + local prev = -1
> + for i = 1, counti do
> + obj = h:pop()
> + assert(obj.index == -1)
> + assert(obj.value >= prev)
> + assert(obj.value >= 1)
> + prev = obj.value
> + obj.value = -1
> + heap_check_indexes(h)
> + end
> + assert(not h:pop())
> + assert(h:count() == 0)
> + end
> + end
> + end
> +
> + test:ok(true, "no asserts")
> +end
> +
> +--
> +-- Max heap delete all elements from all possible positions.
> +--
> +local function test_max_heap_delete(test)
> + test:plan(1)
> +
> + local h = heap.new(max_heap_cmp)
> + local inf = heap_size + 1
> + for counti = 1, heap_size do
> + for srci = 1, counti do
> + local values = {}
> + for i = 1, counti do
> + values[i] = new_object(i)
> + h:push(values[i])
> + end
> + heap_check_indexes(h)
> + local obj = values[srci]
> + obj.value = inf
> + h:remove(obj)
> + assert(obj.index == -1)
> + local prev = inf
> + for i = 2, counti do
> + obj = h:pop()
> + assert(obj.index == -1)
> + assert(obj.value < prev)
> + assert(obj.value >= 1)
> + prev = obj.value
> + obj.value = -1
> + heap_check_indexes(h)
> + end
> + assert(not h:pop())
> + assert(h:count() == 0)
> + end
> + end
> +
> + test:ok(true, "no asserts")
> +end
> +
> +local function test_min_heap_remove_top(test)
> + test:plan(1)
> +
> + local h = heap.new(min_heap_cmp)
> + for i = 1, heap_size do
> + h:push(new_object(i))
> + end
> + for i = 1, heap_size do
> + assert(h:top().value == i)
> + h:remove_top()
> + end
> + assert(h:count() == 0)
> +
> + test:ok(true, "no asserts")
> +end
> +
> +local function test_max_heap_remove_try(test)
> + test:plan(1)
> +
> + local h = heap.new(max_heap_cmp)
> + local obj = new_object(1)
> + assert(obj.index == nil)
> + h:remove_try(obj)
> + assert(h:count() == 0)
> +
> + h:push(obj)
> + h:push(new_object(2))
> + assert(obj.index == 2)
> + h:remove(obj)
> + assert(obj.index == -1)
> + h:remove_try(obj)
> + assert(obj.index == -1)
> + assert(h:count() == 1)
> +
> + test:ok(true, "no asserts")
> +end
> +
> +test:plan(7)
> +
> +test:test('min_heap_basic', test_min_heap_basic)
> +test:test('max_heap_basic', test_max_heap_basic)
> +test:test('min_heap_update_top', test_min_heap_update_top)
> +test:test('min heap update', test_min_heap_update)
> +test:test('max heap delete', test_max_heap_delete)
> +test:test('min heap remove top', test_min_heap_remove_top)
> +test:test('max heap remove try', test_max_heap_remove_try)
> +
> +os.exit(test:check() and 0 or 1)
> diff --git a/test/unit-tap/suite.ini b/test/unit-tap/suite.ini
> new file mode 100644
> index 0000000..f365b69
> --- /dev/null
> +++ b/test/unit-tap/suite.ini
> @@ -0,0 +1,4 @@
> +[default]
> +core = app
> +description = Unit tests TAP
> +is_parallel = True
> diff --git a/vshard/heap.lua b/vshard/heap.lua
> new file mode 100644
> index 0000000..78c600a
> --- /dev/null
> +++ b/vshard/heap.lua
> @@ -0,0 +1,226 @@
> +local math_floor = math.floor
> +
> +--
> +-- Implementation of a typical algorithm of the binary heap.
> +-- The heap is intrusive - it stores the index of each element inside the
> +-- element itself. This allows updating and deleting elements at any place
> +-- in the heap, not only at the top.
> +--
> +
> +local function heap_parent_index(index)
> + return math_floor(index / 2)
> +end
> +
> +local function heap_left_child_index(index)
> + return index * 2
> +end
> +
> +--
> +-- Generate a new heap.
> +--
> +-- The implementation aims at as few table accesses as possible. Everything
> +-- that can be is stored in upvalues instead of table fields. Whatever
> +-- cannot be an upvalue and is used more than once in a function is saved
> +-- in a local variable.
> +--
> +local function heap_new(is_left_above)
> + -- Having it as an upvalue avoids a 'self.data' lookup in each
> + -- function.
> + local data = {}
> + -- Saves #data calculation. In Lua it is not just reading a number.
> + local count = 0
> +
> + local function heap_update_index_up(idx)
> + if idx == 1 then
> + return false
> + end
> +
> + local orig_idx = idx
> + local value = data[idx]
> + local pidx = heap_parent_index(idx)
> + local parent = data[pidx]
> + while is_left_above(value, parent) do
> + data[idx] = parent
> + parent.index = idx
> + idx = pidx
> + if idx == 1 then
> + break
> + end
> + pidx = heap_parent_index(idx)
> + parent = data[pidx]
> + end
> +
> + if idx == orig_idx then
> + return false
> + end
> + data[idx] = value
> + value.index = idx
> + return true
> + end
> +
> + local function heap_update_index_down(idx)
> + local left_idx = heap_left_child_index(idx)
> + if left_idx > count then
> + return false
> + end
> +
> + local orig_idx = idx
> + local left
> + local right
> + local right_idx = left_idx + 1
> + local top
> + local top_idx
> + local value = data[idx]
> + repeat
> + right_idx = left_idx + 1
> + if right_idx > count then
> + top = data[left_idx]
> + if is_left_above(value, top) then
> + break
> + end
> + top_idx = left_idx
> + else
> + left = data[left_idx]
> + right = data[right_idx]
> + if is_left_above(left, right) then
> + if is_left_above(value, left) then
> + break
> + end
> + top_idx = left_idx
> + top = left
> + else
> + if is_left_above(value, right) then
> + break
> + end
> + top_idx = right_idx
> + top = right
> + end
> + end
> +
> + data[idx] = top
> + top.index = idx
> + idx = top_idx
> + left_idx = heap_left_child_index(idx)
> + until left_idx > count
> +
> + if idx == orig_idx then
> + return false
> + end
> + data[idx] = value
> + value.index = idx
> + return true
> + end
> +
> + local function heap_update_index(idx)
> + if not heap_update_index_up(idx) then
> + heap_update_index_down(idx)
> + end
> + end
> +
> + local function heap_push(self, value)
> + count = count + 1
> + data[count] = value
> + value.index = count
> + heap_update_index_up(count)
> + end
> +
> + local function heap_update_top(self)
> + heap_update_index_down(1)
> + end
> +
> + local function heap_update(self, value)
> + heap_update_index(value.index)
> + end
> +
> + local function heap_remove_top(self)
> + if count == 0 then
> + return
> + end
> + data[1].index = -1
> + if count == 1 then
> + data[1] = nil
> + count = 0
> + return
> + end
> + local value = data[count]
> + data[count] = nil
> + data[1] = value
> + value.index = 1
> + count = count - 1
> + heap_update_index_down(1)
> + end
> +
> + local function heap_remove(self, value)
> + local idx = value.index
> + value.index = -1
> + if idx == count then
> + data[count] = nil
> + count = count - 1
> + return
> + end
> + value = data[count]
> + data[idx] = value
> + data[count] = nil
> + value.index = idx
> + count = count - 1
> + heap_update_index(idx)
> + end
> +
> + local function heap_remove_try(self, value)
> + local idx = value.index
> + if idx and idx > 0 then
> + heap_remove(self, value)
> + end
> + end
> +
> + local function heap_pop(self)
> + if count == 0 then
> + return
> + end
> + -- Some duplication of remove_top, but it saves a few condition
> + -- checks, index accesses, and a function call.
> + local res = data[1]
> + res.index = -1
> + if count == 1 then
> + data[1] = nil
> + count = 0
> + return res
> + end
> + local value = data[count]
> + data[count] = nil
> + data[1] = value
> + value.index = 1
> + count = count - 1
> + heap_update_index_down(1)
> + return res
> + end
> +
> + local function heap_top(self)
> + return data[1]
> + end
> +
> + local function heap_count(self)
> + return count
> + end
> +
> + return setmetatable({
> + -- Expose the data. For testing.
> + data = data,
> + }, {
> + __index = {
> + push = heap_push,
> + update_top = heap_update_top,
> + remove_top = heap_remove_top,
> + pop = heap_pop,
> + update = heap_update,
> + remove = heap_remove,
> + remove_try = heap_remove_try,
> + top = heap_top,
> + count = heap_count,
> + }
> + })
> +end
> +
> +return {
> + new = heap_new,
> +}
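For reference, a short usage sketch of the API exported above; it assumes vshard is installed so that 'vshard.heap' resolves, and exercises push, top, update_top, pop, and count as defined in the patch:

```lua
local heap = require('vshard.heap')

local h = heap.new(function(l, r) return l.value < r.value end)
local a, b = {value = 2}, {value = 1}
h:push(a)
h:push(b)
assert(h:top() == b)
-- Mutate the top element in place, then restore the heap invariant.
h:top().value = 3
h:update_top()
assert(h:top() == a)
assert(h:pop() == a)
assert(h:pop() == b)
assert(h:count() == 0)
```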
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Tarantool-patches] [PATCH 9/9] util: introduce binary heap data structure
2021-02-10 9:01 ` Oleg Babin via Tarantool-patches
@ 2021-02-10 22:36 ` Vladislav Shpilevoy via Tarantool-patches
2021-02-11 6:51 ` Oleg Babin via Tarantool-patches
0 siblings, 1 reply; 36+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-10 22:36 UTC (permalink / raw)
To: Oleg Babin, tarantool-patches, yaroslav.dynnikov
Thanks for the review!
On 10.02.2021 10:01, Oleg Babin wrote:
> Thanks for your patch.
>
> Shouldn't it be added to storage "MODULE_INTERNALS" ?
Hm. Not sure I understand. Did you mean 'vshard_modules' variable in
storage/init.lua? Why? The heap is not used in storage/init.lua and
won't be used there directly in future patches. The next patches
will introduce new modules for storage/, which will use the heap,
and will reload it.
Also it does not have any global objects. So it does not
need its own global M, if this is what you meant.
>> diff --git a/test/unit-tap/heap.test.lua b/test/unit-tap/heap.test.lua
>> new file mode 100755
>> index 0000000..8c3819f
>> --- /dev/null
>> +++ b/test/unit-tap/heap.test.lua
>> @@ -0,0 +1,310 @@
>> +#!/usr/bin/env tarantool
>> +
>> +local tap = require('tap')
>> +local test = tap.test("cfg")
>> +local heap = require('vshard.heap')
>> +
>
>
> Maybe it's better to use single quotes everywhere: test("cfg") -> test('cfg'). Or does the difference carry some meaning?
Yeah, didn't notice it. Here is the diff:
====================
diff --git a/test/unit-tap/heap.test.lua b/test/unit-tap/heap.test.lua
index 8c3819f..9202f62 100755
--- a/test/unit-tap/heap.test.lua
+++ b/test/unit-tap/heap.test.lua
@@ -1,7 +1,7 @@
#!/usr/bin/env tarantool
local tap = require('tap')
-local test = tap.test("cfg")
+local test = tap.test('cfg')
local heap = require('vshard.heap')
--
@@ -109,7 +109,7 @@ local function test_min_heap_basic(test)
until not next_permutation(indexes)
end
- test:ok(true, "no asserts")
+ test:ok(true, 'no asserts')
end
--
@@ -143,7 +143,7 @@ local function test_max_heap_basic(test)
until not next_permutation(indexes)
end
- test:ok(true, "no asserts")
+ test:ok(true, 'no asserts')
end
--
@@ -178,7 +178,7 @@ local function test_min_heap_update_top(test)
until not next_permutation(indexes)
end
- test:ok(true, "no asserts")
+ test:ok(true, 'no asserts')
end
--
@@ -219,7 +219,7 @@ local function test_min_heap_update(test)
end
end
- test:ok(true, "no asserts")
+ test:ok(true, 'no asserts')
end
--
@@ -257,7 +257,7 @@ local function test_max_heap_delete(test)
end
end
- test:ok(true, "no asserts")
+ test:ok(true, 'no asserts')
end
local function test_min_heap_remove_top(test)
@@ -273,7 +273,7 @@ local function test_min_heap_remove_top(test)
end
assert(h:count() == 0)
- test:ok(true, "no asserts")
+ test:ok(true, 'no asserts')
end
local function test_max_heap_remove_try(test)
@@ -294,7 +294,7 @@ local function test_max_heap_remove_try(test)
assert(obj.index == -1)
assert(h:count() == 1)
- test:ok(true, "no asserts")
+ test:ok(true, 'no asserts')
end
test:plan(7)
* Re: [Tarantool-patches] [PATCH 9/9] util: introduce binary heap data structure
2021-02-10 22:36 ` Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-11 6:51 ` Oleg Babin via Tarantool-patches
2021-02-12 0:09 ` Vladislav Shpilevoy via Tarantool-patches
0 siblings, 1 reply; 36+ messages in thread
From: Oleg Babin via Tarantool-patches @ 2021-02-11 6:51 UTC (permalink / raw)
To: Vladislav Shpilevoy, tarantool-patches, yaroslav.dynnikov
Thanks for your fixes!
I found that you missed adding the new file to "vshard/CMakeLists.txt" [1]
[1] https://github.com/tarantool/vshard/blob/master/vshard/CMakeLists.txt#L9
On 11/02/2021 01:36, Vladislav Shpilevoy wrote:
> Thanks for the review!
>
> On 10.02.2021 10:01, Oleg Babin wrote:
>> Thanks for your patch.
>>
>> Shouldn't it be added to storage "MODULE_INTERNALS" ?
> Hm. Not sure I understand. Did you mean 'vshard_modules' variable in
> storage/init.lua? Why? The heap is not used in storage/init.lua and
> won't be used there directly in future patches. The next patches
> will introduce new modules for storage/, which will use the heap,
> and will reload it.
>
> Also it does not have any global objects. So it does not
> need its own global M, if this is what you meant.
Yes, thanks for your answer. Got it.
>>> diff --git a/test/unit-tap/heap.test.lua b/test/unit-tap/heap.test.lua
>>> new file mode 100755
>>> index 0000000..8c3819f
>>> --- /dev/null
>>> +++ b/test/unit-tap/heap.test.lua
>>> @@ -0,0 +1,310 @@
>>> +#!/usr/bin/env tarantool
>>> +
>>> +local tap = require('tap')
>>> +local test = tap.test("cfg")
>>> +local heap = require('vshard.heap')
>>> +
>> Maybe it's better to use single quotes everywhere: test("cfg") -> test('cfg'). Or does the difference carry some meaning?
> Yeah, didn't notice it. Here is the diff:
>
> ====================
> diff --git a/test/unit-tap/heap.test.lua b/test/unit-tap/heap.test.lua
> index 8c3819f..9202f62 100755
> --- a/test/unit-tap/heap.test.lua
> +++ b/test/unit-tap/heap.test.lua
> @@ -1,7 +1,7 @@
> #!/usr/bin/env tarantool
>
> local tap = require('tap')
> -local test = tap.test("cfg")
> +local test = tap.test('cfg')
> local heap = require('vshard.heap')
>
> --
> @@ -109,7 +109,7 @@ local function test_min_heap_basic(test)
> until not next_permutation(indexes)
> end
>
> - test:ok(true, "no asserts")
> + test:ok(true, 'no asserts')
> end
>
> --
> @@ -143,7 +143,7 @@ local function test_max_heap_basic(test)
> until not next_permutation(indexes)
> end
>
> - test:ok(true, "no asserts")
> + test:ok(true, 'no asserts')
> end
>
> --
> @@ -178,7 +178,7 @@ local function test_min_heap_update_top(test)
> until not next_permutation(indexes)
> end
>
> - test:ok(true, "no asserts")
> + test:ok(true, 'no asserts')
> end
>
> --
> @@ -219,7 +219,7 @@ local function test_min_heap_update(test)
> end
> end
>
> - test:ok(true, "no asserts")
> + test:ok(true, 'no asserts')
> end
>
> --
> @@ -257,7 +257,7 @@ local function test_max_heap_delete(test)
> end
> end
>
> - test:ok(true, "no asserts")
> + test:ok(true, 'no asserts')
> end
>
> local function test_min_heap_remove_top(test)
> @@ -273,7 +273,7 @@ local function test_min_heap_remove_top(test)
> end
> assert(h:count() == 0)
>
> - test:ok(true, "no asserts")
> + test:ok(true, 'no asserts')
> end
>
> local function test_max_heap_remove_try(test)
> @@ -294,7 +294,7 @@ local function test_max_heap_remove_try(test)
> assert(obj.index == -1)
> assert(h:count() == 1)
>
> - test:ok(true, "no asserts")
> + test:ok(true, 'no asserts')
> end
>
> test:plan(7)
* Re: [Tarantool-patches] [PATCH 9/9] util: introduce binary heap data structure
2021-02-11 6:51 ` Oleg Babin via Tarantool-patches
@ 2021-02-12 0:09 ` Vladislav Shpilevoy via Tarantool-patches
0 siblings, 0 replies; 36+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-12 0:09 UTC (permalink / raw)
To: Oleg Babin, tarantool-patches, yaroslav.dynnikov
On 11.02.2021 07:51, Oleg Babin wrote:
> Thanks for your fixes!
>
> I found that you missed adding the new file to "vshard/CMakeLists.txt" [1]
>
>
> [1] https://github.com/tarantool/vshard/blob/master/vshard/CMakeLists.txt#L9
Thanks for noticing! Fixed:
====================
diff --git a/vshard/CMakeLists.txt b/vshard/CMakeLists.txt
index 1063da8..78a3f07 100644
--- a/vshard/CMakeLists.txt
+++ b/vshard/CMakeLists.txt
@@ -7,4 +7,4 @@ add_subdirectory(router)
# Install module
install(FILES cfg.lua error.lua consts.lua hash.lua init.lua replicaset.lua
- util.lua lua_gc.lua rlist.lua DESTINATION ${TARANTOOL_INSTALL_LUADIR}/vshard)
+ util.lua lua_gc.lua rlist.lua heap.lua DESTINATION ${TARANTOOL_INSTALL_LUADIR}/vshard)
====================
* Re: [Tarantool-patches] [PATCH 9/9] util: introduce binary heap data structure
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 9/9] util: introduce binary heap data structure Vladislav Shpilevoy via Tarantool-patches
2021-02-10 9:01 ` Oleg Babin via Tarantool-patches
@ 2021-03-05 22:03 ` Vladislav Shpilevoy via Tarantool-patches
1 sibling, 0 replies; 36+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-03-05 22:03 UTC (permalink / raw)
To: tarantool-patches, olegrok, yaroslav.dynnikov
Applied this diff and force-pushed it, in order to eliminate the
metatable and the __index access.
Besides, each heap's metatable would differ from the others
because the methods are closures, so a metatable would bring no
memory saving - it could not have been shared between the heaps
anyway.
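A tiny self-contained illustration of that point (a hypothetical counter object, not vshard code): when methods close over per-instance upvalues, every instance gets its own function objects, so a shared metatable could not deduplicate them anyway.

```lua
local function counter_new()
    local count = 0                    -- per-instance upvalue
    return {
        inc = function() count = count + 1 end,
        get = function() return count end,
    }
end

local a, b = counter_new(), counter_new()
a.inc()
assert(a.get() == 1 and b.get() == 0)
-- Each instance carries distinct closures, so a metatable shared
-- between instances could not hold these methods:
assert(a.inc ~= b.inc)
```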
====================
diff --git a/vshard/heap.lua b/vshard/heap.lua
index 78c600a..b125921 100644
--- a/vshard/heap.lua
+++ b/vshard/heap.lua
@@ -203,22 +203,21 @@ local function heap_new(is_left_above)
return count
end
- return setmetatable({
+ return {
-- Expose the data. For testing.
data = data,
- }, {
- __index = {
- push = heap_push,
- update_top = heap_update_top,
- remove_top = heap_remove_top,
- pop = heap_pop,
- update = heap_update,
- remove = heap_remove,
- remove_try = heap_remove_try,
- top = heap_top,
- count = heap_count,
- }
- })
+ -- Methods are exported as plain members instead of via __index so that
+ -- each method call does not fetch a metatable and go through __index.
+ push = heap_push,
+ update_top = heap_update_top,
+ remove_top = heap_remove_top,
+ pop = heap_pop,
+ update = heap_update,
+ remove = heap_remove,
+ remove_try = heap_remove_try,
+ top = heap_top,
+ count = heap_count,
+ }
end
return {
* Re: [Tarantool-patches] [PATCH 0/9] VShard Map-Reduce, part 1, preparations
2021-02-09 23:46 [Tarantool-patches] [PATCH 0/9] VShard Map-Reduce, part 1, preparations Vladislav Shpilevoy via Tarantool-patches
` (8 preceding siblings ...)
2021-02-09 23:46 ` [Tarantool-patches] [PATCH 9/9] util: introduce binary heap data structure Vladislav Shpilevoy via Tarantool-patches
@ 2021-02-09 23:51 ` Vladislav Shpilevoy via Tarantool-patches
2021-02-12 11:02 ` Oleg Babin via Tarantool-patches
9 siblings, 1 reply; 36+ messages in thread
From: Vladislav Shpilevoy via Tarantool-patches @ 2021-02-09 23:51 UTC (permalink / raw)
To: tarantool-patches, olegrok, yaroslav.dynnikov
Bad links. Here are the correct ones:
Branch: http://github.com/tarantool/vshard/tree/gerold103/gh-147-map-reduce-part1
Issue: https://github.com/tarantool/vshard/issues/147