Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladislav Shpilevoy via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: tarantool-patches@dev.tarantool.org, olegrok@tarantool.org,
	yaroslav.dynnikov@tarantool.org
Subject: [Tarantool-patches] [PATCH vshard 03/11] storage: cache bucket count
Date: Tue, 23 Feb 2021 01:15:41 +0100
Message-ID: <7a52b9f05ce9c6f2b44164668f09706c7723b104.1614039039.git.v.shpilevoy@tarantool.org> (raw)
In-Reply-To: <cover.1614039039.git.v.shpilevoy@tarantool.org>

Bucket count calculation costs 1 FFI call in Lua, and makes a few
actions and virtual calls in C. So it is not free even for memtx
spaces.

But it changes extremely rare, which makes reasonable to cache the
value.

Bucket count is not used much now, but will be used a lot in the
future storage_ref() function, which is a part of map-reduce API.

The idea is that a router will need to reference all the storages
and ensure that all the buckets in the cluster are pinned to their
storages. To check this, storage_ref() will return number of
buckets successfully pinned on the storage.

The router will sum counts from all storage_ref() calls and ensure
it equals to total configured bucket count.

This means bucket count is needed for each storage_ref() call,
whose count per second can be thousands and more.

The patch makes count calculation cost as much as one Lua function
call and a Lua table index operation (almost always).

Needed for #147
---
 test/storage/storage.result   | 45 +++++++++++++++++++++++++++++++++++
 test/storage/storage.test.lua | 18 ++++++++++++++
 vshard/storage/init.lua       | 44 +++++++++++++++++++++++++++++-----
 3 files changed, 101 insertions(+), 6 deletions(-)

diff --git a/test/storage/storage.result b/test/storage/storage.result
index 0550ad1..edb45be 100644
--- a/test/storage/storage.result
+++ b/test/storage/storage.result
@@ -677,6 +677,51 @@ rs:callro('echo', {'some_data'})
 - null
 - null
 ...
+--
+-- Bucket count is calculated properly.
+--
+-- Cleanup after the previous tests.
+_ = test_run:switch('storage_1_a')
+---
+...
+buckets = vshard.storage.buckets_info()
+---
+...
+for bid, _ in pairs(buckets) do vshard.storage.bucket_force_drop(bid) end
+---
+...
+_ = test_run:switch('storage_2_a')
+---
+...
+buckets = vshard.storage.buckets_info()
+---
+...
+for bid, _ in pairs(buckets) do vshard.storage.bucket_force_drop(bid) end
+---
+...
+_ = test_run:switch('storage_1_a')
+---
+...
+assert(vshard.storage.buckets_count() == 0)
+---
+- true
+...
+vshard.storage.bucket_force_create(1, 5)
+---
+- true
+...
+assert(vshard.storage.buckets_count() == 5)
+---
+- true
+...
+vshard.storage.bucket_force_create(6, 5)
+---
+- true
+...
+assert(vshard.storage.buckets_count() == 10)
+---
+- true
+...
 _ = test_run:switch("default")
 ---
 ...
diff --git a/test/storage/storage.test.lua b/test/storage/storage.test.lua
index d8fbd94..db014ef 100644
--- a/test/storage/storage.test.lua
+++ b/test/storage/storage.test.lua
@@ -187,6 +187,24 @@ util.has_same_fields(old_internal, vshard.storage.internal)
 _, rs = next(vshard.storage.internal.replicasets)
 rs:callro('echo', {'some_data'})
 
+--
+-- Bucket count is calculated properly.
+--
+-- Cleanup after the previous tests.
+_ = test_run:switch('storage_1_a')
+buckets = vshard.storage.buckets_info()
+for bid, _ in pairs(buckets) do vshard.storage.bucket_force_drop(bid) end
+_ = test_run:switch('storage_2_a')
+buckets = vshard.storage.buckets_info()
+for bid, _ in pairs(buckets) do vshard.storage.bucket_force_drop(bid) end
+
+_ = test_run:switch('storage_1_a')
+assert(vshard.storage.buckets_count() == 0)
+vshard.storage.bucket_force_create(1, 5)
+assert(vshard.storage.buckets_count() == 5)
+vshard.storage.bucket_force_create(6, 5)
+assert(vshard.storage.buckets_count() == 10)
+
 _ = test_run:switch("default")
 test_run:drop_cluster(REPLICASET_2)
 test_run:drop_cluster(REPLICASET_1)
diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
index a3d383d..9b74bcb 100644
--- a/vshard/storage/init.lua
+++ b/vshard/storage/init.lua
@@ -110,6 +110,9 @@ if not M then
         -- replace the old function is to keep its reference.
         --
         bucket_on_replace = nil,
+        -- Fast alternative to box.space._bucket:count(). But may be nil. Reset
+        -- on each generation change.
+        bucket_count_cache = nil,
         -- Redirects for recently sent buckets. They are kept for a while to
         -- help routers to find a new location for sent and deleted buckets
         -- without whole cluster scan.
@@ -183,10 +186,44 @@ local function local_call(func_name, args)
     return pcall(netbox_self_call, netbox_self, func_name, args)
 end
 
+--
+-- Get number of buckets stored on this storage. Regardless of their state.
+--
+-- The idea is that all the code should use one function ref to get the bucket
+-- count. But inside the function never branches. Instead, it points at one of 2
+-- branch-less functions. Cached one simply returns a number which is supposed
+-- to be super fast. Non-cached remembers the count and changes the global
+-- function to the cached one. So on the next call it is cheap. No 'if's at all.
+--
+local bucket_count
+
+local function bucket_count_cache()
+    return M.bucket_count_cache
+end
+
+local function bucket_count_not_cache()
+    local count = box.space._bucket:count()
+    M.bucket_count_cache = count
+    bucket_count = bucket_count_cache
+    return count
+end
+
+bucket_count = bucket_count_not_cache
+
+--
+-- Can't expose bucket_count to the public API as is. Need this proxy-call.
+-- Because the original function changes at runtime.
+--
+local function bucket_count_public()
+    return bucket_count()
+end
+
 --
 -- Trigger for on replace into _bucket to update its generation.
 --
 local function bucket_generation_increment()
+    bucket_count = bucket_count_not_cache
+    M.bucket_count_cache = nil
     M.bucket_generation = M.bucket_generation + 1
     M.bucket_generation_cond:broadcast()
 end
@@ -2240,7 +2277,6 @@ local function rebalancer_request_state()
     if #status_index:select({consts.BUCKET.GARBAGE}, {limit = 1}) > 0 then
         return
     end
-    local bucket_count = _bucket:count()
     return {
         bucket_active_count = status_index:count({consts.BUCKET.ACTIVE}),
         bucket_pinned_count = status_index:count({consts.BUCKET.PINNED}),
@@ -2501,10 +2537,6 @@ end
 -- Monitoring
 --------------------------------------------------------------------------------
 
-local function storage_buckets_count()
-    return  box.space._bucket.index.pk:count()
-end
-
 local function storage_buckets_info(bucket_id)
     local ibuckets = setmetatable({}, { __serialize = 'mapping' })
 
@@ -2780,7 +2812,7 @@ return {
     cfg = function(cfg, uuid) return storage_cfg(cfg, uuid, false) end,
     info = storage_info,
     buckets_info = storage_buckets_info,
-    buckets_count = storage_buckets_count,
+    buckets_count = bucket_count_public,
     buckets_discovery = buckets_discovery,
     rebalancer_request_state = rebalancer_request_state,
     internal = M,
-- 
2.24.3 (Apple Git-128)


  parent reply	other threads:[~2021-02-23  0:18 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-23  0:15 [Tarantool-patches] [PATCH vshard 00/11] VShard Map-Reduce, part 2: Ref, Sched, Map Vladislav Shpilevoy via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 01/11] error: introduce vshard.error.timeout() Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:27   ` Oleg Babin via Tarantool-patches
2021-02-24 21:46     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-25 12:42       ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 10/11] sched: introduce vshard.storage.sched module Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:28   ` Oleg Babin via Tarantool-patches
2021-02-24 21:50     ` Vladislav Shpilevoy via Tarantool-patches
2021-03-04 21:02   ` Oleg Babin via Tarantool-patches
2021-03-05 22:06     ` Vladislav Shpilevoy via Tarantool-patches
2021-03-09  8:03       ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 11/11] router: introduce map_callrw() Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:28   ` Oleg Babin via Tarantool-patches
2021-02-24 22:04     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-25 12:43       ` Oleg Babin via Tarantool-patches
2021-02-26 23:58         ` Vladislav Shpilevoy via Tarantool-patches
2021-03-01 10:58           ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 02/11] storage: add helper for local functions invocation Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:27   ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` Vladislav Shpilevoy via Tarantool-patches [this message]
2021-02-24 10:27   ` [Tarantool-patches] [PATCH vshard 03/11] storage: cache bucket count Oleg Babin via Tarantool-patches
2021-02-24 21:47     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-25 12:42       ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 04/11] registry: module for circular deps resolution Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:27   ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 05/11] util: introduce safe fiber_cond_wait() Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:27   ` Oleg Babin via Tarantool-patches
2021-02-24 21:48     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-25 12:42       ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 06/11] util: introduce fiber_is_self_canceled() Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:27   ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 07/11] storage: introduce bucket_generation_wait() Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:27   ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 08/11] storage: introduce bucket_are_all_rw() Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:27   ` Oleg Babin via Tarantool-patches
2021-02-24 21:48     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 09/11] ref: introduce vshard.storage.ref module Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:28   ` Oleg Babin via Tarantool-patches
2021-02-24 21:49     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-25 12:42       ` Oleg Babin via Tarantool-patches
2021-03-04 21:22   ` Oleg Babin via Tarantool-patches
2021-03-05 22:06     ` Vladislav Shpilevoy via Tarantool-patches
2021-03-09  8:03       ` Oleg Babin via Tarantool-patches
2021-03-21 18:49   ` Vladislav Shpilevoy via Tarantool-patches
2021-03-12 23:13 ` [Tarantool-patches] [PATCH vshard 00/11] VShard Map-Reduce, part 2: Ref, Sched, Map Vladislav Shpilevoy via Tarantool-patches
2021-03-15  7:05   ` Oleg Babin via Tarantool-patches
2021-03-28 18:17 ` Vladislav Shpilevoy via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7a52b9f05ce9c6f2b44164668f09706c7723b104.1614039039.git.v.shpilevoy@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=olegrok@tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --cc=yaroslav.dynnikov@tarantool.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Tarantool development patches archive

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://lists.tarantool.org/tarantool-patches/0 tarantool-patches/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 tarantool-patches tarantool-patches/ https://lists.tarantool.org/tarantool-patches \
		tarantool-patches@dev.tarantool.org.
	public-inbox-index tarantool-patches

Example config snippet for mirrors.


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git