Tarantool development patches archive
 help / color / mirror / Atom feed
From: Oleg Babin via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>,
	tarantool-patches@dev.tarantool.org,
	yaroslav.dynnikov@tarantool.org
Subject: Re: [Tarantool-patches] [PATCH vshard 03/11] storage: cache bucket count
Date: Wed, 24 Feb 2021 13:27:28 +0300	[thread overview]
Message-ID: <542ed6d8-9b68-7455-5a79-b192078c139e@tarantool.org> (raw)
In-Reply-To: <7a52b9f05ce9c6f2b44164668f09706c7723b104.1614039039.git.v.shpilevoy@tarantool.org>

Thanks for your patch! LGTM.

I see calls like "status_index:count({consts.BUCKET.ACTIVE})". Maybe it 
worth

to cache whole buckets stats as well?

On 23.02.2021 03:15, Vladislav Shpilevoy wrote:
> Bucket count calculation costs 1 FFI call in Lua, and makes a few
> actions and virtual calls in C. So it is not free even for memtx
> spaces.
>
> But it changes extremely rare, which makes reasonable to cache the
> value.
>
> Bucket count is not used much now, but will be used a lot in the
> future storage_ref() function, which is a part of map-reduce API.
>
> The idea is that a router will need to reference all the storages
> and ensure that all the buckets in the cluster are pinned to their
> storages. To check this, storage_ref() will return number of
> buckets successfully pinned on the storage.
>
> The router will sum counts from all storage_ref() calls and ensure
> it equals to total configured bucket count.
>
> This means bucket count is needed for each storage_ref() call,
> whose count per second can be thousands and more.
>
> The patch makes count calculation cost as much as one Lua function
> call and a Lua table index operation (almost always).
>
> Needed for #147
> ---
>   test/storage/storage.result   | 45 +++++++++++++++++++++++++++++++++++
>   test/storage/storage.test.lua | 18 ++++++++++++++
>   vshard/storage/init.lua       | 44 +++++++++++++++++++++++++++++-----
>   3 files changed, 101 insertions(+), 6 deletions(-)
>
> diff --git a/test/storage/storage.result b/test/storage/storage.result
> index 0550ad1..edb45be 100644
> --- a/test/storage/storage.result
> +++ b/test/storage/storage.result
> @@ -677,6 +677,51 @@ rs:callro('echo', {'some_data'})
>   - null
>   - null
>   ...
> +--
> +-- Bucket count is calculated properly.
> +--
> +-- Cleanup after the previous tests.
> +_ = test_run:switch('storage_1_a')
> +---
> +...
> +buckets = vshard.storage.buckets_info()
> +---
> +...
> +for bid, _ in pairs(buckets) do vshard.storage.bucket_force_drop(bid) end
> +---
> +...
> +_ = test_run:switch('storage_2_a')
> +---
> +...
> +buckets = vshard.storage.buckets_info()
> +---
> +...
> +for bid, _ in pairs(buckets) do vshard.storage.bucket_force_drop(bid) end
> +---
> +...
> +_ = test_run:switch('storage_1_a')
> +---
> +...
> +assert(vshard.storage.buckets_count() == 0)
> +---
> +- true
> +...
> +vshard.storage.bucket_force_create(1, 5)
> +---
> +- true
> +...
> +assert(vshard.storage.buckets_count() == 5)
> +---
> +- true
> +...
> +vshard.storage.bucket_force_create(6, 5)
> +---
> +- true
> +...
> +assert(vshard.storage.buckets_count() == 10)
> +---
> +- true
> +...
>   _ = test_run:switch("default") --- ... diff --git a/test/storage/storage.test.lua 
> b/test/storage/storage.test.lua index d8fbd94..db014ef 100644 --- 
> a/test/storage/storage.test.lua +++ b/test/storage/storage.test.lua @@ 
> -187,6 +187,24 @@ util.has_same_fields(old_internal, 
> vshard.storage.internal) _, rs = 
> next(vshard.storage.internal.replicasets) rs:callro('echo', 
> {'some_data'}) +-- +-- Bucket count is calculated properly. +-- +-- 
> Cleanup after the previous tests. +_ = test_run:switch('storage_1_a') 
> +buckets = vshard.storage.buckets_info() +for bid, _ in pairs(buckets) 
> do vshard.storage.bucket_force_drop(bid) end +_ = 
> test_run:switch('storage_2_a') +buckets = 
> vshard.storage.buckets_info() +for bid, _ in pairs(buckets) do 
> vshard.storage.bucket_force_drop(bid) end + +_ = 
> test_run:switch('storage_1_a') +assert(vshard.storage.buckets_count() 
> == 0) +vshard.storage.bucket_force_create(1, 5) 
> +assert(vshard.storage.buckets_count() == 5) 
> +vshard.storage.bucket_force_create(6, 5) 
> +assert(vshard.storage.buckets_count() == 10) + _ = test_run:switch("default")
>   test_run:drop_cluster(REPLICASET_2)
>   test_run:drop_cluster(REPLICASET_1)
> diff --git a/vshard/storage/init.lua b/vshard/storage/init.lua
> index a3d383d..9b74bcb 100644
> --- a/vshard/storage/init.lua
> +++ b/vshard/storage/init.lua
> @@ -110,6 +110,9 @@ if not M then
>           -- replace the old function is to keep its reference.
>           --
>           bucket_on_replace = nil,
> +        -- Fast alternative to box.space._bucket:count(). But may be nil. Reset
> +        -- on each generation change.
> +        bucket_count_cache = nil,
>           -- Redirects for recently sent buckets. They are kept for a while to
>           -- help routers to find a new location for sent and deleted buckets
>           -- without whole cluster scan.
> @@ -183,10 +186,44 @@ local function local_call(func_name, args)
>       return pcall(netbox_self_call, netbox_self, func_name, args)
>   end
>   
> +--
> +-- Get number of buckets stored on this storage. Regardless of their state.
> +--
> +-- The idea is that all the code should use one function ref to get the bucket
> +-- count. But inside the function never branches. Instead, it points at one of 2
> +-- branch-less functions. Cached one simply returns a number which is supposed
> +-- to be super fast. Non-cached remembers the count and changes the global
> +-- function to the cached one. So on the next call it is cheap. No 'if's at all.
> +--
> +local bucket_count
> +
> +local function bucket_count_cache()
> +    return M.bucket_count_cache
> +end
> +
> +local function bucket_count_not_cache()
> +    local count = box.space._bucket:count()
> +    M.bucket_count_cache = count
> +    bucket_count = bucket_count_cache
> +    return count
> +end
> +
> +bucket_count = bucket_count_not_cache
> +
> +--
> +-- Can't expose bucket_count to the public API as is. Need this proxy-call.
> +-- Because the original function changes at runtime.
> +--
> +local function bucket_count_public()
> +    return bucket_count()
> +end
> +
>   --
>   -- Trigger for on replace into _bucket to update its generation.
>   --
>   local function bucket_generation_increment()
> +    bucket_count = bucket_count_not_cache
> +    M.bucket_count_cache = nil
>       M.bucket_generation = M.bucket_generation + 1
>       M.bucket_generation_cond:broadcast()
>   end
> @@ -2240,7 +2277,6 @@ local function rebalancer_request_state()
>       if #status_index:select({consts.BUCKET.GARBAGE}, {limit = 1}) > 0 then
>           return
>       end
> -    local bucket_count = _bucket:count()
>       return {
>           bucket_active_count = status_index:count({consts.BUCKET.ACTIVE}),
>           bucket_pinned_count = status_index:count({consts.BUCKET.PINNED}),
> @@ -2501,10 +2537,6 @@ end
>   -- Monitoring
>   --------------------------------------------------------------------------------
>   
> -local function storage_buckets_count()
> -    return  box.space._bucket.index.pk:count()
> -end
> -
>   local function storage_buckets_info(bucket_id)
>       local ibuckets = setmetatable({}, { __serialize = 'mapping' })
>   
> @@ -2780,7 +2812,7 @@ return {
>       cfg = function(cfg, uuid) return storage_cfg(cfg, uuid, false) end,
>       info = storage_info,
>       buckets_info = storage_buckets_info,
> -    buckets_count = storage_buckets_count,
> +    buckets_count = bucket_count_public,
>       buckets_discovery = buckets_discovery,
>       rebalancer_request_state = rebalancer_request_state,
>       internal = M,

  reply	other threads:[~2021-02-24 10:28 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-23  0:15 [Tarantool-patches] [PATCH vshard 00/11] VShard Map-Reduce, part 2: Ref, Sched, Map Vladislav Shpilevoy via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 01/11] error: introduce vshard.error.timeout() Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:27   ` Oleg Babin via Tarantool-patches
2021-02-24 21:46     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-25 12:42       ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 10/11] sched: introduce vshard.storage.sched module Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:28   ` Oleg Babin via Tarantool-patches
2021-02-24 21:50     ` Vladislav Shpilevoy via Tarantool-patches
2021-03-04 21:02   ` Oleg Babin via Tarantool-patches
2021-03-05 22:06     ` Vladislav Shpilevoy via Tarantool-patches
2021-03-09  8:03       ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 11/11] router: introduce map_callrw() Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:28   ` Oleg Babin via Tarantool-patches
2021-02-24 22:04     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-25 12:43       ` Oleg Babin via Tarantool-patches
2021-02-26 23:58         ` Vladislav Shpilevoy via Tarantool-patches
2021-03-01 10:58           ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 02/11] storage: add helper for local functions invocation Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:27   ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 03/11] storage: cache bucket count Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:27   ` Oleg Babin via Tarantool-patches [this message]
2021-02-24 21:47     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-25 12:42       ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 04/11] registry: module for circular deps resolution Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:27   ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 05/11] util: introduce safe fiber_cond_wait() Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:27   ` Oleg Babin via Tarantool-patches
2021-02-24 21:48     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-25 12:42       ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 06/11] util: introduce fiber_is_self_canceled() Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:27   ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 07/11] storage: introduce bucket_generation_wait() Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:27   ` Oleg Babin via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 08/11] storage: introduce bucket_are_all_rw() Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:27   ` Oleg Babin via Tarantool-patches
2021-02-24 21:48     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-23  0:15 ` [Tarantool-patches] [PATCH vshard 09/11] ref: introduce vshard.storage.ref module Vladislav Shpilevoy via Tarantool-patches
2021-02-24 10:28   ` Oleg Babin via Tarantool-patches
2021-02-24 21:49     ` Vladislav Shpilevoy via Tarantool-patches
2021-02-25 12:42       ` Oleg Babin via Tarantool-patches
2021-03-04 21:22   ` Oleg Babin via Tarantool-patches
2021-03-05 22:06     ` Vladislav Shpilevoy via Tarantool-patches
2021-03-09  8:03       ` Oleg Babin via Tarantool-patches
2021-03-21 18:49   ` Vladislav Shpilevoy via Tarantool-patches
2021-03-12 23:13 ` [Tarantool-patches] [PATCH vshard 00/11] VShard Map-Reduce, part 2: Ref, Sched, Map Vladislav Shpilevoy via Tarantool-patches
2021-03-15  7:05   ` Oleg Babin via Tarantool-patches
2021-03-28 18:17 ` Vladislav Shpilevoy via Tarantool-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=542ed6d8-9b68-7455-5a79-b192078c139e@tarantool.org \
    --to=tarantool-patches@dev.tarantool.org \
    --cc=olegrok@tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --cc=yaroslav.dynnikov@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH vshard 03/11] storage: cache bucket count' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox