[Tarantool-patches] [PATCH vshard 03/11] storage: cache bucket count

Thu Feb 25 15:42:25 MSK 2021

Thanks for your answer. You are right, let's not overcomplicate this task.

On 25.02.2021 00:47, Vladislav Shpilevoy wrote:
> Thanks for the review!
>
> On 24.02.2021 11:27, Oleg Babin wrote:
>> Thanks for your patch! LGTM.
>>
>> I see calls like "status_index:count({consts.BUCKET.ACTIVE})". Maybe it is
>> worth caching the whole bucket stats as well?
> I thought about it a lot. But I realized that I need only a few cached
> metrics, the ones used by most of the requests. The count of active
> buckets is not one of them, yet caching it would waste time on
> invalidating the cache on each generation update.
>
> Specifically, count({consts.BUCKET.ACTIVE}) is used only by the
> rebalancer, which runs extremely rarely. So there is no win in
> optimizing it for normal cluster operation.
>
> Even now I worry about doing too much in the generation increment
> trigger. To calculate the stats and keep them up to date I would need
> to make it more universal, for example, store the number of buckets
> of each type. Then I face these issues:
>
> - In the on_replace trigger I would need to extract the bucket status
>    from the old and new tuples and update the relevant counters. I
>    mostly worry about extracting the statuses (it takes too long).
>
> - I would need to handle rollback to somehow revert the counters.
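
For illustration, the two issues above can be sketched in a few lines. This is a language-neutral Python sketch, not vshard code: the class `BucketStats`, the hook names, and the `(bucket_id, status)` tuple layout are all hypothetical. It only shows that every replace has to read the status from both tuples, and that an undo log is needed so a rollback can revert the counters.

```python
from collections import Counter


class BucketStats:
    """Hypothetical per-status bucket counters (assumption, not vshard API)."""

    def __init__(self):
        self.counts = Counter()
        # (status, delta) pairs applied by the currently open transaction.
        self._undo = []

    def _apply(self, status, delta):
        self.counts[status] += delta
        self._undo.append((status, delta))

    def on_replace(self, old_tuple, new_tuple):
        # Tuples are modelled as (bucket_id, status); None means "no tuple".
        # The status has to be extracted from both tuples on every change.
        if old_tuple is not None:
            self._apply(old_tuple[1], -1)
        if new_tuple is not None:
            self._apply(new_tuple[1], +1)

    def on_commit(self):
        # The changes are final, so the undo log is no longer needed.
        self._undo.clear()

    def on_rollback(self):
        # Revert the counter changes of the aborted transaction.
        while self._undo:
            status, delta = self._undo.pop()
            self.counts[status] -= delta


stats = BucketStats()
stats.on_replace(None, (1, "active"))
stats.on_replace(None, (2, "active"))
stats.on_commit()

# A status change that is rolled back must leave the counters untouched.
stats.on_replace((1, "active"), (1, "sending"))
stats.on_rollback()
```

Even in this toy form the trigger does extra work on every bucket change, which is the cost being weighed above.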
>
> I could do something similar to the cache in this patch (I simply
> calculate the counts on demand and invalidate them all on each generation
> update), but it does not fix the real issue with the counts: they
> can take a long time if there are millions of buckets, and the cache
> would be invalidated a lot during rebalancing, exactly when it could
> help most.
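
A rough sketch of why such an on-demand cache helps little during rebalancing. Again this is plain Python with hypothetical names (`CountCache`, `set_status`, the `scans` counter); a dict stands in for the bucket space and a full scan stands in for the index count. It is a toy model, not the cache from the patch.

```python
class CountCache:
    """Hypothetical on-demand count cache invalidated by a generation counter."""

    def __init__(self, buckets):
        self.buckets = buckets        # bucket_id -> status, stands in for the space
        self.generation = 0           # bumped on every bucket change
        self._cached = {}             # status -> cached count
        self._cached_generation = 0
        self.scans = 0                # number of full recounts performed

    def set_status(self, bucket_id, status):
        self.buckets[bucket_id] = status
        self.generation += 1

    def count(self, status):
        # Drop everything if any bucket changed since the cache was filled.
        if self._cached_generation != self.generation:
            self._cached.clear()
            self._cached_generation = self.generation
        if status not in self._cached:
            self.scans += 1
            self._cached[status] = sum(
                1 for s in self.buckets.values() if s == status
            )
        return self._cached[status]


cache = CountCache({i: "active" for i in range(1000)})

# Idle cluster: the second call is served from the cache.
cache.count("active")
cache.count("active")
scans_when_idle = cache.scans

# Rebalancing: every bucket move invalidates the cache, so each count rescans.
for bucket_id in range(10):
    cache.set_status(bucket_id, "sending")
    cache.count("active")
scans_when_rebalancing = cache.scans - scans_when_idle
```

In this model an idle cluster pays for one scan, while during rebalancing every count after a bucket move is a full scan again.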
>
> In the end I decided not to bother with this now in the scope of map-reduce.