[Tarantool-patches] [PATCH vshard 03/11] storage: cache bucket count

Thu Feb 25 00:47:28 MSK 2021

Thanks for the review!

On 24.02.2021 11:27, Oleg Babin wrote:
> Thanks for your patch! LGTM.
> 
> I see calls like "status_index:count({consts.BUCKET.ACTIVE})". Maybe it is worth
> caching the whole bucket stats as well?

I thought about it a lot, but realized that I need only a few cached
metrics, the ones used by most requests. The count of active buckets is
not one of them, and caching it would waste time on invalidating the
cache on each generation update.

Specifically, count({consts.BUCKET.ACTIVE}) is used only by the
rebalancer, which runs extremely rarely. So there is no gain in
optimizing it for normal cluster operation.

Even now I worry about doing too much in the generation increment
trigger. To calculate the stats and keep them up to date I would need to
make the trigger more universal, for example store the number of buckets
in each status. Then I face these issues:

- In the on_replace trigger I need to extract the bucket status from the
  old and new tuples and update the relevant counters. My main worry is
  that extracting the statuses takes too long.

- I need to handle rollback so that the counters are reverted.
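To make the two issues above concrete, here is a rough sketch of what
"store the number of buckets in each status" would involve. It is a
Python stand-in for Lua trigger logic, not vshard code; StatusCounters,
on_replace(), commit() and rollback() are hypothetical names. Every
replace has to look at both the old and the new status, and every change
has to be journaled so a rollback can undo it.

```python
from collections import Counter
from typing import List, Optional, Tuple


class StatusCounters:
    """Hypothetical per-status bucket counters, updated on every replace.

    Each change is journaled so that a rolled-back transaction can undo it.
    """

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self._journal: List[Tuple[Optional[str], Optional[str]]] = []

    def on_replace(self, old_status: Optional[str],
                   new_status: Optional[str]) -> None:
        # None models a missing tuple: an insert has no old tuple,
        # a delete has no new tuple.
        if old_status is not None:
            self.counts[old_status] -= 1
        if new_status is not None:
            self.counts[new_status] += 1
        self._journal.append((old_status, new_status))

    def commit(self) -> None:
        # Changes are final, nothing to undo anymore.
        self._journal.clear()

    def rollback(self) -> None:
        # Undo the journaled changes in reverse order.
        while self._journal:
            old_status, new_status = self._journal.pop()
            if new_status is not None:
                self.counts[new_status] -= 1
            if old_status is not None:
                self.counts[old_status] += 1
```

The per-replace cost is small here, but in the real trigger the status
has to be read out of two tuples on every write, which is the part that
worries me.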

I could do something similar to the cache in this patch (simply
calculate the counts on demand and invalidate them all on each generation
update), but that does not fix the real issue with the counts: they can
be slow to compute when there are millions of buckets, and the cache
would be invalidated constantly during rebalancing, exactly when a cache
could help most.
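A rough sketch of that on-demand alternative, again as a Python stand-in
rather than vshard code (GenerationCountCache, bump_generation() and
count_fn are hypothetical names). The trigger only bumps a generation
number, which is cheap, but the first count() after any bump pays for a
full recount.

```python
from typing import Callable, Dict


class GenerationCountCache:
    """Hypothetical lazily computed counts, dropped on every generation bump."""

    def __init__(self, count_fn: Callable[[str], int]) -> None:
        self._count_fn = count_fn  # the expensive full count, e.g. an index scan
        self._generation = 0
        self._cached_generation = -1
        self._cache: Dict[str, int] = {}

    def bump_generation(self) -> None:
        # Called on every bucket change: O(1), no status extraction needed.
        self._generation += 1

    def count(self, status: str) -> int:
        # Any bump since the last lookup invalidates all cached counts.
        if self._cached_generation != self._generation:
            self._cache.clear()
            self._cached_generation = self._generation
        if status not in self._cache:
            self._cache[status] = self._count_fn(status)
        return self._cache[status]
```

During rebalancing the generation changes on almost every bucket move,
so nearly every count() becomes a full recount and the cache gives
little benefit.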

In the end I decided not to bother with this now, in the scope of map-reduce.