[patches] [PATCH 2/2] vinyl: implement space.bsize, index.bsize, and index.len

Vladimir Davydov vdavydov.dev at gmail.com
Sat Feb 10 13:59:06 MSK 2018


On Sat, Feb 10, 2018 at 09:40:58AM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev at gmail.com> [18/02/09 15:31]:
> > > >  - space.bsize returns the size of data stored in the space.
> > > >    It is the sum of memory.bytes and disk.bytes as reported
> > > >    by the primary index.
> > > 
> > > Why is it looking at the primary index only? Shouldn't it take
> > > into account all indexes? 
> > 
> > AFAIU users want to see how much user data a space stores, hence len and
> > bsize return the actual number of rows and their cumulative size. This
> > is consistent with memtx.
> 
> Somehow it feels wrong. The idea of bsize() is the binary size of
> the data, not the msgpack size. The reason we don't add tuple memory
> overhead to tuple:bsize() is that it's incomplete (there are
> fragmentation expenses not easily accountable) and useless (what we
> can account is fixed size). I think for space:bsize() we need to
> display the binary size of the entire space, both for memtx and vinyl.

I'm just trying to think as a user, and for a user the bsize()
implementation you're suggesting wouldn't make much sense IMO.
What does a typical user want to know about his instance?
For me, it would be:

 1. How much data my instance handles. This is what space.bsize()
    and space.len() implemented by this patch set show.

 2. How much disk space my database takes. To figure this out,
    I can (and I'm pretty sure most users will) check OS stats
    (e.g. df or du).

 3. What the overhead of indexing the data is. This is reported
    by index.bsize().

Now, you're saying that space.bsize() should report the total
size of binary data stored by all indexes of the space. OK, but
how can a user get an answer to Q1 then? The only way I see is to
check out low-level vinyl-specific statistics reported by
index.info(), which is not particularly user-friendly, don't you
think? Then what would space.bsize() be useful for? For answering
Q2? Not really, because (a) some data is stored in memory,
(b) run files are compressed, (c) actual disk usage tends to
be greater than reported by index.info() because we keep a few
older snapshots as configured by box.cfg.checkpoint_count, and
(d) as I said, there are OS stats for that.
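
To make the difference between the two definitions concrete, here
is a rough sketch in Python. It is not Tarantool code: the
index_stats layout, the function names, and all the numbers are
made up for illustration, and it assumes that every index exposes
memory and disk byte counters similar to the memory.bytes and
disk.bytes mentioned above.

```python
# Hypothetical per-index statistics, loosely modelled on the memory.bytes
# and disk.bytes counters mentioned above. All numbers are made up.
index_stats = {
    "primary": {"memory_bytes": 4096, "disk_bytes": 1048576},
    "secondary": {"memory_bytes": 1024, "disk_bytes": 262144},
}


def stored_bytes(stat):
    """Bytes held by one index: in-memory part plus on-disk part."""
    return stat["memory_bytes"] + stat["disk_bytes"]


def space_bsize_primary_only(stats):
    """Definition used by this patch: data as seen by the primary index."""
    return stored_bytes(stats["primary"])


def space_bsize_all_indexes(stats):
    """Alternative under discussion: sum over every index of the space."""
    return sum(stored_bytes(s) for s in stats.values())


user_data = space_bsize_primary_only(index_stats)
total = space_bsize_all_indexes(index_stats)
```

With the first definition the result answers Q1 directly. With the
second one the user data is mixed with whatever the secondary
indexes store, and the total still doesn't match what du would
report, for the reasons (a) to (d) above.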

That being said, I fail to see any value in the space.bsize()
implementation you're suggesting. What you're saying above
doesn't really help. Could you please elaborate?


