[tarantool-patches] Re: [PATCH 00/12] vinyl: statistics improvements

Vladimir Davydov vdavydov.dev at gmail.com
Thu Jan 17 15:06:42 MSK 2019


On Thu, Jan 17, 2019 at 02:32:36PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev at gmail.com> [19/01/15 17:20]:
> > This patch set adds a few metrics necessary for implementing compaction
> > randomization and transaction throttling, but it's useful on its own,
> > because it makes box.stat.vinyl() a little bit more useful when it comes
> > to performance analysis. Here's an example of box.stat.vinyl() output
> > with this patch set applied:
> 
> Please write a documentation request which explains the meaning of
> these variables. AFAIK these stats are still not described in the
> manual. Please try to explain why these statistics are useful, and
> how they can be used.

There are a lot of changes, and they are spread across separate patches,
so I'm planning to file a documentation request manually after this
patch set is pushed.

> 
> > ---
> > - tx:
> >     conflict: 0
> >     commit: 1979052
> >     rollback: 0
> >     statements: 2
> >     transactions: 1
> >     gap_locks: 0
> >     read_views: 0
> >   regulator:
> 
> let's rename it to rate_limit or rate_limits? Regulator is not
> specific enough. What does it regulate?

Transaction rate. I guess the name 'rate_limit' would be somewhat more
straightforward, but the component is called vy_regulator in the code,
and I'd like to keep the name 'regulator' because it is consistent
with the other box.stat.vinyl() sections:

  scheduler - schedules dump and compaction tasks
  regulator - regulates the transaction rate based on scheduler progress
  iterator - will account cumulative iterator statistics
  (cache hits/misses, read amplification, etc); this one hasn't been
  implemented yet.

Besides, I'm planning to add 'rate_limit' member to this table and
regulator.rate_limit looks better than rate_limit.rate_limit or
rate_limit.value IMO.

> 
> >     dump_bandwidth: 10485760
> Without comments even I forget the meaning of these.

That's why we need documentation for it.

> 
> >     dump_watermark: 20023725
> >     write_rate: 7085581
> 
> >   memory:
> >     tuple_cache: 0
> >     tx: 2388
> >     level0: 19394239
> >     page_index: 4422529
> >     bloom_filter: 1517177
> 
> Good.
> 
> >   disk:
> >     data_compacted: 500330587
> 
> What's this? 

The amount of disk space (without compression) the database would take
if all spaces were fully compacted; (data + index) / data_compacted can
be used to estimate space amplification. It is estimated as the size of
the last LSM tree level. I don't know how to name it better: maybe
data_unique, or data_stripped, or simply last_level. IMO data_compacted
sounds best.

> 
> >     data: 762493299
> >     index: 41814873
> >   scheduler:
> >     dump_time: 186.61679973663
> 
> It's total dump time, the name can be confused with
> last dump time.

Yep, but reporting the last dump time wouldn't make any sense. I'd like
to avoid total_ prefixes as the names are already long enough.

> 
> >     tasks_inprogress: 3
> >     dump_output: 2115930554
> >     compaction_queue: 213022513
> >     compaction_output: 4130054964
> >     compaction_time: 737.99443827965
> >     dump_count: 136
> >     tasks_failed: 0
> >     tasks_completed: 1839
> >     dump_input: 2061676471
> >     compaction_input: 5646476938


