From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Thu, 17 Jan 2019 15:06:42 +0300
From: Vladimir Davydov
Subject: Re: [tarantool-patches] Re: [PATCH 00/12] vinyl: statistics improvements
Message-ID: <20190117120642.5gwdzpydnq3rld2a@esperanza>
References: <20190117113236.GD28204@chai>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190117113236.GD28204@chai>
To: Konstantin Osipov
Cc: tarantool-patches@freelists.org
List-ID:

On Thu, Jan 17, 2019 at 02:32:36PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov [19/01/15 17:20]:
> > This patch set adds a few metrics necessary for implementing compaction
> > randomization and transaction throttling, but it's useful on its own,
> > because it makes box.stat.vinyl() a little bit more useful when it comes
> > to performance analysis. Here's an example of box.stat.vinyl() output
> > with this patch set applied:
>
> Please write a documentation request which explains the meaning of
> these variables. AFAIK these stats are still not described in the
> manual. Please try to explain why these statistics are useful, and
> how they can be used.

There are a lot of changes and they are done by separate patches, so
I'm planning to file a documentation request manually after this patch
set is pushed.

> >
> > ---
> > - tx:
> >     conflict: 0
> >     commit: 1979052
> >     rollback: 0
> >     statements: 2
> >     transactions: 1
> >     gap_locks: 0
> >     read_views: 0
> >   regulator:
>
> let's rename it to rate_limit or rate_limits? Regulator is not
> specific enough. What does it regulate?

Transaction rate.
I guess the name 'rate_limit' would be somewhat more straightforward,
but the component is called vy_regulator in the code and I'd like to
keep the name 'regulator', because it'd be consistent with other
box.stat.vinyl() sections:

  scheduler - schedules dumps and compaction tasks
  regulator - regulates transaction rate based on scheduler progress
  iterator  - here we will account cumulative iterator statistics
              (cache hits/misses, read amplification, etc); this one
              hasn't been implemented yet.

Besides, I'm planning to add a 'rate_limit' member to this table and
regulator.rate_limit looks better than rate_limit.rate_limit or
rate_limit.value IMO.

>
> >     dump_bandwidth: 10485760
> Without comments even I forget the meaning of these.

So we have documentation for it.

>
> >     dump_watermark: 20023725
> >     write_rate: 7085581
>
> >   memory:
> >     tuple_cache: 0
> >     tx: 2388
> >     level0: 19394239
> >     page_index: 4422529
> >     bloom_filter: 1517177
>
> Good.
>
> >   disk:
> >     data_compacted: 500330587
>
> What's this?

Size of disk space (without compression) the database would take if all
spaces were compacted. (data + index) / data_compacted can be used to
estimate space amplification. It is estimated as the size of the last
LSM tree level.

I don't know how to name it better: data_unique maybe, or
data_stripped, or simply last_level. IMO data_compacted sounds better.

>
> >     data: 762493299
> >     index: 41814873
> >   scheduler:
> >     dump_time: 186.61679973663
>
> It's total dump time, the name can be confused with
> last dump time.

Yep, but reporting the last dump time wouldn't make any sense. I'd like
to avoid total_ prefixes as the names are already long enough.

>
> >     tasks_inprogress: 3
> >     dump_output: 2115930554
> >     compaction_queue: 213022513
> >     compaction_output: 4130054964
> >     compaction_time: 737.99443827965
> >     dump_count: 136
> >     tasks_failed: 0
> >     tasks_completed: 1839
> >     dump_input: 2061676471
> >     compaction_input: 5646476938
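
To illustrate the data_compacted explanation above, here is a minimal
sketch of the space amplification estimate, (data + index) /
data_compacted. It is written in Python rather than Lua and assumes the
'disk' section of box.stat.vinyl() has already been loaded into a plain
dict. The function name space_amplification and the dict layout are
hypothetical; the numbers are copied from the sample output quoted in
this thread.

```python
def space_amplification(disk):
    """Estimate space amplification as (data + index) / data_compacted.

    `disk` is assumed to be a dict mirroring the 'disk' section of
    box.stat.vinyl(); this layout is an illustration, not a Tarantool API.
    """
    compacted = disk["data_compacted"]
    if compacted == 0:
        # The ratio is undefined when data_compacted is zero.
        return None
    return (disk["data"] + disk["index"]) / compacted


# Values copied from the 'disk' section of the sample output.
disk_stats = {
    "data_compacted": 500330587,
    "data": 762493299,
    "index": 41814873,
}

ratio = space_amplification(disk_stats)
print(f"space amplification: {ratio:.2f}")
```

For the sample output this gives a ratio of roughly 1.6, i.e. the
database occupies about 60% more disk space than it would if all spaces
were fully compacted.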