* [PATCH v2 1/8] vinyl: fix force compaction logic
From: Vladimir Davydov @ 2018-09-16 17:06 UTC
To: kostja; +Cc: tarantool-patches
This patch addresses a few problems index.compact() is suffering from,
namely:
- When a range is split or coalesced, it should inherit the value of
the needs_compaction flag from the source ranges. Currently, the flag
is cleared, so the resulting range may never be compacted.
- If a range has no slices, we shouldn't set the needs_compaction flag
for it, because there is nothing to compact, but currently we do.
- The needs_compaction flag should be cleared as soon as we schedule a
range for compaction, not when all slices have been compacted into
one, as we presently expect, because the latter may never happen
under a write-intensive load.
---
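For illustration, a rough Lua sketch of how the fixed behavior can be
observed; the space and index names below are made up, not part of the
patch:

fiber = require('fiber')
s = box.schema.space.create('example', {engine = 'vinyl'})
i = s:create_index('pk')
for k = 1, 1000 do s:replace{k} end
box.snapshot()          -- dump the in-memory level, producing a run
for k = 1, 1000 do s:replace{k, k} end
box.snapshot()          -- second dump, so there is something to merge
i:compact()             -- mark all ranges for major compaction
-- wait for the scheduler to pick the ranges up
while i:stat().disk.compact.count == 0 do fiber.sleep(0.01) end
i:stat().run_count      -- expected to drop back to one run per range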
src/box/vy_lsm.c | 9 +++++++--
src/box/vy_range.c | 16 ++--------------
src/box/vy_range.h | 8 ++------
src/box/vy_scheduler.c | 2 ++
4 files changed, 13 insertions(+), 22 deletions(-)
diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
index 1994525e..71a44f63 100644
--- a/src/box/vy_lsm.c
+++ b/src/box/vy_lsm.c
@@ -1016,6 +1016,7 @@ vy_lsm_split_range(struct vy_lsm *lsm, struct vy_range *range)
if (new_slice != NULL)
vy_range_add_slice(part, new_slice);
}
+ part->needs_compaction = range->needs_compaction;
part->compact_priority = range->compact_priority;
}
@@ -1123,6 +1124,8 @@ vy_lsm_coalesce_range(struct vy_lsm *lsm, struct vy_range *range)
rlist_splice(&result->slices, &it->slices);
result->slice_count += it->slice_count;
vy_disk_stmt_counter_add(&result->count, &it->count);
+ if (it->needs_compaction)
+ result->needs_compaction = true;
vy_range_delete(it);
it = next;
}
@@ -1157,8 +1160,10 @@ vy_lsm_force_compaction(struct vy_lsm *lsm)
struct vy_range_tree_iterator it;
vy_range_tree_ifirst(lsm->tree, &it);
- while ((range = vy_range_tree_inext(&it)) != NULL)
- vy_range_force_compaction(range);
+ while ((range = vy_range_tree_inext(&it)) != NULL) {
+ range->needs_compaction = true;
+ vy_range_update_compact_priority(range, &lsm->opts);
+ }
vy_range_heap_update_all(&lsm->range_heap);
}
diff --git a/src/box/vy_range.c b/src/box/vy_range.c
index 6a55a018..ddcd2ed3 100644
--- a/src/box/vy_range.c
+++ b/src/box/vy_range.c
@@ -262,18 +262,6 @@ vy_range_remove_slice(struct vy_range *range, struct vy_slice *slice)
vy_disk_stmt_counter_sub(&range->count, &slice->count);
}
-void
-vy_range_force_compaction(struct vy_range *range)
-{
- if (range->slice_count == 1) {
- /* Already compacted. */
- assert(!range->needs_compaction);
- return;
- }
- range->needs_compaction = true;
- range->compact_priority = range->slice_count;
-}
-
/**
* To reduce write amplification caused by compaction, we follow
* the LSM tree design. Runs in each range are divided into groups
@@ -304,9 +292,9 @@ vy_range_update_compact_priority(struct vy_range *range,
assert(opts->run_count_per_level > 0);
assert(opts->run_size_ratio > 1);
- if (range->slice_count == 1) {
+ if (range->slice_count <= 1) {
/* Nothing to compact. */
- range->compact_priority = 1;
+ range->compact_priority = 0;
range->needs_compaction = false;
return;
}
diff --git a/src/box/vy_range.h b/src/box/vy_range.h
index d7031e70..2ca19a1c 100644
--- a/src/box/vy_range.h
+++ b/src/box/vy_range.h
@@ -110,8 +110,8 @@ struct vy_range {
* If this flag is set, the range must be scheduled for
* major compaction, i.e. its compact_priority must be
* raised to max (slice_count). The flag is set by
- * vy_range_force_compaction() and cleared automatically
- * when all slices of the range have been compacted.
+ * vy_lsm_force_compaction() and cleared when the range
+ * is scheduled for compaction.
*/
bool needs_compaction;
/** Number of times the range was compacted. */
@@ -229,10 +229,6 @@ vy_range_add_slice_before(struct vy_range *range, struct vy_slice *slice,
void
vy_range_remove_slice(struct vy_range *range, struct vy_slice *slice);
-/** Mark a range for major compaction. */
-void
-vy_range_force_compaction(struct vy_range *range);
-
/**
* Update compaction priority of a range.
*
diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c
index eb63e298..08de214a 100644
--- a/src/box/vy_scheduler.c
+++ b/src/box/vy_scheduler.c
@@ -1668,6 +1668,8 @@ vy_task_compact_new(struct vy_scheduler *scheduler, struct vy_worker *worker,
assert(n == 0);
assert(new_run->dump_lsn >= 0);
+ range->needs_compaction = false;
+
task->range = range;
task->new_run = new_run;
task->wi = wi;
--
2.11.0
* Re: [PATCH v2 1/8] vinyl: fix force compaction logic
From: Konstantin Osipov @ 2018-09-19 1:43 UTC
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/09/17 15:05]:
> This patch addresses a few problems index.compact() is suffering from,
> namely:
>
> - When a range is split or coalesced, it should inherit the value of
> the needs_compaction flag from the source ranges. Currently, the flag
> is cleared, so the resulting range may never be compacted.
>
> - If a range has no slices, we shouldn't set the needs_compaction flag
> for it, because there is nothing to compact, but currently we do.
>
> - The needs_compaction flag should be cleared as soon as we schedule a
> range for compaction, not when all slices have been compacted into
> one, as we presently expect, because the latter may never happen
> under a write-intensive load.
OK to push (I assume you've already implemented the comment
changes we discussed in the chat).
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
* [PATCH v2 2/8] vinyl: update compact priority usual way on range split/coalesce
From: Vladimir Davydov @ 2018-09-16 17:06 UTC
To: kostja; +Cc: tarantool-patches
When a few ranges are coalesced, we "force" compaction of the resulting
range by raising its compaction priority to max (slice count). There's
actually no point in that, because as long as the shape of the resulting
LSM tree is OK, we don't need to do extra compaction work. Moreover, it
actually doesn't work if a new slice is added to the resulting range by
dump before it gets compacted, which is fairly likely, because then its
compaction priority will be recalculated as usual. So let's simply call
vy_range_update_compact_priority() for the resulting range.
When a range is split, the produced ranges inherit its compaction
priority. This is actually incorrect, because a range split may change
the shape of the tree, so let's recalculate the priority for each part
the usual way, i.e. by calling vy_range_update_compact_priority().
After this patch, there's only one place where the compaction priority
of a range can be updated: vy_range_update_compact_priority().
---
src/box/vy_lsm.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
index 71a44f63..a1d4aa80 100644
--- a/src/box/vy_lsm.c
+++ b/src/box/vy_lsm.c
@@ -1017,7 +1017,7 @@ vy_lsm_split_range(struct vy_lsm *lsm, struct vy_range *range)
vy_range_add_slice(part, new_slice);
}
part->needs_compaction = range->needs_compaction;
- part->compact_priority = range->compact_priority;
+ vy_range_update_compact_priority(part, &lsm->opts);
}
/*
@@ -1129,13 +1129,7 @@ vy_lsm_coalesce_range(struct vy_lsm *lsm, struct vy_range *range)
vy_range_delete(it);
it = next;
}
- /*
- * Coalescing increases read amplification and breaks the log
- * structured layout of the run list, so, although we could
- * leave the resulting range as it is, we'd better compact it
- * as soon as we can.
- */
- result->compact_priority = result->slice_count;
+ vy_range_update_compact_priority(result, &lsm->opts);
vy_lsm_acct_range(lsm, result);
vy_lsm_add_range(lsm, result);
lsm->range_tree_version++;
--
2.11.0
* Re: [PATCH v2 2/8] vinyl: update compact priority usual way on range split/coalesce
From: Konstantin Osipov @ 2018-09-19 1:46 UTC
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/09/17 15:05]:
> When a few ranges are coalesced, we "force" compaction of the resulting
> range by raising its compaction priority to max (slice count). There's
> actually no point in that, because as long as the shape of the resulting
> LSM tree is OK, we don't need to do extra compaction work. Moreover, it
> actually doesn't work if a new slice is added to the resulting range by
> dump before it gets compacted, which is fairly likely, because then its
> compaction priority will be recalculated as usual. So let's simply call
> vy_range_update_compact_priority() for the resulting range.
>
> - /*
> - * Coalescing increases read amplification and breaks the log
> - * structured layout of the run list, so, although we could
> - * leave the resulting range as it is, we'd better compact it
> - * as soon as we can.
> - */
> - result->compact_priority = result->slice_count;
> + vy_range_update_compact_priority(result, &lsm->opts);
Looks like a drastic change of opinion to me :). I would keep the
reasons you give in the old comment for why we may need a compaction,
together with the new reasons you give in the commit message for why
we may not need one, in a consolidated comment for this line.
The patch itself is OK to push.
> vy_lsm_acct_range(lsm, result);
> vy_lsm_add_range(lsm, result);
> lsm->range_tree_version++;
> --
> 2.11.0
>
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
* [PATCH v2 3/8] vinyl: annotate info_table_end with comment
From: Vladimir Davydov @ 2018-09-16 17:06 UTC
To: kostja; +Cc: tarantool-patches
The code is difficult to follow when there are nested info tables,
because info_table_end() doesn't refer to the table name. Let's
annotate each info_table_end() call with a comment naming the table
it closes to make the code easier to follow.
No functional changes.
---
src/box/vinyl.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 70be7129..ce58247a 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -259,7 +259,7 @@ vy_info_append_quota(struct vy_env *env, struct info_handler *h)
info_append_int(h, "watermark", q->watermark);
info_append_int(h, "use_rate", q->use_rate);
info_append_int(h, "dump_bandwidth", q->dump_bw);
- info_table_end(h);
+ info_table_end(h); /* quota */
}
static void
@@ -283,7 +283,7 @@ vy_info_append_tx(struct vy_env *env, struct info_handler *h)
mempool_stats(&xm->read_view_mempool, &mstats);
info_append_int(h, "read_views", mstats.objcount);
- info_table_end(h);
+ info_table_end(h); /* tx */
}
static void
@@ -295,7 +295,7 @@ vy_info_append_memory(struct vy_env *env, struct info_handler *h)
info_append_int(h, "tuple_cache", env->cache_env.mem_used);
info_append_int(h, "page_index", env->lsm_env.page_index_size);
info_append_int(h, "bloom_filter", env->lsm_env.bloom_size);
- info_table_end(h);
+ info_table_end(h); /* memory */
}
void
@@ -371,21 +371,21 @@ vinyl_index_stat(struct index *index, struct info_handler *h)
info_append_double(h, "p90", latency_get(&stat->latency, 90));
info_append_double(h, "p95", latency_get(&stat->latency, 95));
info_append_double(h, "p99", latency_get(&stat->latency, 99));
- info_table_end(h);
+ info_table_end(h); /* latency */
info_table_begin(h, "upsert");
info_append_int(h, "squashed", stat->upsert.squashed);
info_append_int(h, "applied", stat->upsert.applied);
- info_table_end(h);
+ info_table_end(h); /* upsert */
info_table_begin(h, "memory");
vy_info_append_stmt_counter(h, NULL, &stat->memory.count);
info_table_begin(h, "iterator");
info_append_int(h, "lookup", stat->memory.iterator.lookup);
vy_info_append_stmt_counter(h, "get", &stat->memory.iterator.get);
- info_table_end(h);
+ info_table_end(h); /* iterator */
info_append_int(h, "index_size", vy_lsm_mem_tree_size(lsm));
- info_table_end(h);
+ info_table_end(h); /* memory */
info_table_begin(h, "disk");
vy_info_append_disk_stmt_counter(h, NULL, &stat->disk.count);
@@ -396,13 +396,13 @@ vinyl_index_stat(struct index *index, struct info_handler *h)
info_table_begin(h, "bloom");
info_append_int(h, "hit", stat->disk.iterator.bloom_hit);
info_append_int(h, "miss", stat->disk.iterator.bloom_miss);
- info_table_end(h);
- info_table_end(h);
+ info_table_end(h); /* bloom */
+ info_table_end(h); /* iterator */
vy_info_append_compact_stat(h, "dump", &stat->disk.dump);
vy_info_append_compact_stat(h, "compact", &stat->disk.compact);
info_append_int(h, "index_size", lsm->page_index_size);
info_append_int(h, "bloom_size", lsm->bloom_size);
- info_table_end(h);
+ info_table_end(h); /* disk */
info_table_begin(h, "cache");
vy_info_append_stmt_counter(h, NULL, &cache_stat->count);
@@ -413,15 +413,15 @@ vinyl_index_stat(struct index *index, struct info_handler *h)
vy_info_append_stmt_counter(h, "evict", &cache_stat->evict);
info_append_int(h, "index_size",
vy_cache_tree_mem_used(&lsm->cache.cache_tree));
- info_table_end(h);
+ info_table_end(h); /* cache */
info_table_begin(h, "txw");
vy_info_append_stmt_counter(h, NULL, &stat->txw.count);
info_table_begin(h, "iterator");
info_append_int(h, "lookup", stat->txw.iterator.lookup);
vy_info_append_stmt_counter(h, "get", &stat->txw.iterator.get);
- info_table_end(h);
- info_table_end(h);
+ info_table_end(h); /* iterator */
+ info_table_end(h); /* txw */
info_append_int(h, "range_count", lsm->range_count);
info_append_int(h, "run_count", lsm->run_count);
--
2.11.0
* [PATCH v2 4/8] vinyl: report pages and bytes_compressed in dump/compact in/out stats
From: Vladimir Davydov @ 2018-09-16 17:06 UTC
To: kostja; +Cc: tarantool-patches
There's no reason not to report pages and bytes_compressed under
disk.dump.out and disk.compact.{in,out} apart from the fact that dump
and compaction statistics currently share the same struct
(vy_compact_stat). The statistics are going to differ anyway once the
compaction queue size is added to disk.compact, so let's zap struct
vy_compact_stat and report as much info as we can.
---
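For illustration, a rough Lua sketch of what the reshaped counters look
like after this patch; the space name is made up:

st = box.space.example.index.pk:stat()
st.disk.dump['in']       -- rows, bytes (dump input comes from memory)
st.disk.dump.out         -- rows, bytes, pages, bytes_compressed
st.disk.compact['in']    -- rows, bytes, pages, bytes_compressed
st.disk.compact.out      -- rows, bytes, pages, bytes_compressed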
src/box/vinyl.c | 23 ++++++++++-------------
src/box/vy_scheduler.c | 6 +++---
src/box/vy_stat.h | 27 ++++++++++++++++-----------
test/vinyl/info.result | 28 ++++++++++++++++++++++++----
4 files changed, 53 insertions(+), 31 deletions(-)
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index ce58247a..197f16d6 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -337,17 +337,6 @@ vy_info_append_disk_stmt_counter(struct info_handler *h, const char *name,
}
static void
-vy_info_append_compact_stat(struct info_handler *h, const char *name,
- const struct vy_compact_stat *stat)
-{
- info_table_begin(h, name);
- info_append_int(h, "count", stat->count);
- vy_info_append_stmt_counter(h, "in", &stat->in);
- vy_info_append_stmt_counter(h, "out", &stat->out);
- info_table_end(h);
-}
-
-static void
vinyl_index_stat(struct index *index, struct info_handler *h)
{
char buf[1024];
@@ -398,8 +387,16 @@ vinyl_index_stat(struct index *index, struct info_handler *h)
info_append_int(h, "miss", stat->disk.iterator.bloom_miss);
info_table_end(h); /* bloom */
info_table_end(h); /* iterator */
- vy_info_append_compact_stat(h, "dump", &stat->disk.dump);
- vy_info_append_compact_stat(h, "compact", &stat->disk.compact);
+ info_table_begin(h, "dump");
+ info_append_int(h, "count", stat->disk.dump.count);
+ vy_info_append_stmt_counter(h, "in", &stat->disk.dump.in);
+ vy_info_append_disk_stmt_counter(h, "out", &stat->disk.dump.out);
+ info_table_end(h); /* dump */
+ info_table_begin(h, "compact");
+ info_append_int(h, "count", stat->disk.compact.count);
+ vy_info_append_disk_stmt_counter(h, "in", &stat->disk.compact.in);
+ vy_info_append_disk_stmt_counter(h, "out", &stat->disk.compact.out);
+ info_table_end(h); /* compact */
info_append_int(h, "index_size", lsm->page_index_size);
info_append_int(h, "bloom_size", lsm->bloom_size);
info_table_end(h); /* disk */
diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c
index 08de214a..dd1e88d2 100644
--- a/src/box/vy_scheduler.c
+++ b/src/box/vy_scheduler.c
@@ -1182,7 +1182,7 @@ vy_task_dump_complete(struct vy_task *task)
* Account the new run.
*/
vy_lsm_add_run(lsm, new_run);
- vy_stmt_counter_add_disk(&lsm->stat.disk.dump.out, &new_run->count);
+ vy_disk_stmt_counter_add(&lsm->stat.disk.dump.out, &new_run->count);
/* Drop the reference held by the task. */
vy_run_unref(new_run);
@@ -1521,7 +1521,7 @@ vy_task_compact_complete(struct vy_task *task)
*/
if (new_slice != NULL) {
vy_lsm_add_run(lsm, new_run);
- vy_stmt_counter_add_disk(&lsm->stat.disk.compact.out,
+ vy_disk_stmt_counter_add(&lsm->stat.disk.compact.out,
&new_run->count);
/* Drop the reference held by the task. */
vy_run_unref(new_run);
@@ -1544,7 +1544,7 @@ vy_task_compact_complete(struct vy_task *task)
next_slice = rlist_next_entry(slice, in_range);
vy_range_remove_slice(range, slice);
rlist_add_entry(&compacted_slices, slice, in_range);
- vy_stmt_counter_add_disk(&lsm->stat.disk.compact.in,
+ vy_disk_stmt_counter_add(&lsm->stat.disk.compact.in,
&slice->count);
if (slice == last_slice)
break;
diff --git a/src/box/vy_stat.h b/src/box/vy_stat.h
index ca52c4d3..6d37e79f 100644
--- a/src/box/vy_stat.h
+++ b/src/box/vy_stat.h
@@ -101,15 +101,6 @@ struct vy_txw_iterator_stat {
struct vy_stmt_counter get;
};
-/** Dump/compaction statistics. */
-struct vy_compact_stat {
- int32_t count;
- /** Number of input statements. */
- struct vy_stmt_counter in;
- /** Number of output statements. */
- struct vy_stmt_counter out;
-};
-
/** LSM tree statistics. */
struct vy_lsm_stat {
/** Number of lookups in the LSM tree. */
@@ -141,9 +132,23 @@ struct vy_lsm_stat {
/** Run iterator statistics. */
struct vy_run_iterator_stat iterator;
/** Dump statistics. */
- struct vy_compact_stat dump;
+ struct {
+ /* Number of completed tasks. */
+ int32_t count;
+ /** Number of input statements. */
+ struct vy_stmt_counter in;
+ /** Number of output statements. */
+ struct vy_disk_stmt_counter out;
+ } dump;
/** Compaction statistics. */
- struct vy_compact_stat compact;
+ struct {
+ /* Number of completed tasks. */
+ int32_t count;
+ /** Number of input statements. */
+ struct vy_disk_stmt_counter in;
+ /** Number of output statements. */
+ struct vy_disk_stmt_counter out;
+ } compact;
} disk;
/** TX write set statistics. */
struct {
diff --git a/test/vinyl/info.result b/test/vinyl/info.result
index a53ee3ae..3d7108cc 100644
--- a/test/vinyl/info.result
+++ b/test/vinyl/info.result
@@ -160,14 +160,20 @@ istat()
bytes: 0
count: 0
out:
+ bytes_compressed: 0
+ pages: 0
rows: 0
bytes: 0
compact:
in:
+ bytes_compressed: 0
+ pages: 0
rows: 0
bytes: 0
count: 0
out:
+ bytes_compressed: 0
+ pages: 0
rows: 0
bytes: 0
iterator:
@@ -264,8 +270,10 @@ stat_diff(istat(), st)
bytes: 26525
count: 1
out:
- rows: 25
bytes: 26049
+ pages: 7
+ bytes_compressed: <bytes_compressed>
+ rows: 25
index_size: 294
rows: 25
bloom_size: 70
@@ -300,8 +308,10 @@ stat_diff(istat(), st)
bytes: 53050
count: 1
out:
- rows: 50
bytes: 52091
+ pages: 13
+ bytes_compressed: <bytes_compressed>
+ rows: 50
index_size: 252
rows: 25
bytes_compressed: <bytes_compressed>
@@ -309,12 +319,16 @@ stat_diff(istat(), st)
bytes: 26042
compact:
in:
- rows: 75
bytes: 78140
+ pages: 20
+ bytes_compressed: <bytes_compressed>
+ rows: 75
count: 1
out:
- rows: 50
bytes: 52091
+ pages: 13
+ bytes_compressed: <bytes_compressed>
+ rows: 50
put:
rows: 50
bytes: 53050
@@ -958,14 +972,20 @@ istat()
bytes: 0
count: 0
out:
+ bytes_compressed: <bytes_compressed>
+ pages: 0
rows: 0
bytes: 0
compact:
in:
+ bytes_compressed: <bytes_compressed>
+ pages: 0
rows: 0
bytes: 0
count: 0
out:
+ bytes_compressed: <bytes_compressed>
+ pages: 0
rows: 0
bytes: 0
iterator:
--
2.11.0
* [PATCH v2 5/8] vinyl: add helpers for resetting statement counters
From: Vladimir Davydov @ 2018-09-16 17:06 UTC
To: kostja; +Cc: tarantool-patches
Currently, we call memset() on vy_stmt_counter and vy_disk_stmt_counter
directly, but that looks rather ugly, especially when a counter has a
long name. Let's introduce helper functions for that.
---
src/box/vinyl.c | 28 ++++++++++++++++++++--------
src/box/vy_stat.h | 13 +++++++++++++
2 files changed, 33 insertions(+), 8 deletions(-)
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 197f16d6..2c479836 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -436,22 +436,34 @@ vinyl_index_reset_stat(struct index *index)
struct vy_lsm_stat *stat = &lsm->stat;
struct vy_cache_stat *cache_stat = &lsm->cache.stat;
+ /* Get/put */
stat->lookup = 0;
latency_reset(&stat->latency);
- memset(&stat->get, 0, sizeof(stat->get));
- memset(&stat->put, 0, sizeof(stat->put));
+ vy_stmt_counter_reset(&stat->get);
+ vy_stmt_counter_reset(&stat->put);
memset(&stat->upsert, 0, sizeof(stat->upsert));
+
+ /* Iterator */
memset(&stat->txw.iterator, 0, sizeof(stat->txw.iterator));
memset(&stat->memory.iterator, 0, sizeof(stat->memory.iterator));
memset(&stat->disk.iterator, 0, sizeof(stat->disk.iterator));
- memset(&stat->disk.dump, 0, sizeof(stat->disk.dump));
- memset(&stat->disk.compact, 0, sizeof(stat->disk.compact));
+ /* Dump */
+ stat->disk.dump.count = 0;
+ vy_stmt_counter_reset(&stat->disk.dump.in);
+ vy_disk_stmt_counter_reset(&stat->disk.dump.out);
+
+ /* Compaction */
+ stat->disk.compact.count = 0;
+ vy_disk_stmt_counter_reset(&stat->disk.compact.in);
+ vy_disk_stmt_counter_reset(&stat->disk.compact.out);
+
+ /* Cache */
cache_stat->lookup = 0;
- memset(&cache_stat->get, 0, sizeof(cache_stat->get));
- memset(&cache_stat->put, 0, sizeof(cache_stat->put));
- memset(&cache_stat->invalidate, 0, sizeof(cache_stat->invalidate));
- memset(&cache_stat->evict, 0, sizeof(cache_stat->evict));
+ vy_stmt_counter_reset(&cache_stat->get);
+ vy_stmt_counter_reset(&cache_stat->put);
+ vy_stmt_counter_reset(&cache_stat->invalidate);
+ vy_stmt_counter_reset(&cache_stat->evict);
}
static void
diff --git a/src/box/vy_stat.h b/src/box/vy_stat.h
index 6d37e79f..83d3b8f9 100644
--- a/src/box/vy_stat.h
+++ b/src/box/vy_stat.h
@@ -32,6 +32,7 @@
*/
#include <stdint.h>
+#include <string.h>
#include "latency.h"
#include "tuple.h"
@@ -204,6 +205,18 @@ vy_lsm_stat_destroy(struct vy_lsm_stat *stat)
}
static inline void
+vy_stmt_counter_reset(struct vy_stmt_counter *c)
+{
+ memset(c, 0, sizeof(*c));
+}
+
+static inline void
+vy_disk_stmt_counter_reset(struct vy_disk_stmt_counter *c)
+{
+ memset(c, 0, sizeof(*c));
+}
+
+static inline void
vy_stmt_counter_acct_tuple(struct vy_stmt_counter *c,
const struct tuple *tuple)
{
--
2.11.0
* [PATCH v2 6/8] vinyl: keep track of compaction queue length
From: Vladimir Davydov @ 2018-09-16 17:06 UTC
To: kostja; +Cc: tarantool-patches
Currently, there's no way to figure out whether compaction keeps up
with dumps, yet this is essential for implementing transaction
throttling. This patch adds a metric that is supposed to help answer
this question: the compaction queue size. It is calculated per range
and per LSM tree as the total size of slices awaiting compaction.
We update the metric along with the compaction priority of a range, in
vy_range_update_compact_priority(), and account it to an LSM tree in
vy_lsm_acct_range(). For now, the new metric is reported only on a
per-index basis, in index.stat() under disk.compact.queue.
---
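For illustration, a rough Lua sketch of reading the new metric; the
space name is made up:

q = box.space.example.index.pk:stat().disk.compact.queue
-- q.rows, q.bytes, q.pages: total size of slices awaiting compaction;
-- the queue drains back to zero once compaction catches up with dumps.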
src/box/vinyl.c | 1 +
src/box/vy_lsm.c | 6 +++
src/box/vy_lsm.h | 16 +++++++-
src/box/vy_range.c | 13 +++++--
src/box/vy_range.h | 2 +
src/box/vy_scheduler.c | 9 ++++-
src/box/vy_stat.h | 2 +
src/errinj.h | 1 +
test/box/errinj.result | 4 +-
test/vinyl/errinj.result | 94 ++++++++++++++++++++++++++++++++++++++++++++++
test/vinyl/errinj.test.lua | 27 +++++++++++++
test/vinyl/info.result | 10 +++++
12 files changed, 177 insertions(+), 8 deletions(-)
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 2c479836..02a2b69d 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -396,6 +396,7 @@ vinyl_index_stat(struct index *index, struct info_handler *h)
info_append_int(h, "count", stat->disk.compact.count);
vy_info_append_disk_stmt_counter(h, "in", &stat->disk.compact.in);
vy_info_append_disk_stmt_counter(h, "out", &stat->disk.compact.out);
+ vy_info_append_disk_stmt_counter(h, "queue", &stat->disk.compact.queue);
info_table_end(h); /* compact */
info_append_int(h, "index_size", lsm->page_index_size);
info_append_int(h, "bloom_size", lsm->bloom_size);
diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
index a1d4aa80..6b9d0e6d 100644
--- a/src/box/vy_lsm.c
+++ b/src/box/vy_lsm.c
@@ -735,12 +735,16 @@ void
vy_lsm_acct_range(struct vy_lsm *lsm, struct vy_range *range)
{
histogram_collect(lsm->run_hist, range->slice_count);
+ vy_disk_stmt_counter_add(&lsm->stat.disk.compact.queue,
+ &range->compact_queue);
}
void
vy_lsm_unacct_range(struct vy_lsm *lsm, struct vy_range *range)
{
histogram_discard(lsm->run_hist, range->slice_count);
+ vy_disk_stmt_counter_sub(&lsm->stat.disk.compact.queue,
+ &range->compact_queue);
}
int
@@ -1155,8 +1159,10 @@ vy_lsm_force_compaction(struct vy_lsm *lsm)
vy_range_tree_ifirst(lsm->tree, &it);
while ((range = vy_range_tree_inext(&it)) != NULL) {
+ vy_lsm_unacct_range(lsm, range);
range->needs_compaction = true;
vy_range_update_compact_priority(range, &lsm->opts);
+ vy_lsm_acct_range(lsm, range);
}
vy_range_heap_update_all(&lsm->range_heap);
diff --git a/src/box/vy_lsm.h b/src/box/vy_lsm.h
index 6917d475..ba2feeef 100644
--- a/src/box/vy_lsm.h
+++ b/src/box/vy_lsm.h
@@ -436,11 +436,23 @@ vy_lsm_add_range(struct vy_lsm *lsm, struct vy_range *range);
void
vy_lsm_remove_range(struct vy_lsm *lsm, struct vy_range *range);
-/** Account a range to the run histogram of an LSM tree. */
+/**
+ * Account a range in an LSM tree.
+ *
+ * This function updates the following LSM tree statistics:
+ * - vy_lsm::run_hist after a slice is added to or removed from
+ * a range of the LSM tree.
+ * - vy_lsm::stat::disk::compact::queue after compaction priority
+ * of a range is updated.
+ */
void
vy_lsm_acct_range(struct vy_lsm *lsm, struct vy_range *range);
-/** Unaccount a range from the run histogram of an LSM tree. */
+/**
+ * Unaccount a range in an LSM tree.
+ *
+ * This function undoes the effect of vy_lsm_acct_range().
+ */
void
vy_lsm_unacct_range(struct vy_lsm *lsm, struct vy_range *range);
diff --git a/src/box/vy_range.c b/src/box/vy_range.c
index ddcd2ed3..4495ecd4 100644
--- a/src/box/vy_range.c
+++ b/src/box/vy_range.c
@@ -292,19 +292,24 @@ vy_range_update_compact_priority(struct vy_range *range,
assert(opts->run_count_per_level > 0);
assert(opts->run_size_ratio > 1);
+ range->compact_priority = 0;
+ vy_disk_stmt_counter_reset(&range->compact_queue);
+
if (range->slice_count <= 1) {
/* Nothing to compact. */
- range->compact_priority = 0;
range->needs_compaction = false;
return;
}
+
if (range->needs_compaction) {
range->compact_priority = range->slice_count;
+ range->compact_queue = range->count;
return;
}
- range->compact_priority = 0;
-
+ /* Total number of statements in checked runs. */
+ struct vy_disk_stmt_counter total_stmt_count;
+ vy_disk_stmt_counter_reset(&total_stmt_count);
/* Total number of checked runs. */
uint32_t total_run_count = 0;
/* The total size of runs checked so far. */
@@ -333,6 +338,7 @@ vy_range_update_compact_priority(struct vy_range *range,
total_size += size;
level_run_count++;
total_run_count++;
+ vy_disk_stmt_counter_add(&total_stmt_count, &slice->count);
while (size > target_run_size) {
/*
* The run size exceeds the threshold
@@ -370,6 +376,7 @@ vy_range_update_compact_priority(struct vy_range *range,
* this level and upper levels.
*/
range->compact_priority = total_run_count;
+ range->compact_queue = total_stmt_count;
est_new_run_size = total_size;
}
}
diff --git a/src/box/vy_range.h b/src/box/vy_range.h
index 2ca19a1c..426854ff 100644
--- a/src/box/vy_range.h
+++ b/src/box/vy_range.h
@@ -106,6 +106,8 @@ struct vy_range {
* how we decide how many runs to compact next time.
*/
int compact_priority;
+ /** Number of statements that need to be compacted. */
+ struct vy_disk_stmt_counter compact_queue;
/**
* If this flag is set, the range must be scheduled for
* major compaction, i.e. its compact_priority must be
diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c
index dd1e88d2..e4afeafd 100644
--- a/src/box/vy_scheduler.c
+++ b/src/box/vy_scheduler.c
@@ -1201,8 +1201,8 @@ vy_task_dump_complete(struct vy_task *task)
slice = new_slices[i];
vy_lsm_unacct_range(lsm, range);
vy_range_add_slice(range, slice);
- vy_lsm_acct_range(lsm, range);
vy_range_update_compact_priority(range, &lsm->opts);
+ vy_lsm_acct_range(lsm, range);
if (!vy_range_is_scheduled(range))
vy_range_heap_update(&lsm->range_heap,
&range->heap_node);
@@ -1428,6 +1428,11 @@ err:
static int
vy_task_compact_execute(struct vy_task *task)
{
+ struct errinj *errinj = errinj(ERRINJ_VY_COMPACTION_DELAY, ERRINJ_BOOL);
+ if (errinj != NULL && errinj->bparam) {
+ while (errinj->bparam)
+ fiber_sleep(0.01);
+ }
return vy_task_write_run(task);
}
@@ -1551,8 +1556,8 @@ vy_task_compact_complete(struct vy_task *task)
}
range->n_compactions++;
range->version++;
- vy_lsm_acct_range(lsm, range);
vy_range_update_compact_priority(range, &lsm->opts);
+ vy_lsm_acct_range(lsm, range);
lsm->stat.disk.compact.count++;
/*
diff --git a/src/box/vy_stat.h b/src/box/vy_stat.h
index 83d3b8f9..c094d414 100644
--- a/src/box/vy_stat.h
+++ b/src/box/vy_stat.h
@@ -149,6 +149,8 @@ struct vy_lsm_stat {
struct vy_disk_stmt_counter in;
/** Number of output statements. */
struct vy_disk_stmt_counter out;
+ /** Number of statements awaiting compaction. */
+ struct vy_disk_stmt_counter queue;
} compact;
} disk;
/** TX write set statistics. */
diff --git a/src/errinj.h b/src/errinj.h
index b6d4a4c9..84a1fbb5 100644
--- a/src/errinj.h
+++ b/src/errinj.h
@@ -120,6 +120,7 @@ struct errinj {
_(ERRINJ_VY_INDEX_FILE_RENAME, ERRINJ_BOOL, {.bparam = false}) \
_(ERRINJ_RELAY_BREAK_LSN, ERRINJ_INT, {.iparam = -1}) \
_(ERRINJ_WAL_BREAK_LSN, ERRINJ_INT, {.iparam = -1}) \
+ _(ERRINJ_VY_COMPACTION_DELAY, ERRINJ_BOOL, {.bparam = false}) \
ENUM0(errinj_id, ERRINJ_LIST);
extern struct errinj errinjs[];
diff --git a/test/box/errinj.result b/test/box/errinj.result
index 8dae7614..81087900 100644
--- a/test/box/errinj.result
+++ b/test/box/errinj.result
@@ -58,6 +58,8 @@ errinj.info()
state: 0
ERRINJ_XLOG_META:
state: false
+ ERRINJ_SNAP_COMMIT_DELAY:
+ state: false
ERRINJ_WAL_BREAK_LSN:
state: -1
ERRINJ_WAL_WRITE_DISK:
@@ -74,7 +76,7 @@ errinj.info()
state: false
ERRINJ_RELAY_FINAL_JOIN:
state: false
- ERRINJ_SNAP_COMMIT_DELAY:
+ ERRINJ_VY_COMPACTION_DELAY:
state: false
ERRINJ_RELAY_FINAL_SLEEP:
state: false
diff --git a/test/vinyl/errinj.result b/test/vinyl/errinj.result
index 17e4dc8c..cc2287d2 100644
--- a/test/vinyl/errinj.result
+++ b/test/vinyl/errinj.result
@@ -2118,3 +2118,97 @@ s:select()
s:drop()
---
...
+--
+-- Check disk.compact.queue stat.
+--
+test_run:cmd("push filter 'bytes_compressed: .*' to 'bytes_compressed: <bytes_compressed>'")
+---
+- true
+...
+s = box.schema.space.create('test', {engine = 'vinyl'})
+---
+...
+i = s:create_index('pk', {run_count_per_level = 2})
+---
+...
+function dump() for i = 1, 10 do s:replace{i} end box.snapshot() end
+---
+...
+dump()
+---
+...
+dump()
+---
+...
+i:stat().disk.compact.queue -- none
+---
+- bytes_compressed: <bytes_compressed>
+ pages: 0
+ rows: 0
+ bytes: 0
+...
+errinj.set('ERRINJ_VY_COMPACTION_DELAY', true)
+---
+- ok
+...
+dump()
+---
+...
+i:stat().disk.compact.queue -- 30 statements
+---
+- bytes_compressed: <bytes_compressed>
+ pages: 3
+ rows: 30
+ bytes: 471
+...
+dump()
+---
+...
+i:stat().disk.compact.queue -- 40 statements
+---
+- bytes_compressed: <bytes_compressed>
+ pages: 4
+ rows: 40
+ bytes: 628
+...
+dump()
+---
+...
+i:stat().disk.compact.queue -- 50 statements
+---
+- bytes_compressed: <bytes_compressed>
+ pages: 5
+ rows: 50
+ bytes: 785
+...
+box.stat.reset() -- doesn't affect queue size
+---
+...
+i:stat().disk.compact.queue -- 50 statements
+---
+- bytes_compressed: <bytes_compressed>
+ pages: 5
+ rows: 50
+ bytes: 785
+...
+errinj.set('ERRINJ_VY_COMPACTION_DELAY', false)
+---
+- ok
+...
+while i:stat().disk.compact.count < 2 do fiber.sleep(0.01) end
+---
+...
+i:stat().disk.compact.queue -- none
+---
+- bytes_compressed: <bytes_compressed>
+ pages: 0
+ rows: 0
+ bytes: 0
+...
+s:drop()
+---
+...
+test_run:cmd("clear filter")
+---
+- true
+...
diff --git a/test/vinyl/errinj.test.lua b/test/vinyl/errinj.test.lua
index 1b02c01c..148662d8 100644
--- a/test/vinyl/errinj.test.lua
+++ b/test/vinyl/errinj.test.lua
@@ -850,3 +850,30 @@ fiber.sleep(0)
s:create_index('sk', {parts = {2, 'unsigned'}})
s:select()
s:drop()
+
+--
+-- Check disk.compact.queue stat.
+--
+test_run:cmd("push filter 'bytes_compressed: .*' to 'bytes_compressed: <bytes_compressed>'")
+
+s = box.schema.space.create('test', {engine = 'vinyl'})
+i = s:create_index('pk', {run_count_per_level = 2})
+function dump() for i = 1, 10 do s:replace{i} end box.snapshot() end
+dump()
+dump()
+i:stat().disk.compact.queue -- none
+errinj.set('ERRINJ_VY_COMPACTION_DELAY', true)
+dump()
+i:stat().disk.compact.queue -- 30 statements
+dump()
+i:stat().disk.compact.queue -- 40 statements
+dump()
+i:stat().disk.compact.queue -- 50 statements
+box.stat.reset() -- doesn't affect queue size
+i:stat().disk.compact.queue -- 50 statements
+errinj.set('ERRINJ_VY_COMPACTION_DELAY', false)
+while i:stat().disk.compact.count < 2 do fiber.sleep(0.01) end
+i:stat().disk.compact.queue -- none
+s:drop()
+
+test_run:cmd("clear filter")
diff --git a/test/vinyl/info.result b/test/vinyl/info.result
index 3d7108cc..d13806de 100644
--- a/test/vinyl/info.result
+++ b/test/vinyl/info.result
@@ -171,6 +171,11 @@ istat()
rows: 0
bytes: 0
count: 0
+ queue:
+ bytes_compressed: 0
+ pages: 0
+ rows: 0
+ bytes: 0
out:
bytes_compressed: 0
pages: 0
@@ -983,6 +988,11 @@ istat()
rows: 0
bytes: 0
count: 0
+ queue:
+ bytes_compressed: <bytes_compressed>
+ pages: 0
+ rows: 0
+ bytes: 0
out:
bytes_compressed: <bytes_compressed>
pages: 0
--
2.11.0
* Re: [PATCH v2 6/8] vinyl: keep track of compaction queue length
From: Konstantin Osipov @ 2018-09-19 1:53 UTC
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/09/17 15:05]:
> Currently, there's no way to figure out whether compaction keeps up
> with dumps, yet this is essential for implementing transaction
> throttling. This patch adds a metric that is supposed to help answer
> this question: the compaction queue size. It is calculated per range
> and per LSM tree as the total size of slices awaiting compaction.
> We update the metric along with the compaction priority of a range, in
> vy_range_update_compact_priority(), and account it to an LSM tree in
> vy_lsm_acct_range(). For now, the new metric is reported only on a
> per-index basis, in index.stat() under disk.compact.queue.
OK to push.
Accounting pages, rows and bytes seems to be a bit of overkill for
now. All we need to throttle is dump bandwidth, which is in bytes.
I assume you have plans for rows and pages.
>
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
* [PATCH v2 7/8] vinyl: factor out helpers for accounting dump/compaction
From: Vladimir Davydov @ 2018-09-16 17:06 UTC
To: kostja; +Cc: tarantool-patches
So that we can easily extend them to account the stats not only per LSM
tree, but also globally, in vy_lsm_env.
---
src/box/vy_lsm.c | 20 ++++++++++++++++++++
src/box/vy_lsm.h | 16 ++++++++++++++++
src/box/vy_scheduler.c | 29 +++++++++++++++--------------
3 files changed, 51 insertions(+), 14 deletions(-)
diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
index 6b9d0e6d..dbea2898 100644
--- a/src/box/vy_lsm.c
+++ b/src/box/vy_lsm.c
@@ -747,6 +747,26 @@ vy_lsm_unacct_range(struct vy_lsm *lsm, struct vy_range *range)
&range->compact_queue);
}
+void
+vy_lsm_acct_dump(struct vy_lsm *lsm,
+ const struct vy_stmt_counter *in,
+ const struct vy_disk_stmt_counter *out)
+{
+ lsm->stat.disk.dump.count++;
+ vy_stmt_counter_add(&lsm->stat.disk.dump.in, in);
+ vy_disk_stmt_counter_add(&lsm->stat.disk.dump.out, out);
+}
+
+void
+vy_lsm_acct_compaction(struct vy_lsm *lsm,
+ const struct vy_disk_stmt_counter *in,
+ const struct vy_disk_stmt_counter *out)
+{
+ lsm->stat.disk.compact.count++;
+ vy_disk_stmt_counter_add(&lsm->stat.disk.compact.in, in);
+ vy_disk_stmt_counter_add(&lsm->stat.disk.compact.out, out);
+}
+
int
vy_lsm_rotate_mem(struct vy_lsm *lsm)
{
diff --git a/src/box/vy_lsm.h b/src/box/vy_lsm.h
index ba2feeef..19f82e34 100644
--- a/src/box/vy_lsm.h
+++ b/src/box/vy_lsm.h
@@ -457,6 +457,22 @@ void
vy_lsm_unacct_range(struct vy_lsm *lsm, struct vy_range *range);
/**
+ * Account dump in LSM tree statistics.
+ */
+void
+vy_lsm_acct_dump(struct vy_lsm *lsm,
+ const struct vy_stmt_counter *in,
+ const struct vy_disk_stmt_counter *out);
+
+/**
+ * Account compaction in LSM tree statistics.
+ */
+void
+vy_lsm_acct_compaction(struct vy_lsm *lsm,
+ const struct vy_disk_stmt_counter *in,
+ const struct vy_disk_stmt_counter *out);
+
+/**
* Allocate a new active in-memory index for an LSM tree while
* moving the old one to the sealed list. Used by the dump task
* in order not to bother about synchronization with concurrent
diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c
index e4afeafd..2f85424a 100644
--- a/src/box/vy_scheduler.c
+++ b/src/box/vy_scheduler.c
@@ -1090,6 +1090,8 @@ vy_task_dump_complete(struct vy_task *task)
struct vy_lsm *lsm = task->lsm;
struct vy_run *new_run = task->new_run;
int64_t dump_lsn = new_run->dump_lsn;
+ struct vy_disk_stmt_counter dump_out = new_run->count;
+ struct vy_stmt_counter dump_in;
struct tuple_format *key_format = lsm->env->key_format;
struct vy_mem *mem, *next_mem;
struct vy_slice **new_slices, *slice;
@@ -1178,12 +1180,8 @@ vy_task_dump_complete(struct vy_task *task)
if (vy_log_tx_commit() < 0)
goto fail_free_slices;
- /*
- * Account the new run.
- */
+ /* Account the new run. */
vy_lsm_add_run(lsm, new_run);
- vy_disk_stmt_counter_add(&lsm->stat.disk.dump.out, &new_run->count);
-
/* Drop the reference held by the task. */
vy_run_unref(new_run);
@@ -1212,16 +1210,18 @@ vy_task_dump_complete(struct vy_task *task)
delete_mems:
/*
- * Delete dumped in-memory trees.
+ * Delete dumped in-memory trees and account dump in
+ * LSM tree statistics.
*/
+ vy_stmt_counter_reset(&dump_in);
rlist_foreach_entry_safe(mem, &lsm->sealed, in_sealed, next_mem) {
if (mem->generation > scheduler->dump_generation)
continue;
- vy_stmt_counter_add(&lsm->stat.disk.dump.in, &mem->count);
+ vy_stmt_counter_add(&dump_in, &mem->count);
vy_lsm_delete_mem(lsm, mem);
}
lsm->dump_lsn = MAX(lsm->dump_lsn, dump_lsn);
- lsm->stat.disk.dump.count++;
+ vy_lsm_acct_dump(lsm, &dump_in, &dump_out);
/* The iterator has been cleaned up in a worker thread. */
task->wi->iface->close(task->wi);
@@ -1443,6 +1443,8 @@ vy_task_compact_complete(struct vy_task *task)
struct vy_lsm *lsm = task->lsm;
struct vy_range *range = task->range;
struct vy_run *new_run = task->new_run;
+ struct vy_disk_stmt_counter compact_out = new_run->count;
+ struct vy_disk_stmt_counter compact_in;
struct vy_slice *first_slice = task->first_slice;
struct vy_slice *last_slice = task->last_slice;
struct vy_slice *slice, *next_slice, *new_slice = NULL;
@@ -1526,15 +1528,14 @@ vy_task_compact_complete(struct vy_task *task)
*/
if (new_slice != NULL) {
vy_lsm_add_run(lsm, new_run);
- vy_disk_stmt_counter_add(&lsm->stat.disk.compact.out,
- &new_run->count);
/* Drop the reference held by the task. */
vy_run_unref(new_run);
} else
vy_run_discard(new_run);
/*
- * Replace compacted slices with the resulting slice.
+ * Replace compacted slices with the resulting slice and
+ * account compaction in LSM tree statistics.
*
* Note, since a slice might have been added to the range
* by a concurrent dump while compaction was in progress,
@@ -1545,12 +1546,12 @@ vy_task_compact_complete(struct vy_task *task)
vy_lsm_unacct_range(lsm, range);
if (new_slice != NULL)
vy_range_add_slice_before(range, new_slice, first_slice);
+ vy_disk_stmt_counter_reset(&compact_in);
for (slice = first_slice; ; slice = next_slice) {
next_slice = rlist_next_entry(slice, in_range);
vy_range_remove_slice(range, slice);
rlist_add_entry(&compacted_slices, slice, in_range);
- vy_disk_stmt_counter_add(&lsm->stat.disk.compact.in,
- &slice->count);
+ vy_disk_stmt_counter_add(&compact_in, &slice->count);
if (slice == last_slice)
break;
}
@@ -1558,7 +1559,7 @@ vy_task_compact_complete(struct vy_task *task)
range->version++;
vy_range_update_compact_priority(range, &lsm->opts);
vy_lsm_acct_range(lsm, range);
- lsm->stat.disk.compact.count++;
+ vy_lsm_acct_compaction(lsm, &compact_in, &compact_out);
/*
* Unaccount unused runs and delete compacted slices.
--
2.11.0
* [PATCH v2 8/8] vinyl: add global disk stats
From: Vladimir Davydov @ 2018-09-16 17:06 UTC
To: kostja; +Cc: tarantool-patches
This patch adds to box.stat.vinyl() some essential disk statistics
that are already collected and reported on a per-index basis.
The new statistics are shown under the 'disk' section and currently
include the following fields:
- data: size of data stored on disk.
- index: size of index stored on disk.
- dump.in: size of dump input.
- dump.out: size of dump output.
- compact.in: size of compaction input.
- compact.out: size of compaction output.
- compact.queue: size of compaction queue.
All the counters are given in bytes without taking into account
disk compression. Dump/compaction in/out counters can be reset with
box.stat.reset().
---
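For illustration, a rough Lua sketch of reading the new global counters:

d = box.stat.vinyl().disk
-- d.data, d.index: current on-disk data and index size, in bytes
-- d.dump['in'], d.dump.out: cumulative dump input/output, in bytes
-- d.compact['in'], d.compact.out: cumulative compaction input/output
-- d.compact.queue: bytes currently awaiting compaction
box.stat.reset()  -- resets the dump/compact in/out totals, not the queue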
src/box/vinyl.c | 33 ++++++++++++++++++++-
src/box/vy_lsm.c | 50 ++++++++++++++++++++++++++-----
src/box/vy_lsm.h | 2 ++
src/box/vy_stat.h | 20 +++++++++++++
test/vinyl/errinj.result | 20 +++++++++++++
test/vinyl/errinj.test.lua | 5 ++++
test/vinyl/info.result | 73 ++++++++++++++++++++++++++++++++++++++++++++--
test/vinyl/info.test.lua | 14 +++++++++
8 files changed, 206 insertions(+), 11 deletions(-)
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 02a2b69d..b5904872 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -298,6 +298,30 @@ vy_info_append_memory(struct vy_env *env, struct info_handler *h)
info_table_end(h); /* memory */
}
+static void
+vy_info_append_disk(struct vy_env *env, struct info_handler *h)
+{
+ struct vy_disk_stat *stat = &env->lsm_env.disk_stat;
+
+ info_table_begin(h, "disk");
+
+ info_append_int(h, "data", stat->data);
+ info_append_int(h, "index", stat->index);
+
+ info_table_begin(h, "dump");
+ info_append_int(h, "in", stat->dump.in);
+ info_append_int(h, "out", stat->dump.out);
+ info_table_end(h); /* dump */
+
+ info_table_begin(h, "compact");
+ info_append_int(h, "in", stat->compact.in);
+ info_append_int(h, "out", stat->compact.out);
+ info_append_int(h, "queue", stat->compact.queue);
+ info_table_end(h); /* compact */
+
+ info_table_end(h); /* disk */
+}
+
void
vinyl_engine_stat(struct vinyl_engine *vinyl, struct info_handler *h)
{
@@ -307,6 +331,7 @@ vinyl_engine_stat(struct vinyl_engine *vinyl, struct info_handler *h)
vy_info_append_quota(env, h);
vy_info_append_tx(env, h);
vy_info_append_memory(env, h);
+ vy_info_append_disk(env, h);
info_end(h);
}
@@ -485,9 +510,15 @@ static void
vinyl_engine_reset_stat(struct engine *engine)
{
struct vy_env *env = vy_env(engine);
- struct tx_manager *xm = env->xm;
+ struct tx_manager *xm = env->xm;
memset(&xm->stat, 0, sizeof(xm->stat));
+
+ struct vy_disk_stat *disk_stat = &env->lsm_env.disk_stat;
+ disk_stat->dump.in = 0;
+ disk_stat->dump.out = 0;
+ disk_stat->compact.in = 0;
+ disk_stat->compact.out = 0;
}
/** }}} Introspection */
diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
index dbea2898..093bb820 100644
--- a/src/box/vy_lsm.c
+++ b/src/box/vy_lsm.c
@@ -252,6 +252,8 @@ vy_lsm_delete(struct vy_lsm *lsm)
assert(lsm->env->lsm_count > 0);
lsm->env->lsm_count--;
+ lsm->env->disk_stat.compact.queue -=
+ lsm->stat.disk.compact.queue.bytes;
if (lsm->pk != NULL)
vy_lsm_unref(lsm->pk);
@@ -685,32 +687,56 @@ vy_lsm_compact_priority(struct vy_lsm *lsm)
void
vy_lsm_add_run(struct vy_lsm *lsm, struct vy_run *run)
{
+ struct vy_lsm_env *env = lsm->env;
+ size_t bloom_size = vy_run_bloom_size(run);
+ size_t page_index_size = run->page_index_size;
+
assert(rlist_empty(&run->in_lsm));
rlist_add_entry(&lsm->runs, run, in_lsm);
lsm->run_count++;
vy_disk_stmt_counter_add(&lsm->stat.disk.count, &run->count);
- lsm->bloom_size += vy_run_bloom_size(run);
- lsm->page_index_size += run->page_index_size;
+ lsm->bloom_size += bloom_size;
+ lsm->page_index_size += page_index_size;
+
+ env->bloom_size += bloom_size;
+ env->page_index_size += page_index_size;
- lsm->env->bloom_size += vy_run_bloom_size(run);
- lsm->env->page_index_size += run->page_index_size;
+ /* Data size is consistent with space.bsize. */
+ if (lsm->index_id == 0)
+ env->disk_stat.data += run->count.bytes;
+ /* Index size is consistent with index.bsize. */
+ env->disk_stat.index += bloom_size + page_index_size;
+ if (lsm->index_id > 0)
+ env->disk_stat.index += run->count.bytes;
}
void
vy_lsm_remove_run(struct vy_lsm *lsm, struct vy_run *run)
{
+ struct vy_lsm_env *env = lsm->env;
+ size_t bloom_size = vy_run_bloom_size(run);
+ size_t page_index_size = run->page_index_size;
+
assert(lsm->run_count > 0);
assert(!rlist_empty(&run->in_lsm));
rlist_del_entry(run, in_lsm);
lsm->run_count--;
vy_disk_stmt_counter_sub(&lsm->stat.disk.count, &run->count);
- lsm->bloom_size -= vy_run_bloom_size(run);
- lsm->page_index_size -= run->page_index_size;
+ lsm->bloom_size -= bloom_size;
+ lsm->page_index_size -= page_index_size;
- lsm->env->bloom_size -= vy_run_bloom_size(run);
- lsm->env->page_index_size -= run->page_index_size;
+ env->bloom_size -= bloom_size;
+ env->page_index_size -= page_index_size;
+
+ /* Data size is consistent with space.bsize. */
+ if (lsm->index_id == 0)
+ env->disk_stat.data -= run->count.bytes;
+ /* Index size is consistent with index.bsize. */
+ env->disk_stat.index -= bloom_size + page_index_size;
+ if (lsm->index_id > 0)
+ env->disk_stat.index -= run->count.bytes;
}
void
@@ -737,6 +763,7 @@ vy_lsm_acct_range(struct vy_lsm *lsm, struct vy_range *range)
histogram_collect(lsm->run_hist, range->slice_count);
vy_disk_stmt_counter_add(&lsm->stat.disk.compact.queue,
&range->compact_queue);
+ lsm->env->disk_stat.compact.queue += range->compact_queue.bytes;
}
void
@@ -745,6 +772,7 @@ vy_lsm_unacct_range(struct vy_lsm *lsm, struct vy_range *range)
histogram_discard(lsm->run_hist, range->slice_count);
vy_disk_stmt_counter_sub(&lsm->stat.disk.compact.queue,
&range->compact_queue);
+ lsm->env->disk_stat.compact.queue -= range->compact_queue.bytes;
}
void
@@ -755,6 +783,9 @@ vy_lsm_acct_dump(struct vy_lsm *lsm,
lsm->stat.disk.dump.count++;
vy_stmt_counter_add(&lsm->stat.disk.dump.in, in);
vy_disk_stmt_counter_add(&lsm->stat.disk.dump.out, out);
+
+ lsm->env->disk_stat.dump.in += in->bytes;
+ lsm->env->disk_stat.dump.out += out->bytes;
}
void
@@ -765,6 +796,9 @@ vy_lsm_acct_compaction(struct vy_lsm *lsm,
lsm->stat.disk.compact.count++;
vy_disk_stmt_counter_add(&lsm->stat.disk.compact.in, in);
vy_disk_stmt_counter_add(&lsm->stat.disk.compact.out, out);
+
+ lsm->env->disk_stat.compact.in += in->bytes;
+ lsm->env->disk_stat.compact.out += out->bytes;
}
int
diff --git a/src/box/vy_lsm.h b/src/box/vy_lsm.h
index 19f82e34..2e026ced 100644
--- a/src/box/vy_lsm.h
+++ b/src/box/vy_lsm.h
@@ -91,6 +91,8 @@ struct vy_lsm_env {
size_t bloom_size;
/** Size of memory used for page index. */
size_t page_index_size;
+ /** Global disk statistics. */
+ struct vy_disk_stat disk_stat;
/** Memory pool for vy_history_node allocations. */
struct mempool history_node_pool;
};
diff --git a/src/box/vy_stat.h b/src/box/vy_stat.h
index c094d414..1545115a 100644
--- a/src/box/vy_stat.h
+++ b/src/box/vy_stat.h
@@ -194,6 +194,26 @@ struct vy_tx_stat {
int64_t conflict;
};
+/**
+ * Global disk statistics.
+ *
+ * Fields correspond to those of per LSM tree statistics.
+ * All counters are given in bytes, uncompressed.
+ */
+struct vy_disk_stat {
+ int64_t data;
+ int64_t index;
+ struct {
+ int64_t in;
+ int64_t out;
+ } dump;
+ struct {
+ int64_t in;
+ int64_t out;
+ int64_t queue;
+ } compact;
+};
+
static inline int
vy_lsm_stat_create(struct vy_lsm_stat *stat)
{
diff --git a/test/vinyl/errinj.result b/test/vinyl/errinj.result
index cc2287d2..8badb47a 100644
--- a/test/vinyl/errinj.result
+++ b/test/vinyl/errinj.result
@@ -2147,6 +2147,10 @@ i:stat().disk.compact.queue -- none
rows: 0
bytes: 0
...
+i:stat().disk.compact.queue.bytes == box.stat.vinyl().disk.compact.queue
+---
+- true
+...
errinj.set('ERRINJ_VY_COMPACTION_DELAY', true)
---
- ok
@@ -2161,6 +2165,10 @@ i:stat().disk.compact.queue -- 30 statements
rows: 30
bytes: 471
...
+i:stat().disk.compact.queue.bytes == box.stat.vinyl().disk.compact.queue
+---
+- true
+...
dump()
---
...
@@ -2171,6 +2179,10 @@ i:stat().disk.compact.queue -- 40 statements
rows: 40
bytes: 628
...
+i:stat().disk.compact.queue.bytes == box.stat.vinyl().disk.compact.queue
+---
+- true
+...
dump()
---
...
@@ -2181,6 +2193,10 @@ i:stat().disk.compact.queue -- 50 statements
rows: 50
bytes: 785
...
+i:stat().disk.compact.queue.bytes == box.stat.vinyl().disk.compact.queue
+---
+- true
+...
box.stat.reset() -- doesn't affect queue size
---
...
@@ -2191,6 +2207,10 @@ i:stat().disk.compact.queue -- 50 statements
rows: 50
bytes: 785
...
+i:stat().disk.compact.queue.bytes == box.stat.vinyl().disk.compact.queue
+---
+- true
+...
errinj.set('ERRINJ_VY_COMPACTION_DELAY', false)
---
- ok
diff --git a/test/vinyl/errinj.test.lua b/test/vinyl/errinj.test.lua
index 148662d8..5a47ecc4 100644
--- a/test/vinyl/errinj.test.lua
+++ b/test/vinyl/errinj.test.lua
@@ -862,15 +862,20 @@ function dump() for i = 1, 10 do s:replace{i} end box.snapshot() end
dump()
dump()
i:stat().disk.compact.queue -- none
+i:stat().disk.compact.queue.bytes == box.stat.vinyl().disk.compact.queue
errinj.set('ERRINJ_VY_COMPACTION_DELAY', true)
dump()
i:stat().disk.compact.queue -- 30 statements
+i:stat().disk.compact.queue.bytes == box.stat.vinyl().disk.compact.queue
dump()
i:stat().disk.compact.queue -- 40 statements
+i:stat().disk.compact.queue.bytes == box.stat.vinyl().disk.compact.queue
dump()
i:stat().disk.compact.queue -- 50 statements
+i:stat().disk.compact.queue.bytes == box.stat.vinyl().disk.compact.queue
box.stat.reset() -- doesn't affect queue size
i:stat().disk.compact.queue -- 50 statements
+i:stat().disk.compact.queue.bytes == box.stat.vinyl().disk.compact.queue
errinj.set('ERRINJ_VY_COMPACTION_DELAY', false)
while i:stat().disk.compact.count < 2 do fiber.sleep(0.01) end
i:stat().disk.compact.queue -- none
diff --git a/test/vinyl/info.result b/test/vinyl/info.result
index d13806de..21945b9d 100644
--- a/test/vinyl/info.result
+++ b/test/vinyl/info.result
@@ -221,7 +221,17 @@ istat()
...
gstat()
---
-- quota:
+- disk:
+ dump:
+ in: 0
+ out: 0
+ compact:
+ in: 0
+ queue: 0
+ out: 0
+ data: 0
+ index: 0
+ quota:
limit: 134217728
used: 0
memory:
@@ -686,6 +696,23 @@ box.rollback()
--
-- Global statistics.
--
+-- dump and compaction totals
+gstat().disk.dump['in'] == istat().disk.dump['in'].bytes
+---
+- true
+...
+gstat().disk.dump['out'] == istat().disk.dump['out'].bytes
+---
+- true
+...
+gstat().disk.compact['in'] == istat().disk.compact['in'].bytes
+---
+- true
+...
+gstat().disk.compact['out'] == istat().disk.compact['out'].bytes
+---
+- true
+...
-- use memory
st = gstat()
---
@@ -1038,7 +1065,17 @@ istat()
...
gstat()
---
-- quota:
+- disk:
+ dump:
+ in: 0
+ out: 0
+ compact:
+ in: 0
+ queue: 0
+ out: 0
+ data: 104300
+ index: 1190
+ quota:
limit: 134217728
used: 262583
memory:
@@ -1143,6 +1180,14 @@ gst.memory.bloom_filter == 0
---
- true
...
+gst.disk.data == 0
+---
+- true
+...
+gst.disk.index == 0
+---
+- true
+...
box.snapshot()
---
- ok
@@ -1198,6 +1243,14 @@ gst.memory.bloom_filter == st1.disk.bloom_size + st2.disk.bloom_size
---
- true
...
+gst.disk.data == s:bsize()
+---
+- true
+...
+gst.disk.index == i1:bsize() + i2:bsize()
+---
+- true
+...
for i = 1, 100, 2 do s:delete(i) end
---
...
@@ -1317,6 +1370,14 @@ gst.memory.bloom_filter == st1.disk.bloom_size + st2.disk.bloom_size
---
- true
...
+gst.disk.data == s:bsize()
+---
+- true
+...
+gst.disk.index == i1:bsize() + i2:bsize()
+---
+- true
+...
s:drop()
---
...
@@ -1331,6 +1392,14 @@ gst.memory.bloom_filter == 0
---
- true
...
+gst.disk.data == 0
+---
+- true
+...
+gst.disk.index == 0
+---
+- true
+...
test_run:cmd('switch default')
---
- true
diff --git a/test/vinyl/info.test.lua b/test/vinyl/info.test.lua
index 230bac1e..5912320c 100644
--- a/test/vinyl/info.test.lua
+++ b/test/vinyl/info.test.lua
@@ -206,6 +206,12 @@ box.rollback()
-- Global statistics.
--
+-- dump and compaction totals
+gstat().disk.dump['in'] == istat().disk.dump['in'].bytes
+gstat().disk.dump['out'] == istat().disk.dump['out'].bytes
+gstat().disk.compact['in'] == istat().disk.compact['in'].bytes
+gstat().disk.compact['out'] == istat().disk.compact['out'].bytes
+
-- use memory
st = gstat()
put(1)
@@ -342,6 +348,8 @@ i1:bsize() == st1.memory.index_size
i2:bsize() == st2.memory.index_size
gst.memory.page_index == 0
gst.memory.bloom_filter == 0
+gst.disk.data == 0
+gst.disk.index == 0
box.snapshot()
gst = gstat()
@@ -357,6 +365,8 @@ i1:bsize() == st1.disk.index_size + st1.disk.bloom_size
i2:bsize() == st2.disk.index_size + st2.disk.bloom_size + st2.disk.bytes
gst.memory.page_index == st1.disk.index_size + st2.disk.index_size
gst.memory.bloom_filter == st1.disk.bloom_size + st2.disk.bloom_size
+gst.disk.data == s:bsize()
+gst.disk.index == i1:bsize() + i2:bsize()
for i = 1, 100, 2 do s:delete(i) end
for i = 2, 100, 2 do s:replace{i, i, pad()} end
@@ -392,12 +402,16 @@ i1:bsize() == st1.disk.index_size + st1.disk.bloom_size
i2:bsize() == st2.disk.index_size + st2.disk.bloom_size + st2.disk.bytes
gst.memory.page_index == st1.disk.index_size + st2.disk.index_size
gst.memory.bloom_filter == st1.disk.bloom_size + st2.disk.bloom_size
+gst.disk.data == s:bsize()
+gst.disk.index == i1:bsize() + i2:bsize()
s:drop()
gst = gstat()
gst.memory.page_index == 0
gst.memory.bloom_filter == 0
+gst.disk.data == 0
+gst.disk.index == 0
test_run:cmd('switch default')
test_run:cmd('stop server test')
--
2.11.0
* Re: [PATCH v2 0/8] vinyl: improve stats for throttling
From: Vladimir Davydov @ 2018-09-19 9:59 UTC
To: kostja; +Cc: tarantool-patches
On Sun, Sep 16, 2018 at 08:06:43PM +0300, Vladimir Davydov wrote:
> vinyl: fix force compaction logic
> vinyl: update compact priority usual way on range split/coalesce
> vinyl: annotate info_table_end with comment
> vinyl: report pages and bytes_compressed in dump/compact in/out stats
> vinyl: add helpers for resetting statement counters
> vinyl: keep track of compaction queue length
> vinyl: factor out helpers for accounting dump/compaction
> vinyl: add global disk stats
Added the code comments requested by Kostja and pushed the series to 1.10.