Tarantool development patches archive
From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: kostja@tarantool.org
Cc: tarantool-patches@freelists.org
Subject: [PATCH 18/18] vinyl: throttle tx rate if dump does not catch up
Date: Thu, 16 Aug 2018 19:12:12 +0300
Message-ID: <c9f4d18f0303dccf0320e89110a62103c54c675c.1534432819.git.vdavydov.dev@gmail.com>
In-Reply-To: <cover.1534432819.git.vdavydov.dev@gmail.com>

If the rate at which transactions are ready to write to the database is
greater than the dump bandwidth, memory will get depleted before the
last dump is complete and all newer transactions will have to wait until
the dump has been completed, which may take seconds or even minutes:

  2018-08-16 15:45:11.739 [30874] main/1100/main vy_quota.c:291 W> waited for 555 bytes of vinyl memory quota for too long: 15.750 sec
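
To make the stall concrete, here is a minimal C sketch of the arithmetic
behind the paragraph above. It assumes constant, positive write and dump
rates and that memory is freed only when the dump completes. The function
name and its parameters are hypothetical and are not part of vy_quota.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of vy_quota: estimate how long
 * writers stall if nothing throttles them. Assumes constant rates
 * (bytes per second) and that memory is freed only when the dump
 * completes.
 */
static double
expected_stall(size_t limit, size_t used, double write_rate,
	       double dump_bw)
{
	/* Time until writers fill the remaining quota. */
	double fill_time = used < limit ?
		(double)(limit - used) / write_rate : 0;
	/* Time the dump needs to write out what was in memory. */
	double dump_time = (double)used / dump_bw;
	return dump_time > fill_time ? dump_time - fill_time : 0;
}
```

For instance, with a limit of 200 bytes, 100 bytes in use, a write rate of
10 bytes/s and a dump bandwidth of 5 bytes/s, the quota fills in 10 s while
the dump takes 20 s, so writers would be blocked for about 10 s.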

This patch set implements transaction throttling, which is supposed to
help avoid unpredictably long stalls. Now, once a dump is started, the
transaction write rate is limited so that the hard limit cannot be
exceeded before the dump is complete. This is how it looks in the log:

  2018-08-16 16:01:09.412 [489] main/445/main I> dumping 134217901 bytes, expected rate 6.0 MB/s, ETA 21.3 s, recent write rate 10.5 MB/s
  2018-08-16 16:01:09.447 [489] main/103/vinyl.scheduler I> 513/0: dump started
  2018-08-16 16:01:09.447 [489] vinyl.writer.0/103/task I> writing `./513/0/00000000000000000004.run'
  2018-08-16 16:01:09.468 [489] main I> throttling enabled, max write rate 6.0 MB/s
  2018-08-16 16:01:30.004 [489] vinyl.writer.0/103/task I> writing `./513/0/00000000000000000004.index'
  2018-08-16 16:01:30.094 [489] main/103/vinyl.scheduler I> 513/0: dump completed
  2018-08-16 16:01:30.095 [489] main/103/vinyl.scheduler I> dumped 134216236 bytes in 20.7 s, rate 6.2 MB/s
  2018-08-16 16:01:30.167 [489] main I> throttling disabled
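
The rate cap can be sketched in standalone C roughly as follows. The idea
is that the time needed to fill the remaining quota must not be shorter
than the time left to finish the dump. Plain parameters stand in for the
struct vy_quota fields, the function names are hypothetical, and the
sketch uses floating-point arithmetic throughout, whereas the patch does
part of the computation in size_t.

```c
#include <stddef.h>

/*
 * Sketch of the rate cap, with plain parameters standing in for
 * the struct vy_quota fields used by the patch. elapsed is the
 * number of seconds since the dump started and must be >= 0.
 */
static double
max_use_rate(size_t limit, size_t used, size_t dump_size,
	     size_t dump_bw, double elapsed)
{
	/* Estimate progress from the expected dump bandwidth. */
	double dumped = (double)dump_bw * elapsed;
	double left_to_dump = dumped < (double)dump_size ?
		(double)dump_size - dumped : 0;
	double left_to_use = used < limit ? (double)(limit - used) : 0;
	/* +1 avoids division by zero once the estimate hits 0. */
	return left_to_use * (double)dump_bw / (left_to_dump + 1);
}

/* Bytes writers may consume during one timer interval. */
static size_t
use_budget(double rate, double interval)
{
	return (size_t)(rate * interval);
}
```

The budget is recomputed on every timer tick, so the cap loosens as the
dump progresses and tightens as free quota shrinks.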

Closes #1862
---
 src/box/vy_quota.c           | 41 ++++++++++++++++++-
 src/box/vy_quota.h           | 13 ++++++
 test/vinyl/suite.ini         |  2 +-
 test/vinyl/throttle.result   | 95 ++++++++++++++++++++++++++++++++++++++++++++
 test/vinyl/throttle.test.lua | 47 ++++++++++++++++++++++
 5 files changed, 195 insertions(+), 3 deletions(-)
 create mode 100644 test/vinyl/throttle.result
 create mode 100644 test/vinyl/throttle.test.lua

diff --git a/src/box/vy_quota.c b/src/box/vy_quota.c
index 471a8bd0..85e18d4b 100644
--- a/src/box/vy_quota.c
+++ b/src/box/vy_quota.c
@@ -119,7 +119,7 @@ enum {
 static inline void
 vy_quota_signal(struct vy_quota *q)
 {
-	if (q->used < q->limit)
+	if (q->used < q->limit && q->use_curr < q->use_max)
 		fiber_cond_signal(&q->cond);
 }
 
@@ -133,6 +133,8 @@ vy_quota_check_watermark(struct vy_quota *q)
 	if (!q->dump_in_progress &&
 	    q->used >= q->watermark && q->quota_exceeded_cb(q)) {
 		q->dump_in_progress = true;
+		q->dump_size = q->used;
+		q->dump_start = ev_monotonic_now(loop());
 		say_info("dumping %zu bytes, expected rate %.1f MB/s, "
 			 "ETA %.1f s, recent write rate %.1f MB/s", q->used,
 			 (double)q->dump_bw / 1024 / 1024,
@@ -148,6 +150,7 @@ vy_quota_timer_cb(ev_loop *loop, ev_timer *timer, int events)
 	(void)events;
 
 	struct vy_quota *q = timer->data;
+	double now = ev_monotonic_now(loop());
 
 	/*
 	 * Update the quota use rate with the new measurement.
@@ -159,6 +162,34 @@ vy_quota_timer_cb(ev_loop *loop, ev_timer *timer, int events)
 	q->use_curr = 0;
 
 	/*
+	 * To avoid unpredictably long stalls, we must limit
+	 * the write rate when a dump is in progress so that
+	 * we don't hit the hard limit before the dump has
+	 * completed, i.e.
+	 *
+	 *   left_to_use    left_to_dump
+	 *   ----------- >= ------------
+	 *     use_rate       dump_rate
+	 */
+	if (q->dump_in_progress) {
+		size_t dumped = q->dump_bw * (now - q->dump_start);
+		size_t left_to_dump = (dumped < q->dump_size ?
+				       q->dump_size - dumped : 0);
+		size_t left_to_use = (q->used < q->limit ?
+				      q->limit - q->used : 0);
+		double max_use_rate = (left_to_use * q->dump_bw /
+				       (left_to_dump + 1));
+		if (q->use_max == SIZE_MAX)
+			say_info("throttling enabled, max write rate "
+				 "%.1f MB/s", max_use_rate / 1024 / 1024);
+		q->use_max = VY_QUOTA_UPDATE_INTERVAL * max_use_rate;
+	} else {
+		if (q->use_max < SIZE_MAX)
+			say_info("throttling disabled");
+		q->use_max = SIZE_MAX;
+	}
+
+	/*
 	 * Update the quota watermark and trigger memory dump
 	 * if the watermark is exceeded.
 	 *
@@ -172,6 +203,9 @@ vy_quota_timer_cb(ev_loop *loop, ev_timer *timer, int events)
 	q->watermark = MIN(q->limit * VY_QUOTA_WATERMARK_MAX / 100,
 			   q->watermark);
 	vy_quota_check_watermark(q);
+
+	/* Wake up the next throttled fiber in the line. */
+	vy_quota_signal(q);
 }
 
 int
@@ -201,11 +235,14 @@ vy_quota_create(struct vy_quota *q, vy_quota_exceeded_f quota_exceeded_cb)
 	q->watermark = SIZE_MAX;
 	q->used = 0;
 	q->use_curr = 0;
+	q->use_max = SIZE_MAX;
 	q->use_rate = 0;
 	q->too_long_threshold = TIMEOUT_INFINITY;
 	q->dump_bw = VY_DEFAULT_DUMP_BANDWIDTH;
 	q->quota_exceeded_cb = quota_exceeded_cb;
 	q->dump_in_progress = false;
+	q->dump_size = 0;
+	q->dump_start = 0;
 	fiber_cond_create(&q->cond);
 	ev_timer_init(&q->timer, vy_quota_timer_cb, 0,
 		      VY_QUOTA_UPDATE_INTERVAL);
@@ -277,7 +314,7 @@ vy_quota_try_use(struct vy_quota *q, size_t size, double timeout)
 	double deadline = start_time + timeout;
 
 	q->used += size;
-	while (q->used > q->limit) {
+	while (q->used > q->limit || q->use_curr >= q->use_max) {
 		vy_quota_check_watermark(q);
 		q->used -= size;
 		int rc = fiber_cond_wait_deadline(&q->cond, deadline);
diff --git a/src/box/vy_quota.h b/src/box/vy_quota.h
index cb681386..ef2fb6cb 100644
--- a/src/box/vy_quota.h
+++ b/src/box/vy_quota.h
@@ -86,6 +86,13 @@ struct vy_quota {
 	 * true, but vy_quota_dump() hasn't been called yet.
 	 */
 	bool dump_in_progress;
+	/**
+	 * Memory usage at the time when the last dump was started
+	 * (memory dump size).
+	 */
+	size_t dump_size;
+	/** Time when the last dump was started. */
+	double dump_start;
 	/** Timer for updating quota watermark. */
 	ev_timer timer;
 	/**
@@ -94,6 +101,12 @@ struct vy_quota {
 	 */
 	size_t use_curr;
 	/**
+	 * Maximal amount of quota that can be used between timer
+	 * callback invocations. It is set to such a value so that
+	 * the quota use rate never exceeds the dump bandwidth.
+	 */
+	size_t use_max;
+	/**
 	 * Quota use rate, in bytes per second.
 	 * Calculated as exponentially weighted
 	 * moving average of use_curr.
diff --git a/test/vinyl/suite.ini b/test/vinyl/suite.ini
index b9dae380..785bc63d 100644
--- a/test/vinyl/suite.ini
+++ b/test/vinyl/suite.ini
@@ -6,5 +6,5 @@ release_disabled = errinj.test.lua errinj_gc.test.lua errinj_vylog.test.lua part
 config = suite.cfg
 lua_libs = suite.lua stress.lua large.lua txn_proxy.lua ../box/lua/utils.lua
 use_unix_sockets = True
-long_run = stress.test.lua large.test.lua write_iterator_rand.test.lua dump_stress.test.lua select_consistency.test.lua
+long_run = stress.test.lua large.test.lua write_iterator_rand.test.lua dump_stress.test.lua select_consistency.test.lua throttle.test.lua
 is_parallel = False
diff --git a/test/vinyl/throttle.result b/test/vinyl/throttle.result
new file mode 100644
index 00000000..5634c40b
--- /dev/null
+++ b/test/vinyl/throttle.result
@@ -0,0 +1,95 @@
+test_run = require('test_run').new()
+---
+...
+test_run:cmd("create server test with script='vinyl/low_quota.lua'")
+---
+- true
+...
+test_run:cmd(string.format("start server test with args='%d'", 32 * 1024 * 1024))
+---
+- true
+...
+test_run:cmd('switch test')
+---
+- true
+...
+fiber = require('fiber')
+---
+...
+digest = require('digest')
+---
+...
+box.cfg{snap_io_rate_limit = 4}
+---
+...
+FIBER_COUNT = 5
+---
+...
+TUPLE_SIZE = 1000
+---
+...
+TX_TUPLE_COUNT = 10
+---
+...
+TX_SIZE = TUPLE_SIZE * TX_TUPLE_COUNT
+---
+...
+TX_COUNT = math.ceil(box.cfg.vinyl_memory / (TX_SIZE * FIBER_COUNT))
+---
+...
+s = box.schema.space.create('test', {engine = 'vinyl'})
+---
+...
+_ = s:create_index('primary', {parts = {1, 'unsigned', 2, 'unsigned', 3, 'unsigned'}})
+---
+...
+latency = 0
+---
+...
+c = fiber.channel(FIBER_COUNT)
+---
+...
+test_run:cmd("setopt delimiter ';'")
+---
+- true
+...
+for i = 1, FIBER_COUNT do
+    fiber.create(function()
+        for j = 1, TX_COUNT do
+            local t1 = fiber.time()
+            box.begin()
+            for k = 1, TX_TUPLE_COUNT do
+                s:replace{i, j, k, digest.urandom(TUPLE_SIZE)}
+            end
+            box.commit()
+            local t2 = fiber.time()
+            latency = math.max(latency, t2 - t1)
+        end
+        c:put(true)
+    end)
+end;
+---
+...
+test_run:cmd("setopt delimiter ''");
+---
+- true
+...
+for i = 1, FIBER_COUNT do c:get() end
+---
+...
+latency < 0.5 or latency
+---
+- true
+...
+test_run:cmd('switch default')
+---
+- true
+...
+test_run:cmd("stop server test")
+---
+- true
+...
+test_run:cmd("cleanup server test")
+---
+- true
+...
diff --git a/test/vinyl/throttle.test.lua b/test/vinyl/throttle.test.lua
new file mode 100644
index 00000000..4efbbec7
--- /dev/null
+++ b/test/vinyl/throttle.test.lua
@@ -0,0 +1,47 @@
+test_run = require('test_run').new()
+test_run:cmd("create server test with script='vinyl/low_quota.lua'")
+test_run:cmd(string.format("start server test with args='%d'", 32 * 1024 * 1024))
+test_run:cmd('switch test')
+
+fiber = require('fiber')
+digest = require('digest')
+
+box.cfg{snap_io_rate_limit = 4}
+
+FIBER_COUNT = 5
+TUPLE_SIZE = 1000
+TX_TUPLE_COUNT = 10
+TX_SIZE = TUPLE_SIZE * TX_TUPLE_COUNT
+TX_COUNT = math.ceil(box.cfg.vinyl_memory / (TX_SIZE * FIBER_COUNT))
+
+s = box.schema.space.create('test', {engine = 'vinyl'})
+_ = s:create_index('primary', {parts = {1, 'unsigned', 2, 'unsigned', 3, 'unsigned'}})
+
+latency = 0
+c = fiber.channel(FIBER_COUNT)
+
+test_run:cmd("setopt delimiter ';'")
+for i = 1, FIBER_COUNT do
+    fiber.create(function()
+        for j = 1, TX_COUNT do
+            local t1 = fiber.time()
+            box.begin()
+            for k = 1, TX_TUPLE_COUNT do
+                s:replace{i, j, k, digest.urandom(TUPLE_SIZE)}
+            end
+            box.commit()
+            local t2 = fiber.time()
+            latency = math.max(latency, t2 - t1)
+        end
+        c:put(true)
+    end)
+end;
+test_run:cmd("setopt delimiter ''");
+
+for i = 1, FIBER_COUNT do c:get() end
+
+latency < 0.5 or latency
+
+test_run:cmd('switch default')
+test_run:cmd("stop server test")
+test_run:cmd("cleanup server test")
-- 
2.11.0
