From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: kostja@tarantool.org
Cc: tarantool-patches@freelists.org
Subject: [PATCH 18/18] vinyl: throttle tx rate if dump does not catch up
Date: Thu, 16 Aug 2018 19:12:12 +0300
Message-ID: <c9f4d18f0303dccf0320e89110a62103c54c675c.1534432819.git.vdavydov.dev@gmail.com>
In-Reply-To: <cover.1534432819.git.vdavydov.dev@gmail.com>

If the rate at which transactions are ready to write to the database
exceeds the dump bandwidth, memory will be depleted before the last
dump is complete, and all newer transactions will have to wait until
the dump has finished, which may take seconds or even minutes:

  2018-08-16 15:45:11.739 [30874] main/1100/main vy_quota.c:291 W> waited for 555 bytes of vinyl memory quota for too long: 15.750 sec

This patch implements transaction throttling, which is supposed to
help avoid unpredictably long stalls. Now, once a dump is started, the
transaction write rate is limited so that the hard limit cannot be
exceeded before the dump is complete.

This is how it looks in the log:

  2018-08-16 16:01:09.412 [489] main/445/main I> dumping 134217901 bytes, expected rate 6.0 MB/s, ETA 21.3 s, recent write rate 10.5 MB/s
  2018-08-16 16:01:09.447 [489] main/103/vinyl.scheduler I> 513/0: dump started
  2018-08-16 16:01:09.447 [489] vinyl.writer.0/103/task I> writing `./513/0/00000000000000000004.run'
  2018-08-16 16:01:09.468 [489] main I> throttling enabled, max write rate 6.0 MB/s
  2018-08-16 16:01:30.004 [489] vinyl.writer.0/103/task I> writing `./513/0/00000000000000000004.index'
  2018-08-16 16:01:30.094 [489] main/103/vinyl.scheduler I> 513/0: dump completed
  2018-08-16 16:01:30.095 [489] main/103/vinyl.scheduler I> dumped 134216236 bytes in 20.7 s, rate 6.2 MB/s
  2018-08-16 16:01:30.167 [489] main I> throttling disabled

Closes #1862
---
 src/box/vy_quota.c           | 41 ++++++++++++++++++-
 src/box/vy_quota.h           | 13 ++++++
 test/vinyl/suite.ini         |  2 +-
 test/vinyl/throttle.result   | 95 ++++++++++++++++++++++++++++++++++++++++++++
 test/vinyl/throttle.test.lua | 47 ++++++++++++++++++++++
 5 files changed, 195 insertions(+), 3 deletions(-)
 create mode 100644 test/vinyl/throttle.result
 create mode 100644 test/vinyl/throttle.test.lua

diff --git a/src/box/vy_quota.c b/src/box/vy_quota.c
index 471a8bd0..85e18d4b 100644
--- a/src/box/vy_quota.c
+++ b/src/box/vy_quota.c
@@ -119,7 +119,7 @@ enum {
 static inline void
 vy_quota_signal(struct vy_quota *q)
 {
-	if (q->used < q->limit)
+	if (q->used < q->limit && q->use_curr < q->use_max)
 		fiber_cond_signal(&q->cond);
 }
 
@@ -133,6 +133,8 @@ vy_quota_check_watermark(struct vy_quota *q)
 	if (!q->dump_in_progress &&
 	    q->used >= q->watermark && q->quota_exceeded_cb(q)) {
 		q->dump_in_progress = true;
+		q->dump_size = q->used;
+		q->dump_start = ev_monotonic_now(loop());
 		say_info("dumping %zu bytes, expected rate %.1f MB/s, "
 			 "ETA %.1f s, recent write rate %.1f MB/s", q->used,
 			 (double)q->dump_bw / 1024 / 1024,
@@ -148,6 +150,7 @@ vy_quota_timer_cb(ev_loop *loop, ev_timer *timer, int events)
 	(void)events;
 
 	struct vy_quota *q = timer->data;
+	double now = ev_monotonic_now(loop());
 
 	/*
 	 * Update the quota use rate with the new measurement.
@@ -159,6 +162,34 @@ vy_quota_timer_cb(ev_loop *loop, ev_timer *timer, int events)
 	q->use_curr = 0;
 
 	/*
+	 * To avoid unpredictably long stalls, we must limit
+	 * the write rate when a dump is in progress so that
+	 * we don't hit the hard limit before the dump has
+	 * completed, i.e.
+	 *
+	 *    left_to_use     left_to_dump
+	 *    -----------  <= ------------
+	 *     use_rate        dump_rate
+	 */
+	if (q->dump_in_progress) {
+		size_t dumped = q->dump_bw * (now - q->dump_start);
+		size_t left_to_dump = (dumped < q->dump_size ?
+				       q->dump_size - dumped : 0);
+		size_t left_to_use = (q->used < q->limit ?
+				      q->limit - q->used : 0);
+		double max_use_rate = (left_to_use * q->dump_bw /
+				       (left_to_dump + 1));
+		if (q->use_max == SIZE_MAX)
+			say_info("throttling enabled, max write rate "
+				 "%.1f MB/s", max_use_rate / 1024 / 1024);
+		q->use_max = VY_QUOTA_UPDATE_INTERVAL * max_use_rate;
+	} else {
+		if (q->use_max < SIZE_MAX)
+			say_info("throttling disabled");
+		q->use_max = SIZE_MAX;
+	}
+
+	/*
 	 * Update the quota watermark and trigger memory dump
 	 * if the watermark is exceeded.
 	 *
@@ -172,6 +203,9 @@ vy_quota_timer_cb(ev_loop *loop, ev_timer *timer, int events)
 	q->watermark = MIN(q->limit * VY_QUOTA_WATERMARK_MAX / 100,
 			   q->watermark);
 	vy_quota_check_watermark(q);
+
+	/* Wake up the next throttled fiber in the line. */
+	vy_quota_signal(q);
 }
 
 int
@@ -201,11 +235,14 @@ vy_quota_create(struct vy_quota *q, vy_quota_exceeded_f quota_exceeded_cb)
 	q->watermark = SIZE_MAX;
 	q->used = 0;
 	q->use_curr = 0;
+	q->use_max = SIZE_MAX;
 	q->use_rate = 0;
 	q->too_long_threshold = TIMEOUT_INFINITY;
 	q->dump_bw = VY_DEFAULT_DUMP_BANDWIDTH;
 	q->quota_exceeded_cb = quota_exceeded_cb;
 	q->dump_in_progress = false;
+	q->dump_size = 0;
+	q->dump_start = 0;
 	fiber_cond_create(&q->cond);
 	ev_timer_init(&q->timer, vy_quota_timer_cb, 0,
 		      VY_QUOTA_UPDATE_INTERVAL);
@@ -277,7 +314,7 @@ vy_quota_try_use(struct vy_quota *q, size_t size, double timeout)
 	double deadline = start_time + timeout;
 
 	q->used += size;
-	while (q->used > q->limit) {
+	while (q->used > q->limit || q->use_curr >= q->use_max) {
 		vy_quota_check_watermark(q);
 		q->used -= size;
 		int rc = fiber_cond_wait_deadline(&q->cond, deadline);
diff --git a/src/box/vy_quota.h b/src/box/vy_quota.h
index cb681386..ef2fb6cb 100644
--- a/src/box/vy_quota.h
+++ b/src/box/vy_quota.h
@@ -86,6 +86,13 @@ struct vy_quota {
 	 * true, but vy_quota_dump() hasn't been called yet.
 	 */
 	bool dump_in_progress;
+	/**
+	 * Memory usage at the time when the last dump was started
+	 * (memory dump size).
+	 */
+	size_t dump_size;
+	/** Time when the last dump was started. */
+	double dump_start;
 	/** Timer for updating quota watermark. */
 	ev_timer timer;
 	/**
@@ -94,6 +101,12 @@ struct vy_quota {
 	 */
 	size_t use_curr;
 	/**
+	 * Maximal amount of quota that can be used between timer
+	 * callback invocations. It is set to such a value so that
+	 * the quota use rate never exceeds the dump bandwidth.
+	 */
+	size_t use_max;
+	/**
 	 * Quota use rate, in bytes per second.
 	 * Calculated as exponentially weighted
 	 * moving average of use_curr.
diff --git a/test/vinyl/suite.ini b/test/vinyl/suite.ini
index b9dae380..785bc63d 100644
--- a/test/vinyl/suite.ini
+++ b/test/vinyl/suite.ini
@@ -6,5 +6,5 @@ release_disabled = errinj.test.lua errinj_gc.test.lua errinj_vylog.test.lua part
 config = suite.cfg
 lua_libs = suite.lua stress.lua large.lua txn_proxy.lua ../box/lua/utils.lua
 use_unix_sockets = True
-long_run = stress.test.lua large.test.lua write_iterator_rand.test.lua dump_stress.test.lua select_consistency.test.lua
+long_run = stress.test.lua large.test.lua write_iterator_rand.test.lua dump_stress.test.lua select_consistency.test.lua throttle.test.lua
 is_parallel = False
diff --git a/test/vinyl/throttle.result b/test/vinyl/throttle.result
new file mode 100644
index 00000000..5634c40b
--- /dev/null
+++ b/test/vinyl/throttle.result
@@ -0,0 +1,95 @@
+test_run = require('test_run').new()
+---
+...
+test_run:cmd("create server test with script='vinyl/low_quota.lua'")
+---
+- true
+...
+test_run:cmd(string.format("start server test with args='%d'", 32 * 1024 * 1024))
+---
+- true
+...
+test_run:cmd('switch test')
+---
+- true
+...
+fiber = require('fiber')
+---
+...
+digest = require('digest')
+---
+...
+box.cfg{snap_io_rate_limit = 4}
+---
+...
+FIBER_COUNT = 5
+---
+...
+TUPLE_SIZE = 1000
+---
+...
+TX_TUPLE_COUNT = 10
+---
+...
+TX_SIZE = TUPLE_SIZE * TX_TUPLE_COUNT
+---
+...
+TX_COUNT = math.ceil(box.cfg.vinyl_memory / (TX_SIZE * FIBER_COUNT))
+---
+...
+s = box.schema.space.create('test', {engine = 'vinyl'})
+---
+...
+_ = s:create_index('primary', {parts = {1, 'unsigned', 2, 'unsigned', 3, 'unsigned'}})
+---
+...
+latency = 0
+---
+...
+c = fiber.channel(FIBER_COUNT)
+---
+...
+test_run:cmd("setopt delimiter ';'")
+---
+- true
+...
+for i = 1, FIBER_COUNT do
+    fiber.create(function()
+        for j = 1, TX_COUNT do
+            local t1 = fiber.time()
+            box.begin()
+            for k = 1, TX_TUPLE_COUNT do
+                s:replace{i, j, k, digest.urandom(TUPLE_SIZE)}
+            end
+            box.commit()
+            local t2 = fiber.time()
+            latency = math.max(latency, t2 - t1)
+        end
+        c:put(true)
+    end)
+end;
+---
+...
+test_run:cmd("setopt delimiter ''");
+---
+- true
+...
+for i = 1, FIBER_COUNT do c:get() end
+---
+...
+latency < 0.5 or latency
+---
+- true
+...
+test_run:cmd('switch default')
+---
+- true
+...
+test_run:cmd("stop server test")
+---
+- true
+...
+test_run:cmd("cleanup server test")
+---
+- true
+...
diff --git a/test/vinyl/throttle.test.lua b/test/vinyl/throttle.test.lua
new file mode 100644
index 00000000..4efbbec7
--- /dev/null
+++ b/test/vinyl/throttle.test.lua
@@ -0,0 +1,47 @@
+test_run = require('test_run').new()
+test_run:cmd("create server test with script='vinyl/low_quota.lua'")
+test_run:cmd(string.format("start server test with args='%d'", 32 * 1024 * 1024))
+test_run:cmd('switch test')
+
+fiber = require('fiber')
+digest = require('digest')
+
+box.cfg{snap_io_rate_limit = 4}
+
+FIBER_COUNT = 5
+TUPLE_SIZE = 1000
+TX_TUPLE_COUNT = 10
+TX_SIZE = TUPLE_SIZE * TX_TUPLE_COUNT
+TX_COUNT = math.ceil(box.cfg.vinyl_memory / (TX_SIZE * FIBER_COUNT))
+
+s = box.schema.space.create('test', {engine = 'vinyl'})
+_ = s:create_index('primary', {parts = {1, 'unsigned', 2, 'unsigned', 3, 'unsigned'}})
+
+latency = 0
+c = fiber.channel(FIBER_COUNT)
+
+test_run:cmd("setopt delimiter ';'")
+for i = 1, FIBER_COUNT do
+    fiber.create(function()
+        for j = 1, TX_COUNT do
+            local t1 = fiber.time()
+            box.begin()
+            for k = 1, TX_TUPLE_COUNT do
+                s:replace{i, j, k, digest.urandom(TUPLE_SIZE)}
+            end
+            box.commit()
+            local t2 = fiber.time()
+            latency = math.max(latency, t2 - t1)
+        end
+        c:put(true)
+    end)
+end;
+test_run:cmd("setopt delimiter ''");
+
+for i = 1, FIBER_COUNT do c:get() end
+
+latency < 0.5 or latency
+
+test_run:cmd('switch default')
+test_run:cmd("stop server test")
+test_run:cmd("cleanup server test")
-- 
2.11.0
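
For readers who want to experiment with the throttling arithmetic outside of Tarantool, here is a minimal standalone sketch of the rate calculation done in vy_quota_timer_cb(). The function names sketch_max_use_rate() and sketch_use_max() are hypothetical and do not exist in the patch, and the sketch uses double for the intermediate values where the patch uses size_t. The idea is that the time needed to exhaust the remaining quota at the allowed write rate should be no shorter than the estimated time needed to finish the dump, which gives max_use_rate = left_to_use * dump_bw / left_to_dump (the +1 in the denominator avoids division by zero, as in the patch).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): maximal write rate,
 * in bytes per second, that keeps memory usage below the hard limit
 * until the dump is expected to complete.
 *
 * limit     - hard memory limit, bytes
 * used      - memory currently in use, bytes
 * dump_size - memory usage at the time the dump was started, bytes
 * dump_bw   - estimated dump bandwidth, bytes per second
 * elapsed   - seconds passed since the dump was started
 */
static double
sketch_max_use_rate(size_t limit, size_t used, size_t dump_size,
		    double dump_bw, double elapsed)
{
	assert(dump_bw >= 0 && elapsed >= 0);
	/* Estimate how much has been dumped so far from the bandwidth. */
	double dumped = dump_bw * elapsed;
	double left_to_dump = dumped < (double)dump_size ?
			      (double)dump_size - dumped : 0;
	double left_to_use = used < limit ? (double)(limit - used) : 0;
	/* +1 guards against division by zero once the estimate runs out. */
	return left_to_use * dump_bw / (left_to_dump + 1);
}

/*
 * Hypothetical helper: quota that may be consumed during one timer
 * period of the given length (seconds) at the given rate.
 */
static size_t
sketch_use_max(double max_use_rate, double interval)
{
	return (size_t)(interval * max_use_rate);
}
```

With the numbers from the log above (134217901 bytes to dump at an expected 6.0 MB/s), the estimated dump time is about 21.3 s, so the allowed write rate is roughly the free quota divided by 21.3 s at the moment the dump starts, and it is recomputed on every timer tick as the estimate of the remaining dump size shrinks.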