Date: Wed, 22 Apr 2020 03:32:28 +0300
From: Konstantin Osipov
To: Vladislav Shpilevoy
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH] vinyl: unthrottle scheduler on checkpoint
Message-ID: <20200422003228.GA19601@atlas>
In-Reply-To: <3f1e18cc-cc18-b586-825b-3e4e3a9c8e3f@tarantool.org>

* Vladislav Shpilevoy [20/04/22 02:17]:

Vinyl dump is based on memtable state, not a schedule.
If each dump is unthrottling the scheduler, then what's the point
of throttling? Having a nice infinite loop on ENOSPC?

> Hi! Thanks for the patch!
>
> On 31/03/2020 16:42, Nikita Pettik wrote:
> > Before this patch box.snapshot() bails out immediately if it sees that
> > the scheduler is throttled due to errors. For instance:
> >
> > box.error.injection.set('ERRINJ_VY_RUN_WRITE', true)
> > snapshot() -- fails due to ERRINJ_VY_RUN_WRITE
> > box.error.injection.set('ERRINJ_VY_RUN_WRITE', false)
> > snapshot() -- still fails despite the fact that injected error is unset
> >
> > As a result, one has to wait up to a minute to make a snapshot. The
> > reason why throttling was introduced was to avoid flooding the log
> > in case of repeating disk errors.
> > On the other hand, checkpoint process is either called manually or on
> > schedule.
> > What is more, to deal with schedule throttling in tests, we
> > had to introduce a new error injection (ERRINJ_VY_SCHED_TIMEOUT).
> > It reduces time duration during which the scheduler remains throttled,
> > which is ugly and race prone.
> > So, let's unthrottle scheduler when checkpoint process is launched.
> >
> > Closes #3519
> > ---
> > Note that VY_SCHED_TIMEOUT error injection is not completely removed
> > from tests, since at least one of them fails (instance crashes) without
> > it (vinyl/errinj_stat.test.lua). It's not a problem of this patch -
> > reproducer is extracted and filed in gh-4821.
> >
> > Branch: https://github.com/tarantool/tarantool/tree/np/gh-3519-unthrottle-sched
> > Issue: https://github.com/tarantool/tarantool/issues/3519
> >
> >  src/box/vy_scheduler.c           | 14 +++++---------
> >  test/box/errinj.result           |  8 --------
> >  test/box/errinj.test.lua         |  2 --
> >  test/vinyl/errinj.result         | 12 ++----------
> >  test/vinyl/errinj.test.lua       |  5 +----
> >  test/vinyl/errinj_vylog.result   | 14 --------------
> >  test/vinyl/errinj_vylog.test.lua |  4 ----
> >  7 files changed, 8 insertions(+), 51 deletions(-)
> >
> > diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c
> > index 9dba93d34..f57f10119 100644
> > --- a/src/box/vy_scheduler.c
> > +++ b/src/box/vy_scheduler.c
> > @@ -687,17 +687,13 @@ vy_scheduler_begin_checkpoint(struct vy_scheduler *scheduler)
> >  	assert(!scheduler->checkpoint_in_progress);
> >
> >  	/*
> > -	 * If the scheduler is throttled due to errors, do not wait
> > -	 * until it wakes up as it may take quite a while. Instead
> > -	 * fail checkpoint immediately with the last error seen by
> > -	 * the scheduler.
> > +	 * It makes no sense throttling checkpoint process since
> > +	 * it can be either ran manually or due to timeout. At this
> > +	 * point let's ignore it.
>
> Kostja's question is fair. Can it be done for manual snapshots only?
> Automated checkpoints already have problems with killing the disk,
> when multiple instances on the same machine start them at the same
> time. With an unthrottled scheduler it is going to get worse.
>
> However, if this is hard to detect, then it is ok. Just do a quick
> check if it is possible.
>
> >  	 */
> >  	if (scheduler->is_throttled) {
> > -		struct error *e = diag_last_error(&scheduler->diag);
> > -		diag_add_error(diag_get(), e);
> > -		say_error("cannot checkpoint vinyl, "
> > -			  "scheduler is throttled with: %s", e->errmsg);
> > -		return -1;
> > +		say_info("scheduler is unthrottled due to checkpoint process");
> > +		scheduler->is_throttled = false;
>
> Is there a simple way to let the throttling continue after the
> checkpoint is finished?
>
> >  	}
> >
> >  	if (!vy_scheduler_dump_in_progress(scheduler)) {

-- 
Konstantin Osipov, Moscow, Russia
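
The two review questions in this thread (lift the throttle for manual
snapshots only, and let throttling continue once the checkpoint is
finished) could be combined roughly as in the sketch below. It is a
minimal illustration under stated assumptions, not the patch and not
Tarantool code: struct sched_sketch, sketch_begin_checkpoint(),
sketch_end_checkpoint(), the is_manual argument and the
throttle_suspended flag are all hypothetical. Only the is_throttled and
checkpoint_in_progress field names are taken from the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for struct vy_scheduler. */
struct sched_sketch {
	/* Set after a failed background task (name from the diff). */
	bool is_throttled;
	/* Set while a checkpoint is running (name from the diff). */
	bool checkpoint_in_progress;
	/* Hypothetical: the throttle was lifted only for a checkpoint. */
	bool throttle_suspended;
};

/*
 * Begin a checkpoint. A manual checkpoint lifts the throttle;
 * a periodic one is refused while throttled, as before the patch.
 * Returns 0 on success, -1 if the checkpoint is refused.
 */
static int
sketch_begin_checkpoint(struct sched_sketch *s, bool is_manual)
{
	assert(!s->checkpoint_in_progress);
	if (s->is_throttled) {
		if (!is_manual)
			return -1;
		/* Remember that the throttle was lifted on purpose. */
		s->throttle_suspended = true;
		s->is_throttled = false;
	}
	s->checkpoint_in_progress = true;
	return 0;
}

/*
 * End a checkpoint. If it failed and the throttle had been lifted
 * only for its sake, restore the throttle so that background tasks
 * keep backing off. A successful checkpoint leaves it cleared.
 */
static void
sketch_end_checkpoint(struct sched_sketch *s, bool failed)
{
	assert(s->checkpoint_in_progress);
	if (s->throttle_suspended && failed)
		s->is_throttled = true;
	s->throttle_suspended = false;
	s->checkpoint_in_progress = false;
}
```

Restoring the throttle only on failure is one possible design choice:
a checkpoint that succeeds suggests the disk error is gone, while a
failed one leaves the scheduler in the same backed-off state it was in
before, which avoids a retry loop on a persistent error such as ENOSPC.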