From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp38.i.mail.ru (smtp38.i.mail.ru [94.100.177.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 533C14696C3 for ; Mon, 27 Apr 2020 15:17:02 +0300 (MSK) Date: Mon, 27 Apr 2020 12:17:01 +0000 From: Nikita Pettik Message-ID: <20200427121701.GA30870@tarantool.org> References: <3f1e18cc-cc18-b586-825b-3e4e3a9c8e3f@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <3f1e18cc-cc18-b586-825b-3e4e3a9c8e3f@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH] vinyl: unthrottle scheduler on checkpoint List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tarantool-patches@dev.tarantool.org On 22 Apr 01:13, Vladislav Shpilevoy wrote: > Hi! Thanks for the patch! > > > > > /* > > - * If the scheduler is throttled due to errors, do not wait > > - * until it wakes up as it may take quite a while. Instead > > - * fail checkpoint immediately with the last error seen by > > - * the scheduler. > > + * It makes no sense throttling checkpoint process since > > + * it can be either ran manually or due to timeout. At this > > + * point let's ignore it. > > Kostja's questio is fair. Can it be done for manual snapshots only? > Automated checkpoints already have problems with killing the disk, > when multiple instances on the same machine start them at the same > time. With unthrotled scheduler it is going to get worse. > > However, if this is hard to detect, then it is ok. Just do a quick > check if it is possible. Checkpoint daemon uses directly box.snapshot(), so now we can't tell whether checkpoint is launched manually or automatically. To differ these scenarious we can make checkpoint daemon call sort of box.__scheduled_snapshot() (which won't be part of public API ofc). Then we will be able to pass boolean parameter to begin_checkpoint() indicating manual/auto mode. Or simply close issue as wont fix :) > > */ > > if (scheduler->is_throttled) { > > - struct error *e = diag_last_error(&scheduler->diag); > > - diag_add_error(diag_get(), e); > > - say_error("cannot checkpoint vinyl, " > > - "scheduler is throttled with: %s", e->errmsg); > > - return -1; > > + say_info("scheduler is unthrottled due to checkpoint process"); > > + scheduler->is_throttled = false; > > Is there a simple way to let the throttling continue after the > checkpoint is finished? Hm, not simple but..For instance, we can introduce another one flag "is_checkpoint_aborted" alonside with workaround (tests seem to pass): diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c index f57f10119..666d0a008 100644 --- a/src/box/vy_scheduler.c +++ b/src/box/vy_scheduler.c @@ -615,7 +615,8 @@ vy_scheduler_dump(struct vy_scheduler *scheduler) /* Wait for dump to complete. */ while (vy_scheduler_dump_in_progress(scheduler)) { - if (scheduler->is_throttled) { + if (scheduler->is_throttled && + !scheduler->checkpoint_in_progress) { /* Dump error occurred. */ struct error *e = diag_last_error(&scheduler->diag); diag_add_error(diag_get(), e); @@ -692,8 +693,7 @@ vy_scheduler_begin_checkpoint(struct vy_scheduler *scheduler) * point let's ignore it. */ if (scheduler->is_throttled) { - say_info("scheduler is unthrottled due to checkpoint process"); - scheduler->is_throttled = false; + say_info("scheduler is temporary unthrottled due to checkpoint process"); } if (!vy_scheduler_dump_in_progress(scheduler)) { @@ -707,6 +707,7 @@ vy_scheduler_begin_checkpoint(struct vy_scheduler *scheduler) } scheduler->generation++; scheduler->checkpoint_in_progress = true; + scheduler->is_checkpoint_aborted = false; fiber_cond_signal(&scheduler->scheduler_cond); say_info("vinyl checkpoint started"); return 0; @@ -723,7 +724,7 @@ vy_scheduler_wait_checkpoint(struct vy_scheduler *scheduler) * checkpoint started have been dumped. */ while (vy_scheduler_dump_in_progress(scheduler)) { - if (scheduler->is_throttled) { + if (scheduler->is_checkpoint_aborted) { /* A dump error occurred, abort checkpoint. */ struct error *e = diag_last_error(&scheduler->diag); diag_add_error(diag_get(), e); @@ -2104,7 +2105,10 @@ error: scheduler->timeout = inj->dparam; say_warn("throttling scheduler for %.0f second(s)", scheduler->timeout); - scheduler->is_throttled = true; + if (scheduler->checkpoint_in_progress) + scheduler->is_checkpoint_aborted = true; + else + scheduler->is_throttled = true; fiber_sleep(scheduler->timeout); scheduler->is_throttled = false; }