[Tarantool-patches] [PATCH] vinyl: unthrottle scheduler on checkpoint

Nikita Pettik korablev at tarantool.org
Mon Apr 27 15:17:01 MSK 2020


On 22 Apr 01:13, Vladislav Shpilevoy wrote:
> Hi! Thanks for the patch!
> 
> >  
> >  	/*
> > -	 * If the scheduler is throttled due to errors, do not wait
> > -	 * until it wakes up as it may take quite a while. Instead
> > -	 * fail checkpoint immediately with the last error seen by
> > -	 * the scheduler.
> > +	 * It makes no sense throttling checkpoint process since
> > +	 * it can be either ran manually or due to timeout. At this
> > +	 * point let's ignore it.
> 
> Kostja's questio is fair. Can it be done for manual snapshots only?
> Automated checkpoints already have problems with killing the disk,
> when multiple instances on the same machine start them at the same
> time. With unthrotled scheduler it is going to get worse.
> 
> However, if this is hard to detect, then it is ok. Just do a quick
> check if it is possible.

Checkpoint daemon uses directly box.snapshot(), so now we can't tell
whether checkpoint is launched manually or automatically. To differ
these scenarious we can make checkpoint daemon call sort of
box.__scheduled_snapshot() (which won't be part of public API ofc).
Then we will be able to pass boolean parameter to begin_checkpoint()
indicating manual/auto mode. Or simply close issue as wont fix :)
 
> >  	 */
> >  	if (scheduler->is_throttled) {
> > -		struct error *e = diag_last_error(&scheduler->diag);
> > -		diag_add_error(diag_get(), e);
> > -		say_error("cannot checkpoint vinyl, "
> > -			  "scheduler is throttled with: %s", e->errmsg);
> > -		return -1;
> > +		say_info("scheduler is unthrottled due to checkpoint process");
> > +		scheduler->is_throttled = false;
> 
> Is there a simple way to let the throttling continue after the
> checkpoint is finished?

Hm, not simple but..For instance, we can introduce another one flag
"is_checkpoint_aborted" alonside with workaround (tests seem to pass):

diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c
index f57f10119..666d0a008 100644
--- a/src/box/vy_scheduler.c
+++ b/src/box/vy_scheduler.c
@@ -615,7 +615,8 @@ vy_scheduler_dump(struct vy_scheduler *scheduler)
 
        /* Wait for dump to complete. */
        while (vy_scheduler_dump_in_progress(scheduler)) {
-               if (scheduler->is_throttled) {
+               if (scheduler->is_throttled &&
+                   !scheduler->checkpoint_in_progress) {
                        /* Dump error occurred. */
                        struct error *e = diag_last_error(&scheduler->diag);
                        diag_add_error(diag_get(), e);
@@ -692,8 +693,7 @@ vy_scheduler_begin_checkpoint(struct vy_scheduler *scheduler)
         * point let's ignore it.
         */
        if (scheduler->is_throttled) {
-               say_info("scheduler is unthrottled due to checkpoint process");
-               scheduler->is_throttled = false;
+               say_info("scheduler is temporary unthrottled due to checkpoint process");
        }
 
        if (!vy_scheduler_dump_in_progress(scheduler)) {
@@ -707,6 +707,7 @@ vy_scheduler_begin_checkpoint(struct vy_scheduler *scheduler)
        }
        scheduler->generation++;
        scheduler->checkpoint_in_progress = true;
+       scheduler->is_checkpoint_aborted = false;
        fiber_cond_signal(&scheduler->scheduler_cond);
        say_info("vinyl checkpoint started");
        return 0;
@@ -723,7 +724,7 @@ vy_scheduler_wait_checkpoint(struct vy_scheduler *scheduler)
         * checkpoint started have been dumped.
         */
        while (vy_scheduler_dump_in_progress(scheduler)) {
-               if (scheduler->is_throttled) {
+               if (scheduler->is_checkpoint_aborted) {
                        /* A dump error occurred, abort checkpoint. */
                        struct error *e = diag_last_error(&scheduler->diag);
                        diag_add_error(diag_get(), e);
@@ -2104,7 +2105,10 @@ error:
                        scheduler->timeout = inj->dparam;
                say_warn("throttling scheduler for %.0f second(s)",
                         scheduler->timeout);
-               scheduler->is_throttled = true;
+               if (scheduler->checkpoint_in_progress)
+                       scheduler->is_checkpoint_aborted = true;
+               else
+                       scheduler->is_throttled = true;
                fiber_sleep(scheduler->timeout);
                scheduler->is_throttled = false;
        }


More information about the Tarantool-patches mailing list