[Tarantool-patches] [PATCH] vinyl: unthrottle scheduler on checkpoint

Konstantin Osipov kostja.osipov at gmail.com
Wed Apr 22 03:32:28 MSK 2020


* Vladislav Shpilevoy <v.shpilevoy at tarantool.org> [20/04/22 02:17]:

Vinyl dump is based on memtable state, not a schedule.
If each dump is unthrottling the scheduler, then what's the point
of throttling? Having a nice infinite loop on ENOSPC?
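
To make the concern concrete, here is a toy model, not Tarantool code.
Every name in it is hypothetical. It assumes a permanently full disk,
a memtable that asks for a dump on every tick, and a throttle lasting
10 ticks after a failure. It counts failed dump attempts with and
without the "unthrottle on every dump" behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not Tarantool's API. */
struct toy_scheduler {
	bool is_throttled;
	int throttle_left;   /* ticks until the throttle expires */
	int failed_attempts; /* dump attempts that hit the simulated ENOSPC */
};

enum { TOY_THROTTLE_TICKS = 10 };

/* One scheduler tick on a disk that is assumed to be permanently full. */
static void
toy_tick(struct toy_scheduler *s, bool dump_unthrottles)
{
	/* The behaviour in question: a pending dump clears the throttle. */
	if (s->is_throttled && dump_unthrottles)
		s->is_throttled = false;
	if (s->is_throttled) {
		/* Still backing off: just count down. */
		if (--s->throttle_left == 0)
			s->is_throttled = false;
		return;
	}
	/* Dump attempt: always fails here, so the scheduler throttles again. */
	s->failed_attempts++;
	s->is_throttled = true;
	s->throttle_left = TOY_THROTTLE_TICKS;
}

/* Run the model for a number of ticks and return the failed attempts. */
static int
toy_count_failures(int ticks, bool dump_unthrottles)
{
	assert(ticks >= 0);
	struct toy_scheduler s = { false, 0, 0 };
	for (int i = 0; i < ticks; i++)
		toy_tick(&s, dump_unthrottles);
	return s.failed_attempts;
}
```

In this model, 100 ticks produce 100 failed attempts when each dump
unthrottles the scheduler, and 10 when the throttle is respected. The
real scheduler's timing differs; the sketch only shows the shape of
the problem.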

> Hi! Thanks for the patch!
> 
> On 31/03/2020 16:42, Nikita Pettik wrote:
> > Before this patch box.snapshot() bails out immediately if it sees that
> > the scheduler is throttled due to errors. For instance:
> > 
> > box.error.injection.set('ERRINJ_VY_RUN_WRITE', true)
> > snapshot() -- fails due to ERRINJ_VY_RUN_WRITE
> > box.error.injection.set('ERRINJ_VY_RUN_WRITE', false)
> > snapshot() -- still fails despite the fact that injected error is unset
> > 
> > As a result, one has to wait up to a minute to make a snapshot. The
> > reason why throttling was introduced was to avoid flooding the log
> > in case of repeating disk errors.
> > On the other hand, checkpoint process is either called manually or on
> > schedule. What is more, to deal with schedule throttling in tests, we
> > had to introduce a new error injection (ERRINJ_VY_SCHED_TIMEOUT).
> > It reduces time duration during which the scheduler remains throttled,
> > which is ugly and race prone.
> > So, let's unthrottle scheduler when checkpoint process is launched.
> > 
> > Closes #3519
> > ---
> > Note that VY_SCHED_TIMEOUT error injection is not completely removed
> > from tests, since at least one of them fails (instance crashes) without
> > it (vinyl/errinj_stat.test.lua). It's not a problem of this patch -
> > reproducer is extracted and filed in gh-4821.
> > 
> > Branch: https://github.com/tarantool/tarantool/tree/np/gh-3519-unthrottle-sched
> > Issue: https://github.com/tarantool/tarantool/issues/3519
> > 
> >  src/box/vy_scheduler.c           | 14 +++++---------
> >  test/box/errinj.result           |  8 --------
> >  test/box/errinj.test.lua         |  2 --
> >  test/vinyl/errinj.result         | 12 ++----------
> >  test/vinyl/errinj.test.lua       |  5 +----
> >  test/vinyl/errinj_vylog.result   | 14 --------------
> >  test/vinyl/errinj_vylog.test.lua |  4 ----
> >  7 files changed, 8 insertions(+), 51 deletions(-)
> > 
> > diff --git a/src/box/vy_scheduler.c b/src/box/vy_scheduler.c
> > index 9dba93d34..f57f10119 100644
> > --- a/src/box/vy_scheduler.c
> > +++ b/src/box/vy_scheduler.c
> > @@ -687,17 +687,13 @@ vy_scheduler_begin_checkpoint(struct vy_scheduler *scheduler)
> >  	assert(!scheduler->checkpoint_in_progress);
> >  
> >  	/*
> > -	 * If the scheduler is throttled due to errors, do not wait
> > -	 * until it wakes up as it may take quite a while. Instead
> > -	 * fail checkpoint immediately with the last error seen by
> > -	 * the scheduler.
> > +	 * It makes no sense throttling checkpoint process since
> > +	 * it can be either ran manually or due to timeout. At this
> > +	 * point let's ignore it.
> 
> Kostja's question is fair. Can it be done for manual snapshots only?
> Automated checkpoints already have problems with killing the disk,
> when multiple instances on the same machine start them at the same
> time. With an unthrottled scheduler it is going to get worse.
> 
> However, if this is hard to detect, then it is ok. Just do a quick
> check if it is possible.
> 
> >  	 */
> >  	if (scheduler->is_throttled) {
> > -		struct error *e = diag_last_error(&scheduler->diag);
> > -		diag_add_error(diag_get(), e);
> > -		say_error("cannot checkpoint vinyl, "
> > -			  "scheduler is throttled with: %s", e->errmsg);
> > -		return -1;
> > +		say_info("scheduler is unthrottled due to checkpoint process");
> > +		scheduler->is_throttled = false;
> 
> Is there a simple way to let the throttling continue after the
> checkpoint is finished?
> 
> >  	}
> >  
> >  	if (!vy_scheduler_dump_in_progress(scheduler)) {

-- 
Konstantin Osipov, Moscow, Russia

