[Tarantool-patches] [PATCH v7 3/5] box/applier: fix nil dereference in applier rollback

Cyrill Gorcunov gorcunov at gmail.com
Wed Feb 5 10:33:36 MSK 2020


On Wed, Feb 05, 2020 at 01:19:02AM +0300, Konstantin Osipov wrote:
> * Cyrill Gorcunov <gorcunov at gmail.com> [20/01/28 22:58]:
> > +	/*
> > +	 * Something really bad happened, we can't proceed
> > +	 * thus stop the applier and throw FiberIsCancelled
> > +	 * exception which will be catched by the caller
> > +	 * and the fiber gracefully finish.
> > +	 *
> > +	 * FIXME: Need to make sure that this is a really
> > +	 * final error where we can't longer proceed and should
> > +	 * zap the applier, probably we could reconnect and
> > +	 * retry instead?
> > +	 */
> >  	fiber_cancel(applier->reader);
> > +	diag_set(FiberIsCancelled);
> 
> Now that I have seen the entire series and the test case, I think
> there are two different issues here and only one of them is
> leading to a crash.
> 
> One, is that we reset the original error with some vague
> ER_WAL_IO. This shouldn't happen, but it's harmless. Still, your
> fix for it is good. I don't think though it should check for !e.
> 
> Two, is a failure to propagate the error from the cancelled
> applier fiber. 
> 
> I think it's better to fix the call site to raise
> replicaset.applier.diag than to add a yet another vague error
> (fiberiscancelled) topped with prolific comments about how
> the current code is broken.

OK, thanks a huge, Kostya! I read all your mails about this
series, and really appreciate the feedback! I'll rework the
patchset.


More information about the Tarantool-patches mailing list