[Tarantool-patches] [PATCH v10 3/4] box/applier: prevent nil dereference on applier rollback

Cyrill Gorcunov gorcunov at gmail.com
Sat Feb 15 21:24:59 MSK 2020


On Sat, Feb 15, 2020 at 08:37:41PM +0300, Konstantin Osipov wrote:
> * Cyrill Gorcunov <gorcunov at gmail.com> [20/02/14 17:06]:
> > +	/*
> > +	 * FIXME: Do not clear fiber()->diag since it
> > +	 * cause nil dereference
> > +	 *
> > +	 *   applier_subscribe
> > +	 *     applier_apply_tx
> > +	 *       diag_raise
> > +	 *
> > +	 * In turn we need to redesign this code:
> > +	 *  - preserve original error or log it somewhere
> > +	 *  - make the error path more clear
> > +	 *
> > +	 *  We must never reach this point with clean diag
> > +	 *  area, if we do it means we're simply screwed
> > +	 *  somewhere and there is a bug.
> 
> I think this comment is obsolete now with the fix below.

The thing is that I once saw an error path which may end
up here without setting diag (well, to be honest, I can't
plot a call graph right now, but it was related to our
"light" salad helpers and [probably] bptree). That's why
I didn't want to put an assert here. But if you prefer, sure,
I can update the comment and add an assert.
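
To make the trade-off concrete, here is a rough sketch of the "assert in
debug mode, warn otherwise" variant. The sketch_* types and helpers are
hypothetical stand-ins so the snippet compiles on its own; they are not the
real diag API, and the real code would use diag_log() and
say_warn_ratelimited() rather than fprintf().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the diag area; NOT the real Tarantool API. */
struct sketch_error {
	const char *msg;
};

struct sketch_diag {
	struct sketch_error *last;
};

static bool
sketch_diag_is_empty(const struct sketch_diag *diag)
{
	return diag->last == NULL;
}

/*
 * Log the error that caused the rollback.
 *
 * Debug builds abort on an empty diag, since that would be a bug
 * somewhere up the call chain. Builds with NDEBUG compile the
 * assert out and only print a warning.
 *
 * Returns true if an error was logged, false if diag was empty.
 */
static bool
sketch_rollback_log(const struct sketch_diag *diag)
{
	assert(!sketch_diag_is_empty(diag));
	if (sketch_diag_is_empty(diag)) {
		fprintf(stderr, "rollback: empty diag\n");
		return false;
	}
	fprintf(stderr, "rollback: %s\n", diag->last->msg);
	return true;
}
```

With this shape the unexpected path is loud in debug builds while release
builds keep running and leave a trace in the log.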

> 
> You no longer clear fiber->diag.
> 
> > +	 */
> > +
> > +	if (!diag_is_empty(diag_get()))
> > +		diag_log();
> > +	else
> > +		say_warn_ratelimited("applier_txn_rollback_cb: empty diag");
> 
> You can also add assert here for debug mode. It should never
> happen.
> 
> > +
> >  	/* Setup shared applier diagnostic area. */
> >  	diag_set(ClientError, ER_WAL_IO);
> > -	diag_move(&fiber()->diag, &replicaset.applier.diag);
> > +	diag_add_error(&replicaset.applier.diag,
> > +		       diag_last_error(&fiber()->diag));
> > +
> 
> It would be nice to explain in the comment why you want to
> preserve the original error in the fiber here: because when later
> this fiber is joined in (add call site here), we may want to check
> its diagnostics area.

You know, I think we might need to drop replicaset.diag completely
and use applier.diag instead. But again, Kostya, I don't fully
understand this code yet, so I need more time to come up with a
clean solution (which I plan to implement on top of this series).
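
For reference, the difference between the two calls in the hunk above, as I
understand it, is roughly "move" versus "share". A minimal sketch with
hypothetical sketch_* stand-ins (not the real diag implementation, and with
no actual freeing of errors):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real Tarantool diag implementation. */
struct sketch_error {
	int refs;
	const char *msg;
};

struct sketch_diag {
	struct sketch_error *last;
};

/* Drop the diag's reference to its last error, if any. */
static void
sketch_diag_clear(struct sketch_diag *diag)
{
	if (diag->last != NULL) {
		diag->last->refs--;
		diag->last = NULL;
	}
}

/* Move semantics: the source diag is left empty afterwards. */
static void
sketch_diag_move(struct sketch_diag *from, struct sketch_diag *to)
{
	sketch_diag_clear(to);
	to->last = from->last;
	from->last = NULL;
}

/* Share semantics: the caller keeps its own reference to the error. */
static void
sketch_diag_share(struct sketch_diag *to, struct sketch_error *e)
{
	assert(e != NULL);
	e->refs++;
	sketch_diag_clear(to);
	to->last = e;
}
```

After a move the fiber's diag is empty, so a later raise from that fiber has
nothing to work with. After a share both diag areas point at the same error.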

> 
> >  	/* Broadcast the rollback event across all appliers. */
> >  	trigger_run(&replicaset.applier.on_rollback, event);
> >  	/* Rollback applier vclock to the committed one. */
> 
> -- 
> Konstantin Osipov, Moscow, Russia
> 

	Cyrill

