[Tarantool-patches] [PATCH v5 3/5] box/applier: fix nil dereference in applier rollback

Konstantin Osipov kostja.osipov at gmail.com
Wed Feb 5 01:11:10 MSK 2020


* Cyrill Gorcunov <gorcunov at gmail.com> [20/01/28 10:16]:
> +
> +	/*
> +	 * We must not lose the original error; instead,
> +	 * let's keep it in the replicaset diag instance.
> +	 *
> +	 * FIXME: We need to revisit this code and
> +	 * figure out if we can reconnect and retry
> +	 * the replication process instead of cancelling
> +	 * the applier with FiberIsCancelled.

First of all, we're dealing with a regression introduced by the 
parallel applier patch. 

Could you please describe what is triggering the error? 

> +		/*
> +		 * If the information is already lost
> +		 * (say, xlog cleared the diag instance),

I don't understand this comment. How can it be lost exactly?

> +		 * set up a generic ClientError. Seriously,
> +		 * we need to unweave this mess: once an error
> +		 * has happened, it must never be cleared
> +		 * until it is handled in rollback.

:-/

> +		 */
> +		diag_set(ClientError, ER_WAL_IO);
> +		e = diag_last_error(diag_get());
> +	}
> +	diag_add_error(&replicaset.applier.diag, e);
> +
>  	/* Broadcast the rollback event across all appliers. */
>  	trigger_run(&replicaset.applier.on_rollback, event);
>  	/* Rollback applier vclock to the committed one. */
> @@ -849,8 +871,20 @@ applier_on_rollback(struct trigger *trigger, void *event)
>  		diag_add_error(&applier->diag,
>  			       diag_last_error(&replicaset.applier.diag));
>  	}
> -	/* Stop the applier fiber. */
> +
> +	/*
> +	 * Something really bad happened and we can't proceed,
> +	 * so stop the applier and throw a FiberIsCancelled
> +	 * exception, which will be caught by the caller
> +	 * so that the fiber finishes gracefully.
> +	 *
> +	 * FIXME: Need to make sure that this is really a
> +	 * final error where we can no longer proceed and should
> +	 * zap the applier; perhaps we could reconnect and
> +	 * retry instead?
> +	 */
>  	fiber_cancel(applier->reader);

Let's begin by explaining why we need to cancel the reader fiber here.

> +	diag_set(FiberIsCancelled);

This is clearly a crutch: first you make an effort to set
replicaset.applier.diag, and then diag_raise() does not use it.


-- 
Konstantin Osipov, Moscow, Russia

