[Tarantool-patches] [PATCH v5 3/5] box/applier: fix nil dereference in applier rollback
Konstantin Osipov
kostja.osipov at gmail.com
Wed Feb 5 01:11:10 MSK 2020
* Cyrill Gorcunov <gorcunov at gmail.com> [20/01/28 10:16]:
> +
> + /*
> + * We must not lose the original error, instead
> + * let's keep it in replicaset diag instance.
> + *
> + * FIXME: We need to revisit this code and
> + * figure out if we can reconnect and retry
> + * the replication process instead of cancelling
> + * applier with FiberIsCancelled.
First of all, we're dealing with a regression introduced by the
parallel applier patch.
Could you please describe what is triggering the error?
> + /*
> + * If information is already lost
> + * (say xlog cleared diag instance)
I don't understand this comment. How can it be lost exactly?
> + * setup general ClientError, seriously
> + * we need to unweave this mess, if error
> + * happened it must never be cleared
> + * until error handling in rollback.
:-/
> + */
> + diag_set(ClientError, ER_WAL_IO);
> + e = diag_last_error(diag_get());
> + }
> + diag_add_error(&replicaset.applier.diag, e);
> +
> /* Broadcast the rollback event across all appliers. */
> trigger_run(&replicaset.applier.on_rollback, event);
> /* Rollback applier vclock to the committed one. */
> @@ -849,8 +871,20 @@ applier_on_rollback(struct trigger *trigger, void *event)
> diag_add_error(&applier->diag,
> diag_last_error(&replicaset.applier.diag));
> }
> - /* Stop the applier fiber. */
> +
> + /*
> + * Something really bad happened, we can't proceed
> + * thus stop the applier and throw FiberIsCancelled
> + * exception which will be caught by the caller
> + * and the fiber gracefully finishes.
> + *
> + * FIXME: Need to make sure that this is a really
> + * final error where we can no longer proceed and should
> + * zap the applier, probably we could reconnect and
> + * retry instead?
> + */
> fiber_cancel(applier->reader);
Let's begin by explaining why we need to cancel the reader fiber here.
> + diag_set(FiberIsCancelled);
This is clearly a crutch: first you make an effort to set
replicaset.applier.diag and then it is not used by diag_raise().
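
To illustrate the concern, here is a minimal sketch with a toy model, not
Tarantool's actual diag API. All names in it (toy_diag, toy_diag_set,
toy_diag_is, toy_rollback_then_raise) are hypothetical. It assumes that a
raise reports only the last error of the fiber-local area, so an error
saved in a separate shared area never reaches the caller:

```c
#include <string.h>

/* Toy stand-in for a diagnostics area: remembers only the last error. */
struct toy_diag {
	const char *last;
};

/* Hypothetical analogue of setting an error: overwrites the last one. */
static void
toy_diag_set(struct toy_diag *d, const char *err)
{
	d->last = err;
}

/* Returns 1 if the area currently holds the named error, else 0. */
static int
toy_diag_is(const struct toy_diag *d, const char *name)
{
	return d->last != NULL && strcmp(d->last, name) == 0;
}

/*
 * Models the flow under review: the original error is saved in a
 * shared area, then the fiber-local area is overwritten with
 * "FiberIsCancelled". The return value is what a raise that reads
 * only the fiber-local area would report.
 */
static const char *
toy_rollback_then_raise(struct toy_diag *shared, struct toy_diag *fiber_local)
{
	/* Step 1: keep the original error in the shared area. */
	toy_diag_set(shared, fiber_local->last);
	/* Step 2: overwrite the fiber-local error. */
	toy_diag_set(fiber_local, "FiberIsCancelled");
	/* Step 3: the raise sees only the fiber-local area. */
	return fiber_local->last;
}
```

In this model the saved error stays in the shared area but is never the
one raised, so somebody has to read the shared area explicitly for it to
be of any use.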
--
Konstantin Osipov, Moscow, Russia