[Tarantool-patches] [PATCH v11 7/8] box/applier: prevent nil dereference on applier rollback

Serge Petrenko sergepetrenko at tarantool.org
Tue Apr 7 13:36:45 MSK 2020


Hi! Thanks for the patch.

> On Apr 4, 2020, at 19:15, Cyrill Gorcunov <gorcunov at gmail.com> wrote:
> 
> Currently, when a transaction rollback happens, we drop the existing
> error and set a ClientError in replicaset.applier.diag. This leaves the
> current fiber with diag=nil, which in turn leads to a SIGSEGV once
> diag_raise() is called right after applier_apply_tx():
> 
> | applier_f
> |   try {
> |   applier_subscribe
> |     applier_apply_tx
> |       // error happens
> |       txn_rollback
> |         diag_set(ClientError, ER_WAL_IO)
> |         diag_move(&fiber()->diag, &replicaset.applier.diag)
> |         // fiber->diag = nil
> |       applier_on_rollback
> |         diag_add_error(&applier->diag, diag_last_error(&replicaset.applier.diag))
> |         fiber_cancel(applier->reader);
> |     diag_raise() -> NULL dereference
> |   } catch { ... }
> 
> Thus:
> - use diag_set_error() instead of diag_move() so the error is not
>   dropped from the current fiber(), which prevents the nil dereference;
> - put a FIXME mark into the code: we need to rework it in a
>   more sensible way.
> 
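
For readers following the thread, here is a minimal sketch of why the swap matters. It uses toy stand-ins (ToyError, ToyDiag, toy_diag_move, toy_diag_set_error are hypothetical names, not the real Tarantool types or functions) and assumes only what the commit message states: a move empties the source diag, while setting the error by reference leaves the source intact.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy stand-ins, NOT the real Tarantool structures: a diag area holds
// at most one reference-counted error.
struct ToyError {
    std::string msg;
};

struct ToyDiag {
    std::shared_ptr<ToyError> last;
};

// Models the old behavior: the error is transferred, the source is emptied.
void toy_diag_move(ToyDiag &from, ToyDiag &to) {
    to.last = std::move(from.last);
    from.last.reset();  // explicit, to make the "fiber diag = nil" step visible
}

// Models the new behavior: the destination takes another reference,
// the source keeps its error.
void toy_diag_set_error(ToyDiag &to, const std::shared_ptr<ToyError> &e) {
    assert(e != nullptr);
    to.last = e;
}

// Rollback path with a move: returns whether the fiber diag still has an error.
bool fiber_diag_survives_move() {
    ToyDiag fiber_diag, shared_diag;
    fiber_diag.last = std::make_shared<ToyError>(ToyError{"ER_WAL_IO"});
    toy_diag_move(fiber_diag, shared_diag);
    return fiber_diag.last != nullptr;
}

// Rollback path with set-by-reference: both areas end up holding the error.
bool fiber_diag_survives_set_error() {
    ToyDiag fiber_diag, shared_diag;
    fiber_diag.last = std::make_shared<ToyError>(ToyError{"ER_WAL_IO"});
    toy_diag_set_error(shared_diag, fiber_diag.last);
    return fiber_diag.last != nullptr && shared_diag.last == fiber_diag.last;
}
```

In this model fiber_diag_survives_move() returns false, which corresponds to the state in which a later raise has nothing to dereference, while fiber_diag_survives_set_error() returns true.
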
> Fixes #4730
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov at gmail.com>
> ---
> src/box/applier.cc | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 2f9c9c797..68de3c08c 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -692,9 +692,22 @@ static int
> applier_txn_rollback_cb(struct trigger *trigger, void *event)
> {
> 	(void) trigger;
> -	/* Setup shared applier diagnostic area. */
> +
> +	/*
> +	 * Setup shared applier diagnostic area.
> +	 *
> 	 * FIXME: We should consider redesigning this
> 	 * and, instead of carrying one shared diag,
> 	 * use the per-applier diag all the time
> 	 * (which is actually already present in the structure).
> 	 *
> 	 * But remember that transactions are asynchronous
> 	 * and a rollback may happen much later, after the
> 	 * transaction is passed to the journal engine.
> +	 */
> 	diag_set(ClientError, ER_WAL_IO);
> -	diag_move(&fiber()->diag, &replicaset.applier.diag);
> +	diag_set_error(&replicaset.applier.diag,
> +		       diag_last_error(diag_get()));
> 
> 	/* Broadcast the rollback event across all appliers. */
> 	trigger_run(&replicaset.applier.on_rollback, event);
> -- 
> 2.20.1
> 

LGTM.


--
Serge Petrenko
sergepetrenko at tarantool.org



