[Tarantool-patches] [PATCH v2 1/7] replication: fix a hang on final join retry

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Fri Mar 26 23:44:56 MSK 2021


Hi! Thanks for working on this!

> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 5a88a013e..326cf18d2 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -566,9 +566,16 @@ applier_register(struct applier *applier)
>  	row.type = IPROTO_REGISTER;
>  	coio_write_xrow(coio, &row);
>  
> -	applier_set_state(applier, APPLIER_REGISTER);
> +	/*
> +	 * Register may serve as a retry for final join. Set corresponding
> +	 * states to unblock anyone who's waiting for final join to start or
> +	 * end.
> +	 */
> +	applier_set_state(applier, was_anon ? APPLIER_REGISTER :
> +					      APPLIER_FINAL_JOIN);
>  	applier_wait_register(applier, 0);
> -	applier_set_state(applier, APPLIER_REGISTERED);
> +	applier_set_state(applier, was_anon ? APPLIER_REGISTERED :
> +					      APPLIER_JOINED);
>  	applier_set_state(applier, APPLIER_READY);

Hm, I don't understand. A transition from anon to non-anon re-creates
all appliers: it calls box_sync_replication() and creates new struct
applier objects. How can a reader fiber see both states within a single
lifetime without being terminated?
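
To spell out the state choice in the hunk above, here is a minimal
standalone sketch. It is not Tarantool code: the enum is a hypothetical
subset that only mirrors the state names in the diff, and
register_states_for() is an invented helper that models which pair of
states one call to applier_register() would announce.

```c
#include <stdbool.h>

/* Hypothetical subset of applier states; names mirror the quoted diff. */
enum applier_state {
	APPLIER_REGISTER,
	APPLIER_REGISTERED,
	APPLIER_FINAL_JOIN,
	APPLIER_JOINED,
	APPLIER_READY,
};

/* States announced before and after the wait in the patched function. */
struct register_states {
	enum applier_state before_wait;
	enum applier_state after_wait;
};

/*
 * Invented helper: one call picks exactly one pair, depending on
 * whether the instance was anonymous when the call started.
 */
static struct register_states
register_states_for(bool was_anon)
{
	struct register_states s;
	if (was_anon) {
		s.before_wait = APPLIER_REGISTER;
		s.after_wait = APPLIER_REGISTERED;
	} else {
		s.before_wait = APPLIER_FINAL_JOIN;
		s.after_wait = APPLIER_JOINED;
	}
	return s;
}
```

In this model a single call only ever reports one of the two pairs, so
the question is which code path reaches the non-anon branch on an
applier object that survives the transition.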

Also, could you please provide a test? It would probably make it easier
to see what is happening.

