[Tarantool-patches] [PATCH v2 1/7] replication: fix a hang on final join retry
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Fri Mar 26 23:44:56 MSK 2021

Hi! Thanks for working on this!
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 5a88a013e..326cf18d2 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -566,9 +566,16 @@ applier_register(struct applier *applier)
>  	row.type = IPROTO_REGISTER;
>  	coio_write_xrow(coio, &row);
>  
> -	applier_set_state(applier, APPLIER_REGISTER);
> +	/*
> +	 * Register may serve as a retry for final join. Set corresponding
> +	 * states to unblock anyone who's waiting for final join to start or
> +	 * end.
> +	 */
> +	applier_set_state(applier, was_anon ? APPLIER_REGISTER :
> +					      APPLIER_FINAL_JOIN);
>  	applier_wait_register(applier, 0);
> -	applier_set_state(applier, APPLIER_REGISTERED);
> +	applier_set_state(applier, was_anon ? APPLIER_REGISTERED :
> +					      APPLIER_JOINED);
>  	applier_set_state(applier, APPLIER_READY);
Hm, I don't understand. A transition from anon to non-anon leads to
re-creation of all appliers. It calls box_sync_replication() and
creates new struct applier objects. How can a single reader fiber,
within one lifetime, manage to see two states and not be terminated?

Also, could you please provide a test? It would probably make it easier
to see what is happening.
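
To check that I read the hunk correctly, here is the state selection it
introduces, reduced to a standalone sketch. ToyState, ToyTransition and
toy_register_states() are hypothetical names made up for illustration;
they are not the real applier API, and the sketch only mirrors the two
ternaries in the diff, nothing about how the fibers actually behave.

```cpp
#include <cassert>

// Toy stand-ins for the applier states named in the hunk (assumption:
// not Tarantool's real enum, just labels for this sketch).
enum class ToyState { Register, Registered, FinalJoin, Joined, Ready };

// The pair of states set around applier_wait_register() in the patch.
struct ToyTransition {
	ToyState before_wait; // set before waiting for the register to finish
	ToyState after_wait;  // set once the wait returns
};

// Mirrors the two ternaries on was_anon from the diff: an applier that
// was anonymous reports REGISTER/REGISTERED, otherwise the register is
// treated as a final join retry and reports FINAL_JOIN/JOINED.
constexpr ToyTransition
toy_register_states(bool was_anon)
{
	return was_anon ?
	       ToyTransition{ToyState::Register, ToyState::Registered} :
	       ToyTransition{ToyState::FinalJoin, ToyState::Joined};
}
```

In both branches the patched code then sets APPLIER_READY. My question
above is about whether the non-anon branch can be reached by the same
fiber at all.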