[Tarantool-patches] [PATCH v2 1/7] replication: fix a hang on final join retry
v.shpilevoy at tarantool.org
Fri Mar 26 23:44:56 MSK 2021
Hi! Thanks for working on this!
> diff --git a/src/box/applier.cc b/src/box/applier.cc
> index 5a88a013e..326cf18d2 100644
> --- a/src/box/applier.cc
> +++ b/src/box/applier.cc
> @@ -566,9 +566,16 @@ applier_register(struct applier *applier)
> row.type = IPROTO_REGISTER;
> coio_write_xrow(coio, &row);
> - applier_set_state(applier, APPLIER_REGISTER);
> + /*
> + * Register may serve as a retry for final join. Set corresponding
> + * states to unblock anyone who's waiting for final join to start or
> + * end.
> + */
> + applier_set_state(applier, was_anon ? APPLIER_REGISTER :
> + APPLIER_FINAL_JOIN);
> applier_wait_register(applier, 0);
> - applier_set_state(applier, APPLIER_REGISTERED);
> + applier_set_state(applier, was_anon ? APPLIER_REGISTERED :
> + APPLIER_JOINED);
> applier_set_state(applier, APPLIER_READY);
Hm. I don't understand. A transition from anon to non-anon leads to
re-creation of all appliers. It calls box_sync_replication() and
creates new struct applier objects. How is it possible that a reader
fiber sees two states within a single lifetime and is not terminated?
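
To make the question concrete, here is a minimal standalone sketch of the
state sequence the hunk above appears to publish, depending on was_anon.
The enum and the register_states() helper are hypothetical stand-ins
written only for illustration; they are not the real definitions from
applier.h, and the sketch does not model fibers or box_sync_replication().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the applier states named in the diff. */
enum applier_state {
	APPLIER_REGISTER,
	APPLIER_REGISTERED,
	APPLIER_FINAL_JOIN,
	APPLIER_JOINED,
	APPLIER_READY,
};

/*
 * Hypothetical helper: fill out[0..2] with the states that the patched
 * applier_register() would set, in order. The choice mirrors the two
 * ternaries in the diff; the last state is unconditional.
 */
static void
register_states(bool was_anon, enum applier_state out[3])
{
	/* State set right after IPROTO_REGISTER is written. */
	out[0] = was_anon ? APPLIER_REGISTER : APPLIER_FINAL_JOIN;
	/* State set after applier_wait_register() returns. */
	out[1] = was_anon ? APPLIER_REGISTERED : APPLIER_JOINED;
	/* Unchanged by the patch. */
	out[2] = APPLIER_READY;
}
```

Under these assumptions a single applier object only ever goes through one
of the two branches, which is why I don't see how one reader fiber could
observe both.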
Also, could you please provide a test? It would probably be easier to
see what is happening then.