From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH v2 1/7] replication: fix a hang on final join retry
Date: Sat, 27 Mar 2021 19:52:26 +0300
Message-ID: <a970ecab-3f50-4688-dd0e-842524c89c1c@tarantool.org>
In-Reply-To: <2ab0844e-84b4-6701-15ab-652ab6f18075@tarantool.org>

26.03.2021 23:44, Vladislav Shpilevoy wrote:
> Hi! Thanks for working on this!
>
>> diff --git a/src/box/applier.cc b/src/box/applier.cc
>> index 5a88a013e..326cf18d2 100644
>> --- a/src/box/applier.cc
>> +++ b/src/box/applier.cc
>> @@ -566,9 +566,16 @@ applier_register(struct applier *applier)
>>          row.type = IPROTO_REGISTER;
>>          coio_write_xrow(coio, &row);
>>
>> -        applier_set_state(applier, APPLIER_REGISTER);
>> +        /*
>> +         * Register may serve as a retry for final join. Set corresponding
>> +         * states to unblock anyone who's waiting for final join to start or
>> +         * end.
>> +         */
>> +        applier_set_state(applier, was_anon ? APPLIER_REGISTER :
>> +                                              APPLIER_FINAL_JOIN);
>>          applier_wait_register(applier, 0);
>> -        applier_set_state(applier, APPLIER_REGISTERED);
>> +        applier_set_state(applier, was_anon ? APPLIER_REGISTERED :
>> +                                              APPLIER_JOINED);
>>          applier_set_state(applier, APPLIER_READY);
> Hm. I don't understand. Transition from anon to non-anon leads to
> re-creation of all appliers. It calls box_sync_replication() and
> creates new struct applier objects. How is it possible that during one
> life of a reader fiber it manages to see 2 states and is not terminated?

You're correct. It isn't possible for an applier to see both states,
anon and non-anon.

The flag is still needed, though, for the case when a normal replica
receives a transient error during final join. In this case the applier
reconnects and proceeds to the next iteration of the applier loop.
First it checks whether REPLICASET_UUID is nil. It isn't, because
initial join succeeded. Then it checks whether instance_id is 0. It is,
because final join failed. So the applier assumes that the replica was
anonymous and tries to register.

The hang I'm talking about is in `bootstrap_from_master()`. It waits
until the applier enters the APPLIER_JOINED state, which never happened
before this patch.

So `was_anon` comes into play only when final join fails and is retried.

>
> Also could you please provide a test? Maybe it would be easier to see
> what is happening then.

Ok. I'm not sure this test is needed, because this case is implicitly
tested in the gh-5566-final-join-synchro test. A test would be as
follows:

master:
    box.cfg{listen=3301, replication_synchro_quorum=10}
    box.space._cluster:alter{is_sync=true}
    box.schema.user.grant("guest", "replication")

replica:
    box.cfg{replication=3301}

master: wait until the replica receives ER_SYNC_QUORUM_TIMEOUT, and then:
    box.cfg{replication_synchro_quorum=1}

This test passes on the branch, meaning the replica's box.cfg completes
successfully, but it would hang indefinitely without this commit.

--
Serge Petrenko