From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: v.shpilevoy@tarantool.org, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Subject: [Tarantool-patches] [PATCH v2 1/7] replication: fix a hang on final join retry
Date: Wed, 24 Mar 2021 15:24:11 +0300
Message-ID: <12bf66a77b755eaadc09665ede9fbcde0516a7a4.1616588119.git.sergepetrenko@tarantool.org>
In-Reply-To: <cover.1616588119.git.sergepetrenko@tarantool.org>

Since the introduction of synchronous replication, final join may fail
on the master side when the master cannot gather enough acks for a
synchronous transaction around the _cluster registration.

In this case the replica receives an error: either ER_SYNC_ROLLBACK or
ER_SYNC_QUORUM_TIMEOUT. The error makes the applier retry final join,
but with the wrong state, APPLIER_REGISTER, which should only be used
on an anonymous replica. This led to a hang in the fiber executing
box.cfg, because it waited for the APPLIER_JOINED state, which was
never entered.

Part-of #5566
---
 src/box/applier.cc | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 5a88a013e..326cf18d2 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -551,7 +551,7 @@ applier_wait_register(struct applier *applier, uint64_t row_count)
 }
 
 static void
-applier_register(struct applier *applier)
+applier_register(struct applier *applier, bool was_anon)
 {
 	/* Send REGISTER request */
 	struct ev_io *coio = &applier->io;
@@ -566,9 +566,16 @@ applier_register(struct applier *applier)
 	row.type = IPROTO_REGISTER;
 	coio_write_xrow(coio, &row);
 
-	applier_set_state(applier, APPLIER_REGISTER);
+	/*
+	 * Register may serve as a retry for final join. Set corresponding
+	 * states to unblock anyone who's waiting for final join to start or
+	 * end.
+	 */
+	applier_set_state(applier, was_anon ? APPLIER_REGISTER :
+					      APPLIER_FINAL_JOIN);
 	applier_wait_register(applier, 0);
-	applier_set_state(applier, APPLIER_REGISTERED);
+	applier_set_state(applier, was_anon ? APPLIER_REGISTERED :
+					      APPLIER_JOINED);
 	applier_set_state(applier, APPLIER_READY);
 }
 
@@ -1303,6 +1310,14 @@ applier_f(va_list ap)
 		return -1;
 	session_set_type(session, SESSION_TYPE_APPLIER);
 
+	/*
+	 * The instance saves replication_anon value on bootstrap.
+	 * If a freshly started instance sees it has received
+	 * REPLICASET_UUID but hasn't yet registered, it must be an
+	 * anonymous replica, hence the default value 'true'.
+	 */
+	bool was_anon = true;
+
 	/* Re-connect loop */
 	while (!fiber_is_cancelled()) {
 		try {
@@ -1316,6 +1331,7 @@ applier_f(va_list ap)
 				 * The join will pause the applier
 				 * until WAL is created.
 				 */
+				was_anon = replication_anon;
 				if (replication_anon)
 					applier_fetch_snapshot(applier);
 				else
@@ -1324,11 +1340,10 @@ applier_f(va_list ap)
 			if (instance_id == REPLICA_ID_NIL &&
 			    !replication_anon) {
 				/*
-				 * The instance transitioned
-				 * from anonymous. Register it
-				 * now.
+				 * The instance transitioned from anonymous or
+				 * is retrying final join.
 				 */
-				applier_register(applier);
+				applier_register(applier, was_anon);
 			}
 			applier_subscribe(applier);
 			/*
-- 
2.24.3 (Apple Git-128)