Tarantool development patches archive
From: Serge Petrenko via Tarantool-patches <tarantool-patches@dev.tarantool.org>
To: v.shpilevoy@tarantool.org, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Subject: [Tarantool-patches] [PATCH v2 1/7] replication: fix a hang on final join retry
Date: Wed, 24 Mar 2021 15:24:11 +0300
Message-ID: <12bf66a77b755eaadc09665ede9fbcde0516a7a4.1616588119.git.sergepetrenko@tarantool.org>
In-Reply-To: <cover.1616588119.git.sergepetrenko@tarantool.org>

Since the introduction of synchronous replication, final join can fail
on the master side when the master cannot gather acks for some tx
around _cluster registration.

A replica receives an error in this case: either ER_SYNC_ROLLBACK or
ER_SYNC_QUORUM_TIMEOUT. Either error makes the applier retry final join,
but in the wrong state, APPLIER_REGISTER, which should be used only on
an anonymous replica. This led to a hang in the fiber executing box.cfg,
because it waited for the APPLIER_JOINED state, which was never entered.

Part-of #5566
---
 src/box/applier.cc | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)
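
The state selection described in the commit message can be sketched in
isolation. This is only an illustration, not code from the tree: the
enum is a simplified stand-in for the real applier states, and
register_state_sequence() and reaches_state() are hypothetical helpers
that exist only in this sketch.

```cpp
#include <vector>

// Simplified stand-in for the applier states named in the patch.
// The real enum in applier.h has more members.
enum applier_state {
	APPLIER_REGISTER,
	APPLIER_REGISTERED,
	APPLIER_FINAL_JOIN,
	APPLIER_JOINED,
	APPLIER_READY,
};

// Hypothetical helper: the states applier_register() passes through
// after the patch, depending on whether the instance was anonymous.
static std::vector<applier_state>
register_state_sequence(bool was_anon)
{
	return {
		was_anon ? APPLIER_REGISTER : APPLIER_FINAL_JOIN,
		was_anon ? APPLIER_REGISTERED : APPLIER_JOINED,
		APPLIER_READY,
	};
}

// Hypothetical helper: would a fiber waiting for 'awaited' ever be
// woken by this sequence of states?
static bool
reaches_state(const std::vector<applier_state> &seq, applier_state awaited)
{
	for (applier_state s : seq) {
		if (s == awaited)
			return true;
	}
	return false;
}
```

With was_anon == false the sequence includes APPLIER_JOINED, so a waiter
such as the fiber executing box.cfg is unblocked. With was_anon == true
it does not, which matches the pre-patch behaviour that caused the hang
on a final join retry.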

diff --git a/src/box/applier.cc b/src/box/applier.cc
index 5a88a013e..326cf18d2 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -551,7 +551,7 @@ applier_wait_register(struct applier *applier, uint64_t row_count)
 }
 
 static void
-applier_register(struct applier *applier)
+applier_register(struct applier *applier, bool was_anon)
 {
 	/* Send REGISTER request */
 	struct ev_io *coio = &applier->io;
@@ -566,9 +566,16 @@ applier_register(struct applier *applier)
 	row.type = IPROTO_REGISTER;
 	coio_write_xrow(coio, &row);
 
-	applier_set_state(applier, APPLIER_REGISTER);
+	/*
+	 * Register may serve as a retry for final join. Set corresponding
+	 * states to unblock anyone who's waiting for final join to start or
+	 * end.
+	 */
+	applier_set_state(applier, was_anon ? APPLIER_REGISTER :
+					      APPLIER_FINAL_JOIN);
 	applier_wait_register(applier, 0);
-	applier_set_state(applier, APPLIER_REGISTERED);
+	applier_set_state(applier, was_anon ? APPLIER_REGISTERED :
+					      APPLIER_JOINED);
 	applier_set_state(applier, APPLIER_READY);
 }
 
@@ -1303,6 +1310,14 @@ applier_f(va_list ap)
 		return -1;
 	session_set_type(session, SESSION_TYPE_APPLIER);
 
+	/*
+	 * The instance saves replication_anon value on bootstrap.
+	 * If a freshly started instance sees it has received
+	 * REPLICASET_UUID but hasn't yet registered, it must be an
+	 * anonymous replica, hence the default value 'true'.
+	 */
+	bool was_anon = true;
+
 	/* Re-connect loop */
 	while (!fiber_is_cancelled()) {
 		try {
@@ -1316,6 +1331,7 @@ applier_f(va_list ap)
 				 * The join will pause the applier
 				 * until WAL is created.
 				 */
+				was_anon = replication_anon;
 				if (replication_anon)
 					applier_fetch_snapshot(applier);
 				else
@@ -1324,11 +1340,10 @@ applier_f(va_list ap)
 			if (instance_id == REPLICA_ID_NIL &&
 			    !replication_anon) {
 				/*
-				 * The instance transitioned
-				 * from anonymous. Register it
-				 * now.
+				 * The instance transitioned from anonymous or
+				 * is retrying final join.
 				 */
-				applier_register(applier);
+				applier_register(applier, was_anon);
 			}
 			applier_subscribe(applier);
 			/*
-- 
2.24.3 (Apple Git-128)

