Tarantool development patches archive
From: Serge Petrenko <sergepetrenko@tarantool.org>
To: v.shpilevoy@tarantool.org, gorcunov@gmail.com
Cc: tarantool-patches@dev.tarantool.org
Subject: [Tarantool-patches] [PATCH v2 6/6] txn_limbo: ignore CONFIRM/ROLLBACK for a foreign master
Date: Wed, 23 Dec 2020 14:59:24 +0300
Message-ID: <b21832c41b36f61f98a59f39a4a26abf2d449654.1608724239.git.sergepetrenko@tarantool.org>
In-Reply-To: <cover.1608724238.git.sergepetrenko@tarantool.org>

We designed limbo so that it errors on receiving a CONFIRM or ROLLBACK
for another instance's data. Actually, this error is pointless, and even
harmful. Here's why:

Imagine you have 3 instances: 1, 2 and 3.
First, 1 writes some synchronous transactions, but dies before writing
CONFIRM.

Now 2 has to write CONFIRM instead of 1 to take limbo ownership.
From now on 2 is the limbo owner, and under high enough load it
constantly has some data in the limbo.

Once 1 restarts, it first recovers its xlogs and fills its limbo with
its own unconfirmed transactions from the previous run. Then replication
between 1, 2 and 3 starts, and the first thing 1 sees is that 2 and 3
ack its old transactions. So 1 writes CONFIRM for its own transactions
even before the same CONFIRM written by 2 reaches it.
Once the CONFIRM written by 1 is replicated to 2 and 3, they error and
stop replication, since their limbo contains entries from 2, not from 1.
Actually, there's no need to error, since it's just a really old CONFIRM
which has already been processed by both 2 and 3.

So, ignore CONFIRM/ROLLBACK when it references a wrong limbo owner.

The issue was discovered with test replication/election_qsync_stress.

Follow-up #5435
---
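
For illustration only (not part of the patch): a minimal standalone
sketch of the behaviour described in the commit message. Every
`sketch_*` name below is hypothetical, and the limbo is reduced to a
range of pending LSNs, which is an assumption made for brevity rather
than the real txn_limbo layout. Unlike the patched txn_limbo_process(),
which returns void, the sketch returns a bool so the ignored case can be
observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real request and limbo. */
enum sketch_req_type { SKETCH_CONFIRM, SKETCH_ROLLBACK };

struct sketch_request {
	enum sketch_req_type type;
	/* Instance whose transactions the request refers to. */
	uint32_t replica_id;
	int64_t lsn;
};

struct sketch_limbo {
	/* Instance whose transactions are currently queued. */
	uint32_t owner_id;
	/* Pending entries are modelled as the LSN range [first, last]. */
	int64_t first_lsn;
	int64_t last_lsn;
};

/* Number of entries still waiting in the sketch limbo. */
static int64_t
sketch_limbo_size(const struct sketch_limbo *limbo)
{
	if (limbo->last_lsn < limbo->first_lsn)
		return 0;
	return limbo->last_lsn - limbo->first_lsn + 1;
}

/*
 * Apply a CONFIRM/ROLLBACK to the sketch limbo.
 * Returns false when the request was ignored because it refers to
 * a foreign master, true when it was processed.
 */
static bool
sketch_limbo_process(struct sketch_limbo *limbo,
		     const struct sketch_request *req)
{
	if (req->replica_id != limbo->owner_id) {
		/* Outdated request from an old leader: no error, skip. */
		return false;
	}
	switch (req->type) {
	case SKETCH_CONFIRM:
		/* Assumed model: confirm everything up to req->lsn. */
		if (req->lsn >= limbo->first_lsn)
			limbo->first_lsn = req->lsn + 1;
		break;
	case SKETCH_ROLLBACK:
		/* Assumed model: roll back everything from req->lsn on. */
		if (req->lsn <= limbo->last_lsn)
			limbo->last_lsn = req->lsn - 1;
		break;
	default:
		assert(false);
	}
	return true;
}
```

With a limbo owned by instance 2, a late CONFIRM carrying replica_id 1
leaves the queue untouched and replication can continue, while a CONFIRM
carrying replica_id 2 shrinks the queue as usual.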
 src/box/applier.cc  |  3 +--
 src/box/box.cc      |  3 +--
 src/box/txn_limbo.c | 14 +++++++++-----
 src/box/txn_limbo.h |  2 +-
 4 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index fb2f5d130..553db76fc 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -861,8 +861,7 @@ apply_synchro_row(struct xrow_header *row)
 	if (xrow_decode_synchro(row, &req) != 0)
 		goto err;
 
-	if (txn_limbo_process(&txn_limbo, &req))
-		goto err;
+	txn_limbo_process(&txn_limbo, &req);
 
 	struct synchro_entry *entry;
 	entry = synchro_entry_new(row, &req);
diff --git a/src/box/box.cc b/src/box/box.cc
index 38bf4034e..fc4888955 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -383,8 +383,7 @@ apply_wal_row(struct xstream *stream, struct xrow_header *row)
 		struct synchro_request syn_req;
 		if (xrow_decode_synchro(row, &syn_req) != 0)
 			diag_raise();
-		if (txn_limbo_process(&txn_limbo, &syn_req) != 0)
-			diag_raise();
+		txn_limbo_process(&txn_limbo, &syn_req);
 		return;
 	}
 	if (iproto_type_is_raft_request(row->type)) {
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 9272f5227..9498c7a44 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -634,13 +634,17 @@ complete:
 	return 0;
 }
 
-int
+void
 txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 {
 	if (req->replica_id != limbo->owner_id) {
-		diag_set(ClientError, ER_SYNC_MASTER_MISMATCH,
-			 req->replica_id, limbo->owner_id);
-		return -1;
+		/*
+		 * Ignore CONFIRM/ROLLBACK messages for a foreign master.
+		 * These are most likely outdated messages for already confirmed
+		 * data from an old leader, who has just started and written
+		 * confirm right on synchronous transaction recovery.
+		 */
+		return;
 	}
 	switch (req->type) {
 	case IPROTO_CONFIRM:
@@ -652,7 +656,7 @@ txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req)
 	default:
 		unreachable();
 	}
-	return 0;
+	return;
 }
 
 void
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index a49356c14..c28b5666d 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -257,7 +257,7 @@ int
 txn_limbo_wait_complete(struct txn_limbo *limbo, struct txn_limbo_entry *entry);
 
 /** Execute a synchronous replication request. */
-int
+void
 txn_limbo_process(struct txn_limbo *limbo, const struct synchro_request *req);
 
 /**
-- 
2.24.3 (Apple Git-128)

Thread overview: 18+ messages
2020-12-23 11:59 [Tarantool-patches] [PATCH v2 0/6] make clear_synchro_queue commit everything Serge Petrenko
2020-12-23 11:59 ` [Tarantool-patches] [PATCH v2 1/6] box: add a single execution guard to clear_synchro_queue Serge Petrenko
2020-12-23 11:59 ` [Tarantool-patches] [PATCH v2 2/6] relay: introduce on_status_update trigger Serge Petrenko
2020-12-23 17:25   ` Vladislav Shpilevoy
2020-12-24 16:11     ` Serge Petrenko
2020-12-23 11:59 ` [Tarantool-patches] [PATCH v2 3/6] txn_limbo: introduce txn_limbo_last_synchro_entry method Serge Petrenko
2020-12-23 17:25   ` Vladislav Shpilevoy
2020-12-24 16:13     ` Serge Petrenko
2020-12-23 11:59 ` [Tarantool-patches] [PATCH v2 4/6] box: rework clear_synchro_queue to commit everything Serge Petrenko
2020-12-23 17:28   ` Vladislav Shpilevoy
2020-12-24 16:12     ` Serge Petrenko
2020-12-24 17:35       ` Vladislav Shpilevoy
2020-12-24 21:02         ` Serge Petrenko
2020-12-23 11:59 ` [Tarantool-patches] [PATCH v2 5/6] test: fix replication/election_qsync_stress test Serge Petrenko
2020-12-23 11:59 ` Serge Petrenko [this message]
2020-12-23 17:28   ` [Tarantool-patches] [PATCH v2 6/6] txn_limbo: ignore CONFIRM/ROLLBACK for a foreign master Vladislav Shpilevoy
2020-12-24 16:13     ` Serge Petrenko
2020-12-25 10:04 ` [Tarantool-patches] [PATCH v2 0/6] make clear_synchro_queue commit everything Kirill Yukhin
