[Tarantool-patches] [PATCH] applier: fix tx boundary check for half-applied txns

Serge Petrenko sergepetrenko at tarantool.org
Sun Jul 5 14:57:49 MSK 2020


Consider a full-mesh cluster with two "new" instances, a master and a
replica, running tarantool 2.2+, and one "old" instance running an
earlier tarantool version. The "new" replica may receive part of a tx
from the "old" instance and the remaining part from the "new" master.

Since "new" instances preserve tx boundaries, the "new" replica assumes
it has already applied the whole tx once the first tx row is applied,
and skips the remaining rows. This leaves gaps in the "new" replica's
WAL, and the remaining part of the tx is skipped forever.

Fix this behaviour: in mixed clusters, apply the rest of the tx even if
its beginning is already applied.

Closes #5125
---

https://github.com/tarantool/tarantool/issues/5125
https://github.com/tarantool/tarantool/tree/sp/gh-5125-applier-tx-boundaries
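
For reviewers, the new check can be modelled outside of the applier as
the standalone sketch below. It is only an illustration, not the code
from the patch: Row, Vclock and skip_applied_prefix() are hypothetical
names, std::deque stands in for the stailq of applier_tx_row, and
std::map stands in for replicaset.applier.vclock. Unlike the patch,
which returns early and leaves the rows untouched when the whole tx is
already applied, the sketch clears the queue in that case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>

// Hypothetical simplified row: only the origin id and the LSN.
struct Row {
    uint32_t replica_id;
    int64_t lsn;
};

// Hypothetical stand-in for replicaset.applier.vclock.
using Vclock = std::map<uint32_t, int64_t>;

// Return the LSN recorded for replica_id, or 0 if none is known.
static int64_t vclock_get(const Vclock &vclock, uint32_t replica_id)
{
    auto it = vclock.find(replica_id);
    return it == vclock.end() ? 0 : it->second;
}

// Drop the rows of a tx that are already covered by the vclock and
// return the number of rows that still have to be applied.
static size_t skip_applied_prefix(const Vclock &vclock, std::deque<Row> &rows)
{
    assert(!rows.empty());
    const Row &last = rows.back();
    if (vclock_get(vclock, last.replica_id) >= last.lsn) {
        // The whole tx is already applied: nothing to do.
        rows.clear();
        return 0;
    }
    // The last row is not applied yet, so the loop stops before the
    // queue becomes empty.
    while (rows.front().lsn <=
           vclock_get(vclock, rows.front().replica_id))
        rows.pop_front();
    return rows.size();
}
```

With the vclock at LSN 5 for replica 1 and a tx made of rows with LSNs
4..7, the sketch drops rows 4 and 5 and keeps rows 6 and 7, which is
the half-applied case described in the commit message.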

 src/box/applier.cc | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/src/box/applier.cc b/src/box/applier.cc
index df48b4796..6ca4cca94 100644
--- a/src/box/applier.cc
+++ b/src/box/applier.cc
@@ -737,6 +737,7 @@ applier_apply_tx(struct stailq *rows)
 	struct xrow_header *first_row = &stailq_first_entry(rows,
 					struct applier_tx_row, next)->row;
 	struct xrow_header *last_row;
+	last_row = &stailq_last_entry(rows, struct applier_tx_row, next)->row;
 	struct replica *replica = replica_by_id(first_row->replica_id);
 	/*
 	 * In a full mesh topology, the same set of changes
@@ -748,9 +749,28 @@ applier_apply_tx(struct stailq *rows)
 			       &replicaset.applier.order_latch);
 	latch_lock(latch);
 	if (vclock_get(&replicaset.applier.vclock,
-		       first_row->replica_id) >= first_row->lsn) {
+		       last_row->replica_id) >= last_row->lsn) {
 		latch_unlock(latch);
 		return 0;
+	} else if (vclock_get(&replicaset.applier.vclock,
+			      first_row->replica_id) >= first_row->lsn) {
+		/*
+		 * We've received part of the tx from an old
+		 * instance not knowing of tx boundaries.
+		 * Skip the already applied part.
+		 */
+		struct xrow_header *tmp;
+		while (true) {
+			tmp = &stailq_first_entry(rows,
+						  struct applier_tx_row,
+						  next)->row;
+			if (tmp->lsn <= vclock_get(&replicaset.applier.vclock,
+						   tmp->replica_id)) {
+				stailq_shift(rows);
+			} else {
+				break;
+			}
+		}
 	}
 
 	/**
@@ -835,7 +855,6 @@ applier_apply_tx(struct stailq *rows)
 	 * instances, which send every single tx row as a separate
 	 * transaction.
 	 */
-	last_row = &stailq_last_entry(rows, struct applier_tx_row, next)->row;
 	vclock_follow(&replicaset.applier.vclock, last_row->replica_id,
 		      last_row->lsn);
 	latch_unlock(latch);
-- 
2.24.3 (Apple Git-128)


