[Tarantool-patches] [PATCH v2 1/2] wal: fix tx boundaries

Serge Petrenko sergepetrenko at tarantool.org
Mon May 25 13:58:55 MSK 2020


In order to preserve transaction boundaries in the replication protocol,
WAL assigns each tx row a transaction sequence number (tsn). The tsn is
equal to the lsn of the first row of the transaction.

Starting with commit 7eb4650eecf1ac382119d0038076c19b2708f4a1, local
space requests are assigned a special replica id, 0, and have their own
lsns. These operations are not replicated.

If a transaction starting with a local space operation ends up in the
WAL, it gets a tsn equal to the lsn of the local space request. When
such a transaction is replicated, the local space request is omitted,
and the replica receives only the global part of the transaction, whose
tsn does not match the lsn of the first row it sees. This yields an
ER_PROTOCOL error:
"Transaction id must be equal to LSN of the first row in the transaction".
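
The failure described above can be reproduced in a small standalone
sketch. It is not the code from src/box/wal.c: struct row, is_local,
assign_tsn_old() and replica_accepts() are hypothetical names, and the
replica-side check is reduced to the single condition quoted in the
error message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xrow_header: only what the sketch needs. */
struct row {
	bool is_local;	/* row belongs to a local space, never replicated */
	int64_t lsn;	/* local and global rows use separate lsn counters */
	int64_t tsn;
};

/* Old rule: every row gets the lsn of the first row, local or not. */
static void
assign_tsn_old(struct row *rows, size_t n)
{
	assert(rows != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		rows[i].tsn = rows[0].lsn;
}

/*
 * Simplified replica-side check (an assumption of this sketch): local
 * rows are skipped, and the first row that remains must carry a tsn
 * equal to its own lsn.
 */
static bool
replica_accepts(const struct row *rows, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (rows[i].is_local)
			continue;
		return rows[i].tsn == rows[i].lsn;
	}
	return true;	/* fully local transaction: nothing is sent */
}
```

With rows {local, lsn 7}, {global, lsn 42}, {global, lsn 43} the old
rule stamps tsn 7 on every row, so the replica sees a first row with
lsn 42 and tsn 7 and rejects it.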

To fix the problem, set tsn to the lsn of the first global row in the
transaction. Fully local transactions get their tsn as before, from the
lsn of their first row.
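
The rule above can be sketched outside of the WAL code as follows. This
is an illustration only: struct row, is_local and assign_tsn_fixed() are
hypothetical names, lsn assignment is left out, and lsns are assumed to
be positive so that tsn == 0 can mean "not chosen yet".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xrow_header: only what the sketch needs. */
struct row {
	bool is_local;	/* row belongs to a local space, never replicated */
	int64_t lsn;	/* assumed to be already assigned and positive */
	int64_t tsn;
};

static void
assign_tsn_fixed(struct row *rows, size_t n)
{
	assert(rows != NULL || n == 0);
	int64_t tsn = 0;
	size_t first_global = 0;
	for (size_t i = 0; i < n; i++) {
		/* The first global row defines the transaction id. */
		if (!rows[i].is_local && tsn == 0) {
			tsn = rows[i].lsn;
			first_global = i;
		}
		/*
		 * Until a global row is seen, fall back to the lsn of
		 * the first row. For a fully local transaction this
		 * value stays.
		 */
		rows[i].tsn = tsn == 0 ? rows[0].lsn : tsn;
	}
	/*
	 * Local rows preceding the first global row were stamped
	 * before tsn was known, so fix them up in a second pass.
	 */
	for (size_t i = 0; i < first_global; i++)
		rows[i].tsn = tsn;
}
```

For rows {local, lsn 7}, {global, lsn 42}, {local, lsn 8},
{global, lsn 43} every row ends up with tsn 42, so the global part still
starts with a row whose tsn equals its lsn once local rows are dropped.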

Follow-up #4114
Part-of #4928

Reviewed-by: Cyrill Gorcunov <gorcunov at gmail.com>
---
 src/box/wal.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/src/box/wal.c b/src/box/wal.c
index b979244e3..ef4d84920 100644
--- a/src/box/wal.c
+++ b/src/box/wal.c
@@ -956,25 +956,37 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
 	       struct xrow_header **end)
 {
 	int64_t tsn = 0;
+	struct xrow_header **start = row;
+	struct xrow_header **first_glob_row = row;
 	/** Assign LSN to all local rows. */
 	for ( ; row < end; row++) {
 		if ((*row)->replica_id == 0) {
 			/*
 			 * All rows representing local space data
-			 * manipulations are signed wth a zero
+			 * manipulations are signed with a zero
 			 * instance id. This is also true for
 			 * anonymous replicas, since they are
 			 * only capable of writing to local and
 			 * temporary spaces.
 			 */
-			if ((*row)->group_id != GROUP_LOCAL)
+			if ((*row)->group_id != GROUP_LOCAL) {
 				(*row)->replica_id = instance_id;
+			}
 
 			(*row)->lsn = vclock_inc(vclock_diff, (*row)->replica_id) +
 				      vclock_get(base, (*row)->replica_id);
-			/* Use lsn of the first local row as transaction id. */
-			tsn = tsn == 0 ? (*row)->lsn : tsn;
-			(*row)->tsn = tsn;
+			/*
+			 * Use lsn of the first global row as
+			 * transaction id.
+			 */
+			if ((*row)->group_id != GROUP_LOCAL && tsn == 0) {
+				tsn = (*row)->lsn;
+				/*
+				 * Remember the tail being processed.
+				 */
+				first_glob_row = row;
+			}
+			(*row)->tsn = tsn == 0 ? (*start)->lsn : tsn;
 			(*row)->is_commit = row == end - 1;
 		} else {
 			int64_t diff = (*row)->lsn - vclock_get(base, (*row)->replica_id);
@@ -993,6 +1005,14 @@ wal_assign_lsn(struct vclock *vclock_diff, struct vclock *base,
 			}
 		}
 	}
+
+	/*
+	 * Fill transaction id for all the local rows preceding
+	 * the first global row. tsn was yet unknown when those
+	 * rows were processed.
+	 */
+	for (row = start; row < first_glob_row; row++)
+		(*row)->tsn = tsn;
 }
 
 static void
-- 
2.24.2 (Apple Git-127)


