From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from smtp33.i.mail.ru (smtp33.i.mail.ru [94.100.177.93])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id E2C40469719
	for ; Sun, 13 Sep 2020 19:01:52 +0300 (MSK)
From: Vladislav Shpilevoy 
Date: Sun, 13 Sep 2020 18:01:49 +0200
Message-Id: <84df265021fd31cffd1bda22dd7ecabe2ba51284.1600012369.git.v.shpilevoy@tarantool.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH 1/1] limbo: handle artificial snapshot LSNs
List-Id: Tarantool development patches
To: tarantool-patches@dev.tarantool.org, sergepetrenko@tarantool.org,
	gorcunov@gmail.com, korablev@tarantool.org

Snapshot rows do not contain real LSNs. Instead, their LSNs are
artificial ordinal numbers: rows in the snap have LSNs from 1 up to the
number of rows. This is because LSNs are not stored with every tuple in
the storages, and there is no way to store real LSNs in the snapshot.

These artificial LSNs broke the synchronous replication limbo. After
snap recovery was done, the limbo vclock was broken: it contained
numbers unrelated to reality, and was affected by rows from local
spaces. Another issue was that during snap recovery all memtx rows were
assigned the same LSN: the signature of the instance. Having rows from
different transactions with the same LSN did not affect the limbo in a
healthy way either.

This patch fixes the recovery of sync rows. Firstly, snap rows no
longer all get the same LSN; they use whatever was stored in their LSN
field during snapshot creation. Secondly, the limbo now clears its LSNs
after the snapshot is recovered. Clearing the limbo state is totally
fine, because the limbo does not need to start .xlog recovery and
further normal operation from a correct instance vclock. It only needs
newer rows to have a newer vclock, that is it.
Closes #5298
---
Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5298-qsync-recovery-snap
Issue: https://github.com/tarantool/tarantool/issues/5298

@ChangeLog
* Having synchronous rows in the snapshot could make the instance hang
  on recovery (gh-5298).

I am not entirely sure I understand why we set the row LSN to the
signature value in memtx recovery. I failed to find a reason for that.
Vladimir added it in
https://github.com/tarantool/tarantool/commit/fd99d3abe38c52e1f53454e8d9b4ed45f36dbe44,
but it looks like refactoring, not a functional change. AFAIU, vinyl
may use it for some vylog accounting for DDL, because the vinyl schema
is stored in the memtx snapshot (spaces, indexes), but doesn't it rely
on the txn signature instead of the row LSN? Vladimir's commit is about
https://github.com/tarantool/tarantool/issues/2536, and the test in the
issue passes without the removed line. I added Nikita so he can take a
look at the vinyl-related part.

 src/box/box.cc                                |   7 ++
 src/box/memtx_engine.c                        |   1 -
 src/box/txn_limbo.c                           |  23 ++++
 src/box/txn_limbo.h                           |   8 ++
 .../gh-5298-qsync-recovery-snap.result        | 100 ++++++++++++++++++
 .../gh-5298-qsync-recovery-snap.test.lua      |  48 +++++++++
 6 files changed, 186 insertions(+), 1 deletion(-)
 create mode 100644 test/replication/gh-5298-qsync-recovery-snap.result
 create mode 100644 test/replication/gh-5298-qsync-recovery-snap.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index faffd5769..ab813e452 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -2445,6 +2445,13 @@ local_recovery(const struct tt_uuid *instance_uuid,
 	 * other engines.
 	 */
 	memtx_engine_recover_snapshot_xc(memtx, checkpoint_vclock);
+	/*
+	 * Snapshot does not store correct LSNs next to rows. That is fine as
+	 * long as they are monotonically growing, but after the recovery is
+	 * done, the limbo should start working with real LSNs. Let the limbo
+	 * know the snap recovery is done in order to prepare for real LSNs.
+	 */
+	txn_limbo_on_snapshot_recovery(&txn_limbo);
 
 	engine_begin_final_recovery_xc();
 	recover_remaining_wals(recovery, &wal_stream.base, NULL, false);
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 9f079a6b5..e9d10f416 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -170,7 +170,6 @@ memtx_engine_recover_snapshot(struct memtx_engine *memtx,
 	uint64_t row_count = 0;
 	while ((rc = xlog_cursor_next(&cursor, &row,
 				      memtx->force_recovery)) == 0) {
-		row.lsn = signature;
 		rc = memtx_engine_recover_snapshot_row(memtx, &row);
 		if (rc < 0) {
 			if (!memtx->force_recovery)
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 3655338d8..21c8497db 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -662,6 +662,29 @@ txn_limbo_on_parameters_change(struct txn_limbo *limbo)
 	fiber_cond_broadcast(&limbo->wait_cond);
 }
 
+void
+txn_limbo_on_snapshot_recovery(struct txn_limbo *limbo)
+{
+	/*
+	 * Reset vclock to 0, because
+	 *
+	 * - Snapshot rows contain signatures instead of vclocks. So if any of
+	 *   them belonged to sync spaces, the limbo vclock currently is
+	 *   broken.
+	 *
+	 * - All snapshot rows are confirmed by design, so it is not possible
+	 *   that some of these rows need to wait for more ACKs. After
+	 *   snapshot is done, the limbo should be empty.
+	 *
+	 * - For the limbo concrete vclock values don't matter. Only their
+	 *   relative difference matters. So even if all vclocks here are 0,
+	 *   it is not important as long as next rows in .xlog files and newly
+	 *   created rows contain valid LSNs and replica IDs.
+	 */
+	vclock_create(&limbo->vclock);
+	limbo->confirmed_lsn = 0;
+	assert(txn_limbo_is_empty(limbo));
+}
+
 void
 txn_limbo_init(void)
 {
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index eaf662987..26399ba7f 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -273,6 +273,14 @@ txn_limbo_force_empty(struct txn_limbo *limbo, int64_t last_confirm);
 void
 txn_limbo_on_parameters_change(struct txn_limbo *limbo);
 
+/**
+ * Let the limbo know a snapshot is recovered. The limbo should take a few
+ * actions related to artificial vclocks cleanup, because the snap does not
+ * store real LSNs.
+ */
+void
+txn_limbo_on_snapshot_recovery(struct txn_limbo *limbo);
+
 /**
  * Initialize qsync engine.
  */
diff --git a/test/replication/gh-5298-qsync-recovery-snap.result b/test/replication/gh-5298-qsync-recovery-snap.result
new file mode 100644
index 000000000..922831552
--- /dev/null
+++ b/test/replication/gh-5298-qsync-recovery-snap.result
@@ -0,0 +1,100 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+--
+-- gh-5298: rows from snapshot got their LSN = signature of the instance
+-- vclock. All the same LSN. For example, signature at the moment of recovery
+-- start was 100. Then all rows from the snap got their LSN = 100. That broke
+-- the limbo, because it assumes LSNs grow. In order to skip duplicate ACKs.
+--
+_ = box.schema.space.create('sync', {is_sync = true, engine = engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+for i = 1, 10 do box.space.sync:replace{i} end
+ | ---
+ | ...
+
+-- Local rows could affect this by increasing the signature.
+_ = box.schema.space.create('loc', {is_local = true, engine = engine})
+ | ---
+ | ...
+_ = box.space.loc:create_index('pk')
+ | ---
+ | ...
+for i = 1, 10 do box.space.loc:replace{i} end
+ | ---
+ | ...
+
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+
+test_run:cmd("restart server default")
+ |
+
+-- Could hang if the limbo would incorrectly handle the snapshot end.
+box.space.sync:replace{11}
+ | ---
+ | - [11]
+ | ...
+
+old_synchro_quorum = box.cfg.replication_synchro_quorum
+ | ---
+ | ...
+old_synchro_timeout = box.cfg.replication_synchro_timeout
+ | ---
+ | ...
+
+box.cfg{                                        \
+    replication_synchro_timeout = 0.001,        \
+    replication_synchro_quorum = 2,             \
+}
+ | ---
+ | ...
+box.space.sync:replace{12}
+ | ---
+ | - error: Quorum collection for a synchronous transaction is timed out
+ | ...
+
+box.cfg{                                        \
+    replication_synchro_timeout = 1000,         \
+    replication_synchro_quorum = 1,             \
+}
+ | ---
+ | ...
+box.space.sync:replace{13}
+ | ---
+ | - [13]
+ | ...
+box.space.sync:get({11})
+ | ---
+ | - [11]
+ | ...
+box.space.sync:get({12})
+ | ---
+ | ...
+box.space.sync:get({13})
+ | ---
+ | - [13]
+ | ...
+
+box.cfg{                                                \
+    replication_synchro_timeout = old_synchro_timeout,  \
+    replication_synchro_quorum = old_synchro_quorum,    \
+}
+ | ---
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+box.space.loc:drop()
+ | ---
+ | ...
diff --git a/test/replication/gh-5298-qsync-recovery-snap.test.lua b/test/replication/gh-5298-qsync-recovery-snap.test.lua
new file mode 100644
index 000000000..187f60d75
--- /dev/null
+++ b/test/replication/gh-5298-qsync-recovery-snap.test.lua
@@ -0,0 +1,48 @@
+test_run = require('test_run').new()
+engine = test_run:get_cfg('engine')
+--
+-- gh-5298: rows from snapshot got their LSN = signature of the instance
+-- vclock. All the same LSN. For example, signature at the moment of recovery
+-- start was 100. Then all rows from the snap got their LSN = 100. That broke
+-- the limbo, because it assumes LSNs grow. In order to skip duplicate ACKs.
+--
+_ = box.schema.space.create('sync', {is_sync = true, engine = engine})
+_ = box.space.sync:create_index('pk')
+for i = 1, 10 do box.space.sync:replace{i} end
+
+-- Local rows could affect this by increasing the signature.
+_ = box.schema.space.create('loc', {is_local = true, engine = engine})
+_ = box.space.loc:create_index('pk')
+for i = 1, 10 do box.space.loc:replace{i} end
+
+box.snapshot()
+
+test_run:cmd("restart server default")
+
+-- Could hang if the limbo would incorrectly handle the snapshot end.
+box.space.sync:replace{11}
+
+old_synchro_quorum = box.cfg.replication_synchro_quorum
+old_synchro_timeout = box.cfg.replication_synchro_timeout
+
+box.cfg{                                        \
+    replication_synchro_timeout = 0.001,        \
+    replication_synchro_quorum = 2,             \
+}
+box.space.sync:replace{12}
+
+box.cfg{                                        \
+    replication_synchro_timeout = 1000,         \
+    replication_synchro_quorum = 1,             \
+}
+box.space.sync:replace{13}
+box.space.sync:get({11})
+box.space.sync:get({12})
+box.space.sync:get({13})
+
+box.cfg{                                                \
+    replication_synchro_timeout = old_synchro_timeout,  \
+    replication_synchro_quorum = old_synchro_quorum,    \
+}
+box.space.sync:drop()
+box.space.loc:drop()
-- 
2.21.1 (Apple Git-122.3)