From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from smtp33.i.mail.ru (smtp33.i.mail.ru [94.100.177.93])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id E2C40469719
	for ; Sun, 13 Sep 2020 19:01:52 +0300 (MSK)
From: Vladislav Shpilevoy 
Date: Sun, 13 Sep 2020 18:01:49 +0200
Message-Id: <84df265021fd31cffd1bda22dd7ecabe2ba51284.1600012369.git.v.shpilevoy@tarantool.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH 1/1] limbo: handle artificial snapshot LSNs
List-Id: Tarantool development patches
To: tarantool-patches@dev.tarantool.org, sergepetrenko@tarantool.org,
	gorcunov@gmail.com, korablev@tarantool.org

Snapshot rows do not contain real LSNs. Instead, their LSNs are
artificial ordinal numbers: rows in the snap have LSNs from 1 up to the
number of rows. This is because LSNs are not stored with every tuple in
the storages, and there is no way to store real LSNs in the snapshot.

These artificial LSNs broke the synchronous replication limbo. After
snap recovery was done, the limbo vclock was broken: it contained
numbers unrelated to reality, and was affected by rows from local
spaces. Another issue was that during snap recovery all memtx rows were
assigned the same LSN: the signature of the instance. Having rows from
different transactions with the same LSN did not affect the limbo in a
healthy way either.

This patch fixes the recovery of sync rows. Firstly, snap rows no
longer all get the same LSN; they use whatever was stored in their LSN
field during snapshot creation. Secondly, the limbo now clears its LSNs
after the snapshot is recovered. Clearing the limbo state is totally
fine, because the limbo does not need to start .xlog recovery and
further normal operation from a correct instance vclock. It only needs
newer rows to have a newer vclock, that is it.
Closes #5298
---
Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5298-qsync-recovery-snap
Issue: https://github.com/tarantool/tarantool/issues/5298

@ChangeLog
* Having synchronous rows in the snapshot could make the instance hang
  on recovery (gh-5298).

I am not entirely sure I understand why we set the row LSN to the
signature value in memtx recovery. I failed to find a reason for that.
Vladimir added it in
https://github.com/tarantool/tarantool/commit/fd99d3abe38c52e1f53454e8d9b4ed45f36dbe44,
but it looks like refactoring, not a functional change. AFAIU, vinyl
may use it for some vylog accounting for DDL, because the vinyl schema
is stored in the memtx snapshot (spaces, indexes), but doesn't it rely
on the txn signature instead of the row LSN? Vladimir's commit is about
https://github.com/tarantool/tarantool/issues/2536, and the test in the
issue passes without the removed line. I added Nikita so he can take a
look at the vinyl-related part.

 src/box/box.cc                                |   7 ++
 src/box/memtx_engine.c                        |   1 -
 src/box/txn_limbo.c                           |  23 ++++
 src/box/txn_limbo.h                           |   8 ++
 .../gh-5298-qsync-recovery-snap.result        | 100 ++++++++++++++++++
 .../gh-5298-qsync-recovery-snap.test.lua      |  48 +++++++++
 6 files changed, 186 insertions(+), 1 deletion(-)
 create mode 100644 test/replication/gh-5298-qsync-recovery-snap.result
 create mode 100644 test/replication/gh-5298-qsync-recovery-snap.test.lua

diff --git a/src/box/box.cc b/src/box/box.cc
index faffd5769..ab813e452 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -2445,6 +2445,13 @@ local_recovery(const struct tt_uuid *instance_uuid,
 	 * other engines.
 	 */
 	memtx_engine_recover_snapshot_xc(memtx, checkpoint_vclock);
+	/*
+	 * Snapshot does not store correct LSNs next to rows. That is fine as
+	 * long as they are monotonically growing, but after the recovery is
+	 * done, the limbo should start working with real LSNs. Let the limbo
+	 * know the snap recovery is done in order to prepare for real LSNs.
+	 */
+	txn_limbo_on_snapshot_recovery(&txn_limbo);
 
 	engine_begin_final_recovery_xc();
 	recover_remaining_wals(recovery, &wal_stream.base, NULL, false);
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 9f079a6b5..e9d10f416 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -170,7 +170,6 @@ memtx_engine_recover_snapshot(struct memtx_engine *memtx,
 	uint64_t row_count = 0;
 	while ((rc = xlog_cursor_next(&cursor, &row,
 				      memtx->force_recovery)) == 0) {
-		row.lsn = signature;
 		rc = memtx_engine_recover_snapshot_row(memtx, &row);
 		if (rc < 0) {
 			if (!memtx->force_recovery)
diff --git a/src/box/txn_limbo.c b/src/box/txn_limbo.c
index 3655338d8..21c8497db 100644
--- a/src/box/txn_limbo.c
+++ b/src/box/txn_limbo.c
@@ -662,6 +662,29 @@ txn_limbo_on_parameters_change(struct txn_limbo *limbo)
 	fiber_cond_broadcast(&limbo->wait_cond);
 }
 
+void
+txn_limbo_on_snapshot_recovery(struct txn_limbo *limbo)
+{
+	/*
+	 * Reset vclock to 0, because
+	 *
+	 * - Snapshot rows contain signatures instead of vclocks. So if any of
+	 *   them belonged to sync spaces, the limbo vclock currently is
+	 *   broken.
+	 *
+	 * - All snapshot rows are confirmed by design, so it is not possible
+	 *   that some of these rows need to wait for more ACKs. After
+	 *   snapshot is done, the limbo should be empty.
+	 *
+	 * - For the limbo concrete vclock values don't matter. Only their
+	 *   relative difference matters. So even if all vclocks here are 0,
+	 *   it is not important as long as next rows in .xlog files and newly
+	 *   created rows contain valid LSNs and replica IDs.
+	 */
+	vclock_create(&limbo->vclock);
+	limbo->confirmed_lsn = 0;
+	assert(txn_limbo_is_empty(limbo));
+}
+
 void
 txn_limbo_init(void)
 {
diff --git a/src/box/txn_limbo.h b/src/box/txn_limbo.h
index eaf662987..26399ba7f 100644
--- a/src/box/txn_limbo.h
+++ b/src/box/txn_limbo.h
@@ -273,6 +273,14 @@ txn_limbo_force_empty(struct txn_limbo *limbo, int64_t last_confirm);
 void
 txn_limbo_on_parameters_change(struct txn_limbo *limbo);
 
+/**
+ * Let the limbo know a snapshot is recovered. The limbo should take a few
+ * actions related to artificial vclocks cleanup, because the snap does not
+ * store real LSNs.
+ */
+void
+txn_limbo_on_snapshot_recovery(struct txn_limbo *limbo);
+
 /**
  * Initialize qsync engine.
  */
diff --git a/test/replication/gh-5298-qsync-recovery-snap.result b/test/replication/gh-5298-qsync-recovery-snap.result
new file mode 100644
index 000000000..922831552
--- /dev/null
+++ b/test/replication/gh-5298-qsync-recovery-snap.result
@@ -0,0 +1,100 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+--
+-- gh-5298: rows from snapshot got their LSN = signature of the instance
+-- vclock. All the same LSN. For example, signature at the moment of recovery
+-- start was 100. Then all rows from the snap got their LSN = 100. That broke
+-- the limbo, because it assumes LSNs grow. In order to skip duplicate ACKs.
+--
+_ = box.schema.space.create('sync', {is_sync = true, engine = engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+for i = 1, 10 do box.space.sync:replace{i} end
+ | ---
+ | ...
+
+-- Local rows could affect this by increasing the signature.
+_ = box.schema.space.create('loc', {is_local = true, engine = engine})
+ | ---
+ | ...
+_ = box.space.loc:create_index('pk')
+ | ---
+ | ...
+for i = 1, 10 do box.space.loc:replace{i} end
+ | ---
+ | ...
+
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+
+test_run:cmd("restart server default")
+ |
+
+-- Could hang if the limbo would incorrectly handle the snapshot end.
+box.space.sync:replace{11}
+ | ---
+ | - [11]
+ | ...
+
+old_synchro_quorum = box.cfg.replication_synchro_quorum
+ | ---
+ | ...
+old_synchro_timeout = box.cfg.replication_synchro_timeout
+ | ---
+ | ...
+
+box.cfg{                                        \
+    replication_synchro_timeout = 0.001,        \
+    replication_synchro_quorum = 2,             \
+}
+ | ---
+ | ...
+box.space.sync:replace{12}
+ | ---
+ | - error: Quorum collection for a synchronous transaction is timed out
+ | ...
+
+box.cfg{                                        \
+    replication_synchro_timeout = 1000,         \
+    replication_synchro_quorum = 1,             \
+}
+ | ---
+ | ...
+box.space.sync:replace{13}
+ | ---
+ | - [13]
+ | ...
+box.space.sync:get({11})
+ | ---
+ | - [11]
+ | ...
+box.space.sync:get({12})
+ | ---
+ | ...
+box.space.sync:get({13})
+ | ---
+ | - [13]
+ | ...
+
+box.cfg{                                                \
+    replication_synchro_timeout = old_synchro_timeout,  \
+    replication_synchro_quorum = old_synchro_quorum,    \
+}
+ | ---
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+box.space.loc:drop()
+ | ---
+ | ...
diff --git a/test/replication/gh-5298-qsync-recovery-snap.test.lua b/test/replication/gh-5298-qsync-recovery-snap.test.lua
new file mode 100644
index 000000000..187f60d75
--- /dev/null
+++ b/test/replication/gh-5298-qsync-recovery-snap.test.lua
@@ -0,0 +1,48 @@
+test_run = require('test_run').new()
+engine = test_run:get_cfg('engine')
+--
+-- gh-5298: rows from snapshot got their LSN = signature of the instance
+-- vclock. All the same LSN. For example, signature at the moment of recovery
+-- start was 100. Then all rows from the snap got their LSN = 100. That broke
+-- the limbo, because it assumes LSNs grow. In order to skip duplicate ACKs.
+--
+_ = box.schema.space.create('sync', {is_sync = true, engine = engine})
+_ = box.space.sync:create_index('pk')
+for i = 1, 10 do box.space.sync:replace{i} end
+
+-- Local rows could affect this by increasing the signature.
+_ = box.schema.space.create('loc', {is_local = true, engine = engine})
+_ = box.space.loc:create_index('pk')
+for i = 1, 10 do box.space.loc:replace{i} end
+
+box.snapshot()
+
+test_run:cmd("restart server default")
+
+-- Could hang if the limbo would incorrectly handle the snapshot end.
+box.space.sync:replace{11}
+
+old_synchro_quorum = box.cfg.replication_synchro_quorum
+old_synchro_timeout = box.cfg.replication_synchro_timeout
+
+box.cfg{                                        \
+    replication_synchro_timeout = 0.001,        \
+    replication_synchro_quorum = 2,             \
+}
+box.space.sync:replace{12}
+
+box.cfg{                                        \
+    replication_synchro_timeout = 1000,         \
+    replication_synchro_quorum = 1,             \
+}
+box.space.sync:replace{13}
+box.space.sync:get({11})
+box.space.sync:get({12})
+box.space.sync:get({13})
+
+box.cfg{                                                \
+    replication_synchro_timeout = old_synchro_timeout,  \
+    replication_synchro_quorum = old_synchro_quorum,    \
+}
+box.space.sync:drop()
+box.space.loc:drop()
-- 
2.21.1 (Apple Git-122.3)