[Tarantool-patches] [PATCH v2 1/1] memtx: force async snapshot transactions
Serge Petrenko
sergepetrenko at tarantool.org
Mon Sep 14 12:15:54 MSK 2020
13.09.2020 22:39, Vladislav Shpilevoy пишет:
> Snapshot rows contain not real LSNs. Instead their LSNs are
> signatures, ordinal numbers. Rows in the snap have LSNs from 1 to
> the number of rows. This is because LSNs are not stored with every
> tuple in the storages, and there is no way to store real LSNs in
> the snapshot.
>
> These artificial LSNs broke the synchronous replication limbo.
> After snap recovery is done, limbo vclock was broken - it
> contained numbers not related to reality, and affected by rows
> from local spaces.
>
> Also the recovery could stuck because ACKs in the limbo stopped
> working after a first row - the vclock was set to the final
> signature right away.
>
> This patch makes all snapshot recovered rows async. Because they
> are confirmed by definition. So now the limbo is not involved into
> the snapshot recovery.
>
> Closes #5298
Thanks for the patch! LGTM.
> ---
> Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5298-qsync-recovery-snap
> Issue: https://github.com/tarantool/tarantool/issues/5298
>
> @ChangeLog
> * Having synchronous rows in the snapshot could make the instance hang on recovery (gh-5298).
>
> Changes in v2:
> - I realized it is enough to set snapshot transactions asynchronous. So they
> don't involve the limbo anymore, and can't pollute its vclock with the not
> real LSNs.
>
> src/box/memtx_engine.c | 5 +
> .../gh-5298-qsync-recovery-snap.result | 100 ++++++++++++++++++
> .../gh-5298-qsync-recovery-snap.test.lua | 48 +++++++++
> 3 files changed, 153 insertions(+)
> create mode 100644 test/replication/gh-5298-qsync-recovery-snap.result
> create mode 100644 test/replication/gh-5298-qsync-recovery-snap.test.lua
>
> diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
> index 9f079a6b5..9df67426c 100644
> --- a/src/box/memtx_engine.c
> +++ b/src/box/memtx_engine.c
> @@ -233,6 +233,11 @@ memtx_engine_recover_snapshot_row(struct memtx_engine *memtx,
> goto rollback_stmt;
> if (txn_commit_stmt(txn, &request) != 0)
> goto rollback;
> + /*
> + * Snapshot rows are confirmed by definition. They don't need to go to
> + * the synchronous transactions limbo.
> + */
> + txn_set_flag(txn, TXN_FORCE_ASYNC);
> rc = txn_commit(txn);
> /*
> * Don't let gc pool grow too much. Yet to
> diff --git a/test/replication/gh-5298-qsync-recovery-snap.result b/test/replication/gh-5298-qsync-recovery-snap.result
> new file mode 100644
> index 000000000..922831552
> --- /dev/null
> +++ b/test/replication/gh-5298-qsync-recovery-snap.result
> @@ -0,0 +1,100 @@
> +-- test-run result file version 2
> +test_run = require('test_run').new()
> + | ---
> + | ...
> +engine = test_run:get_cfg('engine')
> + | ---
> + | ...
> +--
> +-- gh-5298: rows from snapshot got their LSN = signature of the instance vclock.
> +-- All the same LSN. For example, signature at the moment of recovery start
> +-- was 100. Then all rows from the snap got their LSN = 100. That broke the
> +-- limbo, because it assumes LSNs grow. In order to skip duplicate ACKs.
> +--
> +_ = box.schema.space.create('sync', {is_sync = true, engine = engine})
> + | ---
> + | ...
> +_ = box.space.sync:create_index('pk')
> + | ---
> + | ...
> +for i = 1, 10 do box.space.sync:replace{i} end
> + | ---
> + | ...
> +
> +-- Local rows could affect this by increasing the signature.
> +_ = box.schema.space.create('loc', {is_local = true, engine = engine})
> + | ---
> + | ...
> +_ = box.space.loc:create_index('pk')
> + | ---
> + | ...
> +for i = 1, 10 do box.space.loc:replace{i} end
> + | ---
> + | ...
> +
> +box.snapshot()
> + | ---
> + | - ok
> + | ...
> +
> +test_run:cmd("restart server default")
> + |
> +
> +-- Could hang if the limbo would incorrectly handle the snapshot end.
> +box.space.sync:replace{11}
> + | ---
> + | - [11]
> + | ...
> +
> +old_synchro_quorum = box.cfg.replication_synchro_quorum
> + | ---
> + | ...
> +old_synchro_timeout = box.cfg.replication_synchro_timeout
> + | ---
> + | ...
> +
> +box.cfg{ \
> + replication_synchro_timeout = 0.001, \
> + replication_synchro_quorum = 2, \
> +}
> + | ---
> + | ...
> +box.space.sync:replace{12}
> + | ---
> + | - error: Quorum collection for a synchronous transaction is timed out
> + | ...
> +
> +box.cfg{ \
> + replication_synchro_timeout = 1000, \
> + replication_synchro_quorum = 1, \
> +}
> + | ---
> + | ...
> +box.space.sync:replace{13}
> + | ---
> + | - [13]
> + | ...
> +box.space.sync:get({11})
> + | ---
> + | - [11]
> + | ...
> +box.space.sync:get({12})
> + | ---
> + | ...
> +box.space.sync:get({13})
> + | ---
> + | - [13]
> + | ...
> +
> +box.cfg{ \
> + replication_synchro_timeout = old_synchro_timeout, \
> + replication_synchro_quorum = old_synchro_quorum, \
> +}
> + | ---
> + | ...
> +box.space.sync:drop()
> + | ---
> + | ...
> +box.space.loc:drop()
> + | ---
> + | ...
> diff --git a/test/replication/gh-5298-qsync-recovery-snap.test.lua b/test/replication/gh-5298-qsync-recovery-snap.test.lua
> new file mode 100644
> index 000000000..187f60d75
> --- /dev/null
> +++ b/test/replication/gh-5298-qsync-recovery-snap.test.lua
> @@ -0,0 +1,48 @@
> +test_run = require('test_run').new()
> +engine = test_run:get_cfg('engine')
> +--
> +-- gh-5298: rows from snapshot got their LSN = signature of the instance vclock.
> +-- All the same LSN. For example, signature at the moment of recovery start
> +-- was 100. Then all rows from the snap got their LSN = 100. That broke the
> +-- limbo, because it assumes LSNs grow. In order to skip duplicate ACKs.
> +--
> +_ = box.schema.space.create('sync', {is_sync = true, engine = engine})
> +_ = box.space.sync:create_index('pk')
> +for i = 1, 10 do box.space.sync:replace{i} end
> +
> +-- Local rows could affect this by increasing the signature.
> +_ = box.schema.space.create('loc', {is_local = true, engine = engine})
> +_ = box.space.loc:create_index('pk')
> +for i = 1, 10 do box.space.loc:replace{i} end
> +
> +box.snapshot()
> +
> +test_run:cmd("restart server default")
> +
> +-- Could hang if the limbo would incorrectly handle the snapshot end.
> +box.space.sync:replace{11}
> +
> +old_synchro_quorum = box.cfg.replication_synchro_quorum
> +old_synchro_timeout = box.cfg.replication_synchro_timeout
> +
> +box.cfg{ \
> + replication_synchro_timeout = 0.001, \
> + replication_synchro_quorum = 2, \
> +}
> +box.space.sync:replace{12}
> +
> +box.cfg{ \
> + replication_synchro_timeout = 1000, \
> + replication_synchro_quorum = 1, \
> +}
> +box.space.sync:replace{13}
> +box.space.sync:get({11})
> +box.space.sync:get({12})
> +box.space.sync:get({13})
> +
> +box.cfg{ \
> + replication_synchro_timeout = old_synchro_timeout, \
> + replication_synchro_quorum = old_synchro_quorum, \
> +}
> +box.space.sync:drop()
> +box.space.loc:drop()
--
Serge Petrenko
More information about the Tarantool-patches
mailing list