Tarantool development patches archive
* [Tarantool-patches] [PATCH v2 1/1] memtx: force async snapshot transactions
@ 2020-09-13 19:39 Vladislav Shpilevoy
  2020-09-14  8:12 ` Cyrill Gorcunov
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-13 19:39 UTC (permalink / raw)
  To: tarantool-patches, gorcunov, sergepetrenko

Snapshot rows do not contain real LSNs. Instead, their LSNs are
ordinal numbers: rows in the snap have LSNs from 1 up to the number
of rows. This is because LSNs are not stored with every tuple in the
storages, so there is no way to store the real LSNs in the snapshot.

These artificial LSNs broke the synchronous replication limbo. After
snap recovery was done, the limbo vclock was broken: it contained
numbers unrelated to reality and was affected by rows from local
spaces.

Also, the recovery could get stuck, because ACKs in the limbo
stopped working after the first row - the vclock was set to the
final signature right away.

This patch makes all snapshot-recovered rows async, because they are
confirmed by definition. So now the limbo is not involved in the
snapshot recovery at all.

Closes #5298
---
Branch: http://github.com/tarantool/tarantool/tree/gerold103/gh-5298-qsync-recovery-snap
Issue: https://github.com/tarantool/tarantool/issues/5298

@ChangeLog
* Having synchronous rows in the snapshot could make the instance hang on recovery (gh-5298).

Changes in v2:
- I realized it is enough to make the snapshot transactions asynchronous. Then
  they don't involve the limbo anymore and can't pollute its vclock with the
  artificial LSNs.
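
To make the described failure mode concrete, here is a tiny toy model of the
duplicate-ACK skip. It is NOT Tarantool code - the real logic lives in the
limbo implementation - and every identifier in it is invented purely for
illustration:

#include <stdio.h>

/*
 * Toy model of a limbo that records the highest ACKed LSN and treats
 * any ACK with a smaller or equal LSN as a duplicate.
 */
struct toy_limbo {
	long acked_lsn;
	int confirmed;
};

static void
toy_limbo_ack(struct toy_limbo *limbo, long lsn)
{
	if (lsn <= limbo->acked_lsn)
		return; /* Looks like a duplicate ACK - skipped. */
	limbo->acked_lsn = lsn;
	limbo->confirmed++;
}

int
main(void)
{
	struct toy_limbo limbo = {0, 0};
	/*
	 * During snapshot recovery the recorded LSN jumped to the
	 * final signature (say, 100) right after the first row ...
	 */
	toy_limbo_ack(&limbo, 100);
	/*
	 * ... so the ACKs for the remaining rows, which carry the
	 * artificial ordinal LSNs 2..100, all look like duplicates
	 * and confirm nothing - the waiting transactions hang.
	 */
	for (long lsn = 2; lsn <= 100; lsn++)
		toy_limbo_ack(&limbo, lsn);
	printf("confirmed %d of 100 rows\n", limbo.confirmed);
	return 0;
}

With the patch the snapshot rows never enter the limbo at all, so the
artificial LSNs can't reach its vclock in the first place.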

 src/box/memtx_engine.c                        |   5 +
 .../gh-5298-qsync-recovery-snap.result        | 100 ++++++++++++++++++
 .../gh-5298-qsync-recovery-snap.test.lua      |  48 +++++++++
 3 files changed, 153 insertions(+)
 create mode 100644 test/replication/gh-5298-qsync-recovery-snap.result
 create mode 100644 test/replication/gh-5298-qsync-recovery-snap.test.lua

diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 9f079a6b5..9df67426c 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -233,6 +233,11 @@ memtx_engine_recover_snapshot_row(struct memtx_engine *memtx,
 		goto rollback_stmt;
 	if (txn_commit_stmt(txn, &request) != 0)
 		goto rollback;
+	/*
+	 * Snapshot rows are confirmed by definition. They don't need to go to
+	 * the synchronous transactions limbo.
+	 */
+	txn_set_flag(txn, TXN_FORCE_ASYNC);
 	rc = txn_commit(txn);
 	/*
 	 * Don't let gc pool grow too much. Yet to
diff --git a/test/replication/gh-5298-qsync-recovery-snap.result b/test/replication/gh-5298-qsync-recovery-snap.result
new file mode 100644
index 000000000..922831552
--- /dev/null
+++ b/test/replication/gh-5298-qsync-recovery-snap.result
@@ -0,0 +1,100 @@
+-- test-run result file version 2
+test_run = require('test_run').new()
+ | ---
+ | ...
+engine = test_run:get_cfg('engine')
+ | ---
+ | ...
+--
+-- gh-5298: rows from the snapshot got LSN = signature of the instance vclock,
+-- i.e. all the same LSN. For example, if the signature at the moment of
+-- recovery start was 100, then every row from the snap got LSN = 100. That
+-- broke the limbo, because it assumes LSNs grow in order to skip duplicate ACKs.
+--
+_ = box.schema.space.create('sync', {is_sync = true, engine = engine})
+ | ---
+ | ...
+_ = box.space.sync:create_index('pk')
+ | ---
+ | ...
+for i = 1, 10 do box.space.sync:replace{i} end
+ | ---
+ | ...
+
+-- Local rows could affect this by increasing the signature.
+_ = box.schema.space.create('loc', {is_local = true, engine = engine})
+ | ---
+ | ...
+_ = box.space.loc:create_index('pk')
+ | ---
+ | ...
+for i = 1, 10 do box.space.loc:replace{i} end
+ | ---
+ | ...
+
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+
+test_run:cmd("restart server default")
+ | 
+
+-- Could hang if the limbo handled the end of the snapshot incorrectly.
+box.space.sync:replace{11}
+ | ---
+ | - [11]
+ | ...
+
+old_synchro_quorum = box.cfg.replication_synchro_quorum
+ | ---
+ | ...
+old_synchro_timeout = box.cfg.replication_synchro_timeout
+ | ---
+ | ...
+
+box.cfg{                                                                        \
+    replication_synchro_timeout = 0.001,                                        \
+    replication_synchro_quorum = 2,                                             \
+}
+ | ---
+ | ...
+box.space.sync:replace{12}
+ | ---
+ | - error: Quorum collection for a synchronous transaction is timed out
+ | ...
+
+box.cfg{                                                                        \
+    replication_synchro_timeout = 1000,                                         \
+    replication_synchro_quorum = 1,                                             \
+}
+ | ---
+ | ...
+box.space.sync:replace{13}
+ | ---
+ | - [13]
+ | ...
+box.space.sync:get({11})
+ | ---
+ | - [11]
+ | ...
+box.space.sync:get({12})
+ | ---
+ | ...
+box.space.sync:get({13})
+ | ---
+ | - [13]
+ | ...
+
+box.cfg{                                                                        \
+    replication_synchro_timeout = old_synchro_timeout,                          \
+    replication_synchro_quorum = old_synchro_quorum,                            \
+}
+ | ---
+ | ...
+box.space.sync:drop()
+ | ---
+ | ...
+box.space.loc:drop()
+ | ---
+ | ...
diff --git a/test/replication/gh-5298-qsync-recovery-snap.test.lua b/test/replication/gh-5298-qsync-recovery-snap.test.lua
new file mode 100644
index 000000000..187f60d75
--- /dev/null
+++ b/test/replication/gh-5298-qsync-recovery-snap.test.lua
@@ -0,0 +1,48 @@
+test_run = require('test_run').new()
+engine = test_run:get_cfg('engine')
+--
+-- gh-5298: rows from the snapshot got LSN = signature of the instance vclock,
+-- i.e. all the same LSN. For example, if the signature at the moment of
+-- recovery start was 100, then every row from the snap got LSN = 100. That
+-- broke the limbo, because it assumes LSNs grow in order to skip duplicate ACKs.
+--
+_ = box.schema.space.create('sync', {is_sync = true, engine = engine})
+_ = box.space.sync:create_index('pk')
+for i = 1, 10 do box.space.sync:replace{i} end
+
+-- Local rows could affect this by increasing the signature.
+_ = box.schema.space.create('loc', {is_local = true, engine = engine})
+_ = box.space.loc:create_index('pk')
+for i = 1, 10 do box.space.loc:replace{i} end
+
+box.snapshot()
+
+test_run:cmd("restart server default")
+
+-- Could hang if the limbo handled the end of the snapshot incorrectly.
+box.space.sync:replace{11}
+
+old_synchro_quorum = box.cfg.replication_synchro_quorum
+old_synchro_timeout = box.cfg.replication_synchro_timeout
+
+box.cfg{                                                                        \
+    replication_synchro_timeout = 0.001,                                        \
+    replication_synchro_quorum = 2,                                             \
+}
+box.space.sync:replace{12}
+
+box.cfg{                                                                        \
+    replication_synchro_timeout = 1000,                                         \
+    replication_synchro_quorum = 1,                                             \
+}
+box.space.sync:replace{13}
+box.space.sync:get({11})
+box.space.sync:get({12})
+box.space.sync:get({13})
+
+box.cfg{                                                                        \
+    replication_synchro_timeout = old_synchro_timeout,                          \
+    replication_synchro_quorum = old_synchro_quorum,                            \
+}
+box.space.sync:drop()
+box.space.loc:drop()
-- 
2.21.1 (Apple Git-122.3)

* Re: [Tarantool-patches] [PATCH v2 1/1] memtx: force async snapshot transactions
  2020-09-13 19:39 [Tarantool-patches] [PATCH v2 1/1] memtx: force async snapshot transactions Vladislav Shpilevoy
@ 2020-09-14  8:12 ` Cyrill Gorcunov
  2020-09-14  9:15 ` Serge Petrenko
  2020-09-14 20:01 ` Vladislav Shpilevoy
  2 siblings, 0 replies; 4+ messages in thread
From: Cyrill Gorcunov @ 2020-09-14  8:12 UTC (permalink / raw)
  To: Vladislav Shpilevoy; +Cc: tarantool-patches

On Sun, Sep 13, 2020 at 09:39:26PM +0200, Vladislav Shpilevoy wrote:
> Snapshot rows do not contain real LSNs. Instead, their LSNs are
> ordinal numbers: rows in the snap have LSNs from 1 up to the number
> of rows. This is because LSNs are not stored with every tuple in the
> storages, so there is no way to store the real LSNs in the snapshot.
> 
> These artificial LSNs broke the synchronous replication limbo. After
> snap recovery was done, the limbo vclock was broken: it contained
> numbers unrelated to reality and was affected by rows from local
> spaces.
> 
> Also, the recovery could get stuck, because ACKs in the limbo
> stopped working after the first row - the vclock was set to the
> final signature right away.
> 
> This patch makes all snapshot-recovered rows async, because they are
> confirmed by definition. So now the limbo is not involved in the
> snapshot recovery at all.
> 
> Closes #5298
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>

* Re: [Tarantool-patches] [PATCH v2 1/1] memtx: force async snapshot transactions
  2020-09-13 19:39 [Tarantool-patches] [PATCH v2 1/1] memtx: force async snapshot transactions Vladislav Shpilevoy
  2020-09-14  8:12 ` Cyrill Gorcunov
@ 2020-09-14  9:15 ` Serge Petrenko
  2020-09-14 20:01 ` Vladislav Shpilevoy
  2 siblings, 0 replies; 4+ messages in thread
From: Serge Petrenko @ 2020-09-14  9:15 UTC (permalink / raw)
  To: Vladislav Shpilevoy, tarantool-patches, gorcunov


13.09.2020 22:39, Vladislav Shpilevoy wrote:
> Snapshot rows do not contain real LSNs. Instead, their LSNs are
> ordinal numbers: rows in the snap have LSNs from 1 up to the number
> of rows. This is because LSNs are not stored with every tuple in the
> storages, so there is no way to store the real LSNs in the snapshot.
>
> These artificial LSNs broke the synchronous replication limbo. After
> snap recovery was done, the limbo vclock was broken: it contained
> numbers unrelated to reality and was affected by rows from local
> spaces.
>
> Also, the recovery could get stuck, because ACKs in the limbo
> stopped working after the first row - the vclock was set to the
> final signature right away.
>
> This patch makes all snapshot-recovered rows async, because they are
> confirmed by definition. So now the limbo is not involved in the
> snapshot recovery at all.
>
> Closes #5298
Thanks for the patch! LGTM.
-- 
Serge Petrenko

* Re: [Tarantool-patches] [PATCH v2 1/1] memtx: force async snapshot transactions
  2020-09-13 19:39 [Tarantool-patches] [PATCH v2 1/1] memtx: force async snapshot transactions Vladislav Shpilevoy
  2020-09-14  8:12 ` Cyrill Gorcunov
  2020-09-14  9:15 ` Serge Petrenko
@ 2020-09-14 20:01 ` Vladislav Shpilevoy
  2 siblings, 0 replies; 4+ messages in thread
From: Vladislav Shpilevoy @ 2020-09-14 20:01 UTC (permalink / raw)
  To: tarantool-patches, gorcunov, sergepetrenko

Pushed to master and 2.5.
