Tarantool development patches archive
 help / color / mirror / Atom feed
* [Tarantool-patches] [PATCH] vinyl: restart read iterator in case of roll backed WAL
@ 2020-05-19 12:50 Nikita Pettik
  2020-05-20 22:00 ` Vladislav Shpilevoy
  0 siblings, 1 reply; 2+ messages in thread
From: Nikita Pettik @ 2020-05-19 12:50 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy

Disk scan in vinyl is known to yield during read. So it opens a window
for modifications of in-memory level. Imagine following scenario: right
before data selection tuple is inserted into space. It passes first
stage of commit procedure, i.e. it is prepared to be committed but still
is not yet reached WAL.  Meanwhile we are starting to read the same key.
At this moment prepared statement is already inserted to in-memory tree
ergo visible to read iterator.  So, read iterator fetches this statement
and proceeds to disk scan.  In turn, disk scan yields and in this moment
WAL fails to write statement on disk, so it is roll-backed and removed
from in-memory tree. Read iterator must recognize this situation and
should not include roll-backed tuple into result set. To achieve this
let's patch vy_read_iterator_restore_mem() so that it returns return
code > 0 in case read iterator should be restarted.

Close #3395
---
Branch: https://github.com/tarantool/tarantool/tree/np/gh-3395-vy-read-prepared
Issue: https://github.com/tarantool/tarantool/issues/3395

 src/box/vy_read_iterator.c                    | 18 +++-
 .../gh-3395-read-prepared-uncommitted.result  | 95 +++++++++++++++++++
 ...gh-3395-read-prepared-uncommitted.test.lua | 41 ++++++++
 test/vinyl/suite.ini                          |  2 +-
 4 files changed, 153 insertions(+), 3 deletions(-)
 create mode 100644 test/vinyl/gh-3395-read-prepared-uncommitted.result
 create mode 100644 test/vinyl/gh-3395-read-prepared-uncommitted.test.lua

diff --git a/src/box/vy_read_iterator.c b/src/box/vy_read_iterator.c
index dcc9a0873..d83d5f81d 100644
--- a/src/box/vy_read_iterator.c
+++ b/src/box/vy_read_iterator.c
@@ -382,6 +382,9 @@ vy_read_iterator_scan_disk(struct vy_read_iterator *itr, uint32_t disk_src,
  * Restore the position of the active in-memory tree iterator
  * after a yield caused by a disk read and update 'next'
  * if necessary.
+ * @retval -1 In case of error (e.g. OOM);
+ * @retval 0 Successful execution;
+ * @retval 1 Restart of advance_iterator is required.
  */
 static NODISCARD int
 vy_read_iterator_restore_mem(struct vy_read_iterator *itr,
@@ -405,8 +408,14 @@ vy_read_iterator_restore_mem(struct vy_read_iterator *itr,
 		 * Memory trees are append-only so if the
 		 * source is not on top of the heap after
 		 * restoration, it was not before.
+		 * It may turn out that we read statement
+		 * which was prepared but not committed to
+		 * WAL. In such a rare case, we should restart
+		 * iterator_advance() to get the right key.
 		 */
-		assert(src->front_id < itr->front_id);
+		assert(src->front_id <= itr->front_id);
+		if (src->front_id == itr->front_id)
+			return 1;
 		return 0;
 	}
 	if (cmp < 0) {
@@ -529,8 +538,13 @@ rescan_disk:
 	 * as it is owned exclusively by the current fiber so the only
 	 * source to check is the active in-memory tree.
 	 */
-	if (vy_read_iterator_restore_mem(itr, &next) != 0)
+	int rc = vy_read_iterator_restore_mem(itr, &next);
+	if (rc < 0)
 		return -1;
+	if (rc > 0) {
+		vy_read_iterator_restore(itr);
+		goto restart;
+	}
 	/*
 	 * Scan the next range in case we transgressed the current
 	 * range's boundaries.
diff --git a/test/vinyl/gh-3395-read-prepared-uncommitted.result b/test/vinyl/gh-3395-read-prepared-uncommitted.result
new file mode 100644
index 000000000..a4ec023fd
--- /dev/null
+++ b/test/vinyl/gh-3395-read-prepared-uncommitted.result
@@ -0,0 +1,95 @@
+-- test-run result file version 2
+-- Disk scan in vinyl is known to yield during read. So it opens
+-- a window for modifications of in-memory level. Let's test
+-- following case: right before data selection tuple is inserted
+-- into space. It passes first stage of commit procedure, i.e. it
+-- is prepared to be committed but still not yet reached WAL.
+-- Meanwhile we are starting read of the same key. At this moment
+-- prepared statement is already inserted to in-memory tree. So,
+-- read iterator fetches this statement and proceeds to disk scan.
+-- In turn, disk scan yields and in this moment WAL fails to write
+-- statement on disk, so it is roll-backed. Read iterator must
+-- recognize this situation and must not include roll-backed tuple
+-- into result set.
+--
+fiber = require('fiber')
+ | ---
+ | ...
+
+errinj = box.error.injection
+ | ---
+ | ...
+s = box.schema.create_space('test', {engine = 'vinyl'})
+ | ---
+ | ...
+pk = s:create_index('pk')
+ | ---
+ | ...
+sk = s:create_index('sk', {parts = {{2, 'unsigned'}}, unique = false})
+ | ---
+ | ...
+s:replace{3, 2}
+ | ---
+ | - [3, 2]
+ | ...
+s:replace{2, 2}
+ | ---
+ | - [2, 2]
+ | ...
+box.snapshot()
+ | ---
+ | - ok
+ | ...
+-- Tuple {1, 2} is not going to be committed, so it should not
+-- appear in the result set.
+--
+c = fiber.channel(1)
+ | ---
+ | ...
+function do_write() s:replace{1, 2} end
+ | ---
+ | ...
+errinj.set("ERRINJ_WAL_DELAY", true)
+ | ---
+ | - ok
+ | ...
+writer = fiber.create(do_write)
+ | ---
+ | ...
+
+function do_read() local ret = sk:select{2} c:put(ret) end
+ | ---
+ | ...
+errinj.set("ERRINJ_VY_READ_PAGE_DELAY", true)
+ | ---
+ | - ok
+ | ...
+f = fiber.create(do_read)
+ | ---
+ | ...
+errinj.set("ERRINJ_WAL_WRITE", true)
+ | ---
+ | - ok
+ | ...
+errinj.set("ERRINJ_WAL_DELAY", false)
+ | ---
+ | - ok
+ | ...
+errinj.set("ERRINJ_VY_READ_PAGE_DELAY", false)
+ | ---
+ | - ok
+ | ...
+
+c:get()
+ | ---
+ | - - [2, 2]
+ |   - [3, 2]
+ | ...
+
+errinj.set("ERRINJ_WAL_WRITE", false)
+ | ---
+ | - ok
+ | ...
+s:drop()
+ | ---
+ | ...
diff --git a/test/vinyl/gh-3395-read-prepared-uncommitted.test.lua b/test/vinyl/gh-3395-read-prepared-uncommitted.test.lua
new file mode 100644
index 000000000..d75b36d1a
--- /dev/null
+++ b/test/vinyl/gh-3395-read-prepared-uncommitted.test.lua
@@ -0,0 +1,41 @@
+-- Disk scan in vinyl is known to yield during read. So it opens
+-- a window for modifications of in-memory level. Let's test
+-- following case: right before data selection tuple is inserted
+-- into space. It passes first stage of commit procedure, i.e. it
+-- is prepared to be committed but still not yet reached WAL.
+-- Meanwhile we are starting read of the same key. At this moment
+-- prepared statement is already inserted to in-memory tree. So,
+-- read iterator fetches this statement and proceeds to disk scan.
+-- In turn, disk scan yields and in this moment WAL fails to write
+-- statement on disk, so it is roll-backed. Read iterator must
+-- recognize this situation and must not include roll-backed tuple
+-- into result set.
+--
+fiber = require('fiber')
+
+errinj = box.error.injection
+s = box.schema.create_space('test', {engine = 'vinyl'})
+pk = s:create_index('pk')
+sk = s:create_index('sk', {parts = {{2, 'unsigned'}}, unique = false})
+s:replace{3, 2}
+s:replace{2, 2}
+box.snapshot()
+-- Tuple {1, 2} is not going to be committed, so it should not
+-- appear in the result set.
+--
+c = fiber.channel(1)
+function do_write() s:replace{1, 2} end
+errinj.set("ERRINJ_WAL_DELAY", true)
+writer = fiber.create(do_write)
+
+function do_read() local ret = sk:select{2} c:put(ret) end
+errinj.set("ERRINJ_VY_READ_PAGE_DELAY", true)
+f = fiber.create(do_read)
+errinj.set("ERRINJ_WAL_WRITE", true)
+errinj.set("ERRINJ_WAL_DELAY", false)
+errinj.set("ERRINJ_VY_READ_PAGE_DELAY", false)
+
+c:get()
+
+errinj.set("ERRINJ_WAL_WRITE", false)
+s:drop()
diff --git a/test/vinyl/suite.ini b/test/vinyl/suite.ini
index 8b7e87955..09829b975 100644
--- a/test/vinyl/suite.ini
+++ b/test/vinyl/suite.ini
@@ -2,7 +2,7 @@
 core = tarantool
 description = vinyl integration tests
 script = vinyl.lua
-release_disabled = errinj.test.lua errinj_ddl.test.lua errinj_gc.test.lua errinj_stat.test.lua errinj_tx.test.lua errinj_vylog.test.lua partial_dump.test.lua quota_timeout.test.lua recovery_quota.test.lua replica_rejoin.test.lua gh-4864-stmt-alloc-fail-compact.test.lua gh-4805-open-run-err-recovery.test.lua
+release_disabled = errinj.test.lua errinj_ddl.test.lua errinj_gc.test.lua errinj_stat.test.lua errinj_tx.test.lua errinj_vylog.test.lua partial_dump.test.lua quota_timeout.test.lua recovery_quota.test.lua replica_rejoin.test.lua gh-4864-stmt-alloc-fail-compact.test.lua gh-4805-open-run-err-recovery.test.lua gh-3395-read-prepared-uncommitted.test.lua
 config = suite.cfg
 lua_libs = suite.lua stress.lua large.lua txn_proxy.lua ../box/lua/utils.lua
 use_unix_sockets = True
-- 
2.17.1

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [Tarantool-patches] [PATCH] vinyl: restart read iterator in case of roll backed WAL
  2020-05-19 12:50 [Tarantool-patches] [PATCH] vinyl: restart read iterator in case of roll backed WAL Nikita Pettik
@ 2020-05-20 22:00 ` Vladislav Shpilevoy
  0 siblings, 0 replies; 2+ messages in thread
From: Vladislav Shpilevoy @ 2020-05-20 22:00 UTC (permalink / raw)
  To: Nikita Pettik, tarantool-patches

Hi! Thanks for the patch!

I would suggest adding Alexander L. to the review. He participated in
implementation of iterator restoration a lot.

See 2 comments below.

1. I see the test fails on the branch in one of the jobs:
https://travis-ci.org/github/tarantool/tarantool/jobs/688791208

I also was able to reproduce it locally. The test hangs, or prints
mismatching results like this:

[001] vinyl/gh-3395-read-prepared-uncommitted.test.lua                [ fail ]
[001] 
[001] Test failed! Result content mismatch:
[001] --- vinyl/gh-3395-read-prepared-uncommitted.result	Wed May 20 23:42:38 2020
[001] +++ vinyl/gh-3395-read-prepared-uncommitted.reject	Wed May 20 23:51:18 2020
[001] @@ -82,7 +82,8 @@
[001]  
[001]  c:get()
[001]   | ---
[001] - | - - [2, 2]
[001] + | - - [1, 2]
[001] + |   - [2, 2]
[001]   |   - [3, 2]
[001]   | ...
[001]  
[001]

The mismatch happens almost on every run.

> diff --git a/test/vinyl/gh-3395-read-prepared-uncommitted.result b/test/vinyl/gh-3395-read-prepared-uncommitted.result
> new file mode 100644
> index 000000000..a4ec023fd
> --- /dev/null
> +++ b/test/vinyl/gh-3395-read-prepared-uncommitted.result
> @@ -0,0 +1,95 @@
> +-- test-run result file version 2
> +-- Disk scan in vinyl is known to yield during read. So it opens
> +-- a window for modifications of in-memory level. Let's test
> +-- following case: right before data selection tuple is inserted
> +-- into space. It passes first stage of commit procedure, i.e. it
> +-- is prepared to be committed but still not yet reached WAL.
> +-- Meanwhile we are starting read of the same key. At this moment
> +-- prepared statement is already inserted to in-memory tree. So,
> +-- read iterator fetches this statement and proceeds to disk scan.
> +-- In turn, disk scan yields and in this moment WAL fails to write
> +-- statement on disk, so it is roll-backed. Read iterator must
> +-- recognize this situation and must not include roll-backed tuple
> +-- into result set.
> +--
> +fiber = require('fiber')
> + | ---
> + | ...
> +
> +errinj = box.error.injection
> + | ---
> + | ...
> +s = box.schema.create_space('test', {engine = 'vinyl'})
> + | ---
> + | ...
> +pk = s:create_index('pk')
> + | ---
> + | ...
> +sk = s:create_index('sk', {parts = {{2, 'unsigned'}}, unique = false})
> + | ---
> + | ...
> +s:replace{3, 2}
> + | ---
> + | - [3, 2]
> + | ...
> +s:replace{2, 2}
> + | ---
> + | - [2, 2]
> + | ...
> +box.snapshot()
> + | ---
> + | - ok
> + | ...
> +-- Tuple {1, 2} is not going to be committed, so it should not
> +-- appear in the result set.
> +--
> +c = fiber.channel(1)
> + | ---
> + | ...
> +function do_write() s:replace{1, 2} end
> + | ---
> + | ...
> +errinj.set("ERRINJ_WAL_DELAY", true)
> + | ---
> + | - ok
> + | ...
> +writer = fiber.create(do_write)
> + | ---
> + | ...
> +
> +function do_read() local ret = sk:select{2} c:put(ret) end

2. What if it will read not the first tuple? What if it will
read a second or a third? I didn't check, just want to ensure,
that the iterator restart in this case won't literally restart
the iteration, and won't return the same tuples twice.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2020-05-20 22:00 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-19 12:50 [Tarantool-patches] [PATCH] vinyl: restart read iterator in case of roll backed WAL Nikita Pettik
2020-05-20 22:00 ` Vladislav Shpilevoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox