* [PATCH] vinyl: fix read iterator skips source after reading cache
@ 2018-06-22 16:53 Vladimir Davydov
2018-06-27 16:02 ` Konstantin Osipov
0 siblings, 1 reply; 2+ messages in thread
From: Vladimir Davydov @ 2018-06-22 16:53 UTC (permalink / raw)
To: kostja; +Cc: tarantool-patches
If a source is used on a read iteration (i.e. the key at which it is
positioned is the next best match or, in terms of the read iterator
implementation, its front_id matches the read iterator front_id), its
history is cleaned up, see vy_read_iterator_apply_history(). This breaks
the logic behind vy_read_src_is_behind(), which assumes that the history
always points to the last used key. As a result, a source may be
mistakenly skipped, as illustrated below:
Fiber 1 Fiber 2
------- -------
1. Opens read iterator.
2. Advances it to the next key.
The returned key was read from
a mem or run (not from cache).
The source's history is emptied.
Adds a chain containing
the key read by fiber 1
to the cache.
3. Continues iteration, reads
next few keys from the cache
until the chain ends. The source
used at step 2 is skipped.
4. Calls vy_read_src_is_behind()
on the source used at step 2 and
skipped at step 3. It returns
false, because its history is
empty, thus skipping keys stored
in it.
Fix this bug by moving the code that checks whether a source iterator
needs to be advanced from vy_read_src_is_behind() to source iterator
'skip' method, because there we always know the last key returned by
the iterator.
Basically, this returns the code we had before commit b4d57284d270
("vinyl: consolidate skip optimization checks in read iterator").
Closes #3477
---
https://github.com/tarantool/tarantool/issues/3477
https://github.com/tarantool/tarantool/commits/gh-3477-vy-read-iterator-fix-skip-source
src/box/vy_cache.c | 15 ++++++++++++--
src/box/vy_mem.c | 11 +++++++++++
src/box/vy_read_iterator.c | 24 +++-------------------
src/box/vy_run.c | 14 ++++++++++++-
src/box/vy_tx.c | 11 +++++++++++
test/vinyl/iterator.result | 47 ++++++++++++++++++++++++++++++++++++++++++++
test/vinyl/iterator.test.lua | 22 +++++++++++++++++++++
7 files changed, 120 insertions(+), 24 deletions(-)
diff --git a/src/box/vy_cache.c b/src/box/vy_cache.c
index 2b9e40fa..4f6d0c7f 100644
--- a/src/box/vy_cache.c
+++ b/src/box/vy_cache.c
@@ -700,11 +700,22 @@ vy_cache_iterator_skip(struct vy_cache_iterator *itr,
const struct tuple *last_stmt,
struct vy_history *history, bool *stop)
{
+ assert(!itr->search_started || itr->version == itr->cache->version);
+
+ /*
+ * Check if the iterator is already positioned
+ * at the statement following last_stmt.
+ */
+ if (itr->search_started &&
+ (itr->curr_stmt == NULL || last_stmt == NULL ||
+ iterator_direction(itr->iterator_type) *
+ vy_tuple_compare(itr->curr_stmt, last_stmt,
+ itr->cache->cmp_def) > 0))
+ return 0;
+
*stop = false;
vy_history_cleanup(history);
- assert(!itr->search_started || itr->version == itr->cache->version);
-
itr->search_started = true;
itr->version = itr->cache->version;
if (itr->curr_stmt != NULL)
diff --git a/src/box/vy_mem.c b/src/box/vy_mem.c
index 29ef7710..7c9690ef 100644
--- a/src/box/vy_mem.c
+++ b/src/box/vy_mem.c
@@ -546,6 +546,17 @@ vy_mem_iterator_skip(struct vy_mem_iterator *itr,
{
assert(!itr->search_started || itr->version == itr->mem->version);
+ /*
+ * Check if the iterator is already positioned
+ * at the statement following last_stmt.
+ */
+ if (itr->search_started &&
+ (itr->curr_stmt == NULL || last_stmt == NULL ||
+ iterator_direction(itr->iterator_type) *
+ vy_tuple_compare(itr->curr_stmt, last_stmt,
+ itr->mem->cmp_def) > 0))
+ return 0;
+
vy_history_cleanup(history);
const struct tuple *key = itr->key;
diff --git a/src/box/vy_read_iterator.c b/src/box/vy_read_iterator.c
index d8d66229..160bb899 100644
--- a/src/box/vy_read_iterator.c
+++ b/src/box/vy_read_iterator.c
@@ -244,24 +244,6 @@ vy_read_iterator_evaluate_src(struct vy_read_iterator *itr,
}
}
-/**
- * Check if a read iterator source is behind the current read
- * iterator position and hence needs to be fast-forwarded.
- */
-static inline bool
-vy_read_src_is_behind(struct vy_read_iterator *itr, struct vy_read_src *src)
-{
- uint32_t src_id = src - itr->src;
- if (!src->is_started)
- return true;
- if (src_id < itr->skipped_src)
- return false;
- struct tuple *stmt = vy_history_last_stmt(&src->history);
- if (vy_read_iterator_cmp_stmt(itr, stmt, itr->last_stmt) > 0)
- return false;
- return true;
-}
-
/*
* Each of the functions from the vy_read_iterator_scan_* family
* is used by vy_read_iterator_advance() to:
@@ -327,7 +309,7 @@ vy_read_iterator_scan_cache(struct vy_read_iterator *itr,
int rc = vy_cache_iterator_restore(src_itr, itr->last_stmt,
&src->history, &is_interval);
if (rc == 0) {
- if (vy_read_src_is_behind(itr, src)) {
+ if (!src->is_started || itr->cache_src >= itr->skipped_src) {
rc = vy_cache_iterator_skip(src_itr, itr->last_stmt,
&src->history, &is_interval);
} else if (src->front_id == itr->prev_front_id) {
@@ -359,7 +341,7 @@ vy_read_iterator_scan_mem(struct vy_read_iterator *itr, uint32_t mem_src,
rc = vy_mem_iterator_restore(src_itr, itr->last_stmt, &src->history);
if (rc == 0) {
- if (vy_read_src_is_behind(itr, src)) {
+ if (!src->is_started || mem_src >= itr->skipped_src) {
rc = vy_mem_iterator_skip(src_itr, itr->last_stmt,
&src->history);
} else if (src->front_id == itr->prev_front_id) {
@@ -384,7 +366,7 @@ vy_read_iterator_scan_disk(struct vy_read_iterator *itr, uint32_t disk_src,
assert(disk_src >= itr->disk_src && disk_src < itr->src_count);
- if (vy_read_src_is_behind(itr, src))
+ if (!src->is_started || disk_src >= itr->skipped_src)
rc = vy_run_iterator_skip(src_itr, itr->last_stmt,
&src->history);
else if (src->front_id == itr->prev_front_id)
diff --git a/src/box/vy_run.c b/src/box/vy_run.c
index e2edbcaa..a15777b5 100644
--- a/src/box/vy_run.c
+++ b/src/box/vy_run.c
@@ -1523,10 +1523,22 @@ vy_run_iterator_skip(struct vy_run_iterator *itr,
const struct tuple *last_stmt,
struct vy_history *history)
{
- vy_history_cleanup(history);
if (itr->search_ended)
return 0;
+ /*
+ * Check if the iterator is already positioned
+ * at the statement following last_stmt.
+ */
+ if (itr->search_started &&
+ (itr->curr_stmt == NULL || last_stmt == NULL ||
+ iterator_direction(itr->iterator_type) *
+ vy_tuple_compare(itr->curr_stmt, last_stmt,
+ itr->cmp_def) > 0))
+ return 0;
+
+ vy_history_cleanup(history);
+
const struct tuple *key = itr->key;
enum iterator_type iterator_type = itr->iterator_type;
if (last_stmt != NULL) {
diff --git a/src/box/vy_tx.c b/src/box/vy_tx.c
index 04792ce7..f5bb624f 100644
--- a/src/box/vy_tx.c
+++ b/src/box/vy_tx.c
@@ -983,6 +983,17 @@ vy_txw_iterator_skip(struct vy_txw_iterator *itr,
assert(!itr->search_started ||
itr->version == itr->tx->write_set_version);
+ /*
+ * Check if the iterator is already positioned
+ * at the statement following last_stmt.
+ */
+ if (itr->search_started &&
+ (itr->curr_txv == NULL || last_stmt == NULL ||
+ iterator_direction(itr->iterator_type) *
+ vy_tuple_compare(itr->curr_txv->stmt, last_stmt,
+ itr->lsm->cmp_def) > 0))
+ return 0;
+
vy_history_cleanup(history);
const struct tuple *key = itr->key;
diff --git a/test/vinyl/iterator.result b/test/vinyl/iterator.result
index 4798f5f2..be0744d1 100644
--- a/test/vinyl/iterator.result
+++ b/test/vinyl/iterator.result
@@ -2182,3 +2182,50 @@ box.commit()
s:drop()
---
...
+--
+-- gh-3477: read iterator skips a source after reading cache.
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+---
+...
+_ = s:create_index('pk')
+---
+...
+for i = 1, 5 do s:insert{i} end
+---
+...
+-- Start iteration.
+t = {}
+---
+...
+gen, param, state = s:pairs({})
+---
+...
+_, v = gen(param, state)
+---
+...
+table.insert(t, v)
+---
+...
+-- Add chain {1}..{3} to the cache
+box.space.test:select({}, {limit = 3})
+---
+- - [1]
+ - [2]
+ - [3]
+...
+-- Continue iteration.
+for k, v in gen, param, state do table.insert(t, v) end
+---
+...
+t
+---
+- - [1]
+ - [2]
+ - [3]
+ - [4]
+ - [5]
+...
+s:drop()
+---
+...
diff --git a/test/vinyl/iterator.test.lua b/test/vinyl/iterator.test.lua
index b15f4ef1..9fa6609a 100644
--- a/test/vinyl/iterator.test.lua
+++ b/test/vinyl/iterator.test.lua
@@ -759,3 +759,25 @@ value
box.commit()
s:drop()
+
+--
+-- gh-3477: read iterator skips a source after reading cache.
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+_ = s:create_index('pk')
+for i = 1, 5 do s:insert{i} end
+
+-- Start iteration.
+t = {}
+gen, param, state = s:pairs({})
+_, v = gen(param, state)
+table.insert(t, v)
+
+-- Add chain {1}..{3} to the cache
+box.space.test:select({}, {limit = 3})
+
+-- Continue iteration.
+for k, v in gen, param, state do table.insert(t, v) end
+t
+
+s:drop()
--
2.11.0
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [PATCH] vinyl: fix read iterator skips source after reading cache
2018-06-22 16:53 [PATCH] vinyl: fix read iterator skips source after reading cache Vladimir Davydov
@ 2018-06-27 16:02 ` Konstantin Osipov
0 siblings, 0 replies; 2+ messages in thread
From: Konstantin Osipov @ 2018-06-27 16:02 UTC (permalink / raw)
To: Vladimir Davydov; +Cc: tarantool-patches
* Vladimir Davydov <vdavydov.dev@gmail.com> [18/06/23 08:30]:
> If a source is used on a read iteration (i.e. the key at which it is
> positioned is the next best match or, in terms of the read iterator
> implementation, its front_id matches the read iterator front_id), its
> history is cleaned up, see vy_read_iterator_apply_history(). This breaks
> the logic behind vy_read_src_is_behind(), which assumes that the history
> always points to the last used key. As a result, a source may be
> mistakenly skipped, as illustrated below:
Pushed.
--
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2018-06-27 16:02 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-22 16:53 [PATCH] vinyl: fix read iterator skips source after reading cache Vladimir Davydov
2018-06-27 16:02 ` Konstantin Osipov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox