Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: kostja@tarantool.org
Cc: tarantool-patches@freelists.org
Subject: [PATCH] vinyl: fix appearance of phantom tuple in secondary index after update
Date: Fri, 10 Aug 2018 20:26:05 +0300	[thread overview]
Message-ID: <9bec570f3b7e3b781430dd688c033d00eb6874ff.1533921796.git.vdavydov.dev@gmail.com> (raw)

index.update() looks up the old tuple in the primary index, applies
update operations to it, then writes a DELETE statement to secondary
indexes to delete the old tuple and a REPLACE statement to all indexes
to insert the new tuple. It also sets a column mask for both DELETE and
REPLACE statements. The column mask is a bit mask which has a bit set if
the corresponding field is updated by update operations. It is used by
the write iterator for two purposes. First, the write iterator skips
REPLACE statements that don't update key fields. Second, the write
iterator turns a REPLACE that has a column mask that intersects with key
fields into an INSERT (so that it can get annihilated with a DELETE when
the time comes). The latter is correct, because if an update() does
update secondary key fields, then it must have deleted the old tuple and
hence the new tuple is unique in terms of extended key (merged primary
and secondary key parts, i.e. cmp_def).

The problem is that a bit may be set in a column mask even if the
corresponding field does not actually get updated. For example, consider
the following example.

  s = box.schema.space.create('test', {engine = 'vinyl'})
  s:create_index('pk')
  s:create_index('sk', {parts = {2, 'unsigned'}})
  s:insert{1, 10}
  box.snapshot()
  s:update(1, {{'=', 2, 10}})

The update() doesn't modify the secondary key field so it only writes
REPLACE{1, 10} to the secondary index (actually it writes DELETE{1, 10}
too, but it gets overwritten by the REPLACE). However, the REPLACE has
column mask that says that update() does modify the key field, because a
column mask is generated solely from update operations, before applying
them. As a result, the write iterator will not skip this REPLACE on
dump. This won't have any serious consequences, because this is a mere
optimization. What is worse, the write iterator will also turn the
REPLACE into an INSERT, which is absolutely wrong as the REPLACE is
preceded by INSERT{1, 10}. If the tuple gets deleted, the DELETE
statement and the INSERT created by the write iterator from the REPLACE
will get annihilated, leaving the old INSERT{1, 10} visible.

The issue may result in invalid select() output as demonstrated in the
issue description. It may also result in crashes, because the tuple
cache is very sensible to invalid select() output.

To fix this issue let's clear key bits in the column mask if we detect
that an update() doesn't actually update secondary key fields although
the column mask says it does. We do that in vy_tx_set(), if we see that
the statement overwritten by a REPLACE is a DELETE: the point is a
DELETE may be written to a secondary index only if the tuple it is
supposed to delete actually exists (we can't generate a surrogate DELETE
otherwise); so if a REPLACE overwrites a DELETE, there must be the same
REPLACE in the index and hence we can clear key bits in the REPLACE
column mask.

Closes #3607
---
https://github.com/tarantool/tarantool/issues/3607
https://github.com/tarantool/tarantool/commits/dv/gh-3607-vy-fix-phantom-tuple-after-update

The patch is for 1.9

 src/box/vy_tx.c                     | 22 +++++++++++++++++
 test/vinyl/update_optimize.result   | 47 +++++++++++++++++++++++++++++++++++++
 test/vinyl/update_optimize.test.lua | 22 +++++++++++++++++
 3 files changed, 91 insertions(+)

diff --git a/src/box/vy_tx.c b/src/box/vy_tx.c
index cb9bbf58..558e173b 100644
--- a/src/box/vy_tx.c
+++ b/src/box/vy_tx.c
@@ -865,6 +865,28 @@ vy_tx_set(struct vy_tx *tx, struct vy_index *index, struct tuple *stmt)
 						vy_stmt_column_mask(old->stmt));
 	}
 
+	if (index->id > 0 && vy_stmt_type(stmt) == IPROTO_REPLACE &&
+	    old != NULL && vy_stmt_type(old->stmt) == IPROTO_DELETE) {
+		/*
+		 * We write a DELETE statement to a secondary index only
+		 * if there is actually a tuple it is supposed to delete,
+		 * because in order to create a surrogate DELETE we have
+		 * to look up the deleted tuple in the primary index.
+		 *
+		 * So if we overwrite a DELETE statement in tx write set
+		 * with a REPLACE, it means the REPLACE doesn't actually
+		 * modify the tuple stored in the secondary key and hence
+		 * we can clear key bits in the column mask.
+		 *
+		 * Actually, we must do it in order to prevent the write
+		 * iterator from turning this REPLACE into an INSERT.
+		 */
+		uint64_t column_mask = vy_stmt_column_mask(stmt);
+		if (column_mask != UINT64_MAX)
+			vy_stmt_set_column_mask(stmt, column_mask &
+						~index->cmp_def->column_mask);
+	}
+
 	v->overwritten = old;
 	write_set_insert(&tx->write_set, v);
 	tx->write_set_version++;
diff --git a/test/vinyl/update_optimize.result b/test/vinyl/update_optimize.result
index fbd42df0..00242f4e 100644
--- a/test/vinyl/update_optimize.result
+++ b/test/vinyl/update_optimize.result
@@ -711,3 +711,50 @@ lookups()
 space:drop()
 ---
 ...
+--
+-- gh-3607: phantom tuples in secondary index if UPDATE does not
+-- change key fields.
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+---
+...
+_ = s:create_index('pk')
+---
+...
+_ = s:create_index('sk', {parts = {2, 'unsigned'}, run_count_per_level = 10})
+---
+...
+s:insert{1, 10}
+---
+- [1, 10]
+...
+box.snapshot()
+---
+- ok
+...
+s:update(1, {{'=', 2, 10}})
+---
+- [1, 10]
+...
+s:delete(1)
+---
+...
+box.snapshot()
+---
+- ok
+...
+s.index.sk:info().rows -- INSERT in the first run + DELETE the second run
+---
+- 2
+...
+s:insert{1, 20}
+---
+- [1, 20]
+...
+s.index.sk:select()
+---
+- - [1, 20]
+...
+s:drop()
+---
+...
diff --git a/test/vinyl/update_optimize.test.lua b/test/vinyl/update_optimize.test.lua
index 32144172..91bc9744 100644
--- a/test/vinyl/update_optimize.test.lua
+++ b/test/vinyl/update_optimize.test.lua
@@ -234,3 +234,25 @@ space:update(1, {{'+', 5, 1}})
 lookups()
 
 space:drop()
+
+--
+-- gh-3607: phantom tuples in secondary index if UPDATE does not
+-- change key fields.
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+_ = s:create_index('pk')
+_ = s:create_index('sk', {parts = {2, 'unsigned'}, run_count_per_level = 10})
+
+s:insert{1, 10}
+box.snapshot()
+
+s:update(1, {{'=', 2, 10}})
+s:delete(1)
+box.snapshot()
+
+s.index.sk:info().rows -- INSERT in the first run + DELETE the second run
+
+s:insert{1, 20}
+s.index.sk:select()
+
+s:drop()
-- 
2.11.0

             reply	other threads:[~2018-08-10 17:26 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-10 17:26 Vladimir Davydov [this message]
2018-08-10 18:43 ` [PATCH v2] " Vladimir Davydov
2018-08-10 18:49   ` Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9bec570f3b7e3b781430dd688c033d00eb6874ff.1533921796.git.vdavydov.dev@gmail.com \
    --to=vdavydov.dev@gmail.com \
    --cc=kostja@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='Re: [PATCH] vinyl: fix appearance of phantom tuple in secondary index after update' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox