[PATCH v2] vinyl: fix appearance of phantom tuple in secondary index after update

Vladimir Davydov vdavydov.dev at gmail.com
Fri Aug 10 21:43:11 MSK 2018


index.update() looks up the old tuple in the primary index, applies
update operations to it, then writes a DELETE statement to secondary
indexes to delete the old tuple and a REPLACE statement to all indexes
to insert the new tuple. It also sets a column mask for both DELETE and
REPLACE statements. The column mask is a bit mask which has a bit set if
the corresponding field is updated by update operations. It is used by
the write iterator for two purposes. First, the write iterator skips
REPLACE statements that don't update key fields. Second, the write
iterator turns a REPLACE that has a column mask that intersects with key
fields into an INSERT (so that it can get annihilated with a DELETE when
the time comes). The latter is correct, because if an update() does
update secondary key fields, then it must have deleted the old tuple and
hence the new tuple is unique in terms of extended key (merged primary
and secondary key parts, i.e. cmp_def).

The problem is that a bit may be set in a column mask even if the
corresponding field does not actually get updated. For example, consider
the following example.

  s = box.schema.space.create('test', {engine = 'vinyl'})
  s:create_index('pk')
  s:create_index('sk', {parts = {2, 'unsigned'}})
  s:insert{1, 10}
  box.snapshot()
  s:update(1, {{'=', 2, 10}})

The update() doesn't modify the secondary key field so it only writes
REPLACE{1, 10} to the secondary index (actually it writes DELETE{1, 10}
too, but it gets overwritten by the REPLACE). However, the REPLACE has
column mask that says that update() does modify the key field, because a
column mask is generated solely from update operations, before applying
them. As a result, the write iterator will not skip this REPLACE on
dump. This won't have any serious consequences, because this is a mere
optimization. What is worse, the write iterator will also turn the
REPLACE into an INSERT, which is absolutely wrong as the REPLACE is
preceded by INSERT{1, 10}. If the tuple gets deleted, the DELETE
statement and the INSERT created by the write iterator from the REPLACE
will get annihilated, leaving the old INSERT{1, 10} visible.

The issue may result in invalid select() output as demonstrated in the
issue description. It may also result in crashes, because the tuple
cache is very sensible to invalid select() output.

To fix this issue let's clear key bits in the column mask if we detect
that an update() doesn't actually update secondary key fields although
the column mask says it does.

Closes #3607
---
https://github.com/tarantool/tarantool/issues/3607
https://github.com/tarantool/tarantool/commits/dv/gh-3607-vy-fix-phantom-tuple-after-update

The patch is for 1.9

Changes in v2:
 - Rewrite the comment as suggested by Kostja.

 src/box/vy_tx.c                     | 28 ++++++++++++++++++++++
 test/vinyl/update_optimize.result   | 47 +++++++++++++++++++++++++++++++++++++
 test/vinyl/update_optimize.test.lua | 22 +++++++++++++++++
 3 files changed, 97 insertions(+)

diff --git a/src/box/vy_tx.c b/src/box/vy_tx.c
index cb9bbf58..b1c39ff9 100644
--- a/src/box/vy_tx.c
+++ b/src/box/vy_tx.c
@@ -865,6 +865,34 @@ vy_tx_set(struct vy_tx *tx, struct vy_index *index, struct tuple *stmt)
 						vy_stmt_column_mask(old->stmt));
 	}
 
+	if (index->id > 0 && vy_stmt_type(stmt) == IPROTO_REPLACE &&
+	    old != NULL && vy_stmt_type(old->stmt) == IPROTO_DELETE) {
+		/*
+		 * The column mask of an update operation may have a bit
+		 * set even if the corresponding field doesn't actually
+		 * get updated, because a column mask is generated
+		 * only from update operations, before applying them.
+		 * E.g. update(1, {{'+', 2, 0}}) doesn't modify the
+		 * second field, but the column mask will say it does.
+		 *
+		 * To discard DELETE statements in the write iterator
+		 * (see optimization #6), we turn a REPLACE into an
+		 * INSERT in case the REPLACE was generated by an
+		 * update that changed secondary key fields. So we
+		 * can't tolerate inaccuracy in a column mask.
+		 *
+		 * So if the update didn't actually modify secondary
+		 * key fields, i.e. DELETE and REPLACE generated by the
+		 * update have the same key fields, we forcefully clear
+		 * key bits in the column mask to ensure that no REPLACE
+		 * statement will be written for this secondary key.
+		 */
+		uint64_t column_mask = vy_stmt_column_mask(stmt);
+		if (column_mask != UINT64_MAX)
+			vy_stmt_set_column_mask(stmt, column_mask &
+						~index->cmp_def->column_mask);
+	}
+
 	v->overwritten = old;
 	write_set_insert(&tx->write_set, v);
 	tx->write_set_version++;
diff --git a/test/vinyl/update_optimize.result b/test/vinyl/update_optimize.result
index fbd42df0..00242f4e 100644
--- a/test/vinyl/update_optimize.result
+++ b/test/vinyl/update_optimize.result
@@ -711,3 +711,50 @@ lookups()
 space:drop()
 ---
 ...
+--
+-- gh-3607: phantom tuples in secondary index if UPDATE does not
+-- change key fields.
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+---
+...
+_ = s:create_index('pk')
+---
+...
+_ = s:create_index('sk', {parts = {2, 'unsigned'}, run_count_per_level = 10})
+---
+...
+s:insert{1, 10}
+---
+- [1, 10]
+...
+box.snapshot()
+---
+- ok
+...
+s:update(1, {{'=', 2, 10}})
+---
+- [1, 10]
+...
+s:delete(1)
+---
+...
+box.snapshot()
+---
+- ok
+...
+s.index.sk:info().rows -- INSERT in the first run + DELETE the second run
+---
+- 2
+...
+s:insert{1, 20}
+---
+- [1, 20]
+...
+s.index.sk:select()
+---
+- - [1, 20]
+...
+s:drop()
+---
+...
diff --git a/test/vinyl/update_optimize.test.lua b/test/vinyl/update_optimize.test.lua
index 32144172..91bc9744 100644
--- a/test/vinyl/update_optimize.test.lua
+++ b/test/vinyl/update_optimize.test.lua
@@ -234,3 +234,25 @@ space:update(1, {{'+', 5, 1}})
 lookups()
 
 space:drop()
+
+--
+-- gh-3607: phantom tuples in secondary index if UPDATE does not
+-- change key fields.
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+_ = s:create_index('pk')
+_ = s:create_index('sk', {parts = {2, 'unsigned'}, run_count_per_level = 10})
+
+s:insert{1, 10}
+box.snapshot()
+
+s:update(1, {{'=', 2, 10}})
+s:delete(1)
+box.snapshot()
+
+s.index.sk:info().rows -- INSERT in the first run + DELETE the second run
+
+s:insert{1, 20}
+s.index.sk:select()
+
+s:drop()
-- 
2.11.0




More information about the Tarantool-patches mailing list