Tarantool development patches archive
 help / color / mirror / Atom feed
* [PATCH] vinyl: fix appearance of phantom tuple in secondary index after update
@ 2018-08-10 17:26 Vladimir Davydov
  2018-08-10 18:43 ` [PATCH v2] " Vladimir Davydov
  0 siblings, 1 reply; 3+ messages in thread
From: Vladimir Davydov @ 2018-08-10 17:26 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

index.update() looks up the old tuple in the primary index, applies
update operations to it, then writes a DELETE statement to secondary
indexes to delete the old tuple and a REPLACE statement to all indexes
to insert the new tuple. It also sets a column mask for both DELETE and
REPLACE statements. The column mask is a bit mask which has a bit set if
the corresponding field is updated by update operations. It is used by
the write iterator for two purposes. First, the write iterator skips
REPLACE statements that don't update key fields. Second, the write
iterator turns a REPLACE that has a column mask that intersects with key
fields into an INSERT (so that it can get annihilated with a DELETE when
the time comes). The latter is correct, because if an update() does
update secondary key fields, then it must have deleted the old tuple and
hence the new tuple is unique in terms of extended key (merged primary
and secondary key parts, i.e. cmp_def).

The problem is that a bit may be set in a column mask even if the
corresponding field does not actually get updated. For example, consider
the following example.

  s = box.schema.space.create('test', {engine = 'vinyl'})
  s:create_index('pk')
  s:create_index('sk', {parts = {2, 'unsigned'}})
  s:insert{1, 10}
  box.snapshot()
  s:update(1, {{'=', 2, 10}})

The update() doesn't modify the secondary key field so it only writes
REPLACE{1, 10} to the secondary index (actually it writes DELETE{1, 10}
too, but it gets overwritten by the REPLACE). However, the REPLACE has
column mask that says that update() does modify the key field, because a
column mask is generated solely from update operations, before applying
them. As a result, the write iterator will not skip this REPLACE on
dump. This won't have any serious consequences, because this is a mere
optimization. What is worse, the write iterator will also turn the
REPLACE into an INSERT, which is absolutely wrong as the REPLACE is
preceded by INSERT{1, 10}. If the tuple gets deleted, the DELETE
statement and the INSERT created by the write iterator from the REPLACE
will get annihilated, leaving the old INSERT{1, 10} visible.

The issue may result in invalid select() output as demonstrated in the
issue description. It may also result in crashes, because the tuple
cache is very sensible to invalid select() output.

To fix this issue let's clear key bits in the column mask if we detect
that an update() doesn't actually update secondary key fields although
the column mask says it does. We do that in vy_tx_set(), if we see that
the statement overwritten by a REPLACE is a DELETE: the point is a
DELETE may be written to a secondary index only if the tuple it is
supposed to delete actually exists (we can't generate a surrogate DELETE
otherwise); so if a REPLACE overwrites a DELETE, there must be the same
REPLACE in the index and hence we can clear key bits in the REPLACE
column mask.

Closes #3607
---
https://github.com/tarantool/tarantool/issues/3607
https://github.com/tarantool/tarantool/commits/dv/gh-3607-vy-fix-phantom-tuple-after-update

The patch is for 1.9

 src/box/vy_tx.c                     | 22 +++++++++++++++++
 test/vinyl/update_optimize.result   | 47 +++++++++++++++++++++++++++++++++++++
 test/vinyl/update_optimize.test.lua | 22 +++++++++++++++++
 3 files changed, 91 insertions(+)

diff --git a/src/box/vy_tx.c b/src/box/vy_tx.c
index cb9bbf58..558e173b 100644
--- a/src/box/vy_tx.c
+++ b/src/box/vy_tx.c
@@ -865,6 +865,28 @@ vy_tx_set(struct vy_tx *tx, struct vy_index *index, struct tuple *stmt)
 						vy_stmt_column_mask(old->stmt));
 	}
 
+	if (index->id > 0 && vy_stmt_type(stmt) == IPROTO_REPLACE &&
+	    old != NULL && vy_stmt_type(old->stmt) == IPROTO_DELETE) {
+		/*
+		 * We write a DELETE statement to a secondary index only
+		 * if there is actually a tuple it is supposed to delete,
+		 * because in order to create a surrogate DELETE we have
+		 * to look up the deleted tuple in the primary index.
+		 *
+		 * So if we overwrite a DELETE statement in tx write set
+		 * with a REPLACE, it means the REPLACE doesn't actually
+		 * modify the tuple stored in the secondary key and hence
+		 * we can clear key bits in the column mask.
+		 *
+		 * Actually, we must do it in order to prevent the write
+		 * iterator from turning this REPLACE into an INSERT.
+		 */
+		uint64_t column_mask = vy_stmt_column_mask(stmt);
+		if (column_mask != UINT64_MAX)
+			vy_stmt_set_column_mask(stmt, column_mask &
+						~index->cmp_def->column_mask);
+	}
+
 	v->overwritten = old;
 	write_set_insert(&tx->write_set, v);
 	tx->write_set_version++;
diff --git a/test/vinyl/update_optimize.result b/test/vinyl/update_optimize.result
index fbd42df0..00242f4e 100644
--- a/test/vinyl/update_optimize.result
+++ b/test/vinyl/update_optimize.result
@@ -711,3 +711,50 @@ lookups()
 space:drop()
 ---
 ...
+--
+-- gh-3607: phantom tuples in secondary index if UPDATE does not
+-- change key fields.
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+---
+...
+_ = s:create_index('pk')
+---
+...
+_ = s:create_index('sk', {parts = {2, 'unsigned'}, run_count_per_level = 10})
+---
+...
+s:insert{1, 10}
+---
+- [1, 10]
+...
+box.snapshot()
+---
+- ok
+...
+s:update(1, {{'=', 2, 10}})
+---
+- [1, 10]
+...
+s:delete(1)
+---
+...
+box.snapshot()
+---
+- ok
+...
+s.index.sk:info().rows -- INSERT in the first run + DELETE the second run
+---
+- 2
+...
+s:insert{1, 20}
+---
+- [1, 20]
+...
+s.index.sk:select()
+---
+- - [1, 20]
+...
+s:drop()
+---
+...
diff --git a/test/vinyl/update_optimize.test.lua b/test/vinyl/update_optimize.test.lua
index 32144172..91bc9744 100644
--- a/test/vinyl/update_optimize.test.lua
+++ b/test/vinyl/update_optimize.test.lua
@@ -234,3 +234,25 @@ space:update(1, {{'+', 5, 1}})
 lookups()
 
 space:drop()
+
+--
+-- gh-3607: phantom tuples in secondary index if UPDATE does not
+-- change key fields.
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+_ = s:create_index('pk')
+_ = s:create_index('sk', {parts = {2, 'unsigned'}, run_count_per_level = 10})
+
+s:insert{1, 10}
+box.snapshot()
+
+s:update(1, {{'=', 2, 10}})
+s:delete(1)
+box.snapshot()
+
+s.index.sk:info().rows -- INSERT in the first run + DELETE the second run
+
+s:insert{1, 20}
+s.index.sk:select()
+
+s:drop()
-- 
2.11.0

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH v2] vinyl: fix appearance of phantom tuple in secondary index after update
  2018-08-10 17:26 [PATCH] vinyl: fix appearance of phantom tuple in secondary index after update Vladimir Davydov
@ 2018-08-10 18:43 ` Vladimir Davydov
  2018-08-10 18:49   ` Vladimir Davydov
  0 siblings, 1 reply; 3+ messages in thread
From: Vladimir Davydov @ 2018-08-10 18:43 UTC (permalink / raw)
  To: kostja; +Cc: tarantool-patches

index.update() looks up the old tuple in the primary index, applies
update operations to it, then writes a DELETE statement to secondary
indexes to delete the old tuple and a REPLACE statement to all indexes
to insert the new tuple. It also sets a column mask for both DELETE and
REPLACE statements. The column mask is a bit mask which has a bit set if
the corresponding field is updated by update operations. It is used by
the write iterator for two purposes. First, the write iterator skips
REPLACE statements that don't update key fields. Second, the write
iterator turns a REPLACE that has a column mask that intersects with key
fields into an INSERT (so that it can get annihilated with a DELETE when
the time comes). The latter is correct, because if an update() does
update secondary key fields, then it must have deleted the old tuple and
hence the new tuple is unique in terms of extended key (merged primary
and secondary key parts, i.e. cmp_def).

The problem is that a bit may be set in a column mask even if the
corresponding field does not actually get updated. For example, consider
the following example.

  s = box.schema.space.create('test', {engine = 'vinyl'})
  s:create_index('pk')
  s:create_index('sk', {parts = {2, 'unsigned'}})
  s:insert{1, 10}
  box.snapshot()
  s:update(1, {{'=', 2, 10}})

The update() doesn't modify the secondary key field so it only writes
REPLACE{1, 10} to the secondary index (actually it writes DELETE{1, 10}
too, but it gets overwritten by the REPLACE). However, the REPLACE has
column mask that says that update() does modify the key field, because a
column mask is generated solely from update operations, before applying
them. As a result, the write iterator will not skip this REPLACE on
dump. This won't have any serious consequences, because this is a mere
optimization. What is worse, the write iterator will also turn the
REPLACE into an INSERT, which is absolutely wrong as the REPLACE is
preceded by INSERT{1, 10}. If the tuple gets deleted, the DELETE
statement and the INSERT created by the write iterator from the REPLACE
will get annihilated, leaving the old INSERT{1, 10} visible.

The issue may result in invalid select() output as demonstrated in the
issue description. It may also result in crashes, because the tuple
cache is very sensible to invalid select() output.

To fix this issue let's clear key bits in the column mask if we detect
that an update() doesn't actually update secondary key fields although
the column mask says it does.

Closes #3607
---
https://github.com/tarantool/tarantool/issues/3607
https://github.com/tarantool/tarantool/commits/dv/gh-3607-vy-fix-phantom-tuple-after-update

The patch is for 1.9

Changes in v2:
 - Rewrite the comment as suggested by Kostja.

 src/box/vy_tx.c                     | 28 ++++++++++++++++++++++
 test/vinyl/update_optimize.result   | 47 +++++++++++++++++++++++++++++++++++++
 test/vinyl/update_optimize.test.lua | 22 +++++++++++++++++
 3 files changed, 97 insertions(+)

diff --git a/src/box/vy_tx.c b/src/box/vy_tx.c
index cb9bbf58..b1c39ff9 100644
--- a/src/box/vy_tx.c
+++ b/src/box/vy_tx.c
@@ -865,6 +865,34 @@ vy_tx_set(struct vy_tx *tx, struct vy_index *index, struct tuple *stmt)
 						vy_stmt_column_mask(old->stmt));
 	}
 
+	if (index->id > 0 && vy_stmt_type(stmt) == IPROTO_REPLACE &&
+	    old != NULL && vy_stmt_type(old->stmt) == IPROTO_DELETE) {
+		/*
+		 * The column mask of an update operation may have a bit
+		 * set even if the corresponding field doesn't actually
+		 * get updated, because a column mask is generated
+		 * only from update operations, before applying them.
+		 * E.g. update(1, {{'+', 2, 0}}) doesn't modify the
+		 * second field, but the column mask will say it does.
+		 *
+		 * To discard DELETE statements in the write iterator
+		 * (see optimization #6), we turn a REPLACE into an
+		 * INSERT in case the REPLACE was generated by an
+		 * update that changed secondary key fields. So we
+		 * can't tolerate inaccuracy in a column mask.
+		 *
+		 * So if the update didn't actually modify secondary
+		 * key fields, i.e. DELETE and REPLACE generated by the
+		 * update have the same key fields, we forcefully clear
+		 * key bits in the column mask to ensure that no REPLACE
+		 * statement will be written for this secondary key.
+		 */
+		uint64_t column_mask = vy_stmt_column_mask(stmt);
+		if (column_mask != UINT64_MAX)
+			vy_stmt_set_column_mask(stmt, column_mask &
+						~index->cmp_def->column_mask);
+	}
+
 	v->overwritten = old;
 	write_set_insert(&tx->write_set, v);
 	tx->write_set_version++;
diff --git a/test/vinyl/update_optimize.result b/test/vinyl/update_optimize.result
index fbd42df0..00242f4e 100644
--- a/test/vinyl/update_optimize.result
+++ b/test/vinyl/update_optimize.result
@@ -711,3 +711,50 @@ lookups()
 space:drop()
 ---
 ...
+--
+-- gh-3607: phantom tuples in secondary index if UPDATE does not
+-- change key fields.
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+---
+...
+_ = s:create_index('pk')
+---
+...
+_ = s:create_index('sk', {parts = {2, 'unsigned'}, run_count_per_level = 10})
+---
+...
+s:insert{1, 10}
+---
+- [1, 10]
+...
+box.snapshot()
+---
+- ok
+...
+s:update(1, {{'=', 2, 10}})
+---
+- [1, 10]
+...
+s:delete(1)
+---
+...
+box.snapshot()
+---
+- ok
+...
+s.index.sk:info().rows -- INSERT in the first run + DELETE the second run
+---
+- 2
+...
+s:insert{1, 20}
+---
+- [1, 20]
+...
+s.index.sk:select()
+---
+- - [1, 20]
+...
+s:drop()
+---
+...
diff --git a/test/vinyl/update_optimize.test.lua b/test/vinyl/update_optimize.test.lua
index 32144172..91bc9744 100644
--- a/test/vinyl/update_optimize.test.lua
+++ b/test/vinyl/update_optimize.test.lua
@@ -234,3 +234,25 @@ space:update(1, {{'+', 5, 1}})
 lookups()
 
 space:drop()
+
+--
+-- gh-3607: phantom tuples in secondary index if UPDATE does not
+-- change key fields.
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+_ = s:create_index('pk')
+_ = s:create_index('sk', {parts = {2, 'unsigned'}, run_count_per_level = 10})
+
+s:insert{1, 10}
+box.snapshot()
+
+s:update(1, {{'=', 2, 10}})
+s:delete(1)
+box.snapshot()
+
+s.index.sk:info().rows -- INSERT in the first run + DELETE the second run
+
+s:insert{1, 20}
+s.index.sk:select()
+
+s:drop()
-- 
2.11.0

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] vinyl: fix appearance of phantom tuple in secondary index after update
  2018-08-10 18:43 ` [PATCH v2] " Vladimir Davydov
@ 2018-08-10 18:49   ` Vladimir Davydov
  0 siblings, 0 replies; 3+ messages in thread
From: Vladimir Davydov @ 2018-08-10 18:49 UTC (permalink / raw)
  To: tarantool-patches

Pushed to 1.9 (approved by Kostja).

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-08-10 18:49 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-10 17:26 [PATCH] vinyl: fix appearance of phantom tuple in secondary index after update Vladimir Davydov
2018-08-10 18:43 ` [PATCH v2] " Vladimir Davydov
2018-08-10 18:49   ` Vladimir Davydov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox