From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
From: Vladimir Davydov
Subject: [PATCH v2] vinyl: fix appearance of phantom tuple in secondary index after update
Date: Fri, 10 Aug 2018 21:43:11 +0300
Message-Id:
In-Reply-To: <9bec570f3b7e3b781430dd688c033d00eb6874ff.1533921796.git.vdavydov.dev@gmail.com>
References: <9bec570f3b7e3b781430dd688c033d00eb6874ff.1533921796.git.vdavydov.dev@gmail.com>
To: kostja@tarantool.org
Cc: tarantool-patches@freelists.org
List-ID:

index.update() looks up the old tuple in the primary index, applies
update operations to it, then writes a DELETE statement to secondary
indexes to delete the old tuple and a REPLACE statement to all indexes
to insert the new tuple. It also sets a column mask for both DELETE and
REPLACE statements. The column mask is a bit mask which has a bit set
if the corresponding field is updated by update operations. It is used
by the write iterator for two purposes. First, the write iterator skips
REPLACE statements that don't update key fields. Second, the write
iterator turns a REPLACE that has a column mask that intersects with
key fields into an INSERT (so that it can get annihilated with a DELETE
when the time comes). The latter is correct, because if an update()
does update secondary key fields, then it must have deleted the old
tuple and hence the new tuple is unique in terms of extended key
(merged primary and secondary key parts, i.e. cmp_def).

The problem is that a bit may be set in a column mask even if the
corresponding field does not actually get updated. For example,
consider the following:

  s = box.schema.space.create('test', {engine = 'vinyl'})
  s:create_index('pk')
  s:create_index('sk', {parts = {2, 'unsigned'}})
  s:insert{1, 10}
  box.snapshot()
  s:update(1, {{'=', 2, 10}})

The update() doesn't modify the secondary key field so it only writes
REPLACE{1, 10} to the secondary index (actually it writes DELETE{1, 10}
too, but it gets overwritten by the REPLACE).
However, the REPLACE has a column mask that says that update() does
modify the key field, because a column mask is generated solely from
update operations, before applying them. As a result, the write
iterator will not skip this REPLACE on dump. That alone has no serious
consequences, because skipping is a mere optimization. What is worse,
the write iterator will also turn the REPLACE into an INSERT, which is
absolutely wrong as the REPLACE is preceded by INSERT{1, 10}. If the
tuple gets deleted, the DELETE statement and the INSERT created by the
write iterator from the REPLACE will get annihilated, leaving the old
INSERT{1, 10} visible.

The issue may result in invalid select() output as demonstrated in the
issue description. It may also result in crashes, because the tuple
cache is very sensitive to invalid select() output.

To fix this issue let's clear key bits in the column mask if we detect
that an update() doesn't actually update secondary key fields although
the column mask says it does.

Closes #3607
---
https://github.com/tarantool/tarantool/issues/3607
https://github.com/tarantool/tarantool/commits/dv/gh-3607-vy-fix-phantom-tuple-after-update

The patch is for 1.9

Changes in v2:
 - Rewrite the comment as suggested by Kostja.
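
For review purposes, the mask arithmetic described above can be
sketched in isolation. This is only an illustration, not the vinyl
code: field_bit(), mask_from_ops(), touches_key() and clear_key_bits()
are hypothetical helpers, field numbers are assumed to be 0-based, and
UINT64_MAX is assumed to mean "no usable mask", as the check in the
patch suggests.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: bit for a 0-based field number.
 * Assumption: all fields >= 63 share the highest bit.
 */
static uint64_t
field_bit(uint32_t fieldno)
{
	return UINT64_C(1) << (fieldno < 63 ? fieldno : 63);
}

/*
 * Hypothetical helper: build a mask from the field numbers named by
 * update operations. Values are never inspected, which is why
 * {'=', 2, 10} on a tuple that already holds 10 still sets a bit.
 */
static uint64_t
mask_from_ops(const uint32_t *fields, int count)
{
	uint64_t mask = 0;
	for (int i = 0; i < count; i++)
		mask |= field_bit(fields[i]);
	return mask;
}

/* True if the statement mask claims that a key field was updated. */
static bool
touches_key(uint64_t stmt_mask, uint64_t key_mask)
{
	return (stmt_mask & key_mask) != 0;
}

/*
 * Same arithmetic as the fix: drop key bits from the statement mask,
 * leaving a full mask (assumed to mean "unknown") untouched.
 */
static uint64_t
clear_key_bits(uint64_t stmt_mask, uint64_t key_mask)
{
	if (stmt_mask == UINT64_MAX)
		return stmt_mask;
	return stmt_mask & ~key_mask;
}
```

For the example in the commit message the operation names Lua field 2
(0-based 1) and the extended key of 'sk' covers fields 1 and 0, so
touches_key() is true for the generated mask and false once
clear_key_bits() has been applied.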

 src/box/vy_tx.c                     | 28 ++++++++++++++++++++++
 test/vinyl/update_optimize.result   | 47 +++++++++++++++++++++++++++++++++++++
 test/vinyl/update_optimize.test.lua | 22 +++++++++++++++++
 3 files changed, 97 insertions(+)

diff --git a/src/box/vy_tx.c b/src/box/vy_tx.c
index cb9bbf58..b1c39ff9 100644
--- a/src/box/vy_tx.c
+++ b/src/box/vy_tx.c
@@ -865,6 +865,34 @@ vy_tx_set(struct vy_tx *tx, struct vy_index *index, struct tuple *stmt)
 					    vy_stmt_column_mask(old->stmt));
 	}
 
+	if (index->id > 0 && vy_stmt_type(stmt) == IPROTO_REPLACE &&
+	    old != NULL && vy_stmt_type(old->stmt) == IPROTO_DELETE) {
+		/*
+		 * The column mask of an update operation may have a bit
+		 * set even if the corresponding field doesn't actually
+		 * get updated, because a column mask is generated
+		 * only from update operations, before applying them.
+		 * E.g. update(1, {{'+', 2, 0}}) doesn't modify the
+		 * second field, but the column mask will say it does.
+		 *
+		 * To discard DELETE statements in the write iterator
+		 * (see optimization #6), we turn a REPLACE into an
+		 * INSERT in case the REPLACE was generated by an
+		 * update that changed secondary key fields. So we
+		 * can't tolerate inaccuracy in a column mask.
+		 *
+		 * So if the update didn't actually modify secondary
+		 * key fields, i.e. DELETE and REPLACE generated by the
+		 * update have the same key fields, we forcefully clear
+		 * key bits in the column mask to ensure that no REPLACE
+		 * statement will be written for this secondary key.
+		 */
+		uint64_t column_mask = vy_stmt_column_mask(stmt);
+		if (column_mask != UINT64_MAX)
+			vy_stmt_set_column_mask(stmt, column_mask &
+						~index->cmp_def->column_mask);
+	}
+
 	v->overwritten = old;
 	write_set_insert(&tx->write_set, v);
 	tx->write_set_version++;
diff --git a/test/vinyl/update_optimize.result b/test/vinyl/update_optimize.result
index fbd42df0..00242f4e 100644
--- a/test/vinyl/update_optimize.result
+++ b/test/vinyl/update_optimize.result
@@ -711,3 +711,50 @@ lookups()
 space:drop()
 ---
 ...
+--
+-- gh-3607: phantom tuples in secondary index if UPDATE does not
+-- change key fields.
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+---
+...
+_ = s:create_index('pk')
+---
+...
+_ = s:create_index('sk', {parts = {2, 'unsigned'}, run_count_per_level = 10})
+---
+...
+s:insert{1, 10}
+---
+- [1, 10]
+...
+box.snapshot()
+---
+- ok
+...
+s:update(1, {{'=', 2, 10}})
+---
+- [1, 10]
+...
+s:delete(1)
+---
+...
+box.snapshot()
+---
+- ok
+...
+s.index.sk:info().rows -- INSERT in the first run + DELETE the second run
+---
+- 2
+...
+s:insert{1, 20}
+---
+- [1, 20]
+...
+s.index.sk:select()
+---
+- - [1, 20]
+...
+s:drop()
+---
+...
diff --git a/test/vinyl/update_optimize.test.lua b/test/vinyl/update_optimize.test.lua
index 32144172..91bc9744 100644
--- a/test/vinyl/update_optimize.test.lua
+++ b/test/vinyl/update_optimize.test.lua
@@ -234,3 +234,25 @@ space:update(1, {{'+', 5, 1}})
 lookups()
 
 space:drop()
+
+--
+-- gh-3607: phantom tuples in secondary index if UPDATE does not
+-- change key fields.
+--
+s = box.schema.space.create('test', {engine = 'vinyl'})
+_ = s:create_index('pk')
+_ = s:create_index('sk', {parts = {2, 'unsigned'}, run_count_per_level = 10})
+
+s:insert{1, 10}
+box.snapshot()
+
+s:update(1, {{'=', 2, 10}})
+s:delete(1)
+box.snapshot()
+
+s.index.sk:info().rows -- INSERT in the first run + DELETE the second run
+
+s:insert{1, 20}
+s.index.sk:select()
+
+s:drop()
-- 
2.11.0