From: Vladimir Davydov <vdavydov.dev@gmail.com> To: kostja@tarantool.org Cc: tarantool-patches@freelists.org Subject: [PATCH] vinyl: fix appearance of phantom tuple in secondary index after update Date: Fri, 10 Aug 2018 20:26:05 +0300 [thread overview] Message-ID: <9bec570f3b7e3b781430dd688c033d00eb6874ff.1533921796.git.vdavydov.dev@gmail.com> (raw) index.update() looks up the old tuple in the primary index, applies update operations to it, then writes a DELETE statement to secondary indexes to delete the old tuple and a REPLACE statement to all indexes to insert the new tuple. It also sets a column mask for both DELETE and REPLACE statements. The column mask is a bit mask which has a bit set if the corresponding field is updated by update operations. It is used by the write iterator for two purposes. First, the write iterator skips REPLACE statements that don't update key fields. Second, the write iterator turns a REPLACE that has a column mask that intersects with key fields into an INSERT (so that it can get annihilated with a DELETE when the time comes). The latter is correct, because if an update() does update secondary key fields, then it must have deleted the old tuple and hence the new tuple is unique in terms of extended key (merged primary and secondary key parts, i.e. cmp_def). The problem is that a bit may be set in a column mask even if the corresponding field does not actually get updated. For example, consider the following example. s = box.schema.space.create('test', {engine = 'vinyl'}) s:create_index('pk') s:create_index('sk', {parts = {2, 'unsigned'}}) s:insert{1, 10} box.snapshot() s:update(1, {{'=', 2, 10}}) The update() doesn't modify the secondary key field so it only writes REPLACE{1, 10} to the secondary index (actually it writes DELETE{1, 10} too, but it gets overwritten by the REPLACE). However, the REPLACE has column mask that says that update() does modify the key field, because a column mask is generated solely from update operations, before applying them. As a result, the write iterator will not skip this REPLACE on dump. This won't have any serious consequences, because this is a mere optimization. What is worse, the write iterator will also turn the REPLACE into an INSERT, which is absolutely wrong as the REPLACE is preceded by INSERT{1, 10}. If the tuple gets deleted, the DELETE statement and the INSERT created by the write iterator from the REPLACE will get annihilated, leaving the old INSERT{1, 10} visible. The issue may result in invalid select() output as demonstrated in the issue description. It may also result in crashes, because the tuple cache is very sensible to invalid select() output. To fix this issue let's clear key bits in the column mask if we detect that an update() doesn't actually update secondary key fields although the column mask says it does. We do that in vy_tx_set(), if we see that the statement overwritten by a REPLACE is a DELETE: the point is a DELETE may be written to a secondary index only if the tuple it is supposed to delete actually exists (we can't generate a surrogate DELETE otherwise); so if a REPLACE overwrites a DELETE, there must be the same REPLACE in the index and hence we can clear key bits in the REPLACE column mask. Closes #3607 --- https://github.com/tarantool/tarantool/issues/3607 https://github.com/tarantool/tarantool/commits/dv/gh-3607-vy-fix-phantom-tuple-after-update The patch is for 1.9 src/box/vy_tx.c | 22 +++++++++++++++++ test/vinyl/update_optimize.result | 47 +++++++++++++++++++++++++++++++++++++ test/vinyl/update_optimize.test.lua | 22 +++++++++++++++++ 3 files changed, 91 insertions(+) diff --git a/src/box/vy_tx.c b/src/box/vy_tx.c index cb9bbf58..558e173b 100644 --- a/src/box/vy_tx.c +++ b/src/box/vy_tx.c @@ -865,6 +865,28 @@ vy_tx_set(struct vy_tx *tx, struct vy_index *index, struct tuple *stmt) vy_stmt_column_mask(old->stmt)); } + if (index->id > 0 && vy_stmt_type(stmt) == IPROTO_REPLACE && + old != NULL && vy_stmt_type(old->stmt) == IPROTO_DELETE) { + /* + * We write a DELETE statement to a secondary index only + * if there is actually a tuple it is supposed to delete, + * because in order to create a surrogate DELETE we have + * to look up the deleted tuple in the primary index. + * + * So if we overwrite a DELETE statement in tx write set + * with a REPLACE, it means the REPLACE doesn't actually + * modify the tuple stored in the secondary key and hence + * we can clear key bits in the column mask. + * + * Actually, we must do it in order to prevent the write + * iterator from turning this REPLACE into an INSERT. + */ + uint64_t column_mask = vy_stmt_column_mask(stmt); + if (column_mask != UINT64_MAX) + vy_stmt_set_column_mask(stmt, column_mask & + ~index->cmp_def->column_mask); + } + v->overwritten = old; write_set_insert(&tx->write_set, v); tx->write_set_version++; diff --git a/test/vinyl/update_optimize.result b/test/vinyl/update_optimize.result index fbd42df0..00242f4e 100644 --- a/test/vinyl/update_optimize.result +++ b/test/vinyl/update_optimize.result @@ -711,3 +711,50 @@ lookups() space:drop() --- ... +-- +-- gh-3607: phantom tuples in secondary index if UPDATE does not +-- change key fields. +-- +s = box.schema.space.create('test', {engine = 'vinyl'}) +--- +... +_ = s:create_index('pk') +--- +... +_ = s:create_index('sk', {parts = {2, 'unsigned'}, run_count_per_level = 10}) +--- +... +s:insert{1, 10} +--- +- [1, 10] +... +box.snapshot() +--- +- ok +... +s:update(1, {{'=', 2, 10}}) +--- +- [1, 10] +... +s:delete(1) +--- +... +box.snapshot() +--- +- ok +... +s.index.sk:info().rows -- INSERT in the first run + DELETE the second run +--- +- 2 +... +s:insert{1, 20} +--- +- [1, 20] +... +s.index.sk:select() +--- +- - [1, 20] +... +s:drop() +--- +... diff --git a/test/vinyl/update_optimize.test.lua b/test/vinyl/update_optimize.test.lua index 32144172..91bc9744 100644 --- a/test/vinyl/update_optimize.test.lua +++ b/test/vinyl/update_optimize.test.lua @@ -234,3 +234,25 @@ space:update(1, {{'+', 5, 1}}) lookups() space:drop() + +-- +-- gh-3607: phantom tuples in secondary index if UPDATE does not +-- change key fields. +-- +s = box.schema.space.create('test', {engine = 'vinyl'}) +_ = s:create_index('pk') +_ = s:create_index('sk', {parts = {2, 'unsigned'}, run_count_per_level = 10}) + +s:insert{1, 10} +box.snapshot() + +s:update(1, {{'=', 2, 10}}) +s:delete(1) +box.snapshot() + +s.index.sk:info().rows -- INSERT in the first run + DELETE the second run + +s:insert{1, 20} +s.index.sk:select() + +s:drop() -- 2.11.0
next reply other threads:[~2018-08-10 17:26 UTC|newest] Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-08-10 17:26 Vladimir Davydov [this message] 2018-08-10 18:43 ` [PATCH v2] " Vladimir Davydov 2018-08-10 18:49 ` Vladimir Davydov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=9bec570f3b7e3b781430dd688c033d00eb6874ff.1533921796.git.vdavydov.dev@gmail.com \ --to=vdavydov.dev@gmail.com \ --cc=kostja@tarantool.org \ --cc=tarantool-patches@freelists.org \ --subject='Re: [PATCH] vinyl: fix appearance of phantom tuple in secondary index after update' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox