[tarantool-patches] Re: [PATCH 1/3] vinyl: fix secondary index divergence on update

Konstantin Osipov kostja at tarantool.org
Sat May 25 09:11:57 MSK 2019


* Vladimir Davydov <vdavydov.dev at gmail.com> [19/05/25 06:41]:

Vladimir, 

could you clarify your comments a bit?

> If an UPDATE request doesn't touch key parts of a secondary index, we
> don't need to write it to the index memory level or dump it to disk, as

We don't have a separate memtable for secondary keys. Better to say
"we don't need to re-index it in the in-memory secondary index".

> this would only increase IO load. Historically, we use column mask set
> by the UPDATE operation to skip secondary indexes that are not affected
> by the operation on commit. However, there's a problem here: the column
> mask isn't precise - it may have a bit set even if the corresponding
> column doesn't really get updated, e.g. consider {'+', 2, 0}.

The column does get updated, but the update doesn't change its
value. Now I see what you mean.
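
To spell out the {'+', 2, 0} case, here is a toy model in C. It is
only a sketch: nothing below is vinyl code, the toy_* names are made
up for illustration, and the bit layout (bit N-1 for the 1-based
field N) is an assumption that need not match the real column mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy update operation; hypothetical, not Tarantool's real struct. */
struct toy_op {
	char opcode;	/* '+' adds arg, anything else assigns arg */
	int field_no;	/* 1-based field number, as in the Lua API */
	int64_t arg;
};

/* Set one bit per field named by an operation, whatever its argument. */
static uint64_t
toy_column_mask(const struct toy_op *ops, size_t count)
{
	uint64_t mask = 0;
	for (size_t i = 0; i < count; i++) {
		assert(ops[i].field_no >= 1 && ops[i].field_no <= 64);
		mask |= UINT64_C(1) << (ops[i].field_no - 1);
	}
	return mask;
}

/* Apply the operations in place to a tuple of integer fields. */
static void
toy_apply(int64_t *tuple, const struct toy_op *ops, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		int64_t *field = &tuple[ops[i].field_no - 1];
		if (ops[i].opcode == '+')
			*field += ops[i].arg;
		else
			*field = ops[i].arg;
	}
}
```

For the tuple {1, 1, 1} and the single operation {'+', 2, 0}, the mask
has the bit for field 2 set, yet field 2 still holds 1 after the
update, so the mask alone can't tell whether the key really changed.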


> Not taking
> this into account may result in appearance of phantom tuples on disk as
> the write iterator assumes that statements that have no effect aren't
> written to secondary indexes (this is needed to apply INSERT+DELETE
> "annihilation" optimization). We fixed that by clearing column mask bits
> in vy_tx_set in case we detect that the key isn't changed, for more
> details see #3607 and commit e72867cb9169 ("vinyl: fix appearance of
> phantom tuple in secondary index after update"). It was rather an ugly
> hack, but it worked.
> 
> However, it turned out that apart from looking hackish this code has
> a nasty bug that may lead to tuples missing from secondary indexes.
> Consider the following example:
> 
>   s = box.schema.space.create('test', {engine = 'vinyl'})
>   s:create_index('pk')
>   s:create_index('sk', {parts = {2, 'unsigned'}})
>   s:insert{1, 1, 1}
> 
>   box.begin()
>   s:update(1, {{'=', 2, 2}})
>   s:update(1, {{'=', 3, 2}})
>   box.commit()
> 
> The first update operation writes DELETE{1,1} and REPLACE{2,1} to the
> secondary index write set. The second update replaces REPLACE{2,1} with
> DELETE{2,1} and then with REPLACE{2,1}. When replacing DELETE{2,1} with
> REPLACE{2,1} in the write set, we assume that the update doesn't modify
> secondary index key parts and clear the column mask so as not to commit
> a pointless request, see vy_tx_set. As a result, we skip the first
> update too and get key {2,1} missing from the secondary index.
> 
> Actually, it was a dumb idea to use column mask to skip statements in
> the first place, as there's a much easier way to filter out statements
> that have no effect for secondary indexes. The thing is every DELETE
> statement inserted into a secondary index write set acts as a "single
> DELETE", i.e. there's exactly one older statement it is supposed to
> purge. This is because in contrast to the primary index we don't write
> DELETE statements blindly - we always look up the tuple overwritten in
> the primary index first. This means that REPLACE+DELETE for the same key
> is basically a no-op and can be safely skipped. Moreover, DELETE+REPLACE
> can be treated as no-op, too, because secondary indexes don't store full
> tuples hence all REPLACE statements for the same key are equivalent.
> By marking such pair of statements as no-op in vy_tx_set, we guarantee
> that no-op statements don't make it to secondary index memory or disk
> levels.

Better to say "mark both statements" rather than "a pair", since they
are not present in the tx write list as a pair.
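
To check that I read the rule right, here is a toy model of it in C.
It is a sketch under assumptions, not the patch: the toy_* names are
hypothetical, one slot stands for the latest statement of one key in
the secondary index write set, and the rule that a statement following
a no-op starts afresh is my guess at how a REPLACE, DELETE, REPLACE
chain has to be handled for your example to come out right.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_type { TOY_REPLACE, TOY_DELETE };

/* Latest statement for one key in a secondary index write set. */
struct toy_slot {
	bool is_set;		/* the transaction already wrote this key */
	enum toy_type type;	/* type of the latest statement */
	bool is_nop;		/* latest statement cancels an older one */
};

/*
 * Overwrite the slot with a new statement. REPLACE+DELETE and
 * DELETE+REPLACE cancel out, so the new statement becomes a no-op,
 * unless the older statement is itself a no-op (assumed rule).
 */
static void
toy_tx_set(struct toy_slot *slot, enum toy_type type)
{
	assert(slot != NULL);
	bool cancels = slot->is_set && !slot->is_nop &&
		(slot->type == TOY_DELETE) != (type == TOY_DELETE);
	slot->is_set = true;
	slot->type = type;
	slot->is_nop = cancels;
}

/* Only a statement that is not a no-op would reach the index. */
static bool
toy_is_committed(const struct toy_slot *slot)
{
	return slot->is_set && !slot->is_nop;
}
```

Replaying your example on two slots, the one for {1,1} receives DELETE
and the one for {2,1} receives REPLACE, DELETE, REPLACE. The model
ends up committing DELETE{1,1} and REPLACE{2,1}, so {2,1} is not lost.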

Could you also please explain why you decided to introduce a new
flag instead of using is_overwritten?


-- 
Konstantin Osipov, Moscow, Russia



