From: Konstantin Osipov <kostja@tarantool.org>
To: tarantool-patches@freelists.org
Subject: [tarantool-patches] Re: [PATCH 1/3] vinyl: fix secondary index divergence on update
Date: Sat, 25 May 2019 09:11:57 +0300
Message-ID: <20190525061157.GB14501@atlas>
In-Reply-To: <8e4175c3f3b857097ccfd264608b046b71635e91.1558733443.git.vdavydov.dev@gmail.com>

* Vladimir Davydov <vdavydov.dev@gmail.com> [19/05/25 06:41]:

Vladimir, could you clarify your comments a bit?

> If an UPDATE request doesn't touch key parts of a secondary index, we
> don't need to write it to the index memory level or dump it to disk, as

We don't have a separate memtable for secondary keys. Better say
"we don't need to re-index it in the in-memory secondary index".

> this would only increase IO load. Historically, we use column mask set
> by the UPDATE operation to skip secondary indexes that are not affected
> by the operation on commit. However, there's a problem here: the column
> mask isn't precise - it may have a bit set even if the corresponding
> column doesn't really get updated, e.g. consider {'+', 2, 0}.

The column does get updated, but the update doesn't change its
value. Now I am making the ends of it.

> Not taking
> this into account may result in appearance of phantom tuples on disk as
> the write iterator assumes that statements that have no effect aren't
> written to secondary indexes (this is needed to apply INSERT+DELETE
> "annihilation" optimization). We fixed that by clearing column mask bits
> in vy_tx_set in case we detect that the key isn't changed, for more
> details see #3607 and commit e72867cb9169 ("vinyl: fix appearance of
> phantom tuple in secondary index after update"). It was rather an ugly
> hack, but it worked.
>
> However, it turned out that apart from looking hackish this code has
> a nasty bug that may lead to tuples missing from secondary indexes.
> Consider the following example:
>
>   s = box.schema.space.create('test', {engine = 'vinyl'})
>   s:create_index('pk')
>   s:create_index('sk', {parts = {2, 'unsigned'}})
>   s:insert{1, 1, 1}
>
>   box.begin()
>   s:update(1, {{'=', 2, 2}})
>   s:update(1, {{'=', 3, 2}})
>   box.commit()
>
> The first update operation writes DELETE{1,1} and REPLACE{2,1} to the
> secondary index write set. The second update replaces REPLACE{2,1} with
> DELETE{2,1} and then with REPLACE{2,1}. When replacing DELETE{2,1} with
> REPLACE{2,1} in the write set, we assume that the update doesn't modify
> secondary index key parts and clear the column mask so as not to commit
> a pointless request, see vy_tx_set. As a result, we skip the first
> update too and get key {2,1} missing from the secondary index.
>
> Actually, it was a dumb idea to use column mask to skip statements in
> the first place, as there's a much easier way to filter out statements
> that have no effect for secondary indexes. The thing is every DELETE
> statement inserted into a secondary index write set acts as a "single
> DELETE", i.e. there's exactly one older statement it is supposed to
> purge. This is because, in contrast to the primary index, we don't write
> DELETE statements blindly - we always look up the tuple overwritten in
> the primary index first. This means that REPLACE+DELETE for the same key
> is basically a no-op and can be safely skipped. Moreover, DELETE+REPLACE
> can be treated as no-op, too, because secondary indexes don't store full
> tuples hence all REPLACE statements for the same key are equivalent.
> By marking such a pair of statements as no-op in vy_tx_set, we guarantee
> that no-op statements don't make it to secondary index memory or disk
> levels.

Better say "mark both statements", not a pair, since they are not
present in the tx write list as a pair.

Could you also please explain why you decided to introduce a new
flag, and not use is_overwritten?

--
Konstantin Osipov, Moscow, Russia
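
For readers following the thread, the no-op rule described in the quoted commit message can be illustrated with a toy model. This is only a sketch in Python, not Tarantool code: the names `Stmt`, `tx_set`, `statements_to_commit` and the `is_nop` field are hypothetical, and the write set is assumed to be a plain dict holding the latest statement per secondary key. The replaced statement simply disappears from the dict here, so only the newer statement of a cancelling pair carries the flag; that is a simplification of this sketch.

```python
from dataclasses import dataclass

DELETE = "DELETE"
REPLACE = "REPLACE"


@dataclass
class Stmt:
    type: str
    key: tuple
    is_nop: bool = False  # hypothetical flag name, used only in this sketch


def tx_set(write_set, stmt_type, key):
    """Put a statement into a toy secondary index write set."""
    old = write_set.get(key)
    new = Stmt(stmt_type, key)
    # REPLACE+DELETE and DELETE+REPLACE for the same key cancel out,
    # unless the older statement was already cancelled by an earlier pair.
    if old is not None and not old.is_nop and old.type != stmt_type:
        new.is_nop = True
    write_set[key] = new  # the older statement is overwritten
    return new


def statements_to_commit(write_set):
    """Return the statements that would reach the secondary index."""
    return [(s.type, s.key) for s in write_set.values() if not s.is_nop]


# The sequence from the quoted example, as seen by the secondary index:
ws = {}
tx_set(ws, DELETE, (1, 1))   # first update
tx_set(ws, REPLACE, (2, 1))  # first update
tx_set(ws, DELETE, (2, 1))   # second update, cancels the REPLACE above
tx_set(ws, REPLACE, (2, 1))  # second update, stands on its own
print(statements_to_commit(ws))
```

In this model the transaction commits DELETE{1,1} and REPLACE{2,1}, so key {2,1} is not lost, which is the outcome the quoted message argues for.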