Tarantool development patches archive
From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: Konstantin Osipov <kostja@tarantool.org>
Cc: tarantool-patches@freelists.org
Subject: Re: [RFC PATCH 04/23] vinyl: make point lookup always return the latest tuple version
Date: Tue, 10 Jul 2018 19:43:43 +0300
Message-ID: <20180710164343.xgxdwk4ovbhdmmbo@esperanza>
In-Reply-To: <20180710161926.GC22105@chai>

On Tue, Jul 10, 2018 at 07:19:26PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [18/07/08 22:52]:
> > Currently, vy_point_lookup(), in contrast to vy_read_iterator, doesn't
> > rescan the memory level after reading disk, so if the caller doesn't
> > track the key before calling this function, the caller won't be sent to
> > a read view in case the key gets updated during yield and hence will
> > be returned a stale tuple. This is OK now, because we always track the
> > key before calling vy_point_lookup(), either in the primary or in a
> > secondary index. However, for #2129 we need it to always return the
> > latest tuple version, no matter if the key is tracked or not.
> > 
> > The point is in the scope of #2129 we won't write DELETE statements to
> > secondary indexes corresponding to a tuple replaced in the primary
> > index. Instead after reading a tuple from a secondary index we will
> > check whether it matches the tuple corresponding to it in the primary
> > index: if it is not, it means that the tuple read from the secondary
> > index was overwritten and should be skipped. E.g. suppose we have the
> > primary index over the first field and a secondary index over the second
> > field and the following statements in the space:
> > 
> >   REPLACE{1, 10}
> >   REPLACE{1, 20}
> > 
> > Then reading {10} from the secondary index will return REPLACE{1, 10}, but
> > lookup of {1} in the primary index will return REPLACE{1, 20} which
> > doesn't match REPLACE{1, 10} read from the secondary index hence the
> > latter was overwritten and should be skipped.
> > 
> > The problem is in the example above we don't want to track key {1} in
> > the primary index before lookup, because we don't actually read its
> > value. So for the check to work correctly, we need the point lookup to
> > guarantee that the returned tuple is always the newest one. It's fairly
> > easy to do - we just need to rescan the memory level after yielding on
> > disk if its version changed.
> 
> Thank you for the explanation. I haven't read the patch itself
> yet. But aren't you complicating things more than necessary? All
> we need to do when looking up a match in the primary index is to
> compare the match LSN and the secondary index tuple LSN. If there
> is a mismatch, then we need to skip the secondary key tuple: it's
> garbage. The mismatch does not need to take into account new
> tuples which appeared during yield, since a mismatch can not
> appear during yield.

Detecting a mismatch by LSN alone is complicated by prepared and txn
statements. Even if we put those aside, there's an optimization in the
write iterator that excludes a statement from the output if it doesn't
modify key parts - see

  https://github.com/tarantool/tarantool/blob/f64f46199e19542fa60eede939d62cd115abb83a/src/box/vy_write_iterator.c#L674

This optimization makes detection by LSN impossible.
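
To illustrate, here is a toy Python model of how I read that
optimization. It is a sketch under assumptions, not the actual
vy_write_iterator.c logic: Stmt, secondary_key and compact_secondary
are hypothetical names, the third tuple field is made up, and I assume
it is the newer statement that gets dropped when the secondary key
parts are unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    fields: tuple  # tuple fields, e.g. (1, 10, "a")
    lsn: int


def secondary_key(stmt, part=1):
    # The secondary index is assumed to cover a single field.
    return stmt.fields[part]


def compact_secondary(history, part=1):
    """Toy write iterator for a secondary index.

    `history` holds the statements for one primary key, oldest first.
    A statement that leaves the secondary key part unchanged is
    excluded from the output (assumed behavior, see the lead-in).
    """
    out = []
    for stmt in history:
        if out and secondary_key(out[-1], part) == secondary_key(stmt, part):
            continue  # key part not modified: skip the statement
        out.append(stmt)
    return out


# Two REPLACEs for primary key 1; the second one changes only field 3.
history = [Stmt((1, 10, "a"), 123), Stmt((1, 10, "b"), 456)]

secondary = compact_secondary(history)  # what the secondary index keeps
primary_latest = history[-1]            # what the primary index returns

lsn_match = secondary[-1].lsn == primary_latest.lsn
key_match = secondary_key(secondary[-1]) == secondary_key(primary_latest)
```

In this model the secondary index entry carries lsn 123 while the
primary index returns lsn 456, so an LSN comparison would report a
mismatch even though the entry is still live, whereas comparing key
parts reports a match.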

Anyway, this particular patch is needed whether we detect the mismatch
by LSN or by value. Example:

  Let primary index be over part 1 and secondary index be over part 2.
  Let the following statement be committed to both indexes and written
  to disk:

  REPLACE{1, 10, lsn = 123}

  Now let us consider the following race condition:

  Fiber 1                               Fiber 2
  -------                               -------
  look up {10} in the secondary index
  get REPLACE{1, 10, lsn = 123}
  look up {1} in the primary index to check for mismatch
  yields on disk read

                                        commits REPLACE{1, 20, lsn = 456}

  ( skips the new statement, because point
    lookup doesn't rescan the memory level )
  gets REPLACE{1, 10, lsn = 123}

  LSNs are equal, values are equal too,
  hence no mismatch, return to the user

This behavior would be incorrect: since secondary key {10} is not
modified, the transaction wouldn't be sent to a read view in this
case.
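
The race above can be replayed with a small Python simulation. This is
only a sketch: ToyIndex, its mem_version counter and the
during_disk_read callback are hypothetical stand-ins for the memory
level, its version and the yield on disk read, not the real
vy_point_lookup() code.

```python
class ToyIndex:
    """Toy index with one in-memory level and one on-disk level."""

    def __init__(self):
        self.mem = {}         # key -> (tuple, lsn)
        self.disk = {}        # key -> (tuple, lsn)
        self.mem_version = 0  # bumped on every in-memory write

    def commit(self, key, tup, lsn):
        self.mem[key] = (tup, lsn)
        self.mem_version += 1

    def point_lookup(self, key, during_disk_read=None, rescan=False):
        if key in self.mem:
            return self.mem[key]
        version = self.mem_version
        # Disk read: the fiber yields here, so other fibers may commit.
        if during_disk_read is not None:
            during_disk_read()
        result = self.disk.get(key)
        # Rescan the memory level only if it changed during the yield.
        if rescan and self.mem_version != version and key in self.mem:
            result = self.mem[key]
        return result


def run(rescan):
    pk = ToyIndex()
    pk.disk[1] = ((1, 10), 123)  # REPLACE{1, 10, lsn = 123} on disk

    def fiber2():
        pk.commit(1, (1, 20), 456)  # commits REPLACE{1, 20, lsn = 456}

    return pk.point_lookup(1, during_disk_read=fiber2, rescan=rescan)


stale = run(rescan=False)  # ((1, 10), 123)
fresh = run(rescan=True)   # ((1, 20), 456)
```

Without the rescan the lookup returns the on-disk statement with
lsn = 123, so neither an LSN check nor a value check against the
secondary index entry can notice the overwrite. With the rescan it
returns the statement committed during the yield.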

We could track primary key {1} before the lookup to make sure the
transaction is sent to a read view in such a case, but that wouldn't be
quite right: if there was no {1} in the primary index, we would track
a value we didn't actually read.

Hope this explains the problem I'm dealing with here.
