Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: kostja@tarantool.org
Cc: tarantool-patches@freelists.org
Subject: [RFC PATCH 04/23] vinyl: make point lookup always return the latest tuple version
Date: Sun,  8 Jul 2018 19:48:35 +0300	[thread overview]
Message-ID: <a413558e2ac5ef82da3849d1770152896a5c588a.1531065648.git.vdavydov.dev@gmail.com> (raw)
In-Reply-To: <cover.1531065648.git.vdavydov.dev@gmail.com>
In-Reply-To: <cover.1531065648.git.vdavydov.dev@gmail.com>

Currently, vy_point_lookup(), in contrast to vy_read_iterator, doesn't
rescan the memory level after reading disk, so if the caller doesn't
track the key before calling this function, the caller won't be sent to
a read view in case the key gets updated during yield and hence will
be returned a stale tuple. This is OK now, because we always track the
key before calling vy_point_lookup(), either in the primary or in a
secondary index. However, for #2129 we need it to always return the
latest tuple version, no matter if the key is tracked or not.

The point is in the scope of #2129 we won't write DELETE statements to
secondary indexes corresponding to a tuple replaced in the primary
index. Instead after reading a tuple from a secondary index we will
check whether it matches the tuple corresponding to it in the primary
index: if it is not, it means that the tuple read from the secondary
index was overwritten and should be skipped. E.g. suppose we have the
primary index over the first field and a secondary index over the second
field and the following statements in the space:

  REPLACE{1, 10}
  REPLACE{1, 20}

Then reading {10} from the secondary index will return REPLACE{1, 10}, but
lookup of {1} in the primary index will return REPLACE{1, 20} which
doesn't match REPLACE{1, 10} read from the secondary index hence the
latter was overwritten and should be skipped.

The problem is in the example above we don't want to track key {1} in
the primary index before lookup, because we don't actually read its
value. So for the check to work correctly, we need the point lookup to
guarantee that the returned tuple is always the newest one. It's fairly
easy to do - we just need to rescan the memory level after yielding on
disk if its version changed.

Needed for #2129
---
 src/box/vy_point_lookup.c | 35 +++++++++++++++++++++++++++++------
 src/box/vy_point_lookup.h |  9 +++------
 2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/src/box/vy_point_lookup.c b/src/box/vy_point_lookup.c
index 504a8e80..f2261fdf 100644
--- a/src/box/vy_point_lookup.c
+++ b/src/box/vy_point_lookup.c
@@ -203,10 +203,13 @@ vy_point_lookup(struct vy_lsm *lsm, struct vy_tx *tx,
 	int rc = 0;
 
 	lsm->stat.lookup++;
+
 	/* History list */
-	struct vy_history history;
+	struct vy_history history, mem_history, disk_history;
 	vy_history_create(&history, &lsm->env->history_node_pool);
-restart:
+	vy_history_create(&mem_history, &lsm->env->history_node_pool);
+	vy_history_create(&disk_history, &lsm->env->history_node_pool);
+
 	rc = vy_point_lookup_scan_txw(lsm, tx, key, &history);
 	if (rc != 0 || vy_history_is_terminal(&history))
 		goto done;
@@ -215,14 +218,16 @@ restart:
 	if (rc != 0 || vy_history_is_terminal(&history))
 		goto done;
 
-	rc = vy_point_lookup_scan_mems(lsm, rv, key, &history);
-	if (rc != 0 || vy_history_is_terminal(&history))
+restart:
+	rc = vy_point_lookup_scan_mems(lsm, rv, key, &mem_history);
+	if (rc != 0 || vy_history_is_terminal(&mem_history))
 		goto done;
 
 	/* Save version before yield */
+	uint32_t mem_version = lsm->mem->version;
 	uint32_t mem_list_version = lsm->mem_list_version;
 
-	rc = vy_point_lookup_scan_slices(lsm, rv, key, &history);
+	rc = vy_point_lookup_scan_slices(lsm, rv, key, &disk_history);
 	if (rc != 0)
 		goto done;
 
@@ -241,11 +246,29 @@ restart:
 		 * This in unnecessary in case of rotation but since we
 		 * cannot distinguish these two cases we always restart.
 		 */
-		vy_history_cleanup(&history);
+		vy_history_cleanup(&mem_history);
+		vy_history_cleanup(&disk_history);
 		goto restart;
 	}
 
+	if (mem_version != lsm->mem->version) {
+		/*
+		 * Rescan the memory level if its version changed while we
+		 * were reading disk, because there may be new statements
+		 * matching the search key.
+		 */
+		vy_history_cleanup(&mem_history);
+		rc = vy_point_lookup_scan_mems(lsm, rv, key, &mem_history);
+		if (rc != 0)
+			goto done;
+		if (vy_history_is_terminal(&mem_history))
+			vy_history_cleanup(&disk_history);
+	}
+
 done:
+	vy_history_splice(&history, &mem_history);
+	vy_history_splice(&history, &disk_history);
+
 	if (rc == 0) {
 		int upserts_applied;
 		rc = vy_history_apply(&history, lsm->cmp_def, lsm->mem_format,
diff --git a/src/box/vy_point_lookup.h b/src/box/vy_point_lookup.h
index d74be9a9..3b7c5a04 100644
--- a/src/box/vy_point_lookup.h
+++ b/src/box/vy_point_lookup.h
@@ -62,12 +62,9 @@ struct tuple;
  * tuple in the LSM tree. The tuple is returned in @ret with its
  * reference counter elevated.
  *
- * The caller must guarantee that if the tuple looked up by this
- * function is modified, the transaction will be sent to read view.
- * This is needed to avoid inserting a stale value into the cache.
- * In other words, vy_tx_track() must be called for the search key
- * before calling this function unless this is a primary index and
- * the tuple is already tracked in a secondary index.
+ * Note, this function doesn't track the result in the transaction
+ * read set, i.e. it is up to the caller to call vy_tx_track() if
+ * necessary.
  */
 int
 vy_point_lookup(struct vy_lsm *lsm, struct vy_tx *tx,
-- 
2.11.0

  parent reply	other threads:[~2018-07-08 16:48 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-08 16:48 [RFC PATCH 02/23] vinyl: always get full tuple from pk after reading from secondary index Vladimir Davydov
2018-07-08 16:48 ` [RFC PATCH 00/23] vinyl: eliminate read on REPLACE/DELETE Vladimir Davydov
2018-07-08 16:48   ` [RFC PATCH 01/23] vinyl: do not turn REPLACE into INSERT when processing DML request Vladimir Davydov
2018-07-10 12:15     ` Konstantin Osipov
2018-07-10 12:19       ` Vladimir Davydov
2018-07-10 18:39         ` Konstantin Osipov
2018-07-11  7:57           ` Vladimir Davydov
2018-07-11 10:25             ` Vladimir Davydov
2018-07-08 16:48   ` [RFC PATCH 03/23] vinyl: use vy_mem_iterator for point lookup Vladimir Davydov
2018-07-17 10:14     ` Vladimir Davydov
2018-07-08 16:48   ` Vladimir Davydov [this message]
2018-07-10 16:19     ` [RFC PATCH 04/23] vinyl: make point lookup always return the latest tuple version Konstantin Osipov
2018-07-10 16:43       ` Vladimir Davydov
2018-07-11 16:33         ` Vladimir Davydov
2018-07-31 19:17           ` Konstantin Osipov
2018-07-08 16:48   ` [RFC PATCH 05/23] vinyl: fold vy_replace_one and vy_replace_impl Vladimir Davydov
2018-07-31 20:28     ` Konstantin Osipov
2018-07-08 16:48   ` [RFC PATCH 06/23] vinyl: fold vy_delete_impl Vladimir Davydov
2018-07-31 20:28     ` Konstantin Osipov
2018-07-08 16:48   ` [RFC PATCH 07/23] vinyl: refactor unique check Vladimir Davydov
2018-07-31 20:28     ` Konstantin Osipov
2018-07-08 16:48   ` [RFC PATCH 08/23] vinyl: check key uniqueness before modifying tx write set Vladimir Davydov
2018-07-31 20:34     ` Konstantin Osipov
2018-08-01 10:42       ` Vladimir Davydov
2018-08-09 20:26     ` Konstantin Osipov
2018-08-10  8:26       ` Vladimir Davydov
2018-07-08 16:48   ` [RFC PATCH 09/23] vinyl: remove env argument of vy_check_is_unique_{primary,secondary} Vladimir Davydov
2018-07-08 16:48   ` [RFC PATCH 10/23] vinyl: store full tuples in secondary index cache Vladimir Davydov
2018-07-08 16:48   ` [RFC PATCH 11/23] xrow: allow to store flags in DML requests Vladimir Davydov
2018-07-31 20:36     ` Konstantin Osipov
2018-08-01 14:10       ` Vladimir Davydov
2018-08-17 13:34         ` Vladimir Davydov
2018-08-17 13:34           ` [PATCH 1/2] xrow: allow to store tuple metadata in request Vladimir Davydov
2018-08-17 13:34           ` [PATCH 2/2] vinyl: introduce statement flags Vladimir Davydov
2018-07-08 16:48   ` [RFC PATCH 12/23] vinyl: do not pass region explicitly to write iterator functions Vladimir Davydov
2018-07-17 10:16     ` Vladimir Davydov
2018-07-31 20:38     ` Konstantin Osipov
2018-08-01 14:14       ` Vladimir Davydov
2018-07-08 16:48   ` [RFC PATCH 13/23] vinyl: fix potential use-after-free in vy_read_view_merge Vladimir Davydov
2018-07-17 10:16     ` Vladimir Davydov
2018-07-08 16:48   ` [RFC PATCH 14/23] test: unit/vy_write_iterator: minor refactoring Vladimir Davydov
2018-07-17 10:17     ` Vladimir Davydov
2018-07-08 16:48   ` [RFC PATCH 15/23] vinyl: teach write iterator to return overwritten tuples Vladimir Davydov
2018-07-08 16:48   ` [RFC PATCH 16/23] vinyl: allow to skip certain statements on read Vladimir Davydov
2018-07-08 16:48   ` [RFC PATCH 17/23] vinyl: do not free pending tasks on shutdown Vladimir Davydov
2018-07-08 16:48   ` [RFC PATCH 18/23] vinyl: store pointer to scheduler in struct vy_task Vladimir Davydov
2018-07-31 20:39     ` Konstantin Osipov
2018-07-08 16:48   ` [RFC PATCH 19/23] vinyl: rename some members of vy_scheduler and vy_task struct Vladimir Davydov
2018-07-31 20:40     ` Konstantin Osipov
2018-07-08 16:48   ` [RFC PATCH 20/23] vinyl: use cbus for communication between scheduler and worker threads Vladimir Davydov
2018-07-31 20:43     ` Konstantin Osipov
2018-08-01 14:26       ` Vladimir Davydov
2018-07-08 16:48   ` [RFC PATCH 21/23] vinyl: zap vy_scheduler::is_worker_pool_running Vladimir Davydov
2018-07-31 20:43     ` Konstantin Osipov
2018-07-08 16:48   ` [RFC PATCH 22/23] vinyl: rename vy_task::status to is_failed Vladimir Davydov
2018-07-31 20:44     ` Konstantin Osipov
2018-07-08 16:48   ` [RFC PATCH 23/23] vinyl: eliminate read on REPLACE/DELETE Vladimir Davydov
2018-07-13 10:53     ` Vladimir Davydov
2018-07-13 10:53       ` [PATCH 1/3] stailq: add stailq_insert function Vladimir Davydov
2018-07-15  7:02         ` Konstantin Osipov
2018-07-15 13:17           ` Vladimir Davydov
2018-07-15 18:40             ` Konstantin Osipov
2018-07-17 10:18         ` Vladimir Davydov
2018-07-13 10:53       ` [PATCH 2/3] vinyl: link all indexes of the same space Vladimir Davydov
2018-07-13 10:53       ` [PATCH 3/3] vinyl: generate deferred DELETEs on tx commit Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a413558e2ac5ef82da3849d1770152896a5c588a.1531065648.git.vdavydov.dev@gmail.com \
    --to=vdavydov.dev@gmail.com \
    --cc=kostja@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='Re: [RFC PATCH 04/23] vinyl: make point lookup always return the latest tuple version' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox