From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
From: Vladimir Davydov
Subject: [RFC PATCH 04/23] vinyl: make point lookup always return the latest tuple version
Date: Sun, 8 Jul 2018 19:48:35 +0300
Message-Id:
In-Reply-To:
References:
To: kostja@tarantool.org
Cc: tarantool-patches@freelists.org
List-ID:

Currently, vy_point_lookup(), in contrast to vy_read_iterator, doesn't
rescan the memory level after reading disk, so if the caller doesn't
track the key before calling this function, the caller won't be sent to
a read view in case the key gets updated during yield and hence will
be returned a stale tuple. This is OK now, because we always track the
key before calling vy_point_lookup(), either in the primary or in a
secondary index. However, for #2129 we need it to always return the
latest tuple version, no matter if the key is tracked or not.

The point is that in the scope of #2129 we won't write DELETE statements
to secondary indexes corresponding to a tuple replaced in the primary
index. Instead, after reading a tuple from a secondary index we will
check whether it matches the tuple corresponding to it in the primary
index: if it does not, it means that the tuple read from the secondary
index was overwritten and should be skipped. E.g. suppose we have the
primary index over the first field and a secondary index over the second
field and the following statements in the space:

  REPLACE{1, 10}
  REPLACE{1, 20}

Then reading {10} from the secondary index will return REPLACE{1, 10},
but lookup of {1} in the primary index will return REPLACE{1, 20}, which
doesn't match REPLACE{1, 10} read from the secondary index, hence the
latter was overwritten and should be skipped.

The problem is that in the example above we don't want to track key {1}
in the primary index before lookup, because we don't actually read its
value. So for the check to work correctly, we need the point lookup to
guarantee that the returned tuple is always the newest one.
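
For illustration only, the secondary-to-primary check described above can
be sketched on a toy model. Everything prefixed with toy_ is hypothetical
and is not part of vinyl: tuples are reduced to two integers, and the
indexes are plain arrays holding the state left by REPLACE{1, 10} followed
by REPLACE{1, 20}, with no DELETE written to the secondary index.

```c
#include <assert.h>
#include <stddef.h>

/* Toy tuple: field 1 is the primary key, field 2 the secondary key. */
struct toy_tuple {
	int pk;
	int sk;
};

/*
 * Hypothetical index contents after REPLACE{1, 10}, REPLACE{1, 20}.
 * The secondary index still holds the stale {1, 10} entry because
 * no DELETE was written for it.
 */
static const struct toy_tuple toy_secondary[] = { {1, 10}, {1, 20} };
/* The primary index holds only the newest version of each key. */
static const struct toy_tuple toy_primary[] = { {1, 20} };

#define TOY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Return the newest tuple with the given primary key, or NULL. */
const struct toy_tuple *
toy_primary_lookup(int pk)
{
	for (size_t i = 0; i < TOY_LEN(toy_primary); i++) {
		if (toy_primary[i].pk == pk)
			return &toy_primary[i];
	}
	return NULL;
}

/*
 * Look up a tuple by secondary key. An entry read from the
 * secondary index is returned only if it matches the tuple stored
 * under the same primary key; otherwise it was overwritten and
 * is skipped.
 */
const struct toy_tuple *
toy_secondary_get(int sk)
{
	for (size_t i = 0; i < TOY_LEN(toy_secondary); i++) {
		const struct toy_tuple *partial = &toy_secondary[i];
		if (partial->sk != sk)
			continue;
		const struct toy_tuple *full =
			toy_primary_lookup(partial->pk);
		if (full != NULL && full->sk == partial->sk)
			return full;
	}
	return NULL;
}

/* Usage: {10} is stale and skipped, {20} is the live tuple. */
void
toy_secondary_example(void)
{
	assert(toy_secondary_get(10) == NULL);
	assert(toy_secondary_get(20) != NULL);
	assert(toy_secondary_get(20)->pk == 1);
}
```

The sketch only works because toy_primary_lookup() always sees the newest
version; a stale result there would make the comparison meaningless.
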
This is fairly easy to guarantee: we just need to rescan the memory
level after yielding on disk if its version changed.

Needed for #2129
---
 src/box/vy_point_lookup.c | 35 +++++++++++++++++++++++++++++------
 src/box/vy_point_lookup.h |  9 +++------
 2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/src/box/vy_point_lookup.c b/src/box/vy_point_lookup.c
index 504a8e80..f2261fdf 100644
--- a/src/box/vy_point_lookup.c
+++ b/src/box/vy_point_lookup.c
@@ -203,10 +203,13 @@ vy_point_lookup(struct vy_lsm *lsm, struct vy_tx *tx,
 	int rc = 0;
 
 	lsm->stat.lookup++;
+
 	/* History list */
-	struct vy_history history;
+	struct vy_history history, mem_history, disk_history;
 	vy_history_create(&history, &lsm->env->history_node_pool);
-restart:
+	vy_history_create(&mem_history, &lsm->env->history_node_pool);
+	vy_history_create(&disk_history, &lsm->env->history_node_pool);
+
 	rc = vy_point_lookup_scan_txw(lsm, tx, key, &history);
 	if (rc != 0 || vy_history_is_terminal(&history))
 		goto done;
@@ -215,14 +218,16 @@ restart:
 	if (rc != 0 || vy_history_is_terminal(&history))
 		goto done;
 
-	rc = vy_point_lookup_scan_mems(lsm, rv, key, &history);
-	if (rc != 0 || vy_history_is_terminal(&history))
+restart:
+	rc = vy_point_lookup_scan_mems(lsm, rv, key, &mem_history);
+	if (rc != 0 || vy_history_is_terminal(&mem_history))
 		goto done;
 
 	/* Save version before yield */
+	uint32_t mem_version = lsm->mem->version;
 	uint32_t mem_list_version = lsm->mem_list_version;
 
-	rc = vy_point_lookup_scan_slices(lsm, rv, key, &history);
+	rc = vy_point_lookup_scan_slices(lsm, rv, key, &disk_history);
 	if (rc != 0)
 		goto done;
 
@@ -241,11 +246,29 @@ restart:
 		 * This in unnecessary in case of rotation but since we
 		 * cannot distinguish these two cases we always restart.
 		 */
-		vy_history_cleanup(&history);
+		vy_history_cleanup(&mem_history);
+		vy_history_cleanup(&disk_history);
 		goto restart;
 	}
 
+	if (mem_version != lsm->mem->version) {
+		/*
+		 * Rescan the memory level if its version changed while we
+		 * were reading disk, because there may be new statements
+		 * matching the search key.
+		 */
+		vy_history_cleanup(&mem_history);
+		rc = vy_point_lookup_scan_mems(lsm, rv, key, &mem_history);
+		if (rc != 0)
+			goto done;
+		if (vy_history_is_terminal(&mem_history))
+			vy_history_cleanup(&disk_history);
+	}
+
 done:
+	vy_history_splice(&history, &mem_history);
+	vy_history_splice(&history, &disk_history);
+
 	if (rc == 0) {
 		int upserts_applied;
 		rc = vy_history_apply(&history, lsm->cmp_def, lsm->mem_format,
diff --git a/src/box/vy_point_lookup.h b/src/box/vy_point_lookup.h
index d74be9a9..3b7c5a04 100644
--- a/src/box/vy_point_lookup.h
+++ b/src/box/vy_point_lookup.h
@@ -62,12 +62,9 @@ struct tuple;
  * tuple in the LSM tree. The tuple is returned in @ret with its
  * reference counter elevated.
  *
- * The caller must guarantee that if the tuple looked up by this
- * function is modified, the transaction will be sent to read view.
- * This is needed to avoid inserting a stale value into the cache.
- * In other words, vy_tx_track() must be called for the search key
- * before calling this function unless this is a primary index and
- * the tuple is already tracked in a secondary index.
+ * Note, this function doesn't track the result in the transaction
+ * read set, i.e. it is up to the caller to call vy_tx_track() if
+ * necessary.
  */
 int
 vy_point_lookup(struct vy_lsm *lsm, struct vy_tx *tx,
-- 
2.11.0
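
For readers who want the idea without the vinyl internals, here is a
minimal sketch of the "save version, yield, rescan if changed" pattern
that the patch applies to the memory level. It is a toy model under
stated assumptions, not the actual implementation: every toy_ name is
hypothetical, the memory level holds at most one value, and the yielding
disk read is modelled by a callback that may modify memory before it
returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy in-memory level: one optional value plus a version counter. */
struct toy_mem {
	uint32_t version;	/* bumped on every insertion */
	bool has_value;
	int value;
};

void
toy_mem_put(struct toy_mem *mem, int value)
{
	mem->value = value;
	mem->has_value = true;
	mem->version++;
}

/*
 * Hypothetical stand-in for a disk read that yields: other code may
 * run and insert into memory before the read returns.
 */
typedef int (*toy_disk_read_f)(struct toy_mem *mem);

/* Disk read during which nothing else happens. */
int
toy_disk_read_quiet(struct toy_mem *mem)
{
	(void)mem;
	return 10;
}

/* Disk read during which a newer value lands in memory. */
int
toy_disk_read_racy(struct toy_mem *mem)
{
	toy_mem_put(mem, 20);
	return 10;
}

/*
 * Point lookup that never returns a stale value: memory is scanned
 * first, its version is saved before the yielding disk read, and
 * memory is rescanned if the version changed in the meantime.
 */
int
toy_point_lookup(struct toy_mem *mem, toy_disk_read_f disk_read)
{
	assert(mem != NULL && disk_read != NULL);
	if (mem->has_value)
		return mem->value;	/* newest version is in memory */

	uint32_t saved_version = mem->version;	/* save before yield */
	int disk_value = disk_read(mem);	/* may yield */

	/* Rescan memory: a newer statement overrides the disk result. */
	if (saved_version != mem->version && mem->has_value)
		return mem->value;
	return disk_value;
}
```

With an empty toy_mem, toy_point_lookup() with toy_disk_read_quiet
returns the disk value 10, while with toy_disk_read_racy it returns 20,
the value inserted during the simulated yield. Without the version check
the second call would return the stale 10.
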