Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: Kirill Shcherbatov <kshcherbatov@tarantool.org>
Cc: tarantool-patches@freelists.org, kostja@tarantool.org
Subject: Re: [PATCH v5 8/9] box: introduce offset slot cache in key_part
Date: Tue, 4 Dec 2018 00:04:18 +0300	[thread overview]
Message-ID: <20181203210418.bpjwulkewjnoyqoh@esperanza> (raw)
In-Reply-To: <e4320c5499f57c1118c73ff15ffa5684d08eee2f.1543229303.git.kshcherbatov@tarantool.org>

On Mon, Nov 26, 2018 at 01:49:42PM +0300, Kirill Shcherbatov wrote:
> Tuned tuple_field_by_part_raw routine with key_part's
> offset_slot_cache. Introduced tuple_format epoch to test validity
> of this cache. The key_part caches last offset_slot source
> format to make epoch comparison because same space may have
> multiple format of same epoch that have different key_parts and
> related offset_slots distribution.
> 
> Part of #1012

If I didn't know what this patch is about, I'd be puzzled after reading
this comment. I'd put it like that:

  tuple_field_by_part looks up the tuple_field corresponding to the
  given key part in tuple_format in order to quickly retrieve the offset
  of indexed data from the tuple field map. For regular indexes this
  operation is blazing fast, however of JSON indexes it is not as we
  have to parse the path to data and then do multiple lookups in a JSON
  tree. Since tuple_field_by_part is used by comparators, we should
  strive to make this routine as fast as possible for all kinds of
  indexes.

  This patch introduces an optimization that is supposed to make
  tuple_field_by_part for JSON indexes as fast as it is for regular
  indexes in most cases. We do that by caching the offset slot right in
  key_part. There's a catch here however - we create a new format
  whenever an index is dropped or created and we don't reindex old
  tuples. As a result, there may be several generations of tuples in the
  same space, all using different formats while there's the only key_def
  used for comparison.

  To overcome this problem, we introduce the notion of tuple_format
  epoch. This is a counter incremented each time a new format is
  created. We store it in tuple_format and key_def, and we only use
  the offset slot cached in a key_def if it's epoch coincides with the
  epoch of the tuple format. If they don't, we look up a tuple_field as
  before, and then update the cached value provided the epoch of the
  tuple format is newer than the key_def's.

> diff --git a/src/box/alter.cc b/src/box/alter.cc
> index 029da02..6291159 100644
> --- a/src/box/alter.cc
> +++ b/src/box/alter.cc
> @@ -856,7 +856,10 @@ alter_space_do(struct txn *txn, struct alter_space *alter)
>  	 * Create a new (empty) space for the new definition.
>  	 * Sic: the triggers are not moved over yet.
>  	 */
> -	alter->new_space = space_new_xc(alter->space_def, &alter->key_list);
> +	alter->new_space =
> +		space_new_xc(alter->space_def, &alter->key_list,
> +			     alter->old_space->format != NULL ?
> +			     alter->old_space->format->epoch + 1 : 1);

Passing the epoch around looks ugly. Let's introduce a global counter
and bump it right in tuple_format_new(). If you worry about disk_format
and mem_format having different epochs in vinyl, it's not really a
problem: make epoch an optional argument to tuple_format_new(); if the
caller passes an epoch, use it, otherwise generate a new one; in vinyl
reuse mem_format->epoch for disk_format.

> diff --git a/src/box/key_def.h b/src/box/key_def.h
> index 7731e48..3e08eb4 100644
> --- a/src/box/key_def.h
> +++ b/src/box/key_def.h
> @@ -95,6 +95,14 @@ struct key_part {
>  	char *path;
>  	/** The length of JSON path. */
>  	uint32_t path_len;
> +	/**
> +	 * Source format for offset_slot_cache hit validations.
> +	 * Cache is expected to use "the format with the newest
> +	 * epoch is most relevant" strategy.
> +	 */
> +	struct tuple_format *format_cache;

Why did you decide to store tuple_format in key_part instead of epoch,
as you did originally? I liked that more.

> +	/** Cache with format's field offset slot. */

	/**
	 * Cached value of the offset slot corresponding to
	 * the indexed field (tuple_field::offset_slot).
	 * Valid only if key_def::epoch equals the epoch of
	 * the tuple format. Updated in tuple_field_by_part
	 * to always store the offset corresponding to the
	 * most recent tuple format (the one with the greatest
	 * epoch value).
	 */

> +	int32_t offset_slot_cache;

> diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
> index 860f052..8a7ebfa 100644
> --- a/src/box/tuple_format.h
> +++ b/src/box/tuple_format.h
> @@ -137,6 +137,8 @@ tuple_field_is_nullable(const struct tuple_field *tuple_field)
>   * Tuple format describes how tuple is stored and information about its fields
>   */
>  struct tuple_format {
> +	/** Counter that grows incrementally on space rebuild. */

... used for caching offset slot in key_part, for more details
see key_part::offset_slot_cache

> +	uint64_t epoch;

  reply	other threads:[~2018-12-03 21:04 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-26 10:49 [PATCH v5 0/9] box: indexes by JSON path Kirill Shcherbatov
2018-11-26 10:49 ` [PATCH v5 1/9] box: refactor json_path_parser class Kirill Shcherbatov
2018-11-26 12:53   ` [tarantool-patches] " Kirill Shcherbatov
2018-11-29 15:39     ` Vladimir Davydov
2018-11-26 10:49 ` [PATCH v5 2/9] lib: implement JSON tree class for json library Kirill Shcherbatov
2018-11-26 12:53   ` [tarantool-patches] " Kirill Shcherbatov
2018-11-29 17:38     ` Vladimir Davydov
2018-11-29 17:50       ` Vladimir Davydov
2018-12-04 15:22       ` Vladimir Davydov
2018-12-04 15:47       ` [tarantool-patches] " Kirill Shcherbatov
2018-12-04 17:54         ` Vladimir Davydov
2018-12-05  8:37           ` Kirill Shcherbatov
2018-12-05  9:07             ` Vladimir Davydov
2018-12-05  9:52               ` Vladimir Davydov
2018-12-06  7:56                 ` Kirill Shcherbatov
2018-12-06  7:56                 ` [tarantool-patches] Re: [PATCH v5 2/9] lib: make index_base support for json_lexer Kirill Shcherbatov
2018-11-26 10:49 ` [PATCH v5 3/9] box: manage format fields with JSON tree class Kirill Shcherbatov
2018-11-29 19:07   ` Vladimir Davydov
2018-12-04 15:47     ` [tarantool-patches] " Kirill Shcherbatov
2018-12-04 16:09       ` Vladimir Davydov
2018-12-04 16:32         ` Kirill Shcherbatov
2018-12-05  8:37         ` Kirill Shcherbatov
2018-12-06  7:56         ` Kirill Shcherbatov
2018-12-06  8:06           ` Vladimir Davydov
2018-11-26 10:49 ` [PATCH v5 4/9] lib: introduce json_path_cmp routine Kirill Shcherbatov
2018-11-30 10:46   ` Vladimir Davydov
2018-12-03 17:37     ` [tarantool-patches] " Konstantin Osipov
2018-12-03 18:48       ` Vladimir Davydov
2018-12-03 20:14         ` Konstantin Osipov
2018-12-06  7:56           ` [tarantool-patches] Re: [PATCH v5 4/9] lib: introduce json_path_cmp, json_path_validate Kirill Shcherbatov
2018-11-26 10:49 ` [tarantool-patches] [PATCH v5 5/9] box: introduce JSON indexes Kirill Shcherbatov
2018-11-30 21:28   ` Vladimir Davydov
2018-12-01 16:49     ` Vladimir Davydov
2018-11-26 10:49 ` [PATCH v5 6/9] box: introduce has_json_paths flag in templates Kirill Shcherbatov
2018-11-26 10:49 ` [PATCH v5 7/9] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov
2018-12-01 17:20   ` Vladimir Davydov
2018-11-26 10:49 ` [PATCH v5 8/9] box: introduce offset slot cache in key_part Kirill Shcherbatov
2018-12-03 21:04   ` Vladimir Davydov [this message]
2018-12-04 15:51     ` Vladimir Davydov
2018-11-26 10:49 ` [PATCH v5 9/9] box: specify indexes in user-friendly form Kirill Shcherbatov
2018-12-04 12:22   ` Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181203210418.bpjwulkewjnoyqoh@esperanza \
    --to=vdavydov.dev@gmail.com \
    --cc=kostja@tarantool.org \
    --cc=kshcherbatov@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='Re: [PATCH v5 8/9] box: introduce offset slot cache in key_part' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox