Re: [Tarantool-patches] [PATCH 2/2] box: use tuple_field_raw_by_part where possible

Tarantool development patches archive
 help / color / mirror / Atom feed

From: Serge Petrenko <sergepetrenko@tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>, korablev@tarantool.org
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 2/2] box: use tuple_field_raw_by_part where possible
Date: Tue, 1 Dec 2020 11:48:59 +0300	[thread overview]
Message-ID: <c3c532fb-620e-493c-ccd6-833bbfe8830d@tarantool.org> (raw)
In-Reply-To: <fb78ec25-3b9a-3320-f32f-cca1ce13d372@tarantool.org>


25.11.2020 01:33, Vladislav Shpilevoy пишет:
> Thanks for the investigation!
>
>>> On 14.11.2020 18:28, Serge Petrenko wrote:
>>>> tuple_field_raw_by_part allows to use part's offset slot cache to bypass
>>>> tuple field lookup in a JSON tree, which is involved even for plain
>>>> tuples.
>>> I am not sure I understand. How does it work in 1.10 then? It uses
>>> tuple_field_by_part_raw, which in the end simply calls
>>>
>>>      tuple_field_raw(format, data, field_map, part->fieldno);
>>>
>>> Exactly what we had here. Why did anything change?
>> tuple_field_raw() itself changed. On 1.10 it accesses field as an array element,
>> then takes its offset slot and finds the shift in the field map.
>>
>> On 2.x it aliases to tuple_field_raw_by_path():
>> finds the top-level field in the format's JSON tree, does a couple of
>> checks for JSON paths and multikey indices and only then takes the offset
>> like 1.10 did.
>>
>> But while tuple_field_raw_by_path() takes an offset slot hint,
>> tuple_field_raw() doesn't use it, so its call always results in searching
>> the JSON tree.
> Aha, now I see. So tuple_field_raw used the offset slot but with tons of
> unnecessary checks before that.
>
> Looking at how much did we gain from removing a couple of 'ifs' from the
> field map build, we could probably gain another portion of perf by
> splitting huge tuple_field_raw_by_path into 2 parts. 1 part with non-NULL
> path, and one with NULL path. Non-NULL path would simply use the root JSON
> array like you did in the first patch, and would omit all checks
> path !=/== NULL, whose count now is 3!!! in one function, at least.
>
> 1)
> 	if (path == NULL && fieldno == 0) {
> 		mp_decode_array(&tuple);
> 		return tuple;
> 	}
>
> 2)
> 	tuple_format_field_by_path() does another path == NULL
> 	inside;
>
> 3)
> 	if (path != NULL && field == NULL)
> 		goto parse;


Thanks for the suggestion!

That's what I did and it gave even more perf back. Please see v2 of this 
patch

for details.


> Could be done with templates relatively trivial. We just make path
> check a boolean template argument, and check it before checking the
> path.
>
> Or we can literally split this function into 2 independent functions,
> because it seems they will look quite different.
>>>> Since both tuple_field_raw_by_part and tuple_field_raw are basically
>>>> aliases to tuple_field_raw_by_path, prefer the former to the latter,
>>>> since it takes advantage over key part's offset slot cache.
>>> Besides, I can't find any info why do we need this 'offset slot cache'.
>>> The comment above this member simply narrates the code, and I don't
>>> understand the actual reason it was added. Did you manage to realize it?
>>> Can you explain, please?
>> Yep. This cache ameliorates the performance loss for looking the field up
>> in JSON tree every time we need to access some tuple field by offset.
>>
>> Say, each time you call tuple_compare_slowpath and it gets to comparing
>> field values, you'll look up the field in JSON tree to know its offset slot.
>>
>> You may have lots of calls to tuple_compare for different tuples but for
>> the same set of key_parts. So instead of looking the field up on every
>> comparison, you look it up once and remember its offset slot in the
>> key part. Now next time you need offset slot access you don't need
>> to look for the field in the JSON tree at all.
>>
>> Looks like initially this cache was intended to save us from some complex
>> tree lookups, when path is actually not null.
>>
>> But the cache helps even for top-level field access. It helps skip not only
>> the JSON tree lookup, but all the additional checks coming after it.
>>
>> So that's why I made use of it in this patch.
> There must be a problem with these offset slots in case of space alter. If
> we add a new index or remove another index, the existing tuples will have
> 2 formats in one index. Some tuples will have the old format, and some will
> have the new one. With probably different offset slots for the same index
> parts. Or not, but it does not matter - format->epoch will change anyway.
>
> AFAIU, such a simple alter can make the offset slots almost not used, because
> tuple_field_raw_by_part will reset them on each switch from tuple of one
> format to a tuple of another format.
>
> Maybe we don't want to optimize it though, since alter is not a common case.


I didn't think of that. Aren't tuples updating their format on the fly?
Or you probably have to replace the tuple for its format to be updated?


-- 
Serge Petrenko

next prev parent reply	other threads:[~2020-12-01  8:49 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-14 17:28 [Tarantool-patches] [PATCH 0/2] reduce performance degradation introduced by JSON path indices Serge Petrenko
2020-11-14 17:28 ` [Tarantool-patches] [PATCH 1/2] box: speed up tuple_field_map_create Serge Petrenko
2020-11-14 18:00   ` Cyrill Gorcunov
2020-11-16  7:34     ` Serge Petrenko
2020-11-20 23:26   ` Vladislav Shpilevoy
2020-11-23 13:50     ` Serge Petrenko
2020-11-24 22:33       ` Vladislav Shpilevoy
2020-12-01  8:48         ` Serge Petrenko
2020-12-01 22:04           ` Vladislav Shpilevoy
2020-12-02  9:53             ` Serge Petrenko
2020-11-14 17:28 ` [Tarantool-patches] [PATCH 2/2] box: use tuple_field_raw_by_part where possible Serge Petrenko
2020-11-20 23:26   ` Vladislav Shpilevoy
2020-11-23 13:51     ` Serge Petrenko
2020-11-24 22:33       ` Vladislav Shpilevoy
2020-12-01  8:48         ` Serge Petrenko [this message]
2020-12-01 22:04           ` Vladislav Shpilevoy
2020-11-16  7:50 ` [Tarantool-patches] [PATCH 0/2] reduce performance degradation introduced by JSON path indices Serge Petrenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c3c532fb-620e-493c-ccd6-833bbfe8830d@tarantool.org \
    --to=sergepetrenko@tarantool.org \
    --cc=korablev@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 2/2] box: use tuple_field_raw_by_part where possible' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox