[Tarantool-patches] [PATCH 2/2] box: use tuple_field_raw_by_part where possible

Tue Dec 1 11:48:59 MSK 2020

25.11.2020 01:33, Vladislav Shpilevoy пишет:
> Thanks for the investigation!
>
>>> On 14.11.2020 18:28, Serge Petrenko wrote:
>>>> tuple_field_raw_by_part allows to use part's offset slot cache to bypass
>>>> tuple field lookup in a JSON tree, which is involved even for plain
>>>> tuples.
>>> I am not sure I understand. How does it work in 1.10 then? It uses
>>> tuple_field_by_part_raw, which in the end simply calls
>>>
>>>      tuple_field_raw(format, data, field_map, part->fieldno);
>>>
>>> Exactly what we had here. Why did anything change?
>> tuple_field_raw() itself changed. On 1.10 it accesses field as an array element,
>> then takes its offset slot and finds the shift in the field map.
>>
>> On 2.x it aliases to tuple_field_raw_by_path():
>> finds the top-level field in the format's JSON tree, does a couple of
>> checks for JSON paths and multikey indices and only then takes the offset
>> like 1.10 did.
>>
>> But while tuple_field_raw_by_path() takes an offset slot hint,
>> tuple_field_raw() doesn't use it, so its call always results in searching
>> the JSON tree.
> Aha, now I see. So tuple_field_raw used the offset slot but with tons of
> unnecessary checks before that.
>
> Looking at how much did we gain from removing a couple of 'ifs' from the
> field map build, we could probably gain another portion of perf by
> splitting huge tuple_field_raw_by_path into 2 parts. 1 part with non-NULL
> path, and one with NULL path. Non-NULL path would simply use the root JSON
> array like you did in the first patch, and would omit all checks
> path !=/== NULL, whose count now is 3!!! in one function, at least.
>
> 1)
> 	if (path == NULL && fieldno == 0) {
> 		mp_decode_array(&tuple);
> 		return tuple;
> 	}
>
> 2)
> 	tuple_format_field_by_path() does another path == NULL
> 	inside;
>
> 3)
> 	if (path != NULL && field == NULL)
> 		goto parse;

Thanks for the suggestion!

That's what I did and it gave even more perf back. Please see v2 of this 
patch

for details.

> Could be done with templates relatively trivial. We just make path
> check a boolean template argument, and check it before checking the
> path.
>
> Or we can literally split this function into 2 independent functions,
> because it seems they will look quite different.
>>>> Since both tuple_field_raw_by_part and tuple_field_raw are basically
>>>> aliases to tuple_field_raw_by_path, prefer the former to the latter,
>>>> since it takes advantage over key part's offset slot cache.
>>> Besides, I can't find any info why do we need this 'offset slot cache'.
>>> The comment above this member simply narrates the code, and I don't
>>> understand the actual reason it was added. Did you manage to realize it?
>>> Can you explain, please?
>> Yep. This cache ameliorates the performance loss for looking the field up
>> in JSON tree every time we need to access some tuple field by offset.
>>
>> Say, each time you call tuple_compare_slowpath and it gets to comparing
>> field values, you'll look up the field in JSON tree to know its offset slot.
>>
>> You may have lots of calls to tuple_compare for different tuples but for
>> the same set of key_parts. So instead of looking the field up on every
>> comparison, you look it up once and remember its offset slot in the
>> key part. Now next time you need offset slot access you don't need
>> to look for the field in the JSON tree at all.
>>
>> Looks like initially this cache was intended to save us from some complex
>> tree lookups, when path is actually not null.
>>
>> But the cache helps even for top-level field access. It helps skip not only
>> the JSON tree lookup, but all the additional checks coming after it.
>>
>> So that's why I made use of it in this patch.
> There must be a problem with these offset slots in case of space alter. If
> we add a new index or remove another index, the existing tuples will have
> 2 formats in one index. Some tuples will have the old format, and some will
> have the new one. With probably different offset slots for the same index
> parts. Or not, but it does not matter - format->epoch will change anyway.
>
> AFAIU, such a simple alter can make the offset slots almost not used, because
> tuple_field_raw_by_part will reset them on each switch from tuple of one
> format to a tuple of another format.
>
> Maybe we don't want to optimize it though, since alter is not a common case.

I didn't think of that. Aren't tuples updating their format on the fly?
Or you probably have to replace the tuple for its format to be updated?

-- 
Serge Petrenko