From: Serge Petrenko <sergepetrenko@tarantool.org> To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>, korablev@tarantool.org Cc: tarantool-patches@dev.tarantool.org Subject: Re: [Tarantool-patches] [PATCH 2/2] box: use tuple_field_raw_by_part where possible Date: Tue, 1 Dec 2020 11:48:59 +0300 [thread overview] Message-ID: <c3c532fb-620e-493c-ccd6-833bbfe8830d@tarantool.org> (raw) In-Reply-To: <fb78ec25-3b9a-3320-f32f-cca1ce13d372@tarantool.org> 25.11.2020 01:33, Vladislav Shpilevoy пишет: > Thanks for the investigation! > >>> On 14.11.2020 18:28, Serge Petrenko wrote: >>>> tuple_field_raw_by_part allows to use part's offset slot cache to bypass >>>> tuple field lookup in a JSON tree, which is involved even for plain >>>> tuples. >>> I am not sure I understand. How does it work in 1.10 then? It uses >>> tuple_field_by_part_raw, which in the end simply calls >>> >>> tuple_field_raw(format, data, field_map, part->fieldno); >>> >>> Exactly what we had here. Why did anything change? >> tuple_field_raw() itself changed. On 1.10 it accesses field as an array element, >> then takes its offset slot and finds the shift in the field map. >> >> On 2.x it aliases to tuple_field_raw_by_path(): >> finds the top-level field in the format's JSON tree, does a couple of >> checks for JSON paths and multikey indices and only then takes the offset >> like 1.10 did. >> >> But while tuple_field_raw_by_path() takes an offset slot hint, >> tuple_field_raw() doesn't use it, so its call always results in searching >> the JSON tree. > Aha, now I see. So tuple_field_raw used the offset slot but with tons of > unnecessary checks before that. > > Looking at how much did we gain from removing a couple of 'ifs' from the > field map build, we could probably gain another portion of perf by > splitting huge tuple_field_raw_by_path into 2 parts. 1 part with non-NULL > path, and one with NULL path. Non-NULL path would simply use the root JSON > array like you did in the first patch, and would omit all checks > path !=/== NULL, whose count now is 3!!! in one function, at least. > > 1) > if (path == NULL && fieldno == 0) { > mp_decode_array(&tuple); > return tuple; > } > > 2) > tuple_format_field_by_path() does another path == NULL > inside; > > 3) > if (path != NULL && field == NULL) > goto parse; Thanks for the suggestion! That's what I did and it gave even more perf back. Please see v2 of this patch for details. > Could be done with templates relatively trivial. We just make path > check a boolean template argument, and check it before checking the > path. > > Or we can literally split this function into 2 independent functions, > because it seems they will look quite different. >>>> Since both tuple_field_raw_by_part and tuple_field_raw are basically >>>> aliases to tuple_field_raw_by_path, prefer the former to the latter, >>>> since it takes advantage over key part's offset slot cache. >>> Besides, I can't find any info why do we need this 'offset slot cache'. >>> The comment above this member simply narrates the code, and I don't >>> understand the actual reason it was added. Did you manage to realize it? >>> Can you explain, please? >> Yep. This cache ameliorates the performance loss for looking the field up >> in JSON tree every time we need to access some tuple field by offset. >> >> Say, each time you call tuple_compare_slowpath and it gets to comparing >> field values, you'll look up the field in JSON tree to know its offset slot. >> >> You may have lots of calls to tuple_compare for different tuples but for >> the same set of key_parts. So instead of looking the field up on every >> comparison, you look it up once and remember its offset slot in the >> key part. Now next time you need offset slot access you don't need >> to look for the field in the JSON tree at all. >> >> Looks like initially this cache was intended to save us from some complex >> tree lookups, when path is actually not null. >> >> But the cache helps even for top-level field access. It helps skip not only >> the JSON tree lookup, but all the additional checks coming after it. >> >> So that's why I made use of it in this patch. > There must be a problem with these offset slots in case of space alter. If > we add a new index or remove another index, the existing tuples will have > 2 formats in one index. Some tuples will have the old format, and some will > have the new one. With probably different offset slots for the same index > parts. Or not, but it does not matter - format->epoch will change anyway. > > AFAIU, such a simple alter can make the offset slots almost not used, because > tuple_field_raw_by_part will reset them on each switch from tuple of one > format to a tuple of another format. > > Maybe we don't want to optimize it though, since alter is not a common case. I didn't think of that. Aren't tuples updating their format on the fly? Or you probably have to replace the tuple for its format to be updated? -- Serge Petrenko
next prev parent reply other threads:[~2020-12-01 8:49 UTC|newest] Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-11-14 17:28 [Tarantool-patches] [PATCH 0/2] reduce performance degradation introduced by JSON path indices Serge Petrenko 2020-11-14 17:28 ` [Tarantool-patches] [PATCH 1/2] box: speed up tuple_field_map_create Serge Petrenko 2020-11-14 18:00 ` Cyrill Gorcunov 2020-11-16 7:34 ` Serge Petrenko 2020-11-20 23:26 ` Vladislav Shpilevoy 2020-11-23 13:50 ` Serge Petrenko 2020-11-24 22:33 ` Vladislav Shpilevoy 2020-12-01 8:48 ` Serge Petrenko 2020-12-01 22:04 ` Vladislav Shpilevoy 2020-12-02 9:53 ` Serge Petrenko 2020-11-14 17:28 ` [Tarantool-patches] [PATCH 2/2] box: use tuple_field_raw_by_part where possible Serge Petrenko 2020-11-20 23:26 ` Vladislav Shpilevoy 2020-11-23 13:51 ` Serge Petrenko 2020-11-24 22:33 ` Vladislav Shpilevoy 2020-12-01 8:48 ` Serge Petrenko [this message] 2020-12-01 22:04 ` Vladislav Shpilevoy 2020-11-16 7:50 ` [Tarantool-patches] [PATCH 0/2] reduce performance degradation introduced by JSON path indices Serge Petrenko
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=c3c532fb-620e-493c-ccd6-833bbfe8830d@tarantool.org \ --to=sergepetrenko@tarantool.org \ --cc=korablev@tarantool.org \ --cc=tarantool-patches@dev.tarantool.org \ --cc=v.shpilevoy@tarantool.org \ --subject='Re: [Tarantool-patches] [PATCH 2/2] box: use tuple_field_raw_by_part where possible' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox