From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp63.i.mail.ru (smtp63.i.mail.ru [217.69.128.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id E465D45C304 for ; Tue, 1 Dec 2020 11:48:45 +0300 (MSK) From: Serge Petrenko References: <20201114172823.8217-1-sergepetrenko@tarantool.org> <20201114172823.8217-2-sergepetrenko@tarantool.org> <635cf529-93e6-09e3-73d4-bba8ad04e793@tarantool.org> <2b49855f-3337-0828-465b-38fbd7ef0912@tarantool.org> <9eba270a-c0fb-3238-bc2e-2e39fe73eb91@tarantool.org> Message-ID: <301e2d4d-3534-d03e-4dec-84c72d4c351d@tarantool.org> Date: Tue, 1 Dec 2020 11:48:44 +0300 MIME-Version: 1.0 In-Reply-To: <9eba270a-c0fb-3238-bc2e-2e39fe73eb91@tarantool.org> Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Transfer-Encoding: 8bit Content-Language: en-GB Subject: Re: [Tarantool-patches] [PATCH 1/2] box: speed up tuple_field_map_create List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy , korablev@tarantool.org Cc: tarantool-patches@dev.tarantool.org 25.11.2020 01:33, Vladislav Shpilevoy пишет: > >> ``` >> diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c >> index d8656aa26..6c9b2a255 100644 >> --- a/src/box/tuple_format.c >> +++ b/src/box/tuple_format.c >> @@ -888,17 +888,10 @@ tuple_field_map_create_plain(struct tuple_format *format, const char *tuple, >>                        required_fields_sz); >>         } >> >> -       struct json_token token = { >> -               .type = JSON_TOKEN_NUM, >> -               .num = 0, >> -       }; >>         struct tuple_field *field; >> - >> -       for (uint32_t i = 0; i < defined_field_count; i++, mp_next(&pos)) { >> -               token.num = i; >> -               field = json_tree_lookup_entry(&format->fields, >> - &format->fields.root, &token, >> -                                              struct tuple_field, token); >> +       struct json_token **token = format->fields.root.children; >> +       for (uint32_t i = 0; i < defined_field_count; i++, token++, mp_next(&pos)) { >> +               field = json_tree_entry(*token, struct tuple_field, token); >>                 if (validate) { >>                         bool nullable = tuple_field_is_nullable(field); >> if(!field_mp_type_is_compatible(field->type, pos, >> ``` >> >> >> I actually profiled my original patch and the version with this diff applied: >> >> >> This diff reduces time taken by tuple_field_map_create from 2.12 seconds >> to 1.69 seconds. 1.10 still has the best time: 1.31 seconds. >> Just for comparison, current 2.x takes 6.1 seconds for the same amount of >> tuple_field_map_create calls. > Hm. That looks too much. json_tree_lookup_entry() is sure slower than a > direct access, but why so much slower? It calls json_tree_lookup, which > checks token type, which is num, and then it does the same as you in the > diff above. So it is just +2 ifs, not counting the 'unlikely' multikey > check. Does a couple of ifs slow down the code so heavily? I suppose we should count all ifs. likely/unlikely shouldn't make any difference after some iterations, the branch predictor should already learn and guess correctly every time.  and we have ~30 mil iteratiions in this test. So there's +3 ifs in a cycle (just +3 instructions, without any branch misprediction). > Maybe the inlining was too aggressive, and we washed out the instruction > cache? Don't know how to check it properly. Just out of curiosity, what > if we move json_tree_lookup to json.c, and field_map_builder_set_slot > to field_map.c? And maybe field_mp_type_is_compatible to field_def.c. tuple_field_map_create time with everything moved to *.c: 2.44 s field_mp_type_is_compatible inlined, everything elese in *.c: 2.8 s only json_tree_lookup moved to json.c: 2.21 s everything inlined: 2.12 s > If it won't change anything, then probably just 2 ifs really matter that > much. > >> I'm starting to like this variant more. Anyway, I'm not applying this diff >> until your ack. > Looks fine. Just don't understand why so big difference for such a simple > change. These 3 ifs are a noticeable portion of all the instructions inside the loop. Why not? The diff applied. > Then we probably also could gain some perf by splitting these functions > into 2 versions with validate true and validate false. Because we always > know it at compile time. Would be cool to use templates for this, but not > sure if we can simply change tuple.c to tuple.cc only for that. You mean like that? tuple_field_map_create(...) { tuple_field_map_create_impl(..., true, ...); } tuple_field_map_create_unchecked(...) { tuple_field_mp_create_impl(..., false, ...); } I tried this approach, it didn't give anything. tuple_field_map_create time 2.15 s. Is this possible that compiler already evaluates `validate`? The diff, just in case. ============================================= diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index d8656aa26..14e3984b2 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -857,7 +857,7 @@ tuple_format_required_fields_validate(struct tuple_format *format,                                       void *required_fields,                                       uint32_t required_fields_sz); -static int +static inline int  tuple_field_map_create_plain(struct tuple_format *format, const char *tuple,                              bool validate, struct field_map_builder *builder)  { @@ -925,10 +925,9 @@ end:                0;  } -/** @sa declaration for details. */ -int -tuple_field_map_create(struct tuple_format *format, const char *tuple, -                      bool validate, struct field_map_builder *builder) +static inline int +tuple_field_map_create_impl(struct tuple_format *format, const char *tuple, +                           bool validate, struct field_map_builder *builder)  {         struct region *region = &fiber()->gc;         if (field_map_builder_create(builder, format->field_map_size, @@ -966,6 +965,20 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple,         return entry.data == NULL ? 0 : -1;  } +int +tuple_field_map_create(struct tuple_format *format, const char *tuple, +                      struct field_map_builder *builder) +{ +       return tuple_field_map_create_impl(format, tuple, true, builder); +} + +int +tuple_field_map_create_unchecked(struct tuple_format *format, const char *tuple, +                                struct field_map_builder *builder) +{ +       return tuple_field_map_create_impl(format, tuple, false, builder); +} +  uint32_t  tuple_format_min_field_count(struct key_def * const *keys, uint16_t key_count,                              const struct field_def *space_fields, > We could also template tuple_field_raw_by_path, because tuple_field_raw > always passes NULL path and NULL offset_slot_hint. Yes, this should also help. -- Serge Petrenko