Tarantool development patches archive
 help / color / mirror / Atom feed
From: Serge Petrenko <sergepetrenko@tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>, korablev@tarantool.org
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 1/2] box: speed up tuple_field_map_create
Date: Mon, 23 Nov 2020 16:50:38 +0300	[thread overview]
Message-ID: <2b49855f-3337-0828-465b-38fbd7ef0912@tarantool.org> (raw)
In-Reply-To: <635cf529-93e6-09e3-73d4-bba8ad04e793@tarantool.org>


21.11.2020 02:26, Vladislav Shpilevoy пишет:
> Hi! Thanks for the patch!


Thanks for the review!


> Did you think of trying to flatten the first level of tuple_format.fields
> tree into an array, like it was before JSON indexes? So we would have an
> array of fields, and each one has a json-subtree in it. Which is just
> unused in 99.999% of cases. It could restore even more perf maybe.


Your suggestion is good, but looks like it won't work. As I understand,
the first level of JSON schema (or how does one call this?) may be a map,
rather than an array. So, even top-level fields must be organised in a tree,
rather than an array.


I was thinking of making fields a union. So that it'd hold an array for
simple cases and a json tree for complex ones. I also thought it should work
even better, but didn't start implementation.


What's interesting, the JSON tree root has all its children organised in an
array anyway. It's just hidden behind one level of abstraction. So, in
`json_tree_lookup_entry` one actually does `return 
root->children[token.num]`.


So, I'm not sure whether we should introduce a plain array for top-level
fields. This'll require a bigger refactoring just to check whether the idea
is good or not.


I have a suggestion: let's access the root's array directly. I didn't
implement it initially since I thought it'd harm JSON tree encapsulation.
However, I tried it out, and it actually looks good.
Consider this diff (not applied yet, just an example):

```
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index d8656aa26..6c9b2a255 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -888,17 +888,10 @@ tuple_field_map_create_plain(struct tuple_format 
*format, const char *tuple,
                        required_fields_sz);
         }

-       struct json_token token = {
-               .type = JSON_TOKEN_NUM,
-               .num = 0,
-       };
         struct tuple_field *field;
-
-       for (uint32_t i = 0; i < defined_field_count; i++, mp_next(&pos)) {
-               token.num = i;
-               field = json_tree_lookup_entry(&format->fields,
- &format->fields.root, &token,
-                                              struct tuple_field, token);
+       struct json_token **token = format->fields.root.children;
+       for (uint32_t i = 0; i < defined_field_count; i++, token++, 
mp_next(&pos)) {
+               field = json_tree_entry(*token, struct tuple_field, token);
                 if (validate) {
                         bool nullable = tuple_field_is_nullable(field);
if(!field_mp_type_is_compatible(field->type, pos,
```


I actually profiled my original patch and the version with this diff 
applied:


This diff reduces time taken by tuple_field_map_create from 2.12 seconds
to 1.69 seconds. 1.10 still has the best time: 1.31 seconds.
Just for comparison, current 2.x takes 6.1 seconds for the same amount of
tuple_field_map_create calls.


I'm starting to like this variant more. Anyway, I'm not applying this diff
until your ack.


Please find other comments and incremental diff below.


> See 2 comments below.
>
>> diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
>> index 9b817d3cf..ad2f251b4 100644
>> --- a/src/box/tuple_format.c
>> +++ b/src/box/tuple_format.c
>> @@ -852,6 +852,89 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
>>   	return true;
>>   }
>>   
>> +static int
>> +tuple_format_required_fields_validate(struct tuple_format *format,
>> +				      void *required_fields,
>> +				      uint32_t required_fields_sz);
>> +
>> +static int
>> +tuple_field_map_create_plain(struct tuple_format *format, const char *tuple,
>> +			     bool validate, struct field_map_builder *builder)
>> +{
>> +#define check_field_type(field, pos) ({					       \
>> +	bool nullable = tuple_field_is_nullable(field);			       \
>> +	if(!field_mp_type_is_compatible(field->type, pos, nullable)) {	       \
>> +		diag_set(ClientError, ER_FIELD_TYPE, tuple_field_path(field),  \
>> +			 field_type_strs[field->type]);			       \
>> +		return -1;						       \
>> +	}								       \
>> +})
>> +
>> +	struct region *region = &fiber()->gc;
>> +	const char *pos = tuple;
>> +	uint32_t defined_field_count = mp_decode_array(&pos);
>> +	if (validate && format->exact_field_count > 0  &&
>> +	    format->exact_field_count != defined_field_count) {
>> +		diag_set(ClientError, ER_EXACT_FIELD_COUNT,
>> +			 (unsigned) defined_field_count,
>> +			 (unsigned) format->exact_field_count);
>> +		return -1;
>> +	}
>> +	defined_field_count = MIN(defined_field_count,
>> +				  tuple_format_field_count(format));
>> +
>> +	void *required_fields = NULL;
>> +	uint32_t required_fields_sz = bitmap_size(format->total_field_count);
>> +	if (validate) {
>> +		required_fields = region_alloc(region, required_fields_sz);
>> +		memcpy(required_fields, format->required_fields,
>> +		       required_fields_sz);
>> +	}
>> +
>> +	if (unlikely(defined_field_count == 0))
>> +		goto end;
> 1. If the count is 0, you anyway allocate something on the region. Better do the
> check before allocating anything.


Thanks for noticing! Fixed.


>> +
>> +	struct json_token token = {
>> +		.type = JSON_TOKEN_NUM,
>> +		.num = 0,
>> +	};
>> +	struct tuple_field *field;
>> +
>> +	field = json_tree_lookup_entry(&format->fields, &format->fields.root,
>> +				       &token, struct tuple_field, token);
>> +	/* Check 1st field's type, but don't store offset to it. */
>> +	if (validate) {
>> +		check_field_type(field, pos);
>> +		bit_clear(required_fields, field->id);
>> +	}
> 2. Since you use macro, you can move bit_clear and 'if validate' into it.
> These 4 lines above are repeated 2 times without changes.


Agree, but since your next comment is also on point, let's actually
remove the macro at all. There'd be only one its invocation now.


> Also, why do you separate the first iteration? Doesn't the first field
> have offset_slot == TUPLE_OFFSET_SLOT_NIL? So its slot won't be set anyway.


That's some optimisation I borrowed from 1.10 code without a second thought.
Not needed here, so removed.


>> +	mp_next(&pos);
>> +
>> +	for (uint32_t i = 1; i < defined_field_count; i++, mp_next(&pos)) {
>> +		token.num = i;
>> +		field = json_tree_lookup_entry(&format->fields,
>> +					       &format->fields.root, &token,
>> +					       struct  tuple_field, token);
>> +		if (validate) {
>> +			check_field_type(field, pos);
>> +			bit_clear(required_fields, field->id);
>> +		}
>> +		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL &&
>> +		    field_map_builder_set_slot(builder, field->offset_slot,
>> +					       pos - tuple, MULTIKEY_NONE,
>> +					       0, NULL) != 0) {
>> +			return -1;
>> +		}
>> +	}
>> +
>> +#undef check_field_type
>> +
>> +end:
>> +	return validate ?
>> +	       tuple_format_required_fields_validate(format, required_fields,
>> +						     required_fields_sz) :
>> +	       0;
>> +}


diff:


diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index ad2f251b4..18c292926 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -861,15 +861,6 @@ static int
  tuple_field_map_create_plain(struct tuple_format *format, const char 
*tuple,
                   bool validate, struct field_map_builder *builder)
  {
-#define check_field_type(field, pos) ({        \
-    bool nullable = tuple_field_is_nullable(field);        \
-    if(!field_mp_type_is_compatible(field->type, pos, nullable)) {    
        \
-        diag_set(ClientError, ER_FIELD_TYPE, tuple_field_path(field),  \
-             field_type_strs[field->type]);        \
-        return -1;                               \
-    }                                       \
-})
-
      struct region *region = &fiber()->gc;
      const char *pos = tuple;
      uint32_t defined_field_count = mp_decode_array(&pos);
@@ -885,37 +876,38 @@ tuple_field_map_create_plain(struct tuple_format 
*format, const char *tuple,

      void *required_fields = NULL;
      uint32_t required_fields_sz = bitmap_size(format->total_field_count);
+
+    if (unlikely(defined_field_count == 0)) {
+        required_fields = format->required_fields;
+        goto end;
+    }
+
      if (validate) {
          required_fields = region_alloc(region, required_fields_sz);
          memcpy(required_fields, format->required_fields,
                 required_fields_sz);
      }

-    if (unlikely(defined_field_count == 0))
-        goto end;
-
      struct json_token token = {
          .type = JSON_TOKEN_NUM,
          .num = 0,
      };
      struct tuple_field *field;

-    field = json_tree_lookup_entry(&format->fields, &format->fields.root,
-                       &token, struct tuple_field, token);
-    /* Check 1st field's type, but don't store offset to it. */
-    if (validate) {
-        check_field_type(field, pos);
-        bit_clear(required_fields, field->id);
-    }
-    mp_next(&pos);
-
-    for (uint32_t i = 1; i < defined_field_count; i++, mp_next(&pos)) {
+    for (uint32_t i = 0; i < defined_field_count; i++, mp_next(&pos)) {
          token.num = i;
          field = json_tree_lookup_entry(&format->fields,
                             &format->fields.root, &token,
-                           struct  tuple_field, token);
+                           struct tuple_field, token);
          if (validate) {
-            check_field_type(field, pos);
+            bool nullable = tuple_field_is_nullable(field);
+            if(!field_mp_type_is_compatible(field->type, pos,
+                            nullable)) {
+                diag_set(ClientError, ER_FIELD_TYPE,
+                     tuple_field_path(field),
+                     field_type_strs[field->type]);
+                return -1;
+            }
              bit_clear(required_fields, field->id);
          }
          if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL &&
@@ -926,8 +918,6 @@ tuple_field_map_create_plain(struct tuple_format 
*format, const char *tuple,
          }
      }

-#undef check_field_type
-
  end:
      return validate ?
             tuple_format_required_fields_validate(format, required_fields,

-- 
Serge Petrenko

  reply	other threads:[~2020-11-23 13:50 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-14 17:28 [Tarantool-patches] [PATCH 0/2] reduce performance degradation introduced by JSON path indices Serge Petrenko
2020-11-14 17:28 ` [Tarantool-patches] [PATCH 1/2] box: speed up tuple_field_map_create Serge Petrenko
2020-11-14 18:00   ` Cyrill Gorcunov
2020-11-16  7:34     ` Serge Petrenko
2020-11-20 23:26   ` Vladislav Shpilevoy
2020-11-23 13:50     ` Serge Petrenko [this message]
2020-11-24 22:33       ` Vladislav Shpilevoy
2020-12-01  8:48         ` Serge Petrenko
2020-12-01 22:04           ` Vladislav Shpilevoy
2020-12-02  9:53             ` Serge Petrenko
2020-11-14 17:28 ` [Tarantool-patches] [PATCH 2/2] box: use tuple_field_raw_by_part where possible Serge Petrenko
2020-11-20 23:26   ` Vladislav Shpilevoy
2020-11-23 13:51     ` Serge Petrenko
2020-11-24 22:33       ` Vladislav Shpilevoy
2020-12-01  8:48         ` Serge Petrenko
2020-12-01 22:04           ` Vladislav Shpilevoy
2020-11-16  7:50 ` [Tarantool-patches] [PATCH 0/2] reduce performance degradation introduced by JSON path indices Serge Petrenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2b49855f-3337-0828-465b-38fbd7ef0912@tarantool.org \
    --to=sergepetrenko@tarantool.org \
    --cc=korablev@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 1/2] box: speed up tuple_field_map_create' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox