From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp34.i.mail.ru (smtp34.i.mail.ru [94.100.177.94])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dev.tarantool.org (Postfix) with ESMTPS id 4FB2A45C305;
	Fri, 4 Dec 2020 12:53:10 +0300 (MSK)
From: Serge Petrenko
Date: Fri, 4 Dec 2020 12:52:54 +0300
Message-Id: <5b3a70820a00cdfa26189933a4ae0549a0398c72.1607075291.git.sergepetrenko@tarantool.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Tarantool-patches] [PATCH v3 1/2] box: speed up tuple_field_map_create
List-Id: Tarantool development patches
To: korablev@tarantool.org
Cc: tarantool-patches@dev.tarantool.org

Since the introduction of JSON path indices tuple_init_field_map, which
was quite a simple routine traversing a tuple and storing its field data
offsets in the field map, was renamed to tuple_field_map_create and
optimised for working with JSON path indices.

The main difference is that tuple format fields are now organised in a
tree rather than an array, and the tuple itself may have indexed fields,
which are not plain array members, but rather members of some sub-array
or map. This requires more complex iteration over tuple format fields
and additional tuple parsing.

All the changes were, however, unneeded for tuple formats not supporting
fields indexed by JSON paths.

Rework tuple_field_map_create so that it doesn't go through all the
unnecessary JSON path-related checks for simple cases and restore most
of the lost performance.

Below are some benchmark results for the same workload that pointed to
the degradation initially.
Snapshot recovery time on RelWithDebInfo build for a 1.5G snapshot
containing a single memtx space with one secondary index over 4 integer
and 1 string field:

Version                    | Time (s) | Difference relative to 1.10
---------------------------|----------|----------------------------
1.10 (the golden standard) | 28       | -/-
2.x (degradation)          | 39       | + 39%
2.x (patched)              | 31       | + 11%

Profile shows that the main difference is in memtx_tuple_new due to
tuple_init_field_map/tuple_field_map_create performance difference.

Numbers below show cumulative time spent in tuple_init_field_map (1.10) /
tuple_field_map_create (unpatched) / tuple_field_map_create (patched).
2.44 s / 8.61 s / 3.19 s

More benchmark results can be seen at #4774

Part of #4774
---
 src/box/tuple_format.c | 75 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 9b817d3cf..6c9b2a255 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -852,6 +852,72 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
 	return true;
 }
 
+static int
+tuple_format_required_fields_validate(struct tuple_format *format,
+				      void *required_fields,
+				      uint32_t required_fields_sz);
+
+static int
+tuple_field_map_create_plain(struct tuple_format *format, const char *tuple,
+			     bool validate, struct field_map_builder *builder)
+{
+	struct region *region = &fiber()->gc;
+	const char *pos = tuple;
+	uint32_t defined_field_count = mp_decode_array(&pos);
+	if (validate && format->exact_field_count > 0 &&
+	    format->exact_field_count != defined_field_count) {
+		diag_set(ClientError, ER_EXACT_FIELD_COUNT,
+			 (unsigned) defined_field_count,
+			 (unsigned) format->exact_field_count);
+		return -1;
+	}
+	defined_field_count = MIN(defined_field_count,
+				  tuple_format_field_count(format));
+
+	void *required_fields = NULL;
+	uint32_t required_fields_sz = bitmap_size(format->total_field_count);
+
+	if (unlikely(defined_field_count == 0)) {
+		required_fields = format->required_fields;
+		goto end;
+	}
+
+	if (validate) {
+		required_fields = region_alloc(region, required_fields_sz);
+		memcpy(required_fields, format->required_fields,
+		       required_fields_sz);
+	}
+
+	struct tuple_field *field;
+	struct json_token **token = format->fields.root.children;
+	for (uint32_t i = 0; i < defined_field_count; i++, token++, mp_next(&pos)) {
+		field = json_tree_entry(*token, struct tuple_field, token);
+		if (validate) {
+			bool nullable = tuple_field_is_nullable(field);
+			if (!field_mp_type_is_compatible(field->type, pos,
+							 nullable)) {
+				diag_set(ClientError, ER_FIELD_TYPE,
+					 tuple_field_path(field),
+					 field_type_strs[field->type]);
+				return -1;
+			}
+			bit_clear(required_fields, field->id);
+		}
+		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL &&
+		    field_map_builder_set_slot(builder, field->offset_slot,
+					       pos - tuple, MULTIKEY_NONE,
+					       0, NULL) != 0) {
+			return -1;
+		}
+	}
+
+end:
+	return validate ?
+	       tuple_format_required_fields_validate(format, required_fields,
+						     required_fields_sz) :
+	       0;
+}
+
 /** @sa declaration for details. */
 int
 tuple_field_map_create(struct tuple_format *format, const char *tuple,
@@ -864,6 +930,15 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple,
 	if (tuple_format_field_count(format) == 0)
 		return 0; /* Nothing to initialize */
 
+	/*
+	 * In case tuple format doesn't contain fields accessed by JSON paths,
+	 * tuple field traversal may be simplified.
+	 */
+	if (format->fields_depth == 1) {
+		return tuple_field_map_create_plain(format, tuple, validate,
+						    builder);
+	}
+
 	uint32_t field_count;
 	struct tuple_format_iterator it;
 	uint8_t flags = validate ? TUPLE_FORMAT_ITERATOR_VALIDATE : 0;
-- 
2.24.3 (Apple Git-128)
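
For illustration only, the fast-path idea of the patch can be reduced to a standalone sketch: when all indexed fields are top-level array members, one linear pass over the tuple is enough to record their offsets. This is not the Tarantool code. The names `toy_field`, `toy_format`, `TOY_SLOT_NIL` and `toy_field_map_create_plain` are hypothetical, and the one-byte length-prefixed tuple encoding is an assumption made to keep the example short (real tuples are MsgPack, and real formats keep their fields in a JSON tree).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TUPLE_OFFSET_SLOT_NIL: field is not indexed. */
#define TOY_SLOT_NIL (-1)

/* Hypothetical, simplified field description: only the offset slot. */
struct toy_field {
	int offset_slot;
};

/* Hypothetical flat format: top-level fields only (depth 1). */
struct toy_format {
	const struct toy_field *fields;
	uint32_t field_count;
};

/*
 * Toy tuple encoding (an assumption, not MsgPack): one byte with the
 * field count, then every field as a one-byte length followed by data.
 *
 * Walk the tuple once and store the offset of each indexed field in
 * slots[offset_slot]. Return 0 on success, -1 on a truncated tuple.
 */
static int
toy_field_map_create_plain(const struct toy_format *format,
			   const uint8_t *tuple, size_t tuple_size,
			   uint32_t *slots)
{
	if (tuple_size == 0)
		return -1;
	uint32_t defined = tuple[0];
	/* Fields beyond the format are not described, so skip them. */
	if (defined > format->field_count)
		defined = format->field_count;
	size_t pos = 1;
	for (uint32_t i = 0; i < defined; i++) {
		if (pos >= tuple_size)
			return -1;
		size_t len = tuple[pos];
		if (len > tuple_size - pos - 1)
			return -1;
		int slot = format->fields[i].offset_slot;
		if (slot != TOY_SLOT_NIL)
			slots[slot] = (uint32_t)pos;
		/* Step over the length byte and the field data. */
		pos += 1 + len;
	}
	return 0;
}
```

A caller would take this path only for flat formats, in the same spirit as the `format->fields_depth == 1` check in the patch, and fall back to the general tree-based traversal otherwise.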