Tarantool development patches archive
 help / color / mirror / Atom feed
From: Serge Petrenko <sergepetrenko@tarantool.org>
To: v.shpilevoy@tarantool.org, korablev@tarantool.org
Cc: tarantool-patches@dev.tarantool.org
Subject: [Tarantool-patches] [PATCH 1/2] box: speed up tuple_field_map_create
Date: Sat, 14 Nov 2020 20:28:22 +0300	[thread overview]
Message-ID: <20201114172823.8217-2-sergepetrenko@tarantool.org> (raw)
In-Reply-To: <20201114172823.8217-1-sergepetrenko@tarantool.org>

Since the introduction of JSON path indices tuple_init_field_map, which
was quite a simple routine traversing a tuple and storing its field data
offsets in the field map, was renamed to tuple_field_map_create and
optimised for working with JSON path indices.

The main difference is that tuple format fields are now organised in a
tree rather than an array, and the tuple itself may have indexed fields,
which are not plain array members, but rather members of some sub-array
or map. This requires more complex iteration over tuple format fields
and additional tuple parsing.

All the changes were, however, unneeded for tuple formats not supporting
fields indexed by JSON paths.

Rework tuple_field_map_create so that it doesn't go through all the
unnecessary JSON path-related checks for simple cases and restore most
of the lost performance.

Below are some benchmark results for the same workload that pointed to
the degradation initially.
Snapshot recovery time on RelWithDebInfo build for a 1.5G snapshot
containing a single memtx space with one secondary index over 4 integer
and 1 string field:

        Version            | Time (s) | Difference relative to 1.10
---------------------------|----------|----------------------------
1.10 (the golden standard) |    28    |             -/-
2.x (degradation)          |    39    |            + 39%
2.x (patched)              |    31    |            + 11%

Profile shows that the main difference is in memtx_tuple_new due to
tuple_init_field_map/tuple_field_map_create performance difference.

Numbers below show cumulative time spent in tuple_init_field_map (1.10) /
tuple_field_map_create (unpatched) / tuple_field_map_create (patched).
2.44 s / 8.61 s / 3.98 s

More benchmark results can be seen at #4774

Part of #4774
---
 src/box/tuple_format.c | 91 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)

diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 9b817d3cf..f2db521ea 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -852,6 +852,88 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
 	return true;
 }
 
+static int
+tuple_format_required_fields_validate(struct tuple_format *format,
+				      void *required_fields,
+				      uint32_t required_fields_sz);
+
+static int
+tuple_field_map_create_plain(struct tuple_format *format, const char *tuple,
+			     bool validate, struct field_map_builder *builder)
+{
+#define check_field_type(field, pos) {					       \
+	if (validate &&							       \
+	    !field_mp_type_is_compatible(field->type, pos,		       \
+					 tuple_field_is_nullable(field))) {    \
+		diag_set(ClientError, ER_FIELD_TYPE, tuple_field_path(field),  \
+			 field_type_strs[field->type]);			       \
+		return -1;						       \
+	}								       \
+}
+
+	struct region *region = &fiber()->gc;
+	const char *pos = tuple;
+	uint32_t defined_field_count = mp_decode_array(&pos);
+	if (validate && format->exact_field_count > 0  &&
+	    format->exact_field_count != defined_field_count) {
+		diag_set(ClientError, ER_EXACT_FIELD_COUNT,
+			 (unsigned) defined_field_count,
+			 (unsigned) format->exact_field_count);
+		return -1;
+	}
+	defined_field_count = MIN(defined_field_count,
+				  tuple_format_field_count(format));
+
+	void *required_fields = NULL;
+	uint32_t required_fields_sz = bitmap_size(format->total_field_count);
+	if (validate) {
+		required_fields = region_alloc(region, required_fields_sz);
+		memcpy(required_fields, format->required_fields,
+		       required_fields_sz);
+	}
+
+	if (unlikely(defined_field_count == 0))
+		goto end;
+
+	struct json_token token = {
+		.type = JSON_TOKEN_NUM,
+		.num = 0,
+	};
+	struct tuple_field *field;
+
+	field = json_tree_lookup_entry(&format->fields, &format->fields.root,
+				       &token, struct tuple_field, token);
+	/* Check 1st field's type, but don't store offset to it. */
+	check_field_type(field, pos);
+	if (validate)
+		bit_clear(required_fields, field->id);
+	mp_next(&pos);
+
+	for (uint32_t i = 1; i < defined_field_count; i++, mp_next(&pos)) {
+		token.num = i;
+		field = json_tree_lookup_entry(&format->fields,
+					       &format->fields.root, &token,
+					       struct  tuple_field, token);
+		check_field_type(field, pos);
+		if (validate)
+			bit_clear(required_fields, field->id);
+		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL &&
+		    field_map_builder_set_slot(builder, field->offset_slot,
+					       pos - tuple, MULTIKEY_NONE,
+					       0, NULL) != 0) {
+			return -1;
+		}
+	}
+
+#undef check_field_type
+
+end:
+	return validate ?
+	       tuple_format_required_fields_validate(format, required_fields,
+						     required_fields_sz) :
+	       0;
+}
+
 /** @sa declaration for details. */
 int
 tuple_field_map_create(struct tuple_format *format, const char *tuple,
@@ -864,6 +946,15 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple,
 	if (tuple_format_field_count(format) == 0)
 		return 0; /* Nothing to initialize */
 
+	/*
+	 * In case tuple format doesn't contain fields accessed by JSON paths,
+	 * tuple field traversal may be simplified.
+	 */
+	if (format->fields_depth == 1) {
+		return tuple_field_map_create_plain(format, tuple, validate,
+						    builder);
+	}
+
 	uint32_t field_count;
 	struct tuple_format_iterator it;
 	uint8_t flags = validate ? TUPLE_FORMAT_ITERATOR_VALIDATE : 0;
-- 
2.24.3 (Apple Git-128)

  reply	other threads:[~2020-11-14 17:28 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-14 17:28 [Tarantool-patches] [PATCH 0/2] reduce performance degradation introduced by JSON path indices Serge Petrenko
2020-11-14 17:28 ` Serge Petrenko [this message]
2020-11-14 18:00   ` [Tarantool-patches] [PATCH 1/2] box: speed up tuple_field_map_create Cyrill Gorcunov
2020-11-16  7:34     ` Serge Petrenko
2020-11-20 23:26   ` Vladislav Shpilevoy
2020-11-23 13:50     ` Serge Petrenko
2020-11-24 22:33       ` Vladislav Shpilevoy
2020-12-01  8:48         ` Serge Petrenko
2020-12-01 22:04           ` Vladislav Shpilevoy
2020-12-02  9:53             ` Serge Petrenko
2020-11-14 17:28 ` [Tarantool-patches] [PATCH 2/2] box: use tuple_field_raw_by_part where possible Serge Petrenko
2020-11-20 23:26   ` Vladislav Shpilevoy
2020-11-23 13:51     ` Serge Petrenko
2020-11-24 22:33       ` Vladislav Shpilevoy
2020-12-01  8:48         ` Serge Petrenko
2020-12-01 22:04           ` Vladislav Shpilevoy
2020-11-16  7:50 ` [Tarantool-patches] [PATCH 0/2] reduce performance degradation introduced by JSON path indices Serge Petrenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201114172823.8217-2-sergepetrenko@tarantool.org \
    --to=sergepetrenko@tarantool.org \
    --cc=korablev@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 1/2] box: speed up tuple_field_map_create' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox