Tarantool development patches archive
* [tarantool-patches] [PATCH v2 0/5] box: indexes by JSON path
@ 2018-08-15 12:14 Kirill Shcherbatov
  2018-08-15 12:14 ` [tarantool-patches] [PATCH v2 1/5] rfc: describe a Tarantool JSON indexes Kirill Shcherbatov
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Kirill Shcherbatov @ 2018-08-15 12:14 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, Kirill Shcherbatov

Branch: http://github.com/tarantool/tarantool/tree/kshch/gh-1012-json-indexes
Issue: https://github.com/tarantool/tarantool/issues/1012

Sometimes field data has a complex document structure.
When this structure is consistent across the whole space,
an index can be created by JSON path.
This becomes possible with auxiliary structures kept per
tuple_format: a tree of intermediate path fields and a hashtable
that maps a path to the leaf field it refers to.
To speed up data access by a JSON index, the key_part structure
is extended with an offset_slot cache that points to the
field_map item containing the data offset for the current tuple.
The RFC contains a detailed description of these concepts.
Finally, JSON paths may be defined in a user-friendly form that
starts with a format field name (which may be changed later).
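
For illustration, the intended lookup flow could be sketched as
follows (pseudo-C; format_path_hash_find is a hypothetical name
for the hashtable lookup, the real API is defined by the patches
below):

    /* Fast path: the cached slot is still valid for the format
     * of this tuple. */
    if (part->format_epoch == format->epoch &&
        part->slot_cache != TUPLE_OFFSET_SLOT_NIL)
            return data + field_map[part->slot_cache];
    /* Slow path: resolve the JSON path via the format's path
     * hashtable (hypothetical helper) or by walking the tree of
     * intermediate fields, then refresh the cache. */
    int32_t slot = format_path_hash_find(format, part->path);
    part->format_epoch = format->epoch;
    part->slot_cache = slot;
    return data + field_map[slot];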

Changes in v2:
  - new single-malloc memory management for paths in key_parts
    and format.
  - fixed multiple bugs and made the vinyl engine work:
    1) tuples with holes need to be created via the routine
       vy_stmt_new_surrogate_from_key.
    2) as the vinyl engine has two formats - a memory format and
       a disk format - the offset_slot mapping is enforced to be
       consistent in vy_lsm_new so that the key_part slot caches
       work.
  - optimized tuple_init_field_map and other tree-walker routines.
  - started using a format epoch per format.
  - slightly more efficient memory management in the routine
    key_def_normalize_json_path - the biggest free chunk is
    reused if any.
  - multiple checks on attempts to create a JSON index on a space
    with incompatible data or duplicates.
  - new tests for JSON paths.

Kirill Shcherbatov (5):
  rfc: describe a Tarantool JSON indexes
  box: introduce slot_cache in key_part
  box: introduce path field in key_part
  box: introduce path_hash and tuple_field tree
  box: specify indexes in user-friendly form

 doc/rfc/1012-json-indexes.md | 188 ++++++++++++++
 src/box/alter.cc             |   4 +
 src/box/errcode.h            |   2 +-
 src/box/index_def.c          |  10 +-
 src/box/key_def.c            | 281 ++++++++++++++++++---
 src/box/key_def.h            |  27 +-
 src/box/lua/schema.lua       |  58 ++++-
 src/box/lua/space.cc         |   5 +
 src/box/memtx_bitset.c       |   8 +-
 src/box/memtx_engine.c       |   5 +
 src/box/memtx_rtree.c        |   6 +-
 src/box/schema.cc            |  12 +-
 src/box/space.c              |  25 ++
 src/box/space.h              |  10 +
 src/box/tuple.c              |  25 +-
 src/box/tuple_compare.cc     | 123 ++++++++--
 src/box/tuple_extract_key.cc | 161 +++++++++---
 src/box/tuple_format.c       | 573 +++++++++++++++++++++++++++++++++++++++----
 src/box/tuple_format.h       | 103 +++++++-
 src/box/tuple_hash.cc        |  63 ++++-
 src/box/vinyl.c              |   5 +
 src/box/vy_log.c             |   3 +-
 src/box/vy_lsm.c             |  43 ++++
 src/box/vy_point_lookup.c    |   2 -
 src/box/vy_stmt.c            | 124 ++++++++--
 src/box/vy_stmt.h            |   6 +-
 test/box/misc.result         |  51 ++--
 test/engine/iterator.result  |   2 +-
 test/engine/tuple.result     | 307 +++++++++++++++++++++++
 test/engine/tuple.test.lua   |  90 +++++++
 30 files changed, 2103 insertions(+), 219 deletions(-)
 create mode 100644 doc/rfc/1012-json-indexes.md

-- 
2.7.4


* [tarantool-patches] [PATCH v2 1/5] rfc: describe a Tarantool JSON indexes
  2018-08-15 12:14 [tarantool-patches] [PATCH v2 0/5] box: indexes by JSON path Kirill Shcherbatov
@ 2018-08-15 12:14 ` Kirill Shcherbatov
  2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 2/5] box: introduce slot_cache in key_part Kirill Shcherbatov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Kirill Shcherbatov @ 2018-08-15 12:14 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, Kirill Shcherbatov

Part of #1012.
---
 doc/rfc/1012-json-indexes.md | 188 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 188 insertions(+)
 create mode 100644 doc/rfc/1012-json-indexes.md

diff --git a/doc/rfc/1012-json-indexes.md b/doc/rfc/1012-json-indexes.md
new file mode 100644
index 0000000..a603f9a
--- /dev/null
+++ b/doc/rfc/1012-json-indexes.md
@@ -0,0 +1,188 @@
+# Tarantool JSON Indexes
+
+* **Status**: In progress
+* **Start date**: 26-07-2018
+* **Authors**: Vladimir Davydov @locker vdavydov.dev@gmail.com, Vladislav Shpilevoy @Gerold103 v.shpilevoy@tarantool.org, Kirill Shcherbatov @kshcherbatov kshcherbatov@tarantool.org
+* **Issues**: [#1012](https://github.com/tarantool/tarantool/issues/1012)
+
+## Summary
+Tarantool *JSON Indexes* is a feature that allows indexing tuples of a space whose fields have a consistent document-like structure.
+
+
+## Background and motivation
+The box.index submodule provides read-only access to index definitions and index keys. Indexes are contained in the box.space.space-name.index array within each space object. They provide an API for ordered iteration over tuples. This API is a direct binding to the corresponding methods of index objects of type box.index in the storage engine.
+```
+s:create_index('primary', {parts = {{1, 'unsigned'}, {2, 'str'}}})
+```
+
+Sometimes space data has a complex but consistent document structure.
+```
+s:insert{1, {town = 'NY', name = {first = 'Richard', second = 'Feynman'}}}
+s:insert{2, {town = 'NY', name = {first = 'Ayn', second = 'Rand'}}}
+```
+Building an index on document fields would be an attractive feature that allows searching and iterating over such data.
+```
+s:create_index('name', { parts = {{ "name.first", "str" }, {"name.last", "str"}}})
+```
+
+Fields with a complex document structure should have the 'map' or 'array' type in the space format, if one is specified.
+
+
+## Detailed design
+All data in Tarantool is stored in atomic database objects called *tuples*. A tuple consists of a payload (user data represented as *msgpack*) and extra information that describes how the database should operate on it.
+```
+  [[ Img 0.1 - Tarantool tuple structure ]]
+
+                           +----------------------+-----------------------+-------------+
+  tuple_begin, ..., raw =  |      extra_size      | offset N ... offset 1 | MessagePack |
+  |                        +----------------------+-----------------------+-------------+
+  |                                                                       ^
+  +-----------------------------------------------------------------------data_offset
+```
+Indexed information must be accessible very fast. For regular indexes on fields this is achieved with the *field_map* - data offsets stored before the user payload in each tuple. The metadata cells used to store those offsets, *offset_slot*s, are not the same for different *formats* (the aggregated information of all indexes describing a specific space). Without loss of generality this means that tuples constructed at moment 'A', when only the PK index was registered, have a different *format* than tuples constructed at moment 'B', after the new index "name" was added.
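+
+A minimal sketch of such an access, assuming that offset_slots are negative indexes into the field_map placed before the payload (as in Img 0.1):
+```
+/* Illustrative helper, not the actual API: field_map points just
+ * past its last element, so a negative offset_slot (e.g. -3)
+ * selects the cell holding the offset of the indexed field
+ * inside the MessagePack payload. */
+static inline const char *
+field_by_slot(const char *data, const uint32_t *field_map,
+              int32_t offset_slot)
+{
+        return data + field_map[offset_slot];
+}
+```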
+
+The same *offset_slot* concept can be used for JSON indexes too.
+
+The JSON path would definitely become a part of the *key_part* structure. As we have already noted, *offset_slot*s differ between formats of different generations, so we need a universal and fast way to obtain a slot by JSON path. As the code that accesses tuple data by index is really hot, performance is the priority here.
+We already have *tuple_field_raw_by_path*, which follows a JSON path in a tuple and returns a pointer to the requested field. Fields nested in a JSON-structured document may have a coinciding path part:
+e.g.:
+```
+s:create_index('name', { parts = {{ "name.first", "str" }, {"name.last", "str"}}})
+s:create_index('town', { parts = {{ "town", "str" }, {"name.last", "str"}}})
+```
+We would not like to parse coinciding suffixes multiple times for the same tuples.
+
+### 1.1. JSON-path tree in tuple format
+Keeping all challenges in mind, we propose to introduce a tree-based structure in *tuple_format* that contains JSON-path lexemes and has allocated tuple *offset_slot*s as leaf nodes.
+```
+   [[ Img 1.1 - JSON-path tree in tuple format ]]
+
+   ----\
+       |-----> [name] \
+       |              |----> [first]   <slot1 = -3>
+       |              |
+       |              |----> [last]    <slot2 = -2>
+       |
+       \------> [town]    <slot3 = -1>
+```
+
+For example, once the first index part "name.first" has been built, we would already have the "name" offset and would be able to start parsing there with only the "last" lexeme.
+```
+s:create_index('name', { parts = {{ "name.first", "str" },
+                                   { "name.last",  "str" }}})
+```
+The resulting field offset would be stored in the slot obtained from the tree leaf node.
+
+```
+    [[ Img 1.2 - Tuple data stored in database of same format ]]
+
+                                                              |  Description  |
+
+     {-3}    {-2}    {-1}                                     |   slot index  |
+   +=======+=======+=======+======+===================+
+   | slot1 | slot2 | slot3 | town |       name        |       |  tuple schema |
+   +=======+=======+=======+======+=====+======+======+
+   |   2   |   4   |   0   | NY   | Ayn | Rand |              |  tuple1 data  |
+   +-------+-------+-------+------+-----+---+--+------+
+   |   2   |   5   |   0   | NY   | Richard | Feynman |       |  tuple2 data  |
+   +-------+-------+-------+------+---------+---------+
+                           ^      ^     ^
+                           0      2     4                     | tuple1 offset |
+
+                           ^      ^         ^
+                           0      2         5                 | tuple2 offset |
+```
+
+### 1.2. Access data by JSON-path
+The central API to access this information is the routine
+```
+static inline const char *
+tuple_field_raw(const struct tuple_format *format, const char *tuple,
+                const uint32_t *field_map, uint32_t field_no)
+```
+
+In most index-related scenarios it is called with a *field_no* extracted from a *key_part* - the descriptor of one part of an index.
+```
+field_b = tuple_field_raw(format_b, tuple_b_raw, field_map_b, part->fieldno);
+```
+With the introduction of parts containing a JSON path, we need to rework such calls.
+
+To avoid parsing the JSON path each time, we propose to introduce a cache for the *offset_slot* in the *key_part* structure:
+```
+struct key_part {
+  /** Tuple field index for this part */
+  uint32_t fieldno;
+  ...
+
+  /** Data JSON path. */
+  const char *data_path;
+  /** Epoch of tuple_format cached in offset_slot. */
+  uint32_t slot_epoch;
+  /** Cache of corresponding tuple_format offset_slot. */
+  int32_t slot_cache;
+};
+```
+And extend *tuple_format* with an epoch that would be initialized on creation with a new, bigger value when the source key_parts have changed.
+```
+struct tuple_format {
+  ...
+
+  /** Epoch of tuple format. */
+  uint32_t epoch;
+  /** Formats of the fields. */
+  struct tuple_field fields[0];
+};
+```
+We would modify tuple_field to have a tree structure if necessary and represent all intermediate records.
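+
+For instance (an illustrative sketch, not the final layout), tuple_field could be extended along these lines:
+```
+struct tuple_field {
+  ...
+
+  /** Hypothetical: leaf offset_slot, if this node ends a path. */
+  int32_t offset_slot;
+  /** Hypothetical: children in the JSON-path tree, one per
+   * next lexeme. */
+  struct tuple_field *children;
+  uint32_t child_count;
+};
+```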
+
+In all index-related scenarios the *tuple_field_raw* calls would be reworked to use a new function:
+```
+const char *
+tuple_field_by_part(const struct tuple *tuple, const struct key_def *def,
+                    uint32_t idx);
+
+```
+
+From the first argument *tuple* we can obtain the tuple_format that contains the *epoch* field.
+```
+  [[ Img 2.1 - Block diagram for tuple_field_by_part routine ]]
+
+            [[def->epoch == tuple_format(tuple)->epoch]]
+                                 ||
+                  YES            ||             NO
+                 ________________/\________________
+                |                                  |
+
+     use offset_slot from             PARSE def->parts[idx]->data_path and
+  def->parts[idx].cached_slot     observe tuple_format(tuple)->fields structure,
+                                     UPDATE def->parts[idx].slot_epoch and
+                               def->parts[idx].slot_cache IF format epoch is bigger
+```
+
+PARSE means parsing the JSON path lexeme by lexeme and stepping down the fields tree until the leaf containing the slot offset is reached.
+Then we need to UPDATE the key_part cache IF the epoch has increased. This *may* make epoch oscillations less significant.
+
+To speed up data access we propose to use a hashtable with all paths in the tuple_format:
+```
+   HASHTABLE<const char *data_path, int32_t offset_slot>
+
+                            lookup
+   "[2]name.first" ----> hash1 ------> [hash1, "name.first", offset_slot1]
+   "[2]name.last"  ----> hash2 ------> [hash2, "name.last", offset_slot2]
+```
+
+## Rationale and alternatives
+### 2.1 Reallocatable slot_cache
+Epoch-based invalidation looks complicated enough.
+It is possible to allocate a slot_cache[] array in each key_part and store slots for all epochs (where the epoch would be an index into that array).
+```
+                         format->epoch
+key_part {                     ||
+   slot_epoch[]:               \/
+        [1: s1, 2: s2, ... epoch: s3, ... max_epoch: sN]
+}
+
+I.e.:
+    tuple_map[key_part->slot_epoch[format->epoch]]
+```
+But as formats are created really often, this approach would consume too many resources.
-- 
2.7.4


* [tarantool-patches] [PATCH v2 2/5] box: introduce slot_cache in key_part
  2018-08-15 12:14 [tarantool-patches] [PATCH v2 0/5] box: indexes by JSON path Kirill Shcherbatov
  2018-08-15 12:14 ` [tarantool-patches] [PATCH v2 1/5] rfc: describe a Tarantool JSON indexes Kirill Shcherbatov
@ 2018-08-15 12:15 ` Kirill Shcherbatov
  2018-08-22  0:27   ` [tarantool-patches] " Vladislav Shpilevoy
  2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 3/5] box: introduce path field " Kirill Shcherbatov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Kirill Shcherbatov @ 2018-08-15 12:15 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, Kirill Shcherbatov

The same key_part could be used in multiple different formats,
so different field->offset_slot values would be allocated for it.
In most scenarios we work with a series of tuples of the same
format, and (in general) a format lookup for a field would be an
expensive operation for JSON paths defined in a key_part.
The new slot_cache field in the key_part structure and the
epoch-based mechanism to validate its actuality should be an
effective approach to improving performance.
The new routine tuple_field_by_part uses a tuple and a key_part
to access a field, which allows reworking and speeding up all
scenarios of accessing tuple data by index.
This also allows JSON-path key_parts to be supported later.
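
A representative call-site change, slightly simplified (without
the const cast) from the memtx_bitset hunk below:

    /* Before: access by field number only. */
    field = tuple_field(new_tuple,
                        base->def->key_def->parts[0].fieldno);
    /* After: access through the key_part, so that JSON-path
     * parts can later be served from the cached offset_slot. */
    field = tuple_field_by_part(tuple_format(new_tuple),
                                tuple_data(new_tuple),
                                tuple_field_map(new_tuple),
                                base->def->key_def->parts);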

Part of #1012.
---
 src/box/alter.cc             |  4 ++
 src/box/key_def.c            |  2 +
 src/box/key_def.h            |  4 ++
 src/box/memtx_bitset.c       |  8 +++-
 src/box/memtx_rtree.c        |  6 ++-
 src/box/space.c              | 25 +++++++++++++
 src/box/space.h              | 10 +++++
 src/box/tuple_compare.cc     | 88 ++++++++++++++++++++++++++++++++------------
 src/box/tuple_extract_key.cc | 39 ++++++++++++++------
 src/box/tuple_format.c       | 12 ++++++
 src/box/tuple_format.h       | 19 ++++++++++
 src/box/tuple_hash.cc        | 49 +++++++++++++++++++-----
 src/box/vy_stmt.h            |  6 ++-
 13 files changed, 224 insertions(+), 48 deletions(-)

diff --git a/src/box/alter.cc b/src/box/alter.cc
index 3007a13..e1a0d9c 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -887,6 +887,10 @@ alter_space_do(struct txn *txn, struct alter_space *alter)
 	alter->new_space->sequence = alter->old_space->sequence;
 	memcpy(alter->new_space->access, alter->old_space->access,
 	       sizeof(alter->old_space->access));
+	space_format_update_epoch(alter->new_space,
+				  alter->old_space->format != NULL ?
+				  alter->old_space->format->epoch : 0,
+				  &alter->key_list);
 
 	/*
 	 * Build new indexes, check if tuples conform to
diff --git a/src/box/key_def.c b/src/box/key_def.c
index ee09dc9..8a4262b 100644
--- a/src/box/key_def.c
+++ b/src/box/key_def.c
@@ -258,6 +258,8 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
 	def->parts[part_no].type = type;
 	def->parts[part_no].coll = coll;
 	def->parts[part_no].coll_id = coll_id;
+	def->parts[part_no].slot_cache = TUPLE_OFFSET_SLOT_NIL;
+	def->parts[part_no].format_epoch = 0;
 	column_mask_set_fieldno(&def->column_mask, fieldno);
 	/**
 	 * When all parts are set, initialize the tuple
diff --git a/src/box/key_def.h b/src/box/key_def.h
index aecbe03..42c054c 100644
--- a/src/box/key_def.h
+++ b/src/box/key_def.h
@@ -74,6 +74,10 @@ struct key_part {
 	struct coll *coll;
 	/** True if a part can store NULLs. */
 	bool is_nullable;
+	/** Format epoch for slot_cache. */
+	uint64_t format_epoch;
+	/** Cache for corresponding tuple_format slot_offset. */
+	int32_t slot_cache;
 };
 
 struct key_def;
diff --git a/src/box/memtx_bitset.c b/src/box/memtx_bitset.c
index a665f1a..cb41be1 100644
--- a/src/box/memtx_bitset.c
+++ b/src/box/memtx_bitset.c
@@ -283,8 +283,12 @@ memtx_bitset_index_replace(struct index *base, struct tuple *old_tuple,
 	}
 
 	if (new_tuple != NULL) {
-		const char *field;
-		field = tuple_field(new_tuple, base->def->key_def->parts[0].fieldno);
+		const char *field =
+			tuple_field_by_part(tuple_format(new_tuple),
+					    tuple_data(new_tuple),
+					    tuple_field_map(new_tuple),
+					    (struct key_part *)
+						base->def->key_def->parts);
 		uint32_t key_len;
 		const void *key = make_key(field, &key_len);
 #ifndef OLD_GOOD_BITSET
diff --git a/src/box/memtx_rtree.c b/src/box/memtx_rtree.c
index 0b12cda..9a49acd 100644
--- a/src/box/memtx_rtree.c
+++ b/src/box/memtx_rtree.c
@@ -112,7 +112,11 @@ extract_rectangle(struct rtree_rect *rect, const struct tuple *tuple,
 		  struct index_def *index_def)
 {
 	assert(index_def->key_def->part_count == 1);
-	const char *elems = tuple_field(tuple, index_def->key_def->parts[0].fieldno);
+	const char *elems =
+		tuple_field_by_part(tuple_format(tuple), tuple_data(tuple),
+				    tuple_field_map(tuple),
+				    (struct key_part *)
+					index_def->key_def->parts);
 	unsigned dimension = index_def->opts.dimension;
 	uint32_t count = mp_decode_array(&elems);
 	return mp_decode_rect(rect, dimension, elems, count, "Field");
diff --git a/src/box/space.c b/src/box/space.c
index 871cc67..c34428a 100644
--- a/src/box/space.c
+++ b/src/box/space.c
@@ -226,6 +226,31 @@ space_index_key_def(struct space *space, uint32_t id)
 }
 
 void
+space_format_update_epoch(struct space *space, uint64_t last_epoch,
+			  struct rlist *key_list)
+{
+	struct tuple_format *format = space->format;
+	if (format == NULL)
+		return;
+	bool is_format_epoch_changed = false;
+	struct index_def *index_def;
+	rlist_foreach_entry(index_def, key_list, link) {
+		struct key_part *part = index_def->key_def->parts;
+		struct key_part *parts_end =
+			part + index_def->key_def->part_count;
+		for (; part < parts_end; part++) {
+			struct tuple_field *field =
+				&format->fields[part->fieldno];
+			if (field->offset_slot != part->slot_cache)
+				is_format_epoch_changed = true;
+		}
+	}
+	format->epoch = last_epoch;
+	if (is_format_epoch_changed)
+		format->epoch++;
+}
+
+void
 generic_space_swap_index(struct space *old_space, struct space *new_space,
 			 uint32_t old_index_id, uint32_t new_index_id)
 {
diff --git a/src/box/space.h b/src/box/space.h
index 8888ec8..8d13bc8 100644
--- a/src/box/space.h
+++ b/src/box/space.h
@@ -228,6 +228,16 @@ struct key_def *
 space_index_key_def(struct space *space, uint32_t id);
 
 /**
+ * Set up the space format epoch value.
+ * @param space space to update.
+ * @param last_epoch the last space epoch.
+ * @param key_list list of index_defs.
+ */
+void
+space_format_update_epoch(struct space *space, uint64_t last_epoch,
+			  struct rlist *key_list);
+
+/**
  * Look up the index by id.
  */
 static inline struct index *
diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
index e53afba..f07b695 100644
--- a/src/box/tuple_compare.cc
+++ b/src/box/tuple_compare.cc
@@ -432,9 +432,17 @@ tuple_common_key_parts(const struct tuple *tuple_a,
 {
 	uint32_t i;
 	for (i = 0; i < key_def->part_count; i++) {
-		const struct key_part *part = &key_def->parts[i];
-		const char *field_a = tuple_field(tuple_a, part->fieldno);
-		const char *field_b = tuple_field(tuple_b, part->fieldno);
+		struct key_part *part = (struct key_part *)&key_def->parts[i];
+		const char *field_a =
+			tuple_field_by_part(tuple_format(tuple_a),
+					    tuple_data(tuple_a),
+					    tuple_field_map(tuple_a),
+					    part);
+		const char *field_b =
+			tuple_field_by_part(tuple_format(tuple_b),
+					    tuple_data(tuple_b),
+					    tuple_field_map(tuple_b),
+					    part);
 		enum mp_type a_type = field_a != NULL ?
 				      mp_typeof(*field_a) : MP_NIL;
 		enum mp_type b_type = field_b != NULL ?
@@ -449,7 +457,7 @@ tuple_common_key_parts(const struct tuple *tuple_a,
 	return i;
 }
 
-template<bool is_nullable, bool has_optional_parts>
+template<bool is_nullable, bool has_optional_parts, bool is_flat>
 static inline int
 tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 		       const struct key_def *key_def)
@@ -498,10 +506,21 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 		end = part + key_def->part_count;
 
 	for (; part < end; part++) {
-		field_a = tuple_field_raw(format_a, tuple_a_raw, field_map_a,
-					  part->fieldno);
-		field_b = tuple_field_raw(format_b, tuple_b_raw, field_map_b,
-					  part->fieldno);
+		if (is_flat) {
+			field_a = tuple_field_raw(format_a, tuple_a_raw,
+						  field_map_a,
+						  part->fieldno);
+			field_b = tuple_field_raw(format_b, tuple_b_raw,
+						  field_map_b,
+						  part->fieldno);
+		} else {
+			field_a = tuple_field_by_part(format_a, tuple_a_raw,
+						      field_map_a,
+						      (struct key_part *)part);
+			field_b = tuple_field_by_part(format_b, tuple_b_raw,
+						      field_map_b,
+						      (struct key_part *)part);
+		}
 		assert(has_optional_parts ||
 		       (field_a != NULL && field_b != NULL));
 		if (! is_nullable) {
@@ -548,10 +567,21 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 	 */
 	end = key_def->parts + key_def->part_count;
 	for (; part < end; ++part) {
-		field_a = tuple_field_raw(format_a, tuple_a_raw, field_map_a,
-					  part->fieldno);
-		field_b = tuple_field_raw(format_b, tuple_b_raw, field_map_b,
-					  part->fieldno);
+		if (is_flat) {
+			field_a = tuple_field_raw(format_a, tuple_a_raw,
+						  field_map_a,
+						  part->fieldno);
+			field_b = tuple_field_raw(format_b, tuple_b_raw,
+						  field_map_b,
+						  part->fieldno);
+		} else {
+			field_a = tuple_field_by_part(format_a, tuple_a_raw,
+						      field_map_a,
+						      (struct key_part *)part);
+			field_b = tuple_field_by_part(format_b, tuple_b_raw,
+						      field_map_b,
+						      (struct key_part *)part);
+		}
 		/*
 		 * Extended parts are primary, and they can not
 		 * be absent or be NULLs.
@@ -565,7 +595,7 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 	return 0;
 }
 
-template<bool is_nullable, bool has_optional_parts>
+template<bool is_nullable, bool has_optional_parts, bool is_flat>
 static inline int
 tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
 				uint32_t part_count,
@@ -583,8 +613,14 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
 	enum mp_type a_type, b_type;
 	if (likely(part_count == 1)) {
 		const char *field;
-		field = tuple_field_raw(format, tuple_raw, field_map,
-					part->fieldno);
+		if (is_flat) {
+			field = tuple_field_raw(format, tuple_raw, field_map,
+						part->fieldno);
+		} else {
+			field = tuple_field_by_part(format, tuple_raw,
+						    field_map,
+						    (struct key_part *)part);
+		}
 		if (! is_nullable) {
 			return tuple_compare_field(field, key, part->type,
 						   part->coll);
@@ -609,8 +645,14 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
 	int rc;
 	for (; part < end; ++part, mp_next(&key)) {
 		const char *field;
-		field = tuple_field_raw(format, tuple_raw, field_map,
-					part->fieldno);
+		if (is_flat) {
+			field = tuple_field_raw(format, tuple_raw, field_map,
+						part->fieldno);
+		} else {
+			field = tuple_field_by_part(format, tuple_raw,
+						    field_map,
+						    (struct key_part *)part);
+		}
 		if (! is_nullable) {
 			rc = tuple_compare_field(field, key, part->type,
 						 part->coll);
@@ -1016,9 +1058,9 @@ tuple_compare_create(const struct key_def *def)
 			else
 				return tuple_compare_sequential<true, false>;
 		} else if (def->has_optional_parts) {
-			return tuple_compare_slowpath<true, true>;
+			return tuple_compare_slowpath<true, true, true>;
 		} else {
-			return tuple_compare_slowpath<true, false>;
+			return tuple_compare_slowpath<true, false, true>;
 		}
 	}
 	assert(! def->has_optional_parts);
@@ -1041,7 +1083,7 @@ tuple_compare_create(const struct key_def *def)
 	if (key_def_is_sequential(def))
 		return tuple_compare_sequential<false, false>;
 	else
-		return tuple_compare_slowpath<false, false>;
+		return tuple_compare_slowpath<false, false, true>;
 }
 
 /* }}} tuple_compare */
@@ -1236,9 +1278,9 @@ tuple_compare_with_key_create(const struct key_def *def)
 									 false>;
 			}
 		} else if (def->has_optional_parts) {
-			return tuple_compare_with_key_slowpath<true, true>;
+			return tuple_compare_with_key_slowpath<true, true, true>;
 		} else {
-			return tuple_compare_with_key_slowpath<true, false>;
+			return tuple_compare_with_key_slowpath<true, false, true>;
 		}
 	}
 	assert(! def->has_optional_parts);
@@ -1264,7 +1306,7 @@ tuple_compare_with_key_create(const struct key_def *def)
 	if (key_def_is_sequential(def))
 		return tuple_compare_with_key_sequential<false, false>;
 	else
-		return tuple_compare_with_key_slowpath<false, false>;
+		return tuple_compare_with_key_slowpath<false, false, true>;
 }
 
 /* }}} tuple_compare_with_key */
diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
index 880abb6..d95ee8d 100644
--- a/src/box/tuple_extract_key.cc
+++ b/src/box/tuple_extract_key.cc
@@ -91,7 +91,7 @@ tuple_extract_key_sequential(const struct tuple *tuple,
  * General-purpose implementation of tuple_extract_key()
  * @copydoc tuple_extract_key()
  */
-template <bool contains_sequential_parts, bool has_optional_parts>
+template <bool contains_sequential_parts, bool has_optional_parts, bool is_flat>
 static char *
 tuple_extract_key_slowpath(const struct tuple *tuple,
 			   const struct key_def *key_def, uint32_t *key_size)
@@ -110,9 +110,15 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
 
 	/* Calculate the key size. */
 	for (uint32_t i = 0; i < part_count; ++i) {
-		const char *field =
-			tuple_field_raw(format, data, field_map,
-					key_def->parts[i].fieldno);
+		const char *field;
+		if (is_flat) {
+			field = tuple_field_raw(format, data, field_map,
+						key_def->parts[i].fieldno);
+		} else {
+			field = tuple_field_by_part(format, data, field_map,
+						    (struct key_part *)
+							&key_def->parts[i]);
+		}
 		if (has_optional_parts && field == NULL) {
 			bsize += mp_sizeof_nil();
 			continue;
@@ -152,9 +158,15 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
 	}
 	char *key_buf = mp_encode_array(key, part_count);
 	for (uint32_t i = 0; i < part_count; ++i) {
-		const char *field =
-			tuple_field_raw(format, data, field_map,
-					key_def->parts[i].fieldno);
+		const char *field;
+		if (is_flat) {
+			field = tuple_field_raw(format, data, field_map,
+						key_def->parts[i].fieldno);
+		} else {
+			field = tuple_field_by_part(format, data, field_map,
+						    (struct key_part *)
+							&key_def->parts[i]);
+		}
 		if (has_optional_parts && field == NULL) {
 			key_buf = mp_encode_nil(key_buf);
 			continue;
@@ -318,19 +330,22 @@ tuple_extract_key_set(struct key_def *key_def)
 			assert(key_def->is_nullable);
 			if (key_def_contains_sequential_parts(key_def)) {
 				key_def->tuple_extract_key =
-					tuple_extract_key_slowpath<true, true>;
+					tuple_extract_key_slowpath<true, true,
+								   true>;
 			} else {
 				key_def->tuple_extract_key =
-					tuple_extract_key_slowpath<false, true>;
+					tuple_extract_key_slowpath<false, true,
+								   true>;
 			}
 		} else {
 			if (key_def_contains_sequential_parts(key_def)) {
 				key_def->tuple_extract_key =
-					tuple_extract_key_slowpath<true, false>;
+					tuple_extract_key_slowpath<true, false,
+								   true>;
 			} else {
 				key_def->tuple_extract_key =
-					tuple_extract_key_slowpath<false,
-								   false>;
+					tuple_extract_key_slowpath<false, false,
+								   true>;
 			}
 		}
 	}
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 2e19d2e..a9fddc0 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -233,6 +233,11 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		format->dict = dict;
 		tuple_dictionary_ref(dict);
 	}
+	/*
+	 * Set invalid epoch that should be changed later on
+	 * attaching to space.
+	 */
+	format->epoch = 1;
 	format->refs = 0;
 	format->id = FORMAT_ID_NIL;
 	format->field_count = field_count;
@@ -542,6 +547,13 @@ tuple_field_go_to_key(const char **field, const char *key, int len)
 	return -1;
 }
 
+const char *
+tuple_field_by_part(const struct tuple_format *format, const char *data,
+		    const uint32_t *field_map, struct key_part *part)
+{
+	return tuple_field_raw(format, data, field_map, part->fieldno);
+}
+
 int
 tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
                         const uint32_t *field_map, const char *path,
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index c7dc48f..a989917 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -115,6 +115,11 @@ struct tuple_field {
  * Tuple format describes how tuple is stored and information about its fields
  */
 struct tuple_format {
+	/**
+	 * Tuple format epoch to validate key_part slot_cache
+	 * actuality. Changed on space rebuild if required.
+	 */
+	uint64_t epoch;
 	/** Virtual function table */
 	struct tuple_format_vtab vtab;
 	/** Pointer to engine-specific data. */
@@ -324,6 +329,20 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
 		     const char *tuple);
 
 /**
+ * Get a field referred to by a multipart index key part in
+ * a tuple.
+ *
+ * @param format tuple format
+ * @param data a pointer to the MessagePack array
+ * @param field_map a pointer to the LAST element of field map
+ * @param part multipart index part to use.
+ * @retval field data if field exists or NULL
+ */
+const char *
+tuple_field_by_part(const struct tuple_format *format, const char *data,
+		    const uint32_t *field_map, struct key_part *part);
+
+/**
  * Get a field at the specific position in this MessagePack array.
  * Returns a pointer to MessagePack data.
  * @param format tuple format
diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc
index dee9be3..272e814 100644
--- a/src/box/tuple_hash.cc
+++ b/src/box/tuple_hash.cc
@@ -157,7 +157,11 @@ struct TupleHash
 		uint32_t h = HASH_SEED;
 		uint32_t carry = 0;
 		uint32_t total_size = 0;
-		const char *field = tuple_field(tuple, key_def->parts->fieldno);
+		const char *field =
+			tuple_field_by_part(tuple_format(tuple),
+					    tuple_data(tuple),
+					    tuple_field_map(tuple),
+					    (struct key_part *)key_def->parts);
 		TupleFieldHash<TYPE, MORE_TYPES...>::
 			hash(&field, &h, &carry, &total_size);
 		return PMurHash32_Result(h, carry, total_size);
@@ -169,7 +173,11 @@ struct TupleHash<FIELD_TYPE_UNSIGNED> {
 	static uint32_t	hash(const struct tuple *tuple,
 			     const struct key_def *key_def)
 	{
-		const char *field = tuple_field(tuple, key_def->parts->fieldno);
+		const char *field =
+			tuple_field_by_part(tuple_format(tuple),
+					    tuple_data(tuple),
+					    tuple_field_map(tuple),
+					    (struct key_part *)key_def->parts);
 		uint64_t val = mp_decode_uint(&field);
 		if (likely(val <= UINT32_MAX))
 			return val;
@@ -211,7 +219,7 @@ static const hasher_signature hash_arr[] = {
 
 #undef HASHER
 
-template <bool has_optional_parts>
+template <bool has_optional_parts, bool is_flat>
 uint32_t
 tuple_hash_slowpath(const struct tuple *tuple, const struct key_def *key_def);
 
@@ -255,9 +263,9 @@ tuple_hash_func_set(struct key_def *key_def) {
 
 slowpath:
 	if (key_def->has_optional_parts)
-		key_def->tuple_hash = tuple_hash_slowpath<true>;
+		key_def->tuple_hash = tuple_hash_slowpath<true, true>;
 	else
-		key_def->tuple_hash = tuple_hash_slowpath<false>;
+		key_def->tuple_hash = tuple_hash_slowpath<false, true>;
 	key_def->key_hash = key_hash_slowpath;
 }
 
@@ -312,13 +320,16 @@ tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry,
 		    const struct tuple *tuple,
 		    const struct key_part *part)
 {
-	const char *field = tuple_field(tuple, part->fieldno);
+	const char *field =
+		tuple_field_by_part(tuple_format(tuple), tuple_data(tuple),
+				    tuple_field_map(tuple),
+				    (struct key_part *)part);
 	if (field == NULL)
 		return tuple_hash_null(ph1, pcarry);
 	return tuple_hash_field(ph1, pcarry, &field, part->coll);
 }
 
-template <bool has_optional_parts>
+template <bool has_optional_parts, bool is_flat>
 uint32_t
 tuple_hash_slowpath(const struct tuple *tuple, const struct key_def *key_def)
 {
@@ -327,7 +338,15 @@ tuple_hash_slowpath(const struct tuple *tuple, const struct key_def *key_def)
 	uint32_t carry = 0;
 	uint32_t total_size = 0;
 	uint32_t prev_fieldno = key_def->parts[0].fieldno;
-	const char *field = tuple_field(tuple, key_def->parts[0].fieldno);
+	const char *field;
+	if (is_flat) {
+		field = tuple_field(tuple, prev_fieldno);
+	} else {
+		field = tuple_field_by_part(tuple_format(tuple),
+					    tuple_data(tuple),
+					    tuple_field_map(tuple),
+					    (struct key_part *)&key_def->parts);
+	}
 	const char *end = (char *)tuple + tuple_size(tuple);
 	if (has_optional_parts && field == NULL) {
 		total_size += tuple_hash_null(&h, &carry);
@@ -341,7 +360,19 @@ tuple_hash_slowpath(const struct tuple *tuple, const struct key_def *key_def)
 		 * need of tuple_field
 		 */
 		if (prev_fieldno + 1 != key_def->parts[part_id].fieldno) {
-			field = tuple_field(tuple, key_def->parts[part_id].fieldno);
+			if (is_flat) {
+				field = tuple_field(tuple,
+						    key_def->parts[part_id].
+						    fieldno);
+			} else {
+				struct key_part *part =
+					(struct key_part *)
+					&key_def->parts[part_id];
+				field = tuple_field_by_part(tuple_format(tuple),
+							    tuple_data(tuple),
+							    tuple_field_map(tuple),
+							    part);
+			}
 		}
 		if (has_optional_parts && (field == NULL || field >= end)) {
 			total_size += tuple_hash_null(&h, &carry);
diff --git a/src/box/vy_stmt.h b/src/box/vy_stmt.h
index e53f98c..233c800 100644
--- a/src/box/vy_stmt.h
+++ b/src/box/vy_stmt.h
@@ -665,7 +665,11 @@ static inline bool
 vy_tuple_key_contains_null(const struct tuple *tuple, const struct key_def *def)
 {
 	for (uint32_t i = 0; i < def->part_count; ++i) {
-		const char *field = tuple_field(tuple, def->parts[i].fieldno);
+		const char *field =
+			tuple_field_by_part(tuple_format(tuple),
+					    tuple_data(tuple),
+					    tuple_field_map(tuple),
+					    (struct key_part *)&def->parts[i]);
 		if (field == NULL || mp_typeof(*field) == MP_NIL)
 			return true;
 	}
-- 
2.7.4


* [tarantool-patches] [PATCH v2 3/5] box: introduce path field in key_part
  2018-08-15 12:14 [tarantool-patches] [PATCH v2 0/5] box: indexes by JSON path Kirill Shcherbatov
  2018-08-15 12:14 ` [tarantool-patches] [PATCH v2 1/5] rfc: describe a Tarantool JSON indexes Kirill Shcherbatov
  2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 2/5] box: introduce slot_cache in key_part Kirill Shcherbatov
@ 2018-08-15 12:15 ` Kirill Shcherbatov
  2018-08-22  0:26   ` [tarantool-patches] " Vladislav Shpilevoy
  2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 4/5] box: introduce path_hash and tuple_field tree Kirill Shcherbatov
  2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 5/5] box: specify indexes in user-friendly form Kirill Shcherbatov
  4 siblings, 1 reply; 16+ messages in thread
From: Kirill Shcherbatov @ 2018-08-15 12:15 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, Kirill Shcherbatov

As we need to store a user-defined JSON path in key_part
and key_part_def, we have introduced the path and path_len
fields. The JSON path is verified and transformed to canonical
form when the index msgpack is unpacked.
The path string is stored as a part of the key_def allocation:

+-------+---------+-------+---------+-------+-------+-------+
|key_def|key_part1|  ...  |key_partN| path1 |  ...  | pathN |
+-------+---------+-------+---------+-------+-------+-------+
          |                         ^
          |-> path _________________|

Because field names specified in the format can be changed,
a key_part path persisted in Tarantool must always start with
a first-level field access via an array index (not by name).
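
For illustration, assuming the behavior of the new routine
key_def_normalize_json_path below (every lexeme is rewritten to
bracket notation and the leading lexeme must be the 1-based index
of the part's field):

    fieldno (0-based):  2
    input path:         [3].name.first
    canonical path:     [3]["name"]["first"]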

Part of #1012.
---
 src/box/index_def.c          |  10 +-
 src/box/key_def.c            | 276 ++++++++++++++++++++++++++++++++++++++-----
 src/box/key_def.h            |  21 +++-
 src/box/lua/space.cc         |   5 +
 src/box/memtx_engine.c       |   5 +
 src/box/schema.cc            |  12 +-
 src/box/tuple_compare.cc     |   2 +
 src/box/tuple_extract_key.cc |   1 +
 src/box/tuple_hash.cc        |   1 +
 src/box/vinyl.c              |   5 +
 src/box/vy_log.c             |   3 +-
 11 files changed, 300 insertions(+), 41 deletions(-)

diff --git a/src/box/index_def.c b/src/box/index_def.c
index 9cda63c..f67b952 100644
--- a/src/box/index_def.c
+++ b/src/box/index_def.c
@@ -209,8 +209,14 @@ index_def_is_valid(struct index_def *index_def, const char *space_name)
 			 * Courtesy to a user who could have made
 			 * a typo.
 			 */
-			if (index_def->key_def->parts[i].fieldno ==
-			    index_def->key_def->parts[j].fieldno) {
+			struct key_part *part_a = &index_def->key_def->parts[i];
+			struct key_part *part_b = &index_def->key_def->parts[j];
+			if ((part_a->fieldno == part_b->fieldno &&
+			     part_a->path == NULL && part_b->path == NULL) ||
+			    (part_a->path_len != 0 &&
+			     part_a->path_len == part_b->path_len &&
+			     memcmp(part_a->path, part_b->path,
+				    part_a->path_len) == 0)) {
 				diag_set(ClientError, ER_MODIFY_INDEX,
 					 index_def->name, space_name,
 					 "same key part is indexed twice");
diff --git a/src/box/key_def.c b/src/box/key_def.c
index 8a4262b..b00e46d 100644
--- a/src/box/key_def.c
+++ b/src/box/key_def.c
@@ -35,12 +35,15 @@
 #include "column_mask.h"
 #include "schema_def.h"
 #include "coll_id_cache.h"
+#include "fiber.h"
+#include "json/path.h"
 
 static const struct key_part_def key_part_def_default = {
 	0,
 	field_type_MAX,
 	COLL_NONE,
 	false,
+	NULL
 };
 
 static int64_t
@@ -53,6 +56,7 @@ part_type_by_name_wrapper(const char *str, uint32_t len)
 #define PART_OPT_FIELD		"field"
 #define PART_OPT_COLLATION	"collation"
 #define PART_OPT_NULLABILITY	"is_nullable"
+#define PART_OPT_PATH		"path"
 
 const struct opt_def part_def_reg[] = {
 	OPT_DEF_ENUM(PART_OPT_TYPE, field_type, struct key_part_def, type,
@@ -61,6 +65,7 @@ const struct opt_def part_def_reg[] = {
 	OPT_DEF(PART_OPT_COLLATION, OPT_UINT32, struct key_part_def, coll_id),
 	OPT_DEF(PART_OPT_NULLABILITY, OPT_BOOL, struct key_part_def,
 		is_nullable),
+	OPT_DEF(PART_OPT_PATH, OPT_STRPTR, struct key_part_def, path),
 	OPT_END,
 };
 
@@ -97,12 +102,22 @@ struct key_def *
 key_def_dup(const struct key_def *src)
 {
 	size_t sz = key_def_sizeof(src->part_count);
+	const struct key_part *parts = src->parts;
+	const struct key_part *parts_end = parts + src->part_count;
+	for (; parts < parts_end; parts++)
+		sz += parts->path != NULL ? parts->path_len + 1 : 0;
 	struct key_def *res = (struct key_def *)malloc(sz);
 	if (res == NULL) {
 		diag_set(OutOfMemory, sz, "malloc", "res");
 		return NULL;
 	}
 	memcpy(res, src, sz);
+	for (uint32_t i = 0; i < src->part_count; i++) {
+		if (src->parts[i].path == NULL)
+			continue;
+		size_t path_offset = src->parts[i].path - (char *)src;
+		res->parts[i].path = (char *)res + path_offset;
+	}
 	return res;
 }
 
@@ -110,8 +125,17 @@ void
 key_def_swap(struct key_def *old_def, struct key_def *new_def)
 {
 	assert(old_def->part_count == new_def->part_count);
-	for (uint32_t i = 0; i < new_def->part_count; i++)
-		SWAP(old_def->parts[i], new_def->parts[i]);
+	for (uint32_t i = 0; i < new_def->part_count; i++) {
+		if (old_def->parts[i].path == NULL) {
+			SWAP(old_def->parts[i], new_def->parts[i]);
+		} else {
+			size_t path_offset =
+				old_def->parts[i].path - (char *)old_def;
+			SWAP(old_def->parts[i], new_def->parts[i]);
+			old_def->parts[i].path = (char *)old_def + path_offset;
+			new_def->parts[i].path = (char *)new_def + path_offset;
+		}
+	}
 	SWAP(*old_def, *new_def);
 }
 
@@ -131,9 +155,9 @@ key_def_set_cmp(struct key_def *def)
 }
 
 struct key_def *
-key_def_new(uint32_t part_count)
+key_def_new(uint32_t part_count, size_t extra_size)
 {
-	size_t sz = key_def_sizeof(part_count);
+	size_t sz = key_def_sizeof(part_count) + extra_size;
 	/** Use calloc() to zero comparator function pointers. */
 	struct key_def *key_def = (struct key_def *) calloc(1, sz);
 	if (key_def == NULL) {
@@ -148,10 +172,13 @@ key_def_new(uint32_t part_count)
 struct key_def *
 key_def_new_with_parts(struct key_part_def *parts, uint32_t part_count)
 {
-	struct key_def *def = key_def_new(part_count);
+	size_t sz = 0;
+	for (uint32_t i = 0; i < part_count; i++)
+		sz += parts[i].path != NULL ? strlen(parts[i].path) + 1 : 0;
+	struct key_def *def = key_def_new(part_count, sz);
 	if (def == NULL)
 		return NULL;
-
+	char *data = (char *)def + key_def_sizeof(part_count);
 	for (uint32_t i = 0; i < part_count; i++) {
 		struct key_part_def *part = &parts[i];
 		struct coll *coll = NULL;
@@ -165,14 +192,22 @@ key_def_new_with_parts(struct key_part_def *parts, uint32_t part_count)
 			}
 			coll = coll_id->coll;
 		}
+		uint32_t path_len = 0;
+		if (part->path != NULL) {
+			path_len = strlen(part->path);
+			def->parts[i].path = data;
+			data += path_len + 1;
+		}
 		key_def_set_part(def, i, part->fieldno, part->type,
-				 part->is_nullable, coll, part->coll_id);
+				 part->is_nullable, coll, part->coll_id,
+				 part->path, path_len);
 	}
 	return def;
 }
 
-void
-key_def_dump_parts(const struct key_def *def, struct key_part_def *parts)
+int
+key_def_dump_parts(struct region *pool, const struct key_def *def,
+		   struct key_part_def *parts)
 {
 	for (uint32_t i = 0; i < def->part_count; i++) {
 		const struct key_part *part = &def->parts[i];
@@ -181,13 +216,26 @@ key_def_dump_parts(const struct key_def *def, struct key_part_def *parts)
 		part_def->type = part->type;
 		part_def->is_nullable = part->is_nullable;
 		part_def->coll_id = part->coll_id;
+		if (part->path != NULL) {
+			part_def->path = region_alloc(pool, part->path_len + 1);
+			if (part_def->path == NULL) {
+				diag_set(OutOfMemory, part->path_len + 1,
+					 "region_alloc", "part_def->path");
+				return -1;
+			}
+			memcpy(part_def->path, part->path, part->path_len);
+			part_def->path[part->path_len] = '\0';
+		} else {
+			part_def->path = NULL;
+		}
 	}
+	return 0;
 }
 
 box_key_def_t *
 box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count)
 {
-	struct key_def *key_def = key_def_new(part_count);
+	struct key_def *key_def = key_def_new(part_count, 0);
 	if (key_def == NULL)
 		return key_def;
 
@@ -195,7 +243,7 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count)
 		key_def_set_part(key_def, item, fields[item],
 				 (enum field_type)types[item],
 				 key_part_def_default.is_nullable, NULL,
-				 COLL_NONE);
+				 COLL_NONE, NULL, 0);
 	}
 	return key_def;
 }
@@ -241,6 +289,11 @@ key_part_cmp(const struct key_part *parts1, uint32_t part_count1,
 		if (part1->is_nullable != part2->is_nullable)
 			return part1->is_nullable <
 			       part2->is_nullable ? -1 : 1;
+		/* Lexicographic strings order. */
+		uint32_t len = MIN(part1->path_len, part2->path_len);
+		int rc = 0;
+		if ((rc = memcmp(part1->path, part2->path, len)) != 0)
+			return rc;
 	}
 	return part_count1 < part_count2 ? -1 : part_count1 > part_count2;
 }
@@ -248,11 +301,12 @@ key_part_cmp(const struct key_part *parts1, uint32_t part_count1,
 void
 key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
 		 enum field_type type, bool is_nullable, struct coll *coll,
-		 uint32_t coll_id)
+		 uint32_t coll_id, char *path, uint32_t path_len)
 {
 	assert(part_no < def->part_count);
 	assert(type < field_type_MAX);
 	def->is_nullable |= is_nullable;
+	def->has_json_paths |= (path != NULL);
 	def->parts[part_no].is_nullable = is_nullable;
 	def->parts[part_no].fieldno = fieldno;
 	def->parts[part_no].type = type;
@@ -260,6 +314,15 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
 	def->parts[part_no].coll_id = coll_id;
 	def->parts[part_no].slot_cache = TUPLE_OFFSET_SLOT_NIL;
 	def->parts[part_no].format_epoch = 0;
+	if (path != NULL) {
+		def->parts[part_no].path_len = path_len;
+		assert(def->parts[part_no].path != NULL);
+		memcpy(def->parts[part_no].path, path, path_len);
+		def->parts[part_no].path[path_len] = '\0';
+	} else {
+		def->parts[part_no].path_len = 0;
+		def->parts[part_no].path = NULL;
+	}
 	column_mask_set_fieldno(&def->column_mask, fieldno);
 	/**
 	 * When all parts are set, initialize the tuple
@@ -304,8 +367,15 @@ key_def_snprint_parts(char *buf, int size, const struct key_part_def *parts,
 	for (uint32_t i = 0; i < part_count; i++) {
 		const struct key_part_def *part = &parts[i];
 		assert(part->type < field_type_MAX);
-		SNPRINT(total, snprintf, buf, size, "%d, '%s'",
-			(int)part->fieldno, field_type_strs[part->type]);
+		if (part->path != NULL) {
+			SNPRINT(total, snprintf, buf, size, "%d, '%s', '%s'",
+				(int) part->fieldno, part->path,
+				field_type_strs[part->type]);
+		} else {
+			SNPRINT(total, snprintf, buf, size, "%d, '%s'",
+				(int) part->fieldno,
+				field_type_strs[part->type]);
+		}
 		if (i < part_count - 1)
 			SNPRINT(total, snprintf, buf, size, ", ");
 	}
@@ -324,6 +394,8 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
 			count++;
 		if (part->is_nullable)
 			count++;
+		if (part->path != NULL)
+			count++;
 		size += mp_sizeof_map(count);
 		size += mp_sizeof_str(strlen(PART_OPT_FIELD));
 		size += mp_sizeof_uint(part->fieldno);
@@ -338,6 +410,10 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
 			size += mp_sizeof_str(strlen(PART_OPT_NULLABILITY));
 			size += mp_sizeof_bool(part->is_nullable);
 		}
+		if (part->path != NULL) {
+			size += mp_sizeof_str(strlen(PART_OPT_PATH));
+			size += mp_sizeof_str(strlen(part->path));
+		}
 	}
 	return size;
 }
@@ -351,6 +427,8 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
 		int count = 2;
 		if (part->coll_id != COLL_NONE)
 			count++;
+		if (part->path != NULL)
+			count++;
 		if (part->is_nullable)
 			count++;
 		data = mp_encode_map(data, count);
@@ -372,6 +450,12 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
 					     strlen(PART_OPT_NULLABILITY));
 			data = mp_encode_bool(data, part->is_nullable);
 		}
+		if (part->path != NULL) {
+			data = mp_encode_str(data, PART_OPT_PATH,
+					     strlen(PART_OPT_PATH));
+			data = mp_encode_str(data, part->path,
+					     strlen(part->path));
+		}
 	}
 	return data;
 }
@@ -432,10 +516,111 @@ key_def_decode_parts_166(struct key_part_def *parts, uint32_t part_count,
 				     fields[part->fieldno].is_nullable :
 				     key_part_def_default.is_nullable);
 		part->coll_id = COLL_NONE;
+		part->path = NULL;
 	}
 	return 0;
 }
 
+/**
+ * Verify key_part JSON path and convert to canonical form.
+ *
+ * @param region to make allocations.
+ * @param part with path to update.
+ * @param path_extra allocated space to reuse if possible.
+ * @param path_extra_size @path_extra size.
+ *
+ * @retval -1 on error.
+ * @retval 0 on success.
+ */
+static int
+key_def_normalize_json_path(struct region *region, struct key_part_def *part,
+			    char **path_extra, uint32_t *path_extra_size)
+{
+	const char *err_msg = NULL;
+
+	uint32_t allocated_size = *path_extra_size;
+	char *path = *path_extra;
+
+	uint32_t path_len = strlen(part->path);
+	struct json_path_parser parser;
+	struct json_path_node node;
+	json_path_parser_create(&parser, part->path, path_len);
+	/*
+	 * A worst-case scenario is .a -> ["a"]
+	 * i.e. 3*path_len + 1 is enough.
+	 */
+	uint32_t new_path_size = 3 * path_len + 1;
+	if (new_path_size >= allocated_size) {
+		path = region_alloc(region, new_path_size);
+		if (path == NULL) {
+			diag_set(OutOfMemory, new_path_size,
+				 "region_alloc", "path");
+			return -1;
+		}
+		allocated_size = new_path_size;
+	}
+	assert(path != NULL);
+	part->path = path;
+	int rc = json_path_next(&parser, &node);
+	if (rc != 0)
+		goto error_invalid_json;
+	if (node.type != JSON_PATH_NUM) {
+		diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
+			 part->fieldno,
+			 "invalid JSON path: first part should "
+			 "be defined as array index");
+		return -1;
+	}
+	if (node.num - TUPLE_INDEX_BASE != part->fieldno) {
+		diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
+			 part->fieldno,
+			 "invalid JSON path: first part refers "
+			 "to invalid field");
+		return -1;
+	}
+	uint32_t lexemes = 0;
+	do {
+		if (node.type == JSON_PATH_NUM) {
+			path += sprintf(path, "[%u]", (uint32_t) node.num);
+		} else if (node.type == JSON_PATH_STR) {
+			path += sprintf(path, "[\"%.*s\"]", node.len, node.str);
+		} else {
+			unreachable();
+		}
+		lexemes++;
+	} while ((rc = json_path_next(&parser, &node)) == 0 &&
+		 node.type != JSON_PATH_END);
+	if (rc != 0 || node.type != JSON_PATH_END)
+		goto error_invalid_json;
+	if (lexemes == 1) {
+		/* JSON index is useless. */
+		path = part->path;
+		part->path = NULL;
+	} else {
+		/* Skip terminating zero. */
+		path++;
+		/* Account constructed string size. */
+		allocated_size -= path - part->path;
+	}
+	/* Going to try to reuse extra allocation next time. */
+	if ((uint32_t)parser.src_len > path_len) {
+		*path_extra = path;
+		*path_extra_size = allocated_size;
+	} else {
+		*path_extra = (char *)parser.src;
+		*path_extra_size = parser.src_len;
+	}
+	return 0;
+
+error_invalid_json:
+	err_msg = tt_sprintf("invalid JSON path '%.*s': path has invalid "
+			     "structure (error at position %d)", parser.src_len,
+			     parser.src, parser.symbol_count);
+	diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
+		 part->fieldno + TUPLE_INDEX_BASE, err_msg);
+	return -1;
+}
+
 int
 key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
 		     const char **data, const struct field_def *fields,
@@ -445,8 +630,11 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
 		return key_def_decode_parts_166(parts, part_count, data,
 						fields, field_count);
 	}
-	for (uint32_t i = 0; i < part_count; i++) {
-		struct key_part_def *part = &parts[i];
+	char *path = NULL;
+	uint32_t allocated_size = 0;
+	struct key_part_def *part = parts;
+	struct region *region = &fiber()->gc;
+	for (uint32_t i = 0; i < part_count; i++, part++) {
 		if (mp_typeof(**data) != MP_MAP) {
 			diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
 				 i + TUPLE_INDEX_BASE,
@@ -456,7 +644,7 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
 		*part = key_part_def_default;
 		if (opts_decode(part, part_def_reg, data,
 				ER_WRONG_INDEX_OPTIONS, i + TUPLE_INDEX_BASE,
-				NULL) != 0)
+				region) != 0)
 			return -1;
 		if (part->type == field_type_MAX) {
 			diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
@@ -473,6 +661,10 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
 				 "string and scalar parts");
 			return -1;
 		}
+		if (part->path != NULL &&
+		    key_def_normalize_json_path(region, part, &path,
+						&allocated_size) != 0)
+			return -1;
 	}
 	return 0;
 }
@@ -497,6 +689,7 @@ key_def_decode_parts_160(struct key_part_def *parts, uint32_t part_count,
 				     fields[part->fieldno].is_nullable :
 				     key_part_def_default.is_nullable);
 		part->coll_id = COLL_NONE;
+		part->path = NULL;
 	}
 	return 0;
 }
@@ -533,18 +726,29 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
 	 * Find and remove part duplicates, i.e. parts counted
 	 * twice since they are present in both key defs.
 	 */
-	const struct key_part *part = second->parts;
-	const struct key_part *end = part + second->part_count;
+	size_t sz = 0;
+	const struct key_part *part = first->parts;
+	const struct key_part *end = part + first->part_count;
+	for (; part != end; part++) {
+		if (part->path != NULL)
+			sz += part->path_len + 1;
+	}
+	part = second->parts;
+	end = part + second->part_count;
 	for (; part != end; part++) {
-		if (key_def_find(first, part->fieldno))
+		const struct key_part *duplicate =
+			key_def_find(first, part->fieldno);
+		if (duplicate != NULL &&
+		    part->path_len == duplicate->path_len &&
+		    memcmp(part->path, duplicate->path, part->path_len) == 0)
 			--new_part_count;
+		else if (part->path != NULL)
+			sz += part->path_len + 1;
 	}
-
-	struct key_def *new_def;
-	new_def =  (struct key_def *)calloc(1, key_def_sizeof(new_part_count));
+	sz += key_def_sizeof(new_part_count);
+	struct key_def *new_def = (struct key_def *)calloc(1, sz);
 	if (new_def == NULL) {
-		diag_set(OutOfMemory, key_def_sizeof(new_part_count), "malloc",
-			 "new_def");
+		diag_set(OutOfMemory, sz, "calloc", "new_def");
 		return NULL;
 	}
 	new_def->part_count = new_part_count;
@@ -552,24 +756,40 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
 	new_def->is_nullable = first->is_nullable || second->is_nullable;
 	new_def->has_optional_parts = first->has_optional_parts ||
 				      second->has_optional_parts;
+	/* Path data write position in the new key_def. */
+	char *data = (char *)new_def + key_def_sizeof(new_part_count);
 	/* Write position in the new key def. */
 	uint32_t pos = 0;
 	/* Append first key def's parts to the new index_def. */
 	part = first->parts;
 	end = part + first->part_count;
 	for (; part != end; part++) {
+		if (part->path != NULL) {
+			new_def->parts[pos].path = data;
+			data += part->path_len + 1;
+		}
 		key_def_set_part(new_def, pos++, part->fieldno, part->type,
-				 part->is_nullable, part->coll, part->coll_id);
+				 part->is_nullable, part->coll, part->coll_id,
+				 part->path, part->path_len);
 	}
 
 	/* Set-append second key def's part to the new key def. */
 	part = second->parts;
 	end = part + second->part_count;
 	for (; part != end; part++) {
-		if (key_def_find(first, part->fieldno))
+		const struct key_part *duplicate =
+			key_def_find(first, part->fieldno);
+		if (duplicate != NULL &&
+		    part->path_len == duplicate->path_len &&
+		    memcmp(part->path, duplicate->path, part->path_len) == 0)
 			continue;
+		if (part->path != NULL) {
+			new_def->parts[pos].path = data;
+			data += part->path_len + 1;
+		}
 		key_def_set_part(new_def, pos++, part->fieldno, part->type,
-				 part->is_nullable, part->coll, part->coll_id);
+				 part->is_nullable, part->coll, part->coll_id,
+				 part->path, part->path_len);
 	}
 	return new_def;
 }
diff --git a/src/box/key_def.h b/src/box/key_def.h
index 42c054c..b6d6259 100644
--- a/src/box/key_def.h
+++ b/src/box/key_def.h
@@ -54,6 +54,8 @@ struct key_part_def {
 	uint32_t coll_id;
 	/** True if a key part can store NULLs. */
 	bool is_nullable;
+	/** JSON path to data. */
+	char *path;
 };
 
 /**
@@ -78,6 +80,10 @@ struct key_part {
 	uint64_t format_epoch;
 	/** Cache for corresponding tuple_format slot_offset. */
 	int32_t slot_cache;
+	/** JSON path to data. */
+	char *path;
+	/** JSON path length. */
+	uint32_t path_len;
 };
 
 struct key_def;
@@ -137,6 +143,10 @@ struct key_def {
 	 * fields assumed to be MP_NIL.
 	 */
 	bool has_optional_parts;
+	/**
+	 * True, if some key part contains a JSON path.
+	 */
+	bool has_json_paths;
 	/** Key fields mask. @sa column_mask.h for details. */
 	uint64_t column_mask;
 	/** The size of the 'parts' array. */
@@ -234,7 +244,7 @@ key_def_sizeof(uint32_t part_count)
  * Allocate a new key_def with the given part count.
  */
 struct key_def *
-key_def_new(uint32_t part_count);
+key_def_new(uint32_t part_count, size_t extra_size);
 
 /**
  * Allocate a new key_def with the given part count
@@ -246,8 +256,9 @@ key_def_new_with_parts(struct key_part_def *parts, uint32_t part_count);
 /**
  * Dump part definitions of the given key def.
  */
-void
-key_def_dump_parts(const struct key_def *def, struct key_part_def *parts);
+int
+key_def_dump_parts(struct region *pool, const struct key_def *def,
+		   struct key_part_def *parts);
 
 /**
  * Set a single key part in a key def.
@@ -256,7 +267,7 @@ key_def_dump_parts(const struct key_def *def, struct key_part_def *parts);
 void
 key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
 		 enum field_type type, bool is_nullable, struct coll *coll,
-		 uint32_t coll_id);
+		 uint32_t coll_id, char *path, uint32_t path_len);
 
 /**
  * Update 'has_optional_parts' of @a key_def with correspondence
@@ -370,6 +381,8 @@ key_validate_parts(const struct key_def *key_def, const char *key,
 static inline bool
 key_def_is_sequential(const struct key_def *key_def)
 {
+	if (key_def->has_json_paths)
+		return false;
 	for (uint32_t part_id = 0; part_id < key_def->part_count; part_id++) {
 		if (key_def->parts[part_id].fieldno != part_id)
 			return false;
diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc
index 580e0ea..98bb969 100644
--- a/src/box/lua/space.cc
+++ b/src/box/lua/space.cc
@@ -295,6 +295,11 @@ lbox_fillspace(struct lua_State *L, struct space *space, int i)
 			lua_pushnumber(L, part->fieldno + TUPLE_INDEX_BASE);
 			lua_setfield(L, -2, "fieldno");
 
+			if (part->path != NULL) {
+				lua_pushstring(L, part->path);
+				lua_setfield(L, -2, "path");
+			}
+
 			lua_pushboolean(L, part->is_nullable);
 			lua_setfield(L, -2, "is_nullable");
 
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index f5ace92..827ad01 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -1320,6 +1320,11 @@ memtx_index_def_change_requires_rebuild(struct index *index,
 			return true;
 		if (old_part->coll != new_part->coll)
 			return true;
+		if (old_part->path_len != new_part->path_len)
+			return true;
+		if (memcmp(old_part->path, new_part->path,
+			   old_part->path_len) != 0)
+			return true;
 	}
 	return false;
 }
diff --git a/src/box/schema.cc b/src/box/schema.cc
index 433f52c..94c72ff 100644
--- a/src/box/schema.cc
+++ b/src/box/schema.cc
@@ -285,19 +285,19 @@ schema_init()
 	 * (and re-created) first.
 	 */
 	/* _schema - key/value space with schema description */
-	struct key_def *key_def = key_def_new(1); /* part count */
+	struct key_def *key_def = key_def_new(1, 0);
 	if (key_def == NULL)
 		diag_raise();
 	auto key_def_guard = make_scoped_guard([&] { key_def_delete(key_def); });
 
 	key_def_set_part(key_def, 0 /* part no */, 0 /* field no */,
-			 FIELD_TYPE_STRING, false, NULL, COLL_NONE);
+			 FIELD_TYPE_STRING, false, NULL, COLL_NONE, NULL, 0);
 	sc_space_new(BOX_SCHEMA_ID, "_schema", key_def, &on_replace_schema,
 		     NULL);
 
 	/* _space - home for all spaces. */
 	key_def_set_part(key_def, 0 /* part no */, 0 /* field no */,
-			 FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE);
+			 FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE, NULL, 0);
 
 	/* _collation - collation description. */
 	sc_space_new(BOX_COLLATION_ID, "_collation", key_def,
@@ -340,15 +340,15 @@ schema_init()
 		     NULL);
 
 	key_def_delete(key_def);
-	key_def = key_def_new(2); /* part count */
+	key_def = key_def_new(2, 0);
 	if (key_def == NULL)
 		diag_raise();
 	/* space no */
 	key_def_set_part(key_def, 0 /* part no */, 0 /* field no */,
-			 FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE);
+			 FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE, NULL, 0);
 	/* index no */
 	key_def_set_part(key_def, 1 /* part no */, 1 /* field no */,
-			 FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE);
+			 FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE, NULL, 0);
 	sc_space_new(BOX_INDEX_ID, "_index", key_def,
 		     &alter_space_on_replace_index, &on_stmt_begin_index);
 }
diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
index f07b695..923e71c 100644
--- a/src/box/tuple_compare.cc
+++ b/src/box/tuple_compare.cc
@@ -462,6 +462,7 @@ static inline int
 tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 		       const struct key_def *key_def)
 {
+	assert(!is_flat == key_def->has_json_paths);
 	assert(!has_optional_parts || is_nullable);
 	assert(is_nullable == key_def->is_nullable);
 	assert(has_optional_parts == key_def->has_optional_parts);
@@ -601,6 +602,7 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
 				uint32_t part_count,
 				const struct key_def *key_def)
 {
+	assert(!is_flat == key_def->has_json_paths);
 	assert(!has_optional_parts || is_nullable);
 	assert(is_nullable == key_def->is_nullable);
 	assert(has_optional_parts == key_def->has_optional_parts);
diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
index d95ee8d..0301186 100644
--- a/src/box/tuple_extract_key.cc
+++ b/src/box/tuple_extract_key.cc
@@ -96,6 +96,7 @@ static char *
 tuple_extract_key_slowpath(const struct tuple *tuple,
 			   const struct key_def *key_def, uint32_t *key_size)
 {
+	assert(!is_flat == key_def->has_json_paths);
 	assert(!has_optional_parts || key_def->is_nullable);
 	assert(has_optional_parts == key_def->has_optional_parts);
 	assert(contains_sequential_parts ==
diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc
index 272e814..ae0bb8e 100644
--- a/src/box/tuple_hash.cc
+++ b/src/box/tuple_hash.cc
@@ -333,6 +333,7 @@ template <bool has_optional_parts, bool is_flat>
 uint32_t
 tuple_hash_slowpath(const struct tuple *tuple, const struct key_def *key_def)
 {
+	assert(!is_flat == key_def->has_json_paths);
 	assert(has_optional_parts == key_def->has_optional_parts);
 	uint32_t h = HASH_SEED;
 	uint32_t carry = 0;
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 7f77963..23abc6b 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -998,6 +998,11 @@ vinyl_index_def_change_requires_rebuild(struct index *index,
 			return true;
 		if (!field_type1_contains_type2(new_part->type, old_part->type))
 			return true;
+		if (old_part->path_len != new_part->path_len)
+			return true;
+		if (memcmp(old_part->path, new_part->path,
+			   old_part->path_len) != 0)
+			return true;
 	}
 	return false;
 }
diff --git a/src/box/vy_log.c b/src/box/vy_log.c
index 3843cad..b1c6659 100644
--- a/src/box/vy_log.c
+++ b/src/box/vy_log.c
@@ -711,7 +711,8 @@ vy_log_record_dup(struct region *pool, const struct vy_log_record *src)
 				 "struct key_part_def");
 			goto err;
 		}
-		key_def_dump_parts(src->key_def, dst->key_parts);
+		if (key_def_dump_parts(pool, src->key_def, dst->key_parts) != 0)
+			goto err;
 		dst->key_part_count = src->key_def->part_count;
 		dst->key_def = NULL;
 	}
-- 
2.7.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] [PATCH v2 4/5] box: introduce path_hash and tuple_field tree
  2018-08-15 12:14 [tarantool-patches] [PATCH v2 0/5] box: indexes by JSON path Kirill Shcherbatov
                   ` (2 preceding siblings ...)
  2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 3/5] box: introduce path field " Kirill Shcherbatov
@ 2018-08-15 12:15 ` Kirill Shcherbatov
  2018-08-22  0:26   ` [tarantool-patches] " Vladislav Shpilevoy
  2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 5/5] box: specify indexes in user-friendly form Kirill Shcherbatov
  4 siblings, 1 reply; 16+ messages in thread
From: Kirill Shcherbatov @ 2018-08-15 12:15 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, Kirill Shcherbatov

To work with JSON-defined indexes we introduce a per-format
JSON path hashtable, data_path, and a tree of intermediate
path fields attached to the format's root fields.

<Hashtable>: format->data_path
[2].FIO.fname -> field "fname" {type=str, off_slot=-1}
[2].FIO.sname -> field "sname" {type=str, off_slot=-2}

<Tree>:      format->field[2] {type = map}
                           |
                   FIO {type = map}
                           |
        "fname"            |             "sname"
{type=str,off_slot=-1} ____|____ {type = str,off_slot=-2}

Leaf fields used in an index have an initialized offset_slot.
On new tuple creation we traverse the field tree and use the
leaf records to initialize the tuple field_map.
The data_path hashtable is used on tuple data access by index
(when the cached offset_slot is invalid).
All paths are stored at the end of the format allocation;
both the JSON-tree fields and format->path_hash point to them.

+------------+------------+-------+------------+-------+
|tuple_format|tuple_field1|  ...  |tuple_fieldN| pathK |
+------------+------------+-------+------------+-------+

The new routine tuple_format_add_json_path constructs all
internal structures for a JSON path on format creation and
duplication.
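
For illustration, data access through a JSON-path key part
roughly follows the scheme below. This is a hypothetical
helper simplified from tuple_field_by_part in this patch;
error handling and the fallback for legacy tuples without a
field map are omitted:

const char *
field_by_json_part(const struct tuple_format *format, const char *data,
		   const uint32_t *field_map, struct key_part *part)
{
	int32_t slot;
	if (part->format_epoch == format->epoch) {
		/* Fast path: the cached offset_slot is still valid. */
		slot = part->slot_cache;
	} else {
		/* Slow path: resolve the path via the hashtable. */
		struct mh_strnptr_node_t *node =
			json_path_hash_get(format->path_hash, part->path,
					   part->path_len, part->path_hash);
		struct tuple_field *leaf = node->val;
		slot = leaf->offset_slot;
		/* Refresh the cache for the next access. */
		part->slot_cache = slot;
		part->format_epoch = format->epoch;
	}
	return data + field_map[slot];
}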

Part of #1012.
---
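The field tree walker is parameterized by a callback; field map
initialization for a new tuple is its minimal use case. A
simplified sketch (the real routine also validates field types,
which is omitted here):

static int
init_field_map_cb(const struct tuple_field *field, uint32_t idx,
		  const char *tuple, const char *offset, void *ctx)
{
	(void)idx;
	uint32_t *field_map = ctx;
	/* Leaf offsets are stored relative to the tuple start. */
	field_map[field->offset_slot] = (uint32_t)(offset - tuple);
	return 0;
}

/* Walk all indexed leaves under root field i of a new tuple. */
json_field_tree_exec_routine(field, i, tuple, pos,
			     init_field_map_cb, field_map);
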
 src/box/errcode.h            |   2 +-
 src/box/key_def.c            |   3 +
 src/box/key_def.h            |   2 +
 src/box/tuple.c              |  25 +-
 src/box/tuple_compare.cc     |  41 ++-
 src/box/tuple_extract_key.cc | 137 +++++++++--
 src/box/tuple_format.c       | 575 +++++++++++++++++++++++++++++++++++++++----
 src/box/tuple_format.h       |  84 ++++++-
 src/box/tuple_hash.cc        |  17 +-
 src/box/vy_lsm.c             |  43 ++++
 src/box/vy_point_lookup.c    |   2 -
 src/box/vy_stmt.c            | 124 ++++++++--
 test/box/misc.result         |  51 ++--
 test/engine/tuple.result     | 271 ++++++++++++++++++++
 test/engine/tuple.test.lua   |  80 ++++++
 15 files changed, 1314 insertions(+), 143 deletions(-)

diff --git a/src/box/errcode.h b/src/box/errcode.h
index 3d5f66a..8cbd59d 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -107,7 +107,7 @@ struct errcode_record {
 	/* 52 */_(ER_FUNCTION_EXISTS,		"Function '%s' already exists") \
 	/* 53 */_(ER_BEFORE_REPLACE_RET,	"Invalid return value of space:before_replace trigger: expected tuple or nil, got %s") \
 	/* 54 */_(ER_FUNCTION_MAX,		"A limit on the total number of functions has been reached: %u") \
-	/* 55 */_(ER_UNUSED4,			"") \
+	/* 55 */_(ER_DATA_MISMATCH_INDEX_PART,	"Tuple doesn't match document structure defined as index") \
 	/* 56 */_(ER_USER_MAX,			"A limit on the total number of users has been reached: %u") \
 	/* 57 */_(ER_NO_SUCH_ENGINE,		"Space engine '%s' does not exist") \
 	/* 58 */_(ER_RELOAD_CFG,		"Can't set option '%s' dynamically") \
diff --git a/src/box/key_def.c b/src/box/key_def.c
index b00e46d..69ae6af 100644
--- a/src/box/key_def.c
+++ b/src/box/key_def.c
@@ -36,6 +36,7 @@
 #include "schema_def.h"
 #include "coll_id_cache.h"
 #include "fiber.h"
+#include "assoc.h"
 #include "json/path.h"
 
 static const struct key_part_def key_part_def_default = {
@@ -319,9 +320,11 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
 		assert(def->parts[part_no].path != NULL);
 		memcpy(def->parts[part_no].path, path, path_len);
 		def->parts[part_no].path[path_len] = '\0';
+		def->parts[part_no].path_hash = mh_strn_hash(path, path_len);
 	} else {
 		def->parts[part_no].path_len = 0;
 		def->parts[part_no].path = NULL;
+		def->parts[part_no].path_hash = 0;
 	}
 	column_mask_set_fieldno(&def->column_mask, fieldno);
 	/**
diff --git a/src/box/key_def.h b/src/box/key_def.h
index b6d6259..de732ca 100644
--- a/src/box/key_def.h
+++ b/src/box/key_def.h
@@ -84,6 +84,8 @@ struct key_part {
 	char *path;
 	/** JSON path length. */
 	uint32_t path_len;
+	/** JSON path hash. */
+	uint32_t path_hash;
 };
 
 struct key_def;
diff --git a/src/box/tuple.c b/src/box/tuple.c
index d7dbad3..130acf7 100644
--- a/src/box/tuple.c
+++ b/src/box/tuple.c
@@ -135,6 +135,18 @@ runtime_tuple_delete(struct tuple_format *format, struct tuple *tuple)
 	smfree(&runtime_alloc, tuple, total);
 }
 
+static int
+tuple_validate_json_path_data(const struct tuple_field *field, uint32_t idx,
+			      const char *tuple, const char *offset, void *ctx)
+{
+	(void)tuple;
+	(void)ctx;
+	if (key_mp_type_validate(field->type, mp_typeof(*offset),
+				 ER_KEY_PART_TYPE, idx, field->is_nullable) != 0)
+		return -1;
+	return 0;
+}
+
 int
 tuple_validate_raw(struct tuple_format *format, const char *tuple)
 {
@@ -159,14 +171,23 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple)
 
 	/* Check field types */
 	struct tuple_field *field = &format->fields[0];
+	const char *pos = tuple;
 	uint32_t i = 0;
 	uint32_t defined_field_count = MIN(field_count, format->field_count);
 	for (; i < defined_field_count; ++i, ++field) {
-		if (key_mp_type_validate(field->type, mp_typeof(*tuple),
+		if (key_mp_type_validate(field->type, mp_typeof(*pos),
 					 ER_FIELD_TYPE, i + TUPLE_INDEX_BASE,
 					 field->is_nullable))
 			return -1;
-		mp_next(&tuple);
+		/* Check all JSON paths. */
+		if (field->array != NULL) {
+			json_field_tree_routine func =
+				tuple_validate_json_path_data;
+			if (json_field_tree_exec_routine(field, i, tuple, pos,
+							 func, NULL) != 0)
+				return -1;
+		}
+		mp_next(&pos);
 	}
 	return 0;
 }
diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
index 923e71c..490b528 100644
--- a/src/box/tuple_compare.cc
+++ b/src/box/tuple_compare.cc
@@ -469,7 +469,8 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 	const struct key_part *part = key_def->parts;
 	const char *tuple_a_raw = tuple_data(tuple_a);
 	const char *tuple_b_raw = tuple_data(tuple_b);
-	if (key_def->part_count == 1 && part->fieldno == 0) {
+	if (key_def->part_count == 1 && part->fieldno == 0 &&
+	    part->path == NULL) {
 		/*
 		 * First field can not be optional - empty tuples
 		 * can not exist.
@@ -1060,13 +1061,19 @@ tuple_compare_create(const struct key_def *def)
 			else
 				return tuple_compare_sequential<true, false>;
 		} else if (def->has_optional_parts) {
-			return tuple_compare_slowpath<true, true, true>;
+			if (def->has_json_paths)
+				return tuple_compare_slowpath<true, true, false>;
+			else
+				return tuple_compare_slowpath<true, true, true>;
 		} else {
-			return tuple_compare_slowpath<true, false, true>;
+			if (def->has_json_paths)
+				return tuple_compare_slowpath<true, false, false>;
+			else
+				return tuple_compare_slowpath<true, false, true>;
 		}
 	}
 	assert(! def->has_optional_parts);
-	if (!key_def_has_collation(def)) {
+	if (!key_def_has_collation(def) && !def->has_json_paths) {
 		/* Precalculated comparators don't use collation */
 		for (uint32_t k = 0;
 		     k < sizeof(cmp_arr) / sizeof(cmp_arr[0]); k++) {
@@ -1084,6 +1091,8 @@ tuple_compare_create(const struct key_def *def)
 	}
 	if (key_def_is_sequential(def))
 		return tuple_compare_sequential<false, false>;
+	else if (def->has_json_paths)
+		return tuple_compare_slowpath<false, false, false>;
 	else
 		return tuple_compare_slowpath<false, false, true>;
 }
@@ -1280,13 +1289,29 @@ tuple_compare_with_key_create(const struct key_def *def)
 									 false>;
 			}
 		} else if (def->has_optional_parts) {
-			return tuple_compare_with_key_slowpath<true, true, true>;
+			if (def->has_json_paths) {
+				return tuple_compare_with_key_slowpath<true,
+									true,
+									false>;
+			} else {
+				return tuple_compare_with_key_slowpath<true,
+									true,
+									true>;
+			}
 		} else {
-			return tuple_compare_with_key_slowpath<true, false, true>;
+			if (def->has_json_paths) {
+				return tuple_compare_with_key_slowpath<true,
+									false,
+									false>;
+			} else {
+				return tuple_compare_with_key_slowpath<true,
+									false,
+									true>;
+			}
 		}
 	}
 	assert(! def->has_optional_parts);
-	if (!key_def_has_collation(def)) {
+	if (!key_def_has_collation(def) && !def->has_json_paths) {
 		/* Precalculated comparators don't use collation */
 		for (uint32_t k = 0;
 		     k < sizeof(cmp_wk_arr) / sizeof(cmp_wk_arr[0]);
@@ -1307,6 +1332,8 @@ tuple_compare_with_key_create(const struct key_def *def)
 	}
 	if (key_def_is_sequential(def))
 		return tuple_compare_with_key_sequential<false, false>;
+	else if (def->has_json_paths)
+		return tuple_compare_with_key_slowpath<false, false, false>;
 	else
 		return tuple_compare_with_key_slowpath<false, false, true>;
 }
diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
index 0301186..4c0e201 100644
--- a/src/box/tuple_extract_key.cc
+++ b/src/box/tuple_extract_key.cc
@@ -1,15 +1,25 @@
 #include "tuple_extract_key.h"
 #include "tuple.h"
 #include "fiber.h"
+#include "json/path.h"
 
 enum { MSGPACK_NULL = 0xc0 };
 
+static bool
+key_def_parts_are_sequential(const struct key_def *def, int i)
+{
+	uint32_t fieldno1 = def->parts[i].fieldno + 1;
+	uint32_t fieldno2 = def->parts[i + 1].fieldno;
+	return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
+		def->parts[i + 1].path == NULL;
+}
+
 /** True, if a key can contain two or more parts in sequence. */
 static bool
 key_def_contains_sequential_parts(const struct key_def *def)
 {
 	for (uint32_t i = 0; i < def->part_count - 1; ++i) {
-		if (def->parts[i].fieldno + 1 == def->parts[i + 1].fieldno)
+		if (key_def_parts_are_sequential(def, i))
 			return true;
 	}
 	return false;
@@ -132,8 +142,7 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
 			 * minimize tuple_field_raw() calls.
 			 */
 			for (; i < part_count - 1; i++) {
-				if (key_def->parts[i].fieldno + 1 !=
-				    key_def->parts[i + 1].fieldno) {
+				if (!key_def_parts_are_sequential(key_def, i)) {
 					/*
 					 * End of sequential part.
 					 */
@@ -180,8 +189,7 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
 			 * minimize tuple_field_raw() calls.
 			 */
 			for (; i < part_count - 1; i++) {
-				if (key_def->parts[i].fieldno + 1 !=
-				    key_def->parts[i + 1].fieldno) {
+				if (!key_def_parts_are_sequential(key_def, i)) {
 					/*
 					 * End of sequential part.
 					 */
@@ -214,12 +222,13 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
  * General-purpose version of tuple_extract_key_raw()
  * @copydoc tuple_extract_key_raw()
  */
-template <bool has_optional_parts>
+template <bool has_optional_parts, bool is_flat>
 static char *
 tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 			       const struct key_def *key_def,
 			       uint32_t *key_size)
 {
+	assert(!is_flat == key_def->has_json_paths);
 	assert(!has_optional_parts || key_def->is_nullable);
 	assert(has_optional_parts == key_def->has_optional_parts);
 	assert(mp_sizeof_nil() == 1);
@@ -247,11 +256,11 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 		uint32_t fieldno = key_def->parts[i].fieldno;
 		uint32_t null_count = 0;
 		for (; i < key_def->part_count - 1; i++) {
-			if (key_def->parts[i].fieldno + 1 !=
-			    key_def->parts[i + 1].fieldno)
+			if (!key_def_parts_are_sequential(key_def, i))
 				break;
 		}
-		uint32_t end_fieldno = key_def->parts[i].fieldno;
+		const struct key_part *part = &key_def->parts[i];
+		uint32_t end_fieldno = part->fieldno;
 
 		if (fieldno < current_fieldno) {
 			/* Rewind. */
@@ -293,6 +302,38 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 				current_fieldno++;
 			}
 		}
+		const char *field_last, *field_end_last;
+		if (!is_flat && part->path != NULL) {
+			field_last = field;
+			field_end_last = field_end;
+			struct json_path_parser parser;
+			struct json_path_node node;
+			json_path_parser_create(&parser, part->path,
+						part->path_len);
+			/* Skip fieldno. */
+			int rc = json_path_next(&parser, &node);
+			assert(rc == 0);
+			while ((rc = json_path_next(&parser, &node)) == 0 &&
+				node.type != JSON_PATH_END) {
+				switch(node.type) {
+				case JSON_PATH_NUM:
+					rc = tuple_field_go_to_index(&field,
+								     node.num);
+					break;
+				case JSON_PATH_STR:
+					rc = tuple_field_go_to_key(&field,
+								   node.str,
+								   node.len);
+					break;
+				default:
+					unreachable();
+				}
+				assert(rc == 0);
+			}
+			assert(rc == 0 && node.type == JSON_PATH_END);
+			field_end = field;
+			mp_next(&field_end);
+		}
 		memcpy(key_buf, field, field_end - field);
 		key_buf += field_end - field;
 		if (has_optional_parts && null_count != 0) {
@@ -301,6 +342,10 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 		} else {
 			assert(key_buf - key <= data_end - data);
 		}
+		if (!is_flat && part->path != NULL) {
+			field = field_last;
+			field_end = field_end_last;
+		}
 	}
 	if (key_size != NULL)
 		*key_size = (uint32_t)(key_buf - key);
@@ -330,32 +375,74 @@ tuple_extract_key_set(struct key_def *key_def)
 		if (key_def->has_optional_parts) {
 			assert(key_def->is_nullable);
 			if (key_def_contains_sequential_parts(key_def)) {
-				key_def->tuple_extract_key =
-					tuple_extract_key_slowpath<true, true,
-								   true>;
+				if (key_def->has_json_paths) {
+					key_def->tuple_extract_key =
+						tuple_extract_key_slowpath<true,
+									   true,
+									   false>;
+				} else {
+					key_def->tuple_extract_key =
+						tuple_extract_key_slowpath<true,
+									   true,
+									   true>;
+				}
 			} else {
-				key_def->tuple_extract_key =
-					tuple_extract_key_slowpath<false, true,
-								   true>;
+				if (key_def->has_json_paths) {
+					key_def->tuple_extract_key =
+						tuple_extract_key_slowpath<false,
+									   true,
+									   false>;
+				} else {
+					key_def->tuple_extract_key =
+						tuple_extract_key_slowpath<false,
+									   true,
+									   true>;
+				}
 			}
 		} else {
 			if (key_def_contains_sequential_parts(key_def)) {
-				key_def->tuple_extract_key =
-					tuple_extract_key_slowpath<true, false,
-								   true>;
+				if (key_def->has_json_paths) {
+					key_def->tuple_extract_key =
+						tuple_extract_key_slowpath<true,
+									   false,
+									   false>;
+				} else {
+					key_def->tuple_extract_key =
+						tuple_extract_key_slowpath<true,
+									   false,
+									   true>;
+				}
 			} else {
-				key_def->tuple_extract_key =
-					tuple_extract_key_slowpath<false, false,
-								   true>;
+				if (key_def->has_json_paths) {
+					key_def->tuple_extract_key =
+						tuple_extract_key_slowpath<false,
+									   false,
+									   false>;
+				} else {
+					key_def->tuple_extract_key =
+						tuple_extract_key_slowpath<false,
+									   false,
+									   true>;
+				}
 			}
 		}
 	}
 	if (key_def->has_optional_parts) {
 		assert(key_def->is_nullable);
-		key_def->tuple_extract_key_raw =
-			tuple_extract_key_slowpath_raw<true>;
+		if (key_def->has_json_paths) {
+			key_def->tuple_extract_key_raw =
+				tuple_extract_key_slowpath_raw<true, false>;
+		} else {
+			key_def->tuple_extract_key_raw =
+				tuple_extract_key_slowpath_raw<true, true>;
+		}
 	} else {
-		key_def->tuple_extract_key_raw =
-			tuple_extract_key_slowpath_raw<false>;
+		if (key_def->has_json_paths) {
+			key_def->tuple_extract_key_raw =
+				tuple_extract_key_slowpath_raw<false, false>;
+		} else {
+			key_def->tuple_extract_key_raw =
+				tuple_extract_key_slowpath_raw<false, true>;
+		}
 	}
 }
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index a9fddc0..1821950 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -30,6 +30,7 @@
  */
 #include "json/path.h"
 #include "tuple_format.h"
+#include "assoc.h"
 
 /** Global table of tuple formats */
 struct tuple_format **tuple_formats;
@@ -38,9 +39,336 @@ static intptr_t recycled_format_ids = FORMAT_ID_NIL;
 static uint32_t formats_size = 0, formats_capacity = 0;
 
 static const struct tuple_field tuple_field_default = {
-	FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false, false,
+	FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false, false, {{NULL, 0}}
 };
 
+struct mh_strnptr_node_t *
+json_path_hash_get(struct mh_strnptr_t *hashtable, const char *path,
+		   uint32_t path_len, uint32_t path_hash)
+{
+	assert(hashtable != NULL);
+	struct mh_strnptr_key_t key = {path, path_len, path_hash};
+	mh_int_t rc = mh_strnptr_find(hashtable, &key, NULL);
+	if (rc == mh_end(hashtable))
+		return NULL;
+	return mh_strnptr_node(hashtable, rc);
+}
+
+/**
+ * Create a new hashtable object.
+ * @param[out] hashtable pointer to object to create.
+ * @param records count of records to reserve.
+ * @retval -1 on error.
+ * @retval 0 on success.
+ */
+static int
+json_path_hash_create(struct mh_strnptr_t **hashtable, uint32_t records)
+{
+	struct mh_strnptr_t *ret = mh_strnptr_new();
+	if (ret == NULL) {
+		diag_set(OutOfMemory, sizeof(*hashtable), "mh_strnptr_new",
+			 "hashtable");
+		return -1;
+	}
+	if (mh_strnptr_reserve(ret, records, NULL) != 0) {
+		mh_strnptr_delete(ret);
+		diag_set(OutOfMemory, records, "mh_strnptr_reserve",
+			 "hashtable");
+		return -1;
+	}
+	*hashtable = ret;
+	return 0;
+}
+/**
+ * Delete @hashtable object.
+ * @param hashtable pointer to object to delete.
+ */
+static void
+json_path_hash_delete(struct mh_strnptr_t *hashtable)
+{
+	if (hashtable == NULL)
+		return;
+	while (mh_size(hashtable)) {
+		mh_int_t n = mh_first(hashtable);
+		mh_strnptr_del(hashtable, n, NULL);
+	}
+	mh_strnptr_delete(hashtable);
+}
+
+/**
+ * Insert a new record to hashtable.
+ * @param hashtable storage to insert new record.
+ * @param path string.
+ * @param path_len length of @path.
+ * @param tuple_field value to store in @hashtable.
+ * @retval -1 on error.
+ * @retval 0 on success.
+ */
+static int
+json_path_hash_insert(struct mh_strnptr_t *hashtable, const char *path,
+		      uint32_t path_len, struct tuple_field *field)
+{
+	assert(hashtable != NULL);
+	/* Test if record already present in hash. */
+	uint32_t path_hash = mh_strn_hash(path, path_len);
+	struct mh_strnptr_node_t name_node =
+		{path, path_len, path_hash, field};
+	mh_int_t rc = mh_strnptr_put(hashtable, &name_node, NULL, NULL);
+	if (rc == mh_end(hashtable)) {
+		diag_set(OutOfMemory, sizeof(*hashtable), "mh_strnptr_put",
+			"hashtable");
+		return -1;
+	}
+	return 0;
+}
+
+/**
+ * Construct field tree level for JSON path part.
+ *
+ * @param[in, out] field_subtree pointer to the record to start
+ *                 with; it is changed to the record that matches
+ *                 the @part lexeme.
+ * @param fieldno number of root space field.
+ * @param part JSON path lexeme to represent in field tree.
+ * @retval -1 on error.
+ * @retval 0 on success.
+ */
+static int
+json_field_tree_append(struct tuple_field **field_subtree, uint32_t fieldno,
+		       struct json_path_node *part)
+{
+	enum field_type type;
+	struct tuple_field *field = *field_subtree;
+	switch (part->type) {
+	case JSON_PATH_NUM: {
+		type = FIELD_TYPE_ARRAY;
+		if (field->type != FIELD_TYPE_ANY && field->type != type)
+			goto error_type_mismatch;
+		field->type = type;
+		/* Create or resize field array if required. */
+		if (field->array == NULL || part->num >= field->array_size) {
+			struct tuple_field **array =
+				realloc(field->array,
+					part->num * sizeof(struct tuple_field *));
+			if (array == NULL) {
+				diag_set(OutOfMemory,
+					sizeof(struct tuple_field *), "realloc",
+					"array");
+				return -1;
+			}
+			if (field->array == NULL) {
+				memset(array, 0, part->num *
+					sizeof(struct tuple_field *));
+			} else {
+				memset(&array[field->array_size], 0,
+				       (part->num - field->array_size) *
+				       sizeof(struct tuple_field *));
+			}
+			field->array = array;
+			field->array_size = part->num;
+		}
+		/* Record already exists. No actions required */
+		if (field->array[part->num - TUPLE_INDEX_BASE] != NULL) {
+			*field_subtree =
+				field->array[part->num - TUPLE_INDEX_BASE];
+			return 0;
+		}
+		break;
+	}
+	case JSON_PATH_STR: {
+		type = FIELD_TYPE_MAP;
+		if (field->type != FIELD_TYPE_ANY && field->type != type)
+			goto error_type_mismatch;
+		field->type = type;
+		if (field->map == NULL &&
+		    json_path_hash_create(&field->map, 1) != 0)
+			return -1;
+		struct mh_strnptr_node_t *node =
+			json_path_hash_get(field->map, part->str, part->len,
+					   mh_strn_hash(part->str, part->len));
+		if (node != NULL) {
+			*field_subtree = node->val;
+			return 0;
+		}
+		break;
+	}
+	default:
+		unreachable();
+	}
+
+	/* Construct and insert a new record. */
+	struct tuple_field *new_field = malloc(sizeof(struct tuple_field));
+	if (new_field == NULL) {
+		diag_set(OutOfMemory, sizeof(struct tuple_field), "malloc",
+			"new_field");
+		return -1;
+	}
+	*new_field = tuple_field_default;
+	if (field->type == FIELD_TYPE_MAP) {
+		if (json_path_hash_insert(field->map, part->str, part->len,
+					  new_field) != 0) {
+			free(new_field);
+			return -1;
+		}
+	} else if (field->type == FIELD_TYPE_ARRAY) {
+		field->array[part->num - TUPLE_INDEX_BASE] = new_field;
+	}
+	*field_subtree = new_field;
+
+	return 0;
+
+error_type_mismatch:
+	diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH,
+		tt_sprintf("%d", fieldno + TUPLE_INDEX_BASE),
+		field_type_strs[type], field_type_strs[field->type]);
+	return -1;
+}
+
+/**
+ * Delete @field_subtree object.
+ * @param field_subtree to delete.
+ */
+static void
+json_field_tree_delete(struct tuple_field *field_subtree)
+{
+	if (field_subtree->type == FIELD_TYPE_MAP &&
+	    field_subtree->map != NULL) {
+		mh_int_t i;
+		mh_foreach(field_subtree->map, i) {
+			struct tuple_field *field =
+				mh_strnptr_node(field_subtree->map, i)->val;
+			if (field == NULL)
+				continue;
+			json_field_tree_delete(field);
+			free(field);
+		}
+		json_path_hash_delete(field_subtree->map);
+	} else if (field_subtree->type == FIELD_TYPE_ARRAY &&
+		   field_subtree->array != NULL) {
+		for (uint32_t i = 0; i < field_subtree->array_size; i++) {
+			struct tuple_field *field = field_subtree->array[i];
+			if (field == NULL)
+				continue;
+			json_field_tree_delete(field_subtree->array[i]);
+			free(field_subtree->array[i]);
+		}
+		free(field_subtree->array);
+	}
+}
+
+int
+json_field_tree_exec_routine(const struct tuple_field *field, uint32_t idx,
+			     const char *tuple, const char *offset,
+			     json_field_tree_routine routine, void *routine_ctx)
+{
+	int rc = 0;
+	if (field->type == FIELD_TYPE_MAP) {
+		mh_int_t i;
+		mh_foreach(field->map, i) {
+			struct mh_strnptr_node_t *node =
+				mh_strnptr_node(field->map, i);
+			const char *raw = offset;
+			if (tuple_field_go_to_key(&raw, node->str,
+						  node->len) != 0) {
+				diag_set(ClientError,
+					 ER_DATA_MISMATCH_INDEX_PART);
+				return -1;
+			}
+			if (json_field_tree_exec_routine(node->val, idx,
+							 tuple, raw, routine,
+							 routine_ctx) != 0)
+				return -1;
+		}
+	} else if (field->type == FIELD_TYPE_ARRAY) {
+		assert(mp_typeof(*offset) == MP_ARRAY);
+		uint32_t count = mp_decode_array(&offset);
+		if (count < field->array_size) {
+			diag_set(ClientError, ER_DATA_MISMATCH_INDEX_PART);
+			return -1;
+		}
+		for (uint32_t i = 0; i < field->array_size;
+		     i++, mp_next(&offset)) {
+			if (field->array[i] == NULL)
+				continue;
+			if (json_field_tree_exec_routine(field->array[i], idx,
+							 tuple, offset, routine,
+							 routine_ctx) != 0)
+				return -1;
+		}
+	} else {
+		rc = routine(field, idx, tuple, offset, routine_ctx);
+	}
+	return rc;
+}
+
+/**
+ * Add new JSON @path to @format.
+ * @param format to modify.
+ * @param path string to add.
+ * @param path_len length of @path.
+ * @param field_type type of field by @path.
+ * @param[out] leaf_field pointer to leaf field.
+ * @retval -1 on error.
+ * @retval 0 on success.
+ */
+static int
+tuple_format_add_json_path(struct tuple_format *format, const char *path,
+			   uint32_t path_len, enum field_type type,
+			   struct tuple_field **leaf_field)
+{
+	assert(format->path_hash != NULL);
+	/*
+	 * Get root field by index.
+	 * Path is specified in canonical form: [i]...
+	 */
+	int rc = 0;
+	struct json_path_parser parser;
+	struct json_path_node node;
+	json_path_parser_create(&parser, path, path_len);
+	rc = json_path_next(&parser, &node);
+	assert(rc == 0 && node.type == JSON_PATH_NUM);
+	assert(node.num < format->field_count + 1);
+
+	/* Test if path is already registered. */
+	struct mh_strnptr_node_t *leaf_node = NULL;
+	uint32_t hash = mh_strn_hash(path, path_len);
+	if ((leaf_node = json_path_hash_get(format->path_hash, path,
+					    path_len, hash)) != NULL) {
+		struct tuple_field *field = leaf_node->val;
+		if (field->type != type) {
+			const char *err =
+				tt_sprintf("JSON path '%.*s' has been already "
+					   "constructed for '%s' leaf record",
+					   path_len, path,
+					   field_type_strs[field->type]);
+			diag_set(ClientError,  ER_WRONG_INDEX_OPTIONS,
+				node.num, err);
+			return -1;
+		}
+		*leaf_field = field;
+		return 0;
+	}
+
+	/* Build data path tree. */
+	uint32_t root_fieldno = node.num - TUPLE_INDEX_BASE;
+	struct tuple_field *field = &format->fields[root_fieldno];
+	while ((rc = json_path_next(&parser, &node)) == 0 &&
+		node.type != JSON_PATH_END) {
+		if (json_field_tree_append(&field, root_fieldno, &node) != 0)
+			return -1;
+	}
+	assert(rc == 0 && node.type == JSON_PATH_END);
+
+	/* The leaf record is a new object, as the JSON path is unique. */
+	field->type = type;
+	if (json_path_hash_insert(format->path_hash, path, path_len,
+				  field) != 0)
+		return -1;
+
+	*leaf_field = field;
+	return 0;
+}
+
 /**
  * Extract all available type info from keys and field
  * definitions.
@@ -63,12 +391,17 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 		format->fields[i].type = fields[i].type;
 		format->fields[i].offset_slot = TUPLE_OFFSET_SLOT_NIL;
 		format->fields[i].is_nullable = fields[i].is_nullable;
+		/* Don't need to init format->fields[i].map. */
+		format->fields[i].array = NULL;
+		format->fields[i].array_size = 0;
 	}
 	/* Initialize remaining fields */
 	for (uint32_t i = field_count; i < format->field_count; i++)
 		format->fields[i] = tuple_field_default;
 
 	int current_slot = 0;
+	char *data = (char *)format + sizeof(struct tuple_format) +
+		     format->field_count * sizeof(struct tuple_field);
 
 	/* extract field type info */
 	for (uint16_t key_no = 0; key_no < key_count; ++key_no) {
@@ -101,10 +434,12 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 			 * used in tuple_format.
 			 */
 			if (field_type1_contains_type2(field->type,
-						       part->type)) {
+						       part->type) &&
+			    part->path == NULL) {
 				field->type = part->type;
 			} else if (! field_type1_contains_type2(part->type,
-								field->type)) {
+								field->type) &&
+				   part->path == NULL) {
 				const char *name;
 				int fieldno = part->fieldno + TUPLE_INDEX_BASE;
 				if (part->fieldno >= field_count) {
@@ -131,9 +466,22 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 			 * First field is always simply accessible,
 			 * so we don't store an offset for it.
 			 */
-			if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL &&
+			if (part->path != NULL) {
+				assert(is_sequential == false);
+				memcpy(data, part->path, part->path_len);
+				data[part->path_len] = '\0';
+				struct tuple_field *leaf = NULL;
+				if (tuple_format_add_json_path(format, data,
+							       part->path_len,
+							       part->type,
+							       &leaf) != 0)
+					return -1;
+				assert(leaf != NULL);
+				if (leaf->offset_slot == TUPLE_OFFSET_SLOT_NIL)
+					leaf->offset_slot = --current_slot;
+				data += part->path_len + 1;
+			} else if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL &&
 			    is_sequential == false && part->fieldno > 0) {
-
 				field->offset_slot = --current_slot;
 			}
 		}
@@ -201,20 +549,26 @@ static struct tuple_format *
 tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		   uint32_t space_field_count, struct tuple_dictionary *dict)
 {
+	size_t extra_size = 0;
 	uint32_t index_field_count = 0;
+	uint32_t json_path_count = 0;
 	/* find max max field no */
 	for (uint16_t key_no = 0; key_no < key_count; ++key_no) {
 		const struct key_def *key_def = keys[key_no];
 		const struct key_part *part = key_def->parts;
 		const struct key_part *pend = part + key_def->part_count;
 		for (; part < pend; part++) {
+			if (part->path != NULL) {
+				json_path_count++;
+				extra_size += part->path_len + 1;
+			}
 			index_field_count = MAX(index_field_count,
 						part->fieldno + 1);
 		}
 	}
 	uint32_t field_count = MAX(space_field_count, index_field_count);
 	uint32_t total = sizeof(struct tuple_format) +
-			 field_count * sizeof(struct tuple_field);
+			 field_count * sizeof(struct tuple_field) + extra_size;
 
 	struct tuple_format *format = (struct tuple_format *) malloc(total);
 	if (format == NULL) {
@@ -244,6 +598,11 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 	format->index_field_count = index_field_count;
 	format->exact_field_count = 0;
 	format->min_field_count = 0;
+	if (json_path_hash_create(&format->path_hash, json_path_count) != 0) {
+		tuple_dictionary_unref(format->dict);
+		free(format);
+		return NULL;
+	}
 	return format;
 }
 
@@ -251,6 +610,9 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 static inline void
 tuple_format_destroy(struct tuple_format *format)
 {
+	for (uint32_t i = 0; i < format->field_count; i++)
+		json_field_tree_delete(&format->fields[i]);
+	json_path_hash_delete(format->path_hash);
 	tuple_dictionary_unref(format->dict);
 }
 
@@ -335,21 +697,75 @@ tuple_format_dup(struct tuple_format *src)
 {
 	uint32_t total = sizeof(struct tuple_format) +
 			 src->field_count * sizeof(struct tuple_field);
+	if (src->path_hash != NULL) {
+		mh_int_t i;
+		mh_foreach(src->path_hash, i)
+			total += mh_strnptr_node(src->path_hash, i)->len + 1;
+	}
 	struct tuple_format *format = (struct tuple_format *) malloc(total);
 	if (format == NULL) {
 		diag_set(OutOfMemory, total, "malloc", "tuple format");
 		return NULL;
 	}
 	memcpy(format, src, total);
+
+	/* Fill with NULLs for normal destruction on error. */
+	format->path_hash = NULL;
+	for (uint32_t i = 0; i < format->field_count; i++) {
+		format->fields[i].array = NULL;
+		format->fields[i].array_size = 0;
+	}
+	if (src->path_hash != NULL) {
+		mh_int_t i;
+		if (json_path_hash_create(&format->path_hash,
+					  mh_size(src->path_hash)) != 0)
+			goto error;
+		mh_foreach(src->path_hash, i) {
+			struct mh_strnptr_node_t *node =
+				mh_strnptr_node(src->path_hash, i);
+			/* Path data has already been copied. */
+			char *path = (char *)format + (node->str - (char *)src);
+			/* Store the source leaf field's offset_slot. */
+			struct tuple_field *leaf_field = node->val;
+			int32_t offset_slot = leaf_field->offset_slot;
+			if (tuple_format_add_json_path(format, path, node->len,
+						       leaf_field->type,
+						       &leaf_field) != 0)
+				goto error;
+			/* Store offset_slot in a new leaf record. */
+			assert(leaf_field != NULL);
+			leaf_field->offset_slot = offset_slot;
+		}
+	}
 	tuple_dictionary_ref(format->dict);
 	format->id = FORMAT_ID_NIL;
 	format->refs = 0;
-	if (tuple_format_register(format) != 0) {
-		tuple_format_destroy(format);
-		free(format);
-		return NULL;
-	}
+	if (tuple_format_register(format) != 0)
+		goto error;
 	return format;
+error:
+	tuple_format_destroy(format);
+	free(format);
+	return NULL;
+}
+
+/**
+ * See the json_field_tree_routine description.
+ * @param ctx is the field_map.
+ */
+static int
+tuple_init_json_field_map_routine(const struct tuple_field *field, uint32_t idx,
+				  const char *tuple, const char *offset,
+				  void *ctx)
+{
+	uint32_t *field_map = ctx;
+	assert(field->offset_slot != TUPLE_OFFSET_SLOT_NIL);
+	if (key_mp_type_validate(field->type, mp_typeof(*offset),
+				 ER_KEY_PART_TYPE, idx,
+				 field->is_nullable) != 0)
+		return -1;
+	field_map[field->offset_slot] = (uint32_t)(offset - tuple);
+	return 0;
 }
 
 /** @sa declaration for details. */
@@ -378,18 +794,14 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
 		return -1;
 	}
 
-	/* first field is simply accessible, so we do not store offset to it */
-	enum mp_type mp_type = mp_typeof(*pos);
+	/*
+	 * The first field is simply accessible, so we store an
+	 * offset to it only for a JSON path.
+	 */
+	uint32_t i = 0;
+	enum mp_type mp_type;
 	const struct tuple_field *field = &format->fields[0];
-	if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
-				 TUPLE_INDEX_BASE, field->is_nullable))
-		return -1;
-	mp_next(&pos);
-	/* other fields...*/
-	++field;
-	uint32_t i = 1;
-	uint32_t defined_field_count = MIN(field_count, format->field_count);
-	if (field_count < format->index_field_count) {
+	if (field_count < format->index_field_count || field->map != NULL) {
 		/*
 		 * Nullify field map to be able to detect by 0,
 		 * which key fields are absent in tuple_field().
@@ -397,6 +809,16 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
 		memset((char *)field_map - format->field_map_size, 0,
 		       format->field_map_size);
 	}
+	if (field->map == NULL) {
+		mp_type = mp_typeof(*pos);
+		if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
+					 TUPLE_INDEX_BASE, field->is_nullable))
+			return -1;
+		mp_next(&pos);
+		++field;
+		++i;
+	}
+	uint32_t defined_field_count = MIN(field_count, format->field_count);
 	for (; i < defined_field_count; ++i, ++field) {
 		mp_type = mp_typeof(*pos);
 		if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
@@ -407,6 +829,14 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
 			field_map[field->offset_slot] =
 				(uint32_t) (pos - tuple);
 		}
+		if (field->map != NULL) {
+			assert(field->array != NULL);
+			json_field_tree_routine func =
+				tuple_init_json_field_map_routine;
+			if (json_field_tree_exec_routine(field, i, tuple, pos,
+							 func, field_map) != 0)
+				return -1;
+		}
 		mp_next(&pos);
 	}
 	return 0;
@@ -467,15 +897,60 @@ box_tuple_format_unref(box_tuple_format_t *format)
 	tuple_format_unref(format);
 }
 
-/**
- * Propagate @a field to MessagePack(field)[index].
- * @param[in][out] field Field to propagate.
- * @param index 1-based index to propagate to.
- *
- * @retval  0 Success, the index was found.
- * @retval -1 Not found.
- */
-static inline int
+const char *
+tuple_field_by_part(const struct tuple_format *format, const char *data,
+		    const uint32_t *field_map, struct key_part *part)
+{
+	const char *raw = NULL;
+	uint32_t field_no = part->fieldno;
+	struct mh_strnptr_node_t *node;
+	if (unlikely(part->path == NULL)) {
+		raw = tuple_field_raw(format, data, field_map, field_no);
+	} else {
+		int32_t offset_slot = TUPLE_OFFSET_SLOT_NIL;
+		if (part->format_epoch == format->epoch &&
+		    -part->slot_cache * sizeof(uint32_t) <=
+		    format->field_map_size) {
+			offset_slot = part->slot_cache;
+		} else if (format->path_hash != NULL &&
+			   (node = json_path_hash_get(format->path_hash,
+						      part->path,
+						      part->path_len,
+						      part->path_hash)) !=
+						      NULL) {
+			assert(node != NULL);
+			struct tuple_field *field = node->val;
+			assert(field != NULL);
+			offset_slot = field->offset_slot;
+		}
+		if (unlikely(offset_slot == TUPLE_OFFSET_SLOT_NIL ||
+			     field_map[offset_slot] == 0)) {
+			/*
+			 * Legacy tuple having no field map
+			 * for JSON index.
+			 */
+			uint32_t path_hash =
+				field_name_hash(part->path, part->path_len);
+			if (tuple_field_raw_by_path(format, data, field_map,
+						    part->path, part->path_len,
+						    path_hash, &raw) != 0)
+				raw = NULL;
+		} else {
+			assert(offset_slot < 0);
+			assert(-offset_slot * sizeof(uint32_t) <=
+			       format->field_map_size);
+			/* Cache offset_slot if required. */
+			if (part->format_epoch < format->epoch) {
+				part->slot_cache = offset_slot;
+				part->format_epoch = format->epoch;
+			}
+			raw = data + field_map[offset_slot];
+		}
+	}
+	return raw;
+}
+
+int
 tuple_field_go_to_index(const char **field, uint64_t index)
 {
 	enum mp_type type = mp_typeof(**field);
@@ -513,16 +988,7 @@ tuple_field_go_to_index(const char **field, uint64_t index)
 	return -1;
 }
 
-/**
- * Propagate @a field to MessagePack(field)[key].
- * @param[in][out] field Field to propagate.
- * @param key Key to propagate to.
- * @param len Length of @a key.
- *
- * @retval  0 Success, the index was found.
- * @retval -1 Not found.
- */
-static inline int
+int
 tuple_field_go_to_key(const char **field, const char *key, int len)
 {
 	enum mp_type type = mp_typeof(**field);
@@ -547,21 +1013,32 @@ tuple_field_go_to_key(const char **field, const char *key, int len)
 	return -1;
 }
 
-const char *
-tuple_field_by_part(const struct tuple_format *format, const char *data,
-		    const uint32_t *field_map, struct key_part *part)
-{
-	return tuple_field_raw(format, data, field_map, part->fieldno);
-}
-
 int
-tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
+tuple_field_raw_by_path(const struct tuple_format *format, const char *tuple,
                         const uint32_t *field_map, const char *path,
                         uint32_t path_len, uint32_t path_hash,
                         const char **field)
 {
 	assert(path_len > 0);
 	uint32_t fieldno;
+	if (format->path_hash != NULL) {
+		/*
+		 * The hash used for the format->path_hash hashtable
+		 * may be different from the path_hash specified as
+		 * the function argument.
+		 */
+		struct mh_strnptr_node_t *ht_record =
+			json_path_hash_get(format->path_hash, path, path_len,
+					   mh_strn_hash(path, path_len));
+		if (ht_record != NULL) {
+			struct tuple_field *tuple_field = ht_record->val;
+			int32_t offset_slot = tuple_field->offset_slot;
+			assert(offset_slot != TUPLE_OFFSET_SLOT_NIL);
+			assert(field_map[offset_slot] != 0);
+			*field = tuple + field_map[offset_slot];
+			return 0;
+		}
+	}
 	/*
 	 * It is possible, that a field has a name as
 	 * well-formatted JSON. For example 'a.b.c.d' or '[1]' can
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index a989917..1348f0d 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -108,6 +108,18 @@ struct tuple_field {
 	bool is_key_part;
 	/** True, if a field can store NULL. */
 	bool is_nullable;
+	/** Tree child records. */
+	union {
+		/** Array of fields. */
+		struct {
+			struct tuple_field **array;
+			uint32_t array_size;
+		};
+		/** Hashtable: path -> tuple_field. */
+		struct mh_strnptr_t *map;
+		/** Leaf argument for tree-walker routine. */
+		void *arg;
+	};
 };
 
 /**
@@ -166,6 +178,8 @@ struct tuple_format {
 	 * Shared names storage used by all formats of a space.
 	 */
 	struct tuple_dictionary *dict;
+	/** JSON path hash table. */
+	struct mh_strnptr_t *path_hash;
 	/* Formats of the fields */
 	struct tuple_field fields[0];
 };
@@ -395,7 +409,7 @@ tuple_field_raw(const struct tuple_format *format, const char *tuple,
  * @retval     NULL No field with @a name.
  */
 static inline const char *
-tuple_field_raw_by_name(struct tuple_format *format, const char *tuple,
+tuple_field_raw_by_name(const struct tuple_format *format, const char *tuple,
 			const uint32_t *field_map, const char *name,
 			uint32_t name_len, uint32_t name_hash)
 {
@@ -420,11 +434,77 @@ tuple_field_raw_by_name(struct tuple_format *format, const char *tuple,
  * @retval -1 Error in JSON path.
  */
 int
-tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
+tuple_field_raw_by_path(const struct tuple_format *format, const char *tuple,
                         const uint32_t *field_map, const char *path,
                         uint32_t path_len, uint32_t path_hash,
                         const char **field);
 
+/**
+ * Get @hashtable record by key @path, @path_len.
+ * @param hashtable to look up.
+ * @param path string.
+ * @param path_len length of @path.
+ * @retval NULL on nothing found.
+ * @retval hashtable record pointer.
+ */
+struct mh_strnptr_node_t *
+json_path_hash_get(struct mh_strnptr_t *hashtable, const char *path,
+		   uint32_t path_len, uint32_t path_hash);
+
+/**
+ * Routine to execute with json_field_tree_exec_routine on JSON
+ * path field tree records.
+ * @param field to use on initialization.
+ * @param idx root field index to emit a correct error.
+ * @param tuple source raw data.
+ * @param offset calculated offset of field that path refers to.
+ * @param ctx callback argument
+ * @retval 0 on success.
+ * @retval -1 on error.
+ */
+typedef int (*json_field_tree_routine)(const struct tuple_field *field,
+					uint32_t idx, const char *tuple,
+					const char *offset, void *ctx);
+
+/**
+ * Execute a @routine on @field leaf records of JSON path
+ * represented as a tree with specified @tuple.
+ * @param field to use on initialization.
+ * @param idx root field index to emit a correct error.
+ * @param tuple source raw data.
+ * @param offset calculated offset of field that path refers to.
+ * @param ctx callback argument
+ * @retval 0 on success.
+ * @retval -1 on error.
+ */
+int
+json_field_tree_exec_routine(const struct tuple_field *field, uint32_t idx,
+			     const char *tuple, const char *offset,
+			     json_field_tree_routine routine, void *routine_ctx);
+
+/**
+ * Propagate @a field to MessagePack(field)[index].
+ * @param[in][out] field Field to propagate.
+ * @param index 1-based index to propagate to.
+ *
+ * @retval  0 Success, the index was found.
+ * @retval -1 Not found.
+ */
+int
+tuple_field_go_to_index(const char **field, uint64_t index);
+
+/**
+ * Propagate @a field to MessagePack(field)[key].
+ * @param[in][out] field Field to propagate.
+ * @param key Key to propagate to.
+ * @param len Length of @a key.
+ *
+ * @retval  0 Success, the index was found.
+ * @retval -1 Not found.
+ */
+int
+tuple_field_go_to_key(const char **field, const char *key, int len);
+
 #if defined(__cplusplus)
 } /* extern "C" */
 #endif /* defined(__cplusplus) */
diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc
index ae0bb8e..5f3357a 100644
--- a/src/box/tuple_hash.cc
+++ b/src/box/tuple_hash.cc
@@ -228,7 +228,7 @@ key_hash_slowpath(const char *key, const struct key_def *key_def);
 
 void
 tuple_hash_func_set(struct key_def *key_def) {
-	if (key_def->is_nullable)
+	if (key_def->is_nullable || key_def->has_json_paths)
 		goto slowpath;
 	/*
 	 * Check that key_def defines sequential a key without holes
@@ -262,10 +262,17 @@ tuple_hash_func_set(struct key_def *key_def) {
 	}
 
 slowpath:
-	if (key_def->has_optional_parts)
-		key_def->tuple_hash = tuple_hash_slowpath<true, true>;
-	else
-		key_def->tuple_hash = tuple_hash_slowpath<false, true>;
+	if (key_def->has_optional_parts) {
+		if (key_def->has_json_paths)
+			key_def->tuple_hash = tuple_hash_slowpath<true, false>;
+		else
+			key_def->tuple_hash = tuple_hash_slowpath<true, true>;
+	} else {
+		if (key_def->has_json_paths)
+			key_def->tuple_hash = tuple_hash_slowpath<false, false>;
+		else
+			key_def->tuple_hash = tuple_hash_slowpath<false, true>;
+	}
 	key_def->key_hash = key_hash_slowpath;
 }
 
diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
index cb3c436..1bb5e22 100644
--- a/src/box/vy_lsm.c
+++ b/src/box/vy_lsm.c
@@ -36,6 +36,7 @@
 #include <sys/stat.h>
 #include <sys/types.h>
 #include <small/mempool.h>
+#include <assoc.h>
 
 #include "diag.h"
 #include "errcode.h"
@@ -158,6 +159,48 @@ vy_lsm_new(struct vy_lsm_env *lsm_env, struct vy_cache_env *cache_env,
 						    NULL);
 		if (lsm->disk_format == NULL)
 			goto fail_format;
+		/*
+		 * Tuple formats should be compatible to make
+		 * epoch-based caching work.
+		 */
+		int32_t min_offset_slot = 0;
+		struct tuple_field *dst_fields = lsm->disk_format->fields;
+		struct mh_strnptr_t *dst_ht = lsm->disk_format->path_hash;
+		struct mh_strnptr_t *src_ht = format->path_hash;
+		struct key_part *part = cmp_def->parts;
+		struct key_part *part_end = part + cmp_def->part_count;
+		for (; part < part_end; part++) {
+			struct tuple_field *dst_field =
+				&dst_fields[part->fieldno];
+			struct tuple_field *src_field;
+			if (dst_field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
+				src_field = &format->fields[part->fieldno];
+			} else if (dst_fields[part->fieldno].map != NULL) {
+				struct mh_strnptr_node_t *node;
+				node = json_path_hash_get(dst_ht, part->path,
+							  part->path_len,
+							  part->path_hash);
+				assert(node != NULL);
+				dst_field = node->val;
+				assert(dst_field != NULL);
+
+				node = json_path_hash_get(src_ht, part->path,
+							  part->path_len,
+							  part->path_hash);
+				assert(node != NULL);
+				src_field = node->val;
+				assert(src_field != NULL);
+			} else {
+				continue;
+			}
+			if (src_field->offset_slot == TUPLE_OFFSET_SLOT_NIL)
+				continue;
+			dst_field->offset_slot = src_field->offset_slot;
+			min_offset_slot =
+				MIN(src_field->offset_slot, min_offset_slot);
+		}
+		lsm->disk_format->field_map_size =
+			-min_offset_slot * sizeof(uint32_t);
 	}
 	tuple_format_ref(lsm->disk_format);
 
diff --git a/src/box/vy_point_lookup.c b/src/box/vy_point_lookup.c
index 5e43340..5e5ed16 100644
--- a/src/box/vy_point_lookup.c
+++ b/src/box/vy_point_lookup.c
@@ -196,8 +196,6 @@ vy_point_lookup(struct vy_lsm *lsm, struct vy_tx *tx,
 		const struct vy_read_view **rv,
 		struct tuple *key, struct tuple **ret)
 {
-	assert(tuple_field_count(key) >= lsm->cmp_def->part_count);
-
 	*ret = NULL;
 	double start_time = ev_monotonic_now(loop());
 	int rc = 0;
diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c
index a4b7975..80f08bb 100644
--- a/src/box/vy_stmt.c
+++ b/src/box/vy_stmt.c
@@ -44,6 +44,7 @@
 #include "tuple_format.h"
 #include "xrow.h"
 #include "fiber.h"
+#include "assoc.h"
 
 static struct tuple *
 vy_tuple_new(struct tuple_format *format, const char *data, const char *end)
@@ -321,6 +322,67 @@ vy_stmt_replace_from_upsert(const struct tuple *upsert)
 	return replace;
 }
 
+static void
+vy_stmt_msgpack_build(struct tuple_field *field, char *tuple,
+		      uint32_t *field_map, char **offset, bool write_data)
+{
+	if (field->type == FIELD_TYPE_ARRAY) {
+		if (write_data)
+			*offset = mp_encode_array(*offset, field->array_size);
+		else
+			*offset += mp_sizeof_array(field->array_size);
+		for (uint32_t i = 0; i < field->array_size; i++) {
+			if (field->array[i] == NULL) {
+				if (write_data)
+					*offset = mp_encode_nil(*offset);
+				else
+					*offset += mp_sizeof_nil();
+				continue;
+			}
+			vy_stmt_msgpack_build(field->array[i], tuple, field_map,
+					      offset, write_data);
+		}
+		return;
+	} else if (field->type == FIELD_TYPE_MAP) {
+		if (write_data)
+			*offset = mp_encode_map(*offset, mh_size(field->map));
+		else
+			*offset += mp_sizeof_map(mh_size(field->map));
+		mh_int_t i;
+		mh_foreach(field->map, i) {
+			struct mh_strnptr_node_t *node =
+				mh_strnptr_node(field->map, i);
+			assert(node);
+			if (write_data) {
+				*offset = mp_encode_str(*offset, node->str,
+							node->len);
+			} else {
+				*offset += mp_sizeof_str(node->len);
+			}
+			vy_stmt_msgpack_build(node->val, tuple, field_map,
+					      offset, write_data);
+		}
+		return;
+	}
+
+	struct iovec *iov = field->arg;
+	if (iov == NULL) {
+		if (write_data)
+			*offset = mp_encode_nil(*offset);
+		else
+			*offset += mp_sizeof_nil();
+	} else {
+		if (write_data) {
+			uint32_t data_offset = *offset - tuple;
+			memcpy(*offset, iov->iov_base, iov->iov_len);
+			field->arg = NULL;
+			if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL)
+				field_map[field->offset_slot] = data_offset;
+		}
+		*offset += iov->iov_len;
+	}
+}
+
 static struct tuple *
 vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
 			       const struct key_def *cmp_def,
@@ -331,27 +393,45 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
 	struct region *region = &fiber()->gc;
 
 	uint32_t field_count = format->index_field_count;
-	struct iovec *iov = region_alloc(region, sizeof(*iov) * field_count);
+	uint32_t part_count = mp_decode_array(&key);
+	assert(part_count == cmp_def->part_count);
+	struct iovec *iov = region_alloc(region, sizeof(*iov) * part_count);
 	if (iov == NULL) {
-		diag_set(OutOfMemory, sizeof(*iov) * field_count,
-			 "region", "iov for surrogate key");
+		diag_set(OutOfMemory, sizeof(*iov) * part_count, "region",
+			"iov for surrogate key");
 		return NULL;
 	}
-	memset(iov, 0, sizeof(*iov) * field_count);
-	uint32_t part_count = mp_decode_array(&key);
-	assert(part_count == cmp_def->part_count);
-	assert(part_count <= field_count);
-	uint32_t nulls_count = field_count - cmp_def->part_count;
-	uint32_t bsize = mp_sizeof_array(field_count) +
-			 mp_sizeof_nil() * nulls_count;
-	for (uint32_t i = 0; i < part_count; ++i) {
-		const struct key_part *part = &cmp_def->parts[i];
+	uint32_t bsize = mp_sizeof_array(field_count);
+	uint32_t nulls_count = field_count;
+	memset(iov, 0, sizeof(*iov) * part_count);
+	const struct key_part *part = cmp_def->parts;
+	for (uint32_t i = 0; i < part_count; ++i, ++part) {
 		assert(part->fieldno < field_count);
 		const char *svp = key;
-		iov[part->fieldno].iov_base = (char *) key;
+		iov[i].iov_base = (char *) key;
 		mp_next(&key);
-		iov[part->fieldno].iov_len = key - svp;
-		bsize += key - svp;
+		iov[i].iov_len = key - svp;
+		struct tuple_field *field;
+		if (part->path == NULL) {
+			field = &format->fields[part->fieldno];
+			--nulls_count;
+		} else {
+			struct mh_strnptr_node_t *node =
+				json_path_hash_get(format->path_hash,
+						   part->path, part->path_len,
+						   part->path_hash);
+			assert(node != NULL);
+			field = node->val;
+			assert(field != NULL);
+		}
+		field->arg = &iov[i];
+	}
+	bsize += nulls_count * mp_sizeof_nil();
+	for (uint32_t i = 0; i < field_count; ++i) {
+		char *data = NULL;
+		vy_stmt_msgpack_build(&format->fields[i], NULL, NULL, &data,
+				      false);
+		bsize += data - (char *)NULL;
 	}
 
 	struct tuple *stmt = vy_stmt_alloc(format, bsize);
@@ -362,17 +442,11 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
 	uint32_t *field_map = (uint32_t *) raw;
 	char *wpos = mp_encode_array(raw, field_count);
 	for (uint32_t i = 0; i < field_count; ++i) {
-		const struct tuple_field *field = &format->fields[i];
-		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL)
-			field_map[field->offset_slot] = wpos - raw;
-		if (iov[i].iov_base == NULL) {
-			wpos = mp_encode_nil(wpos);
-		} else {
-			memcpy(wpos, iov[i].iov_base, iov[i].iov_len);
-			wpos += iov[i].iov_len;
-		}
+		vy_stmt_msgpack_build(&format->fields[i], raw, field_map, &wpos,
+				      true);
 	}
-	assert(wpos == raw + bsize);
+
+	assert(wpos <= raw + bsize);
 	vy_stmt_set_type(stmt, type);
 	return stmt;
 }
diff --git a/test/box/misc.result b/test/box/misc.result
index 4895a78..556f004 100644
--- a/test/box/misc.result
+++ b/test/box/misc.result
@@ -348,7 +348,7 @@ t;
   - 'box.error.CANT_CREATE_COLLATION : 150'
   - 'box.error.USER_EXISTS : 46'
   - 'box.error.WAL_IO : 40'
-  - 'box.error.PROC_RET : 21'
+  - 'box.error.RTREE_RECT : 101'
   - 'box.error.PRIV_GRANTED : 89'
   - 'box.error.CREATE_SPACE : 9'
   - 'box.error.GRANT : 88'
@@ -359,7 +359,7 @@ t;
   - 'box.error.VINYL_MAX_TUPLE_SIZE : 139'
   - 'box.error.LOAD_FUNCTION : 99'
   - 'box.error.INVALID_XLOG : 74'
-  - 'box.error.READ_VIEW_ABORTED : 130'
+  - 'box.error.PRIV_NOT_GRANTED : 91'
   - 'box.error.TRANSACTION_CONFLICT : 97'
   - 'box.error.GUEST_USER_PASSWORD : 96'
   - 'box.error.PROC_C : 102'
@@ -370,7 +370,7 @@ t;
   - 'box.error.CFG : 59'
   - 'box.error.NO_SUCH_FIELD : 37'
   - 'box.error.CONNECTION_TO_SELF : 117'
-  - 'box.error.FUNCTION_MAX : 54'
+  - 'box.error.PROC_LUA : 32'
   - 'box.error.ILLEGAL_PARAMS : 1'
   - 'box.error.PARTIAL_KEY : 136'
   - 'box.error.SAVEPOINT_NO_TRANSACTION : 114'
@@ -397,36 +397,37 @@ t;
   - 'box.error.FUNCTION_EXISTS : 52'
   - 'box.error.UPDATE_ARG_TYPE : 26'
   - 'box.error.CROSS_ENGINE_TRANSACTION : 81'
-  - 'box.error.FORMAT_MISMATCH_INDEX_PART : 27'
  - 'box.error.injection : table: <address>'
+  - 'box.error.FORMAT_MISMATCH_INDEX_PART : 27'
+  - 'box.error.IDENTIFIER : 70'
   - 'box.error.FUNCTION_TX_ACTIVE : 30'
-  - 'box.error.ITERATOR_TYPE : 72'
   - 'box.error.TRANSACTION_YIELD : 154'
+  - 'box.error.NULLABLE_MISMATCH : 153'
   - 'box.error.NO_SUCH_ENGINE : 57'
   - 'box.error.COMMIT_IN_SUB_STMT : 122'
-  - 'box.error.NULLABLE_MISMATCH : 153'
-  - 'box.error.UNSUPPORTED : 5'
-  - 'box.error.LAST_DROP : 15'
+  - 'box.error.RELOAD_CFG : 58'
   - 'box.error.SPACE_FIELD_IS_DUPLICATE : 149'
+  - 'box.error.LAST_DROP : 15'
+  - 'box.error.SEQUENCE_OVERFLOW : 147'
   - 'box.error.DECOMPRESSION : 124'
   - 'box.error.CREATE_SEQUENCE : 142'
   - 'box.error.CREATE_USER : 43'
-  - 'box.error.SEQUENCE_OVERFLOW : 147'
+  - 'box.error.DATA_MISMATCH_INDEX_PART : 55'
   - 'box.error.INSTANCE_UUID_MISMATCH : 66'
-  - 'box.error.RELOAD_CFG : 58'
+  - 'box.error.TUPLE_FORMAT_LIMIT : 16'
   - 'box.error.SYSTEM : 115'
   - 'box.error.KEY_PART_IS_TOO_LONG : 118'
-  - 'box.error.MORE_THAN_ONE_TUPLE : 41'
+  - 'box.error.INJECTION : 8'
   - 'box.error.TRUNCATE_SYSTEM_SPACE : 137'
   - 'box.error.NO_SUCH_SAVEPOINT : 61'
   - 'box.error.VY_QUOTA_TIMEOUT : 135'
-  - 'box.error.PRIV_NOT_GRANTED : 91'
+  - 'box.error.READ_VIEW_ABORTED : 130'
   - 'box.error.WRONG_INDEX_OPTIONS : 108'
   - 'box.error.INVALID_VYLOG_FILE : 133'
   - 'box.error.INDEX_FIELD_COUNT_LIMIT : 127'
-  - 'box.error.BEFORE_REPLACE_RET : 53'
+  - 'box.error.PROTOCOL : 104'
   - 'box.error.USER_MAX : 56'
-  - 'box.error.INVALID_MSGPACK : 20'
+  - 'box.error.BEFORE_REPLACE_RET : 53'
   - 'box.error.TUPLE_NOT_ARRAY : 22'
   - 'box.error.KEY_PART_COUNT : 31'
   - 'box.error.ALTER_SPACE : 12'
@@ -435,7 +436,7 @@ t;
   - 'box.error.DROP_SEQUENCE : 144'
   - 'box.error.INVALID_XLOG_ORDER : 76'
   - 'box.error.UNKNOWN_REQUEST_TYPE : 48'
-  - 'box.error.PROC_LUA : 32'
+  - 'box.error.PROC_RET : 21'
   - 'box.error.SUB_STMT_MAX : 121'
   - 'box.error.ROLE_NOT_GRANTED : 92'
   - 'box.error.SPACE_EXISTS : 10'
@@ -446,36 +447,36 @@ t;
   - 'box.error.REPLICASET_UUID_MISMATCH : 63'
   - 'box.error.UPDATE_FIELD : 29'
   - 'box.error.INDEX_EXISTS : 85'
-  - 'box.error.SPLICE : 25'
+  - 'box.error.DROP_SPACE : 11'
   - 'box.error.COMPRESSION : 119'
   - 'box.error.INVALID_ORDER : 68'
-  - 'box.error.UNKNOWN : 0'
+  - 'box.error.SPLICE : 25'
   - 'box.error.NO_SUCH_GROUP : 155'
-  - 'box.error.TUPLE_FORMAT_LIMIT : 16'
+  - 'box.error.INVALID_MSGPACK : 20'
   - 'box.error.DROP_PRIMARY_KEY : 17'
   - 'box.error.NULLABLE_PRIMARY : 152'
   - 'box.error.NO_SUCH_SEQUENCE : 145'
-  - 'box.error.INJECTION : 8'
+  - 'box.error.FUNCTION_MAX : 54'
   - 'box.error.INVALID_UUID : 64'
-  - 'box.error.IDENTIFIER : 70'
+  - 'box.error.UNSUPPORTED : 5'
   - 'box.error.TIMEOUT : 78'
+  - 'box.error.ITERATOR_TYPE : 72'
   - 'box.error.REPLICA_MAX : 73'
   - 'box.error.NO_SUCH_ROLE : 82'
-  - 'box.error.DROP_SPACE : 11'
   - 'box.error.MISSING_REQUEST_FIELD : 69'
   - 'box.error.MISSING_SNAPSHOT : 93'
   - 'box.error.WRONG_SPACE_OPTIONS : 111'
   - 'box.error.READONLY : 7'
-  - 'box.error.RTREE_RECT : 101'
+  - 'box.error.UNKNOWN : 0'
   - 'box.error.UPSERT_UNIQUE_SECONDARY_KEY : 105'
   - 'box.error.NO_CONNECTION : 77'
   - 'box.error.UNSUPPORTED_PRIV : 98'
   - 'box.error.WRONG_SCHEMA_VERSION : 109'
   - 'box.error.ROLLBACK_IN_SUB_STMT : 123'
-  - 'box.error.PROTOCOL : 104'
-  - 'box.error.INVALID_XLOG_TYPE : 125'
-  - 'box.error.INDEX_PART_TYPE_MISMATCH : 24'
+  - 'box.error.MORE_THAN_ONE_TUPLE : 41'
   - 'box.error.UNSUPPORTED_INDEX_FEATURE : 112'
+  - 'box.error.INDEX_PART_TYPE_MISMATCH : 24'
+  - 'box.error.INVALID_XLOG_TYPE : 125'
 ...
 test_run:cmd("setopt delimiter ''");
 ---
diff --git a/test/engine/tuple.result b/test/engine/tuple.result
index 7fb0916..e9efb16 100644
--- a/test/engine/tuple.result
+++ b/test/engine/tuple.result
@@ -891,6 +891,277 @@ t["{"]
 s:drop()
 ---
 ...
+--
+-- gh-1012: Indexes for JSON-defined paths.
+--
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO["fname"]'}, {3, 'str', path = '[3]["FIO"].fname'}}})
+---
+- error: 'Can''t create or modify index ''test1'' in space ''withdata'': same key
+    part is indexed twice'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '[3]["FIO"]["fname"]'}}})
+---
+- error: 'Wrong index options (field 2): ''path'' must be string'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'field.FIO.fname'}}})
+---
+- error: 'Wrong index options (field 2): invalid JSON path: first part should be defined
+    as array index'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = '[3].FIO'}}})
+---
+- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type
+    ''map'' is not supported'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[3][1]'}}})
+---
+- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type
+    ''array'' is not supported'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO'}, {3, 'str', path = '[3]["FIO"].fname'}}})
+---
+- error: Field 3 has type 'map' in one index, but type 'string' in another
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3][1].sname'}, {3, 'str', path = '[3]["FIO"].fname'}}})
+---
+- error: Field 3 has type 'map' in one index, but type 'array' in another
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[2].FIO.fname'}}})
+---
+- error: 'Wrong index options (field 2): invalid JSON path: first part refers to invalid
+    field'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO....fname'}}})
+---
+- error: 'Wrong index options (field 3): invalid JSON path ''[3].FIO....fname'': path
+    has invalid structure (error at position 9)'
+...
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}})
+---
+...
+assert(idx ~= nil)
+---
+- true
+...
+s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
+---
+- error: Tuple doesn't match document structure defined as index
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5}
+---
+- error: 'Supplied key type of part 2 does not match index part type: expected string'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5}
+---
+- error: Tuple doesn't match document structure defined as index
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- error: Duplicate key exists in unique index 'test1' in space 'withdata'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond', data = "extra"}}, 4, 5}
+---
+- error: Duplicate key exists in unique index 'test1' in space 'withdata'
+...
+s:insert{7, 7, {town = 'Moscow', FIO = {fname = 'Max', sname = 'Isaev', data = "extra"}}, 4, 5}
+---
+- [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}},
+  4, 5]
+...
+idx:select()
+---
+- - [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+  - [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}},
+    4, 5]
+...
+idx:min()
+---
+- [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+idx:max()
+---
+- [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}},
+  4, 5]
+...
+s:drop()
+---
+...
+s = box.schema.create_space('withdata', {engine = engine})
+---
+...
+parts = {}
+---
+...
+parts[1] = {1, 'unsigned', path='[1][2]'}
+---
+...
+pk = s:create_index('pk', {parts = parts})
+---
+...
+s:insert{{1, 2}, 3}
+---
+- [[1, 2], 3]
+...
+s:upsert({{box.null, 2}}, {{'+', 2, 5}})
+---
+...
+s:get(2)
+---
+- [[1, 2], 8]
+...
+s:drop()
+---
+...
+-- Create index on space with data
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+pk = s:create_index('primary', { type = 'tree' })
+---
+...
+s:insert{1, 7, {town = 'London', FIO = 1234}, 4, 5}
+---
+- [1, 7, {'town': 'London', 'FIO': 1234}, 4, 5]
+...
+s:insert{2, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- [2, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+s:insert{3, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}})
+---
+- error: Tuple doesn't match document structure defined as index
+...
+_ = s:delete(1)
+---
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}})
+---
+- error: Duplicate key exists in unique index 'test1' in space 'withdata'
+...
+_ = s:delete(2)
+---
+...
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}})
+---
+...
+assert(idx ~= nil)
+---
+- true
+...
+idx:select()
+---
+- - [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+idx:min()
+---
+- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+idx:max()
+---
+- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+idx:drop()
+---
+...
+s:drop()
+---
+...
+-- Test complex JSON indexes
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+parts = {}
+---
+...
+parts[1] = {1, 'str', path='[1][3][2].a'}
+---
+...
+parts[2] = {1, 'unsigned', path = '[1][3][1]'}
+---
+...
+parts[3] = {2, 'str', path = '[2][2].d[1]'}
+---
+...
+pk = s:create_index('primary', { type = 'tree', parts =  parts})
+---
+...
+s:insert{{1, 2, {3, {a = 'str', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6}
+---
+- [[1, 2, [3, {'a': 'str', 'b': 5}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6]
+...
+s:insert{{1, 2, {3, {a = 'str', b = 1}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6}
+---
+- error: Duplicate key exists in unique index 'primary' in space 'withdata'
+...
+parts = {}
+---
+...
+parts[1] = {1, 'unsigned', path='[1][3][2].b' }
+---
+...
+parts[2] = {3, 'unsigned'}
+---
+...
+crosspart_idx = s:create_index('crosspart', { parts =  parts})
+---
+...
+s:insert{{1, 2, {3, {a = 'str2', b = 2}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6}
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6]
+...
+parts = {}
+---
+...
+parts[1] = {1, 'unsigned', path='[1][3][2].b'}
+---
+...
+num_idx = s:create_index('numeric', {parts =  parts})
+---
+...
+s:insert{{1, 2, {3, {a = 'str3', b = 9}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6}
+---
+- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6]
+...
+num_idx:get(2)
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6]
+...
+num_idx:select()
+---
+- - [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6]
+  - [[1, 2, [3, {'a': 'str', 'b': 5}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6]
+  - [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6]
+...
+num_idx:max()
+---
+- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6]
+...
+num_idx:min()
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6]
+...
+assert(crosspart_idx:max() == num_idx:max())
+---
+- true
+...
+assert(crosspart_idx:min() == num_idx:min())
+---
+- true
+...
+s:drop()
+---
+...
 engine = nil
 ---
 ...
diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua
index 30d6f1a..d20a547 100644
--- a/test/engine/tuple.test.lua
+++ b/test/engine/tuple.test.lua
@@ -289,5 +289,85 @@ t["{"]
 
 s:drop()
 
+--
+-- gh-1012: Indexes for JSON-defined paths.
+--
+s = box.schema.space.create('withdata', {engine = engine})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO["fname"]'}, {3, 'str', path = '[3]["FIO"].fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '[3]["FIO"]["fname"]'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'field.FIO.fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = '[3].FIO'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[3][1]'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO'}, {3, 'str', path = '[3]["FIO"].fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3][1].sname'}, {3, 'str', path = '[3]["FIO"].fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[2].FIO.fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO....fname'}}})
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}})
+assert(idx ~= nil)
+s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond', data = "extra"}}, 4, 5}
+s:insert{7, 7, {town = 'Moscow', FIO = {fname = 'Max', sname = 'Isaev', data = "extra"}}, 4, 5}
+idx:select()
+idx:min()
+idx:max()
+s:drop()
+
+s = box.schema.create_space('withdata', {engine = engine})
+parts = {}
+parts[1] = {1, 'unsigned', path='[1][2]'}
+pk = s:create_index('pk', {parts = parts})
+s:insert{{1, 2}, 3}
+s:upsert({{box.null, 2}}, {{'+', 2, 5}})
+s:get(2)
+s:drop()
+
+-- Create index on space with data
+s = box.schema.space.create('withdata', {engine = engine})
+pk = s:create_index('primary', { type = 'tree' })
+s:insert{1, 7, {town = 'London', FIO = 1234}, 4, 5}
+s:insert{2, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{3, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}})
+_ = s:delete(1)
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}})
+_ = s:delete(2)
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}})
+assert(idx ~= nil)
+idx:select()
+idx:min()
+idx:max()
+idx:drop()
+s:drop()
+
+-- Test complex JSON indexes
+s = box.schema.space.create('withdata', {engine = engine})
+parts = {}
+parts[1] = {1, 'str', path='[1][3][2].a'}
+parts[2] = {1, 'unsigned', path = '[1][3][1]'}
+parts[3] = {2, 'str', path = '[2][2].d[1]'}
+pk = s:create_index('primary', { type = 'tree', parts =  parts})
+s:insert{{1, 2, {3, {a = 'str', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6}
+s:insert{{1, 2, {3, {a = 'str', b = 1}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6}
+parts = {}
+parts[1] = {1, 'unsigned', path='[1][3][2].b' }
+parts[2] = {3, 'unsigned'}
+crosspart_idx = s:create_index('crosspart', { parts =  parts})
+s:insert{{1, 2, {3, {a = 'str2', b = 2}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6}
+parts = {}
+parts[1] = {1, 'unsigned', path='[1][3][2].b'}
+num_idx = s:create_index('numeric', {parts =  parts})
+s:insert{{1, 2, {3, {a = 'str3', b = 9}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6}
+num_idx:get(2)
+num_idx:select()
+num_idx:max()
+num_idx:min()
+assert(crosspart_idx:max() == num_idx:max())
+assert(crosspart_idx:min() == num_idx:min())
+s:drop()
+
 engine = nil
 test_run = nil
-- 
2.7.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] [PATCH v2 5/5] box: specify indexes in user-friendly form
  2018-08-15 12:14 [tarantool-patches] [PATCH v2 0/5] box: indexes by JSON path Kirill Shcherbatov
                   ` (3 preceding siblings ...)
  2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 4/5] box: introduce path_hash and tuple_field tree Kirill Shcherbatov
@ 2018-08-15 12:15 ` Kirill Shcherbatov
  2018-08-22  0:26   ` [tarantool-patches] " Vladislav Shpilevoy
  2018-08-22  0:28   ` Vladislav Shpilevoy
  4 siblings, 2 replies; 16+ messages in thread
From: Kirill Shcherbatov @ 2018-08-15 12:15 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, Kirill Shcherbatov

From now on it is possible to create indexes by JSON path
using field names specified in the format.

@TarantoolBot document
Title: Indexes by JSON path
Sometimes field data has a complex document structure.
When this structure is consistent across the whole document,
you can create an index by JSON path.

Example:
s:create_index('json_index',
               {parts = {{'data.FIO["fname"]', 'str'}}})

Part of #1012.
---
 src/box/lua/schema.lua      | 58 +++++++++++++++++++++++++++++++++++++++------
 test/engine/iterator.result |  2 +-
 test/engine/tuple.result    | 36 ++++++++++++++++++++++++++++
 test/engine/tuple.test.lua  | 10 ++++++++
 4 files changed, 98 insertions(+), 8 deletions(-)

diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua
index b9b8c90..62b83aa 100644
--- a/src/box/lua/schema.lua
+++ b/src/box/lua/schema.lua
@@ -556,6 +556,48 @@ local function update_index_parts_1_6_0(parts)
     return result
 end
 
+local function format_field_index_by_name(format, name)
+    for k,v in pairs(format) do
+        if v.name == name then
+            return k
+        end
+    end
+    return nil
+end
+
+local function format_field_resolve(format, name)
+    local idx = nil
+    local field_name = nil
+    -- try resolve whole name
+    idx = format_field_index_by_name(format, name)
+    if idx ~= nil then
+        return idx, nil
+    end
+    -- try resolve field by name
+    field_name = string.match(name, "^[a-z][a-z,0-9]*")
+    if field_name ~= nil then
+        idx = format_field_index_by_name(format, field_name)
+        local suffix = string.sub(name, string.len(field_name) + 2)
+        if idx ~= nil and suffix ~= nil then
+            return idx, string.format("[%d]%s", idx, suffix)
+        end
+        -- simplified json_path
+        return idx, nil
+    end
+    -- try resolve field by index
+    field_name = string.match(name, "^%[[0-9]*%]")
+    if field_name ~= nil then
+        idx = tonumber(string.sub(field_name, 2, -2))
+        local suffix = string.sub(name, string.len(field_name) + 2)
+        if suffix == nil then
+            -- simplified json_path
+            return idx, nil
+        end
+        return idx, name
+    end
+    return nil, nil
+end
+
 local function update_index_parts(format, parts)
     if type(parts) ~= "table" then
         box.error(box.error.ILLEGAL_PARAMS,
@@ -607,16 +649,18 @@ local function update_index_parts(format, parts)
             box.error(box.error.ILLEGAL_PARAMS,
                       "options.parts[" .. i .. "]: field (name or number) is expected")
         elseif type(part.field) == 'string' then
-            for k,v in pairs(format) do
-                if v.name == part.field then
-                    part.field = k
-                    break
-                end
-            end
-            if type(part.field) == 'string' then
+            local idx, path = format_field_resolve(format, part.field)
+            if idx == nil then
                 box.error(box.error.ILLEGAL_PARAMS,
                           "options.parts[" .. i .. "]: field was not found by name '" .. part.field .. "'")
             end
+            if part.path ~= nil and part.path ~= path then
+                box.error(box.error.ILLEGAL_PARAMS,
+                          "options.parts[" .. i .. "]: field path '"..part.path.." doesn't math path resolved by name '" .. part.field .. "'")
+            end
+            parts_can_be_simplified = parts_can_be_simplified and path == nil
+            part.field = idx
+            part.path = path or part.path
         elseif part.field == 0 then
             box.error(box.error.ILLEGAL_PARAMS,
                       "options.parts[" .. i .. "]: field (number) must be one-based")
diff --git a/test/engine/iterator.result b/test/engine/iterator.result
index 10097ed..0bca87e 100644
--- a/test/engine/iterator.result
+++ b/test/engine/iterator.result
@@ -4213,7 +4213,7 @@ s:replace{35}
 ...
 state, value = gen(param,state)
 ---
-- error: 'builtin/box/schema.lua:1034: usage: next(param, state)'
+- error: 'builtin/box/schema.lua:1078: usage: next(param, state)'
 ...
 value
 ---
diff --git a/test/engine/tuple.result b/test/engine/tuple.result
index e9efb16..775c86b 100644
--- a/test/engine/tuple.result
+++ b/test/engine/tuple.result
@@ -946,6 +946,42 @@ assert(idx ~= nil)
 ---
 - true
 ...
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'array'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}}
+---
+...
+s:format(format)
+---
+- error: Field 3 has type 'map' in one index, but type 'array' in another
+...
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'map'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}}
+---
+...
+s:format(format)
+---
+...
+idx2 = s:create_index('test2', {parts = {{2, 'number'}, {'[3]["FIO"]["fname"]', 'str', path = '[3].FIO.fname'}}})
+---
+- error: 'Illegal parameters, options.parts[2]: field path ''[3].FIO.fname'' doesn''t
+    match path resolved by name ''[3]["FIO"]["fname"]'''
+...
+idx2 = s:create_index('test2', {parts = {{2, 'number'}, {'[3]["FIO"]["fname"]', 'str', path = '[3]["FIO"]["fname"]'}}})
+---
+...
+assert(idx2 ~= nil)
+---
+- true
+...
+idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}})
+---
+...
+assert(idx3 ~= nil)
+---
+- true
+...
+assert(idx2.parts[2].path == "[3][\"FIO\"][\"fname\"]")
+---
+- true
+...
 s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
 ---
+- error: Tuple doesn't match document structure defined as index
diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua
index d20a547..53f434b 100644
--- a/test/engine/tuple.test.lua
+++ b/test/engine/tuple.test.lua
@@ -304,6 +304,16 @@ s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[2].FIO.fnam
 s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO....fname'}}})
 idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}})
 assert(idx ~= nil)
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'array'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}}
+s:format(format)
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'map'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}}
+s:format(format)
+idx2 = s:create_index('test2', {parts = {{2, 'number'}, {'[3]["FIO"]["fname"]', 'str', path = '[3].FIO.fname'}}})
+idx2 = s:create_index('test2', {parts = {{2, 'number'}, {'[3]["FIO"]["fname"]', 'str', path = '[3]["FIO"]["fname"]'}}})
+assert(idx2 ~= nil)
+idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}})
+assert(idx3 ~= nil)
+assert(idx2.parts[2].path == "[3][\"FIO\"][\"fname\"]")
 s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
 s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5}
 s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5}
-- 
2.7.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 5/5] box: specify indexes in user-friendly form
  2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 5/5] box: specify indexes in user-friendly form Kirill Shcherbatov
@ 2018-08-22  0:26   ` Vladislav Shpilevoy
  2018-08-27  7:37     ` Kirill Shcherbatov
  2018-08-22  0:28   ` Vladislav Shpilevoy
  1 sibling, 1 reply; 16+ messages in thread
From: Vladislav Shpilevoy @ 2018-08-22  0:26 UTC (permalink / raw)
  To: Kirill Shcherbatov, tarantool-patches

Thanks for the patch! See 3 comments below.

On 15/08/2018 15:15, Kirill Shcherbatov wrote:
> From now on it is possible to create indexes by JSON path
> using field names specified in the format.
> 
> @TarantoolBot document
> Title: Indexes by JSON path
> Sometimes field data has a complex document structure.
> When this structure is consistent across the whole document,
> you can create an index by JSON path.
> 
> Example:
> s:create_index('json_index',
>                 {parts = {{'data.FIO["fname"]', 'str'}}})
> 
> Part of #1012.
> ---
>   src/box/lua/schema.lua      | 58 +++++++++++++++++++++++++++++++++++++++------
>   test/engine/iterator.result |  2 +-
>   test/engine/tuple.result    | 36 ++++++++++++++++++++++++++++
>   test/engine/tuple.test.lua  | 10 ++++++++
>   4 files changed, 98 insertions(+), 8 deletions(-)
> 
> diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua
> index b9b8c90..62b83aa 100644
> --- a/src/box/lua/schema.lua
> +++ b/src/box/lua/schema.lua
> @@ -556,6 +556,48 @@ local function update_index_parts_1_6_0(parts)
>       return result
>   end
>   
> +local function format_field_index_by_name(format, name)
> +    for k,v in pairs(format) do
> +        if v.name == name then
> +            return k
> +        end
> +    end
> +    return nil
> +end
> +
> +local function format_field_resolve(format, name)
> +    local idx = nil
> +    local field_name = nil
> +    -- try resolve whole name
> +    idx = format_field_index_by_name(format, name)
> +    if idx ~= nil then
> +        return idx, nil
> +    end
> +    -- try resolve field by name
> +    field_name = string.match(name, "^[a-z][a-z,0-9]*")
> +    if field_name ~= nil then
> +        idx = format_field_index_by_name(format, field_name)
> +        local suffix = string.sub(name, string.len(field_name) + 2)
> +        if idx ~= nil and suffix ~= nil then
> +            return idx, string.format("[%d]%s", idx, suffix)
> +        end
> +        -- simplified json_path
> +        return idx, nil
> +    end
> +    -- try resolve field by index
> +    field_name = string.match(name, "^%[[0-9]*%]")
> +    if field_name ~= nil then
> +        idx = tonumber(string.sub(field_name, 2, -2))
> +        local suffix = string.sub(name, string.len(field_name) + 2)
> +        if suffix == nil then
> +            -- simplified json_path
> +            return idx, nil
> +        end
> +        return idx, name
> +    end
> +    return nil, nil
> +end

1. Please, write comments.

2. Start a sentence with a capital letter and finish it
with a dot.

3. Please, move these routines to C and use tuple_dictionary +
the JSON path parser.

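A sketch of how it could look in C (untested; I assume the
existing tuple_fieldno_by_name() from tuple_dictionary.h and the
json path parser; normalization of the resolved name prefix to
the '[fieldno]...' form is omitted):

static int
format_field_resolve(struct tuple_format *format, const char *name,
		     uint32_t name_len, uint32_t *fieldno,
		     const char **path)
{
	/* Try to interpret the whole string as a field name. */
	uint32_t hash = mh_strn_hash(name, name_len);
	if (tuple_fieldno_by_name(format->dict, name, name_len, hash,
				  fieldno) == 0) {
		*path = NULL;
		return 0;
	}
	/* Otherwise resolve only the first JSON path token. */
	struct json_path_parser parser;
	struct json_path_node node;
	json_path_parser_create(&parser, name, name_len);
	if (json_path_next(&parser, &node) != 0)
		return -1;
	if (node.type == JSON_PATH_NUM) {
		*fieldno = node.num - TUPLE_INDEX_BASE;
	} else if (node.type == JSON_PATH_STR) {
		if (tuple_fieldno_by_name(format->dict, node.str, node.len,
					  mh_strn_hash(node.str, node.len),
					  fieldno) != 0)
			return -1;
	} else {
		return -1;
	}
	*path = name;
	return 0;
}
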
> +
>   local function update_index_parts(format, parts)
>       if type(parts) ~= "table" then
>           box.error(box.error.ILLEGAL_PARAMS,

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 4/5] box: introduce path_hash and tuple_field tree
  2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 4/5] box: introduce path_hash and tuple_field tree Kirill Shcherbatov
@ 2018-08-22  0:26   ` Vladislav Shpilevoy
  2018-08-27  7:37     ` Kirill Shcherbatov
  0 siblings, 1 reply; 16+ messages in thread
From: Vladislav Shpilevoy @ 2018-08-22  0:26 UTC (permalink / raw)
  To: tarantool-patches, Kirill Shcherbatov

Thanks for the patch! See 49 comments below.

On 15/08/2018 15:15, Kirill Shcherbatov wrote:
> To work with JSON-defined indexes we introduce format JSON
> path hashtable data_path and a tree of intermediate path
> fields attached to format root fields.
> 
> <Hashtable>: format->data_path
> [2].FIO.fname -> field "fname" {type=str, off_slot=-1}
> [2].FIO.sname -> field "sname" {type=str, off_slot=-2}
> 
> <Tree>:      format->field[2] {type = map}
>                             |
>                     FIO {type = map}
>                             |
>          "fname"            |             "sname"
> {type=str,off_slot=-1} ____|____ {type = str,off_slot=-2}
> 
> Leaf fields used in an index have an initialized offset_slot.
> On new tuple creation we traverse the field tree and use leaf
> records to init the tuple field_map.
> At the same time we use the data_path hashtable on tuple data
> access by index (when the cached offset_slot is invalid).
> All paths are stored at the end of the format allocation;
> JSON-tree fields, same as format->path_hash, point to them.
> 
> +------------+------------+-------+------------+-------+
> |tuple_format|tuple_field1|  ...  |tuple_fieldN| pathK |
> +------------+------------+-------+------------+-------+
> 
> New routine tuple_format_add_json_path is used to construct
> all internal structures for JSON path on new format creation
> and duplicating.
> 
> Part of #1012.
> ---
>   src/box/errcode.h            |   2 +-
>   src/box/key_def.c            |   3 +
>   src/box/key_def.h            |   2 +
>   src/box/tuple.c              |  25 +-
>   src/box/tuple_compare.cc     |  41 ++-
>   src/box/tuple_extract_key.cc | 137 +++++++++--
>   src/box/tuple_format.c       | 575 +++++++++++++++++++++++++++++++++++++++----
>   src/box/tuple_format.h       |  84 ++++++-
>   src/box/tuple_hash.cc        |  17 +-
>   src/box/vy_lsm.c             |  43 ++++
>   src/box/vy_point_lookup.c    |   2 -
>   src/box/vy_stmt.c            | 124 ++++++++--
>   test/box/misc.result         |  51 ++--
>   test/engine/tuple.result     | 271 ++++++++++++++++++++
>   test/engine/tuple.test.lua   |  80 ++++++
>   15 files changed, 1314 insertions(+), 143 deletions(-)
> 
> diff --git a/src/box/errcode.h b/src/box/errcode.h
> index 3d5f66a..8cbd59d 100644
> --- a/src/box/errcode.h
> +++ b/src/box/errcode.h
> @@ -107,7 +107,7 @@ struct errcode_record {
>   	/* 52 */_(ER_FUNCTION_EXISTS,		"Function '%s' already exists") \
>   	/* 53 */_(ER_BEFORE_REPLACE_RET,	"Invalid return value of space:before_replace trigger: expected tuple or nil, got %s") \
>   	/* 54 */_(ER_FUNCTION_MAX,		"A limit on the total number of functions has been reached: %u") \
> -	/* 55 */_(ER_UNUSED4,			"") \
> +	/* 55 */_(ER_DATA_MISMATCH_INDEX_PART,	"Tuple doesn't match document structure defined as index") \

1. In the future, document structure could be defined not only
via indexes.

>   	/* 56 */_(ER_USER_MAX,			"A limit on the total number of users has been reached: %u") \
>   	/* 57 */_(ER_NO_SUCH_ENGINE,		"Space engine '%s' does not exist") \
>   	/* 58 */_(ER_RELOAD_CFG,		"Can't set option '%s' dynamically") \
> diff --git a/src/box/tuple.c b/src/box/tuple.c
> index d7dbad3..130acf7 100644
> --- a/src/box/tuple.c
> +++ b/src/box/tuple.c
> @@ -135,6 +135,18 @@ runtime_tuple_delete(struct tuple_format *format, struct tuple *tuple)
>   	smfree(&runtime_alloc, tuple, total);
>   }
>   

2. A comment, please.

> +static int
> +tuple_validate_json_path_data(const struct tuple_field *field, uint32_t idx,
> +			      const char *tuple, const char *offset, void *ctx)

3. Please, add the '_cb' suffix, which we use for callbacks.

> +{
> +	(void)tuple;
> +	(void)ctx;
> +	if (key_mp_type_validate(field->type, mp_typeof(*offset),
> +				 ER_KEY_PART_TYPE, idx, field->is_nullable) != 0)

4. Why not just 'return key_mp_type_validate(...);'?

> +		return -1;
> +	return 0;
> +}
> +
>   int
>   tuple_validate_raw(struct tuple_format *format, const char *tuple)
>   {
> @@ -159,14 +171,23 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple)
>   
>   	/* Check field types */
>   	struct tuple_field *field = &format->fields[0];
> +	const char *pos = tuple;
>   	uint32_t i = 0;
>   	uint32_t defined_field_count = MIN(field_count, format->field_count);
>   	for (; i < defined_field_count; ++i, ++field) {
> -		if (key_mp_type_validate(field->type, mp_typeof(*tuple),
> +		if (key_mp_type_validate(field->type, mp_typeof(*pos),
>   					 ER_FIELD_TYPE, i + TUPLE_INDEX_BASE,
>   					 field->is_nullable))
>   			return -1;
> -		mp_next(&tuple);
> +		/* Check all JSON paths. */
> +		if (field->array != NULL) {
> +			json_field_tree_routine func =
> +				tuple_validate_json_path_data;
> +			if (json_field_tree_exec_routine(field, i, tuple, pos,
> +							 func, NULL) != 0)
> +				return -1;
> +		}
> +		mp_next(&pos);

5. The same problem as in the previous version - you decode the
field twice: first inside the tree walker and again here.

Please, find a way to avoid it. For help, see my comments
for the walker.

>   	}
>   	return 0;
>   }
> diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
> index 923e71c..490b528 100644
> --- a/src/box/tuple_compare.cc
> +++ b/src/box/tuple_compare.cc
> @@ -469,7 +469,8 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
>   	const struct key_part *part = key_def->parts;
>   	const char *tuple_a_raw = tuple_data(tuple_a);
>   	const char *tuple_b_raw = tuple_data(tuple_b);
> -	if (key_def->part_count == 1 && part->fieldno == 0) {
> +	if (key_def->part_count == 1 && part->fieldno == 0 &&
> +	    part->path == NULL) {

6. It should be part of the previous patch together with any
changes of this file.

>   		/*
>   		 * First field can not be optional - empty tuples
>   		 * can not exist.> diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
> index 0301186..4c0e201 100644
> --- a/src/box/tuple_extract_key.cc
> +++ b/src/box/tuple_extract_key.cc
> @@ -1,15 +1,25 @@
>   #include "tuple_extract_key.h"
>   #include "tuple.h"
>   #include "fiber.h"
> +#include "json/path.h"
>   
>   enum { MSGPACK_NULL = 0xc0 };
>   
> +static bool
> +key_def_parts_are_sequential(const struct key_def *def, int i)
> +{
> +	uint32_t fieldno1 = def->parts[i].fieldno + 1;
> +	uint32_t fieldno2 = def->parts[i + 1].fieldno;
> +	return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
> +		def->parts[i + 1].path == NULL;
> +}

7. Make it static inline and add a comment. Also, this function
is called from places that have the 'is_flat' template parameter,
which allows skipping the path NULL checks at compile time, but
you do not use it. Please, make it a template too.

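For example (untested):

template <bool is_flat>
static inline bool
key_def_parts_are_sequential(const struct key_def *def, int i)
{
	uint32_t fieldno1 = def->parts[i].fieldno + 1;
	uint32_t fieldno2 = def->parts[i + 1].fieldno;
	if (is_flat)
		return fieldno1 == fieldno2;
	return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
	       def->parts[i + 1].path == NULL;
}

Then a caller instantiated with is_flat == true does not pay for
the path checks at all.
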
> +
>   /** True, if a key con contain two or more parts in sequence. */
>   static bool
>   key_def_contains_sequential_parts(const struct key_def *def)
>   {
>   	for (uint32_t i = 0; i < def->part_count - 1; ++i) {
> -		if (def->parts[i].fieldno + 1 == def->parts[i + 1].fieldno)
> +		if (key_def_parts_are_sequential(def, i))
>   			return true;
>   	}
>   	return false;
> @@ -132,8 +142,7 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
>   			 * minimize tuple_field_raw() calls.
>   			 */
>   			for (; i < part_count - 1; i++) {
> -				if (key_def->parts[i].fieldno + 1 !=
> -				    key_def->parts[i + 1].fieldno) {
> +				if (!key_def_parts_are_sequential(key_def, i)) {
>   					/*
>   					 * End of sequential part.
>   					 */
> @@ -180,8 +189,7 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
>   			 * minimize tuple_field_raw() calls.
>   			 */
>   			for (; i < part_count - 1; i++) {
> -				if (key_def->parts[i].fieldno + 1 !=
> -				    key_def->parts[i + 1].fieldno) {
> +				if (!key_def_parts_are_sequential(key_def, i)) {
>   					/*
>   					 * End of sequential part.
>   					 */
> @@ -214,12 +222,13 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
>    * General-purpose version of tuple_extract_key_raw()
>    * @copydoc tuple_extract_key_raw()
>    */
> -template <bool has_optional_parts>
> +template <bool has_optional_parts, bool is_flat>
>   static char *
>   tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
>   			       const struct key_def *key_def,
>   			       uint32_t *key_size)
>   {
> +	assert(!is_flat == key_def->has_json_paths);
>   	assert(!has_optional_parts || key_def->is_nullable);
>   	assert(has_optional_parts == key_def->has_optional_parts);
>   	assert(mp_sizeof_nil() == 1);
> @@ -247,11 +256,11 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
>   		uint32_t fieldno = key_def->parts[i].fieldno;
>   		uint32_t null_count = 0;
>   		for (; i < key_def->part_count - 1; i++) {
> -			if (key_def->parts[i].fieldno + 1 !=
> -			    key_def->parts[i + 1].fieldno)
> +			if (!key_def_parts_are_sequential(key_def, i))
>   				break;
>   		}
> -		uint32_t end_fieldno = key_def->parts[i].fieldno;
> +		const struct key_part *part = &key_def->parts[i];
> +		uint32_t end_fieldno = part->fieldno;
>   
>   		if (fieldno < current_fieldno) {
>   			/* Rewind. */

8. All the changes between this point and point 7 should be
part of the previous patch.

> @@ -293,6 +302,38 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
>   				current_fieldno++;
>   			}
>   		}
> +		const char *field_last, *field_end_last;
> +		if (!is_flat && part->path != NULL) {
> +			field_last = field;
> +			field_end_last = field_end;
> +			struct json_path_parser parser;
> +			struct json_path_node node;
> +			json_path_parser_create(&parser, part->path,
> +						part->path_len);
> +			/* Skip fieldno. */

9. First level fieldno should not be encoded in the path. See
the previous patch for comments.

> +			int rc = json_path_next(&parser, &node);
> +			assert(rc == 0);
> +			while ((rc = json_path_next(&parser, &node)) == 0 &&
> +				node.type != JSON_PATH_END) {
> +				switch(node.type) {
> +				case JSON_PATH_NUM:
> +					rc = tuple_field_go_to_index(&field,
> +								     node.num);
> +					break;
> +				case JSON_PATH_STR:
> +					rc = tuple_field_go_to_key(&field,
> +								   node.str,
> +								   node.len);
> +					break;
> +				default:
> +					unreachable();
> +				}

10. You have exactly the same code in tuple_field_raw_by_path. Please,
extract either the whole while loop or this switch into a function.

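E.g. something like this (a sketch):

/** Apply one JSON path node to the current field pointer. */
static inline int
tuple_field_go_to_path_node(const char **field,
			    const struct json_path_node *node)
{
	switch (node->type) {
	case JSON_PATH_NUM:
		return tuple_field_go_to_index(field, node->num);
	case JSON_PATH_STR:
		return tuple_field_go_to_key(field, node->str, node->len);
	default:
		unreachable();
		return -1;
	}
}
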
> +				assert(rc == 0);
> +			}
> +			assert(rc == 0 && node.type == JSON_PATH_END);
> +			field_end = field;
> +			mp_next(&field_end);
> +		}
>   		memcpy(key_buf, field, field_end - field);
>   		key_buf += field_end - field;
>   		if (has_optional_parts && null_count != 0) {
> @@ -330,32 +375,74 @@ tuple_extract_key_set(struct key_def *key_def)
>   		if (key_def->has_optional_parts) {
>   			assert(key_def->is_nullable);
>   			if (key_def_contains_sequential_parts(key_def)) {
> -				key_def->tuple_extract_key =
> -					tuple_extract_key_slowpath<true, true,
> -								   true>;
> +				if (key_def->has_json_paths) {
> +					key_def->tuple_extract_key =
> +						tuple_extract_key_slowpath<true,
> +									   true,
> +									   false>;
> +				} else {
> +					key_def->tuple_extract_key =
> +						tuple_extract_key_slowpath<true,
> +									   true,
> +									   true>;
> +				}
>   			} else {
> -				key_def->tuple_extract_key =
> -					tuple_extract_key_slowpath<false, true,
> -								   true>;
> +				if (key_def->has_json_paths) {
> +					key_def->tuple_extract_key =
> +						tuple_extract_key_slowpath<false,
> +									   true,
> +									   false>;
> +				} else {
> +					key_def->tuple_extract_key =
> +						tuple_extract_key_slowpath<false,
> +									   true,
> +									   true>;
> +				}
>   			}
>   		} else {
>   			if (key_def_contains_sequential_parts(key_def)) {
> -				key_def->tuple_extract_key =
> -					tuple_extract_key_slowpath<true, false,
> -								   true>;
> +				if (key_def->has_json_paths) {
> +					key_def->tuple_extract_key =
> +						tuple_extract_key_slowpath<true,
> +									   false,
> +									   false>;
> +				} else {
> +					key_def->tuple_extract_key =
> +						tuple_extract_key_slowpath<true,
> +									   false,
> +									   true>;
> +				}
>   			} else {
> -				key_def->tuple_extract_key =
> -					tuple_extract_key_slowpath<false, false,
> -								   true>;
> +				if (key_def->has_json_paths) {
> +					key_def->tuple_extract_key =
> +						tuple_extract_key_slowpath<false,
> +									   false,
> +									   false>;
> +				} else {
> +					key_def->tuple_extract_key =
> +						tuple_extract_key_slowpath<false,
> +									   false,
> +									   true>;
> +				}
>   			}
>   		}

11. First, it is part of the previous patch. Second, it has run
too far. It is time to declare a matrix of functions, maybe two-
or three-dimensional, whose indexes are calculated from
contains_sequential_parts, has_optional_parts and is_flat. See
field_type1_contains_type2 for an example.

The same goes for other functions having >= 3 template parameters.

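I mean something like this (an untested sketch; I assume
tuple_extract_key_t is the function pointer typedef from
key_def.h):

static const tuple_extract_key_t extract_key_funcs[] = {
	tuple_extract_key_slowpath<false, false, false>,
	tuple_extract_key_slowpath<false, false, true>,
	tuple_extract_key_slowpath<false, true, false>,
	tuple_extract_key_slowpath<false, true, true>,
	tuple_extract_key_slowpath<true, false, false>,
	tuple_extract_key_slowpath<true, false, true>,
	tuple_extract_key_slowpath<true, true, false>,
	tuple_extract_key_slowpath<true, true, true>,
};

Then the whole if-else forest collapses into:

	int idx = (key_def_contains_sequential_parts(key_def) ? 4 : 0) +
		  (key_def->has_optional_parts ? 2 : 0) +
		  (key_def->has_json_paths ? 0 : 1);
	key_def->tuple_extract_key = extract_key_funcs[idx];
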
>   	}
>   	if (key_def->has_optional_parts) {
>   		assert(key_def->is_nullable);
> -		key_def->tuple_extract_key_raw =
> -			tuple_extract_key_slowpath_raw<true>;
> +		if (key_def->has_json_paths) {
> +			key_def->tuple_extract_key_raw =
> +				tuple_extract_key_slowpath_raw<true, false>;
> +		} else {
> +			key_def->tuple_extract_key_raw =
> +				tuple_extract_key_slowpath_raw<true, true>;
> +		}
>   	} else {
> -		key_def->tuple_extract_key_raw =
> -			tuple_extract_key_slowpath_raw<false>;
> +		if (key_def->has_json_paths) {
> +			key_def->tuple_extract_key_raw =
> +				tuple_extract_key_slowpath_raw<false, false>;
> +		} else {
> +			key_def->tuple_extract_key_raw =
> +				tuple_extract_key_slowpath_raw<false, true>;
> +		}
>   	}
>   }
> diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
> index a9fddc0..1821950 100644
> --- a/src/box/tuple_format.c
> +++ b/src/box/tuple_format.c
> @@ -38,9 +39,336 @@ static intptr_t recycled_format_ids = FORMAT_ID_NIL;
>   static uint32_t formats_size = 0, formats_capacity = 0;
>   
>   static const struct tuple_field tuple_field_default = {
> -	FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false, false,
> +	FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false, false, {{NULL, 0}}
>   };
>   
> +struct mh_strnptr_node_t *
> +json_path_hash_get(struct mh_strnptr_t *hashtable, const char *path,
> +		   uint32_t path_len, uint32_t path_hash)

12. Why do you return node_t if you always use only node->val?
Why not return val?

> +{
> +	assert(hashtable != NULL);
> +	struct mh_strnptr_key_t key = {path, path_len, path_hash};
> +	mh_int_t rc = mh_strnptr_find(hashtable, &key, NULL);
> +	if (rc == mh_end(hashtable))
> +		return NULL;
> +	return mh_strnptr_node(hashtable, rc);
> +}
> +
> +/**
> + * Create a new hashtable object.
> + * @param[out] hashtable pointer to object to create.
> + * @param records count of records to reserve.
> + * @retval -1 on error.
> + * @retval 0 on success.
> + */
> +static int
> +json_path_hash_create(struct mh_strnptr_t **hashtable, uint32_t records)

13. Why so complicated? Just rename it to new and return the hash
instead of int.

> +{
> +	struct mh_strnptr_t *ret = mh_strnptr_new();
> +	if (ret == NULL) {
> +		diag_set(OutOfMemory, sizeof(*hashtable), "mh_strnptr_new",
> +			 "hashtable");
> +		return -1;
> +	}
> +	if (mh_strnptr_reserve(ret, records, NULL) != 0) {
> +		mh_strnptr_delete(ret);
> +		diag_set(OutOfMemory, records, "mh_strnptr_reserve",
> +			 "hashtable");
> +		return -1;
> +	}
> +	*hashtable = ret;
> +	return 0;
> +}
> +/**

14. Please, put an empty line between the functions.

> + * Delete @hashtable object.
> + * @param hashtable pointer to object to delete.
> + */
> +static void
> +json_path_hash_delete(struct mh_strnptr_t *hashtable)
> +{
> +	if (hashtable == NULL)
> +		return;

15. Do not use SQLite style of freeing objects, please.

> +	while (mh_size(hashtable)) {
> +		mh_int_t n = mh_first(hashtable);
> +		mh_strnptr_del(hashtable, n, NULL);
> +	}
> +	mh_strnptr_delete(hashtable);
> +}
> +
> +/**
> + * Insert a new record to hashtable.
> + * @param hashtable storage to insert new record.
> + * @param path string.
> + * @param path_len length of @path.
> + * @param tuple_field value to store in @hashtable.
> + * @retval -1 on error.
> + * @retval 0 on success.
> + */
> +static int
> +json_path_hash_insert(struct mh_strnptr_t *hashtable, const char *path,
> +		      uint32_t path_len, struct tuple_field *field)
> +{
> +	assert(hashtable != NULL);
> +	/* Test if record already present in hash. */

16. How is it possible that the record exists already? And
where do you check it here?

> +	uint32_t path_hash = mh_strn_hash(path, path_len);
> +	struct mh_strnptr_node_t name_node =
> +		{path, path_len, path_hash, field};

17. Fits in one line.

> +	mh_int_t rc = mh_strnptr_put(hashtable, &name_node, NULL, NULL);
> +	if (rc == mh_end(hashtable)) {
> +		diag_set(OutOfMemory, sizeof(*hashtable), "mh_strnptr_put",
> +			"hashtable");
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +/**
> + * Construct field tree level for JSON path part.
> + *
> + * @param[in, out] tuple_field pointer to record to start with
> + *                 would be changed to record that math
> + *                 @part lexeme.
> + * @param fieldno number of root space field.
> + * @param part JSON path lexeme to represent in field tree.
> + * @retval -1 on error.
> + * @retval 0 on success.
> + */
> +static int
> +json_field_tree_append(struct tuple_field **field_subtree, uint32_t fieldno,
> +		       struct json_path_node *part)
> +{
> +	enum field_type type;
> +	struct tuple_field *field = *field_subtree;
> +	switch (part->type) {
> +	case JSON_PATH_NUM: {
> +		type = FIELD_TYPE_ARRAY;
> +		if (field->type != FIELD_TYPE_ANY && field->type != type)
> +			goto error_type_mismatch;
> +		field->type = type;
> +		/* Create or resize field array if required. */
> +		if (field->array == NULL || part->num >= field->array_size) {
> +			struct tuple_field **array =
> +				realloc(field->array,
> +					part->num * sizeof(struct tuple_field *));

18. Out of 80. Use sizeof(array[0]). The same below.

> +			if (array == NULL) {
> +				diag_set(OutOfMemory,
> +					sizeof(struct tuple_field *), "realloc",
> +					"array");

19. Size here is not correct. You forgot about * part->num.

> +				return -1;
> +			}
> +			if (field->array == NULL) {
> +				memset(array, 0, part->num *
> +					sizeof(struct tuple_field *));
> +			} else {
> +				memset(&array[field->array_size], 0,
> +				       (part->num - field->array_size) *
> +				       sizeof(struct tuple_field *));
> +			}

20. Why are these cases different? If I substitute 0 for
field->array_size (which is its value when field->array == NULL),
I get the first branch, don't I?

> +			field->array = array;
> +			field->array_size = part->num;
> +		}
> +		/* Record already exists. No actions required */

21. I think it is rather an 'else' branch. Evidently, the condition
is unreachable if the field was just created in the code above.

> +		if (field->array[part->num - TUPLE_INDEX_BASE] != NULL) {
> +			*field_subtree =
> +				field->array[part->num - TUPLE_INDEX_BASE];
> +			return 0;
> +		}
> +		break;
> +	}
> +	case JSON_PATH_STR: {
> +		type = FIELD_TYPE_MAP;
> +		if (field->type != FIELD_TYPE_ANY && field->type != type)
> +			goto error_type_mismatch;
> +		field->type = type;
> +		if (field->map == NULL &&
> +		    json_path_hash_create(&field->map, 1) != 0)
> +			return -1;

22. If field->map == NULL, then surely you do not need
to check node existence below. Please, split those two
cases into two branches.

> +		struct mh_strnptr_node_t *node =
> +			json_path_hash_get(field->map, part->str, part->len,
> +					   mh_strn_hash(part->str, part->len));
> +		if (node != NULL) {
> +			*field_subtree = node->val;
> +			return 0;
> +		}
> +		break;
> +	}
> +	default:
> +		unreachable();
> +	}
> +
> +	/* Construct and insert a new record. */
> +	struct tuple_field *new_field = malloc(sizeof(struct tuple_field));
> +	if (new_field == NULL) {
> +		diag_set(OutOfMemory, sizeof(struct tuple_field), "malloc",
> +			"new_field");
> +		return -1;
> +	}
> +	*new_field = tuple_field_default;
> +	if (field->type == FIELD_TYPE_MAP) {
> +		if (json_path_hash_insert(field->map, part->str, part->len,
> +					  new_field) != 0) {
> +			free(new_field);
> +			return -1;
> +		}
> +	} else if (field->type == FIELD_TYPE_ARRAY) {
> +		field->array[part->num - TUPLE_INDEX_BASE] = new_field;
> +	}
> +	*field_subtree = new_field;
> +
> +	return 0;
> +
> +error_type_mismatch:
> +	diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH,
> +		tt_sprintf("%d", fieldno + TUPLE_INDEX_BASE),
> +		field_type_strs[type], field_type_strs[field->type]);
> +	return -1;
> +}
> +
> +/**
> + * Delete @field_subtree object.
> + * @param field_subtree to delete.
> + */
> +static void
> +json_field_tree_delete(struct tuple_field *field_subtree)
> +{
> +	if (field_subtree->type == FIELD_TYPE_MAP &&
> +	    field_subtree->map != NULL) {
> +		mh_int_t i;
> +		mh_foreach(field_subtree->map, i) {
> +			struct tuple_field *field =
> +				mh_strnptr_node(field_subtree->map, i)->val;
> +			if (field == NULL)
> +				continue;

23. How can it be NULL?

> +			json_field_tree_delete(field);
> +			free(field);
> +		}
> +		json_path_hash_delete(field_subtree->map);
> +	} else if (field_subtree->type == FIELD_TYPE_ARRAY &&
> +		   field_subtree->array != NULL) {
> +		for (uint32_t i = 0; i < field_subtree->array_size; i++) {
> +			struct tuple_field *field = field_subtree->array[i];
> +			if (field == NULL)
> +				continue;
> +			json_field_tree_delete(field_subtree->array[i]);
> +			free(field_subtree->array[i]);
> +		}
> +		free(field_subtree->array);
> +	}
> +}
> +
> +int
> +json_field_tree_exec_routine(const struct tuple_field *field, uint32_t idx,
> +			     const char *tuple, const char *offset,
> +			     json_field_tree_routine routine, void *routine_ctx)

24. I got the idea and I like it - have a walker applicable
for any purpose. But it calls a function by pointer on a hot
path - initialization of a tuple field map + validation.

Let's make it a macro. I know that a macro can not be recursive,
but we can write a generator of such a walker: a macro taking a
function and generating a new json_field_tree_exec_routine.

For example:
#define TUPLE_FIELD_TREE_WALKER(name, function) \
// json_field_tree_exec_routine declaration and code
// under the specified name.

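Expanded, the generator could look like this (an untested sketch;
it is just your walker body with the callback substituted at
compile time and the recursion going through the generated name):

#define TUPLE_FIELD_TREE_WALKER(name, function)				\
static int								\
name(const struct tuple_field *field, uint32_t idx,			\
     const char *tuple, const char *offset, void *ctx)			\
{									\
	if (field->type == FIELD_TYPE_MAP) {				\
		mh_int_t i;						\
		mh_foreach(field->map, i) {				\
			struct mh_strnptr_node_t *node =		\
				mh_strnptr_node(field->map, i);		\
			const char *raw = offset;			\
			if (tuple_field_go_to_key(&raw, node->str,	\
						  node->len) != 0) {	\
				diag_set(ClientError,			\
					 ER_DATA_MISMATCH_INDEX_PART);	\
				return -1;				\
			}						\
			if (name(node->val, idx, tuple, raw, ctx) != 0)	\
				return -1;				\
		}							\
		return 0;						\
	} else if (field->type == FIELD_TYPE_ARRAY) {			\
		uint32_t count = mp_decode_array(&offset);		\
		if (count < field->array_size) {			\
			diag_set(ClientError,				\
				 ER_DATA_MISMATCH_INDEX_PART);		\
			return -1;					\
		}							\
		for (uint32_t i = 0; i < field->array_size;		\
		     i++, mp_next(&offset)) {				\
			if (field->array[i] != NULL &&			\
			    name(field->array[i], idx, tuple,		\
				 offset, ctx) != 0)			\
				return -1;				\
		}							\
		return 0;						\
	}								\
	return function(field, idx, tuple, offset, ctx);		\
}

Usage: TUPLE_FIELD_TREE_WALKER(tuple_validate_tree_walk,
tuple_validate_json_path_data) generates a dedicated walker, so
the compiler can inline the callback.
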
> +{
> +	int rc = 0;
> +	if (field->type == FIELD_TYPE_MAP) {
> +		mh_int_t i;
> +		mh_foreach(field->map, i) {
> +			struct mh_strnptr_node_t *node =
> +				mh_strnptr_node(field->map, i);
> +			const char *raw = offset;
> +			if (tuple_field_go_to_key(&raw, node->str,
> +						  node->len) != 0) {
> +				diag_set(ClientError,
> +					 ER_DATA_MISMATCH_INDEX_PART);
> +				return -1;
> +			}
> +			if (json_field_tree_exec_routine(node->val, idx,
> +							 tuple, raw, routine,
> +							 routine_ctx) != 0)
> +				return -1;
> +		}
> +	} else if (field->type == FIELD_TYPE_ARRAY) {
> +		assert(mp_typeof(*offset) == MP_ARRAY);
> +		uint32_t count = mp_decode_array(&offset);
> +		if (count < field->array_size) {
> +			diag_set(ClientError, ER_DATA_MISMATCH_INDEX_PART);
> +			return -1;
> +		}
> +		for (uint32_t i = 0; i < field->array_size;
> +		     i++, mp_next(&offset)) {
> +			if (field->array[i] == NULL)
> +				continue;
> +			if (json_field_tree_exec_routine(field->array[i], idx,
> +							 tuple, offset, routine,
> +							 routine_ctx) != 0)
> +				return -1;
> +		}

25. Talking about one of my first comments (how to iterate over
the field only once when possible): here, for an array, you can
merely continue the iteration, calling mp_next() beyond
field->array_size up to count.

For a map things are more complex, and you can apply the method
I described in the previous review. You learn the map size, and
on each tuple_field_go_to_key you remember the maximal number and
position of the decoded key-value pair. When the iteration is
finished, you decode the tail starting from the maximal decoded
key-value pair until the end. In such a case the tail is always
decoded only once.

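For the array case that could look like this (untested; I pass a
hypothetical 'field_end' out parameter so the caller gets the
position right behind the field and does not have to decode it
again):

	uint32_t count = mp_decode_array(&offset);
	if (count < field->array_size) {
		diag_set(ClientError, ER_DATA_MISMATCH_INDEX_PART);
		return -1;
	}
	uint32_t i = 0;
	for (; i < field->array_size; i++, mp_next(&offset)) {
		if (field->array[i] == NULL)
			continue;
		if (json_field_tree_exec_routine(field->array[i], idx,
						 tuple, offset, routine,
						 routine_ctx) != 0)
			return -1;
	}
	/* Decode the rest of the array in the same pass. */
	for (; i < count; i++)
		mp_next(&offset);
	*field_end = offset;
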
> +	} else {
> +		rc = routine(field, idx, tuple, offset, routine_ctx);
> +	}
> +	return rc;
> +}
> +
> +/**
> + * Add new JSON @path to @format.
> + * @param format to modify.
> + * @param path string to add.
> + * @param path_len length of @path.
> + * @param field_type type of field by @path.

26. No such parameter 'field_type'. Only 'type'.

> + * @param[out] leaf_field pointer to leaf field.
> + * @retval -1 on error.
> + * @retval 0 on success.
> + */
> +static int
> +tuple_format_add_json_path(struct tuple_format *format, const char *path,
> +			   uint32_t path_len, enum field_type type,
> +			   struct tuple_field **leaf_field)
> +{
> +	assert(format->path_hash != NULL);
> +	/*
> +	 * Get root field by index.
> +	 * Path is specified in canonical form: [i]...
> +	 */
> +	int rc = 0;

27. Do not use the SQLite style of variable declaration. The
variable can be declared and assigned at the same time below.

> +	struct json_path_parser parser;
> +	struct json_path_node node;
> +	json_path_parser_create(&parser, path, path_len);
> +	rc = json_path_next(&parser, &node);
> +	assert(rc == 0 && node.type == JSON_PATH_NUM);
> +	assert(node.num < format->field_count + 1);
> +
> +	/* Test if path is already registered. */
> +	struct mh_strnptr_node_t *leaf_node = NULL;
> +	uint32_t hash = mh_strn_hash(path, path_len);
> +	if ((leaf_node = json_path_hash_get(format->path_hash, path,
> +					    path_len, hash)) != NULL) {
> +		struct tuple_field *field = leaf_node->val;
> +		if (field->type != type) {
> +			const char *err =
> +				tt_sprintf("JSON path '%.*s' has been already "
> +					   "constructed for '%s' leaf record",
> +					   path_len, path,
> +					   field_type_strs[field->type]);

28. I do not see a test for this. Please, use gcov to be sure that
test coverage of all the new code is 100% (except OOM errors).

29. What about nullability mismatch?

> +			diag_set(ClientError,  ER_WRONG_INDEX_OPTIONS,
> +				node.num, err);
> +			return -1;
> +		}
> +		*leaf_field = field;
> +		return 0;
> +	}
> +
> +	/* Build data path tree. */
> +	uint32_t root_fieldno = node.num - TUPLE_INDEX_BASE;
> +	struct tuple_field *field = &format->fields[root_fieldno];
> +	while ((rc = json_path_next(&parser, &node)) == 0 &&
> +		node.type != JSON_PATH_END) {
> +		if (json_field_tree_append(&field, root_fieldno, &node) != 0)
> +			return -1;
> +	}
> +	assert(rc == 0 && node.type == JSON_PATH_END);
> +
> +	/* Leaf record is a new object as JSON path unique. */
> +	field->type = type;
> +	if (json_path_hash_insert(format->path_hash, path, path_len,
> +				  field) != 0)
> +		return -1;
> +
> +	*leaf_field = field;
> +	return 0;

30. Please, consider this:

  	field->type = type;
-	if (json_path_hash_insert(format->path_hash, path, path_len,
-				  field) != 0)
-		return -1;
-
  	*leaf_field = field;
-	return 0;
+	return json_path_hash_insert(format->path_hash, path, path_len, field);
  }

> +}
> +
>   /**
>    * Extract all available type info from keys and field
>    * definitions.
> @@ -101,10 +434,12 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
>   			 * used in tuple_format.
>   			 */
>   			if (field_type1_contains_type2(field->type,
> -						       part->type)) {
> +						       part->type) &&
> +			    part->path == NULL) {
>   				field->type = part->type;
>   			} else if (! field_type1_contains_type2(part->type,
> -								field->type)) {
> +								field->type) &&
> +				   part->path == NULL) {
>   				const char *name;
>   				int fieldno = part->fieldno + TUPLE_INDEX_BASE;
>   				if (part->fieldno >= field_count) {
> @@ -131,9 +466,22 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
>   			 * First field is always simply accessible,
>   			 * so we don't store an offset for it.
>   			 */
> -			if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL &&
> +			if (part->path != NULL) {
> +				assert(is_sequential == false);
> +				memcpy(data, part->path, part->path_len);
> +				data[part->path_len] = '\0';
> +				struct tuple_field *leaf = NULL;
> +				if (tuple_format_add_json_path(format, data,
> +							       part->path_len,
> +							       part->type,
> +							       &leaf) != 0)
> +					return -1;
> +				assert(leaf != NULL);
> +				if (leaf->offset_slot == TUPLE_OFFSET_SLOT_NIL)
> +					leaf->offset_slot = --current_slot;
> +				data += part->path_len + 1;
> +			} else if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL &&
>   			    is_sequential == false && part->fieldno > 0) {
> -
>   				field->offset_slot = --current_slot;
>   			}

31. The aforementioned two hunks are a complete mess. Please,
move the 'is_key_part = true' before them. Then after it at
first check for 'if (part->path != NULL) { ...; continue; }'.
Then remove the checks for 'path == NULL'.
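
I.e. the loop body becomes something like this (a control-flow
sketch only, the JSON branch body is elided):

	field->is_key_part = true;
	if (part->path != NULL) {
		/* Register the JSON path, allocate a slot... */
		continue;
	}
	if (field_type1_contains_type2(field->type, part->type)) {
		field->type = part->type;
	} else if (! field_type1_contains_type2(part->type,
						field->type)) {
		/* The type mismatch error, as before. */
	}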

>   		}
> @@ -201,20 +549,26 @@ static struct tuple_format *
>   tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
>   		   uint32_t space_field_count, struct tuple_dictionary *dict)
>   {
> +	size_t extra_size = 0;
>   	uint32_t index_field_count = 0;
> +	uint32_t json_path_count = 0;
>   	/* find max max field no */
>   	for (uint16_t key_no = 0; key_no < key_count; ++key_no) {
>   		const struct key_def *key_def = keys[key_no];
>   		const struct key_part *part = key_def->parts;
>   		const struct key_part *pend = part + key_def->part_count;
>   		for (; part < pend; part++) {
> +			if (part->path != NULL) {
> +				json_path_count++;
> +				extra_size += part->path_len + 1;

32. When some indexes have common fields, you count
and allocate their paths twice.
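
One way to avoid that is to remember the already counted paths,
e.g. in a temporary hash ('paths_seen' is my name), reusing the
patch's own helpers:

	if (part->path != NULL &&
	    json_path_hash_get(paths_seen, part->path,
			       part->path_len,
			       part->path_hash) == NULL) {
		if (json_path_hash_insert(paths_seen, part->path,
					  part->path_len, NULL) != 0)
			return NULL;
		json_path_count++;
		extra_size += part->path_len + 1;
	}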

> +			}
>   			index_field_count = MAX(index_field_count,
>   						part->fieldno + 1);
>   		}
>   	}
>   	uint32_t field_count = MAX(space_field_count, index_field_count);
>   	uint32_t total = sizeof(struct tuple_format) +
> -			 field_count * sizeof(struct tuple_field);
> +			 field_count * sizeof(struct tuple_field) + extra_size;
>   
>   	struct tuple_format *format = (struct tuple_format *) malloc(total);
>   	if (format == NULL) {
> @@ -335,21 +697,75 @@ tuple_format_dup(struct tuple_format *src)
>   {
>   	uint32_t total = sizeof(struct tuple_format) +
>   			 src->field_count * sizeof(struct tuple_field);
> +	if (src->path_hash != NULL) {
> +		mh_int_t i;
> +		mh_foreach(src->path_hash, i)
> +			total += mh_strnptr_node(src->path_hash, i)->len + 1;
> +	}
>   	struct tuple_format *format = (struct tuple_format *) malloc(total);
>   	if (format == NULL) {
>   		diag_set(OutOfMemory, total, "malloc", "tuple format");
>   		return NULL;
>   	}
>   	memcpy(format, src, total);
> +
> +	/* Fill with NULLs for normal destruction on error. */
> +	format->path_hash = NULL;
> +	for (uint32_t i = 0; i < format->field_count; i++) {
> +		format->fields[i].array = NULL;
> +		format->fields[i].array_size = 0;

33. Just memset them.

> +	}
> +	if (src->path_hash != NULL) {
> +		mh_int_t i;
> +		if (json_path_hash_create(&format->path_hash,
> +					  mh_size(src->path_hash)) != 0)
> +			goto error;
> +		mh_foreach(src->path_hash, i) {
> +			struct mh_strnptr_node_t *node =
> +				mh_strnptr_node(src->path_hash, i);
> +			/* Path data has been already copied. */
> +			char *path = (char *)format + (node->str - (char *)src);
> +			/* Store source leaf field offset_slot.  */
> +			struct tuple_field *leaf_field = node->val;
> +			int32_t offset_slot = leaf_field->offset_slot;
> +			if (tuple_format_add_json_path(format, path, node->len,
> +						       leaf_field->type,
> +						       &leaf_field) != 0)
> +				goto error;
> +			/* Store offset_slot in a new leaf record. */
> +			assert(leaf_field != NULL);
> +			leaf_field->offset_slot = offset_slot;
> +		}
> +	}
>   	tuple_dictionary_ref(format->dict);
>   	format->id = FORMAT_ID_NIL;
>   	format->refs = 0;
> -	if (tuple_format_register(format) != 0) {
> -		tuple_format_destroy(format);
> -		free(format);
> -		return NULL;
> -	}
> +	if (tuple_format_register(format) != 0)
> +		goto error;
>   	return format;

34. Consider this:

         format->id = FORMAT_ID_NIL;
         format->refs = 0;
-       if (tuple_format_register(format) != 0)
-               goto error;
-       return format;
+       if (tuple_format_register(format) == 0)
+               return format;
  error:
         tuple_format_destroy(format);


> +error:
> +	tuple_format_destroy(format);
> +	free(format);
> +	return NULL;
> +}
> @@ -378,18 +794,14 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
>   		return -1;
>   	}
>   
> -	/* first field is simply accessible, so we do not store offset to it */
> -	enum mp_type mp_type = mp_typeof(*pos);
> +	/*
> +	 * First field is simply accessible, store offset to it
> +	 * only for JSON path.
> +	 */
> +	uint32_t i = 0;
> +	enum mp_type mp_type;
>   	const struct tuple_field *field = &format->fields[0];
> -	if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
> -				 TUPLE_INDEX_BASE, field->is_nullable))
> -		return -1;
> -	mp_next(&pos);
> -	/* other fields...*/
> -	++field;
> -	uint32_t i = 1;
> -	uint32_t defined_field_count = MIN(field_count, format->field_count);
> -	if (field_count < format->index_field_count) {
> +	if (field_count < format->index_field_count || field->map != NULL) {

35. Why do you check for map != NULL here, but in all other
places for array != NULL?

>   		/*
>   		 * Nullify field map to be able to detect by 0,
>   		 * which key fields are absent in tuple_field().
> @@ -397,6 +809,16 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
>   		memset((char *)field_map - format->field_map_size, 0,
>   		       format->field_map_size);
>   	}
> +	if (field->map == NULL) {
> +		mp_type = mp_typeof(*pos);
> +		if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
> +					 TUPLE_INDEX_BASE, field->is_nullable))
> +			return -1;
> +		mp_next(&pos);
> +		++field;
> +		++i;
> +	}
> +	uint32_t defined_field_count = MIN(field_count, format->field_count);
>   	for (; i < defined_field_count; ++i, ++field) {
>   		mp_type = mp_typeof(*pos);
>   		if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
> @@ -407,6 +829,14 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
>   			field_map[field->offset_slot] =
>   				(uint32_t) (pos - tuple);
>   		}
> +		if (field->map != NULL) {

36. If this is true, then the previous 'if' is unreachable
(field->offset_slot != TUPLE_OFFSET_SLOT_NIL). A complex field
cannot have an offset slot. So this is an 'else' branch, I guess.

> +			assert(field->array != NULL);
> +			json_field_tree_routine func =
> +				tuple_init_json_field_map_routine;
> +			if (json_field_tree_exec_routine(field, i, tuple, pos,
> +							 func, field_map) != 0)
> +				return -1;
> +		}
>   		mp_next(&pos);

37. After you fix my other comments, this mp_next should
not be called if the JSON tree walker was used.

>   	}
>   	return 0;
> @@ -467,15 +897,60 @@ box_tuple_format_unref(box_tuple_format_t *format)
>   	tuple_format_unref(format);
>   }
>   
> -/**
> - * Propagate @a field to MessagePack(field)[index].
> - * @param[in][out] field Field to propagate.
> - * @param index 1-based index to propagate to.
> - *
> - * @retval  0 Success, the index was found.
> - * @retval -1 Not found.
> - */
> -static inline int
> +const char *
> +tuple_field_by_part(const struct tuple_format *format, const char *data,
> +		    const uint32_t *field_map, struct key_part *part)
> +{
> +	const char *raw = NULL;
> +	uint32_t field_no = part->fieldno;

38. Just inline it.

> +	struct mh_strnptr_node_t *node;
> +	if (unlikely(part->path == NULL)) {

39. On the contrary, it is likely.

> +		raw = tuple_field_raw(format, data, field_map, field_no);

40. Do 'return' here and reduce the indentation level below.

> +	} else {
> +		int32_t offset_slot = TUPLE_OFFSET_SLOT_NIL;
> +		if (part->format_epoch == format->epoch &&
> +		    -part->slot_cache * sizeof(uint32_t) <=
> +		    format->field_map_size) {

41. As I understand, when a part is JSON and its epoch
matches the format, it always has a valid slot_cache and you
do not need to check for '< field_map_size'.

Moreover, I do not understand why you check that this slot
is not SLOT_NIL and has a non-zero value in field_map. It
is 100% valid and you can already do
'return data + field_map[part->slot_cache]', can you not?

> +			offset_slot = part->slot_cache;
> +		} else if (format->path_hash != NULL &&
> +			   (node = json_path_hash_get(format->path_hash,
> +						      part->path,
> +						      part->path_len,
> +						      part->path_hash)) !=
> +						      NULL) {
> +			assert(node != NULL);
> +			struct tuple_field *field = node->val;
> +			assert(field != NULL);
> +			offset_slot = field->offset_slot;

42. The same here. When an offset slot is stored in a field
from format->path_hash, it always has a non-zero field_map
value and a non-SLOT_NIL offset. So the branch below is
actually 'else', I guess.

> +		}
> +		if (unlikely(offset_slot == TUPLE_OFFSET_SLOT_NIL ||
> +			     field_map[offset_slot] == 0)) {
> +			/*
> +			 * Legacy tuple having no field map
> +			 * for JSON index.
> +			 */
> +			uint32_t path_hash =
> +				field_name_hash(part->path, part->path_len);
> +			if (tuple_field_raw_by_path(format, data, field_map,
> +						    part->path, part->path_len,
> +						    path_hash, &raw) != 0)
> +				raw = NULL;
> +		} else {
> +			assert(offset_slot < 0);
> +			assert(-offset_slot * sizeof(uint32_t) <=
> +			       format->field_map_size);
> +			/* Cache offset_slot if required. */
> +			if (part->format_epoch < format->epoch) {
> +				part->slot_cache = offset_slot;
> +				part->format_epoch = format->epoch;
> +			}
> +			raw = data + field_map[offset_slot];
> +		}
> +	}
> +	return raw;
> +}
> +
> +int
>   tuple_field_go_to_index(const char **field, uint64_t index)
>   {
>   	enum mp_type type = mp_typeof(**field);
> @@ -547,21 +1013,32 @@ tuple_field_go_to_key(const char **field, const char *key, int len)
>   	return -1;
>   }
>   
> -const char *
> -tuple_field_by_part(const struct tuple_format *format, const char *data,
> -		    const uint32_t *field_map, struct key_part *part)
> -{
> -	return tuple_field_raw(format, data, field_map, part->fieldno);
> -}

43. Please, move this function above within its original commit.

> diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
> index a989917..1348f0d 100644
> --- a/src/box/tuple_format.h
> +++ b/src/box/tuple_format.h
> @@ -108,6 +108,18 @@ struct tuple_field {
>   	bool is_key_part;
>   	/** True, if a field can store NULL. */
>   	bool is_nullable;
> +	/** Tree child records. */
> +	union {
> +		/** Array of fields. */
> +		struct  {
> +			struct tuple_field **array;
> +			uint32_t array_size;
> +		};
> +		/** Hashtable: path -> tuple_field. */
> +		struct mh_strnptr_t *map;
> +		/** Leaf argument for tree-walker routine. */
> +		void *arg;

44. Actually, it is not used in the tree-walker.

> +	};

45. I see how you struggle with checking that the
field has no children, so I propose to add a member
'void *child' to the union and check it for NULL. It is
better than explicitly checking array or map in places
where the distinction does not matter.
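
I.e. something like this (a sketch; 'child' aliases the first
pointer of either variant):

	/** Tree child records. */
	union {
		/** Array of fields. */
		struct {
			struct tuple_field **array;
			uint32_t array_size;
		};
		/** Hashtable: path -> tuple_field. */
		struct mh_strnptr_t *map;
		/**
		 * Non-NULL iff the field has children of
		 * either kind.
		 */
		void *child;
	};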

>   };
>   
>   /**
> diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc
> index ae0bb8e..5f3357a 100644
> --- a/src/box/tuple_hash.cc
> +++ b/src/box/tuple_hash.cc
> @@ -228,7 +228,7 @@ key_hash_slowpath(const char *key, const struct key_def *key_def);
>   
>   void
>   tuple_hash_func_set(struct key_def *key_def) {
> -	if (key_def->is_nullable)
> +	if (key_def->is_nullable || key_def->has_json_paths)
>   		goto slowpath;
>   	/*
>   	 * Check that key_def defines sequential a key without holes
> @@ -262,10 +262,17 @@ tuple_hash_func_set(struct key_def *key_def) {
>   	}
>   
>   slowpath:
> -	if (key_def->has_optional_parts)
> -		key_def->tuple_hash = tuple_hash_slowpath<true, true>;
> -	else
> -		key_def->tuple_hash = tuple_hash_slowpath<false, true>;
> +	if (key_def->has_optional_parts) {
> +		if (key_def->has_json_paths)
> +			key_def->tuple_hash = tuple_hash_slowpath<true, false>;
> +		else
> +			key_def->tuple_hash = tuple_hash_slowpath<true, true>;
> +	} else {
> +		if (key_def->has_json_paths)
> +			key_def->tuple_hash = tuple_hash_slowpath<false, false>;
> +		else
> +			key_def->tuple_hash = tuple_hash_slowpath<false, true>;
> +	}
>   	key_def->key_hash = key_hash_slowpath;
>   }

46. All these changes should be done in the previous commit.

>   
> diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
> index cb3c436..1bb5e22 100644
> --- a/src/box/vy_lsm.c
> +++ b/src/box/vy_lsm.c
> @@ -158,6 +159,48 @@ vy_lsm_new(struct vy_lsm_env *lsm_env, struct vy_cache_env *cache_env,
>   						    NULL);
>   		if (lsm->disk_format == NULL)
>   			goto fail_format;
> +		/*
> +		 * Tuple formats should be compatible to make
> +		 * epoch-based caching work.
> +		 */
> +		int32_t min_offset_slot = 0;
> +		struct tuple_field *dst_fields = lsm->disk_format->fields;
> +		struct mh_strnptr_t *dst_ht = lsm->disk_format->path_hash;
> +		struct mh_strnptr_t *src_ht = format->path_hash;
> +		struct key_part *part = cmp_def->parts;
> +		struct key_part *part_end = part + cmp_def->part_count;
> +		for (; part < part_end; part++) {
> +			struct tuple_field *dst_field =
> +				&dst_fields[part->fieldno];
> +			struct tuple_field *src_field;
> +			if (dst_field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
> +				src_field = &format->fields[part->fieldno];
> +			} else if (dst_fields[part->fieldno].map != NULL) {

47. Why do you check for child != NULL instead of part->path != NULL?

> +				struct mh_strnptr_node_t *node;
> +				node = json_path_hash_get(dst_ht, part->path,
> +							  part->path_len,
> +							  part->path_hash);
> +				assert(node != NULL);
> +				dst_field = node->val;
> +				assert(dst_field != NULL);
> +
> +				node = json_path_hash_get(src_ht, part->path,
> +							  part->path_len,
> +							  part->path_hash);
> +				assert(node != NULL);
> +				src_field = node->val;
> +				assert(src_field != NULL);
> +			} else {
> +				continue;
> +			}
> +			if (src_field->offset_slot == TUPLE_OFFSET_SLOT_NIL)
> +				continue;
> +			dst_field->offset_slot = src_field->offset_slot;
> +			min_offset_slot =
> +				MIN(src_field->offset_slot, min_offset_slot);
> +		}
> +		lsm->disk_format->field_map_size =
> +			-min_offset_slot * sizeof(uint32_t);
>   	}
>   	tuple_format_ref(lsm->disk_format);
>   
> diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c
> index a4b7975..80f08bb 100644
> --- a/src/box/vy_stmt.c
> +++ b/src/box/vy_stmt.c
> @@ -321,6 +322,67 @@ vy_stmt_replace_from_upsert(const struct tuple *upsert)
>   	return replace;
>   }
>   
> +static void
> +vy_stmt_msgpack_build(struct tuple_field *field, char *tuple,
> +		      uint32_t *field_map, char **offset, bool write_data)

48. Why do you not use json_field_tree_exec_routine? Because you need
to do something before and after decoding the field? Then just implement
it as a macro as I said and add 2 parameters:
start_field_cb(field_type, field_size) and end_field_cb(field_type).
And if a _cb is not defined, its call is not generated.
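
Schematically (a sketch; the macro and callback names are mine,
and map handling is omitted for brevity):

/* When a _cb is NOP_CB, no real call is generated. */
#define NOP_CB(...)

#define JSON_FIELD_TREE_WALKER(name, start_field_cb, end_field_cb,	\
			       leaf_cb)					\
static int								\
name(struct tuple_field *field, char *tuple, uint32_t *field_map,	\
     char **offset, void *ctx)						\
{									\
	if (field->type == FIELD_TYPE_ARRAY) {				\
		start_field_cb(field->type, field->array_size);		\
		for (uint32_t i = 0; i < field->array_size; i++) {	\
			if (field->array[i] == NULL)			\
				continue;				\
			if (name(field->array[i], tuple, field_map,	\
				 offset, ctx) != 0)			\
				return -1;				\
		}							\
		end_field_cb(field->type);				\
		return 0;						\
	}								\
	return leaf_cb(field, tuple, field_map, offset, ctx);		\
}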

> +{
> +	if (field->type == FIELD_TYPE_ARRAY) {
> +		if (write_data)
> +			*offset = mp_encode_array(*offset, field->array_size);
> +		else
> +			*offset += mp_sizeof_array(field->array_size);
> +		for (uint32_t i = 0; i < field->array_size; i++) {
> +			if (field->array[i] == NULL) {
> +				if (write_data)
> +					*offset = mp_encode_nil(*offset);
> +				else
> +					*offset += mp_sizeof_nil();
> +				continue;
> +			}
> +			vy_stmt_msgpack_build(field->array[i], tuple, field_map,
> +					      offset, write_data);
> +		}
> +		return;
> +	} else if (field->type == FIELD_TYPE_MAP) {
> +		if (write_data)
> +			*offset = mp_encode_map(*offset, mh_size(field->map));
> +		else
> +			*offset += mp_sizeof_map(mh_size(field->map));
> +		mh_int_t i;
> +		mh_foreach(field->map, i) {
> +			struct mh_strnptr_node_t *node =
> +				mh_strnptr_node(field->map, i);
> +			assert(node);
> +			if (write_data) {
> +				*offset = mp_encode_str(*offset, node->str,
> +							node->len);
> +			} else {
> +				*offset += mp_sizeof_str(node->len);
> +			}
> +			vy_stmt_msgpack_build(node->val, tuple, field_map,
> +					      offset, write_data);
> +		}
> +		return;;

49. Double ';'.

> +	}
> +
> +	struct iovec *iov = field->arg;
> +	if (iov == NULL) {
> +		if (write_data)
> +			*offset = mp_encode_nil(*offset);
> +		else
> +			*offset += mp_sizeof_nil();
> +	} else {
> +		if (write_data) {
> +			uint32_t data_offset = *offset - tuple;
> +			memcpy(*offset, iov->iov_base, iov->iov_len);
> +			field->arg = NULL;
> +			if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL)
> +				field_map[field->offset_slot] = data_offset;
> +		}
> +		*offset += iov->iov_len;
> +	}
> +}
> +
>   static struct tuple *
>   vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
>   			       const struct key_def *cmp_def,

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 3/5] box: introduce path field in key_part
  2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 3/5] box: introduce path field " Kirill Shcherbatov
@ 2018-08-22  0:26   ` Vladislav Shpilevoy
  2018-08-27  7:37     ` Kirill Shcherbatov
  0 siblings, 1 reply; 16+ messages in thread
From: Vladislav Shpilevoy @ 2018-08-22  0:26 UTC (permalink / raw)
  To: tarantool-patches, Kirill Shcherbatov

Thanks for the patch! See 17 comments below.

On 15/08/2018 15:15, Kirill Shcherbatov wrote:
> As we need to store a user-defined JSON path in key_part
> and key_part_def, we have introduced the path and path_len
> fields. The JSON path is verified and transformed to
> canonical form on index msgpack unpack.
> The path string is stored as a part of the key_def allocation:
> 
> +-------+---------+-------+---------+-------+-------+-------+
> |key_def|key_part1|  ...  |key_partN| path1 | pathK | pathN |
> +-------+---------+-------+---------+-------+-------+-------+
>            |                         ^
>            |-> path _________________|
> 
> Because field names specified in the format could be changed,
> a key_part path persisted in Tarantool should always start
> with a first-level field access via an array index (not by name).
> 
> Part of #1012.
> ---
>   src/box/index_def.c          |  10 +-
>   src/box/key_def.c            | 276 ++++++++++++++++++++++++++++++++++++++-----
>   src/box/key_def.h            |  21 +++-
>   src/box/lua/space.cc         |   5 +
>   src/box/memtx_engine.c       |   5 +
>   src/box/schema.cc            |  12 +-
>   src/box/tuple_compare.cc     |   2 +
>   src/box/tuple_extract_key.cc |   1 +
>   src/box/tuple_hash.cc        |   1 +
>   src/box/vinyl.c              |   5 +
>   src/box/vy_log.c             |   3 +-
>   11 files changed, 300 insertions(+), 41 deletions(-)
> 
> diff --git a/src/box/key_def.c b/src/box/key_def.c
> index 8a4262b..b00e46d 100644
> --- a/src/box/key_def.c
> +++ b/src/box/key_def.c
> @@ -131,9 +155,9 @@ key_def_set_cmp(struct key_def *def)
>   }
>   
>   struct key_def *
> -key_def_new(uint32_t part_count)
> +key_def_new(uint32_t part_count, size_t extra_size)

1. Please, make extra_size also an argument of key_def_sizeof
and name it literally paths_size.

>   {
> -	size_t sz = key_def_sizeof(part_count);
> +	size_t sz = key_def_sizeof(part_count) + extra_size;
>   	/** Use calloc() to zero comparator function pointers. */
>   	struct key_def *key_def = (struct key_def *) calloc(1, sz);
>   	if (key_def == NULL) {
> @@ -241,6 +289,11 @@ key_part_cmp(const struct key_part *parts1, uint32_t part_count1,
>   		if (part1->is_nullable != part2->is_nullable)
>   			return part1->is_nullable <
>   			       part2->is_nullable ? -1 : 1;
> +		/* Lexicographic strings order. */
> +		uint32_t len = MIN(part1->path_len, part2->path_len);
> +		int rc = 0;
> +		if ((rc = memcmp(part1->path, part2->path, len)) != 0)

2. Same as strncmp and has the same bug: if a path is a prefix of
another path, then you declare them equal, but that is wrong.

Please, write a test on it. It should alter an index, extending
the path of one of its parts. In such a case the index should be
rebuilt, but your patch will not do it.
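
I.e. after the common-prefix comparison something like this is
needed (a sketch):

	uint32_t len = MIN(part1->path_len, part2->path_len);
	int rc = memcmp(part1->path, part2->path, len);
	if (rc != 0)
		return rc;
	if (part1->path_len != part2->path_len)
		return part1->path_len < part2->path_len ? -1 : 1;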

> +			return rc;
>   	}
>   	return part_count1 < part_count2 ? -1 : part_count1 > part_count2;
>   }
> @@ -248,11 +301,12 @@ key_part_cmp(const struct key_part *parts1, uint32_t part_count1,
>   void
>   key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
>   		 enum field_type type, bool is_nullable, struct coll *coll,
> -		 uint32_t coll_id)
> +		 uint32_t coll_id, char *path, uint32_t path_len)

3. Why not const char *path?

>   {
>   	assert(part_no < def->part_count);
>   	assert(type < field_type_MAX);
>   	def->is_nullable |= is_nullable;
> +	def->has_json_paths |= (path != NULL);

4. Please, be consistent in names: you already use 'is_flat'
in the previous patch, not 'has_json_paths'.

5. Why do you need additional parentheses?

>   	def->parts[part_no].is_nullable = is_nullable;
>   	def->parts[part_no].fieldno = fieldno;
>   	def->parts[part_no].type = type;
> @@ -432,10 +516,111 @@ key_def_decode_parts_166(struct key_part_def *parts, uint32_t part_count,
>   				     fields[part->fieldno].is_nullable :
>   				     key_part_def_default.is_nullable);
>   		part->coll_id = COLL_NONE;
> +		part->path = NULL;
>   	}
>   	return 0;
>   }
>   
> +/**
> + * Verify key_part JSON path and convert to canonical form.
> + *
> + * @param region to make allocations.
> + * @param part with path to update.
> + * @param path_extra alloated space to reuse if possible.

6. As usual. Start sentences with a capital letter, finish with the
dot.

7. Typo: 'alloated'.

> + * @param path_extra_size @path_extra size.
> + *
> + * @retval -1 on error.
> + * @retval 0 on success.
> + */
> +static int
> +key_def_normalize_json_path(struct region *region, struct key_part_def *part,
> +			    char **path_extra, uint32_t *path_extra_size)
> +{
> +	const char *err_msg = NULL;
> +
> +	uint32_t allocated_size = *path_extra_size;
> +	char *path = *path_extra;
> +
> +	uint32_t path_len = strlen(part->path);
> +	struct json_path_parser parser;
> +	struct json_path_node node;
> +	json_path_parser_create(&parser, part->path, path_len);
> +	/*
> +	 * A worst-case scenario is .a -> ["a"]
> +	 * i.e. 3*path_len + 1 is enough.

8. Your calculations are diligent, but they are unreadable, sorry.
I tried to calculate again, and found that 2.5*len is enough.
A path in the worst case looks like '.a'*. So the biggest
possible number of path parts is len(path)/len('.a') = len / 2.
A part is either in the correct format already, or requires
3 additional symbols. So the worst new len is:
(len / 2) * 3 + len = 2.5 * len
\____________/
   number of
   additional
   characters

Is it not? I failed to find a counter-example.

> +	 */
> +	uint32_t new_path_size = 3 * path_len + 1;
> +	if (new_path_size >= allocated_size) {
> +		path = region_alloc(region, new_path_size);
> +		if (path == NULL) {
> +			diag_set(OutOfMemory, new_path_size,
> +				 "region_alloc", "path");
> +			return -1;
> +		}
> +		allocated_size = new_path_size;
> +	}
> +	assert(path != NULL);
> +	part->path = path;
> +	int rc = json_path_next(&parser, &node);
> +	if (rc != 0)
> +		goto error_invalid_json;
> +	if (node.type != JSON_PATH_NUM) {
> +		diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
> +			 part->fieldno,
> +			 "invalid JSON path: first part should "
> +			 "be defined as array index");
> +		return -1;
> +	}
> +	if (node.num - TUPLE_INDEX_BASE != part->fieldno) {
> +		diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
> +			 part->fieldno,
> +			 "invalid JSON path: first part refers "
> +			 "to invalid field");
> +		return -1;
> +	}
> +	uint32_t lexemes = 0;
> +	do {
> +		if (node.type == JSON_PATH_NUM) {
> +			path += sprintf(path, "[%u]", (uint32_t) node.num);

9. As I remember, we have decided not to store the
first path part in the string, because we already have it
decoded in part->fieldno. Please, cut the first
path part off. Also, it is faster when we need to
follow the path along the tuple field tree in tuple_format,
because we do not need useless decoding of the first
part.

> +		} else if (node.type == JSON_PATH_STR) {
> +			path += sprintf(path, "[\"%.*s\"]", node.len, node.str);
> +		} else {
> +			unreachable();
> +		}
> +		lexemes++;
> +	} while ((rc = json_path_next(&parser, &node)) == 0 &&
> +		 node.type != JSON_PATH_END);
> +	if (rc != 0 || node.type != JSON_PATH_END)
> +		goto error_invalid_json;
> +	if (lexemes == 1) {
> +		/* JSON index is useless. */
> +		path = part->path;
> +		part->path = NULL;
> +	} else {
> +		/* Skip terminating zero. */
> +		path++;
> +		/* Account constructed string size. */
> +		allocated_size -= path - part->path;
> +	}
> +	/* Going to try to reuse extra allocation next time. */
> +	if ((uint32_t)parser.src_len > path_len) {

10. Here parser.src_len is always equal to path_len - it was
initialized with this value at the beginning of the function.

11. What is more, I do not understand how it can work. When
you have two key_part_defs, the first takes the region memory,
and the second can take exactly the same memory and rewrite it,
if its path len is smaller than the former's. It was not clear in
the previous implementation, but now it is evident. So all
key_part_defs have the same path, if they are in descending
order of their path lens.

> +		*path_extra = path;
> +		*path_extra_size = allocated_size;
> +	} else {
> +		*path_extra = (char *)parser.src;
> +		*path_extra_size = parser.src_len;
> +	}
> +	return 0;
> +
> +error_invalid_json:
> +	err_msg = tt_sprintf("invalid JSON path '%.*s': path has invalid "
> +			     "structure (error at position %d)", parser.src_len,
> +			     parser.src, parser.symbol_count);

12. Symbol_count is not the same as the error position. Please,
use the json_path_next result as the position.

> +	diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
> +		 part->fieldno + TUPLE_INDEX_BASE, err_msg);
> +	return -1;
> +}
> +
>   int
>   key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
>   		     const char **data, const struct field_def *fields,
> @@ -533,18 +726,29 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
>   	 * Find and remove part duplicates, i.e. parts counted
>   	 * twice since they are present in both key defs.
>   	 */
> -	const struct key_part *part = second->parts;
> -	const struct key_part *end = part + second->part_count;
> +	size_t sz = 0;
> +	const struct key_part *part = first->parts;
> +	const struct key_part *end = part + first->part_count;
> +	for (; part != end; part++) {
> +		if (part->path != NULL)
> +			sz += part->path_len + 1;
> +	}
> +	part = second->parts;
> +	end = part + second->part_count;
>   	for (; part != end; part++) {
> -		if (key_def_find(first, part->fieldno))
> +		const struct key_part *duplicate =
> +			key_def_find(first, part->fieldno);

13. Because key_def_find itself does not care about paths,
key_def_contains() is now broken, and Vinyl is too as a consequence.
Please, write a test on it. Now it is possible to insert duplicates
into a unique Vinyl index, if a secondary index contains field numbers
of the primary one, but with different paths. It is wrong.
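
One variant of the fix (a sketch; the signature change is mine):

const struct key_part *
key_def_find(const struct key_def *key_def, uint32_t fieldno,
	     const char *path, uint32_t path_len)
{
	const struct key_part *part = key_def->parts;
	const struct key_part *end = part + key_def->part_count;
	for (; part != end; part++) {
		if (part->fieldno == fieldno &&
		    part->path_len == path_len &&
		    (path_len == 0 ||
		     memcmp(part->path, path, path_len) == 0))
			return part;
	}
	return NULL;
}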

> +		if (duplicate != NULL &&
> +		    part->path_len == duplicate->path_len &&
> +		    memcmp(part->path, duplicate->path, part->path_len) == 0)
>   			--new_part_count;
> +		else if (part->path != NULL)
> +			sz += part->path_len + 1;
>   	}
> diff --git a/src/box/key_def.h b/src/box/key_def.h
> index 42c054c..b6d6259 100644
> --- a/src/box/key_def.h
> +++ b/src/box/key_def.h
> @@ -54,6 +54,8 @@ struct key_part_def {
>   	uint32_t coll_id;
>   	/** True if a key part can store NULLs. */
>   	bool is_nullable;
> +	/** JSON path to data. */

14. Please, note that it does not include the first
path part, which is already decoded into fieldno.

> +	char *path;
>   };
>   
>   /**
> @@ -78,6 +80,10 @@ struct key_part {
>   	uint64_t format_epoch;
>   	/** Cache for corresponding tuple_format slot_offset. */
>   	int32_t slot_cache;
> +	/** JSON path to data. */
> +	char *path;
> +	/** JSON path length. */
> +	uint32_t path_len;

15. The same.

>   };
>   
>   struct key_def;
> diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
> index d95ee8d..0301186 100644
> --- a/src/box/tuple_extract_key.cc
> +++ b/src/box/tuple_extract_key.cc
> @@ -96,6 +96,7 @@ static char *
>   tuple_extract_key_slowpath(const struct tuple *tuple,
>   			   const struct key_def *key_def, uint32_t *key_size)
>   {
> +	assert(!is_flat == key_def->has_json_paths);
>   	assert(!has_optional_parts || key_def->is_nullable);
>   	assert(has_optional_parts == key_def->has_optional_parts);
>   	assert(contains_sequential_parts ==

16. As I see, you updated neither key_def_contains_sequential_parts
nor the points where its result is used (if (! contains_sequential_parts) ... ).
So in your implementation such parts as '[1][2][3]' and '[2][4][5]' are
considered sequential, but that is obviously wrong. Their MessagePacks
do not follow each other. After reviewing the whole patchset I see that
you updated these functions in the next commits, but please, do it in
this commit.
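
I.e. something like this (a sketch):

static bool
key_def_contains_sequential_parts(const struct key_def *def)
{
	for (uint32_t i = 0; i + 1 < def->part_count; ++i) {
		if (def->parts[i].path == NULL &&
		    def->parts[i + 1].path == NULL &&
		    def->parts[i].fieldno + 1 ==
		    def->parts[i + 1].fieldno)
			return true;
	}
	return false;
}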

17. Why does this patch have no tests? You can already test path
normalization and the cut of the first path part into fieldno.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 2/5] box: introduce slot_cache in key_part
  2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 2/5] box: introduce slot_cache in key_part Kirill Shcherbatov
@ 2018-08-22  0:27   ` Vladislav Shpilevoy
  2018-08-27  7:37     ` Kirill Shcherbatov
  0 siblings, 1 reply; 16+ messages in thread
From: Vladislav Shpilevoy @ 2018-08-22  0:27 UTC (permalink / raw)
  To: tarantool-patches, Kirill Shcherbatov

Hi! Thanks for the fixes! See 16 comments below.

On 15/08/2018 15:15, Kirill Shcherbatov wrote:
> The same key_part could be used in different formats multiple
> times, so different field->offset_slots would be allocated.
> In most scenarios we work with a series of tuples of the same
> format, and (in general) a format lookup for a field would be
> an expensive operation for JSON paths defined in a key_part.
> A new slot_cache field in the key_part structure and an
> epoch-based mechanism to validate its actuality should be an
> effective approach to improve performance.
> A new routine tuple_field_by_part uses a tuple and a key_part
> to access the field, which allows reworking and speeding up
> all scenarios of accessing tuple data by index.
> This also allows working with JSON-path key_parts later.
> 
> Part of #1012.
> ---
>   src/box/alter.cc             |  4 ++
>   src/box/key_def.c            |  2 +
>   src/box/key_def.h            |  4 ++
>   src/box/memtx_bitset.c       |  8 +++-
>   src/box/memtx_rtree.c        |  6 ++-
>   src/box/space.c              | 25 +++++++++++++
>   src/box/space.h              | 10 +++++
>   src/box/tuple_compare.cc     | 88 ++++++++++++++++++++++++++++++++------------
>   src/box/tuple_extract_key.cc | 39 ++++++++++++++------
>   src/box/tuple_format.c       | 12 ++++++
>   src/box/tuple_format.h       | 19 ++++++++++
>   src/box/tuple_hash.cc        | 49 +++++++++++++++++++-----
>   src/box/vy_stmt.h            |  6 ++-
>   13 files changed, 224 insertions(+), 48 deletions(-)
> 
> diff --git a/src/box/alter.cc b/src/box/alter.cc
> index 3007a13..e1a0d9c 100644
> --- a/src/box/alter.cc
> +++ b/src/box/alter.cc
> @@ -887,6 +887,10 @@ alter_space_do(struct txn *txn, struct alter_space *alter)
>   	alter->new_space->sequence = alter->old_space->sequence;
>   	memcpy(alter->new_space->access, alter->old_space->access,
>   	       sizeof(alter->old_space->access));
> +	space_format_update_epoch(alter->new_space,
> +				  alter->old_space->format != NULL ?
> +				  alter->old_space->format->epoch : 0,
> +				  &alter->key_list);
>   

1. As I remember, we already had this talk during the SQL
triggers development, but let's repeat. Please, do not put any
operation-specific things onto the main path. Either use the
ModifySpace, ModifyIndex, RebuildIndex, CreateIndex, DropIndex
etc. classes, or change ModifySpace so that it can be called
from the _index trigger, or introduce ModifySpaceFormat or
something. Alter_space_do should not be changed.


>   	/*
>   	 * Build new indexes, check if tuples conform to
> diff --git a/src/box/key_def.c b/src/box/key_def.c
> index ee09dc9..8a4262b 100644
> --- a/src/box/key_def.c
> +++ b/src/box/key_def.c
> @@ -258,6 +258,8 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
>   	def->parts[part_no].type = type;
>   	def->parts[part_no].coll = coll;
>   	def->parts[part_no].coll_id = coll_id;
> +	def->parts[part_no].slot_cache = TUPLE_OFFSET_SLOT_NIL;
> +	def->parts[part_no].format_epoch = 0;
>   	column_mask_set_fieldno(&def->column_mask, fieldno);
>   	/**

2. I do not see where you update cmp_def in this file. It is
exactly cmp_def that is used in comparators, not key_def.

>   	 * When all parts are set, initialize the tuple
> diff --git a/src/box/key_def.h b/src/box/key_def.h
> index aecbe03..42c054c 100644
> --- a/src/box/key_def.h
> +++ b/src/box/key_def.h
> @@ -74,6 +74,10 @@ struct key_part {
>   	struct coll *coll;
>   	/** True if a part can store NULLs. */
>   	bool is_nullable;
> +	/** Format epoch for slot_cache. */
> +	uint64_t format_epoch;
> +	/** Cache for corresponding tuple_format slot_offset. */
> +	int32_t slot_cache;

3. From these comments and names it is very hard to understand
anything.
3.1. When I grep 'slot_offset', the only match is here. For a
novice it would be difficult to find its origin.
3.2. I believe key_part should not depend on a format even in
member names. Please, rename 'format_epoch' to offset_slot_epoch.
3.3. slot_cache -> offset_slot.
3.4. In the comments, please, describe what an epoch is, where
offset_slot and epoch come from, and how the epoch works.
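
E.g. something like this (the wording is mine):

	/**
	 * Epoch of the tuple format from which offset_slot
	 * below was cached. A format bumps its epoch on every
	 * alter that may relocate offset slots, so
	 * offset_slot_epoch != format->epoch means the cache
	 * is stale and a lookup in the format is required.
	 */
	uint64_t offset_slot_epoch;
	/** Cached tuple_field's offset_slot of that format. */
	int32_t offset_slot;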

>   };
>   
>   struct key_def;
> diff --git a/src/box/space.c b/src/box/space.c
> index 871cc67..c34428a 100644
> --- a/src/box/space.c
> +++ b/src/box/space.c
> @@ -226,6 +226,31 @@ space_index_key_def(struct space *space, uint32_t id)
>   }
>   
>   void
> +space_format_update_epoch(struct space *space, uint64_t last_epoch,
> +			  struct rlist *key_list)
> +{
> +	struct tuple_format *format = space->format;
> +	if (format == NULL)
> +		return;

4. This SQLite style of NULL handling is not ok, I think. Just
do not call this function when the format is NULL.

5. This function is a tuple format method and should not take
a space in the arguments. Please, pass the format without the
space's intermediation, and put it into tuple_format.c/.h.

> +	bool is_format_epoch_changed = false;
> +	struct index_def *index_def;
> +	rlist_foreach_entry(index_def, key_list, link) {
> +		struct key_part *part = index_def->key_def->parts;
> +		struct key_part *parts_end =
> +			part + index_def->key_def->part_count;
> +		for (; part < parts_end; part++) {
> +			struct tuple_field *field =
> +				&format->fields[part->fieldno];
> +			if (field->offset_slot != part->slot_cache)
> +				is_format_epoch_changed = true;
> +		}
> +	}
> +	format->epoch = last_epoch;
> +	if (is_format_epoch_changed)
> +		format->epoch++;
> +}
> +
> +void
>   generic_space_swap_index(struct space *old_space, struct space *new_space,
>   			 uint32_t old_index_id, uint32_t new_index_id)
>   {
> diff --git a/src/box/space.h b/src/box/space.h
> index 8888ec8..8d13bc8 100644
> --- a/src/box/space.h
> +++ b/src/box/space.h
> @@ -228,6 +228,16 @@ struct key_def *
>   space_index_key_def(struct space *space, uint32_t id);
>   
>   /**
> + * Setup space format epoch value.
> + * @param space to update.

6. Parameter name and description shall not be mixed.

> + * @param last_epoch last space epo

7. Typo: 'epo'.

> + * @param key_list list of index_defs.

8. List of which index_defs? Old, new? Besides,
as I see in the alter code, neither of them. Alter
can drop some index_defs from this list before you
update the space_format, for example, via
DropIndex::alter_def.

> + */
> +void
> +space_format_update_epoch(struct space *space, uint64_t last_epoch,
> +			  struct rlist *key_list);
> +
> +/**
>    * Look up the index by id.
>    */
>   static inline struct index *
> diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
> index e53afba..f07b695 100644
> --- a/src/box/tuple_compare.cc
> +++ b/src/box/tuple_compare.cc
> @@ -432,9 +432,17 @@ tuple_common_key_parts(const struct tuple *tuple_a,
>   {
>   	uint32_t i;
>   	for (i = 0; i < key_def->part_count; i++) {
> -		const struct key_part *part = &key_def->parts[i];
> -		const char *field_a = tuple_field(tuple_a, part->fieldno);
> -		const char *field_b = tuple_field(tuple_b, part->fieldno);
> +		struct key_part *part = (struct key_part *)&key_def->parts[i];
> +		const char *field_a =
> +			tuple_field_by_part(tuple_format(tuple_a),
> +					    tuple_data(tuple_a),
> +					    tuple_field_map(tuple_a),
> +					    part);
> +		const char *field_b =
> +			tuple_field_by_part(tuple_format(tuple_b),
> +					    tuple_data(tuple_b),
> +					    tuple_field_map(tuple_b),
> +					    part);

9. Do not call tuple_format on each iteration. It cannot change
in between. The same goes for tuple_data and tuple_field_map. As
I said in the previous review, these functions are not free.
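
I.e. hoist the invariants out of the loop:

	struct tuple_format *format_a = tuple_format(tuple_a);
	struct tuple_format *format_b = tuple_format(tuple_b);
	const char *data_a = tuple_data(tuple_a);
	const char *data_b = tuple_data(tuple_b);
	const uint32_t *field_map_a = tuple_field_map(tuple_a);
	const uint32_t *field_map_b = tuple_field_map(tuple_b);
	for (i = 0; i < key_def->part_count; i++) {
		struct key_part *part =
			(struct key_part *)&key_def->parts[i];
		const char *field_a =
			tuple_field_by_part(format_a, data_a,
					    field_map_a, part);
		const char *field_b =
			tuple_field_by_part(format_b, data_b,
					    field_map_b, part);
		...
	}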

>   		enum mp_type a_type = field_a != NULL ?
>   				      mp_typeof(*field_a) : MP_NIL;
>   		enum mp_type b_type = field_b != NULL ?
> @@ -498,10 +506,21 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
>   		end = part + key_def->part_count;
>   
>   	for (; part < end; part++) {
> -		field_a = tuple_field_raw(format_a, tuple_a_raw, field_map_a,
> -					  part->fieldno);
> -		field_b = tuple_field_raw(format_b, tuple_b_raw, field_map_b,
> -					  part->fieldno);
> +		if (is_flat) {
> +			field_a = tuple_field_raw(format_a, tuple_a_raw,
> +						  field_map_a,
> +						  part->fieldno);
> +			field_b = tuple_field_raw(format_b, tuple_b_raw,
> +						  field_map_b,
> +						  part->fieldno);
> +		} else {
> +			field_a = tuple_field_by_part(format_a, tuple_a_raw,
> +						      field_map_a,
> +						      (struct key_part *)part);
> +			field_b = tuple_field_by_part(format_b, tuple_b_raw,
> +						      field_map_b,
> +						      (struct key_part *)part);

10. As you see, any function retrieving a tuple field from a
const char * has the suffix '_raw'. Please, rename
tuple_field_by_part to tuple_field_by_part_raw.

> +		}
>   		assert(has_optional_parts ||
>   		       (field_a != NULL && field_b != NULL));
>   		if (! is_nullable) {
> diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
> index 880abb6..d95ee8d 100644
> --- a/src/box/tuple_extract_key.cc
> +++ b/src/box/tuple_extract_key.cc
> @@ -110,9 +110,15 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
>   
>   	/* Calculate the key size. */
>   	for (uint32_t i = 0; i < part_count; ++i) {
> -		const char *field =
> -			tuple_field_raw(format, data, field_map,
> -					key_def->parts[i].fieldno);
> +		const char *field;
> +		if (is_flat) {
> +			field = tuple_field_raw(format, data, field_map,
> +						key_def->parts[i].fieldno);
> +		} else {
> +			field = tuple_field_by_part(format, data, field_map,
> +						    (struct key_part *)
> +							&key_def->parts[i]);

11. Broken indentation. This is not the only place. Please, fix the others too.

> +		}
>   		if (has_optional_parts && field == NULL) {
>   			bsize += mp_sizeof_nil();
>   			continue;
> diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
> index c7dc48f..a989917 100644
> --- a/src/box/tuple_format.h
> +++ b/src/box/tuple_format.h
> @@ -324,6 +329,20 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
>   		     const char *tuple);
>   
>   /**
> + * Get a field refereed by multipart index @def part @idx in
> + * @tuple.

12. All the three parameter names (def, idx, tuple) do not
exist in this function.

> + *
> + * @param format tuple format
> + * @param tuple a pointer to MessagePack array
> + * @param field_map a pointer to the LAST element of field map
> + * @param part multipart index part to use.

13. Please, start a sentence with a capital letter and finish
with the dot. In other places and commits too.

> + * @retval field data if field exists or NULL
> + */
> +const char *
> +tuple_field_by_part(const struct tuple_format *format, const char *data,
> +		    const uint32_t *field_map, struct key_part *part);
> +
> +/**
>    * Get a field at the specific position in this MessagePack array.
>    * Returns a pointer to MessagePack data.
>    * @param format tuple format
> diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc
> index dee9be3..272e814 100644
> --- a/src/box/tuple_hash.cc
> +++ b/src/box/tuple_hash.cc
> @@ -157,7 +157,11 @@ struct TupleHash
>   		uint32_t h = HASH_SEED;
>   		uint32_t carry = 0;
>   		uint32_t total_size = 0;
> -		const char *field = tuple_field(tuple, key_def->parts->fieldno);
> +		const char *field =
> +			tuple_field_by_part(tuple_format(tuple),
> +					    tuple_data(tuple),
> +					    tuple_field_map(tuple),
> +					    (struct key_part *)key_def->parts);

14. Why do you need this cast to struct key_part *? I removed it
and nothing changed.

>   		TupleFieldHash<TYPE, MORE_TYPES...>::
>   			hash(&field, &h, &carry, &total_size);
>   		return PMurHash32_Result(h, carry, total_size);
> @@ -312,13 +320,16 @@ tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry,
>   		    const struct tuple *tuple,
>   		    const struct key_part *part)
>   {
> -	const char *field = tuple_field(tuple, part->fieldno);
> +	const char *field =
> +		tuple_field_by_part(tuple_format(tuple), tuple_data(tuple),
> +				    tuple_field_map(tuple),
> +				    (struct key_part *)part);

15. Why do you need a non-const struct key_part in
tuple_field_by_part?

>   	if (field == NULL)
>   		return tuple_hash_null(ph1, pcarry);
>   	return tuple_hash_field(ph1, pcarry, &field, part->coll);
>   }
>   
> -template <bool has_optional_parts>
> +template <bool has_optional_parts, bool is_flat>
>   uint32_t
>   tuple_hash_slowpath(const struct tuple *tuple, const struct key_def *key_def)
>   {
> @@ -327,7 +338,15 @@ tuple_hash_slowpath(const struct tuple *tuple, const struct key_def *key_def)
>   	uint32_t carry = 0;
>   	uint32_t total_size = 0;
>   	uint32_t prev_fieldno = key_def->parts[0].fieldno;
> -	const char *field = tuple_field(tuple, key_def->parts[0].fieldno);
> +	const char *field;
> +	if (is_flat) {
> +		field = tuple_field(tuple, prev_fieldno);
> +	} else {
> +		field = tuple_field_by_part(tuple_format(tuple),
> +					    tuple_data(tuple),
> +					    tuple_field_map(tuple),
> +					    (struct key_part *)&key_def->parts);
> +	}
>   	const char *end = (char *)tuple + tuple_size(tuple);
>   	if (has_optional_parts && field == NULL) {
>   		total_size += tuple_hash_null(&h, &carry);
> @@ -341,7 +360,19 @@ tuple_hash_slowpath(const struct tuple *tuple, const struct key_def *key_def)
>   		 * need of tuple_field
>   		 */
>   		if (prev_fieldno + 1 != key_def->parts[part_id].fieldno) {
> -			field = tuple_field(tuple, key_def->parts[part_id].fieldno);
> +			if (is_flat) {
> +				field = tuple_field(tuple,
> +						    key_def->parts[part_id].
> +						    fieldno);
> +			} else {
> +				struct key_part *part =
> +					(struct key_part *)
> +					&key_def->parts[part_id];
> +				field = tuple_field_by_part(tuple_format(tuple),
> +							    tuple_data(tuple),
> +							    tuple_field_map(tuple),
> +							    part);
> +			}
>   		}

16. Please, do not call tuple_format, tuple_data and
tuple_field_map multiple times.

>   		if (has_optional_parts && (field == NULL || field >= end)) {
>   			total_size += tuple_hash_null(&h, &carry);

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 5/5] box: specify indexes in user-friendly form
  2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 5/5] box: specify indexes in user-friendly form Kirill Shcherbatov
  2018-08-22  0:26   ` [tarantool-patches] " Vladislav Shpilevoy
@ 2018-08-22  0:28   ` Vladislav Shpilevoy
  1 sibling, 0 replies; 16+ messages in thread
From: Vladislav Shpilevoy @ 2018-08-22  0:28 UTC (permalink / raw)
  To: tarantool-patches, Kirill Shcherbatov

Which commit closes the issue? I see that all of them are
'Part of #1012', but which is the final one?

On 15/08/2018 15:15, Kirill Shcherbatov wrote:
> From now on it is possible to create indexes by JSON path
> using field names specified in the format.
> 
> @TarantoolBot document
> Title: Indexes by JSON path
> Sometimes field data could have complex document structure.
> When this structure is consistent across whole document,
> you are able to create an index by JSON path.
> 
> Example:
> s:create_index('json_index',
>                 {parts = {{'data.FIO["fname"]', 'str'}}})
> 
> Part of #1012.
> ---
>   src/box/lua/schema.lua      | 58 +++++++++++++++++++++++++++++++++++++++------
>   test/engine/iterator.result |  2 +-
>   test/engine/tuple.result    | 36 ++++++++++++++++++++++++++++
>   test/engine/tuple.test.lua  | 10 ++++++++
>   4 files changed, 98 insertions(+), 8 deletions(-)
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 5/5] box: specify indexes in user-friendly form
  2018-08-22  0:26   ` [tarantool-patches] " Vladislav Shpilevoy
@ 2018-08-27  7:37     ` Kirill Shcherbatov
  0 siblings, 0 replies; 16+ messages in thread
From: Kirill Shcherbatov @ 2018-08-27  7:37 UTC (permalink / raw)
  To: tarantool-patches, Vladislav Shpilevoy

> 1. Please, write comments.
> 2. Start a sentence with a capital letter and finish
> with the dot.
> 3. Please, move these routines to C and use tuple_dictionary +
> the JSON path parser.
Moved to C.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [tarantool-patches] Re: [PATCH v2 4/5] box: introduce path_hash and tuple_field tree
  2018-08-22  0:26   ` [tarantool-patches] " Vladislav Shpilevoy
@ 2018-08-27  7:37     ` Kirill Shcherbatov
  0 siblings, 0 replies; 16+ messages in thread
From: Kirill Shcherbatov @ 2018-08-27  7:37 UTC (permalink / raw)
  To: tarantool-patches, Vladislav Shpilevoy

> 1. The document structure could, in the future, be defined not
> only via indexes.
-       /* 55 */_(ER_DATA_MISMATCH_INDEX_PART,  "Tuple doesn't math document structure defined as index") \
+       /* 55 */_(ER_DATA_STRUCTURE_MISMATCH,   "Tuple doesn't match document structure") \
> 2. A comment, please.
> 3. Please, add suffix '_cb' which we use for callbacks.
> 4. Why not just 'return key_mp_type_validate(...);'?

/**
 * See the json_field_tree_routine description.
 * Validate that a field tree leaf has the correct type.
 */
static int
tuple_validate_json_path_data_cb(const struct tuple_field *field, uint32_t idx,
				 const char *tuple, const char *offset,
				 void *ctx)
{
	(void)tuple;
	(void)ctx;
	return key_mp_type_validate(field->type, mp_typeof(*offset),
				    ER_KEY_PART_TYPE, idx, field->is_nullable);
}

/**
 * See the json_field_tree_routine description.
 * Initialize the tuple field map using the field tree leaf offset_slot.
 * @param ctx is the field_map.
 */
 static int
-tuple_init_json_field_map_routine(const struct tuple_field *field, uint32_t idx,
-                                 const char *tuple, const char *offset,
-                                 void *ctx)
+tuple_init_json_field_map_cb(const struct tuple_field *field, uint32_t idx,
+                            const char *tuple, const char *offset, void *ctx)

> 5. The same problem as in the previous version - you decode the
> field twice. First inside the tree walker and then here.
> 
> Please, find a way to avoid it. For help see my comments
> for the walker.
I believe this is already fixed.

> 6. It should be part of the previous patch together with any
> changes of this file.
Ok

> 7. Static inline and a comment. Also, this function is called
> from places having the 'is_flat' template parameter, which
> allows not checking the path for NULL at compile time, but you
> do not use it. Please, make it a template too.
Ok, done.

> 8. All the changes between this point and point 7 should be
> part of the previous patch.
Ok.

> 9. First level fieldno should not be encoded in the path. See
> the previous patch for comments.
It is not so; I have already answered this.
> 10. You have exactly the same code in tuple_field_raw_by_path. Please,
> extract either the whole while cycle or this switch into a function.
I've introduced a new tuple_field_dig_with_parser routine.
It calls tuple_field_go_to_index and tuple_field_go_to_key using
the parser, so I like this name.

> 11. At first, it is a part of the previous patch. At second, it
> has gone too far. It is time to declare a matrix of functions,
> maybe three dimensional or two, whose indexes are calculated from
> contains_sequential_parts, has_optional_parts and is_flat. See
> field_type1_contains_type2 for an example.
> Same for other functions having >= 3 template parameters.
Done as a part of the previous patch.

> 12. Why do you return node_t if you always use only node->val? Why
> not return val?
It was not necessary in the patch that you've reviewed, but it is
important now: I update the leaf field manually and change the str
pointer to point to the format chunk path memory.

> 13. Why so complicated? Just rename it to new and return the hash
> instead of int.
Simplified.

> 14. Please, put an empty line between the functions.
Done.

> 15. Do not use SQLite style of freeing objects, please.
Done.

> 16. How is it possible that the record exists already? And
> where do you check it here?
Legacy. Dropped.

> 17. Fits in one line.
Done.

> 18. Out of 80. Use sizeof(array[0]). The same below.
-                                       part->num * sizeof(struct tuple_field *));
+                                       part->num * sizeof(array[0]));
Done.

> 19. Size here is not correct. You forgot about * part->num.
-                                       sizeof(struct tuple_field *), "realloc",
-                                       "array");
+                                        part->num * sizeof(array[0]),
+                                        "realloc","array");
Fixed.

> 20. Why are these cases different? If I substitute field->array_size
> with 0 (which holds when field->array == NULL), I will match the
> first branch, will I not?
-                       if (field->array == NULL) {
-                               memset(array, 0, part->num *
-                                       sizeof(struct tuple_field *));
-                       } else {
-                               memset(&array[field->array_size], 0,
-                                      (part->num - field->array_size) *
-                                      sizeof(struct tuple_field *));
-                       }
+                       memset(&array[field->array_size], 0,
+                               (part->num - field->array_size) *
+                               sizeof(array[0]));

> 21. I think it is rather an 'else' branch. Evidently, the condition
> is unreachable if the field was just created in the code above.
Yep, upgraded.

> 22. If field->map == NULL then surely you do not
> need to check the node existence below. Please,
> split those two cases into 2 branches.
Done.

> 23. How can it be NULL?
Fixed.
>> +int
>> +json_field_tree_exec_routine(const struct tuple_field *field, uint32_t idx,
>> +			     const char *tuple, const char *offset,
>> +			     json_field_tree_routine routine, void *routine_ctx)
> 
> 24. I got the idea and I like it - having a walker applicable
> for any purpose. But it calls a function by pointer on a hot
> path - initialization of a tuple field map + validation.
> 
> Let's make it a macro. I know that a macro cannot be recursive,
> but we can write a generator of such a walker: a macro taking a
> function and generating a new json_field_tree_exec_routine.
I've reworked this function a lot. It is complicated enough. I believe it is not necessary now.

> 25. Talking about one of my first comments (how to iterate over
> the field only once when possible): here, for an array, you can
> merely continue the iteration, calling mp_next() beyond
> field->array_size up to count.
> 
> For a map, things are more complex, and you can apply the method
> I described in the previous review. You learn the map size and
> on each tuple_field_go_to_key() remember the maximal number and
> position of the decoded key-value pair. When the iteration is
> finished, you decode the tail starting from the maximal decoded
> key-value pair until the end. In such a case the tail is always
> decoded only once.

> 26. No such parameter 'field_type'. Only 'type'.
Ok, fixed.

> 27. Do not use the SQLite style of variable declaration. The
> variable can be declared and assigned at the same time below.
Ok.

> 28. I do not see a test on this. Please, use gcov to be sure that
> test coverage of all the new code is 100% (except OOM errors).
Added test.

> 29. What about nullability mismatch?
Ugum, there were a lot of changes in this scenario. I've fixed it now.

> 30. Please, consider this:
> 
>   	field->type = type;
> -	if (json_path_hash_insert(format->path_hash, path, path_len,
> -				  field) != 0)
> -		return -1;
> -
>   	*leaf_field = field;
> -	return 0;
> +	return json_path_hash_insert(format->path_hash, path, path_len, field);
>   }
Ok, applied.
> 31. The aforementioned two hunks are a complete mess. Please,
> move the 'is_key_part = true' before them. Then after it at
> first check for 'if (part->path != NULL) { ...; continue; }'.
> Then remove the checks for 'path == NULL'.
Done.

> 32. When some indexes have common fields, you count
> and allocate their paths twice.
I've fixed this by initializing the hashtable on format allocation
and initializing the leaves on demand. It is also important to patch
the hashtable string source and recreate the parser with the new
string.

>> +	for (uint32_t i = 0; i < format->field_count; i++) {
>> +		format->fields[i].array = NULL;
>> +		format->fields[i].array_size = 0;
> 
> 33. Just memset them.
Those fields have other data that I wouldn't copy again.

>> +	if (tuple_format_register(format) != 0)
>> +		goto error;
>>   	return format;
> 
> 34. Consider this:
Ok, thanks.

> 35. Why here do you check for map != NULL but in all other places
> for array != NULL?
I've introduced a 'child' field that I use in such situations.

> 36. If this is true, then the previous 'if' is unreachable
> (field->offset_slot != TUPLE_OFFSET_SLOT_NIL). A complex field
> cannot have an offset slot. So this is an 'else' branch, I guess.
Yep.
> 37. After you fix my other comments, this mp_next should
> not be called, if json tree walker was used.
Mhm, done.

> 38. Just inline it.
Ok, done.

> 39. Inversely, it is likely.
Fixed.

> 40. Do here 'return' and reduce indentation level below.
Done.
> 41. As I understand, when a part is JSON and its epoch
> matches the format, it always has a valid slot_cache and you
> do not need to check for '< field_map_size'.
This is also related to comment 42.
Vinyl now has two formats with compatible field maps of the same size,
but their field counts can differ a little. If I am not mistaken, the disk
format may have a smaller field_count, so the smaller format's tuple field
map is initialized with 0 where the corresponding fields are absent.
On the other hand, the part cache looks valid because the format epochs
match. And it is valid, except that such data does not exist for this
format type. The code is appended below.

> Moreover, I do not understand why do you check this slot
> to be not SLOT_NIL and has not zero value in field_map. It
> is 100% valid and you already now can do
> 'return data + field_map[part->slot_cache]', it is not?
> 42. The same here. When an offset slot is stored in a field
> from format->path_hash, it always has not zero field_map
> value and not SLOT_NIL offset. So the branch below is actually
> 'else', I guess.

const char *
tuple_field_by_part_raw(const struct tuple_format *format, const char *data,
			const uint32_t *field_map, struct key_part *part)
{
	if (likely(part->path == NULL))
		return tuple_field_raw(format, data, field_map, part->fieldno);

	struct tuple_field *field = NULL;
	int32_t offset_slot = TUPLE_OFFSET_SLOT_NIL;
	if (likely(part->offset_slot_epoch == format->epoch)) {
		/* The cached slot is still valid for this format. */
		offset_slot = part->offset_slot;
	} else if (format->path_hash != NULL &&
		   (field = json_path_hash_get(format->path_hash, part->path,
					       part->path_len,
					       part->path_hash)) != NULL) {
		/* Stale cache: look the leaf field up by path. */
		offset_slot = field->offset_slot;
	} else {
		/*
		 * Legacy tuple having no field map for
		 * JSON index.
		 */
		uint32_t path_hash =
			field_name_hash(part->path, part->path_len);
		const char *raw = NULL;
		if (tuple_field_raw_by_path(format, data, field_map,
					    part->path, part->path_len,
					    path_hash, &raw) != 0)
			raw = NULL;
		return raw;
	}
	assert(offset_slot < 0);
	assert(-offset_slot * sizeof(uint32_t) <= format->field_map_size);
	if (unlikely(field_map[offset_slot] == 0))
		return NULL;
	/* Cache offset_slot if required. */
	if (part->offset_slot_epoch < format->epoch) {
		part->offset_slot = offset_slot;
		part->offset_slot_epoch = format->epoch;
	}
	return data + field_map[offset_slot];
}

> 43. Please, move this function above within its original commit.
Done.

> 44. Actually, it is not used in tree-walker.
                /** Leaf argument for manual tuple construction. */
		void *arg;

> 45. I see, how do you struggle with checking that the
> field has no childs, so I propose to add a member in the
> union 'void *child'. And check it for NULL. It is better
> than check explicitly array or map in places where it
> does not matter.
Done.
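For illustration, the union could look like this (a sketch; the container
member types here are assumptions, not the exact patch code):

	union {
		/** Children of an array intermediate field. */
		struct tuple_field **array;
		/** Children of a map intermediate field. */
		struct mh_strnptr_t *map;
		/** Generic view: non-NULL iff the field has children. */
		void *child;
	};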

> 46. All these changes should be done in the previous commit.
Done.

> 47. Why do you check for child != NULL instead of part->path != NULL?
Fixed.

>> +static void
>> +vy_stmt_msgpack_build(struct tuple_field *field, char *tuple,
>> +		      uint32_t *field_map, char **offset, bool write_data)
> 
> 48. Why you do not use json_field_tree_exec_routine? Because you need
> to do something before and afer decoding the field? Then just implement
> it as a macro as I said and add 2 parameters:
> start_field_cb(field_type, field_size) and end_field_cb(field_type).
> And if a _cb is not defined, its call is not generated.
I have no callbacks now.
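For the record, the generator idea could be sketched like this
(hypothetical code over a simplified, assumed tree layout; not what the
patch does now):

	struct tuple_field_node {
		/* NULL-terminated children array, NULL for a leaf. */
		struct tuple_field_node **children;
	};

	/*
	 * Generate a dedicated recursive walker per callback, so
	 * the hot path calls the callback directly instead of
	 * via a function pointer.
	 */
	#define DEFINE_FIELD_TREE_WALKER(name, on_field)		\
	static void							\
	name(const struct tuple_field_node *field, void *ctx)		\
	{								\
		on_field(field, ctx);					\
		if (field->children == NULL)				\
			return;						\
		for (struct tuple_field_node **c = field->children;	\
		     *c != NULL; c++)					\
			name(*c, ctx);					\
	}

	static void
	count_field(const struct tuple_field_node *field, void *ctx)
	{
		(void)field;
		++*(int *)ctx;
	}

	/* Instantiate a walker that counts the fields of a tree. */
	DEFINE_FIELD_TREE_WALKER(count_fields_walker, count_field)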

> 49. Double ';'.
Fixed.


* [tarantool-patches] Re: [PATCH v2 3/5] box: introduce path field in key_part
  2018-08-22  0:26   ` [tarantool-patches] " Vladislav Shpilevoy
@ 2018-08-27  7:37     ` Kirill Shcherbatov
  0 siblings, 0 replies; 16+ messages in thread
From: Kirill Shcherbatov @ 2018-08-27  7:37 UTC (permalink / raw)
  To: tarantool-patches, Vladislav Shpilevoy

> 1. Please, make extra_size be also an argument of key_def_sizeof
> and name it literally paths_size.
 struct key_def *
-key_def_new(uint32_t part_count, size_t extra_size)
+key_def_new(uint32_t part_count, size_t paths_size)

 static inline size_t
-key_def_sizeof(uint32_t part_count)
+key_def_sizeof(uint32_t part_count, size_t paths_size)
 {
-       return sizeof(struct key_def) + sizeof(struct key_part) * part_count;
+       return sizeof(struct key_def) + sizeof(struct key_part) * part_count +
+              paths_size;
 }
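
To illustrate the resulting single-allocation layout (a sketch; the helper
name is mine, not from the patch):

	/*
	 * key_def_sizeof(part_count, paths_size) reserves the
	 * paths in the same malloc'ed block, right after the
	 * parts array:
	 *
	 *   | struct key_def | key_part[part_count] | path bytes |
	 */
	static inline char *
	key_def_paths_area(struct key_def *def, uint32_t part_count)
	{
		return (char *)def + sizeof(struct key_def) +
		       sizeof(struct key_part) * part_count;
	}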
> 2. Same as strncmp and has the same bug: if a path is prefix of
> another path, then you declare them as equal, but it is wrong.
                /* Lexicographic strings order. */
-               uint32_t len = MIN(part1->path_len, part2->path_len);
+               if (part1->path_len != part2->path_len)
+                       return part1->path_len - part2->path_len;
                int rc = 0;
-               if ((rc = memcmp(part1->path, part2->path, len)) != 0)
+               if ((rc = memcmp(part1->path, part2->path,
+                                part1->path_len)) != 0)
                        return rc;

> Please, write a test on it. It should alter an index, extending
> path of one of its parts. In such a case it should be rebuilt,
> but your patch will not do it.
I've checked that this code is reachable with an assert, but I don't know how to trigger it manually.
Please help.

> 3. Why not const char *path?
Okay.

> 4. Please, be consistent in names: you already use 'is_flat'
> in the previous patch, not 'has_json_paths'.
Ok, it is 'is_flat' now.

> 5. Why do you need additional parentheses?
Ok.

> 6. As usual. Start sentences with a capital letter, finish with the
> dot.
Fixed.

> 7. Typo: 'alloated'.
Ok.

> 8. Your calculations are diligent, but they are unreadable, sorry.
> I tried to calculate again, and found that 2.5*len is enough.
> A path in the worst case looks like '.a'*. So the biggest
> possible number of path parts is len(path)/len('.a') = len / 2.
> A part is either in the correct format already, or requires
> 3 additional symbols. So the worst new len is:
> (len / 2) * 3 + len = 2.5 * len
> \____________/
>    number of
>    additional
>    characters
> 
> It is not? I failed to find a counter-example.
Yep, this matches my calculations exactly; I just rounded 5/2 up to 3.
I didn't like multiplying by a fractional value.
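The bound can be written down directly (a sketch; the helper name is mine,
not from the patch):

	/*
	 * Worst-case size of a normalized JSON path: a source
	 * path of length len has at most len / 2 parts (the
	 * shortest part, ".a", is 2 chars), and normalizing a
	 * part adds at most 3 chars (".a" becomes ["a"]), so the
	 * result fits in len + 3 * (len / 2).
	 */
	static inline size_t
	json_path_normalized_size_max(size_t len)
	{
		return len + 3 * (len / 2) + 1; /* +1 for the trailing NUL. */
	}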
> 9. As I remember, we have decided to do not store
> first path part in the string, because we have it in
> part->fieldno already decoded. Please, cut the first
> path part off. Also, it is faster when we need to
> follow the path along tuple field tree in tuple_format,
> because we do not need useless decoding of the first
> part.
Not exactly. We decided to use a path hash per format, so we need this prefix
to distinguish identical paths under different fields.

> 10. Here parser.src_len is always equal to path_len - it was
> initialized by this value at the beginning of the function.
You are right here:
-       if ((uint32_t)parser.src_len > path_len) {
+       if (allocated_size > parser.src_len) {

> 11. What is more, I do not understand, how can it work. When
> you have two key_part_defs, first takes the region memory,
> and the second can take exactly the same memory and rewrite it,
> if its path len < than the former's. It was not clear on the
> previous implementation, but now it is evident. So all
> key_part_defs has the same path, if they are in descending
> order of their path lens.
No, I'd like to reuse the biggest available chunk:
either the old path buffer or the rest of the new path buffer. I mean:
old = "[0].FIO.fname\0"
       ^ old buffer
new = "[0]["FIO"]["fname"]\0\0\0\0\0\0\0"
                          ^ rest of the new buffer
	if (allocated_size > (uint32_t)parser.src_len) {
		/* Use rest of new buffer next time. */
		*path_extra = path;
		*path_extra_size = allocated_size;
	} else {
		/* Reuse old path buffer. */
		*path_extra = (char *)parser.src;
		*path_extra_size = parser.src_len;
	}
allocated_size and path were previously updated to point at the new buffer's tail.

> 12. Symbol_count is not the same as error position. Please, use
> json_path_next result as the position.
        err_msg = tt_sprintf("invalid JSON path '%.*s': path has invalid "
                             "structure (error at position %d)", parser.src_len,
-                            parser.src, parser.symbol_count);
+                            parser.src, rc);

> 13. Because key_def_find does not care about paths yourself,
> now key_def_contains() is broken, and Vinyl is too as a consequence.
> Please, write a test on it. Now it is possible to insert duplicates
> into unique Vinyl index, if a secondary index contains field numbers
> of the primary one, but with different paths. It is wrong.
You are right; let's refactor the key_def_find function instead:

 const struct key_part *
-key_def_find(const struct key_def *key_def, uint32_t fieldno)
+key_def_find(const struct key_def *key_def, uint32_t fieldno, const char *path,
+            uint32_t path_len)
 {
        const struct key_part *part = key_def->parts;
        const struct key_part *end = part + key_def->part_count;
+       assert(path == NULL || strlen(path) == path_len);
        for (; part != end; part++) {
-               if (part->fieldno == fieldno)
+               if (part->fieldno == fieldno && part->path_len == path_len &&
+                   (part->path == NULL ||
+                    memcmp(part->path, path, path_len) == 0))
                        return part;
        }

I.e.
-               const struct key_part *duplicate =
-                       key_def_find(first, part->fieldno);
-               if (duplicate != NULL &&
-                   part->path_len == duplicate->path_len &&
-                   memcmp(part->path, duplicate->path, part->path_len) == 0)
+               if (key_def_find(first, part->fieldno, part->path,
+                                part->path_len) != NULL)
everywhere.

>> +	/** JSON path to data. */
> 14. Please, note that it does not include the first
> path part, that is decoded already into fieldno.
It is not correct for values just extracted from msgpack, and I don't
trim the first field, as I've already noted above.
>> +	/** JSON path to data. */
>> +	char *path;
> 15. The same.
-       /** JSON path to data. */
+       /** JSON path to data in canonical form. */

> 16. As I see, you did update neither key_def_contains_sequential_parts
> nor points of its result usage (if (! contains_sequential_parts) ... ).
> So in your implementation such parts as '[1][2][3]' and '[2][4][5]' are
> considered to be sequential, but it is wrong obviously. Their MessagePack's
> do not follow each other. After I reviewed the whole patchset I see that
> you updated these functions in the next commits, but please, do it in
> this commit.
+static bool
+key_def_parts_are_sequential(const struct key_def *def, int i)
+{
+       uint32_t fieldno1 = def->parts[i].fieldno + 1;
+       uint32_t fieldno2 = def->parts[i + 1].fieldno;
+       return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
+               def->parts[i + 1].path == NULL;
+}
+

-               if (def->parts[i].fieldno + 1 == def->parts[i + 1].fieldno)
+               if (key_def_parts_are_sequential(def, i))

-                               if (key_def->parts[i].fieldno + 1 !=
-                                   key_def->parts[i + 1].fieldno) {
+                               if (!key_def_parts_are_sequential(key_def, i)) {

etc.

> 17. Why this patch has no tests? You already now can test paths
> normalization and the first path part cut into fieldno.
This patch doesn't introduce a working feature on its own. After a short discussion, we decided to merge the 2nd and 3rd patches.


* [tarantool-patches] Re: [PATCH v2 2/5] box: introduce slot_cache in key_part
  2018-08-22  0:27   ` [tarantool-patches] " Vladislav Shpilevoy
@ 2018-08-27  7:37     ` Kirill Shcherbatov
  2018-09-03 10:32       ` Vladislav Shpilevoy
  0 siblings, 1 reply; 16+ messages in thread
From: Kirill Shcherbatov @ 2018-08-27  7:37 UTC (permalink / raw)
  To: tarantool-patches, Vladislav Shpilevoy

> 1. As I remember, we already had this talk during SQL triggers
> development, but lets repeat. Please, do not put any operation
> specific things onto the main path. Either use ModifySpace,
> ModifyIndex, RebuildIndex, CreateIndex, DropIndex etc classes,
> or change ModifySpace so that it can be called from _index
> trigger, or introduce ModifySpaceFormat or something.
> Alter_space_do should not be changed.
Done. I've introduced a new ModifySpaceFormat operation that is called at the end
of on_replace_index_trigger.

> 2. I do not see where do you update cmp_def in this file. Exactly
> cmp_def is used in comparators, not key_def.
key_def_merge calls key_def_set_part, which sets an invalid epoch of 0 for
every key part.

> 3. From these comments and names it is very hard to understand
> anything.
> 3.1. When I grep 'slot_offset' I see the only match here. For a
> novice it would be difficult to find its origin.
> 3.2. I believe key_part should not depend on a format even in
> member names. Please, rename 'format_epoch' to offset_slot_epoch.
> 3.3. slot_cache -> offset_slot.
> 3.4. In comments, please, describe what is epoch, where are
> offset_slot and epoch from? How epoch works?
@@ -74,10 +74,17 @@ struct key_part {
+       /**
+        * Epoch of the offset slot cache. Set to the format's
+        * incremental epoch when its field's offset_slot is
+        * cached via tuple_field_by_part_raw, to speed up access
+        * on subsequent calls with the same format. The cache
+        * uses the "eldest format is most relevant" strategy.
+        */
+       uint64_t offset_slot_epoch;
+       /** Cache with format's field offset slot. */
+       int32_t offset_slot;

 struct tuple_format {
        /**
+        * Counter that grows incrementally on space rebuild if
+        * the format has a different distribution of offset
+        * slots compared with the previous one.
         */
        uint64_t epoch;
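
A sketch of how the epoch is meant to behave on space alter (hypothetical
helper, assuming only the semantics described above):

	/*
	 * Reuse the old epoch when the offset slot distribution
	 * is unchanged; bump it otherwise, so cached slots in
	 * key parts are invalidated only when they may actually
	 * differ.
	 */
	static void
	tuple_format_set_epoch(struct tuple_format *new_format,
			       const struct tuple_format *old_format,
			       bool same_offset_slots)
	{
		new_format->epoch = same_offset_slots ?
				    old_format->epoch :
				    old_format->epoch + 1;
	}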

> 4. This SQLite style of NULLs handling is not ok, I think. Just
> do not call this function, when the format is NULL.
Ok, fixed.

> 5. This function is a tuple format method and should not take
> a space in the arguments. Please, pass the format without space
> intermediation, and put it into tuple_format.c/.h.

> 6. Parameter name and description shall not be mixed.
> 7. Typo: 'epo'.
Ok, fixed.

> 8. List of which index_defs? Old, new? Besides,
> as I see in the alter code, neither of them. Alter
> can drop some index_defs from this list before you
> update the space_format, for example, via
> DropIndex::alter_def.
The list of the new space keys: I need the same keys that were used to
build the new space.

> 9. Do not call tuple_format on each iteration. It can not change
> in between. Same about tuple_data, tuple_field_map. As I said in
> the previous review, these functions are not free.
+       struct tuple_format *tuple_a_format = tuple_format(tuple_a);
+       struct tuple_format *tuple_b_format = tuple_format(tuple_b);
+       const char *tuple_a_raw = tuple_data(tuple_a);
+       const char *tuple_b_raw = tuple_data(tuple_b);
+       const uint32_t *tuple_a_field_map = tuple_field_map(tuple_a);
+       const uint32_t *tuple_b_field_map = tuple_field_map(tuple_b);
        for (i = 0; i < key_def->part_count; i++) {
                struct key_part *part = (struct key_part *)&key_def->parts[i];
                const char *field_a =
-                       tuple_field_by_part(tuple_format(tuple_a),
-                                           tuple_data(tuple_a),
-                                           tuple_field_map(tuple_a),
-                                           part);
+                       tuple_field_by_part(tuple_a_format, tuple_a_raw,
+                                           tuple_a_field_map, part);
                const char *field_b =
-                       tuple_field_by_part(tuple_format(tuple_b),
-                                           tuple_data(tuple_b),
-                                           tuple_field_map(tuple_b),
-                                           part);
+                       tuple_field_by_part(tuple_b_format, tuple_b_raw,
+                                           tuple_b_field_map, part);

> 10. As you see, any function retrieving tuple field from const char * has
> suffix '_raw'. Please, rename tuple_field_by_part to tuple_field_by_part_raw.
 const char *
-tuple_field_by_part(const struct tuple_format *format, const char *data,
-                   const uint32_t *field_map, struct key_part *part);
+tuple_field_by_part_raw(const struct tuple_format *format, const char *data,
+                       const uint32_t *field_map, struct key_part *part);

> 10. Broken indentation. It is not a single place. Please, fix in others too.
Fixed.

> 11. All the tree parameter names (def, idx, tuple) do not
> exist in this function.
Done.

> 12. Please, start a sentence with a capital letter and finish
> with the dot. In other places and commits too.
Ok, I'll try.

> 13. Why do you need this cast to struct key_part *? I removed it
> and nothing changed.
> 14. Why do you need non-const struct key_part in
> tuple_field_by_part?
Because I am going to update the key_part's offset_slot and offset_slot_epoch in
the tuple_field_by_part_raw routine.

> 15. Please, do not call tuple_format, tuple_data and tuple_field_map
> multiple times.
Done, similar to 9.


* [tarantool-patches] Re: [PATCH v2 2/5] box: introduce slot_cache in key_part
  2018-08-27  7:37     ` Kirill Shcherbatov
@ 2018-09-03 10:32       ` Vladislav Shpilevoy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladislav Shpilevoy @ 2018-09-03 10:32 UTC (permalink / raw)
  To: Kirill Shcherbatov, tarantool-patches


> 
>> 2. I do not see where do you update cmp_def in this file. Exactly
>> cmp_def is used in comparators, not key_def.
> key_def_merge makes key_def_set_part that setup invalid epoch 0 for
> every key part.

It is not okay. Why do you discard the epoch in the resulting key_def if the
slots are unchanged and still valid?


end of thread, other threads:[~2018-09-03 10:32 UTC | newest]

Thread overview: 16+ messages
2018-08-15 12:14 [tarantool-patches] [PATCH v2 0/5] box: indexes by JSON path Kirill Shcherbatov
2018-08-15 12:14 ` [tarantool-patches] [PATCH v2 1/5] rfc: describe a Tarantool JSON indexes Kirill Shcherbatov
2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 2/5] box: introduce slot_cache in key_part Kirill Shcherbatov
2018-08-22  0:27   ` [tarantool-patches] " Vladislav Shpilevoy
2018-08-27  7:37     ` Kirill Shcherbatov
2018-09-03 10:32       ` Vladislav Shpilevoy
2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 3/5] box: introduce path field " Kirill Shcherbatov
2018-08-22  0:26   ` [tarantool-patches] " Vladislav Shpilevoy
2018-08-27  7:37     ` Kirill Shcherbatov
2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 4/5] box: introduce path_hash and tuple_field tree Kirill Shcherbatov
2018-08-22  0:26   ` [tarantool-patches] " Vladislav Shpilevoy
2018-08-27  7:37     ` Kirill Shcherbatov
2018-08-15 12:15 ` [tarantool-patches] [PATCH v2 5/5] box: specify indexes in user-friendly form Kirill Shcherbatov
2018-08-22  0:26   ` [tarantool-patches] " Vladislav Shpilevoy
2018-08-27  7:37     ` Kirill Shcherbatov
2018-08-22  0:28   ` Vladislav Shpilevoy
