Tarantool development patches archive
* [PATCH v4 00/14] box: indexes by JSON path
@ 2018-10-11  7:58 Kirill Shcherbatov
  2018-10-11  7:58 ` [PATCH v4 01/14] box: refactor key_def_find routine Kirill Shcherbatov
                   ` (13 more replies)
  0 siblings, 14 replies; 24+ messages in thread
From: Kirill Shcherbatov @ 2018-10-11  7:58 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov

http://github.com/tarantool/tarantool/tree/kshch/gh-1012-json-indexes
https://github.com/tarantool/tarantool/issues/1012

Sometimes field data may have a complex document structure.
When this structure is consistent across the whole space,
an index can be created by JSON path.
This became possible with an auxiliary JSON tree structure that
contains intermediate path fields and a hashtable.
To speed up data access by a JSON index, the key_part structure is
extended with an offset_slot cache that points to the field_map item
containing the data offset for the current tuple. The field map is
initialized by traversing the tree to detect vertices that are
missing in the msgpack.
An offset_slot_cache is introduced in key_part to tune data access
for the typical scenario of processing tuples that share the same
format.
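
For illustration, here is a minimal self-contained C sketch (names
simplified, not the actual patch code) of the core idea: once the
field map is initialized, a leaf JSON field is fetched in O(1)
through its (negative) offset slot instead of reparsing the path:

#include <stdint.h>
#include <stdio.h>

/*
 * The field map is an array of uint32_t offsets placed in front of
 * the tuple data and indexed with negative slots; offset 0 marks a
 * field that is absent in this particular tuple.
 */
static const char *
field_by_slot(const char *data, const uint32_t *field_map, int32_t slot)
{
	uint32_t off = field_map[slot]; /* slot < 0 */
	return off == 0 ? NULL : data + off;
}

int
main(void)
{
	const char data[] = "XXXXXfname-value";
	uint32_t slots[2] = {5, 0}; /* slots -2 and -1 */
	const uint32_t *field_map = slots + 2;
	/* Prints "fname-value"; slot -1 is an absent field. */
	printf("%s\n", field_by_slot(data, field_map, -2));
	printf("%d\n", field_by_slot(data, field_map, -1) == NULL);
	return 0;
}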

Kirill Shcherbatov (14):
  box: refactor key_def_find routine
  box: introduce key_def_parts_are_sequential
  box: introduce tuple_field_by_relative_path
  box: introduce tuple_format_add_key_part
  box: introduce tuple_format_sizeof routine
  box: move tuple_field_go_to_{index,key} definition
  box: drop format const qualifier in *init_field_map
  lib: implement JSON tree class for json library
  lib: introduce json_path_normalize routine
  box: introduce JSON indexes
  box: introduce has_json_paths flag in templates
  box: tune tuple_field_raw_by_path for indexed data
  box: introduce offset slot cache in key_part
  box: specify indexes in user-friendly form

 src/box/alter.cc                |   7 +-
 src/box/blackhole.c             |   5 +-
 src/box/engine.h                |  11 +-
 src/box/errcode.h               |   2 +-
 src/box/index_def.c             |   6 +-
 src/box/key_def.c               | 216 ++++++++--
 src/box/key_def.h               |  52 ++-
 src/box/lua/index.c             |  60 +++
 src/box/lua/schema.lua          |  22 +-
 src/box/lua/space.cc            |   5 +
 src/box/memtx_engine.c          |   6 +-
 src/box/memtx_space.c           |   5 +-
 src/box/memtx_space.h           |   2 +-
 src/box/schema.cc               |   4 +-
 src/box/space.c                 |   4 +-
 src/box/space.h                 |   8 +-
 src/box/sysview.c               |   3 +-
 src/box/tuple.c                 |  21 +-
 src/box/tuple_compare.cc        | 119 ++++--
 src/box/tuple_extract_key.cc    | 131 ++++--
 src/box/tuple_format.c          | 864 +++++++++++++++++++++++++++++++---------
 src/box/tuple_format.h          |  73 +++-
 src/box/tuple_hash.cc           |  47 ++-
 src/box/vinyl.c                 |   9 +-
 src/box/vy_log.c                |   3 +-
 src/box/vy_lsm.c                |  40 +-
 src/box/vy_point_lookup.c       |   2 -
 src/box/vy_stmt.c               | 184 +++++++--
 src/lib/json/CMakeLists.txt     |   2 +
 src/lib/json/path.c             |  25 ++
 src/lib/json/path.h             |  18 +
 src/lib/json/tree.c             | 300 ++++++++++++++
 src/lib/json/tree.h             | 260 ++++++++++++
 test/box/misc.result            |  57 +--
 test/engine/tuple.result        | 417 +++++++++++++++++++
 test/engine/tuple.test.lua      | 120 ++++++
 test/unit/json_path.c           | 243 ++++++++++-
 test/unit/json_path.result      |  60 ++-
 test/unit/vy_iterators_helper.c |   6 +-
 test/unit/vy_mem.c              |   2 +-
 test/unit/vy_point_lookup.c     |   2 +-
 41 files changed, 2983 insertions(+), 440 deletions(-)
 create mode 100644 src/lib/json/tree.c
 create mode 100644 src/lib/json/tree.h

-- 
2.7.4


* [PATCH v4 01/14] box: refactor key_def_find routine
  2018-10-11  7:58 [PATCH v4 00/14] box: indexes by JSON path Kirill Shcherbatov
@ 2018-10-11  7:58 ` Kirill Shcherbatov
  2018-10-15 17:27   ` Vladimir Davydov
  2018-10-11  7:58 ` [PATCH v4 10/14] box: introduce JSON indexes Kirill Shcherbatov
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 24+ messages in thread
From: Kirill Shcherbatov @ 2018-10-11  7:58 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov

Replaced key_def_find with a new key_def_contains_part routine
that checks whether a key_def contains the specified part; this is
the only purpose the legacy key_def_find has been used for. The new
API is more convenient for the complex key_parts that will appear
with the introduction of JSON paths.
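
For illustration only, a self-contained C sketch (hypothetical
types, not the tarantool sources) of why the lookup key becomes a
whole part rather than a fieldno: once parts may carry JSON paths,
two distinct parts can share a fieldno and differ only by path.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct part {
	uint32_t fieldno;
	const char *path; /* NULL for a plain part */
};

static bool
part_eq(const struct part *a, const struct part *b)
{
	if (a->fieldno != b->fieldno)
		return false;
	if ((a->path == NULL) != (b->path == NULL))
		return false;
	return a->path == NULL || strcmp(a->path, b->path) == 0;
}

int
main(void)
{
	struct part fname = {2, "FIO.fname"};
	struct part sname = {2, "FIO.sname"};
	/* Same fieldno, different parts: a fieldno-based lookup
	 * like the old key_def_find() could not tell them apart. */
	printf("%d\n", part_eq(&fname, &sname)); /* prints 0 */
	return 0;
}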

Part of #1012.
---
 src/box/key_def.c | 17 +++++++++--------
 src/box/key_def.h | 10 ++++------
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/src/box/key_def.c b/src/box/key_def.c
index 545b5da..67b517f 100644
--- a/src/box/key_def.c
+++ b/src/box/key_def.c
@@ -483,16 +483,17 @@ key_def_decode_parts_160(struct key_part_def *parts, uint32_t part_count,
 	return 0;
 }
 
-const struct key_part *
-key_def_find(const struct key_def *key_def, uint32_t fieldno)
+bool
+key_def_contains_part(const struct key_def *key_def,
+		      const struct key_part *to_find)
 {
 	const struct key_part *part = key_def->parts;
 	const struct key_part *end = part + key_def->part_count;
 	for (; part != end; part++) {
-		if (part->fieldno == fieldno)
-			return part;
+		if (part->fieldno == to_find->fieldno)
+			return true;
 	}
-	return NULL;
+	return false;
 }
 
 bool
@@ -501,7 +502,7 @@ key_def_contains(const struct key_def *first, const struct key_def *second)
 	const struct key_part *part = second->parts;
 	const struct key_part *end = part + second->part_count;
 	for (; part != end; part++) {
-		if (key_def_find(first, part->fieldno) == NULL)
+		if (!key_def_contains_part(first, part))
 			return false;
 	}
 	return true;
@@ -518,7 +519,7 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
 	const struct key_part *part = second->parts;
 	const struct key_part *end = part + second->part_count;
 	for (; part != end; part++) {
-		if (key_def_find(first, part->fieldno))
+		if (key_def_contains_part(first, part))
 			--new_part_count;
 	}
 
@@ -548,7 +549,7 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
 	part = second->parts;
 	end = part + second->part_count;
 	for (; part != end; part++) {
-		if (key_def_find(first, part->fieldno))
+		if (key_def_contains_part(first, part))
 			continue;
 		key_def_set_part(new_def, pos++, part->fieldno, part->type,
 				 part->is_nullable, part->coll, part->coll_id);
diff --git a/src/box/key_def.h b/src/box/key_def.h
index df85321..37b9045 100644
--- a/src/box/key_def.h
+++ b/src/box/key_def.h
@@ -301,12 +301,10 @@ key_def_decode_parts_160(struct key_part_def *parts, uint32_t part_count,
 			 const char **data, const struct field_def *fields,
 			 uint32_t field_count);
 
-/**
- * Returns the part in index_def->parts for the specified fieldno.
- * If fieldno is not in index_def->parts returns NULL.
- */
-const struct key_part *
-key_def_find(const struct key_def *key_def, uint32_t fieldno);
+/** Check if @a key_def contains @a to_find part. */
+bool
+key_def_contains_part(const struct key_def *key_def,
+		      const struct key_part *to_find);
 
 /**
  * Check if key definition @a first contains all parts of
-- 
2.7.4


* [PATCH v4 10/14] box: introduce JSON indexes
  2018-10-11  7:58 [PATCH v4 00/14] box: indexes by JSON path Kirill Shcherbatov
  2018-10-11  7:58 ` [PATCH v4 01/14] box: refactor key_def_find routine Kirill Shcherbatov
@ 2018-10-11  7:58 ` Kirill Shcherbatov
  2018-10-16  9:33   ` Vladimir Davydov
  2018-10-11  7:58 ` [PATCH v4 11/14] box: introduce has_json_paths flag in templates Kirill Shcherbatov
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 24+ messages in thread
From: Kirill Shcherbatov @ 2018-10-11  7:58 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov

As we need to store a user-defined JSON path in key_part and
key_part_def, we have introduced the path and path_len fields. The
JSON path is verified and transformed to canonical form when the
index msgpack is unpacked.
The path string is stored as a part of the key_def allocation:

+-------+---------+-------+---------+-------+-------+-------+
|key_def|key_part1|  ...  |key_partN| path1 |  ...  | pathN |
+-------+---------+-------+---------+-------+-------+-------+
          |                         ^
          |-> path _________________|
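
A compile-ready sketch of this layout (simplified stand-ins for the
real structures, names hypothetical; the same trick is used below
for the tuple_format allocation):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct part_sketch {
	const char *path;
	uint32_t path_len;
};

struct def_sketch {
	uint32_t part_count;
	struct part_sketch parts[];
};

/* Header, parts array and NUL-terminated paths in one allocation. */
static struct def_sketch *
def_new(const char **paths, uint32_t part_count)
{
	size_t paths_size = 0;
	for (uint32_t i = 0; i < part_count; i++)
		paths_size += paths[i] != NULL ? strlen(paths[i]) + 1 : 0;
	size_t sz = sizeof(struct def_sketch) +
		    part_count * sizeof(struct part_sketch) + paths_size;
	struct def_sketch *def = calloc(1, sz);
	if (def == NULL)
		return NULL;
	def->part_count = part_count;
	/* Path strings are packed right after the parts array. */
	char *data = (char *)def + sz - paths_size;
	for (uint32_t i = 0; i < part_count; i++) {
		if (paths[i] == NULL)
			continue;
		def->parts[i].path_len = strlen(paths[i]);
		memcpy(data, paths[i], def->parts[i].path_len + 1);
		def->parts[i].path = data;
		data += def->parts[i].path_len + 1;
	}
	return def;
}

int
main(void)
{
	const char *paths[] = {NULL, "FIO.fname"};
	struct def_sketch *def = def_new(paths, 2);
	if (def == NULL)
		return 1;
	printf("%s\n", def->parts[1].path); /* "FIO.fname" */
	free(def);
	return 0;
}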

To work with JSON-defined indexes we use the json_tree class:
[2].FIO.fname -> field "fname" {type=str, off_slot=-1}
[2].FIO.sname -> field "sname" {type=str, off_slot=-2}

<Tree>:      format->field[2] {type = map}
                           |
                   FIO {type = map}
                           |
        "fname"            |             "sname"
{type=str,off_slot=-1} ____|____ {type = str,off_slot=-2}
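
A toy model of this lookup (plain arrays standing in for the real
hashtable-backed json_tree; names are illustrative only):

#include <stdio.h>
#include <string.h>

struct field_sketch {
	const char *token;             /* path node leading here */
	int off_slot;                  /* < 0 for indexed leaves */
	struct field_sketch *children;
	int child_count;
};

static struct field_sketch *
child(struct field_sketch *f, const char *token)
{
	for (int i = 0; i < f->child_count; i++)
		if (strcmp(f->children[i].token, token) == 0)
			return &f->children[i];
	return NULL;
}

int
main(void)
{
	struct field_sketch leaves[] = {
		{"fname", -1, NULL, 0},
		{"sname", -2, NULL, 0},
	};
	struct field_sketch fio = {"FIO", 0, leaves, 2};
	struct field_sketch root = {"[2]", 0, &fio, 1};
	/* Resolve "[2].FIO.sname" one token per tree level. */
	struct field_sketch *f = child(child(&root, "FIO"), "sname");
	printf("off_slot = %d\n", f->off_slot); /* -2 */
	return 0;
}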

All JSON paths are stored at the end of the format allocation:

+------------+------------+-------+------------+-------+
|tuple_format|tuple_field1|  ...  |tuple_fieldN| pathK |
+------------+------------+-------+------------+-------+

Part of #1012.
---
 src/box/errcode.h            |   2 +-
 src/box/index_def.c          |   6 +-
 src/box/key_def.c            | 189 +++++++++++++++--
 src/box/key_def.h            |  31 ++-
 src/box/lua/space.cc         |   5 +
 src/box/memtx_engine.c       |   2 +
 src/box/tuple.c              |  17 +-
 src/box/tuple_compare.cc     |   7 +-
 src/box/tuple_extract_key.cc |  21 +-
 src/box/tuple_format.c       | 496 ++++++++++++++++++++++++++++++++++++++-----
 src/box/tuple_format.h       |  65 +++++-
 src/box/tuple_hash.cc        |   2 +-
 src/box/vinyl.c              |   2 +
 src/box/vy_log.c             |   3 +-
 src/box/vy_point_lookup.c    |   2 -
 src/box/vy_stmt.c            | 184 +++++++++++++---
 test/box/misc.result         |  57 ++---
 test/engine/tuple.result     | 371 ++++++++++++++++++++++++++++++++
 test/engine/tuple.test.lua   | 106 +++++++++
 19 files changed, 1415 insertions(+), 153 deletions(-)

diff --git a/src/box/errcode.h b/src/box/errcode.h
index 4115e6b..464f413 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -107,7 +107,7 @@ struct errcode_record {
 	/* 52 */_(ER_FUNCTION_EXISTS,		"Function '%s' already exists") \
 	/* 53 */_(ER_BEFORE_REPLACE_RET,	"Invalid return value of space:before_replace trigger: expected tuple or nil, got %s") \
 	/* 54 */_(ER_FUNCTION_MAX,		"A limit on the total number of functions has been reached: %u") \
-	/* 55 */_(ER_UNUSED4,			"") \
+	/* 55 */_(ER_DATA_STRUCTURE_MISMATCH,	"Tuple doesn't match document structure: %s") \
 	/* 56 */_(ER_USER_MAX,			"A limit on the total number of users has been reached: %u") \
 	/* 57 */_(ER_NO_SUCH_ENGINE,		"Space engine '%s' does not exist") \
 	/* 58 */_(ER_RELOAD_CFG,		"Can't set option '%s' dynamically") \
diff --git a/src/box/index_def.c b/src/box/index_def.c
index 9cda63c..6e206b1 100644
--- a/src/box/index_def.c
+++ b/src/box/index_def.c
@@ -209,8 +209,10 @@ index_def_is_valid(struct index_def *index_def, const char *space_name)
 			 * Courtesy to a user who could have made
 			 * a typo.
 			 */
-			if (index_def->key_def->parts[i].fieldno ==
-			    index_def->key_def->parts[j].fieldno) {
+			struct key_part *part_a = &index_def->key_def->parts[i];
+			struct key_part *part_b = &index_def->key_def->parts[j];
+			if (part_a->fieldno == part_b->fieldno &&
+			    key_part_path_cmp(part_a, part_b) == 0) {
 				diag_set(ClientError, ER_MODIFY_INDEX,
 					 index_def->name, space_name,
 					 "same key part is indexed twice");
diff --git a/src/box/key_def.c b/src/box/key_def.c
index 67b517f..1b32387 100644
--- a/src/box/key_def.c
+++ b/src/box/key_def.c
@@ -28,6 +28,8 @@
  * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
+#include "fiber.h"
+#include "json/path.h"
 #include "key_def.h"
 #include "tuple_compare.h"
 #include "tuple_extract_key.h"
@@ -41,6 +43,7 @@ const struct key_part_def key_part_def_default = {
 	field_type_MAX,
 	COLL_NONE,
 	false,
+	NULL
 };
 
 static int64_t
@@ -53,6 +56,7 @@ part_type_by_name_wrapper(const char *str, uint32_t len)
 #define PART_OPT_FIELD		"field"
 #define PART_OPT_COLLATION	"collation"
 #define PART_OPT_NULLABILITY	"is_nullable"
+#define PART_OPT_PATH		"path"
 
 const struct opt_def part_def_reg[] = {
 	OPT_DEF_ENUM(PART_OPT_TYPE, field_type, struct key_part_def, type,
@@ -61,6 +65,7 @@ const struct opt_def part_def_reg[] = {
 	OPT_DEF(PART_OPT_COLLATION, OPT_UINT32, struct key_part_def, coll_id),
 	OPT_DEF(PART_OPT_NULLABILITY, OPT_BOOL, struct key_part_def,
 		is_nullable),
+	OPT_DEF(PART_OPT_PATH, OPT_STRPTR, struct key_part_def, path),
 	OPT_END,
 };
 
@@ -96,13 +101,25 @@ const uint32_t key_mp_type[] = {
 struct key_def *
 key_def_dup(const struct key_def *src)
 {
-	size_t sz = key_def_sizeof(src->part_count);
-	struct key_def *res = (struct key_def *)malloc(sz);
+	const struct key_part *parts = src->parts;
+	const struct key_part *parts_end = parts + src->part_count;
+	size_t sz = 0;
+	for (; parts < parts_end; parts++)
+		sz += parts->path != NULL ? parts->path_len + 1 : 0;
+	sz = key_def_sizeof(src->part_count, sz);
+	struct key_def *res = (struct key_def *)calloc(1, sz);
 	if (res == NULL) {
 		diag_set(OutOfMemory, sz, "malloc", "res");
 		return NULL;
 	}
 	memcpy(res, src, sz);
+	/* Update paths to point to the new memory chunk. */
+	for (uint32_t i = 0; i < src->part_count; i++) {
+		if (src->parts[i].path == NULL)
+			continue;
+		size_t path_offset = src->parts[i].path - (char *)src;
+		res->parts[i].path = (char *)res + path_offset;
+	}
 	return res;
 }
 
@@ -110,8 +127,23 @@ void
 key_def_swap(struct key_def *old_def, struct key_def *new_def)
 {
 	assert(old_def->part_count == new_def->part_count);
-	for (uint32_t i = 0; i < new_def->part_count; i++)
-		SWAP(old_def->parts[i], new_def->parts[i]);
+	for (uint32_t i = 0; i < new_def->part_count; i++) {
+		if (old_def->parts[i].path == NULL) {
+			SWAP(old_def->parts[i], new_def->parts[i]);
+		} else {
+			/*
+			 * Since the data is located in memory
+			 * in the same order (otherwise rebuild
+			 * would be called), just update the
+			 * pointers.
+			 */
+			size_t path_offset =
+				old_def->parts[i].path - (char *)old_def;
+			SWAP(old_def->parts[i], new_def->parts[i]);
+			old_def->parts[i].path = (char *)old_def + path_offset;
+			new_def->parts[i].path = (char *)new_def + path_offset;
+		}
+	}
 	SWAP(*old_def, *new_def);
 }
 
@@ -133,23 +165,36 @@ key_def_set_cmp(struct key_def *def)
 static void
 key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
 		 enum field_type type, bool is_nullable, struct coll *coll,
-		 uint32_t coll_id)
+		 uint32_t coll_id, const char *path, uint32_t path_len)
 {
 	assert(part_no < def->part_count);
 	assert(type < field_type_MAX);
 	def->is_nullable |= is_nullable;
+	def->has_json_paths |= path != NULL;
 	def->parts[part_no].is_nullable = is_nullable;
 	def->parts[part_no].fieldno = fieldno;
 	def->parts[part_no].type = type;
 	def->parts[part_no].coll = coll;
 	def->parts[part_no].coll_id = coll_id;
+	if (path != NULL) {
+		def->parts[part_no].path_len = path_len;
+		assert(def->parts[part_no].path != NULL);
+		memcpy(def->parts[part_no].path, path, path_len);
+		def->parts[part_no].path[path_len] = '\0';
+	} else {
+		def->parts[part_no].path_len = 0;
+		def->parts[part_no].path = NULL;
+	}
 	column_mask_set_fieldno(&def->column_mask, fieldno);
 }
 
 struct key_def *
 key_def_new(const struct key_part_def *parts, uint32_t part_count)
 {
-	size_t sz = key_def_sizeof(part_count);
+	ssize_t sz = 0;
+	for (uint32_t i = 0; i < part_count; i++)
+		sz += parts[i].path != NULL ? strlen(parts[i].path) + 1 : 0;
+	sz = key_def_sizeof(part_count, sz);
 	struct key_def *def = calloc(1, sz);
 	if (def == NULL) {
 		diag_set(OutOfMemory, sz, "malloc", "struct key_def");
@@ -159,6 +204,7 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count)
 	def->part_count = part_count;
 	def->unique_part_count = part_count;
 
+	char *data = (char *)def + key_def_sizeof(part_count, 0);
 	for (uint32_t i = 0; i < part_count; i++) {
 		const struct key_part_def *part = &parts[i];
 		struct coll *coll = NULL;
@@ -172,15 +218,23 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count)
 			}
 			coll = coll_id->coll;
 		}
+		uint32_t path_len = 0;
+		if (part->path != NULL) {
+			path_len = strlen(part->path);
+			def->parts[i].path = data;
+			data += path_len + 1;
+		}
 		key_def_set_part(def, i, part->fieldno, part->type,
-				 part->is_nullable, coll, part->coll_id);
+				 part->is_nullable, coll, part->coll_id,
+				 part->path, path_len);
 	}
 	key_def_set_cmp(def);
 	return def;
 }
 
-void
-key_def_dump_parts(const struct key_def *def, struct key_part_def *parts)
+int
+key_def_dump_parts(struct region *pool, const struct key_def *def,
+		   struct key_part_def *parts)
 {
 	for (uint32_t i = 0; i < def->part_count; i++) {
 		const struct key_part *part = &def->parts[i];
@@ -189,13 +243,27 @@ key_def_dump_parts(const struct key_def *def, struct key_part_def *parts)
 		part_def->type = part->type;
 		part_def->is_nullable = part->is_nullable;
 		part_def->coll_id = part->coll_id;
+		if (part->path != NULL) {
+			char *path = region_alloc(pool, part->path_len + 1);
+			if (path == NULL) {
+				diag_set(OutOfMemory, part->path_len + 1,
+					 "region_alloc", "part_def->path");
+				return -1;
+			}
+			memcpy(path, part->path, part->path_len);
+			path[part->path_len] = '\0';
+			part_def->path = path;
+		} else {
+			part_def->path = NULL;
+		}
 	}
+	return 0;
 }
 
 box_key_def_t *
 box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count)
 {
-	size_t sz = key_def_sizeof(part_count);
+	size_t sz = key_def_sizeof(part_count, 0);
 	struct key_def *key_def = calloc(1, sz);
 	if (key_def == NULL) {
 		diag_set(OutOfMemory, sz, "malloc", "struct key_def");
@@ -208,7 +276,7 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count)
 	for (uint32_t item = 0; item < part_count; ++item) {
 		key_def_set_part(key_def, item, fields[item],
 				 (enum field_type)types[item],
-				 false, NULL, COLL_NONE);
+				 false, NULL, COLL_NONE, NULL, 0);
 	}
 	key_def_set_cmp(key_def);
 	return key_def;
@@ -255,6 +323,9 @@ key_part_cmp(const struct key_part *parts1, uint32_t part_count1,
 		if (part1->is_nullable != part2->is_nullable)
 			return part1->is_nullable <
 			       part2->is_nullable ? -1 : 1;
+		int rc;
+		if ((rc = key_part_path_cmp(part1, part2)) != 0)
+			return rc;
 	}
 	return part_count1 < part_count2 ? -1 : part_count1 > part_count2;
 }
@@ -286,8 +357,15 @@ key_def_snprint_parts(char *buf, int size, const struct key_part_def *parts,
 	for (uint32_t i = 0; i < part_count; i++) {
 		const struct key_part_def *part = &parts[i];
 		assert(part->type < field_type_MAX);
-		SNPRINT(total, snprintf, buf, size, "%d, '%s'",
-			(int)part->fieldno, field_type_strs[part->type]);
+		if (part->path != NULL) {
+			SNPRINT(total, snprintf, buf, size, "%d, '%s', '%s'",
+				(int)part->fieldno, part->path,
+				field_type_strs[part->type]);
+		} else {
+			SNPRINT(total, snprintf, buf, size, "%d, '%s'",
+				(int)part->fieldno,
+				field_type_strs[part->type]);
+		}
 		if (i < part_count - 1)
 			SNPRINT(total, snprintf, buf, size, ", ");
 	}
@@ -306,6 +384,8 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
 			count++;
 		if (part->is_nullable)
 			count++;
+		if (part->path != NULL)
+			count++;
 		size += mp_sizeof_map(count);
 		size += mp_sizeof_str(strlen(PART_OPT_FIELD));
 		size += mp_sizeof_uint(part->fieldno);
@@ -320,6 +400,10 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
 			size += mp_sizeof_str(strlen(PART_OPT_NULLABILITY));
 			size += mp_sizeof_bool(part->is_nullable);
 		}
+		if (part->path != NULL) {
+			size += mp_sizeof_str(strlen(PART_OPT_PATH));
+			size += mp_sizeof_str(strlen(part->path));
+		}
 	}
 	return size;
 }
@@ -335,6 +419,8 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
 			count++;
 		if (part->is_nullable)
 			count++;
+		if (part->path != NULL)
+			count++;
 		data = mp_encode_map(data, count);
 		data = mp_encode_str(data, PART_OPT_FIELD,
 				     strlen(PART_OPT_FIELD));
@@ -354,6 +440,12 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
 					     strlen(PART_OPT_NULLABILITY));
 			data = mp_encode_bool(data, part->is_nullable);
 		}
+		if (part->path != NULL) {
+			data = mp_encode_str(data, PART_OPT_PATH,
+					     strlen(PART_OPT_PATH));
+			data = mp_encode_str(data, part->path,
+					     strlen(part->path));
+		}
 	}
 	return data;
 }
@@ -414,6 +506,7 @@ key_def_decode_parts_166(struct key_part_def *parts, uint32_t part_count,
 				     fields[part->fieldno].is_nullable :
 				     key_part_def_default.is_nullable);
 		part->coll_id = COLL_NONE;
+		part->path = NULL;
 	}
 	return 0;
 }
@@ -427,6 +520,7 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
 		return key_def_decode_parts_166(parts, part_count, data,
 						fields, field_count);
 	}
+	struct region *region = &fiber()->gc;
 	for (uint32_t i = 0; i < part_count; i++) {
 		struct key_part_def *part = &parts[i];
 		if (mp_typeof(**data) != MP_MAP) {
@@ -438,7 +532,7 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
 		*part = key_part_def_default;
 		if (opts_decode(part, part_def_reg, data,
 				ER_WRONG_INDEX_OPTIONS, i + TUPLE_INDEX_BASE,
-				NULL) != 0)
+				region) != 0)
 			return -1;
 		if (part->type == field_type_MAX) {
 			diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
@@ -455,6 +549,34 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
 				 "string and scalar parts");
 			return -1;
 		}
+		/*
+		 * We should convert JSON path to canonical form
+		 * to prevent indexing same field twice.
+		 */
+		if (part->path != NULL) {
+			uint32_t path_len = strlen(part->path);
+			char *path_normalized =
+				region_alloc(region, 2.5 * path_len + 1);
+			if (path_normalized == NULL) {
+				diag_set(OutOfMemory, 2.5 * path_len + 1,
+					 "region_alloc", "path");
+				return -1;
+			}
+			int rc = json_path_normalize(part->path, path_len,
+						     path_normalized);
+			if (rc != 0) {
+				const char *err_msg =
+					tt_sprintf("invalid JSON path '%s': "
+						   "path has invalid structure "
+						   "(error at position %d)",
+						   part->path, rc);
+				diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
+					 part->fieldno + TUPLE_INDEX_BASE,
+					 err_msg);
+				return -1;
+			}
+			part->path = path_normalized;
+		}
 	}
 	return 0;
 }
@@ -479,6 +601,7 @@ key_def_decode_parts_160(struct key_part_def *parts, uint32_t part_count,
 				     fields[part->fieldno].is_nullable :
 				     key_part_def_default.is_nullable);
 		part->coll_id = COLL_NONE;
+		part->path = NULL;
 	}
 	return 0;
 }
@@ -490,7 +613,8 @@ key_def_contains_part(const struct key_def *key_def,
 	const struct key_part *part = key_def->parts;
 	const struct key_part *end = part + key_def->part_count;
 	for (; part != end; part++) {
-		if (part->fieldno == to_find->fieldno)
+		if (part->fieldno == to_find->fieldno &&
+		    key_part_path_cmp(part, to_find) == 0)
 			return true;
 	}
 	return false;
@@ -516,18 +640,27 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
 	 * Find and remove part duplicates, i.e. parts counted
 	 * twice since they are present in both key defs.
 	 */
-	const struct key_part *part = second->parts;
-	const struct key_part *end = part + second->part_count;
+	size_t sz = 0;
+	const struct key_part *part = first->parts;
+	const struct key_part *end = part + first->part_count;
+	for (; part != end; part++) {
+		if (part->path != NULL)
+			sz += part->path_len + 1;
+	}
+	part = second->parts;
+	end = part + second->part_count;
 	for (; part != end; part++) {
 		if (key_def_contains_part(first, part))
 			--new_part_count;
+		else if (part->path != NULL)
+			sz += part->path_len + 1;
 	}
 
+	sz = key_def_sizeof(new_part_count, sz);
 	struct key_def *new_def;
-	new_def =  (struct key_def *)calloc(1, key_def_sizeof(new_part_count));
+	new_def =  (struct key_def *)calloc(1, sz);
 	if (new_def == NULL) {
-		diag_set(OutOfMemory, key_def_sizeof(new_part_count), "malloc",
-			 "new_def");
+		diag_set(OutOfMemory, sz, "malloc", "new_def");
 		return NULL;
 	}
 	new_def->part_count = new_part_count;
@@ -535,14 +668,21 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
 	new_def->is_nullable = first->is_nullable || second->is_nullable;
 	new_def->has_optional_parts = first->has_optional_parts ||
 				      second->has_optional_parts;
+	/* Path data write position in the new key_def. */
+	char *data = (char *)new_def + key_def_sizeof(new_part_count, 0);
 	/* Write position in the new key def. */
 	uint32_t pos = 0;
 	/* Append first key def's parts to the new index_def. */
 	part = first->parts;
 	end = part + first->part_count;
 	for (; part != end; part++) {
+		if (part->path != NULL) {
+			new_def->parts[pos].path = data;
+			data += part->path_len + 1;
+		}
 		key_def_set_part(new_def, pos++, part->fieldno, part->type,
-				 part->is_nullable, part->coll, part->coll_id);
+				 part->is_nullable, part->coll, part->coll_id,
+				 part->path, part->path_len);
 	}
 
 	/* Set-append second key def's part to the new key def. */
@@ -551,8 +691,13 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
 	for (; part != end; part++) {
 		if (key_def_contains_part(first, part))
 			continue;
+		if (part->path != NULL) {
+			new_def->parts[pos].path = data;
+			data += part->path_len + 1;
+		}
 		key_def_set_part(new_def, pos++, part->fieldno, part->type,
-				 part->is_nullable, part->coll, part->coll_id);
+				 part->is_nullable, part->coll, part->coll_id,
+				 part->path, part->path_len);
 	}
 	key_def_set_cmp(new_def);
 	return new_def;
diff --git a/src/box/key_def.h b/src/box/key_def.h
index 37b9045..ca73015 100644
--- a/src/box/key_def.h
+++ b/src/box/key_def.h
@@ -54,6 +54,8 @@ struct key_part_def {
 	uint32_t coll_id;
 	/** True if a key part can store NULLs. */
 	bool is_nullable;
+	/** JSON path to data. */
+	const char *path;
 };
 
 extern const struct key_part_def key_part_def_default;
@@ -76,6 +78,13 @@ struct key_part {
 	struct coll *coll;
 	/** True if a part can store NULLs. */
 	bool is_nullable;
+	/**
+	 * JSON path to data in 'canonical' form.
+	 * Read json_path_normalize to get more details.
+	 */
+	char *path;
+	/** The length of JSON path. */
+	uint32_t path_len;
 };
 
 struct key_def;
@@ -130,6 +139,8 @@ struct key_def {
 	uint32_t unique_part_count;
 	/** True, if at least one part can store NULL. */
 	bool is_nullable;
+	/** True, if some key part has JSON path. */
+	bool has_json_paths;
 	/**
 	 * True, if some key parts can be absent in a tuple. These
 	 * fields assumed to be MP_NIL.
@@ -223,9 +234,10 @@ box_tuple_compare_with_key(const box_tuple_t *tuple_a, const char *key_b,
 /** \endcond public */
 
 static inline size_t
-key_def_sizeof(uint32_t part_count)
+key_def_sizeof(uint32_t part_count, uint32_t paths_size)
 {
-	return sizeof(struct key_def) + sizeof(struct key_part) * part_count;
+	return sizeof(struct key_def) + sizeof(struct key_part) * part_count +
+	       paths_size;
 }
 
 /**
@@ -238,8 +250,9 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count);
 /**
  * Dump part definitions of the given key def.
  */
-void
-key_def_dump_parts(const struct key_def *def, struct key_part_def *parts);
+int
+key_def_dump_parts(struct region *pool, const struct key_def *def,
+		   struct key_part_def *parts);
 
 /**
  * Update 'has_optional_parts' of @a key_def with correspondence
@@ -351,6 +364,8 @@ key_validate_parts(const struct key_def *key_def, const char *key,
 static inline bool
 key_def_is_sequential(const struct key_def *key_def)
 {
+	if (key_def->has_json_paths)
+		return false;
 	for (uint32_t part_id = 0; part_id < key_def->part_count; part_id++) {
 		if (key_def->parts[part_id].fieldno != part_id)
 			return false;
@@ -401,6 +416,14 @@ key_mp_type_validate(enum field_type key_type, enum mp_type mp_type,
 	return 0;
 }
 
+static inline int
+key_part_path_cmp(const struct key_part *part1, const struct key_part *part2)
+{
+	if (part1->path_len != part2->path_len)
+		return part1->path_len - part2->path_len;
+	return memcmp(part1->path, part2->path, part1->path_len);
+}
+
 /**
  * Compare two key part arrays.
  *
diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc
index 25b7e36..875e51f 100644
--- a/src/box/lua/space.cc
+++ b/src/box/lua/space.cc
@@ -295,6 +295,11 @@ lbox_fillspace(struct lua_State *L, struct space *space, int i)
 			lua_pushnumber(L, part->fieldno + TUPLE_INDEX_BASE);
 			lua_setfield(L, -2, "fieldno");
 
+			if (part->path != NULL) {
+				lua_pushstring(L, part->path);
+				lua_setfield(L, -2, "path");
+			}
+
 			lua_pushboolean(L, part->is_nullable);
 			lua_setfield(L, -2, "is_nullable");
 
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index ae1f5a0..a35f5e7 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -1317,6 +1317,8 @@ memtx_index_def_change_requires_rebuild(struct index *index,
 			return true;
 		if (old_part->coll != new_part->coll)
 			return true;
+		if (key_part_path_cmp(old_part, new_part) != 0)
+			return true;
 	}
 	return false;
 }
diff --git a/src/box/tuple.c b/src/box/tuple.c
index d7dbad3..2bc414f 100644
--- a/src/box/tuple.c
+++ b/src/box/tuple.c
@@ -161,10 +161,21 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple)
 	struct tuple_field *field = &format->fields[0];
 	uint32_t i = 0;
 	uint32_t defined_field_count = MIN(field_count, format->field_count);
+	size_t off_stack_size =
+		format->max_path_tree_depth * sizeof(const char *);
+	const char **off_stack = region_alloc(&fiber()->gc, off_stack_size);
+	if (off_stack == NULL) {
+		diag_set(OutOfMemory, off_stack_size, "region_alloc",
+			 "off_stack");
+		return -1;
+	}
 	for (; i < defined_field_count; ++i, ++field) {
-		if (key_mp_type_validate(field->type, mp_typeof(*tuple),
-					 ER_FIELD_TYPE, i + TUPLE_INDEX_BASE,
-					 field->is_nullable))
+		/*
+		 * Don't fill field_map, just make types
+		 * validations.
+		 */
+		if (tuple_field_tree_parse_raw(field, tuple, NULL, i, NULL,
+					       off_stack) != 0)
 			return -1;
 		mp_next(&tuple);
 	}
diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
index e21b009..9bff340 100644
--- a/src/box/tuple_compare.cc
+++ b/src/box/tuple_compare.cc
@@ -469,7 +469,8 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 	struct key_part *part = key_def->parts;
 	const char *tuple_a_raw = tuple_data(tuple_a);
 	const char *tuple_b_raw = tuple_data(tuple_b);
-	if (key_def->part_count == 1 && part->fieldno == 0) {
+	if (key_def->part_count == 1 && part->fieldno == 0 &&
+	    part->path == NULL) {
 		/*
 		 * First field can not be optional - empty tuples
 		 * can not exist.
@@ -1027,7 +1028,7 @@ tuple_compare_create(const struct key_def *def)
 		}
 	}
 	assert(! def->has_optional_parts);
-	if (!key_def_has_collation(def)) {
+	if (!key_def_has_collation(def) && !def->has_json_paths) {
 		/* Precalculated comparators don't use collation */
 		for (uint32_t k = 0;
 		     k < sizeof(cmp_arr) / sizeof(cmp_arr[0]); k++) {
@@ -1247,7 +1248,7 @@ tuple_compare_with_key_create(const struct key_def *def)
 		}
 	}
 	assert(! def->has_optional_parts);
-	if (!key_def_has_collation(def)) {
+	if (!key_def_has_collation(def) && !def->has_json_paths) {
 		/* Precalculated comparators don't use collation */
 		for (uint32_t k = 0;
 		     k < sizeof(cmp_wk_arr) / sizeof(cmp_wk_arr[0]);
diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
index e9d7cac..cb4ae71 100644
--- a/src/box/tuple_extract_key.cc
+++ b/src/box/tuple_extract_key.cc
@@ -10,7 +10,8 @@ key_def_parts_are_sequential(const struct key_def *def, int i)
 {
 	uint32_t fieldno1 = def->parts[i].fieldno + 1;
 	uint32_t fieldno2 = def->parts[i + 1].fieldno;
-	return fieldno1 == fieldno2;
+	return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
+	       def->parts[i + 1].path == NULL;
 }
 
 /** True, if a key can contain two or more parts in sequence. */
@@ -241,7 +242,8 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 			if (!key_def_parts_are_sequential(key_def, i))
 				break;
 		}
-		uint32_t end_fieldno = key_def->parts[i].fieldno;
+		const struct key_part *part = &key_def->parts[i];
+		uint32_t end_fieldno = part->fieldno;
 
 		if (fieldno < current_fieldno) {
 			/* Rewind. */
@@ -283,6 +285,17 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 				current_fieldno++;
 			}
 		}
+		const char *field_last, *field_end_last;
+		if (part->path != NULL) {
+			field_last = field;
+			field_end_last = field_end;
+			int rc = tuple_field_by_relative_path(&field,
+							      part->path,
+							      part->path_len);
+			assert(rc == 0);
+			field_end = field;
+			mp_next(&field_end);
+		}
 		memcpy(key_buf, field, field_end - field);
 		key_buf += field_end - field;
 		if (has_optional_parts && null_count != 0) {
@@ -291,6 +304,10 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 		} else {
 			assert(key_buf - key <= data_end - data);
 		}
+		if (part->path != NULL) {
+			field = field_last;
+			field_end = field_end_last;
+		}
 	}
 	if (key_size != NULL)
 		*key_size = (uint32_t)(key_buf - key);
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index a8aceef..30deac5 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -28,6 +28,7 @@
  * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
+#include "fiber.h"
 #include "json/path.h"
 #include "tuple_format.h"
 
@@ -37,20 +38,17 @@ static intptr_t recycled_format_ids = FORMAT_ID_NIL;
 
 static uint32_t formats_size = 0, formats_capacity = 0;
 
-static const struct tuple_field tuple_field_default = {
-	FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false, false,
-};
-
 /**
  * Calculate the size of tuple format allocation.
  * @param field_count Count of tuple fields.
+ * @param paths_size Size of the paths extension.
  * @retval Size of memory allocation.
  */
 static inline uint32_t
-tuple_format_sizeof(uint32_t field_count)
+tuple_format_sizeof(uint32_t field_count, uint32_t paths_size)
 {
 	return sizeof(struct tuple_format) +
-	       field_count * sizeof(struct tuple_field);
+	       field_count * sizeof(struct tuple_field) + paths_size;
 }
 
 /**
@@ -134,6 +132,146 @@ tuple_field_go_to_key(const char **field, const char *key, int len)
 }
 
 /**
+ * Initialize tuple_field with default values.
+ * @param field Fields definition.
+ * @param path_node JSON parser node to initialize JSON tree
+ *                  tuple_field. May be NULL for regular fields.
+ * @param path_node_hash Hash of @path_node calculated with
+ *                       json_path_node_hash.
+ */
+static inline void
+tuple_field_create(struct tuple_field *field, struct json_path_node *path_node,
+		   uint32_t path_node_hash)
+{
+	memset(field, 0, sizeof(struct tuple_field));
+	field->type = FIELD_TYPE_ANY;
+	field->offset_slot = TUPLE_OFFSET_SLOT_NIL;
+	json_tree_node_create(&field->path_tree_node, path_node,
+			      path_node_hash);
+}
+
+/**
+ * Release tuple_field resources.
+ * @param field Fields definition.
+ */
+static inline void
+tuple_field_destroy(struct tuple_field *field)
+{
+	json_tree_node_destroy(&field->path_tree_node);
+}
+
+/**
+ * Construct a new JSON tree path for tuple_field by path string.
+ * Perform type checks for existing paths and initialize a new
+ * leaf field with a value allocated from the offset_slot pool.
+ * @param parent Root field to build JSON path tree.
+ * @param path JSON path to data used to construct tree path.
+ * @param path_len Length of the path.
+ * @param fieldno Index of field used for error handling.
+ * @param field_type Tuple field type of leaf record.
+ * @param is_nullable Nullability of leaf record.
+ * @param current_slot Pointer to offset slot pool used to
+ *                     allocate leaf field offset_slot.
+ * @retval not NULL JSON tree leaf tuple record.
+ * @retval NULL on memory allocation or path conflict error.
+ *              diag message is set.
+ */
+static struct tuple_field *
+tuple_field_tree_add_path(struct tuple_field *parent, const char *path,
+			  uint32_t path_len, uint32_t fieldno,
+			  enum field_type type, bool is_nullable,
+			  int *current_slot)
+{
+	int rc;
+	enum field_type interm_node_type = FIELD_TYPE_ANY;
+	struct json_path_parser parser;
+	struct json_path_node path_node;
+	struct tuple_field *field = parent;
+	/* True if the last entry has just been allocated. */
+	bool is_last_new = false;
+	json_path_parser_create(&parser, path, path_len);
+	while ((rc = json_path_next(&parser, &path_node)) == 0 &&
+	       path_node.type != JSON_PATH_END) {
+		interm_node_type = path_node.type == JSON_PATH_STR ?
+				   FIELD_TYPE_MAP : FIELD_TYPE_ARRAY;
+		if (field->type != FIELD_TYPE_ANY &&
+		    field->type != interm_node_type)
+			goto error_type_mismatch;
+		uint32_t path_node_hash = json_path_node_hash(&path_node);
+		struct tuple_field *next_field =
+			json_tree_lookup_entry_by_path_node(field, &path_node,
+							    path_node_hash,
+							    struct tuple_field,
+							    path_tree_node);
+		if (next_field == NULL) {
+			is_last_new = true;
+			next_field = malloc(sizeof(struct tuple_field));
+			if (next_field == NULL) {
+				diag_set(OutOfMemory,
+					 sizeof(struct tuple_field),
+					 "malloc", "next_field");
+				return NULL;
+			}
+			tuple_field_create(next_field, &path_node,
+					   path_node_hash);
+			rc = json_tree_add(&field->path_tree_node,
+					   &next_field->path_tree_node);
+			if (rc != 0) {
+				diag_set(OutOfMemory,
+					 sizeof(struct json_tree_node),
+					 "json_tree_add", "hashtable");
+				return NULL;
+			}
+		} else {
+			is_last_new = false;
+		}
+		field->type = iterm_node_type;
+		field = next_field;
+	}
+	/*
+	 * Key part paths have already been checked and
+	 * normalized, so we don't need to handle parse errors.
+	 */
+	assert(rc == 0 && path_node.type == JSON_PATH_END);
+	assert(field != NULL);
+	if (!is_last_new) {
+		assert(field->offset_slot != TUPLE_OFFSET_SLOT_NIL);
+		/* Set up the strongest type and nullability. */
+		if (field->is_nullable != is_nullable)
+			field->is_nullable = false;
+		interm_node_type = type;
+		if (field_type1_contains_type2(field->type, interm_node_type)) {
+			field->type = interm_node_type;
+		} else if (!field_type1_contains_type2(interm_node_type,
+							field->type))
+			goto error_type_mismatch;
+	} else {
+		field->is_nullable = is_nullable;
+		field->type = type;
+		field->offset_slot = (*current_slot = *current_slot - 1);
+
+		/* Update tree depths for path. */
+		uint32_t depth = 1;
+		for (struct json_tree_node *iter = field->path_tree_node.parent;
+		     iter != NULL; iter = iter->parent, ++depth) {
+			struct tuple_field *record =
+				json_tree_entry(iter, struct tuple_field,
+						path_tree_node);
+			record->path_tree_depth =
+				MAX(record->path_tree_depth, depth);
+		}
+	}
+	return field;
+
+error_type_mismatch: ;
+	diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH,
+		 tt_sprintf("%d", fieldno + TUPLE_INDEX_BASE),
+		 field_type_strs[interm_node_type],
+		 field_type_strs[field->type]);
+	return NULL;
+}
+
+/**
  * Add and initialize a new key_part to format.
  * @param format Format to initialize.
  * @param fields Fields definition if any.
@@ -141,6 +279,7 @@ tuple_field_go_to_key(const char **field, const char *key, int len)
  * @param part An index part to append.
  * @param is_sequential Whether this part is sequential.
  * @param current_slot Pointer to last offset slot.
+ * @param path_data Pointer to memory to store path strings.
  * @retval -1 On error.
  * @retval 0 On success.
  */
@@ -148,10 +287,33 @@ static int
 tuple_format_add_key_part(struct tuple_format *format,
 			  const struct field_def *fields, uint32_t field_count,
 			  const struct key_part *part, bool is_sequential,
-			  int *current_slot)
+			  int *current_slot, char **path_data)
 {
 	assert(part->fieldno < format->field_count);
 	struct tuple_field *field = &format->fields[part->fieldno];
+	if (part->path != NULL) {
+		field->is_key_part = true;
+		assert(!is_sequential);
+		/*
+		 * Copy JSON path data to the reserved area at
+		 * the end of the format allocation.
+		 */
+		memcpy(*path_data, part->path, part->path_len);
+		(*path_data)[part->path_len] = '\0';
+		field = tuple_field_tree_add_path(field, *path_data,
+						  part->path_len, part->fieldno,
+						  part->type, part->is_nullable,
+						  current_slot);
+		*path_data += part->path_len + 1;
+		if (field != NULL) {
+			format->max_path_tree_depth =
+				MAX(format->max_path_tree_depth,
+				    field->path_tree_depth);
+			return 0;
+		} else {
+			return -1;
+		}
+	}
 	if (part->fieldno >= field_count) {
 		field->is_nullable = part->is_nullable;
 	} else if (field->is_nullable != part->is_nullable) {
@@ -214,18 +376,15 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 		format->field_map_size = 0;
 		return 0;
 	}
-	/* Initialize defined fields */
+	/* Update defined fields attributes. */
 	for (uint32_t i = 0; i < field_count; ++i) {
-		format->fields[i].is_key_part = false;
 		format->fields[i].type = fields[i].type;
-		format->fields[i].offset_slot = TUPLE_OFFSET_SLOT_NIL;
 		format->fields[i].is_nullable = fields[i].is_nullable;
 	}
-	/* Initialize remaining fields */
-	for (uint32_t i = field_count; i < format->field_count; i++)
-		format->fields[i] = tuple_field_default;
 
 	int current_slot = 0;
+	char *paths_data =
+		(char *)format + tuple_format_sizeof(format->field_count, 0);
 
 	/* extract field type info */
 	for (uint16_t key_no = 0; key_no < key_count; ++key_no) {
@@ -238,7 +397,8 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 			if (tuple_format_add_key_part(format, fields,
 						      field_count, part,
 						      is_sequential,
-						      &current_slot) != 0)
+						      &current_slot,
+						      &paths_data) != 0)
 				return -1;
 		}
 	}
@@ -305,6 +465,8 @@ static struct tuple_format *
 tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		   uint32_t space_field_count, struct tuple_dictionary *dict)
 {
+	/* Size of area to store paths. */
+	uint32_t paths_size = 0;
 	uint32_t index_field_count = 0;
 	/* find max max field no */
 	for (uint16_t key_no = 0; key_no < key_count; ++key_no) {
@@ -314,10 +476,12 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		for (; part < pend; part++) {
 			index_field_count = MAX(index_field_count,
 						part->fieldno + 1);
+			if (part->path != NULL)
+				paths_size += part->path_len + 1;
 		}
 	}
 	uint32_t field_count = MAX(space_field_count, index_field_count);
-	uint32_t total = tuple_format_sizeof(field_count);
+	uint32_t total = tuple_format_sizeof(field_count, paths_size);
 
 	struct tuple_format *format = (struct tuple_format *) malloc(total);
 	if (format == NULL) {
@@ -336,6 +500,10 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		format->dict = dict;
 		tuple_dictionary_ref(dict);
 	}
+	for (uint32_t i = 0; i < field_count; i++)
+		tuple_field_create(&format->fields[i], NULL, 0);
+	format->max_path_tree_depth = 1;
+	format->allocation_size = total;
 	format->refs = 0;
 	format->id = FORMAT_ID_NIL;
 	format->field_count = field_count;
@@ -349,6 +517,19 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 static inline void
 tuple_format_destroy(struct tuple_format *format)
 {
+	for (uint32_t i = 0; i < format->field_count; i++) {
+		struct tuple_field *field;
+		json_tree_foreach_entry_safe(field, &format->fields[i],
+					     path_tree_node) {
+			tuple_field_destroy(field);
+			/*
+			 * Root records have been allocated with
+			 * the format. No need to free them.
+			 */
+			if (field != &format->fields[i])
+				free(field);
+		}
+	}
 	tuple_dictionary_unref(format->dict);
 }
 
@@ -428,26 +609,201 @@ tuple_format1_can_store_format2_tuples(const struct tuple_format *format1,
 	return true;
 }
 
+/**
+ * Build a copy of a JSON tree that uses a new source path string.
+ * The function relies on the fact that the format data was
+ * memcpy(ied), so new_data and old_data keep paths in the same
+ * order.
+ * @param new_root Root of the new JSON path tree.
+ * @param old_root Root of the original JSON path tree to be copied.
+ * @param new_data Pointer to memory containing the new path strings.
+ * @param old_data Pointer to memory containing the old path strings.
+ * @retval 0 On success, -1 on memory allocation error.
+ */
+static int
+tuple_field_tree_dup(struct tuple_field *new_root, struct tuple_field *old_root,
+		     const char *new_data, const char *old_data)
+{
+	struct json_tree_node *new_prev = &new_root->path_tree_node;
+	struct json_tree_node *old_prev = &old_root->path_tree_node;
+	struct json_tree_node *old_curr;
+	json_tree_foreach_pre(old_curr, &old_root->path_tree_node) {
+		if (unlikely(old_curr == &old_root->path_tree_node))
+			continue;
+		struct tuple_field *new_field =
+			calloc(1, sizeof(struct tuple_field));
+		if (new_field == NULL) {
+			diag_set(OutOfMemory, sizeof(struct tuple_field),
+				 "calloc", "new_field");
+			return -1;
+		}
+		*new_field = *json_tree_entry(old_curr, struct tuple_field,
+					      path_tree_node);
+		/*
+		 * Re-initialize JSON tree node with a new
+		 * string pointer.
+		 */
+		struct json_path_node node = old_curr->key;
+		if (node.type == JSON_PATH_STR)
+			node.str = new_data + (node.str - old_data);
+		assert(old_curr->key_hash == json_path_node_hash(&node));
+		json_tree_node_create(&new_field->path_tree_node, &node,
+				      old_curr->key_hash);
+		/* Look up a valid parent node in the new tree. */
+		while (old_prev != old_curr->parent) {
+			old_prev = old_prev->parent;
+			new_prev = new_prev->parent;
+			assert(old_prev != NULL);
+		}
+		assert(new_prev != NULL);
+		int rc = json_tree_add(new_prev, &new_field->path_tree_node);
+		if (rc != 0) {
+			diag_set(OutOfMemory, sizeof(struct json_tree_node),
+				 "json_tree_add", "hashtable");
+			return -1;
+		}
+		old_prev = old_curr;
+		new_prev = &new_field->path_tree_node;
+	}
+	return 0;
+}
+
 struct tuple_format *
 tuple_format_dup(struct tuple_format *src)
 {
-	uint32_t total = sizeof(struct tuple_format) +
-			 src->field_count * sizeof(struct tuple_field);
+	uint32_t total = src->allocation_size;
 	struct tuple_format *format = (struct tuple_format *) malloc(total);
 	if (format == NULL) {
 		diag_set(OutOfMemory, total, "malloc", "tuple format");
 		return NULL;
 	}
 	memcpy(format, src, total);
+
+	char *src_data_begin =
+		(char *)src + tuple_format_sizeof(src->field_count, 0);
+	char *new_data_begin =
+		(char *)format + tuple_format_sizeof(format->field_count, 0);
+	/*
+	 * Drop references to the source tree to prevent source
+	 * corruption when resources are released on dup failure.
+	 */
+	for (uint32_t i = 0; i < src->field_count; i++)
+		json_tree_node_create(&format->fields[i].path_tree_node, NULL, 0);
+	for (uint32_t i = 0; i < src->field_count; i++) {
+		if (tuple_field_tree_dup(&format->fields[i], &src->fields[i],
+					 new_data_begin, src_data_begin) != 0)
+			goto error;
+	}
 	tuple_dictionary_ref(format->dict);
 	format->id = FORMAT_ID_NIL;
 	format->refs = 0;
-	if (tuple_format_register(format) != 0) {
-		tuple_format_destroy(format);
-		free(format);
-		return NULL;
-	}
+	if (tuple_format_register(format) != 0)
+		goto error;
 	return format;
+error:
+	tuple_format_destroy(format);
+	free(format);
+	return NULL;
+}
+
+int
+tuple_field_tree_parse_raw(struct tuple_field *field, const char *field_raw,
+			   const char *tuple_raw, uint32_t fieldno,
+			   uint32_t *field_map, const char **off_stack)
+{
+	/*
+	 * Validate the root field type here so that we don't
+	 * have to evaluate prev != NULL ? prev->type : field->type
+	 * on each iteration.
+	 */
+	if (key_mp_type_validate(field->type, mp_typeof(*field_raw),
+				 ER_FIELD_TYPE, TUPLE_INDEX_BASE + fieldno,
+				 field->is_nullable) != 0)
+		return -1;
+	const char *err_details = NULL;
+	/* Top of the stack of JSON path node offsets. */
+	uint32_t off_stack_idx = 0;
+	const char *offset = field_raw;
+	struct tuple_field *prev = NULL;
+	struct tuple_field *curr;
+	json_tree_foreach_entry_pre(curr, field, path_tree_node) {
+		int rc = 0;
+		struct tuple_field *curr_parent =
+			json_tree_entry_safe(curr->path_tree_node.parent,
+					     struct tuple_field,
+					     path_tree_node);
+		/*
+		 * If the current record's parent is not the
+		 * previous record, we have switched to another
+		 * path in the tree.
+		 */
+		while (curr_parent != prev) {
+			assert(prev != NULL);
+			struct json_tree_node *prev_parent =
+				prev->path_tree_node.parent;
+			assert(prev_parent != NULL);
+			assert(off_stack_idx > 0);
+			prev = json_tree_entry(prev_parent, struct tuple_field,
+					       path_tree_node);
+			/*
+			 * Pop the offset corresponding to the
+			 * current field.
+			 */
+			offset = off_stack[--off_stack_idx];
+		}
+		off_stack[off_stack_idx++] = offset;
+
+		struct json_path_node *key = &curr->path_tree_node.key;
+		if (key->type == JSON_PATH_NUM) {
+			assert(prev == NULL || prev->type == FIELD_TYPE_ARRAY);
+			rc = tuple_field_go_to_index(&offset, key->num);
+			if (rc != 0 && !curr->is_nullable) {
+				err_details =
+					tt_sprintf("array size %d is less than "
+						   "size of %d defined in "
+						   "index",
+						   key->num, key->num + 1);
+				goto error_invalid_document;
+			}
+		} else if (key->type == JSON_PATH_STR) {
+			assert(prev == NULL || prev->type == FIELD_TYPE_MAP);
+			rc = tuple_field_go_to_key(&offset, key->str, key->len);
+			if (rc != 0 && !curr->is_nullable) {
+				err_details =
+					tt_sprintf("map doesn't contain key "
+						   "'%.*s' defined in index",
+						   key->len, key->str);
+				goto error_invalid_document;
+			}
+		}
+		/*
+		 * We can't make this validation unconditionally,
+		 * as we don't know whether the field is present
+		 * in the tuple: the offset could point to the
+		 * next field.
+		 */
+		if (rc == 0 &&
+		    key_mp_type_validate(curr->type, mp_typeof(*offset),
+					 ER_FIELD_TYPE,
+					 TUPLE_INDEX_BASE + fieldno,
+					 curr->is_nullable) != 0)
+			return -1;
+		if (field_map != NULL &&
+		    curr->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
+			field_map[curr->offset_slot] =
+				rc == 0 ? (uint32_t)(offset - tuple_raw) : 0;
+		}
+		prev = curr;
+	}
+	return 0;
+
+error_invalid_document:
+	assert(err_details != NULL);
+	char *data_buff = tt_static_buf();
+	mp_snprint(data_buff, TT_STATIC_BUF_LEN, field_raw);
+	const char *err_msg =
+		tt_sprintf("invalid field %d document content '%s': %s",
+			   fieldno + TUPLE_INDEX_BASE, data_buff, err_details);
+	diag_set(ClientError, ER_DATA_STRUCTURE_MISMATCH, err_msg);
+	return -1;
 }
 
 /** @sa declaration for details. */
@@ -477,34 +833,39 @@ tuple_init_field_map(struct tuple_format *format, uint32_t *field_map,
 	}
 
 	/* first field is simply accessible, so we do not store offset to it */
-	enum mp_type mp_type = mp_typeof(*pos);
-	const struct tuple_field *field = &format->fields[0];
-	if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
-				 TUPLE_INDEX_BASE, field->is_nullable))
-		return -1;
-	mp_next(&pos);
-	/* other fields...*/
-	++field;
-	uint32_t i = 1;
-	uint32_t defined_field_count = MIN(field_count, format->field_count);
-	if (field_count < format->index_field_count) {
+	struct tuple_field *field = &format->fields[0];
+	uint32_t i = 0;
+	if (field_count < format->index_field_count ||
+	    json_tree_node_children_count(&field->path_tree_node) == 0) {
 		/*
 		 * Nullify field map to be able to detect by 0,
 		 * which key fields are absent in tuple_field().
 		 */
 		memset((char *)field_map - format->field_map_size, 0,
 		       format->field_map_size);
 	}
-	for (; i < defined_field_count; ++i, ++field) {
-		mp_type = mp_typeof(*pos);
+	if (json_tree_node_children_count(&field->path_tree_node) == 0) {
+		enum mp_type mp_type = mp_typeof(*pos);
 		if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
-					 i + TUPLE_INDEX_BASE,
-					 field->is_nullable))
+					TUPLE_INDEX_BASE, field->is_nullable))
+			return -1;
+		mp_next(&pos);
+		++field;
+		++i;
+	}
+	size_t off_stack_size =
+		format->max_path_tree_depth * sizeof(const char *);
+	const char **off_stack = region_alloc(&fiber()->gc, off_stack_size);
+	if (off_stack == NULL) {
+		diag_set(OutOfMemory, off_stack_size, "region_alloc",
+			"off_stack");
+		return -1;
+	}
+	uint32_t defined_field_count = MIN(field_count, format->field_count);
+	for (; i < defined_field_count; ++i, ++field) {
+		if (tuple_field_tree_parse_raw(field, pos, tuple, i, field_map,
+					       off_stack) != 0)
 			return -1;
-		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
-			field_map[field->offset_slot] =
-				(uint32_t) (pos - tuple);
-		}
 		mp_next(&pos);
 	}
 	return 0;
@@ -565,15 +926,7 @@ box_tuple_format_unref(box_tuple_format_t *format)
 	tuple_format_unref(format);
 }
 
-/**
- * Retrieve field data by JSON path.
- * @param field Pointer to msgpack with data.
- * @param path The path to process.
- * @param path_len The length of the @path.
- * @retval 0 On success.
- * @retval >0 On path parsing error, invalid character position.
- */
-static inline int
+int
 tuple_field_by_relative_path(const char **field, const char *path,
 			     uint32_t path_len)
 {
@@ -677,3 +1030,38 @@ error:
 		 tt_sprintf("error in path on position %d", rc));
 	return -1;
 }
+
+const char *
+tuple_field_by_part_raw(const struct tuple_format *format, const char *data,
+			const uint32_t *field_map, struct key_part *part)
+{
+	if (likely(part->path == NULL))
+		return tuple_field_raw(format, data, field_map, part->fieldno);
+
+	struct tuple_field *root_field =
+		likely(part->fieldno < format->field_count) ?
+		(struct tuple_field *)&format->fields[part->fieldno] : NULL;
+	struct tuple_field *field =
+		unlikely(root_field == NULL) ? NULL :
+		tuple_field_tree_lookup(root_field, part->path, part->path_len);
+	if (unlikely(field == NULL)) {
+		/*
+		 * Legacy tuples having no field map for a JSON
+		 * index require a full path parse.
+		 */
+		const char *field_raw =
+			tuple_field_raw(format, data, field_map, part->fieldno);
+		if (unlikely(field_raw == NULL))
+			return NULL;
+		if (tuple_field_by_relative_path(&field_raw, part->path,
+						 part->path_len) != 0)
+			return NULL;
+		return field_raw;
+	}
+	int32_t offset_slot = field->offset_slot;
+	assert(offset_slot < 0);
+	assert(-offset_slot * sizeof(uint32_t) <= format->field_map_size);
+	if (unlikely(field_map[offset_slot] == 0))
+		return NULL;
+	return data + field_map[offset_slot];
+}
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index 894620f..599e768 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -35,6 +35,7 @@
 #include "field_def.h"
 #include "errinj.h"
 #include "tuple_dictionary.h"
+#include "json/tree.h"
 
 #if defined(__cplusplus)
 extern "C" {
@@ -108,6 +109,10 @@ struct tuple_field {
 	bool is_key_part;
 	/** True, if a field can store NULL. */
 	bool is_nullable;
+	/** JSON path tree max depth. */
+	uint32_t path_tree_depth;
+	/** JSON root path tree node for registered indexes. */
+	struct json_tree_node path_tree_node;
 };
 
 /**
@@ -157,6 +162,14 @@ struct tuple_format {
 	uint32_t min_field_count;
 	/* Length of 'fields' array. */
 	uint32_t field_count;
+	/** Size of format allocation. */
+	uint32_t allocation_size;
+	/**
+	 * Maximum depth of the JSON path tree. This value is used
+	 * to allocate the stack used in pre-order JSON path tree
+	 * traversal.
+	 */
+	uint32_t max_path_tree_depth;
 	/**
 	 * Shared names storage used by all formats of a space.
 	 */
@@ -388,6 +401,38 @@ tuple_field_raw_by_name(struct tuple_format *format, const char *tuple,
 }
 
 /**
+ * Retrieve field data by JSON path.
+ * @param field Pointer to msgpack with data.
+ * @param path The path to process.
+ * @param path_len The length of the @path.
+ * @retval 0 On success.
+ * @retval >0 On path parsing error, invalid character position.
+ */
+int
+tuple_field_by_relative_path(const char **field, const char *path,
+			     uint32_t path_len);
+
+/**
+ * Parse a tuple field by traversing its JSON path tree,
+ * validate field types and initialize field_map entries for
+ * fields that have offset slots.
+ * @param field Root field of path tree.
+ * @param field_raw Pointer to field data.
+ * @param tuple_raw Pointer to tuple data.
+ *                  May be NULL when field_map is NULL.
+ * @param fieldno Number of root field in tuple.
+ * @param[out] field_map Tuple field map to initialize.
+ *                       May be NULL if initialization is not
+ *                       required.
+ * @param off_stack Stack of intermediate nodes offsets.
+ * @retval 0  On success.
+ * @retval -1 On memory allocation or field parsing error.
+ */
+int
+tuple_field_tree_parse_raw(struct tuple_field *field, const char *field_raw,
+			   const char *tuple_raw, uint32_t fieldno,
+			   uint32_t *field_map, const char **off_stack);
+
+/**
  * Get tuple field by its path.
  * @param format Tuple format.
  * @param tuple MessagePack tuple's body.
@@ -414,11 +459,25 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
  * @param part Index part to use.
  * @retval Field data if the field exists or NULL.
  */
-static inline const char *
+const char *
 tuple_field_by_part_raw(const struct tuple_format *format, const char *data,
-			const uint32_t *field_map, struct key_part *part)
+			const uint32_t *field_map, struct key_part *part);
+
+/**
+ * Lookup JSON path in tuple_field JSON paths tree.
+ * @param field Tuple field with JSON tree root.
+ * @param path JSON path string to lookup.
+ * @param path_len Length of path.
+ * @retval not NULL Pointer to the tuple_field on success.
+ * @retval NULL If nothing has been found.
+ */
+static inline struct tuple_field *
+tuple_field_tree_lookup(struct tuple_field *field, const char *path,
+			uint32_t path_len)
 {
-	return tuple_field_raw(format, data, field_map, part->fieldno);
+	return json_tree_lookup_entry_by_path(field, path, path_len,
+					     struct tuple_field,
+					     path_tree_node);
 }
 
 #if defined(__cplusplus)
diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc
index b394804..3486ce1 100644
--- a/src/box/tuple_hash.cc
+++ b/src/box/tuple_hash.cc
@@ -222,7 +222,7 @@ key_hash_slowpath(const char *key, struct key_def *key_def);
 
 void
 tuple_hash_func_set(struct key_def *key_def) {
-	if (key_def->is_nullable)
+	if (key_def->is_nullable || key_def->has_json_paths)
 		goto slowpath;
 	/*
 	 * Check that key_def defines sequential a key without holes
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index acfe86d..0a6ead9 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -975,6 +975,8 @@ vinyl_index_def_change_requires_rebuild(struct index *index,
 			return true;
 		if (!field_type1_contains_type2(new_part->type, old_part->type))
 			return true;
+		if (key_part_path_cmp(old_part, new_part) != 0)
+			return true;
 	}
 	return false;
 }
diff --git a/src/box/vy_log.c b/src/box/vy_log.c
index fc8ede5..f396705 100644
--- a/src/box/vy_log.c
+++ b/src/box/vy_log.c
@@ -711,7 +711,8 @@ vy_log_record_dup(struct region *pool, const struct vy_log_record *src)
 				 "struct key_part_def");
 			goto err;
 		}
-		key_def_dump_parts(src->key_def, dst->key_parts);
+		if (key_def_dump_parts(pool, src->key_def, dst->key_parts) != 0)
+			goto err;
 		dst->key_part_count = src->key_def->part_count;
 		dst->key_def = NULL;
 	}
diff --git a/src/box/vy_point_lookup.c b/src/box/vy_point_lookup.c
index 7b704b8..9d5e220 100644
--- a/src/box/vy_point_lookup.c
+++ b/src/box/vy_point_lookup.c
@@ -196,8 +196,6 @@ vy_point_lookup(struct vy_lsm *lsm, struct vy_tx *tx,
 		const struct vy_read_view **rv,
 		struct tuple *key, struct tuple **ret)
 {
-	assert(tuple_field_count(key) >= lsm->cmp_def->part_count);
-
 	*ret = NULL;
 	double start_time = ev_monotonic_now(loop());
 	int rc = 0;
diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c
index 08b8fb6..9c47b75 100644
--- a/src/box/vy_stmt.c
+++ b/src/box/vy_stmt.c
@@ -29,6 +29,7 @@
  * SUCH DAMAGE.
  */
 
+#include "assoc.h"
 #include "vy_stmt.h"
 
 #include <stdlib.h>
@@ -349,6 +350,103 @@ vy_stmt_replace_from_upsert(const struct tuple *upsert)
 	return replace;
 }
 
+/**
+ * Construct a tuple field or calculate its size. The
+ * fields_iov_ht is a hashtable that links leaf records of the
+ * field path tree with the iovs containing their raw data. The
+ * function also fills the tuple field_map when the write_data
+ * flag is set.
+ */
+static void
+tuple_field_restore_raw(struct tuple_field *field, char *tuple_raw,
+			uint32_t *field_map, char **offset,
+			struct mh_i64ptr_t *fields_iov_ht, bool write_data)
+{
+	struct tuple_field *prev = NULL;
+	struct tuple_field *curr;
+	json_tree_foreach_entry_pre(curr, field, path_tree_node) {
+		struct json_tree_node *curr_node = &curr->path_tree_node;
+		uint32_t children_count =
+			json_tree_node_children_count(curr_node);
+		struct tuple_field *parent =
+			curr_node->parent == NULL ? NULL :
+			json_tree_entry(curr_node->parent, struct tuple_field,
+					path_tree_node);
+		if (parent != NULL && parent->type == FIELD_TYPE_ARRAY) {
+			/*
+			 * Fill unindexed array items with nulls.
+			 * The gap size is calculated as the
+			 * difference between sibling node keys.
+			 */
+			uint32_t nulls_count = curr_node->key.num - 1;
+			if (prev != parent) {
+				struct json_tree_node *prev_node =
+					rlist_prev_entry(curr_node, sibling);
+				assert(prev_node != NULL);
+				nulls_count -= prev_node->key.num;
+			}
+			for (uint32_t i = 0; i < nulls_count; i++) {
+				*offset = !write_data ?
+					  (*offset += mp_sizeof_nil()) :
+					  mp_encode_nil(*offset);
+			}
+		} else if (parent != NULL && parent->type == FIELD_TYPE_MAP) {
+			const char *str = curr_node->key.str;
+			uint32_t len = curr_node->key.len;
+			*offset = !write_data ?
+				  (*offset += mp_sizeof_str(len)) :
+				  mp_encode_str(*offset, str, len);
+		}
+		/* Fill data. */
+		if (curr->type == FIELD_TYPE_ARRAY) {
+			/*
+			 * Since numeric records in the JSON tree
+			 * are sorted in ascending order, the last
+			 * entry contains the largest index, which
+			 * is the size of the array.
+			 */
+			assert(children_count > 0);
+			struct json_tree_node *max_idx_node =
+				rlist_last_entry(&curr_node->children,
+						 struct json_tree_node,
+						 sibling);
+			uint32_t array_size = max_idx_node->key.num;
+			assert(array_size > 0);
+			*offset = !write_data ?
+				  (*offset += mp_sizeof_array(array_size)) :
+				  mp_encode_array(*offset, array_size);
+		} else if (curr->type == FIELD_TYPE_MAP) {
+			assert(children_count > 0);
+			*offset = !write_data ?
+				  (*offset += mp_sizeof_map(children_count)) :
+				  mp_encode_map(*offset, children_count);
+		} else {
+			/* Leaf record. */
+			assert(children_count == 0);
+			mh_int_t k = mh_i64ptr_find(fields_iov_ht,
+						    (uint64_t)curr, NULL);
+			struct iovec *iov =
+				k != mh_end(fields_iov_ht) ?
+				mh_i64ptr_node(fields_iov_ht, k)->val : NULL;
+			if (iov == NULL) {
+				*offset = !write_data ?
+					  (*offset += mp_sizeof_nil()) :
+					  mp_encode_nil(*offset);
+			} else {
+				uint32_t data_offset = *offset - tuple_raw;
+				int32_t slot = curr->offset_slot;
+				if (write_data) {
+					memcpy(*offset, iov->iov_base,
+					       iov->iov_len);
+					if (slot != TUPLE_OFFSET_SLOT_NIL)
+						field_map[slot] = data_offset;
+				}
+				*offset += iov->iov_len;
+			}
+		}
+		prev = curr;
+	}
+}
+
 static struct tuple *
 vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
 			       const struct key_def *cmp_def,
@@ -357,51 +455,83 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
 	/* UPSERT can't be surrogate. */
 	assert(type != IPROTO_UPSERT);
 	struct region *region = &fiber()->gc;
+	struct tuple *stmt = NULL;
 
 	uint32_t field_count = format->index_field_count;
-	struct iovec *iov = region_alloc(region, sizeof(*iov) * field_count);
+	uint32_t part_count = mp_decode_array(&key);
+	assert(part_count == cmp_def->part_count);
+	struct iovec *iov = region_alloc(region, sizeof(*iov) * part_count);
 	if (iov == NULL) {
-		diag_set(OutOfMemory, sizeof(*iov) * field_count,
-			 "region", "iov for surrogate key");
+		diag_set(OutOfMemory, sizeof(*iov) * part_count, "region",
+			"iov for surrogate key");
 		return NULL;
 	}
-	memset(iov, 0, sizeof(*iov) * field_count);
-	uint32_t part_count = mp_decode_array(&key);
-	assert(part_count == cmp_def->part_count);
-	assert(part_count <= field_count);
-	uint32_t nulls_count = field_count - cmp_def->part_count;
-	uint32_t bsize = mp_sizeof_array(field_count) +
-			 mp_sizeof_nil() * nulls_count;
-	for (uint32_t i = 0; i < part_count; ++i) {
-		const struct key_part *part = &cmp_def->parts[i];
+	/* Hashtable linking leaf fields with their corresponding iovs. */
+	struct mh_i64ptr_t *fields_iov_ht = mh_i64ptr_new();
+	if (fields_iov_ht == NULL) {
+		diag_set(OutOfMemory, sizeof(struct mh_i64ptr_t),
+			 "mh_i64ptr_new", "fields_iov_ht");
+		return NULL;
+	}
+	if (mh_i64ptr_reserve(fields_iov_ht, part_count, NULL) != 0) {
+		diag_set(OutOfMemory, part_count, "mh_i64ptr_reserve",
+			 "fields_iov_ht");
+		goto end;
+	}
+	uint32_t bsize = mp_sizeof_array(field_count);
+	uint32_t nulls_count = field_count;
+	memset(iov, 0, sizeof(*iov) * part_count);
+	const struct key_part *part = cmp_def->parts;
+	for (uint32_t i = 0; i < part_count; ++i, ++part) {
 		assert(part->fieldno < field_count);
 		const char *svp = key;
-		iov[part->fieldno].iov_base = (char *) key;
+		iov[i].iov_base = (char *) key;
 		mp_next(&key);
-		iov[part->fieldno].iov_len = key - svp;
-		bsize += key - svp;
+		iov[i].iov_len = key - svp;
+		struct tuple_field *field;
+		if (part->path == NULL) {
+			field = &format->fields[part->fieldno];
+			--nulls_count;
+		} else {
+			struct tuple_field *root_field =
+				&format->fields[part->fieldno];
+			field = tuple_field_tree_lookup(root_field, part->path,
+							part->path_len);
+			assert(field != NULL);
+		}
+		struct mh_i64ptr_node_t node = {(uint64_t)field, &iov[i]};
+		mh_int_t k = mh_i64ptr_put(fields_iov_ht, &node, NULL, NULL);
+		if (unlikely(k == mh_end(fields_iov_ht))) {
+			diag_set(OutOfMemory, part_count, "mh_i64ptr_put",
+				 "fields_iov_ht");
+			goto end;
+		}
+		k = mh_i64ptr_find(fields_iov_ht, (uint64_t)field, NULL);
+		assert(k != mh_end(fields_iov_ht));
+	}
+	/* Calculate tuple size to make allocation. */
+	for (uint32_t i = 0; i < field_count; ++i) {
+		char *data = NULL;
+		tuple_field_restore_raw(&format->fields[i], NULL, NULL, &data,
+					fields_iov_ht, false);
+		bsize += data - (char *)NULL;
 	}
 
-	struct tuple *stmt = vy_stmt_alloc(format, bsize);
+	stmt = vy_stmt_alloc(format, bsize);
 	if (stmt == NULL)
-		return NULL;
+		goto end;
 
 	char *raw = (char *) tuple_data(stmt);
 	uint32_t *field_map = (uint32_t *) raw;
 	char *wpos = mp_encode_array(raw, field_count);
 	for (uint32_t i = 0; i < field_count; ++i) {
-		const struct tuple_field *field = &format->fields[i];
-		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL)
-			field_map[field->offset_slot] = wpos - raw;
-		if (iov[i].iov_base == NULL) {
-			wpos = mp_encode_nil(wpos);
-		} else {
-			memcpy(wpos, iov[i].iov_base, iov[i].iov_len);
-			wpos += iov[i].iov_len;
-		}
+		tuple_field_restore_raw(&format->fields[i], raw, field_map,
+					&wpos, fields_iov_ht, true);
 	}
-	assert(wpos == raw + bsize);
+	assert(wpos <= raw + bsize);
 	vy_stmt_set_type(stmt, type);
+end:
+	mh_i64ptr_delete(fields_iov_ht);
 	return stmt;
 }
 
diff --git a/test/box/misc.result b/test/box/misc.result
index 4ee4797..b1fda78 100644
--- a/test/box/misc.result
+++ b/test/box/misc.result
@@ -350,7 +350,7 @@ t;
   - 'box.error.CANT_CREATE_COLLATION : 150'
   - 'box.error.USER_EXISTS : 46'
   - 'box.error.WAL_IO : 40'
-  - 'box.error.PROC_RET : 21'
+  - 'box.error.RTREE_RECT : 101'
   - 'box.error.PRIV_GRANTED : 89'
   - 'box.error.CREATE_SPACE : 9'
   - 'box.error.GRANT : 88'
@@ -361,7 +361,7 @@ t;
   - 'box.error.VINYL_MAX_TUPLE_SIZE : 139'
   - 'box.error.LOAD_FUNCTION : 99'
   - 'box.error.INVALID_XLOG : 74'
-  - 'box.error.READ_VIEW_ABORTED : 130'
+  - 'box.error.PRIV_NOT_GRANTED : 91'
   - 'box.error.TRANSACTION_CONFLICT : 97'
   - 'box.error.GUEST_USER_PASSWORD : 96'
   - 'box.error.PROC_C : 102'
@@ -371,8 +371,8 @@ t;
   - 'box.error.DROP_FUNCTION : 71'
   - 'box.error.CFG : 59'
   - 'box.error.NO_SUCH_FIELD : 37'
-  - 'box.error.CONNECTION_TO_SELF : 117'
-  - 'box.error.FUNCTION_MAX : 54'
+  - 'box.error.MORE_THAN_ONE_TUPLE : 41'
+  - 'box.error.PROC_LUA : 32'
   - 'box.error.ILLEGAL_PARAMS : 1'
   - 'box.error.PARTIAL_KEY : 136'
   - 'box.error.SAVEPOINT_NO_TRANSACTION : 114'
@@ -400,34 +400,35 @@ t;
   - 'box.error.UPDATE_ARG_TYPE : 26'
   - 'box.error.CROSS_ENGINE_TRANSACTION : 81'
   - 'box.error.FORMAT_MISMATCH_INDEX_PART : 27'
-  - 'box.error.FUNCTION_TX_ACTIVE : 30'
   - 'box.error.injection : table: <address>
-  - 'box.error.ITERATOR_TYPE : 72'
+  - 'box.error.FUNCTION_TX_ACTIVE : 30'
+  - 'box.error.IDENTIFIER : 70'
+  - 'box.error.TRANSACTION_YIELD : 154'
   - 'box.error.NO_SUCH_ENGINE : 57'
   - 'box.error.COMMIT_IN_SUB_STMT : 122'
-  - 'box.error.TRANSACTION_YIELD : 154'
-  - 'box.error.UNSUPPORTED : 5'
-  - 'box.error.LAST_DROP : 15'
+  - 'box.error.RELOAD_CFG : 58'
   - 'box.error.SPACE_FIELD_IS_DUPLICATE : 149'
+  - 'box.error.LAST_DROP : 15'
+  - 'box.error.SEQUENCE_OVERFLOW : 147'
   - 'box.error.DECOMPRESSION : 124'
   - 'box.error.CREATE_SEQUENCE : 142'
   - 'box.error.CREATE_USER : 43'
-  - 'box.error.SEQUENCE_OVERFLOW : 147'
+  - 'box.error.FUNCTION_MAX : 54'
   - 'box.error.INSTANCE_UUID_MISMATCH : 66'
-  - 'box.error.RELOAD_CFG : 58'
+  - 'box.error.TUPLE_FORMAT_LIMIT : 16'
   - 'box.error.SYSTEM : 115'
   - 'box.error.KEY_PART_IS_TOO_LONG : 118'
-  - 'box.error.MORE_THAN_ONE_TUPLE : 41'
   - 'box.error.TRUNCATE_SYSTEM_SPACE : 137'
-  - 'box.error.NO_SUCH_SAVEPOINT : 61'
   - 'box.error.VY_QUOTA_TIMEOUT : 135'
-  - 'box.error.PRIV_NOT_GRANTED : 91'
+  - 'box.error.NO_SUCH_SAVEPOINT : 61'
+  - 'box.error.PROTOCOL : 104'
+  - 'box.error.READ_VIEW_ABORTED : 130'
   - 'box.error.WRONG_INDEX_OPTIONS : 108'
   - 'box.error.INVALID_VYLOG_FILE : 133'
   - 'box.error.INDEX_FIELD_COUNT_LIMIT : 127'
-  - 'box.error.BEFORE_REPLACE_RET : 53'
+  - 'box.error.DATA_STRUCTURE_MISMATCH : 55'
   - 'box.error.USER_MAX : 56'
-  - 'box.error.INVALID_MSGPACK : 20'
+  - 'box.error.BEFORE_REPLACE_RET : 53'
   - 'box.error.TUPLE_NOT_ARRAY : 22'
   - 'box.error.KEY_PART_COUNT : 31'
   - 'box.error.ALTER_SPACE : 12'
@@ -436,47 +437,47 @@ t;
   - 'box.error.DROP_SEQUENCE : 144'
   - 'box.error.INVALID_XLOG_ORDER : 76'
   - 'box.error.UNKNOWN_REQUEST_TYPE : 48'
-  - 'box.error.PROC_LUA : 32'
+  - 'box.error.PROC_RET : 21'
   - 'box.error.SUB_STMT_MAX : 121'
   - 'box.error.ROLE_NOT_GRANTED : 92'
   - 'box.error.SPACE_EXISTS : 10'
-  - 'box.error.UPDATE_INTEGER_OVERFLOW : 95'
+  - 'box.error.UNSUPPORTED : 5'
   - 'box.error.MIN_FIELD_COUNT : 39'
   - 'box.error.NO_SUCH_SPACE : 36'
   - 'box.error.WRONG_INDEX_PARTS : 107'
   - 'box.error.REPLICASET_UUID_MISMATCH : 63'
   - 'box.error.UPDATE_FIELD : 29'
   - 'box.error.INDEX_EXISTS : 85'
-  - 'box.error.SPLICE : 25'
+  - 'box.error.DROP_SPACE : 11'
   - 'box.error.COMPRESSION : 119'
   - 'box.error.INVALID_ORDER : 68'
-  - 'box.error.UNKNOWN : 0'
+  - 'box.error.SPLICE : 25'
   - 'box.error.NO_SUCH_GROUP : 155'
-  - 'box.error.TUPLE_FORMAT_LIMIT : 16'
+  - 'box.error.INVALID_MSGPACK : 20'
   - 'box.error.DROP_PRIMARY_KEY : 17'
   - 'box.error.NULLABLE_PRIMARY : 152'
   - 'box.error.NO_SUCH_SEQUENCE : 145'
   - 'box.error.INJECTION : 8'
   - 'box.error.INVALID_UUID : 64'
-  - 'box.error.IDENTIFIER : 70'
+  - 'box.error.NO_SUCH_ROLE : 82'
   - 'box.error.TIMEOUT : 78'
+  - 'box.error.ITERATOR_TYPE : 72'
   - 'box.error.REPLICA_MAX : 73'
-  - 'box.error.NO_SUCH_ROLE : 82'
-  - 'box.error.DROP_SPACE : 11'
+  - 'box.error.UNKNOWN : 0'
   - 'box.error.MISSING_REQUEST_FIELD : 69'
   - 'box.error.MISSING_SNAPSHOT : 93'
   - 'box.error.WRONG_SPACE_OPTIONS : 111'
   - 'box.error.READONLY : 7'
-  - 'box.error.RTREE_RECT : 101'
+  - 'box.error.UPDATE_INTEGER_OVERFLOW : 95'
   - 'box.error.UPSERT_UNIQUE_SECONDARY_KEY : 105'
   - 'box.error.NO_CONNECTION : 77'
   - 'box.error.UNSUPPORTED_PRIV : 98'
   - 'box.error.WRONG_SCHEMA_VERSION : 109'
   - 'box.error.ROLLBACK_IN_SUB_STMT : 123'
-  - 'box.error.PROTOCOL : 104'
-  - 'box.error.INVALID_XLOG_TYPE : 125'
-  - 'box.error.INDEX_PART_TYPE_MISMATCH : 24'
   - 'box.error.UNSUPPORTED_INDEX_FEATURE : 112'
+  - 'box.error.CONNECTION_TO_SELF : 117'
+  - 'box.error.INDEX_PART_TYPE_MISMATCH : 24'
+  - 'box.error.INVALID_XLOG_TYPE : 125'
 ...
 test_run:cmd("setopt delimiter ''");
 ---
diff --git a/test/engine/tuple.result b/test/engine/tuple.result
index 35c700e..e551f1a 100644
--- a/test/engine/tuple.result
+++ b/test/engine/tuple.result
@@ -954,6 +954,377 @@ type(tuple:tomap().fourth)
 s:drop()
 ---
 ...
+--
+-- gh-1012: Indexes for JSON-defined paths.
+--
+box.cfg()
+---
+...
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO["fname"]'}, {3, 'str', path = '["FIO"].fname'}}})
+---
+- error: 'Can''t create or modify index ''test1'' in space ''withdata'': same key
+    part is indexed twice'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '["FIO"]["fname"]'}}})
+---
+- error: 'Wrong index options (field 2): ''path'' must be string'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = 'FIO'}}})
+---
+- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type
+    ''map'' is not supported'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[1]'}}})
+---
+- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type
+    ''array'' is not supported'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO'}, {3, 'str', path = '["FIO"].fname'}}})
+---
+- error: Field 3 has type 'map' in one index, but type 'string' in another
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[1].sname'}, {3, 'str', path = '["FIO"].fname'}}})
+---
+- error: Field 3 has type 'map' in one index, but type 'array' in another
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO....fname'}}})
+---
+- error: 'Wrong index options (field 3): invalid JSON path ''FIO....fname'': path
+    has invalid structure (error at position 5)'
+...
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+...
+assert(idx ~= nil)
+---
+- true
+...
+s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
+---
+- error: 'Tuple field 3 type does not match one required by operation: expected map'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5}
+---
+- error: 'Tuple field 3 type does not match one required by operation: expected string'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5}
+---
+- error: 'Tuple doesn''t match document structure: invalid field 3 document content
+    ''{"town": "London", "FIO": {"fname": "James"}}'': map doesn''t contain key ''sname''
+    defined in index'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- error: Duplicate key exists in unique index 'test1' in space 'withdata'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond', data = "extra"}}, 4, 5}
+---
+- error: Duplicate key exists in unique index 'test1' in space 'withdata'
+...
+s:insert{7, 7, {town = 'Moscow', FIO = {fname = 'Max', sname = 'Isaev', data = "extra"}}, 4, 5}
+---
+- [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}},
+  4, 5]
+...
+idx:select()
+---
+- - [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+  - [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}},
+    4, 5]
+...
+idx:min()
+---
+- [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+idx:max()
+---
+- [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}},
+  4, 5]
+...
+s:drop()
+---
+...
+s = box.schema.create_space('withdata', {engine = engine})
+---
+...
+parts = {}
+---
+...
+parts[1] = {1, 'unsigned', path='[2]'}
+---
+...
+pk = s:create_index('pk', {parts = parts})
+---
+...
+s:insert{{1, 2}, 3}
+---
+- [[1, 2], 3]
+...
+s:upsert({{box.null, 2}}, {{'+', 2, 5}})
+---
+...
+s:get(2)
+---
+- [[1, 2], 8]
+...
+s:drop()
+---
+...
+-- Create index on space with data
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+pk = s:create_index('primary', { type = 'tree' })
+---
+...
+s:insert{1, 7, {town = 'London', FIO = 1234}, 4, 5}
+---
+- [1, 7, {'town': 'London', 'FIO': 1234}, 4, 5]
+...
+s:insert{2, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- [2, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+s:insert{3, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+s:insert{4, 7, {town = 'London', FIO = {1,2,3}}, 4, 5}
+---
+- [4, 7, {'town': 'London', 'FIO': [1, 2, 3]}, 4, 5]
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+- error: 'Tuple field 3 type does not match one required by operation: expected map'
+...
+_ = s:delete(1)
+---
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+- error: Duplicate key exists in unique index 'test1' in space 'withdata'
+...
+_ = s:delete(2)
+---
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+- error: 'Tuple field 3 type does not match one required by operation: expected map'
+...
+_ = s:delete(4)
+---
+...
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]', is_nullable = true}, {3, 'str', path = '["FIO"]["sname"]'}, {3, 'str', path = '["FIO"]["extra"]', is_nullable = true}}})
+---
+...
+assert(idx ~= nil)
+---
+- true
+...
+s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '["FIO"]["fname"]'}}})
+---
+- error: Field 3 has type 'number' in one index, but type 'string' in another
+...
+idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}}})
+---
+...
+assert(idx2 ~= nil)
+---
+- true
+...
+t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
+---
+...
+idx:select()
+---
+- - [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5]
+  - [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+idx:min()
+---
+- [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5]
+...
+idx:max()
+---
+- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+idx:drop()
+---
+...
+s:drop()
+---
+...
+-- Test complex JSON indexes
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+parts = {}
+---
+...
+parts[1] = {1, 'str', path='[3][2].a'}
+---
+...
+parts[2] = {1, 'unsigned', path = '[3][1]'}
+---
+...
+parts[3] = {2, 'str', path = '[2].d[1]'}
+---
+...
+pk = s:create_index('primary', { type = 'tree', parts =  parts})
+---
+...
+s:insert{{1, 2, {3, {3, a = 'str', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {1, 2, 3}}
+---
+- [[1, 2, [3, {1: 3, 'a': 'str', 'b': 5}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6,
+  [1, 2, 3]]
+...
+s:insert{{1, 2, {3, {a = 'str', b = 1}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6}
+---
+- error: Duplicate key exists in unique index 'primary' in space 'withdata'
+...
+parts = {}
+---
+...
+parts[1] = {4, 'unsigned', path='[1]', is_nullable = false}
+---
+...
+parts[2] = {4, 'unsigned', path='[2]', is_nullable = true}
+---
+...
+parts[3] = {4, 'unsigned', path='[4]', is_nullable = true}
+---
+...
+trap_idx = s:create_index('trap', { type = 'tree', parts = parts})
+---
+...
+s:insert{{1, 2, {3, {3, a = 'str2', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {}}
+---
+- error: 'Tuple doesn''t match document structure: invalid field 4 document content
+    ''[]'': array size 1 is less than size of 2 defined in index'
+...
+parts = {}
+---
+...
+parts[1] = {1, 'unsigned', path='[3][2].b' }
+---
+...
+parts[2] = {3, 'unsigned'}
+---
+...
+crosspart_idx = s:create_index('crosspart', { parts =  parts})
+---
+...
+s:insert{{1, 2, {3, {a = 'str2', b = 2}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {9, 2, 3}}
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9,
+    2, 3]]
+...
+parts = {}
+---
+...
+parts[1] = {1, 'unsigned', path='[3][2].b'}
+---
+...
+num_idx = s:create_index('numeric', {parts =  parts})
+---
+...
+s:insert{{1, 2, {3, {a = 'str3', b = 9}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {0}}
+---
+- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]]
+...
+num_idx:get(2)
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9,
+    2, 3]]
+...
+num_idx:select()
+---
+- - [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [
+      9, 2, 3]]
+  - [[1, 2, [3, {1: 3, 'a': 'str', 'b': 5}]], ['c', {'d': ['e', 'f'], 'e': 'g'}],
+    6, [1, 2, 3]]
+  - [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [
+      0]]
+...
+num_idx:max()
+---
+- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]]
+...
+num_idx:min()
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9,
+    2, 3]]
+...
+assert(crosspart_idx:max() == num_idx:max())
+---
+- true
+...
+assert(crosspart_idx:min() == num_idx:min())
+---
+- true
+...
+trap_idx:max()
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9,
+    2, 3]]
+...
+trap_idx:min()
+---
+- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]]
+...
+s:drop()
+---
+...
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+pk_simplified = s:create_index('primary', { type = 'tree',  parts = {{1, 'unsigned'}}})
+---
+...
+assert(pk_simplified.path == box.NULL)
+---
+- true
+...
+idx = s:create_index('idx', {parts = {{2, 'integer', path = 'a'}}})
+---
+...
+s:insert{31, {a = 1, aa = -1}}
+---
+- [31, {'a': 1, 'aa': -1}]
+...
+s:insert{22, {a = 2, aa = -2}}
+---
+- [22, {'a': 2, 'aa': -2}]
+...
+s:insert{13, {a = 3, aa = -3}}
+---
+- [13, {'a': 3, 'aa': -3}]
+...
+idx:select()
+---
+- - [31, {'a': 1, 'aa': -1}]
+  - [22, {'a': 2, 'aa': -2}]
+  - [13, {'a': 3, 'aa': -3}]
+...
+idx:alter({parts = {{2, 'integer', path = 'aa'}}})
+---
+...
+idx:select()
+---
+- - [13, {'a': 3, 'aa': -3}]
+  - [22, {'a': 2, 'aa': -2}]
+  - [31, {'a': 1, 'aa': -1}]
+...
+s:drop()
+---
+...
 engine = nil
 ---
 ...
diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua
index edc3dab..e865c67 100644
--- a/test/engine/tuple.test.lua
+++ b/test/engine/tuple.test.lua
@@ -312,5 +312,111 @@ tuple:tomap().fourth
 type(tuple:tomap().fourth)
 s:drop()
 
+--
+-- gh-1012: Indexes for JSON-defined paths.
+--
+box.cfg()
+s = box.schema.space.create('withdata', {engine = engine})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO["fname"]'}, {3, 'str', path = '["FIO"].fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '["FIO"]["fname"]'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = 'FIO'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[1]'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO'}, {3, 'str', path = '["FIO"].fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[1].sname'}, {3, 'str', path = '["FIO"].fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO....fname'}}})
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+assert(idx ~= nil)
+s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond', data = "extra"}}, 4, 5}
+s:insert{7, 7, {town = 'Moscow', FIO = {fname = 'Max', sname = 'Isaev', data = "extra"}}, 4, 5}
+idx:select()
+idx:min()
+idx:max()
+s:drop()
+
+s = box.schema.create_space('withdata', {engine = engine})
+parts = {}
+parts[1] = {1, 'unsigned', path='[2]'}
+pk = s:create_index('pk', {parts = parts})
+s:insert{{1, 2}, 3}
+s:upsert({{box.null, 2}}, {{'+', 2, 5}})
+s:get(2)
+s:drop()
+
+-- Create index on space with data
+s = box.schema.space.create('withdata', {engine = engine})
+pk = s:create_index('primary', { type = 'tree' })
+s:insert{1, 7, {town = 'London', FIO = 1234}, 4, 5}
+s:insert{2, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{3, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{4, 7, {town = 'London', FIO = {1,2,3}}, 4, 5}
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+_ = s:delete(1)
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+_ = s:delete(2)
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+_ = s:delete(4)
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]', is_nullable = true}, {3, 'str', path = '["FIO"]["sname"]'}, {3, 'str', path = '["FIO"]["extra"]', is_nullable = true}}})
+assert(idx ~= nil)
+s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '["FIO"]["fname"]'}}})
+idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}}})
+assert(idx2 ~= nil)
+t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
+idx:select()
+idx:min()
+idx:max()
+idx:drop()
+s:drop()
+
+-- Test complex JSON indexes
+s = box.schema.space.create('withdata', {engine = engine})
+parts = {}
+parts[1] = {1, 'str', path='[3][2].a'}
+parts[2] = {1, 'unsigned', path = '[3][1]'}
+parts[3] = {2, 'str', path = '[2].d[1]'}
+pk = s:create_index('primary', { type = 'tree', parts =  parts})
+s:insert{{1, 2, {3, {3, a = 'str', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {1, 2, 3}}
+s:insert{{1, 2, {3, {a = 'str', b = 1}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6}
+parts = {}
+parts[1] = {4, 'unsigned', path='[1]', is_nullable = false}
+parts[2] = {4, 'unsigned', path='[2]', is_nullable = true}
+parts[3] = {4, 'unsigned', path='[4]', is_nullable = true}
+trap_idx = s:create_index('trap', { type = 'tree', parts = parts})
+s:insert{{1, 2, {3, {3, a = 'str2', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {}}
+parts = {}
+parts[1] = {1, 'unsigned', path='[3][2].b' }
+parts[2] = {3, 'unsigned'}
+crosspart_idx = s:create_index('crosspart', { parts =  parts})
+s:insert{{1, 2, {3, {a = 'str2', b = 2}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {9, 2, 3}}
+parts = {}
+parts[1] = {1, 'unsigned', path='[3][2].b'}
+num_idx = s:create_index('numeric', {parts =  parts})
+s:insert{{1, 2, {3, {a = 'str3', b = 9}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {0}}
+num_idx:get(2)
+num_idx:select()
+num_idx:max()
+num_idx:min()
+assert(crosspart_idx:max() == num_idx:max())
+assert(crosspart_idx:min() == num_idx:min())
+trap_idx:max()
+trap_idx:min()
+s:drop()
+
+s = box.schema.space.create('withdata', {engine = engine})
+pk_simplified = s:create_index('primary', { type = 'tree',  parts = {{1, 'unsigned'}}})
+assert(pk_simplified.path == box.NULL)
+idx = s:create_index('idx', {parts = {{2, 'integer', path = 'a'}}})
+s:insert{31, {a = 1, aa = -1}}
+s:insert{22, {a = 2, aa = -2}}
+s:insert{13, {a = 3, aa = -3}}
+idx:select()
+idx:alter({parts = {{2, 'integer', path = 'aa'}}})
+idx:select()
+s:drop()
+
 engine = nil
 test_run = nil
-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 11/14] box: introduce has_json_paths flag in templates
  2018-10-11  7:58 [PATCH v4 00/14] box: indexes by JSON path Kirill Shcherbatov
  2018-10-11  7:58 ` [PATCH v4 01/14] box: refactor key_def_find routine Kirill Shcherbatov
  2018-10-11  7:58 ` [PATCH v4 10/14] box: introduce JSON indexes Kirill Shcherbatov
@ 2018-10-11  7:58 ` Kirill Shcherbatov
  2018-10-11  7:58 ` [PATCH v4 12/14] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 24+ messages in thread
From: Kirill Shcherbatov @ 2018-10-11  7:58 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov

Introduced a has_json_paths flag for the compare, hash and
extract functions (which are really hot) to make it possible
to skip looking at the path field for flat indexes without
any JSON paths.
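
A minimal sketch of the dispatch idea in standalone C (the names
here are illustrative, not the tree's): the flags fold into a
dense index into a table of template instantiations, so a flat
index picks a comparator that never inspects the path field.

  #include <assert.h>
  #include <stdbool.h>

  /*
   * Bit 0 - is_nullable, bit 1 - has_optional_parts,
   * bit 2 - has_json_paths; the order matches the
   * compare_slowpath_funcs table in the diff below.
   */
  static inline int
  slowpath_func_idx(bool is_nullable, bool has_optional_parts,
                    bool has_json_paths)
  {
          return (is_nullable ? 1 : 0) +
                 2 * (has_optional_parts ? 1 : 0) +
                 4 * (has_json_paths ? 1 : 0);
  }

  int
  main(void)
  {
          /* Flat non-nullable index: instantiation 0. */
          assert(slowpath_func_idx(false, false, false) == 0);
          /* Nullable index with JSON paths: 1 + 4 = 5. */
          assert(slowpath_func_idx(true, false, true) == 5);
          return 0;
  }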

Part of #1012.
---
 src/box/tuple_compare.cc     | 112 +++++++++++++++++++++++++++++++------------
 src/box/tuple_extract_key.cc | 104 ++++++++++++++++++++++++++--------------
 src/box/tuple_hash.cc        |  45 ++++++++++++-----
 3 files changed, 182 insertions(+), 79 deletions(-)

diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
index 9bff340..5a3a968 100644
--- a/src/box/tuple_compare.cc
+++ b/src/box/tuple_compare.cc
@@ -458,11 +458,12 @@ tuple_common_key_parts(const struct tuple *tuple_a, const struct tuple *tuple_b,
 	return i;
 }
 
-template<bool is_nullable, bool has_optional_parts>
+template<bool is_nullable, bool has_optional_parts, bool has_json_path>
 static inline int
 tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 		       struct key_def *key_def)
 {
+	assert(has_json_path == key_def->has_json_paths);
 	assert(!has_optional_parts || is_nullable);
 	assert(is_nullable == key_def->is_nullable);
 	assert(has_optional_parts == key_def->has_optional_parts);
@@ -508,10 +509,19 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 		end = part + key_def->part_count;
 
 	for (; part < end; part++) {
-		field_a = tuple_field_by_part_raw(format_a, tuple_a_raw,
-						  field_map_a, part);
-		field_b = tuple_field_by_part_raw(format_b, tuple_b_raw,
-						  field_map_b, part);
+		if (!has_json_path) {
+			field_a = tuple_field_raw(format_a, tuple_a_raw,
+						  field_map_a,
+						  part->fieldno);
+			field_b = tuple_field_raw(format_b, tuple_b_raw,
+						  field_map_b,
+						  part->fieldno);
+		} else {
+			field_a = tuple_field_by_part_raw(format_a, tuple_a_raw,
+							  field_map_a, part);
+			field_b = tuple_field_by_part_raw(format_b, tuple_b_raw,
+							  field_map_b, part);
+		}
 		assert(has_optional_parts ||
 		       (field_a != NULL && field_b != NULL));
 		if (! is_nullable) {
@@ -558,10 +568,19 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 	 */
 	end = key_def->parts + key_def->part_count;
 	for (; part < end; ++part) {
-		field_a = tuple_field_by_part_raw(format_a, tuple_a_raw,
-						  field_map_a, part);
-		field_b = tuple_field_by_part_raw(format_b, tuple_b_raw,
-						  field_map_b, part);
+		if (!has_json_path) {
+			field_a = tuple_field_raw(format_a, tuple_a_raw,
+						  field_map_a,
+						  part->fieldno);
+			field_b = tuple_field_raw(format_b, tuple_b_raw,
+						  field_map_b,
+						  part->fieldno);
+		} else {
+			field_a = tuple_field_by_part_raw(format_a, tuple_a_raw,
+							  field_map_a, part);
+			field_b = tuple_field_by_part_raw(format_b, tuple_b_raw,
+							  field_map_b, part);
+		}
 		/*
 		 * Extended parts are primary, and they can not
 		 * be absent or be NULLs.
@@ -575,11 +594,12 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 	return 0;
 }
 
-template<bool is_nullable, bool has_optional_parts>
+template<bool is_nullable, bool has_optional_parts, bool has_json_paths>
 static inline int
 tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
 				uint32_t part_count, struct key_def *key_def)
 {
+	assert(has_json_paths == key_def->has_json_paths);
 	assert(!has_optional_parts || is_nullable);
 	assert(is_nullable == key_def->is_nullable);
 	assert(has_optional_parts == key_def->has_optional_parts);
@@ -591,9 +611,14 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
 	const uint32_t *field_map = tuple_field_map(tuple);
 	enum mp_type a_type, b_type;
 	if (likely(part_count == 1)) {
-		const char *field =
-			tuple_field_by_part_raw(format, tuple_raw, field_map,
-						part);
+		const char *field;
+		if (!has_json_paths) {
+			field = tuple_field_raw(format, tuple_raw, field_map,
+						part->fieldno);
+		} else {
+			field = tuple_field_by_part_raw(format, tuple_raw,
+							field_map, part);
+		}
 		if (! is_nullable) {
 			return tuple_compare_field(field, key, part->type,
 						   part->coll);
@@ -617,9 +642,14 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
 	struct key_part *end = part + part_count;
 	int rc;
 	for (; part < end; ++part, mp_next(&key)) {
-		const char *field =
-			tuple_field_by_part_raw(format, tuple_raw,
-						field_map, part);
+		const char *field;
+		if (!has_json_paths) {
+			field = tuple_field_raw(format, tuple_raw, field_map,
+						part->fieldno);
+		} else {
+			field = tuple_field_by_part_raw(format, tuple_raw,
+							field_map, part);
+		}
 		if (! is_nullable) {
 			rc = tuple_compare_field(field, key, part->type,
 						 part->coll);
@@ -1012,19 +1042,31 @@ static const comparator_signature cmp_arr[] = {
 
 #undef COMPARATOR
 
+static const tuple_compare_t compare_slowpath_funcs[] = {
+	tuple_compare_slowpath<false, false, false>,
+	tuple_compare_slowpath<true, false, false>,
+	tuple_compare_slowpath<false, true, false>,
+	tuple_compare_slowpath<true, true, false>,
+	tuple_compare_slowpath<false, false, true>,
+	tuple_compare_slowpath<true, false, true>,
+	tuple_compare_slowpath<false, true, true>,
+	tuple_compare_slowpath<true, true, true>
+};
+
 tuple_compare_t
 tuple_compare_create(const struct key_def *def)
 {
+	int cmp_func_idx = (def->is_nullable ? 1 : 0) +
+			   2 * (def->has_optional_parts ? 1 : 0) +
+			   4 * (def->has_json_paths ? 1 : 0);
 	if (def->is_nullable) {
 		if (key_def_is_sequential(def)) {
 			if (def->has_optional_parts)
 				return tuple_compare_sequential<true, true>;
 			else
 				return tuple_compare_sequential<true, false>;
-		} else if (def->has_optional_parts) {
-			return tuple_compare_slowpath<true, true>;
 		} else {
-			return tuple_compare_slowpath<true, false>;
+			return compare_slowpath_funcs[cmp_func_idx];
 		}
 	}
 	assert(! def->has_optional_parts);
@@ -1044,10 +1086,9 @@ tuple_compare_create(const struct key_def *def)
 				return cmp_arr[k].f;
 		}
 	}
-	if (key_def_is_sequential(def))
-		return tuple_compare_sequential<false, false>;
-	else
-		return tuple_compare_slowpath<false, false>;
+	return key_def_is_sequential(def) ?
+	       tuple_compare_sequential<false, false> :
+	       compare_slowpath_funcs[cmp_func_idx];
 }
 
 /* }}} tuple_compare */
@@ -1229,9 +1270,23 @@ static const comparator_with_key_signature cmp_wk_arr[] = {
 
 #undef KEY_COMPARATOR
 
+static const tuple_compare_with_key_t compare_with_key_slowpath_funcs[] = {
+	tuple_compare_with_key_slowpath<false, false, false>,
+	tuple_compare_with_key_slowpath<true, false, false>,
+	tuple_compare_with_key_slowpath<false, true, false>,
+	tuple_compare_with_key_slowpath<true, true, false>,
+	tuple_compare_with_key_slowpath<false, false, true>,
+	tuple_compare_with_key_slowpath<true, false, true>,
+	tuple_compare_with_key_slowpath<false, true, true>,
+	tuple_compare_with_key_slowpath<true, true, true>
+};
+
 tuple_compare_with_key_t
 tuple_compare_with_key_create(const struct key_def *def)
 {
+	int cmp_func_idx = (def->is_nullable ? 1 : 0) +
+			   2 * (def->has_optional_parts ? 1 : 0) +
+			   4 * (def->has_json_paths ? 1 : 0);
 	if (def->is_nullable) {
 		if (key_def_is_sequential(def)) {
 			if (def->has_optional_parts) {
@@ -1241,10 +1296,8 @@ tuple_compare_with_key_create(const struct key_def *def)
 				return tuple_compare_with_key_sequential<true,
 									 false>;
 			}
-		} else if (def->has_optional_parts) {
-			return tuple_compare_with_key_slowpath<true, true>;
 		} else {
-			return tuple_compare_with_key_slowpath<true, false>;
+			return compare_with_key_slowpath_funcs[cmp_func_idx];
 		}
 	}
 	assert(! def->has_optional_parts);
@@ -1267,10 +1320,9 @@ tuple_compare_with_key_create(const struct key_def *def)
 				return cmp_wk_arr[k].f;
 		}
 	}
-	if (key_def_is_sequential(def))
-		return tuple_compare_with_key_sequential<false, false>;
-	else
-		return tuple_compare_with_key_slowpath<false, false>;
+	return key_def_is_sequential(def) ?
+	       tuple_compare_with_key_sequential<false, false> :
+	       compare_with_key_slowpath_funcs[cmp_func_idx];
 }
 
 /* }}} tuple_compare_with_key */
diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
index cb4ae71..f322572 100644
--- a/src/box/tuple_extract_key.cc
+++ b/src/box/tuple_extract_key.cc
@@ -5,13 +5,18 @@
 enum { MSGPACK_NULL = 0xc0 };
 
 /** True if key part i and i+1 are sequential. */
+template <bool has_json_paths>
 static inline bool
 key_def_parts_are_sequential(const struct key_def *def, int i)
 {
 	uint32_t fieldno1 = def->parts[i].fieldno + 1;
 	uint32_t fieldno2 = def->parts[i + 1].fieldno;
-	return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
-	       def->parts[i + 1].path == NULL;
+	if (!has_json_paths) {
+		return fieldno1 == fieldno2;
+	} else {
+		return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
+		       def->parts[i + 1].path == NULL;
+	}
 }
 
 /** True, if a key con contain two or more parts in sequence. */
@@ -19,7 +24,7 @@ static bool
 key_def_contains_sequential_parts(const struct key_def *def)
 {
 	for (uint32_t i = 0; i < def->part_count - 1; ++i) {
-		if (key_def_parts_are_sequential(def, i))
+		if (key_def_parts_are_sequential<true>(def, i))
 			return true;
 	}
 	return false;
@@ -99,11 +104,13 @@ tuple_extract_key_sequential(const struct tuple *tuple, struct key_def *key_def,
  * General-purpose implementation of tuple_extract_key()
  * @copydoc tuple_extract_key()
  */
-template <bool contains_sequential_parts, bool has_optional_parts>
+template <bool contains_sequential_parts, bool has_optional_parts,
+	  bool has_json_paths>
 static char *
 tuple_extract_key_slowpath(const struct tuple *tuple,
 			   struct key_def *key_def, uint32_t *key_size)
 {
+	assert(has_json_paths == key_def->has_json_paths);
 	assert(!has_optional_parts || key_def->is_nullable);
 	assert(has_optional_parts == key_def->has_optional_parts);
 	assert(contains_sequential_parts ==
@@ -118,9 +125,14 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
 
 	/* Calculate the key size. */
 	for (uint32_t i = 0; i < part_count; ++i) {
-		const char *field =
-			tuple_field_by_part_raw(format, data, field_map,
-						&key_def->parts[i]);
+		const char *field;
+		if (!has_json_paths) {
+			field = tuple_field_raw(format, data, field_map,
+						key_def->parts[i].fieldno);
+		} else {
+			field = tuple_field_by_part_raw(format, data, field_map,
+							&key_def->parts[i]);
+		}
 		if (has_optional_parts && field == NULL) {
 			bsize += mp_sizeof_nil();
 			continue;
@@ -133,7 +145,8 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
 			 * minimize tuple_field_raw() calls.
 			 */
 			for (; i < part_count - 1; i++) {
-				if (!key_def_parts_are_sequential(key_def, i)) {
+				if (!key_def_parts_are_sequential
+				        <has_json_paths>(key_def, i)) {
 					/*
 					 * End of sequential part.
 					 */
@@ -159,9 +172,14 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
 	}
 	char *key_buf = mp_encode_array(key, part_count);
 	for (uint32_t i = 0; i < part_count; ++i) {
-		const char *field =
-			tuple_field_by_part_raw(format, data, field_map,
-						&key_def->parts[i]);
+		const char *field;
+		if (!has_json_paths) {
+			field = tuple_field_raw(format, data, field_map,
+						key_def->parts[i].fieldno);
+		} else {
+			field = tuple_field_by_part_raw(format, data, field_map,
+							&key_def->parts[i]);
+		}
 		if (has_optional_parts && field == NULL) {
 			key_buf = mp_encode_nil(key_buf);
 			continue;
@@ -174,7 +192,8 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
 			 * minimize tuple_field_raw() calls.
 			 */
 			for (; i < part_count - 1; i++) {
-				if (!key_def_parts_are_sequential(key_def, i)) {
+				if (!key_def_parts_are_sequential
+				        <has_json_paths>(key_def, i)) {
 					/*
 					 * End of sequential part.
 					 */
@@ -207,11 +226,12 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
  * General-purpose version of tuple_extract_key_raw()
  * @copydoc tuple_extract_key_raw()
  */
-template <bool has_optional_parts>
+template <bool has_optional_parts, bool has_json_paths>
 static char *
 tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 			       struct key_def *key_def, uint32_t *key_size)
 {
+	assert(has_json_paths == key_def->has_json_paths);
 	assert(!has_optional_parts || key_def->is_nullable);
 	assert(has_optional_parts == key_def->has_optional_parts);
 	assert(mp_sizeof_nil() == 1);
@@ -239,7 +259,8 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 		uint32_t fieldno = key_def->parts[i].fieldno;
 		uint32_t null_count = 0;
 		for (; i < key_def->part_count - 1; i++) {
-			if (!key_def_parts_are_sequential(key_def, i))
+			if (!key_def_parts_are_sequential
+			        <has_json_paths>(key_def, i))
 				break;
 		}
 		const struct key_part *part = &key_def->parts[i];
@@ -314,6 +335,17 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 	return key;
 }
 
+static const tuple_extract_key_t extract_key_slowpath_funcs[] = {
+	tuple_extract_key_slowpath<false, false, false>,
+	tuple_extract_key_slowpath<true, false, false>,
+	tuple_extract_key_slowpath<false, true, false>,
+	tuple_extract_key_slowpath<true, true, false>,
+	tuple_extract_key_slowpath<false, false, true>,
+	tuple_extract_key_slowpath<true, false, true>,
+	tuple_extract_key_slowpath<false, true, true>,
+	tuple_extract_key_slowpath<true, true, true>
+};
+
 /**
  * Initialize tuple_extract_key() and tuple_extract_key_raw()
  */
@@ -334,32 +366,30 @@ tuple_extract_key_set(struct key_def *key_def)
 				tuple_extract_key_sequential_raw<false>;
 		}
 	} else {
-		if (key_def->has_optional_parts) {
-			assert(key_def->is_nullable);
-			if (key_def_contains_sequential_parts(key_def)) {
-				key_def->tuple_extract_key =
-					tuple_extract_key_slowpath<true, true>;
-			} else {
-				key_def->tuple_extract_key =
-					tuple_extract_key_slowpath<false, true>;
-			}
-		} else {
-			if (key_def_contains_sequential_parts(key_def)) {
-				key_def->tuple_extract_key =
-					tuple_extract_key_slowpath<true, false>;
-			} else {
-				key_def->tuple_extract_key =
-					tuple_extract_key_slowpath<false,
-								   false>;
-			}
-		}
+		int func_idx =
+			(key_def_contains_sequential_parts(key_def) ? 1 : 0) +
+			2 * (key_def->has_optional_parts ? 1 : 0) +
+			4 * (key_def->has_json_paths ? 1 : 0);
+		key_def->tuple_extract_key =
+			extract_key_slowpath_funcs[func_idx];
+		assert(!key_def->has_optional_parts || key_def->is_nullable);
 	}
 	if (key_def->has_optional_parts) {
 		assert(key_def->is_nullable);
-		key_def->tuple_extract_key_raw =
-			tuple_extract_key_slowpath_raw<true>;
+		if (key_def->has_json_paths) {
+			key_def->tuple_extract_key_raw =
+				tuple_extract_key_slowpath_raw<true, true>;
+		} else {
+			key_def->tuple_extract_key_raw =
+				tuple_extract_key_slowpath_raw<true, false>;
+		}
 	} else {
-		key_def->tuple_extract_key_raw =
-			tuple_extract_key_slowpath_raw<false>;
+		if (key_def->has_json_paths) {
+			key_def->tuple_extract_key_raw =
+				tuple_extract_key_slowpath_raw<false, true>;
+		} else {
+			key_def->tuple_extract_key_raw =
+				tuple_extract_key_slowpath_raw<false, false>;
+		}
 	}
 }
diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc
index 3486ce1..8ede290 100644
--- a/src/box/tuple_hash.cc
+++ b/src/box/tuple_hash.cc
@@ -213,7 +213,7 @@ static const hasher_signature hash_arr[] = {
 
 #undef HASHER
 
-template <bool has_optional_parts>
+template <bool has_optional_parts, bool has_json_paths>
 uint32_t
 tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def);
 
@@ -256,10 +256,17 @@ tuple_hash_func_set(struct key_def *key_def) {
 	}
 
 slowpath:
-	if (key_def->has_optional_parts)
-		key_def->tuple_hash = tuple_hash_slowpath<true>;
-	else
-		key_def->tuple_hash = tuple_hash_slowpath<false>;
+	if (key_def->has_optional_parts) {
+		if (key_def->has_json_paths)
+			key_def->tuple_hash = tuple_hash_slowpath<true, true>;
+		else
+			key_def->tuple_hash = tuple_hash_slowpath<true, false>;
+	} else {
+		if (key_def->has_json_paths)
+			key_def->tuple_hash = tuple_hash_slowpath<false, true>;
+		else
+			key_def->tuple_hash = tuple_hash_slowpath<false, false>;
+	}
 	key_def->key_hash = key_hash_slowpath;
 }
 
@@ -319,10 +326,11 @@ tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry, const struct tuple *tuple,
 	return tuple_hash_field(ph1, pcarry, &field, part->coll);
 }
 
-template <bool has_optional_parts>
+template <bool has_optional_parts, bool has_json_paths>
 uint32_t
 tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def)
 {
+	assert(has_json_paths == key_def->has_json_paths);
 	assert(has_optional_parts == key_def->has_optional_parts);
 	uint32_t h = HASH_SEED;
 	uint32_t carry = 0;
@@ -331,9 +339,13 @@ tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def)
 	struct tuple_format *format = tuple_format(tuple);
 	const char *tuple_raw = tuple_data(tuple);
 	const uint32_t *field_map = tuple_field_map(tuple);
-	const char *field =
-		tuple_field_by_part_raw(format, tuple_raw, field_map,
-					key_def->parts);
+	const char *field;
+	if (!has_json_paths) {
+		field = tuple_field(tuple, prev_fieldno);
+	} else {
+		field = tuple_field_by_part_raw(format, tuple_raw, field_map,
+						key_def->parts);
+	}
 	const char *end = (char *)tuple + tuple_size(tuple);
 	if (has_optional_parts && field == NULL) {
 		total_size += tuple_hash_null(&h, &carry);
@@ -347,9 +359,18 @@ tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def)
 		 * need of tuple_field
 		 */
 		if (prev_fieldno + 1 != key_def->parts[part_id].fieldno) {
-			struct key_part *part = &key_def->parts[part_id];
-			field = tuple_field_by_part_raw(format, tuple_raw,
-							field_map, part);
+			if (!has_json_paths) {
+				field = tuple_field(tuple,
+						    key_def->parts[part_id].
+						    fieldno);
+			} else {
+				struct key_part *part =
+					&key_def->parts[part_id];
+				field = tuple_field_by_part_raw(format,
+								tuple_raw,
+								field_map,
+								part);
+			}
 		}
 		if (has_optional_parts && (field == NULL || field >= end)) {
 			total_size += tuple_hash_null(&h, &carry);
-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 12/14] box: tune tuple_field_raw_by_path for indexed data
  2018-10-11  7:58 [PATCH v4 00/14] box: indexes by JSON path Kirill Shcherbatov
                   ` (2 preceding siblings ...)
  2018-10-11  7:58 ` [PATCH v4 11/14] box: introduce has_json_paths flag in templates Kirill Shcherbatov
@ 2018-10-11  7:58 ` Kirill Shcherbatov
  2018-10-11  7:58 ` [PATCH v4 13/14] box: introduce offset slot cache in key_part Kirill Shcherbatov
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 24+ messages in thread
From: Kirill Shcherbatov @ 2018-10-11  7:58 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov

We don't need to parse the whole tuple in
tuple_field_raw_by_path if the required field has been indexed.
Instead, we look the path up in the field's tree of JSON paths
and return the data by its offset from the field_map.
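
A simplified sketch of the fast path in standalone C (the types
and names are illustrative, not Tarantool's actual structs): once
the parsed path resolves to an indexed field, the data is fetched
through its field_map slot with no msgpack decoding. In the real
code offset slots are negative indexes counted back from the
tuple data; a plain array keeps the sketch self-contained.

  #include <stddef.h>
  #include <stdint.h>

  enum { SLOT_NIL = -1 };

  /* Illustrative stand-in for an indexed JSON path node. */
  struct indexed_field {
          int offset_slot;
  };

  /*
   * Return field data via its cached offset, or NULL to make
   * the caller fall back to full JSON path parsing.
   */
  static const char *
  field_data_fast(const char *tuple, const uint32_t *field_map,
                  const struct indexed_field *f)
  {
          if (f == NULL || f->offset_slot == SLOT_NIL)
                  return NULL;
          if (field_map[f->offset_slot] == 0)
                  return NULL; /* Field is absent in this tuple. */
          return tuple + field_map[f->offset_slot];
  }

With this shortcut the t["[3][\"FIO\"][\"fname\"]"] access in the
test below reads the field_map instead of re-parsing the document.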

Part of #1012.
---
 src/box/tuple_format.c     | 30 +++++++++++++++++++++---------
 test/engine/tuple.result   |  5 +++++
 test/engine/tuple.test.lua |  2 ++
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 30deac5..4738eb9 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -981,15 +981,12 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 		goto error;
 	switch(node.type) {
 	case JSON_PATH_NUM: {
-		int index = node.num;
-		if (index == 0) {
+		fieldno = node.num;
+		if (fieldno == 0) {
 			*field = NULL;
 			return 0;
 		}
-		index -= TUPLE_INDEX_BASE;
-		*field = tuple_field_raw(format, tuple, field_map, index);
-		if (*field == NULL)
-			return 0;
+		fieldno -= TUPLE_INDEX_BASE;
 		break;
 	}
 	case JSON_PATH_STR: {
@@ -1006,9 +1003,8 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 			 */
 			name_hash = field_name_hash(node.str, node.len);
 		}
-		*field = tuple_field_raw_by_name(format, tuple, field_map,
-						 node.str, node.len, name_hash);
-		if (*field == NULL)
+		if (tuple_fieldno_by_name(format->dict, node.str, node.len,
+					  name_hash, &fieldno) != 0)
 			return 0;
 		break;
 	}
@@ -1017,6 +1013,22 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 		*field = NULL;
 		return 0;
 	}
+	/* Optimize indexed JSON field data access. */
+	struct tuple_field *indexed_field =
+		unlikely(fieldno >= format->field_count) ? NULL :
+		tuple_field_tree_lookup(&format->fields[fieldno],
+					path + parser.offset,
+					path_len - parser.offset);
+	if (indexed_field != NULL &&
+	    indexed_field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
+		*field = tuple + field_map[indexed_field->offset_slot];
+		return 0;
+	}
+
+	/* No such field in index. Continue parsing JSON path. */
+	*field = tuple_field_raw(format, tuple, field_map, fieldno);
+	if (*field == NULL)
+		return 0;
 	rc = tuple_field_by_relative_path(field, path + parser.offset,
 					  path_len - parser.offset);
 	if (rc == 0)
diff --git a/test/engine/tuple.result b/test/engine/tuple.result
index e551f1a..1842420 100644
--- a/test/engine/tuple.result
+++ b/test/engine/tuple.result
@@ -1142,6 +1142,11 @@ assert(idx2 ~= nil)
 t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
 ---
 ...
+-- Test field_map in tuple speed-up access by indexed path.
+t["[3][\"FIO\"][\"fname\"]"]
+---
+- Agent
+...
 idx:select()
 ---
 - - [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5]
diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua
index e865c67..8be6505 100644
--- a/test/engine/tuple.test.lua
+++ b/test/engine/tuple.test.lua
@@ -366,6 +366,8 @@ s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '["FIO"]["
 idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}}})
 assert(idx2 ~= nil)
 t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
+-- Test field_map in tuple speed-up access by indexed path.
+t["[3][\"FIO\"][\"fname\"]"]
 idx:select()
 idx:min()
 idx:max()
-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 13/14] box: introduce offset slot cache in key_part
  2018-10-11  7:58 [PATCH v4 00/14] box: indexes by JSON path Kirill Shcherbatov
                   ` (3 preceding siblings ...)
  2018-10-11  7:58 ` [PATCH v4 12/14] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov
@ 2018-10-11  7:58 ` Kirill Shcherbatov
  2018-10-11  7:58 ` [PATCH v4 14/14] box: specify indexes in user-friendly form Kirill Shcherbatov
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 24+ messages in thread
From: Kirill Shcherbatov @ 2018-10-11  7:58 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov

The same key_part could be used in different formats multiple
times, so a different field->offset_slot would be allocated for
each of them. In most scenarios we work with a series of tuples
of the same format, and (in general) the format lookup for a
field is an expensive operation for JSON paths defined in a
key_part. A new slot_cache field in the key_part structure,
together with an epoch-based mechanism to validate its
actuality, should be an effective way to improve performance.

Part of #1012.
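
For illustration, a simplified sketch of the caching protocol this
patch implements in tuple_field_by_part_raw (the lookup of `field'
in the format's field tree is elided; see the diff below):

	int32_t offset_slot;
	if (part->offset_slot_epoch == format->epoch &&
	    format->epoch != 0) {
		/* Cache hit: slot already resolved for this format. */
		offset_slot = part->offset_slot;
	} else {
		/* Cache miss: resolve the slot via the field tree. */
		offset_slot = field->offset_slot;
		/* Cache it, preferring the newest format. */
		if (part->offset_slot_epoch < format->epoch) {
			part->offset_slot = offset_slot;
			part->offset_slot_epoch = format->epoch;
		}
	}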
---
 src/box/alter.cc                |  7 +++--
 src/box/blackhole.c             |  5 ++--
 src/box/engine.h                | 11 ++++----
 src/box/key_def.c               | 12 +++++++--
 src/box/key_def.h               | 11 ++++++++
 src/box/memtx_engine.c          |  4 +--
 src/box/memtx_space.c           |  5 ++--
 src/box/memtx_space.h           |  2 +-
 src/box/schema.cc               |  4 +--
 src/box/space.c                 |  4 +--
 src/box/space.h                 |  8 +++---
 src/box/sysview.c               |  3 ++-
 src/box/tuple.c                 |  4 +--
 src/box/tuple_format.c          | 60 ++++++++++++++++++++++++++---------------
 src/box/tuple_format.h          |  6 ++++-
 src/box/vinyl.c                 |  7 ++---
 src/box/vy_lsm.c                | 40 +++++++++++++++++++++++++--
 test/unit/vy_iterators_helper.c |  6 ++---
 test/unit/vy_mem.c              |  2 +-
 test/unit/vy_point_lookup.c     |  2 +-
 20 files changed, 145 insertions(+), 58 deletions(-)

diff --git a/src/box/alter.cc b/src/box/alter.cc
index 4ac44c2..ded5f5b 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -883,7 +883,10 @@ alter_space_do(struct txn *txn, struct alter_space *alter)
 	 * Create a new (empty) space for the new definition.
 	 * Sic: the triggers are not moved over yet.
 	 */
-	alter->new_space = space_new_xc(alter->space_def, &alter->key_list);
+	alter->new_space =
+		space_new_xc(alter->space_def, &alter->key_list,
+			     alter->old_space->format != NULL ?
+			     alter->old_space->format->epoch + 1 : 1);
 	/*
 	 * Copy the replace function, the new space is at the same recovery
 	 * phase as the old one. This hack is especially necessary for
@@ -1604,7 +1607,7 @@ on_replace_dd_space(struct trigger * /* trigger */, void *event)
 		auto def_guard =
 			make_scoped_guard([=] { space_def_delete(def); });
 		RLIST_HEAD(empty_list);
-		struct space *space = space_new_xc(def, &empty_list);
+		struct space *space = space_new_xc(def, &empty_list, 0);
 		/**
 		 * The new space must be inserted in the space
 		 * cache right away to achieve linearisable
diff --git a/src/box/blackhole.c b/src/box/blackhole.c
index f979304..517daf9 100644
--- a/src/box/blackhole.c
+++ b/src/box/blackhole.c
@@ -135,7 +135,7 @@ blackhole_engine_shutdown(struct engine *engine)
 
 static struct space *
 blackhole_engine_create_space(struct engine *engine, struct space_def *def,
-			      struct rlist *key_list)
+			      struct rlist *key_list, uint64_t epoch)
 {
 	if (!rlist_empty(key_list)) {
 		diag_set(ClientError, ER_UNSUPPORTED, "Blackhole", "indexes");
@@ -152,7 +152,8 @@ blackhole_engine_create_space(struct engine *engine, struct space_def *def,
 	/* Allocate tuples on runtime arena, but check space format. */
 	struct tuple_format *format;
 	format = tuple_format_new(&tuple_format_runtime->vtab, NULL, 0, 0,
-				  def->fields, def->field_count, def->dict);
+				  def->fields, def->field_count, def->dict,
+				  epoch);
 	if (format == NULL) {
 		free(space);
 		return NULL;
diff --git a/src/box/engine.h b/src/box/engine.h
index 5b96c74..0e8c76c 100644
--- a/src/box/engine.h
+++ b/src/box/engine.h
@@ -72,7 +72,8 @@ struct engine_vtab {
 	void (*shutdown)(struct engine *);
 	/** Allocate a new space instance. */
 	struct space *(*create_space)(struct engine *engine,
-			struct space_def *def, struct rlist *key_list);
+			struct space_def *def, struct rlist *key_list,
+			uint64_t epoch);
 	/**
 	 * Write statements stored in checkpoint @vclock to @stream.
 	 */
@@ -237,9 +238,9 @@ engine_find(const char *name)
 
 static inline struct space *
 engine_create_space(struct engine *engine, struct space_def *def,
-		    struct rlist *key_list)
+		    struct rlist *key_list, uint64_t epoch)
 {
-	return engine->vtab->create_space(engine, def, key_list);
+	return engine->vtab->create_space(engine, def, key_list, epoch);
 }
 
 static inline int
@@ -390,9 +391,9 @@ engine_find_xc(const char *name)
 
 static inline struct space *
 engine_create_space_xc(struct engine *engine, struct space_def *def,
-		    struct rlist *key_list)
+		    struct rlist *key_list, uint64_t epoch)
 {
-	struct space *space = engine_create_space(engine, def, key_list);
+	struct space *space = engine_create_space(engine, def, key_list, epoch);
 	if (space == NULL)
 		diag_raise();
 	return space;
diff --git a/src/box/key_def.c b/src/box/key_def.c
index 1b32387..f0f0313 100644
--- a/src/box/key_def.c
+++ b/src/box/key_def.c
@@ -176,6 +176,8 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
 	def->parts[part_no].type = type;
 	def->parts[part_no].coll = coll;
 	def->parts[part_no].coll_id = coll_id;
+	def->parts[part_no].offset_slot = TUPLE_OFFSET_SLOT_NIL;
+	def->parts[part_no].offset_slot_epoch = 0;
 	if (path != NULL) {
 		def->parts[part_no].path_len = path_len;
 		assert(def->parts[part_no].path != NULL);
@@ -680,9 +682,12 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
 			new_def->parts[pos].path = data;
 			data += part->path_len + 1;
 		}
-		key_def_set_part(new_def, pos++, part->fieldno, part->type,
+		key_def_set_part(new_def, pos, part->fieldno, part->type,
 				 part->is_nullable, part->coll, part->coll_id,
 				 part->path, part->path_len);
+		new_def->parts[pos].offset_slot_epoch = part->offset_slot_epoch;
+		new_def->parts[pos].offset_slot = part->offset_slot;
+		pos++;
 	}
 
 	/* Set-append second key def's part to the new key def. */
@@ -695,9 +700,12 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
 			new_def->parts[pos].path = data;
 			data += part->path_len + 1;
 		}
-		key_def_set_part(new_def, pos++, part->fieldno, part->type,
+		key_def_set_part(new_def, pos, part->fieldno, part->type,
 				 part->is_nullable, part->coll, part->coll_id,
 				 part->path, part->path_len);
+		new_def->parts[pos].offset_slot_epoch = part->offset_slot_epoch;
+		new_def->parts[pos].offset_slot = part->offset_slot;
+		pos++;
 	}
 	key_def_set_cmp(new_def);
 	return new_def;
diff --git a/src/box/key_def.h b/src/box/key_def.h
index ca73015..8271cee 100644
--- a/src/box/key_def.h
+++ b/src/box/key_def.h
@@ -85,6 +85,17 @@ struct key_part {
 	char *path;
 	/** The length of JSON path. */
 	uint32_t path_len;
+	/**
+	 * Epoch of the offset slot cache. Initialized with the
+	 * format's epoch when the field's offset_slot is cached
+	 * via tuple_field_by_part_raw, to speed up access on
+	 * subsequent calls with the same format.
+	 * The cache follows "the newest format is the most
+	 * relevant" strategy.
+	 */
+	uint64_t offset_slot_epoch;
+	/** Cache with format's field offset slot. */
+	int32_t offset_slot;
 };
 
 struct key_def;
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index a35f5e7..6051cc7 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -358,10 +358,10 @@ memtx_engine_end_recovery(struct engine *engine)
 
 static struct space *
 memtx_engine_create_space(struct engine *engine, struct space_def *def,
-			  struct rlist *key_list)
+			  struct rlist *key_list, uint64_t epoch)
 {
 	struct memtx_engine *memtx = (struct memtx_engine *)engine;
-	return memtx_space_new(memtx, def, key_list);
+	return memtx_space_new(memtx, def, key_list, epoch);
 }
 
 static int
diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
index 08ae0da..3ac606f 100644
--- a/src/box/memtx_space.c
+++ b/src/box/memtx_space.c
@@ -884,7 +884,7 @@ static const struct space_vtab memtx_space_vtab = {
 
 struct space *
 memtx_space_new(struct memtx_engine *memtx,
-		struct space_def *def, struct rlist *key_list)
+		struct space_def *def, struct rlist *key_list, uint64_t epoch)
 {
 	struct memtx_space *memtx_space = malloc(sizeof(*memtx_space));
 	if (memtx_space == NULL) {
@@ -910,7 +910,8 @@ memtx_space_new(struct memtx_engine *memtx,
 
 	struct tuple_format *format =
 		tuple_format_new(&memtx_tuple_format_vtab, keys, key_count, 0,
-				 def->fields, def->field_count, def->dict);
+				 def->fields, def->field_count, def->dict,
+				 epoch);
 	if (format == NULL) {
 		free(memtx_space);
 		return NULL;
diff --git a/src/box/memtx_space.h b/src/box/memtx_space.h
index 7dc3410..b5bec0c 100644
--- a/src/box/memtx_space.h
+++ b/src/box/memtx_space.h
@@ -79,7 +79,7 @@ memtx_space_replace_all_keys(struct space *, struct tuple *, struct tuple *,
 
 struct space *
 memtx_space_new(struct memtx_engine *memtx,
-		struct space_def *def, struct rlist *key_list);
+		struct space_def *def, struct rlist *key_list, uint64_t epoch);
 
 #if defined(__cplusplus)
 } /* extern "C" */
diff --git a/src/box/schema.cc b/src/box/schema.cc
index 67dbd8c..e5763fa 100644
--- a/src/box/schema.cc
+++ b/src/box/schema.cc
@@ -221,7 +221,7 @@ sc_space_new(uint32_t id, const char *name,
 	struct rlist key_list;
 	rlist_create(&key_list);
 	rlist_add_entry(&key_list, index_def, link);
-	struct space *space = space_new_xc(def, &key_list);
+	struct space *space = space_new_xc(def, &key_list, 0);
 	(void) space_cache_replace(space);
 	if (replace_trigger)
 		trigger_add(&space->on_replace, replace_trigger);
@@ -380,7 +380,7 @@ schema_init()
 			space_def_delete(def);
 		});
 		RLIST_HEAD(key_list);
-		struct space *space = space_new_xc(def, &key_list);
+		struct space *space = space_new_xc(def, &key_list, 0);
 		space_cache_replace(space);
 		init_system_space(space);
 		trigger_run_xc(&on_alter_space, space);
diff --git a/src/box/space.c b/src/box/space.c
index 871cc67..2e4df74 100644
--- a/src/box/space.c
+++ b/src/box/space.c
@@ -181,12 +181,12 @@ fail:
 }
 
 struct space *
-space_new(struct space_def *def, struct rlist *key_list)
+space_new(struct space_def *def, struct rlist *key_list, uint64_t epoch)
 {
 	struct engine *engine = engine_find(def->engine_name);
 	if (engine == NULL)
 		return NULL;
-	return engine_create_space(engine, def, key_list);
+	return engine_create_space(engine, def, key_list, epoch);
 }
 
 void
diff --git a/src/box/space.h b/src/box/space.h
index 8888ec8..068ea4b 100644
--- a/src/box/space.h
+++ b/src/box/space.h
@@ -378,10 +378,11 @@ struct field_def;
  * Allocate and initialize a space.
  * @param space_def Space definition.
  * @param key_list List of index_defs.
+ * @param epoch Epoch to initialize the new format with.
  * @retval Space object.
  */
 struct space *
-space_new(struct space_def *space_def, struct rlist *key_list);
+space_new(struct space_def *space_def, struct rlist *key_list, uint64_t epoch);
 
 /** Destroy and free a space. */
 void
@@ -416,9 +417,10 @@ int generic_space_prepare_alter(struct space *, struct space *);
 } /* extern "C" */
 
 static inline struct space *
-space_new_xc(struct space_def *space_def, struct rlist *key_list)
+space_new_xc(struct space_def *space_def, struct rlist *key_list,
+	     uint64_t epoch)
 {
-	struct space *space = space_new(space_def, key_list);
+	struct space *space = space_new(space_def, key_list, epoch);
 	if (space == NULL)
 		diag_raise();
 	return space;
diff --git a/src/box/sysview.c b/src/box/sysview.c
index a636c68..d35ff71 100644
--- a/src/box/sysview.c
+++ b/src/box/sysview.c
@@ -504,8 +504,9 @@ sysview_engine_shutdown(struct engine *engine)
 
 static struct space *
 sysview_engine_create_space(struct engine *engine, struct space_def *def,
-			    struct rlist *key_list)
+			    struct rlist *key_list, uint64_t epoch)
 {
+	(void)epoch;
 	struct space *space = (struct space *)calloc(1, sizeof(*space));
 	if (space == NULL) {
 		diag_set(OutOfMemory, sizeof(*space),
diff --git a/src/box/tuple.c b/src/box/tuple.c
index 2bc414f..6fbd9aa 100644
--- a/src/box/tuple.c
+++ b/src/box/tuple.c
@@ -234,7 +234,7 @@ tuple_init(field_name_hash_f hash)
 	 * Create a format for runtime tuples
 	 */
 	tuple_format_runtime = tuple_format_new(&tuple_format_runtime_vtab,
-						NULL, 0, 0, NULL, 0, NULL);
+						NULL, 0, 0, NULL, 0, NULL, 0);
 	if (tuple_format_runtime == NULL)
 		return -1;
 
@@ -406,7 +406,7 @@ box_tuple_format_new(struct key_def **keys, uint16_t key_count)
 {
 	box_tuple_format_t *format =
 		tuple_format_new(&tuple_format_runtime_vtab,
-				 keys, key_count, 0, NULL, 0, NULL);
+				 keys, key_count, 0, NULL, 0, NULL, 0);
 	if (format != NULL)
 		tuple_format_ref(format);
 	return format;
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 4738eb9..bdd802a 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -502,6 +502,7 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 	}
 	for (uint32_t i = 0; i < field_count; i++)
 		tuple_field_create(&format->fields[i], NULL, 0);
+	format->epoch = 0;
 	format->max_path_tree_depth = 1;
 	format->allocation_size = total;
 	format->refs = 0;
@@ -545,7 +546,8 @@ struct tuple_format *
 tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
 		 uint16_t key_count, uint16_t extra_size,
 		 const struct field_def *space_fields,
-		 uint32_t space_field_count, struct tuple_dictionary *dict)
+		 uint32_t space_field_count, struct tuple_dictionary *dict,
+		 uint64_t epoch)
 {
 	struct tuple_format *format =
 		tuple_format_alloc(keys, key_count, space_field_count, dict);
@@ -555,6 +557,7 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
 	format->engine = NULL;
 	format->extra_size = extra_size;
 	format->is_temporary = false;
+	format->epoch = epoch;
 	if (tuple_format_register(format) < 0) {
 		tuple_format_destroy(format);
 		free(format);
@@ -1050,27 +1053,42 @@ tuple_field_by_part_raw(const struct tuple_format *format, const char *data,
 	if (likely(part->path == NULL))
 		return tuple_field_raw(format, data, field_map, part->fieldno);
 
-	struct tuple_field *root_field =
-		unlikely(part->fieldno < format->field_count) ?
-		(struct tuple_field *)&format->fields[part->fieldno] : NULL;
-	struct tuple_field *field =
-		unlikely(root_field == NULL) ? NULL:
-		tuple_field_tree_lookup(root_field, part->path, part->path_len);
-	if (unlikely(field == NULL)) {
-		/*
-		 * Legacy tuple having no field map for JSON
-		 * index require full path parse.
-		 */
-		const char *field_raw =
-			tuple_field_raw(format, data, field_map, part->fieldno);
-		if (unlikely(field_raw == NULL))
-			return NULL;
-		if (tuple_field_by_relative_path(&field_raw, part->path,
-						 part->path_len) != 0)
-			return NULL;
-		return field_raw;
+	int32_t offset_slot;
+	if (likely(part->offset_slot_epoch == format->epoch)) {
+		assert(format->epoch != 0);
+		offset_slot = part->offset_slot;
+	} else {
+		struct tuple_field *root_field =
+			unlikely(part->fieldno < format->field_count) ?
+			(struct tuple_field *)&format->fields[part->fieldno] :
+			NULL;
+		struct tuple_field *field =
+			unlikely(root_field == NULL) ? NULL:
+			tuple_field_tree_lookup(root_field, part->path,
+						part->path_len);
+		if (unlikely(field == NULL)) {
+			/*
+			 * A legacy tuple that has no field map for
+			 * a JSON index requires a full path parse.
+			 */
+			const char *field_raw =
+				tuple_field_raw(format, data, field_map,
+						part->fieldno);
+			if (unlikely(field_raw == NULL))
+				return NULL;
+			if (tuple_field_by_relative_path(&field_raw, part->path,
+							 part->path_len) != 0)
+				return NULL;
+			return field_raw;
+		}
+		offset_slot = field->offset_slot;
+		/* Cache offset_slot if required. */
+		if (part->offset_slot_epoch < format->epoch) {
+			assert(format->epoch != 0);
+			part->offset_slot = offset_slot;
+			part->offset_slot_epoch = format->epoch;
+		}
 	}
-	int32_t offset_slot = field->offset_slot;
 	assert(offset_slot < 0);
 	assert(-offset_slot * sizeof(uint32_t) <= format->field_map_size);
 	if (unlikely(field_map[offset_slot] == 0))
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index 599e768..005f70b 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -120,6 +120,8 @@ struct tuple_field {
  * Tuple format describes how tuple is stored and information about its fields
  */
 struct tuple_format {
+	/** Epoch counter, incremented on each space rebuild. */
+	uint64_t epoch;
 	/** Virtual function table */
 	struct tuple_format_vtab vtab;
 	/** Pointer to engine-specific data. */
@@ -220,6 +222,7 @@ tuple_format_unref(struct tuple_format *format)
  * @param extra_size Extra bytes to reserve in tuples metadata.
  * @param space_fields Array of fields, defined in a space format.
  * @param space_field_count Length of @a space_fields.
+ * @param epoch Epoch of new format.
  *
  * @retval not NULL Tuple format.
  * @retval     NULL Memory error.
@@ -228,7 +231,8 @@ struct tuple_format *
 tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
 		 uint16_t key_count, uint16_t extra_size,
 		 const struct field_def *space_fields,
-		 uint32_t space_field_count, struct tuple_dictionary *dict);
+		 uint32_t space_field_count, struct tuple_dictionary *dict,
+		 uint64_t epoch);
 
 /**
  * Check, if @a format1 can store any tuples of @a format2. For
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 0a6ead9..807cd9e 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -578,7 +578,7 @@ vinyl_engine_check_space_def(struct space_def *def)
 
 static struct space *
 vinyl_engine_create_space(struct engine *engine, struct space_def *def,
-			  struct rlist *key_list)
+			  struct rlist *key_list, uint64_t epoch)
 {
 	struct space *space = malloc(sizeof(*space));
 	if (space == NULL) {
@@ -604,7 +604,8 @@ vinyl_engine_create_space(struct engine *engine, struct space_def *def,
 
 	struct tuple_format *format =
 		tuple_format_new(&vy_tuple_format_vtab, keys, key_count, 0,
-				 def->fields, def->field_count, def->dict);
+				 def->fields, def->field_count, def->dict,
+				 epoch);
 	if (format == NULL) {
 		free(space);
 		return NULL;
@@ -3025,7 +3026,7 @@ vy_send_lsm(struct vy_join_ctx *ctx, struct vy_lsm_recovery_info *lsm_info)
 	if (ctx->key_def == NULL)
 		goto out;
 	ctx->format = tuple_format_new(&vy_tuple_format_vtab, &ctx->key_def,
-				       1, 0, NULL, 0, NULL);
+				       1, 0, NULL, 0, NULL, 0);
 	if (ctx->format == NULL)
 		goto out_free_key_def;
 	tuple_format_ref(ctx->format);
diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
index de5ad31..c549f4b 100644
--- a/src/box/vy_lsm.c
+++ b/src/box/vy_lsm.c
@@ -61,7 +61,7 @@ vy_lsm_env_create(struct vy_lsm_env *env, const char *path,
 		  void *upsert_thresh_arg)
 {
 	env->key_format = tuple_format_new(&vy_tuple_format_vtab,
-					   NULL, 0, 0, NULL, 0, NULL);
+					   NULL, 0, 0, NULL, 0, NULL, 0);
 	if (env->key_format == NULL)
 		return -1;
 	tuple_format_ref(env->key_format);
@@ -155,9 +155,45 @@ vy_lsm_new(struct vy_lsm_env *lsm_env, struct vy_cache_env *cache_env,
 	} else {
 		lsm->disk_format = tuple_format_new(&vy_tuple_format_vtab,
 						    &cmp_def, 1, 0, NULL, 0,
-						    NULL);
+						    NULL, format->epoch);
 		if (lsm->disk_format == NULL)
 			goto fail_format;
+		/*
+		 * Tuple formats should be compatible to make
+		 * epoch-based caching work.
+		 */
+		int32_t min_offset_slot = 0;
+		struct tuple_field *dst_fields = lsm->disk_format->fields;
+		struct tuple_field *src_fields = format->fields;
+		struct key_part *part = cmp_def->parts;
+		struct key_part *part_end = part + cmp_def->part_count;
+		for (; part < part_end; part++) {
+			struct tuple_field *dst_field =
+				&dst_fields[part->fieldno];
+			struct tuple_field *src_field =
+				&src_fields[part->fieldno];
+			if (dst_field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
+				src_field = &format->fields[part->fieldno];
+			} else if (part->path != NULL) {
+				src_field =
+					tuple_field_tree_lookup(src_field,
+								part->path,
+								part->path_len);
+				assert(src_field != NULL);
+				dst_field =
+					tuple_field_tree_lookup(dst_field,
+								part->path,
+								part->path_len);
+				assert(dst_field != NULL);
+			} else {
+				continue;
+			}
+			if (src_field->offset_slot == TUPLE_OFFSET_SLOT_NIL)
+				continue;
+			dst_field->offset_slot = src_field->offset_slot;
+			min_offset_slot = MIN(src_field->offset_slot, min_offset_slot);
+		}
+		lsm->disk_format->field_map_size = -min_offset_slot * sizeof(uint32_t);
 	}
 	tuple_format_ref(lsm->disk_format);
 
diff --git a/test/unit/vy_iterators_helper.c b/test/unit/vy_iterators_helper.c
index 40d8d6a..bfef6a2 100644
--- a/test/unit/vy_iterators_helper.c
+++ b/test/unit/vy_iterators_helper.c
@@ -22,7 +22,7 @@ vy_iterator_C_test_init(size_t cache_size)
 	vy_cache_env_create(&cache_env, cord_slab_cache());
 	vy_cache_env_set_quota(&cache_env, cache_size);
 	vy_key_format = tuple_format_new(&vy_tuple_format_vtab, NULL, 0, 0,
-					 NULL, 0, NULL);
+					 NULL, 0, NULL, 0);
 	tuple_format_ref(vy_key_format);
 
 	size_t mem_size = 64 * 1024 * 1024;
@@ -210,7 +210,7 @@ create_test_mem(struct key_def *def)
 	struct key_def * const defs[] = { def };
 	struct tuple_format *format =
 		tuple_format_new(&vy_tuple_format_vtab, defs, def->part_count,
-				 0, NULL, 0, NULL);
+				 0, NULL, 0, NULL, 0);
 	fail_if(format == NULL);
 
 	/* Create format with column mask */
@@ -234,7 +234,7 @@ create_test_cache(uint32_t *fields, uint32_t *types,
 	assert(*def != NULL);
 	vy_cache_create(cache, &cache_env, *def, true);
 	*format = tuple_format_new(&vy_tuple_format_vtab, def, 1, 0, NULL, 0,
-				   NULL);
+				   NULL, 0);
 	tuple_format_ref(*format);
 }
 
diff --git a/test/unit/vy_mem.c b/test/unit/vy_mem.c
index 967aabe..d797b3d 100644
--- a/test/unit/vy_mem.c
+++ b/test/unit/vy_mem.c
@@ -78,7 +78,7 @@ test_iterator_restore_after_insertion()
 	/* Create format */
 	struct tuple_format *format = tuple_format_new(&vy_tuple_format_vtab,
 						       &key_def, 1, 0, NULL, 0,
-						       NULL);
+						       NULL, 0);
 	assert(format != NULL);
 	tuple_format_ref(format);
 
diff --git a/test/unit/vy_point_lookup.c b/test/unit/vy_point_lookup.c
index dd33bbe..73afbd5 100644
--- a/test/unit/vy_point_lookup.c
+++ b/test/unit/vy_point_lookup.c
@@ -85,7 +85,7 @@ test_basic()
 	vy_cache_create(&cache, &cache_env, key_def, true);
 	struct tuple_format *format = tuple_format_new(&vy_tuple_format_vtab,
 						       &key_def, 1, 0, NULL, 0,
-						       NULL);
+						       NULL, 0);
 	isnt(format, NULL, "tuple_format_new is not NULL");
 	tuple_format_ref(format);
 
-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 14/14] box: specify indexes in user-friendly form
  2018-10-11  7:58 [PATCH v4 00/14] box: indexes by JSON path Kirill Shcherbatov
                   ` (4 preceding siblings ...)
  2018-10-11  7:58 ` [PATCH v4 13/14] box: introduce offset slot cache in key_part Kirill Shcherbatov
@ 2018-10-11  7:58 ` Kirill Shcherbatov
  2018-10-11  7:58 ` [PATCH v4 02/14] box: introduce key_def_parts_are_sequential Kirill Shcherbatov
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 24+ messages in thread
From: Kirill Shcherbatov @ 2018-10-11  7:58 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov

It is now possible to create indexes by JSON path using field
names specified in the space format.

Closes #1012.

@TarantoolBot document
Title: Indexes by JSON path
Sometimes field data has a complex document structure.
When this structure is consistent across the whole document,
you can create an index by JSON path.

Example:
s:create_index('json_index', {parts = {{'FIO["fname"]', 'str'}}})
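
For reference, a simplified sketch of how the new
box.internal.path_resolve helper splits such a name-prefixed path
into a field number and a relative JSON path (error handling is
elided; see lbox_index_resolve_path in the diff below):

	struct json_path_parser parser;
	struct json_path_node node;
	json_path_parser_create(&parser, path, path_len);
	if (json_path_next(&parser, &node) != 0)
		return -1; /* Invalid path. */
	uint32_t fieldno;
	if (node.type == JSON_PATH_STR &&
	    tuple_fieldno_by_name(format->dict, node.str, node.len,
				  field_name_hash(node.str, node.len),
				  &fieldno) == 0) {
		/*
		 * For FIO["fname"] this resolves "FIO" to its
		 * fieldno; the remainder ["fname"] starts at
		 * path + parser.offset and becomes the part's path.
		 */
	}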
---
 src/box/lua/index.c        | 60 ++++++++++++++++++++++++++++++++++++++++++++++
 src/box/lua/schema.lua     | 22 ++++++++---------
 test/engine/tuple.result   | 41 +++++++++++++++++++++++++++++++
 test/engine/tuple.test.lua | 12 ++++++++++
 4 files changed, 124 insertions(+), 11 deletions(-)

diff --git a/src/box/lua/index.c b/src/box/lua/index.c
index ef89c39..21c299d 100644
--- a/src/box/lua/index.c
+++ b/src/box/lua/index.c
@@ -35,6 +35,9 @@
 #include "box/info.h"
 #include "box/lua/info.h"
 #include "box/lua/tuple.h"
+#include "box/schema.h"
+#include "box/tuple_format.h"
+#include "json/path.h"
 #include "box/lua/misc.h" /* lbox_encode_tuple_on_gc() */
 
 /** {{{ box.index Lua library: access to spaces and indexes
@@ -328,6 +331,62 @@ lbox_index_compact(lua_State *L)
 	return 0;
 }
 
+static int
+lbox_index_resolve_path(struct lua_State *L)
+{
+	if (lua_gettop(L) != 3 ||
+	    !lua_isnumber(L, 1) || !lua_isnumber(L, 2) || !lua_isstring(L, 3)) {
+		return luaL_error(L, "Usage: box.internal."
+				     "path_resolve(part_id, space_id, path)");
+	}
+	uint32_t part_id = lua_tonumber(L, 1);
+	uint32_t space_id = lua_tonumber(L, 2);
+	size_t path_len;
+	const char *path = lua_tolstring(L, 3, &path_len);
+	struct space *space = space_cache_find(space_id);
+	if (space == NULL)
+		return luaT_error(L);
+	struct json_path_parser parser;
+	struct json_path_node node;
+	json_path_parser_create(&parser, path, path_len);
+	int rc = json_path_next(&parser, &node);
+	if (rc != 0) {
+		const char *err_msg =
+			tt_sprintf("options.parts[%d]: error in path on "
+				   "position %d", part_id, rc);
+		diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg);
+		return luaT_error(L);
+	}
+	assert(space->format != NULL && space->format->dict != NULL);
+	uint32_t fieldno;
+	if (node.type == JSON_PATH_NUM &&
+	    (fieldno = node.num - TUPLE_INDEX_BASE) >=
+	    space->format->field_count) {
+		const char *err_msg =
+			tt_sprintf("options.parts[%d]: field '%d' referenced "
+				   "in path is greater than format field "
+				   "count %d", part_id,
+				   fieldno + TUPLE_INDEX_BASE,
+				   space->format->field_count);
+		diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg);
+		return luaT_error(L);
+	} else if (node.type == JSON_PATH_STR &&
+		   tuple_fieldno_by_name(space->format->dict, node.str, node.len,
+					 field_name_hash(node.str, node.len),
+					 &fieldno) != 0) {
+		const char *err_msg =
+			tt_sprintf("options.parts[%d]: field was not found by "
+				   "name '%.*s'", part_id, node.len, node.str);
+		diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg);
+		return luaT_error(L);
+	}
+	fieldno += TUPLE_INDEX_BASE;
+	path += parser.offset;
+	lua_pushnumber(L, fieldno);
+	lua_pushstring(L, path);
+	return 2;
+}
+
 /* }}} */
 
 void
@@ -365,6 +424,7 @@ box_lua_index_init(struct lua_State *L)
 		{"truncate", lbox_truncate},
 		{"stat", lbox_index_stat},
 		{"compact", lbox_index_compact},
+		{"path_resolve", lbox_index_resolve_path},
 		{NULL, NULL}
 	};
 
diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua
index 540a2a5..8fca3bd 100644
--- a/src/box/lua/schema.lua
+++ b/src/box/lua/schema.lua
@@ -556,7 +556,7 @@ local function update_index_parts_1_6_0(parts)
     return result
 end
 
-local function update_index_parts(format, parts)
+local function update_index_parts(format, parts, space_id)
     if type(parts) ~= "table" then
         box.error(box.error.ILLEGAL_PARAMS,
         "options.parts parameter should be a table")
@@ -607,16 +607,16 @@ local function update_index_parts(format, parts)
             box.error(box.error.ILLEGAL_PARAMS,
                       "options.parts[" .. i .. "]: field (name or number) is expected")
         elseif type(part.field) == 'string' then
-            for k,v in pairs(format) do
-                if v.name == part.field then
-                    part.field = k
-                    break
-                end
-            end
-            if type(part.field) == 'string' then
+            local idx, path = box.internal.path_resolve(i, space_id, part.field)
+            if part.path ~= nil and part.path ~= path then
                 box.error(box.error.ILLEGAL_PARAMS,
-                          "options.parts[" .. i .. "]: field was not found by name '" .. part.field .. "'")
+                          "options.parts[" .. i .. "]: field path '" ..
+                          part.path .. "' doesn't match path resolved by name '" ..
+                          part.field .. "'")
             end
+            parts_can_be_simplified = parts_can_be_simplified and path == nil
+            part.field = idx
+            part.path = path or part.path
         elseif part.field == 0 then
             box.error(box.error.ILLEGAL_PARAMS,
                       "options.parts[" .. i .. "]: field (number) must be one-based")
@@ -767,7 +767,7 @@ box.schema.index.create = function(space_id, name, options)
         end
     end
     local parts, parts_can_be_simplified =
-        update_index_parts(format, options.parts)
+        update_index_parts(format, options.parts, space_id)
     -- create_index() options contains type, parts, etc,
     -- stored separately. Remove these members from index_opts
     local index_opts = {
@@ -934,7 +934,7 @@ box.schema.index.alter = function(space_id, index_id, options)
     if options.parts then
         local parts_can_be_simplified
         parts, parts_can_be_simplified =
-            update_index_parts(format, options.parts)
+            update_index_parts(format, options.parts, space_id)
         -- save parts in old format if possible
         if parts_can_be_simplified then
             parts = simplify_index_parts(parts)
diff --git a/test/engine/tuple.result b/test/engine/tuple.result
index 1842420..4dd34fd 100644
--- a/test/engine/tuple.result
+++ b/test/engine/tuple.result
@@ -1002,6 +1002,47 @@ assert(idx ~= nil)
 ---
 - true
 ...
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'array'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}}
+---
+...
+s:format(format)
+---
+- error: Field 3 has type 'map' in one index, but type 'array' in another
+...
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'map'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}}
+---
+...
+s:format(format)
+---
+...
+s:create_index('test3', {parts = {{2, 'number'}, {']sad.FIO["fname"]', 'str'}}})
+---
+- error: 'Illegal parameters, options.parts[2]: error in path on position 1'
+...
+s:create_index('test3', {parts = {{2, 'number'}, {'[666].FIO["fname"]', 'str'}}})
+---
+- error: 'Illegal parameters, options.parts[2]: field ''666'' referenced in path is
+    greater than format field count 5'
+...
+s:create_index('test3', {parts = {{2, 'number'}, {'invalid.FIO["fname"]', 'str'}}})
+---
+- error: 'Illegal parameters, options.parts[2]: field was not found by name ''invalid'''
+...
+idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}})
+---
+...
+assert(idx3 ~= nil)
+---
+- true
+...
+assert(idx3.parts[2].path == "[\"FIO\"][\"fname\"]")
+---
+- true
+...
+-- Vinyl has optimizations that omit index checks, so errors could differ.
+idx3:drop()
+---
+...
 s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
 ---
 - error: 'Tuple field 3 type does not match one required by operation: expected map'
diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua
index 8be6505..f8d99a0 100644
--- a/test/engine/tuple.test.lua
+++ b/test/engine/tuple.test.lua
@@ -326,6 +326,18 @@ s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[1].sname'},
 s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO....fname'}}})
 idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
 assert(idx ~= nil)
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'array'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}}
+s:format(format)
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'map'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}}
+s:format(format)
+s:create_index('test3', {parts = {{2, 'number'}, {']sad.FIO["fname"]', 'str'}}})
+s:create_index('test3', {parts = {{2, 'number'}, {'[666].FIO["fname"]', 'str'}}})
+s:create_index('test3', {parts = {{2, 'number'}, {'invalid.FIO["fname"]', 'str'}}})
+idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}})
+assert(idx3 ~= nil)
+assert(idx3.parts[2].path == "[\"FIO\"][\"fname\"]")
+-- Vinyl has optimizations that omit index checks, so errors could differ.
+idx3:drop()
 s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
 s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5}
 s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5}
-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 02/14] box: introduce key_def_parts_are_sequential
  2018-10-11  7:58 [PATCH v4 00/14] box: indexes by JSON path Kirill Shcherbatov
                   ` (5 preceding siblings ...)
  2018-10-11  7:58 ` [PATCH v4 14/14] box: specify indexes in user-friendly form Kirill Shcherbatov
@ 2018-10-11  7:58 ` Kirill Shcherbatov
  2018-10-15 17:29   ` Vladimir Davydov
  2018-10-11  7:58 ` [PATCH v4 03/14] box: introduce tuple_field_by_relative_path Kirill Shcherbatov
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 24+ messages in thread
From: Kirill Shcherbatov @ 2018-10-11  7:58 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov

Introduced a new key_def_parts_are_sequential routine that tests
whether key_def parts i and i + 1 point to sequential tuple
fields. This will be useful when introducing JSON paths, as the
criterion becomes more complex: fields with JSON paths can't be
'sequential' in this sense.

Part of #1012.
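
A minimal illustration, assuming a hypothetical key_def whose parts
reference tuple fields 2, 3 and 5 (fieldno is 0-based here):

	/*
	 * def->parts[0].fieldno == 1, def->parts[1].fieldno == 2,
	 * def->parts[2].fieldno == 4.
	 */
	assert(key_def_parts_are_sequential(def, 0));  /* fields 2, 3 */
	assert(!key_def_parts_are_sequential(def, 1)); /* fields 3, 5 */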
---
 src/box/tuple_extract_key.cc | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
index e493c3b..e9d7cac 100644
--- a/src/box/tuple_extract_key.cc
+++ b/src/box/tuple_extract_key.cc
@@ -4,12 +4,21 @@
 
 enum { MSGPACK_NULL = 0xc0 };
 
+/** True if key parts i and i + 1 are sequential. */
+static inline bool
+key_def_parts_are_sequential(const struct key_def *def, int i)
+{
+	uint32_t fieldno1 = def->parts[i].fieldno + 1;
+	uint32_t fieldno2 = def->parts[i + 1].fieldno;
+	return fieldno1 == fieldno2;
+}
+
 /** True if a key can contain two or more parts in sequence. */
 static bool
 key_def_contains_sequential_parts(const struct key_def *def)
 {
 	for (uint32_t i = 0; i < def->part_count - 1; ++i) {
-		if (def->parts[i].fieldno + 1 == def->parts[i + 1].fieldno)
+		if (key_def_parts_are_sequential(def, i))
 			return true;
 	}
 	return false;
@@ -123,8 +132,7 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
 			 * minimize tuple_field_raw() calls.
 			 */
 			for (; i < part_count - 1; i++) {
-				if (key_def->parts[i].fieldno + 1 !=
-				    key_def->parts[i + 1].fieldno) {
+				if (!key_def_parts_are_sequential(key_def, i)) {
 					/*
 					 * End of sequential part.
 					 */
@@ -165,8 +173,7 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
 			 * minimize tuple_field_raw() calls.
 			 */
 			for (; i < part_count - 1; i++) {
-				if (key_def->parts[i].fieldno + 1 !=
-				    key_def->parts[i + 1].fieldno) {
+				if (!key_def_parts_are_sequential(key_def, i)) {
 					/*
 					 * End of sequential part.
 					 */
@@ -231,8 +238,7 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 		uint32_t fieldno = key_def->parts[i].fieldno;
 		uint32_t null_count = 0;
 		for (; i < key_def->part_count - 1; i++) {
-			if (key_def->parts[i].fieldno + 1 !=
-			    key_def->parts[i + 1].fieldno)
+			if (!key_def_parts_are_sequential(key_def, i))
 				break;
 		}
 		uint32_t end_fieldno = key_def->parts[i].fieldno;
-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 03/14] box: introduce tuple_field_by_relative_path
  2018-10-11  7:58 [PATCH v4 00/14] box: indexes by JSON path Kirill Shcherbatov
                   ` (6 preceding siblings ...)
  2018-10-11  7:58 ` [PATCH v4 02/14] box: introduce key_def_parts_are_sequential Kirill Shcherbatov
@ 2018-10-11  7:58 ` Kirill Shcherbatov
  2018-10-15 17:46   ` Vladimir Davydov
  2018-10-11  7:58 ` [PATCH v4 04/14] box: introduce tuple_format_add_key_part Kirill Shcherbatov
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 24+ messages in thread
From: Kirill Shcherbatov @ 2018-10-11  7:58 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov

The new tuple_field_by_relative_path routine is used in
tuple_field_raw_by_path to retrieve data from a field by JSON
path. We will need this routine exported later to access data by
a JSON path specified in a key_part.

Part of #1012.
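
A minimal usage sketch (the msgpack content is hypothetical):

	/*
	 * Suppose field points to the msgpack encoding of
	 * {"FIO": {"fname": "James"}}.
	 */
	const char *path = "[\"FIO\"][\"fname\"]";
	int rc = tuple_field_by_relative_path(&field, path, strlen(path));
	if (rc == 0) {
		/*
		 * field now points to the msgpack string "James",
		 * or is NULL if the path doesn't match the data.
		 */
	} else {
		/* rc > 0: position of the path parsing error. */
	}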
---
 src/box/tuple_format.c | 60 ++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 43 insertions(+), 17 deletions(-)

diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index b385c0d..5679cad 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -541,6 +541,42 @@ tuple_field_go_to_key(const char **field, const char *key, int len)
 	return -1;
 }
 
+/**
+ * Retrieve field data by JSON path.
+ * @param field Pointer to msgpack with data.
+ * @param path The path to process.
+ * @param path_len The length of @a path.
+ * @retval 0 On success.
+ * @retval >0 Position of the invalid character on a path parsing error.
+ */
+static inline int
+tuple_field_by_relative_path(const char **field, const char *path,
+			     uint32_t path_len)
+{
+	int rc;
+	struct json_path_parser parser;
+	struct json_path_node node;
+	json_path_parser_create(&parser, path, path_len);
+	while ((rc = json_path_next(&parser, &node)) == 0) {
+		switch (node.type) {
+		case JSON_PATH_NUM:
+			rc = tuple_field_go_to_index(field, node.num);
+			break;
+		case JSON_PATH_STR:
+			rc = tuple_field_go_to_key(field, node.str, node.len);
+			break;
+		default:
+			assert(node.type == JSON_PATH_END);
+			return 0;
+		}
+		if (rc != 0) {
+			*field = NULL;
+			return 0;
+		}
+	}
+	return rc;
+}
+
 int
 tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
                         const uint32_t *field_map, const char *path,
@@ -604,23 +640,13 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 		*field = NULL;
 		return 0;
 	}
-	while ((rc = json_path_next(&parser, &node)) == 0) {
-		switch(node.type) {
-		case JSON_PATH_NUM:
-			rc = tuple_field_go_to_index(field, node.num);
-			break;
-		case JSON_PATH_STR:
-			rc = tuple_field_go_to_key(field, node.str, node.len);
-			break;
-		default:
-			assert(node.type == JSON_PATH_END);
-			return 0;
-		}
-		if (rc != 0) {
-			*field = NULL;
-			return 0;
-		}
-	}
+	rc = tuple_field_by_relative_path(field, path + parser.offset,
+					  path_len - parser.offset);
+	if (rc == 0)
+		return 0;
+	/* Setup absolute error position. */
+	rc += parser.offset;
+
 error:
 	assert(rc > 0);
 	diag_set(ClientError, ER_ILLEGAL_PARAMS,
-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 04/14] box: introduce tuple_format_add_key_part
  2018-10-11  7:58 [PATCH v4 00/14] box: indexes by JSON path Kirill Shcherbatov
                   ` (7 preceding siblings ...)
  2018-10-11  7:58 ` [PATCH v4 03/14] box: introduce tuple_field_by_relative_path Kirill Shcherbatov
@ 2018-10-11  7:58 ` Kirill Shcherbatov
  2018-10-15 19:39   ` Vladimir Davydov
  2018-10-11  7:58 ` [tarantool-patches] [PATCH v4 05/14] box: introduce tuple_format_sizeof routine Kirill Shcherbatov
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 24+ messages in thread
From: Kirill Shcherbatov @ 2018-10-11  7:58 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov

Introduced a new tuple_format_add_key_part routine that performs
format initialization for a specified key_part and configuration.
This decreases the complexity of tuple_format_create and will be
used to initialize format structures for JSON paths.

Part of #1012.
---
 src/box/tuple_format.c | 125 +++++++++++++++++++++++++++----------------------
 1 file changed, 69 insertions(+), 56 deletions(-)

diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 5679cad..d8acaa5 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -42,6 +42,71 @@ static const struct tuple_field tuple_field_default = {
 };
 
 /**
+ * Add and initialize a new key_part to format.
+ * @param format Format to initialize.
+ * @param fields Fields definition if any.
+ * @param field_count Count of @a fields.
+ * @param part An index part to append.
+ * @param is_sequential True if the part belongs to a sequential key.
+ * @param current_slot Pointer to last offset slot.
+ * @retval -1 On error.
+ * @retval 0 On success.
+ */
+static int
+tuple_format_add_key_part(struct tuple_format *format,
+			  const struct field_def *fields, uint32_t field_count,
+			  const struct key_part *part, bool is_sequential,
+			  int *current_slot)
+{
+	assert(part->fieldno < format->field_count);
+	struct tuple_field *field = &format->fields[part->fieldno];
+	if (part->fieldno >= field_count) {
+		field->is_nullable = part->is_nullable;
+	} else if (field->is_nullable != part->is_nullable) {
+		/*
+		 * In case of mismatch set the most strict option
+		 * for is_nullable.
+		 */
+		field->is_nullable = false;
+	}
+	/*
+	 * Check that there are no conflicts between index part
+	 * types and space fields. If a part type is compatible
+	 * with field's one, then the part type is more strict
+	 * and the part type must be used in tuple_format.
+	 */
+	if (field_type1_contains_type2(field->type, part->type)) {
+		field->type = part->type;
+	} else if (!field_type1_contains_type2(part->type, field->type)) {
+		int fieldno = part->fieldno + TUPLE_INDEX_BASE;
+		const char *name = part->fieldno >= field_count ?
+				   tt_sprintf("%d", fieldno) :
+				   tt_sprintf("'%s'",
+					      fields[part->fieldno].name);
+		int errcode = !field->is_key_part ?
+			      ER_FORMAT_MISMATCH_INDEX_PART :
+			      ER_INDEX_PART_TYPE_MISMATCH;
+		diag_set(ClientError, errcode, name,
+			 field_type_strs[field->type],
+			 field_type_strs[part->type]);
+		return -1;
+	}
+	field->is_key_part = true;
+	/*
+	 * In the tuple, store only offsets necessary to access
+	 * fields of non-sequential keys.
+	 * First field is always simply accessible, so we don't
+	 * store an offset for it.
+	 */
+	if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL && !is_sequential &&
+	    part->fieldno > 0) {
+		*current_slot = *current_slot - 1;
+		field->offset_slot = *current_slot;
+	}
+	return 0;
+}
+
+/**
  * Extract all available type info from keys and field
  * definitions.
  */
@@ -78,63 +143,11 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 		const struct key_part *parts_end = part + key_def->part_count;
 
 		for (; part < parts_end; part++) {
-			assert(part->fieldno < format->field_count);
-			struct tuple_field *field =
-				&format->fields[part->fieldno];
-			if (part->fieldno >= field_count) {
-				field->is_nullable = part->is_nullable;
-			} else if (field->is_nullable != part->is_nullable) {
-				/*
-				 * In case of mismatch set the most
-				 * strict option for is_nullable.
-				 */
-				field->is_nullable = false;
-			}
-
-			/*
-			 * Check that there are no conflicts
-			 * between index part types and space
-			 * fields. If a part type is compatible
-			 * with field's one, then the part type is
-			 * more strict and the part type must be
-			 * used in tuple_format.
-			 */
-			if (field_type1_contains_type2(field->type,
-						       part->type)) {
-				field->type = part->type;
-			} else if (! field_type1_contains_type2(part->type,
-								field->type)) {
-				const char *name;
-				int fieldno = part->fieldno + TUPLE_INDEX_BASE;
-				if (part->fieldno >= field_count) {
-					name = tt_sprintf("%d", fieldno);
-				} else {
-					const struct field_def *def =
-						&fields[part->fieldno];
-					name = tt_sprintf("'%s'", def->name);
-				}
-				int errcode;
-				if (! field->is_key_part)
-					errcode = ER_FORMAT_MISMATCH_INDEX_PART;
-				else
-					errcode = ER_INDEX_PART_TYPE_MISMATCH;
-				diag_set(ClientError, errcode, name,
-					 field_type_strs[field->type],
-					 field_type_strs[part->type]);
+			if (tuple_format_add_key_part(format, fields,
+						      field_count, part,
+						      is_sequential,
+						      &current_slot) != 0)
 				return -1;
-			}
-			field->is_key_part = true;
-			/*
-			 * In the tuple, store only offsets necessary
-			 * to access fields of non-sequential keys.
-			 * First field is always simply accessible,
-			 * so we don't store an offset for it.
-			 */
-			if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL &&
-			    is_sequential == false && part->fieldno > 0) {
-
-				field->offset_slot = --current_slot;
-			}
 		}
 	}
 
-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [tarantool-patches] [PATCH v4 05/14] box: introduce tuple_format_sizeof routine
  2018-10-11  7:58 [PATCH v4 00/14] box: indexes by JSON path Kirill Shcherbatov
                   ` (8 preceding siblings ...)
  2018-10-11  7:58 ` [PATCH v4 04/14] box: introduce tuple_format_add_key_part Kirill Shcherbatov
@ 2018-10-11  7:58 ` Kirill Shcherbatov
  2018-10-15 17:52   ` Vladimir Davydov
  2018-10-11  7:58 ` [PATCH v4 06/14] box: move tuple_field_go_to_{index,key} definition Kirill Shcherbatov
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 24+ messages in thread
From: Kirill Shcherbatov @ 2018-10-11  7:58 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov

The new tuple_format_sizeof routine calculates the amount of
memory needed for a tuple_format allocation.

Part of #1012.
---
 src/box/tuple_format.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index d8acaa5..6a287f2 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -42,6 +42,18 @@ static const struct tuple_field tuple_field_default = {
 };
 
 /**
+ * Calculate the size of tuple format allocation.
+ * @param field_count Count of tuple fields.
+ * @retval Size of memory allocation.
+ */
+static inline uint32_t
+tuple_format_sizeof(uint32_t field_count)
+{
+	return sizeof(struct tuple_format) +
+	       field_count * sizeof(struct tuple_field);
+}
+
+/**
  * Add and initialize a new key_part to format.
  * @param format Format to initialize.
  * @param fields Fields definition if any.
@@ -225,8 +237,7 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		}
 	}
 	uint32_t field_count = MAX(space_field_count, index_field_count);
-	uint32_t total = sizeof(struct tuple_format) +
-			 field_count * sizeof(struct tuple_field);
+	uint32_t total = tuple_format_sizeof(field_count);
 
 	struct tuple_format *format = (struct tuple_format *) malloc(total);
 	if (format == NULL) {
-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 06/14] box: move tuple_field_go_to_{index,key} definition
  2018-10-11  7:58 [PATCH v4 00/14] box: indexes by JSON path Kirill Shcherbatov
                   ` (9 preceding siblings ...)
  2018-10-11  7:58 ` [tarantool-patches] [PATCH v4 05/14] box: introduce tuple_format_sizeof routine Kirill Shcherbatov
@ 2018-10-11  7:58 ` Kirill Shcherbatov
  2018-10-16  8:15   ` Vladimir Davydov
  2018-10-11  7:58 ` [PATCH v4 07/14] box: drop format const qualifier in *init_field_map Kirill Shcherbatov
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 24+ messages in thread
From: Kirill Shcherbatov @ 2018-10-11  7:58 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov

Moved the tuple_field_go_to_index and tuple_field_go_to_key
function definitions up in tuple_format.c to make them visible
to the tuple_init_field_map routine.

Part of #1012.
---
 src/box/tuple_format.c | 160 ++++++++++++++++++++++++-------------------------
 1 file changed, 80 insertions(+), 80 deletions(-)

diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 6a287f2..3e43cf7 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -54,6 +54,86 @@ tuple_format_sizeof(uint32_t field_count)
 }
 
 /**
+ * Propagate @a field to MessagePack(field)[index].
+ * @param[in][out] field Field to propagate.
+ * @param index 1-based index to propagate to.
+ *
+ * @retval  0 Success, the index was found.
+ * @retval -1 Not found.
+ */
+static inline int
+tuple_field_go_to_index(const char **field, uint64_t index)
+{
+	enum mp_type type = mp_typeof(**field);
+	if (type == MP_ARRAY) {
+		if (index == 0)
+			return -1;
+		/* Make index 0-based. */
+		index -= TUPLE_INDEX_BASE;
+		uint32_t count = mp_decode_array(field);
+		if (index >= count)
+			return -1;
+		for (; index > 0; --index)
+			mp_next(field);
+		return 0;
+	} else if (type == MP_MAP) {
+		uint64_t count = mp_decode_map(field);
+		for (; count > 0; --count) {
+			type = mp_typeof(**field);
+			if (type == MP_UINT) {
+				uint64_t value = mp_decode_uint(field);
+				if (value == index)
+					return 0;
+			} else if (type == MP_INT) {
+				int64_t value = mp_decode_int(field);
+				if (value >= 0 && (uint64_t)value == index)
+					return 0;
+			} else {
+				/* Skip key. */
+				mp_next(field);
+			}
+			/* Skip value. */
+			mp_next(field);
+		}
+	}
+	return -1;
+}
+
+/**
+ * Propagate @a field to MessagePack(field)[key].
+ * @param[in][out] field Field to propagate.
+ * @param key Key to propagate to.
+ * @param len Length of @a key.
+ *
+ * @retval  0 Success, the index was found.
+ * @retval -1 Not found.
+ */
+static inline int
+tuple_field_go_to_key(const char **field, const char *key, int len)
+{
+	enum mp_type type = mp_typeof(**field);
+	if (type != MP_MAP)
+		return -1;
+	uint64_t count = mp_decode_map(field);
+	for (; count > 0; --count) {
+		type = mp_typeof(**field);
+		if (type == MP_STR) {
+			uint32_t value_len;
+			const char *value = mp_decode_str(field, &value_len);
+			if (value_len == (uint)len &&
+			    memcmp(value, key, len) == 0)
+				return 0;
+		} else {
+			/* Skip key. */
+			mp_next(field);
+		}
+		/* Skip value. */
+		mp_next(field);
+	}
+	return -1;
+}
+
+/**
  * Add and initialize a new key_part to format.
  * @param format Format to initialize.
  * @param fields Fields definition if any.
@@ -486,86 +566,6 @@ box_tuple_format_unref(box_tuple_format_t *format)
 }
 
 /**
- * Propagate @a field to MessagePack(field)[index].
- * @param[in][out] field Field to propagate.
- * @param index 1-based index to propagate to.
- *
- * @retval  0 Success, the index was found.
- * @retval -1 Not found.
- */
-static inline int
-tuple_field_go_to_index(const char **field, uint64_t index)
-{
-	enum mp_type type = mp_typeof(**field);
-	if (type == MP_ARRAY) {
-		if (index == 0)
-			return -1;
-		/* Make index 0-based. */
-		index -= TUPLE_INDEX_BASE;
-		uint32_t count = mp_decode_array(field);
-		if (index >= count)
-			return -1;
-		for (; index > 0; --index)
-			mp_next(field);
-		return 0;
-	} else if (type == MP_MAP) {
-		uint64_t count = mp_decode_map(field);
-		for (; count > 0; --count) {
-			type = mp_typeof(**field);
-			if (type == MP_UINT) {
-				uint64_t value = mp_decode_uint(field);
-				if (value == index)
-					return 0;
-			} else if (type == MP_INT) {
-				int64_t value = mp_decode_int(field);
-				if (value >= 0 && (uint64_t)value == index)
-					return 0;
-			} else {
-				/* Skip key. */
-				mp_next(field);
-			}
-			/* Skip value. */
-			mp_next(field);
-		}
-	}
-	return -1;
-}
-
-/**
- * Propagate @a field to MessagePack(field)[key].
- * @param[in][out] field Field to propagate.
- * @param key Key to propagate to.
- * @param len Length of @a key.
- *
- * @retval  0 Success, the index was found.
- * @retval -1 Not found.
- */
-static inline int
-tuple_field_go_to_key(const char **field, const char *key, int len)
-{
-	enum mp_type type = mp_typeof(**field);
-	if (type != MP_MAP)
-		return -1;
-	uint64_t count = mp_decode_map(field);
-	for (; count > 0; --count) {
-		type = mp_typeof(**field);
-		if (type == MP_STR) {
-			uint32_t value_len;
-			const char *value = mp_decode_str(field, &value_len);
-			if (value_len == (uint)len &&
-			    memcmp(value, key, len) == 0)
-				return 0;
-		} else {
-			/* Skip key. */
-			mp_next(field);
-		}
-		/* Skip value. */
-		mp_next(field);
-	}
-	return -1;
-}
-
-/**
  * Retrieve field data by JSON path.
  * @param field Pointer to msgpack with data.
  * @param path The path to process.
-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 07/14] box: drop format const qualifier in *init_field_map
  2018-10-11  7:58 [PATCH v4 00/14] box: indexes by JSON path Kirill Shcherbatov
                   ` (10 preceding siblings ...)
  2018-10-11  7:58 ` [PATCH v4 06/14] box: move tuple_field_go_to_{index,key} definition Kirill Shcherbatov
@ 2018-10-11  7:58 ` Kirill Shcherbatov
  2018-10-11  7:58 ` [PATCH v4 08/14] lib: implement JSON tree class for json library Kirill Shcherbatov
  2018-10-11  7:58 ` [PATCH v4 09/14] lib: introduce json_path_normalize routine Kirill Shcherbatov
  13 siblings, 0 replies; 24+ messages in thread
From: Kirill Shcherbatov @ 2018-10-11  7:58 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov

Dropped the const qualifier of the format argument in the
tuple_init_field_map routine, as the JSON tree iterator API
requires non-constant arguments.

Part of #1012.
---
 src/box/tuple_format.c | 2 +-
 src/box/tuple_format.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 3e43cf7..a8aceef 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -452,7 +452,7 @@ tuple_format_dup(struct tuple_format *src)
 
 /** @sa declaration for details. */
 int
-tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
+tuple_init_field_map(struct tuple_format *format, uint32_t *field_map,
 		     const char *tuple)
 {
 	if (format->field_count == 0)
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index a9a9c30..894620f 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -320,7 +320,7 @@ box_tuple_format_unref(box_tuple_format_t *format);
  * tuple + off_i = indexed_field_i;
  */
 int
-tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
+tuple_init_field_map(struct tuple_format *format, uint32_t *field_map,
 		     const char *tuple);
 
 /**
-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 08/14] lib: implement JSON tree class for json library
  2018-10-11  7:58 [PATCH v4 00/14] box: indexes by JSON path Kirill Shcherbatov
                   ` (11 preceding siblings ...)
  2018-10-11  7:58 ` [PATCH v4 07/14] box: drop format const qualifier in *init_field_map Kirill Shcherbatov
@ 2018-10-11  7:58 ` Kirill Shcherbatov
  2018-10-16  8:26   ` Vladimir Davydov
  2018-10-11  7:58 ` [PATCH v4 09/14] lib: introduce json_path_normalize routine Kirill Shcherbatov
  13 siblings, 1 reply; 24+ messages in thread
From: Kirill Shcherbatov @ 2018-10-11  7:58 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov

The new JSON tree class stores the JSON paths of tuple fields
for registered non-plain indexes. It is a hierarchical data
structure that organizes the JSON nodes produced by the parser.
The class provides an API to look up a node by path and to
iterate over the tree.
The JSON indexes patch requires this functionality to look up
tuple_fields by path, to initialize the field map and to build
vinyl_stmt msgpack for a secondary index via JSON tree
iteration.

Need for #1012.
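
A minimal sketch using only the routines visible in this hunk (tree
linkage and lookup go through the rest of the new tree.h API):

	struct json_path_node path_node;
	path_node.type = JSON_PATH_STR;
	path_node.str = "FIO";
	path_node.len = strlen("FIO");
	uint32_t hash = json_path_node_hash(&path_node);
	struct json_tree_node node;
	json_tree_node_create(&node, &path_node, hash);
	assert(json_tree_node_children_count(&node) == 0);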
---
 src/lib/json/CMakeLists.txt |   2 +
 src/lib/json/tree.c         | 300 ++++++++++++++++++++++++++++++++++++++++++++
 src/lib/json/tree.h         | 260 ++++++++++++++++++++++++++++++++++++++
 test/unit/json_path.c       | 204 +++++++++++++++++++++++++++++-
 test/unit/json_path.result  |  48 ++++++-
 5 files changed, 812 insertions(+), 2 deletions(-)
 create mode 100644 src/lib/json/tree.c
 create mode 100644 src/lib/json/tree.h

diff --git a/src/lib/json/CMakeLists.txt b/src/lib/json/CMakeLists.txt
index 203fe6f..9eaba37 100644
--- a/src/lib/json/CMakeLists.txt
+++ b/src/lib/json/CMakeLists.txt
@@ -1,6 +1,8 @@
 set(lib_sources
     path.c
+    tree.c
 )
 
 set_source_files_compile_flags(${lib_sources})
 add_library(json_path STATIC ${lib_sources})
+target_link_libraries(json_path misc)
diff --git a/src/lib/json/tree.c b/src/lib/json/tree.c
new file mode 100644
index 0000000..2f82287
--- /dev/null
+++ b/src/lib/json/tree.c
@@ -0,0 +1,300 @@
+
+/*
+ * Copyright 2010-2018 Tarantool AUTHORS: please see AUTHORS file.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the
+ *    following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <assert.h>
+#include <ctype.h>
+#include <string.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+
+#include "small/rlist.h"
+#include "trivia/util.h"
+#include "tree.h"
+#include "path.h"
+#include "third_party/PMurHash.h"
+
+
+/**
+ * Hash table: json_path_node => json_tree_node.
+ */
+struct mh_json_tree_node {
+	struct json_path_node key;
+	uint32_t key_hash;
+	struct json_tree_node *node;
+};
+
+/**
+ * Compare hash records of two json tree nodes. Return 0 if equal.
+ */
+static inline int
+mh_json_tree_node_cmp(const struct mh_json_tree_node *a,
+		      const struct mh_json_tree_node *b)
+{
+	if (a->key.type != b->key.type)
+		return a->key.type - b->key.type;
+	if (a->key.type == JSON_PATH_STR) {
+		if (a->key.len != b->key.len)
+			return a->key.len - b->key.len;
+		return memcmp(a->key.str, b->key.str, a->key.len);
+	} else if (a->key.type == JSON_PATH_NUM) {
+		/* Compare the numbers, not the hashes: hashes may collide. */
+		return a->key.num != b->key.num;
+	}
+	unreachable();
+}
+
+#define MH_SOURCE 1
+#define mh_name _json_tree_node
+#define mh_key_t struct mh_json_tree_node *
+#define mh_node_t struct mh_json_tree_node
+#define mh_arg_t void *
+#define mh_hash(a, arg) ((a)->key_hash)
+#define mh_hash_key(a, arg) ((a)->key_hash)
+#define mh_cmp(a, b, arg) (mh_json_tree_node_cmp((a), (b)))
+#define mh_cmp_key(a, b, arg) mh_cmp(a, b, arg)
+#include "salad/mhash.h"
+
+static const uint32_t json_path_node_hash_seed = 13U;
+
+uint32_t
+json_path_node_hash(struct json_path_node *key)
+{
+	uint32_t h = json_path_node_hash_seed;
+	uint32_t carry = 0;
+	const void *data;
+	uint32_t data_size;
+	if (key->type == JSON_PATH_STR) {
+		data = key->str;
+		data_size = key->len;
+	} else if (key->type == JSON_PATH_NUM) {
+		data = &key->num;
+		data_size = sizeof(key->num);
+	} else {
+		unreachable();
+	}
+	PMurHash32_Process(&h, &carry, data, data_size);
+	return PMurHash32_Result(h, carry, data_size);
+}
+
+void
+json_tree_node_create(struct json_tree_node *node,
+		      struct json_path_node *path_node, uint32_t path_node_hash)
+{
+	assert(path_node == NULL ||
+	       path_node_hash == json_path_node_hash(path_node));
+	memset(node, 0, sizeof(struct json_tree_node));
+	if (path_node != NULL) {
+		node->key = *path_node;
+		node->key_hash = path_node_hash;
+	} else {
+		node->key.type = JSON_PATH_END;
+	}
+	rlist_create(&node->sibling);
+	rlist_create(&node->children);
+}
+
+uint32_t
+json_tree_node_children_count(struct json_tree_node *node)
+{
+	return node->child_by_path_node != NULL ?
+	       mh_size(node->child_by_path_node) : 0;
+}
+
+void
+json_tree_node_destroy(struct json_tree_node *node)
+{
+	if (node->child_by_path_node != NULL)
+		mh_json_tree_node_delete(node->child_by_path_node);
+}
+
+struct json_tree_node *
+json_tree_lookup_by_path_node(struct json_tree_node *parent,
+			      struct json_path_node *path_node,
+			      uint32_t path_node_hash)
+{
+	if (unlikely(parent->child_by_path_node == NULL))
+		return NULL;
+	struct mh_json_tree_node info;
+	info.key = *path_node;
+	info.key_hash = path_node_hash;
+	mh_int_t id =
+		mh_json_tree_node_find(parent->child_by_path_node, &info, NULL);
+	if (id == mh_end(parent->child_by_path_node))
+		return NULL;
+	struct mh_json_tree_node *ht_node =
+		mh_json_tree_node_node(parent->child_by_path_node, id);
+	assert(ht_node == NULL || ht_node->node != NULL);
+	struct json_tree_node *ret = ht_node != NULL ? ht_node->node : NULL;
+	assert(ret == NULL || ret->parent == parent);
+	return ret;
+}
+
+int
+json_tree_add(struct json_tree_node *parent, struct json_tree_node *node)
+{
+	assert(node->parent == NULL);
+	if (unlikely(parent->child_by_path_node == NULL)) {
+		parent->child_by_path_node = mh_json_tree_node_new();
+		if (parent->child_by_path_node == NULL)
+			return -1;
+	}
+	assert(json_tree_lookup_by_path_node(parent, &node->key,
+					     node->key_hash) == NULL);
+	struct mh_json_tree_node ht_node;
+	ht_node.key = node->key;
+	ht_node.key_hash = node->key_hash;
+	ht_node.node = node;
+	mh_int_t rc = mh_json_tree_node_put(parent->child_by_path_node,
+					    &ht_node, NULL, NULL);
+	if (rc == mh_end(parent->child_by_path_node))
+		return -1;
+	node->parent = parent;
+	if (node->key.type == JSON_PATH_NUM) {
+		/*
+		 * Insert an entry into the list, keeping the
+		 * numerical values in ascending order.
+		 */
+		struct json_tree_node *curr_entry = NULL;
+		struct rlist *prev_rlist_node = &parent->children;
+		rlist_foreach_entry(curr_entry, &parent->children, sibling) {
+			assert(curr_entry->key.type == JSON_PATH_NUM);
+			if (curr_entry->key.num >= node->key.num)
+				break;
+			prev_rlist_node = &curr_entry->sibling;
+		}
+		rlist_add(prev_rlist_node, &node->sibling);
+	} else {
+		rlist_add(&parent->children, &node->sibling);
+	}
+	return 0;
+}
+
+void
+json_tree_remove(struct json_tree_node *parent, struct json_tree_node *node)
+{
+	assert(node->parent == parent);
+	assert(parent->child_by_path_node != NULL);
+	assert(json_tree_lookup_by_path_node(parent, &node->key,
+					     node->key_hash) == node);
+	rlist_del(&node->sibling);
+	struct mh_json_tree_node info;
+	info.key = node->key;
+	info.key_hash = node->key_hash;
+	mh_int_t id =
+		mh_json_tree_node_find(parent->child_by_path_node, &info, NULL);
+	assert(id != mh_end(parent->child_by_path_node));
+	mh_json_tree_node_del(parent->child_by_path_node, id, NULL);
+	assert(json_tree_lookup_by_path_node(parent, &node->key,
+					     node->key_hash) == NULL);
+}
+
+struct json_tree_node *
+json_tree_lookup_by_path(struct json_tree_node *parent, const char *path,
+			 uint32_t path_len)
+{
+	if (unlikely(parent->child_by_path_node == NULL))
+		return NULL;
+	int rc;
+	struct json_path_parser parser;
+	struct json_path_node path_node;
+	struct json_tree_node *ret = parent;
+	json_path_parser_create(&parser, path, path_len);
+	while ((rc = json_path_next(&parser, &path_node)) == 0 &&
+	       path_node.type != JSON_PATH_END && ret != NULL) {
+		uint32_t path_node_hash = json_path_node_hash(&path_node);
+		struct json_tree_node *node =
+			json_tree_lookup_by_path_node(ret, &path_node,
+						      path_node_hash);
+		assert(node == NULL || node->parent == ret);
+		ret = node;
+	}
+	if (rc != 0 || path_node.type != JSON_PATH_END)
+		return NULL;
+	return ret;
+}
+
+static struct json_tree_node *
+json_tree_next_child(struct json_tree_node *parent, struct json_tree_node *pos)
+{
+	struct json_tree_node *next;
+	if (pos == NULL) {
+		next = rlist_first_entry(&parent->children,
+					 struct json_tree_node, sibling);
+	} else {
+		next = rlist_next_entry(pos, sibling);
+	}
+	if (&next->sibling != &parent->children)
+		return next;
+	return NULL;
+}
+
+static struct json_tree_node *
+json_tree_leftmost(struct json_tree_node *pos)
+{
+	struct json_tree_node *last;
+	do {
+		last = pos;
+		pos = json_tree_next_child(pos, NULL);
+	} while (pos != NULL);
+	return last;
+}
+
+struct json_tree_node *
+json_tree_next_pre(struct json_tree_node *parent, struct json_tree_node *pos)
+{
+	if (pos == NULL)
+		return parent;
+	struct json_tree_node *next = json_tree_next_child(pos, NULL);
+	if (next != NULL)
+		return next;
+	while (pos != parent) {
+		next = json_tree_next_child(pos->parent, pos);
+		if (next != NULL)
+			return next;
+		pos = pos->parent;
+	}
+	return NULL;
+}
+
+struct json_tree_node *
+json_tree_next_post(struct json_tree_node *parent, struct json_tree_node *pos)
+{
+	struct json_tree_node *next;
+	if (pos == NULL)
+		return json_tree_leftmost(parent);
+	if (pos == parent)
+		return NULL;
+	next = json_tree_next_child(pos->parent, pos);
+	if (next != NULL)
+		return json_tree_leftmost(next);
+	return pos->parent;
+}
diff --git a/src/lib/json/tree.h b/src/lib/json/tree.h
new file mode 100644
index 0000000..45bf975
--- /dev/null
+++ b/src/lib/json/tree.h
@@ -0,0 +1,260 @@
+#ifndef TARANTOOL_JSON_TREE_H_INCLUDED
+#define TARANTOOL_JSON_TREE_H_INCLUDED
+/*
+ * Copyright 2010-2018 Tarantool AUTHORS: please see AUTHORS file.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the
+ *    following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#include <stdbool.h>
+#include <stdint.h>
+#include "small/rlist.h"
+#include "path.h"
+
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+struct mh_json_tree_node_t;
+
+/**
+ * JSON tree is a hierarchical data structure that organizes JSON
+ * nodes produced by the parser.
+ * 1. Numeric records are sorted in ascending order in the sibling rlist.
+ * 2. String key records point to the source string's memory; keys
+ *    are not copied.
+ * 3. Lookup by path has O(path_len) complexity:
+ * 	path = path_token1 + .. + path_tokenN
+ * 	- hash(path_tokenI) costs O(strlen(path_tokenI))
+ * 	- a hash lookup costs O(1) plus strcmp on collision
+ */
+struct json_tree_node {
+	/** Path component represented by this node. */
+	struct json_path_node key;
+	/** Hash calculated for key field. */
+	uint32_t key_hash;
+	/**
+	 * Children hashtable:
+	 * json_path_node -> json_tree_node.
+	 */
+	struct mh_json_tree_node_t *child_by_path_node;
+	/**
+	 * Link in json_tree_node::children of the parent node.
+	 */
+	struct rlist sibling;
+	/**
+	 * List of child nodes, linked by
+	 * json_tree_node::sibling.
+	 */
+	struct rlist children;
+	/** Pointer to parent json_tree_node record. */
+	struct json_tree_node *parent;
+};
+
+/**
+ * Compute the hash value of a JSON path component.
+ * @param key JSON path component to compute the hash for.
+ * @retval Hash value.
+ */
+uint32_t
+json_path_node_hash(struct json_path_node *key);
+
+/**
+ * Init a JSON tree node.
+ * @param node JSON tree node to initialize.
+ * @param path_node JSON path component the new node will
+ *                  represent. May be NULL; in this case
+ *                  @node is initialized as a root record.
+ * @param path_node_hash Hash of @path_node, should be calculated
+ *                       with json_path_node_hash.
+ */
+void
+json_tree_node_create(struct json_tree_node *node,
+		      struct json_path_node *path_node,
+		      uint32_t path_node_hash);
+
+/**
+ * Destroy a JSON tree node.
+ * @param node JSON tree node to destruct.
+ */
+void
+json_tree_node_destroy(struct json_tree_node *node);
+
+/**
+ * Return the number of direct descendants of a JSON tree node.
+ * Matches the size of the child_by_path_node hashtable.
+ * @param node JSON tree node to analyze.
+ * @retval Number of descendants.
+ */
+uint32_t
+json_tree_node_children_count(struct json_tree_node *node);
+
+/**
+ * Look up a child of a JSON tree node by a path component.
+ * @param parent JSON tree node to start lookup with.
+ * @param path_node JSON parser node with data.
+ * @param path_node_hash Hash of @path_node.
+ * @retval not NULL pointer to JSON tree node if any.
+ * @retval NULL on nothing found.
+ */
+struct json_tree_node *
+json_tree_lookup_by_path_node(struct json_tree_node *parent,
+			      struct json_path_node *path_node,
+			      uint32_t path_node_hash);
+
+/**
+ * Add a new JSON tree node at the given position in the tree.
+ * The parent node must not have a child representing the same
+ * path component.
+ * @param parent JSON tree parent node.
+ * @param node JSON tree node to append.
+ * @retval 0 On success, -1 on memory allocation error.
+ */
+int
+json_tree_add(struct json_tree_node *parent,
+	      struct json_tree_node *node);
+
+/**
+ * Delete a JSON tree node at the given position in the tree.
+ * The parent node must have such a child.
+ * @param parent JSON tree parent node.
+ * @param node JSON tree node to remove.
+ */
+void
+json_tree_remove(struct json_tree_node *parent,
+		 struct json_tree_node *node);
+
+/**
+ * Look up a field by path at the given position in the JSON tree.
+ * @param parent JSON tree parent node.
+ * @param path JSON path string.
+ * @param path_len Length of @path.
+ * @retval not NULL pointer to leaf record if any.
+ */
+struct json_tree_node *
+json_tree_lookup_by_path(struct json_tree_node *parent,
+			 const char *path, uint32_t path_len);
+
+/**
+ * Perform a pre-order traversal of a JSON tree.
+ * @param parent Root of the (sub)tree to walk.
+ * @param pos Iterator pointer, NULL for initialization.
+ * @retval not NULL pointer to next node. NULL if nothing found.
+ */
+struct json_tree_node *
+json_tree_next_pre(struct json_tree_node *parent,
+		   struct json_tree_node *pos);
+
+/**
+ * Perform a post-order traversal of a JSON tree.
+ * @param parent Root of the (sub)tree to walk.
+ * @param pos Iterator pointer, NULL for initialization.
+ * @retval not NULL pointer to next node. NULL if nothing found.
+ */
+struct json_tree_node *
+json_tree_next_post(struct json_tree_node *parent,
+		    struct json_tree_node *pos);
+
+/** Return entry by json_tree_node item. */
+#define json_tree_entry(item, type, member) ({				     \
+	const typeof( ((type *)0)->member ) *__mptr = (item);		     \
+	(type *)( (char *)__mptr - ((size_t) &((type *)0)->member) ); })
+
+/** Return entry by json_tree_node item or NULL if item is NULL.*/
+#define json_tree_entry_safe(item, type, member) ({			     \
+	(item) != NULL ? json_tree_entry((item), type, member) : NULL; })
+
+/** Look up an entry in the tree by a path string. */
+#define json_tree_lookup_entry_by_path(root, path, path_len, type, member)   \
+({struct json_tree_node *__tree_node =					     \
+	json_tree_lookup_by_path(&(root)->member, path, path_len);	     \
+ __tree_node != NULL ? json_tree_entry(__tree_node, type, member) :	     \
+		       NULL; })
+
+/** Look up an entry in the tree by a path node. */
+#define json_tree_lookup_entry_by_path_node(root, path_node, path_node_hash, \
+					    type, member)		     \
+({struct json_tree_node *__tree_node =					     \
+	json_tree_lookup_by_path_node(&(root)->member, path_node,	     \
+				      path_node_hash);			     \
+ __tree_node != NULL ? json_tree_entry(__tree_node, type, member) :	     \
+		       NULL; })
+
+/** Perform a pre-order traversal of a JSON tree. */
+#define json_tree_foreach_pre(item, root)				     \
+for ((item) = json_tree_next_pre((root), NULL); (item) != NULL;		     \
+     (item) = json_tree_next_pre((root), (item)))
+
+/** Perform a post-order traversal of a JSON tree. */
+#define json_tree_foreach_post(item, root)				     \
+for ((item) = json_tree_next_post((root), NULL); (item) != NULL;	     \
+     (item) = json_tree_next_post((root), (item)))
+
+/**
+ * Perform a post-order traversal of a JSON tree that is safe
+ * against removal of the visited node, e.g. in destructors.
+ */
+#define json_tree_foreach_safe(item, root)				     \
+for (struct json_tree_node *__iter = json_tree_next_post((root), NULL);      \
+     (((item) = __iter) && (__iter = json_tree_next_post((root), (item))),   \
+     (item) != NULL);)
+
+/** Perform a pre-order traversal of a JSON tree and return entries. */
+#define json_tree_foreach_entry_pre(item, root, member)			     \
+for (struct json_tree_node *__iter =					     \
+     json_tree_next_pre(&(root)->member, NULL);				     \
+     __iter != NULL && ((item) = json_tree_entry(__iter, typeof(*(root)),    \
+					       member));		     \
+     __iter = json_tree_next_pre(&(root)->member, __iter))
+
+/** Perform a post-order traversal of a JSON tree and return entries. */
+#define json_tree_foreach_entry_post(item, root, member)		     \
+for (struct json_tree_node *__iter =					     \
+     json_tree_next_post(&(root)->member, NULL);			     \
+     __iter != NULL && ((item) = json_tree_entry(__iter, typeof(*(root)),    \
+					       member));		     \
+     __iter = json_tree_next_post(&(root)->member, __iter))
+
+/**
+ * Perform a safe post-order traversal of a JSON tree and return entries.
+ */
+#define json_tree_foreach_entry_safe(item, root, member)		     \
+for (struct json_tree_node *__tmp_iter, *__iter =			     \
+     json_tree_next_post(&(root)->member, NULL);			     \
+     __iter != NULL && ((item) = json_tree_entry(__iter, typeof(*(root)),    \
+						      member)) &&	     \
+     (__iter != NULL && (__tmp_iter =					     \
+			 json_tree_next_post(&(root)->member, __iter))),     \
+     (__iter != NULL && ((item) = json_tree_entry(__iter, typeof(*(root)),   \
+						  member)));		     \
+     __iter = __tmp_iter)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* TARANTOOL_JSON_TREE_H_INCLUDED */
diff --git a/test/unit/json_path.c b/test/unit/json_path.c
index 1d7e7d3..9a1de06 100644
--- a/test/unit/json_path.c
+++ b/test/unit/json_path.c
@@ -1,7 +1,9 @@
 #include "json/path.h"
+#include "json/tree.h"
 #include "unit.h"
 #include "trivia/util.h"
 #include <string.h>
+#include <stdbool.h>
 
 #define reset_to_new_path(value) \
 	path = value; \
@@ -159,14 +161,214 @@ test_errors()
 	footer();
 }
 
+struct test_struct {
+	int value;
+	struct json_tree_node tree_node;
+};
+
+struct test_struct *
+test_add_path(struct test_struct *root, const char *path, uint32_t path_len,
+	      struct test_struct *records_pool, int *pool_idx)
+{
+	int rc;
+	struct json_path_parser parser;
+	struct json_path_node path_node;
+	struct test_struct *parent = root;
+	json_path_parser_create(&parser, path, path_len);
+	while ((rc = json_path_next(&parser, &path_node)) == 0 &&
+		path_node.type != JSON_PATH_END) {
+		uint32_t path_node_hash = json_path_node_hash(&path_node);
+		struct test_struct *new_node =
+			json_tree_lookup_entry_by_path_node(parent, &path_node,
+							    path_node_hash,
+							    struct test_struct,
+							    tree_node);
+		if (new_node == NULL) {
+			new_node = &records_pool[*pool_idx];
+			*pool_idx = *pool_idx + 1;
+			json_tree_node_create(&new_node->tree_node, &path_node,
+					      path_node_hash);
+			rc = json_tree_add(&parent->tree_node,
+					   &new_node->tree_node);
+			fail_if(rc != 0);
+		}
+		parent = new_node;
+	}
+	fail_if(rc != 0 || path_node.type != JSON_PATH_END);
+	return parent;
+}
+
+void
+test_tree()
+{
+	header();
+	plan(42);
+
+	struct test_struct records[6];
+	for (int i = 0; i < 6; i++)
+		records[i].value = i;
+	json_tree_node_create(&records[0].tree_node, NULL, 0);
+	fail_if(&records[0] != json_tree_entry(&records[0].tree_node,
+					       struct test_struct, tree_node));
+
+	const char *path1 = "[1][10]";
+	const char *path2 = "[1][20].file";
+	const char *path3_to_remove = "[2]";
+	const char *path_unregistered = "[1][3]";
+
+	int records_idx = 1;
+	struct test_struct *node;
+	node = test_add_path(&records[0], path1, strlen(path1), records,
+			     &records_idx);
+	is(node, &records[2], "add path '%s'", path1);
+
+	node = test_add_path(&records[0], path2, strlen(path2), records,
+			     &records_idx);
+	is(node, &records[4], "add path '%s'", path2);
+
+	node = json_tree_lookup_entry_by_path(&records[0], path1, strlen(path1),
+					      struct test_struct, tree_node);
+	is(node, &records[2], "lookup path '%s'", path1);
+
+	node = json_tree_lookup_entry_by_path(&records[0], path2, strlen(path2),
+					      struct test_struct, tree_node);
+	is(node, &records[4], "lookup path '%s'", path2);
+
+	node = json_tree_lookup_entry_by_path(&records[0], path_unregistered,
+					      strlen(path_unregistered),
+					      struct test_struct, tree_node);
+	is(node, NULL, "lookup unregistered path '%s'", path_unregistered);
+
+	node = test_add_path(&records[0], path3_to_remove,
+			     strlen(path3_to_remove), records, &records_idx);
+	is(node, &records[5], "add path to remove '%s'", path3_to_remove);
+	if (node != NULL) {
+		json_tree_remove(&records[0].tree_node, &node->tree_node);
+		json_tree_node_destroy(&node->tree_node);
+	} else {
+		isnt(node, NULL, "can't remove empty node!");
+	}
+
+	/* Test iterators. */
+	struct json_tree_node *tree_record = NULL;
+	const struct json_tree_node *tree_nodes_preorder[] =
+		{&records[0].tree_node, &records[1].tree_node,
+		 &records[2].tree_node, &records[3].tree_node,
+		 &records[4].tree_node};
+	int cnt = sizeof(tree_nodes_preorder)/sizeof(tree_nodes_preorder[0]);
+	int idx = 0;
+	json_tree_foreach_pre(tree_record, &records[0].tree_node) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(tree_record, struct test_struct,
+					tree_node);
+		struct test_struct *t2 =
+			json_tree_entry(tree_nodes_preorder[idx],
+					struct test_struct, tree_node);
+		is(tree_record, tree_nodes_preorder[idx],
+		   "test foreach pre order %d: have %d expected %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, 5, "records iterated count %d of %d", idx, cnt);
+
+
+	const struct json_tree_node *tree_nodes_postorder[] =
+		{&records[2].tree_node, &records[4].tree_node, &records[3].tree_node,
+		 &records[1].tree_node,
+		 &records[0].tree_node};
+	cnt = sizeof(tree_nodes_postorder)/sizeof(tree_nodes_postorder[0]);
+	idx = 0;
+	json_tree_foreach_post(tree_record, &records[0].tree_node) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(tree_record, struct test_struct,
+					tree_node);
+		struct test_struct *t2 =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, tree_node);
+		is(tree_record, tree_nodes_postorder[idx],
+		   "test foreach post order %d: have %d expected of %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, 5, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_safe(tree_record, &records[0].tree_node) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(tree_record, struct test_struct,
+					tree_node);
+		struct test_struct *t2 =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, tree_node);
+		is(tree_record, tree_nodes_postorder[idx],
+		   "test foreach safe order %d: have %d expected of %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, 5, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_entry_pre(node, &records[0], tree_node) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t =
+			json_tree_entry(tree_nodes_preorder[idx],
+					struct test_struct, tree_node);
+		is(&node->tree_node, tree_nodes_preorder[idx],
+		   "test foreach entry pre order %d: have %d expected of %d",
+		   idx, node->value, t->value);
+		idx++;
+	}
+	is(idx, 5, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_entry_post(node, &records[0], tree_node) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, tree_node);
+		is(&node->tree_node, tree_nodes_postorder[idx],
+		   "test foreach entry post order %d: have %d expected of %d",
+		   idx, node->value, t->value);
+		idx++;
+	}
+	is(idx, 5, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_entry_safe(node, &records[0], tree_node) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, tree_node);
+		is(&node->tree_node, tree_nodes_postorder[idx],
+		   "test foreach entry safe order %d: have %d expected of %d",
+		   idx, node->value, t->value);
+		json_tree_node_destroy(&node->tree_node);
+		idx++;
+	}
+	is(idx, 5, "records iterated count %d of %d", idx, cnt);
+
+	check_plan();
+	footer();
+}
+
 int
 main()
 {
 	header();
-	plan(2);
+	plan(3);
 
 	test_basic();
 	test_errors();
+	test_tree();
 
 	int rc = check_plan();
 	footer();
diff --git a/test/unit/json_path.result b/test/unit/json_path.result
index a2a2f82..7bc9d37 100644
--- a/test/unit/json_path.result
+++ b/test/unit/json_path.result
@@ -1,5 +1,5 @@
 	*** main ***
-1..2
+1..3
 	*** test_basic ***
     1..71
     ok 1 - parse <[0]>
@@ -99,4 +99,50 @@ ok 1 - subtests
     ok 20 - tab inside identifier
 ok 2 - subtests
 	*** test_errors: done ***
+	*** test_tree ***
+    1..42
+    ok 1 - add path '[1][10]'
+    ok 2 - add path '[1][20].file'
+    ok 3 - lookup path '[1][10]'
+    ok 4 - lookup path '[1][20].file'
+    ok 5 - lookup unregistered path '[1][3]'
+    ok 6 - add path to remove '[2]'
+    ok 7 - test foreach pre order 0: have 0 expected 0
+    ok 8 - test foreach pre order 1: have 1 expected 1
+    ok 9 - test foreach pre order 2: have 2 expected 2
+    ok 10 - test foreach pre order 3: have 3 expected 3
+    ok 11 - test foreach pre order 4: have 4 expected 4
+    ok 12 - records iterated count 5 of 5
+    ok 13 - test foreach post order 0: have 2 expected of 2
+    ok 14 - test foreach post order 1: have 4 expected of 4
+    ok 15 - test foreach post order 2: have 3 expected of 3
+    ok 16 - test foreach post order 3: have 1 expected of 1
+    ok 17 - test foreach post order 4: have 0 expected of 0
+    ok 18 - records iterated count 5 of 5
+    ok 19 - test foreach safe order 0: have 2 expected of 2
+    ok 20 - test foreach safe order 1: have 4 expected of 4
+    ok 21 - test foreach safe order 2: have 3 expected of 3
+    ok 22 - test foreach safe order 3: have 1 expected of 1
+    ok 23 - test foreach safe order 4: have 0 expected of 0
+    ok 24 - records iterated count 5 of 5
+    ok 25 - test foreach entry pre order 0: have 0 expected of 0
+    ok 26 - test foreach entry pre order 1: have 1 expected of 1
+    ok 27 - test foreach entry pre order 2: have 2 expected of 2
+    ok 28 - test foreach entry pre order 3: have 3 expected of 3
+    ok 29 - test foreach entry pre order 4: have 4 expected of 4
+    ok 30 - records iterated count 5 of 5
+    ok 31 - test foreach entry post order 0: have 2 expected of 2
+    ok 32 - test foreach entry post order 1: have 4 expected of 4
+    ok 33 - test foreach entry post order 2: have 3 expected of 3
+    ok 34 - test foreach entry post order 3: have 1 expected of 1
+    ok 35 - test foreach entry post order 4: have 0 expected of 0
+    ok 36 - records iterated count 5 of 5
+    ok 37 - test foreach entry safe order 0: have 2 expected of 2
+    ok 38 - test foreach entry safe order 1: have 4 expected of 4
+    ok 39 - test foreach entry safe order 2: have 3 expected of 3
+    ok 40 - test foreach entry safe order 3: have 1 expected of 1
+    ok 41 - test foreach entry safe order 4: have 0 expected of 0
+    ok 42 - records iterated count 5 of 5
+ok 3 - subtests
+	*** test_tree: done ***
 	*** main: done ***
-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 09/14] lib: introduce json_path_normalize routine
  2018-10-11  7:58 [PATCH v4 00/14] box: indexes by JSON path Kirill Shcherbatov
                   ` (12 preceding siblings ...)
  2018-10-11  7:58 ` [PATCH v4 08/14] lib: implement JSON tree class for json library Kirill Shcherbatov
@ 2018-10-11  7:58 ` Kirill Shcherbatov
  2018-10-16  8:39   ` Vladimir Davydov
  13 siblings, 1 reply; 24+ messages in thread
From: Kirill Shcherbatov @ 2018-10-11  7:58 UTC (permalink / raw)
  To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov

Introduced a new routine json_path_normalize that converts a
JSON path to the 'canonical' form:
  - all map keys are specified in the ["key"] operator form
  - all array indexes are specified in the [i] operator form.
This notation is preferable because in the general case it can
be parsed unambiguously.
We need such an API in the JSON indexes patch to store all paths
in 'canonical' form, to perform the path uniqueness checks and
to speed up access via the JSON path hashtable.

Need for #1012.
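
For illustration, a minimal usage sketch (the demo() wrapper is
hypothetical; the expected output matches the tests below):

    #include <assert.h>
    #include <string.h>
    #include "json/path.h"

    static void
    demo(void)
    {
        const char *path = "FIO[3].fname";
        char buf[64]; /* >= 2.5 * strlen(path) + 1, see path.h */
        int rc = json_path_normalize(path, strlen(path), buf);
        assert(rc == 0);
        assert(strcmp(buf, "[\"FIO\"][3][\"fname\"]") == 0);
    }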
---
 src/lib/json/path.c        | 25 +++++++++++++++++++++++++
 src/lib/json/path.h        | 18 ++++++++++++++++++
 test/unit/json_path.c      | 41 ++++++++++++++++++++++++++++++++++++++++-
 test/unit/json_path.result | 14 +++++++++++++-
 4 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/src/lib/json/path.c b/src/lib/json/path.c
index 2e72930..0eb5d49 100644
--- a/src/lib/json/path.c
+++ b/src/lib/json/path.c
@@ -242,3 +242,28 @@ json_path_next(struct json_path_parser *parser, struct json_path_node *node)
 		return json_parse_identifier(parser, node);
 	}
 }
+
+int
+json_path_normalize(const char *path, uint32_t path_len, char *out)
+{
+	struct json_path_parser parser;
+	struct json_path_node node;
+	json_path_parser_create(&parser, path, path_len);
+	int rc;
+	while ((rc = json_path_next(&parser, &node)) == 0 &&
+		node.type != JSON_PATH_END) {
+		if (node.type == JSON_PATH_NUM) {
+			out += sprintf(out, "[%llu]",
+				      (unsigned long long)node.num);
+		} else if (node.type == JSON_PATH_STR) {
+			out += sprintf(out, "[\"%.*s\"]", node.len, node.str);
+		} else {
+			unreachable();
+		}
+	}
+	if (rc != 0)
+		return rc;
+	*out = '\0';
+	assert(node.type == JSON_PATH_END);
+	return 0;
+}
diff --git a/src/lib/json/path.h b/src/lib/json/path.h
index c3c381a..f6b2ee2 100644
--- a/src/lib/json/path.h
+++ b/src/lib/json/path.h
@@ -105,6 +105,24 @@ json_path_parser_create(struct json_path_parser *parser, const char *src,
 int
 json_path_next(struct json_path_parser *parser, struct json_path_node *node);
 
+/**
+ * Convert path to the 'canonical' form:
+ *  - all map keys are specified in the ["key"] operator form
+ *  - all array indexes are specified in the [i] operator form.
+ * This notation is preferable because in the general case it
+ * can be parsed unambiguously.
+ * @param path Source path string to be converted.
+ * @param path_len The length of the @path.
+ * @param[out] out Memory to store the normalized string.
+ *                 The worst case requires a buffer of
+ *                 2.5 * path_len + 1 bytes.
+ * @retval 0 On success.
+ * @retval > 0 Position of a syntax error. The position is
+ *             1-based, counted from the beginning of the
+ *             source string.
+ */
+int
+json_path_normalize(const char *path, uint32_t path_len, char *out);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/test/unit/json_path.c b/test/unit/json_path.c
index 9a1de06..775b0b1 100644
--- a/test/unit/json_path.c
+++ b/test/unit/json_path.c
@@ -360,15 +360,54 @@ test_tree()
 	footer();
 }
 
+void
+test_normalize_path()
+{
+	header();
+	plan(8);
+
+	const char *path_normalized = "[\"FIO\"][3][\"fname\"]";
+	const char *path1 = "FIO[3].fname";
+	const char *path2 = "[\"FIO\"][3].fname";
+	const char *path3 = "FIO[3][\"fname\"]";
+	char buff[strlen(path_normalized) + 1];
+	int rc;
+
+	rc = json_path_normalize(path_normalized, strlen(path_normalized),
+				 buff);
+	is(rc, 0, "normalize '%s' path status", path_normalized);
+	is(strcmp(buff, path_normalized), 0, "normalize '%s' path compare",
+		  path_normalized);
+
+	rc = json_path_normalize(path1, strlen(path1), buff);
+	is(rc, 0, "normalize '%s' path status", path1);
+	is(strcmp(buff, path_normalized), 0, "normalize '%s' path compare",
+		  path1);
+
+	rc = json_path_normalize(path2, strlen(path2), buff);
+	is(rc, 0, "normalize '%s' path status", path2);
+	is(strcmp(buff, path_normalized), 0, "normalize '%s' path compare",
+		  path2);
+
+	rc = json_path_normalize(path3, strlen(path3), buff);
+	is(rc, 0, "normalize '%s' path status", path3);
+	is(strcmp(buff, path_normalized), 0, "normalize '%s' path compare",
+		  path3);
+
+	check_plan();
+	footer();
+}
+
 int
 main()
 {
 	header();
-	plan(3);
+	plan(4);
 
 	test_basic();
 	test_errors();
 	test_tree();
+	test_normalize_path();
 
 	int rc = check_plan();
 	footer();
diff --git a/test/unit/json_path.result b/test/unit/json_path.result
index 7bc9d37..383c393 100644
--- a/test/unit/json_path.result
+++ b/test/unit/json_path.result
@@ -1,5 +1,5 @@
 	*** main ***
-1..3
+1..4
 	*** test_basic ***
     1..71
     ok 1 - parse <[0]>
@@ -145,4 +145,16 @@ ok 2 - subtests
     ok 42 - records iterated count 5 of 5
 ok 3 - subtests
 	*** test_tree: done ***
+	*** test_normalize_path ***
+    1..8
+    ok 1 - normalize '["FIO"][3]["fname"]' path status
+    ok 2 - normalize '["FIO"][3]["fname"]' path compare
+    ok 3 - normalize 'FIO[3].fname' path status
+    ok 4 - normalize 'FIO[3].fname' path compare
+    ok 5 - normalize '["FIO"][3].fname' path status
+    ok 6 - normalize '["FIO"][3].fname' path compare
+    ok 7 - normalize 'FIO[3]["fname"]' path status
+    ok 8 - normalize 'FIO[3]["fname"]' path compare
+ok 4 - subtests
+	*** test_normalize_path: done ***
 	*** main: done ***
-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 01/14] box: refactor key_def_find routine
  2018-10-11  7:58 ` [PATCH v4 01/14] box: refactor key_def_find routine Kirill Shcherbatov
@ 2018-10-15 17:27   ` Vladimir Davydov
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir Davydov @ 2018-10-15 17:27 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches

On Thu, Oct 11, 2018 at 10:58:41AM +0300, Kirill Shcherbatov wrote:
> Replaced key_def_find with a new key_def_contains_part routine
> that checks if key_def contains specified part. In all cases
> legacy key_def_find has been used for such purpose. New API
> is more convenient for complex key_part that will appear with
> JSON paths introduction.
> 
> Part of #1012.

Nitpicking:

This isn't a part of the json index feature. It's merely a prerequisite,
a tiny step toward it. We use

Needed for #1012

in this case (note, no dot after the issue number).

Please fix here and everywhere else.

> ---
>  src/box/key_def.c | 17 +++++++++--------
>  src/box/key_def.h | 10 ++++------
>  2 files changed, 13 insertions(+), 14 deletions(-)
> 
> diff --git a/src/box/key_def.c b/src/box/key_def.c
> index 545b5da..67b517f 100644
> --- a/src/box/key_def.c
> +++ b/src/box/key_def.c
> @@ -483,16 +483,17 @@ key_def_decode_parts_160(struct key_part_def *parts, uint32_t part_count,
>  	return 0;
>  }
>  
> -const struct key_part *
> -key_def_find(const struct key_def *key_def, uint32_t fieldno)
> +bool
> +key_def_contains_part(const struct key_def *key_def,
> +		      const struct key_part *to_find)

Actually key_def_find is used in SQL code (see 2.0) and since we're
moving to trunk-based development, you'll have to leave it in the next
version. You should probably introduce key_def_contains_part as a static
helper in addition to key_def_find.
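
Something along these lines, perhaps (a sketch of the suggested
layout; the bodies are illustrative and the JSON path comparison
that will later complicate key_def_contains_part is elided):

    /* Public lookup by field number, kept because SQL code uses it. */
    const struct key_part *
    key_def_find(const struct key_def *key_def, uint32_t fieldno)
    {
        const struct key_part *part = key_def->parts;
        const struct key_part *end = part + key_def->part_count;
        for (; part != end; part++) {
            if (part->fieldno == fieldno)
                return part;
        }
        return NULL;
    }

    /* Static helper for the containment checks this patch rewrites. */
    static bool
    key_def_contains_part(const struct key_def *key_def,
                          const struct key_part *to_find)
    {
        return key_def_find(key_def, to_find->fieldno) != NULL;
    }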

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 02/14] box: introduce key_def_parts_are_sequential
  2018-10-11  7:58 ` [PATCH v4 02/14] box: introduce key_def_parts_are_sequential Kirill Shcherbatov
@ 2018-10-15 17:29   ` Vladimir Davydov
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir Davydov @ 2018-10-15 17:29 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches

On Thu, Oct 11, 2018 at 10:58:47AM +0300, Kirill Shcherbatov wrote:
> Introduced a new key_def_parts_are_sequential routine that tests
> whether the specified key_def has sequential fields. This will
> be useful when JSON paths are introduced, since the criterion
> becomes more complex: fields with JSON paths can't be
> 'sequential' in this sense.
> 
> Part of #1012.

Again, should be

Needed for #1012

Other than that, the patch looks OK to me.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 03/14] box: introduce tuple_field_by_relative_path
  2018-10-11  7:58 ` [PATCH v4 03/14] box: introduce tuple_field_by_relative_path Kirill Shcherbatov
@ 2018-10-15 17:46   ` Vladimir Davydov
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir Davydov @ 2018-10-15 17:46 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches

On Thu, Oct 11, 2018 at 10:58:48AM +0300, Kirill Shcherbatov wrote:
> The new tuple_field_by_relative_path routine is used in the
> tuple_field_raw_by_path function to retrieve data from a field
> by JSON path. We will need this routine exported in the future
> to access data by a JSON path specified in a key_part.
> 
> Part of #1012.
> ---
>  src/box/tuple_format.c | 60 ++++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 43 insertions(+), 17 deletions(-)
> 
> diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
> index b385c0d..5679cad 100644
> --- a/src/box/tuple_format.c
> +++ b/src/box/tuple_format.c
> @@ -541,6 +541,42 @@ tuple_field_go_to_key(const char **field, const char *key, int len)
>  	return -1;
>  }
>  
> +/**
> + * Retrieve field data by JSON path.
> + * @param field Pointer to msgpack with data.
> + * @param path The path to process.
> + * @param path_len The length of the @path.
> + * @retval 0 On success.
> + * @retval >0 On path parsing error, invalid character position.
> + */
> +static inline int
> +tuple_field_by_relative_path(const char **field, const char *path,
> +			     uint32_t path_len)

I kinda dislike the function name. See, we have tuple_field_by_path,
which takes a tuple struct, and tuple_field_by_relative_path, which
takes raw msgpack. Please think of a better name so that everything
looks consistent.

Also, this function doesn't (and, judging by the further patches,
won't) look up tuple_field/offset_slot, which is another unsettling
difference from tuple_field_by_path, despite the similar names.
Maybe it should descend the JSON tree as well?

> +{
> +	int rc;
> +	struct json_path_parser parser;
> +	struct json_path_node node;
> +	json_path_parser_create(&parser, path, path_len);

I don't like that you create a parser even when this function is called
from tuple_field_raw_by_path: you could pass the parser created by the
latter. I mean, you'd have two functions: one takes a relative path and
is going to be public; the other takes a parser. The latter would be
called by the former and by tuple_field_raw_by_path.
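
A sketch of that split, reusing the loop from the patch (the inner
function name is hypothetical):

    static int
    tuple_field_by_path_parser(const char **field,
                               struct json_path_parser *parser)
    {
        int rc;
        struct json_path_node node;
        while ((rc = json_path_next(parser, &node)) == 0) {
            switch (node.type) {
            case JSON_PATH_NUM:
                rc = tuple_field_go_to_index(field, node.num);
                break;
            case JSON_PATH_STR:
                rc = tuple_field_go_to_key(field, node.str, node.len);
                break;
            default:
                assert(node.type == JSON_PATH_END);
                return 0;
            }
            if (rc != 0) {
                *field = NULL;
                return 0;
            }
        }
        return rc;
    }

    /* The public wrapper only creates the parser. */
    static inline int
    tuple_field_by_relative_path(const char **field, const char *path,
                                 uint32_t path_len)
    {
        struct json_path_parser parser;
        json_path_parser_create(&parser, path, path_len);
        return tuple_field_by_path_parser(field, &parser);
    }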

> +	while ((rc = json_path_next(&parser, &node)) == 0) {
> +		switch (node.type) {
> +		case JSON_PATH_NUM:
> +			rc = tuple_field_go_to_index(field, node.num);
> +			break;
> +		case JSON_PATH_STR:
> +			rc = tuple_field_go_to_key(field, node.str, node.len);
> +			break;
> +		default:
> +			assert(node.type == JSON_PATH_END);
> +			return 0;
> +		}
> +		if (rc != 0) {
> +			*field = NULL;
> +			return 0;
> +		}
> +	}
> +	return rc;
> +}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [tarantool-patches] [PATCH v4 05/14] box: introduce tuple_format_sizeof routine
  2018-10-11  7:58 ` [tarantool-patches] [PATCH v4 05/14] box: introduce tuple_format_sizeof routine Kirill Shcherbatov
@ 2018-10-15 17:52   ` Vladimir Davydov
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir Davydov @ 2018-10-15 17:52 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches

On Thu, Oct 11, 2018 at 10:58:50AM +0300, Kirill Shcherbatov wrote:
> The new tuple_format_sizeof routine calculates the amount of
> memory needed for a tuple_format allocation.
> 
> Part of #1012.
> ---
>  src/box/tuple_format.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
> index d8acaa5..6a287f2 100644
> --- a/src/box/tuple_format.c
> +++ b/src/box/tuple_format.c
> @@ -42,6 +42,18 @@ static const struct tuple_field tuple_field_default = {
>  };
>  
>  /**
> + * Calculate the size of tuple format allocation.
> + * @param field_count Count of tuple fields.
> + * @retval Size of memory allocation.
> + */
> +static inline uint32_t
> +tuple_format_sizeof(uint32_t field_count)
> +{
> +	return sizeof(struct tuple_format) +
> +	       field_count * sizeof(struct tuple_field);
> +}

I looked at patch 10. Without tuple_format_dup, which I'm planning to
drop, this function wouldn't make much sense IMO.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 04/14] box: introduce tuple_format_add_key_part
  2018-10-11  7:58 ` [PATCH v4 04/14] box: introduce tuple_format_add_key_part Kirill Shcherbatov
@ 2018-10-15 19:39   ` Vladimir Davydov
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir Davydov @ 2018-10-15 19:39 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches

On Thu, Oct 11, 2018 at 10:58:49AM +0300, Kirill Shcherbatov wrote:
> diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
> index 5679cad..d8acaa5 100644
> --- a/src/box/tuple_format.c
> +++ b/src/box/tuple_format.c
> @@ -42,6 +42,71 @@ static const struct tuple_field tuple_field_default = {
>  };
>  
>  /**
> + * Add and initialize a new key_part to format.
> + * @param format Format to initialize.
> + * @param fields Fields definition if any.
> + * @param fields_count Count of @fields.
> + * @param part An index part to append.
> + * @param is_sequential Does this part sequential.
> + * @param current_slot Pointer to last offset slot.
> + * @retval -1 On error.
> + * @retval 0 On success.
> + */

A doxygen comment is not needed here, because this is a static function
needed solely to reduce the indentation level of tuple_format_create.
Moreover, all those parameter descriptions simply repeat the argument
names and are hence useless. At the same time the comment itself is
misleading and
incomplete: what is "initialize a new key_part to format" supposed to
mean? Does this function initialize the given key part? No, it doesn't
touch it. Please elaborate.

> +static int
> +tuple_format_add_key_part(struct tuple_format *format,
> +			  const struct field_def *fields, uint32_t field_count,
> +			  const struct key_part *part, bool is_sequential,
> +			  int *current_slot)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 06/14] box: move tuple_field_go_to_{index,key} definition
  2018-10-11  7:58 ` [PATCH v4 06/14] box: move tuple_field_go_to_{index,key} definition Kirill Shcherbatov
@ 2018-10-16  8:15   ` Vladimir Davydov
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir Davydov @ 2018-10-16  8:15 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches

On Thu, Oct 11, 2018 at 10:58:51AM +0300, Kirill Shcherbatov wrote:
> Moved the tuple_field_go_to_index and tuple_field_go_to_key function
> definitions within the tuple_format.c source to make them visible in
> the tuple_init_field_map routine.

I'm not sure you'll need this since we agreed to build a field map by
parsing msgpack and looking up tree nodes instead of walking over the
json tree and looking up corresponding path components.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 08/14] lib: implement JSON tree class for json library
  2018-10-11  7:58 ` [PATCH v4 08/14] lib: implement JSON tree class for json library Kirill Shcherbatov
@ 2018-10-16  8:26   ` Vladimir Davydov
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir Davydov @ 2018-10-16  8:26 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches

On Thu, Oct 11, 2018 at 10:58:53AM +0300, Kirill Shcherbatov wrote:
> +/**
> + * JSON tree is a hierarchical data structure that organizes JSON
> + * nodes produced by the parser.
> + * 1. Numeric records are sorted in ascending order in the sibling rlist.
> + * 2. String key records point to the source string's memory; keys
> + *    are not copied.
> + * 3. Lookup by path has O(path_len) complexity:
> + * 	path = path_token1 + .. + path_tokenN
> + * 	- hash(path_tokenI) costs O(strlen(path_tokenI))
> + * 	- a hash lookup costs O(1) plus strcmp on collision

I doubt that an unaware reader would figure out how the tree organizes
json path tokens after reading this comment...
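
For example, for the paths "[1][10]" and "[1][20].file" used in the
tests, the comment could show the resulting structure:

    root
    `-- [1]                 JSON_PATH_NUM
        |-- [10]            leaf for "[1][10]"
        `-- [20]            JSON_PATH_NUM
            `-- "file"      JSON_PATH_STR, leaf for "[1][20].file"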

> + */
> +struct json_tree_node {
> +	/** Path component represented by this node. */
> +	struct json_path_node key;
> +	/** Hash calculated for key field. */
> +	uint32_t key_hash;
> +	/**
> +	 * Children hashtable:
> +	 * json_path_node -> json_tree_node.
> +	 */
> +	struct mh_json_tree_node_t *child_by_path_node;
> +	/**
> +	 * Link in json_tree_node::children of the parent node.
> +	 */
> +	struct rlist sibling;
> +	/**
> +	 * List of child nodes, linked by
> +	 * json_tree_node::sibling.
> +	 */
> +	struct rlist children;
> +	/** Pointer to parent json_tree_node record. */
> +	struct json_tree_node *parent;
> +};

For the record. After a discussion with Kostja we agreed that the json
tree class needs to be reworked so that there's only one hash map per
tree, containing all descendants. To make it possible to look up a
descendant of a non-root node, rolling hashes should be used:

  hash(child node) = hash(parent node) xor hash(path token)

Also, child nodes should be stored in an array rather than in a list,
because that would allow fast access of json array elements.
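
A hypothetical sketch of the rolling hash part (the helper name is
illustrative):

    static uint32_t
    json_tree_child_hash(const struct json_tree_node *parent,
                         struct json_path_node *token)
    {
        /* hash(child node) = hash(parent node) xor hash(path token). */
        return parent->key_hash ^ json_path_node_hash(token);
    }

This keeps a single hash map keyed by the rolling hash while still
allowing a lookup to start at any node, not just the root.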

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 09/14] lib: introduce json_path_normalize routine
  2018-10-11  7:58 ` [PATCH v4 09/14] lib: introduce json_path_normalize routine Kirill Shcherbatov
@ 2018-10-16  8:39   ` Vladimir Davydov
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir Davydov @ 2018-10-16  8:39 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches

On Thu, Oct 11, 2018 at 10:58:54AM +0300, Kirill Shcherbatov wrote:
> Introduced a new routine json_path_normalize that converts a
> JSON path to the 'canonical' form:
>   - all map keys are specified in the ["key"] operator form
>   - all array indexes are specified in the [i] operator form.
> This notation is preferable because in the general case it can
> be parsed unambiguously.
> We need such an API in the JSON indexes patch to store all paths
> in 'canonical' form, to perform the path uniqueness checks and
> to speed up access via the JSON path hashtable.
> 
> Need for #1012.

Nit:

Needed for #1012

> ---
>  src/lib/json/path.c        | 25 +++++++++++++++++++++++++
>  src/lib/json/path.h        | 18 ++++++++++++++++++
>  test/unit/json_path.c      | 41 ++++++++++++++++++++++++++++++++++++++++-
>  test/unit/json_path.result | 14 +++++++++++++-
>  4 files changed, 96 insertions(+), 2 deletions(-)
> 
> diff --git a/src/lib/json/path.c b/src/lib/json/path.c
> index 2e72930..0eb5d49 100644
> --- a/src/lib/json/path.c
> +++ b/src/lib/json/path.c
> @@ -242,3 +242,28 @@ json_path_next(struct json_path_parser *parser, struct json_path_node *node)
>  		return json_parse_identifier(parser, node);
>  	}
>  }
> +
> +int
> +json_path_normalize(const char *path, uint32_t path_len, char *out)
> +{
> +	struct json_path_parser parser;
> +	struct json_path_node node;
> +	json_path_parser_create(&parser, path, path_len);
> +	int rc;
> +	while ((rc = json_path_next(&parser, &node)) == 0 &&
> +		node.type != JSON_PATH_END) {
> +		if (node.type == JSON_PATH_NUM) {
> +			out += sprintf(out, "[%llu]",
> +				      (unsigned long long)node.num);
> +		} else if (node.type == JSON_PATH_STR) {
> +			out += sprintf(out, "[\"%.*s\"]", node.len, node.str);

Using sprintf here is unsafe, because it can write beyond the buffer
boundaries. The API should be reworked so that the output buffer length
is passed explicitly and the function returns the number of printed
characters, in an snprintf-like manner. You might want to use the
SNPRINT helper macro.
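
A rough sketch of the snprintf-style shape (the helper name is
hypothetical; SNPRINT would wrap the length bookkeeping):

    #include <assert.h>
    #include <stdio.h>

    /* Print one normalized component; never writes past buf + size. */
    static int
    json_path_snprint_node(char *buf, size_t size,
                           const struct json_path_node *node)
    {
        if (node->type == JSON_PATH_NUM)
            return snprintf(buf, size, "[%llu]",
                            (unsigned long long)node->num);
        assert(node->type == JSON_PATH_STR);
        return snprintf(buf, size, "[\"%.*s\"]", (int)node->len,
                        node->str);
    }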

BTW, it may be worth renaming json_path_node to json_path_token and
json_path_parser to json_path_tokenizer so as not to confuse path tokens
with tree nodes.

> +		} else {
> +			unreachable();
> +		}
> +	};
> +	if (rc != 0)
> +		return rc;
> +	*out = '\0';
> +	assert(node.type == JSON_PATH_END);
> +	return 0;
> +}
> diff --git a/src/lib/json/path.h b/src/lib/json/path.h
> index c3c381a..f6b2ee2 100644
> --- a/src/lib/json/path.h
> +++ b/src/lib/json/path.h
> @@ -105,6 +105,24 @@ json_path_parser_create(struct json_path_parser *parser, const char *src,
>  int
>  json_path_next(struct json_path_parser *parser, struct json_path_node *node);
>  
> +/**
> + * Convert path to the 'canonical' form:
> + *  - all map keys are specified in the ["key"] operator form
> + *  - all array indexes are specified in the [i] operator form.
> + * This notation is preferable because in the general case it
> + * can be parsed unambiguously.
> + * @param path Source path string to be converted.
> + * @param path_len The length of the @path.
> + * @param[out] out Memory to store the normalized string.
> + *                 The worst case requires a buffer of
> + *                 2.5 * path_len + 1 bytes.
> + * @retval 0 On success.
> + * @retval > 0 Position of a syntax error. The position is
> + *             1-based, counted from the beginning of the
> + *             source string.
> + */
> +int
> +json_path_normalize(const char *path, uint32_t path_len, char *out);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/test/unit/json_path.c b/test/unit/json_path.c
> index 9a1de06..775b0b1 100644
> --- a/test/unit/json_path.c
> +++ b/test/unit/json_path.c
> @@ -360,15 +360,54 @@ test_tree()
>  	footer();
>  }
>  
> +void
> +test_normalize_path()
> +{
> +	header();
> +	plan(8);
> +
> +	const char *path_normalized = "[\"FIO\"][3][\"fname\"]";
> +	const char *path1 = "FIO[3].fname";
> +	const char *path2 = "[\"FIO\"][3].fname";
> +	const char *path3 = "FIO[3][\"fname\"]";
> +	char buff[strlen(path_normalized) + 1];
> +	int rc;
> +
> +	rc = json_path_normalize(path_normalized, strlen(path_normalized),
> +				 buff);
> +	is(rc, 0, "normalize '%s' path status", path_normalized);
> +	is(strcmp(buff, path_normalized), 0, "normalize '%s' path compare",
> +		  path_normalized);
> +
> +	rc = json_path_normalize(path1, strlen(path1), buff);
> +	is(rc, 0, "normalize '%s' path status", path1);
> +	is(strcmp(buff, path_normalized), 0, "normalize '%s' path compare",
> +		  path1);
> +
> +	rc = json_path_normalize(path2, strlen(path2), buff);
> +	is(rc, 0, "normalize '%s' path status", path2);
> +	is(strcmp(buff, path_normalized), 0, "normalize '%s' path compare",
> +		  path2);
> +
> +	rc = json_path_normalize(path3, strlen(path3), buff);
> +	is(rc, 0, "normalize '%s' path status", path3);
> +	is(strcmp(buff, path_normalized), 0, "normalize '%s' path compare",
> +		  path3);

I guess you should test invalid paths as well, so as to cover the
error handling.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 10/14] box: introduce JSON indexes
  2018-10-11  7:58 ` [PATCH v4 10/14] box: introduce JSON indexes Kirill Shcherbatov
@ 2018-10-16  9:33   ` Vladimir Davydov
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir Davydov @ 2018-10-16  9:33 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches

On Thu, Oct 11, 2018 at 10:58:42AM +0300, Kirill Shcherbatov wrote:
> @@ -477,34 +833,39 @@ tuple_init_field_map(struct tuple_format *format, uint32_t *field_map,
>  	}
>  
>  	/* first field is simply accessible, so we do not store offset to it */
> -	enum mp_type mp_type = mp_typeof(*pos);
> -	const struct tuple_field *field = &format->fields[0];
> -	if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
> -				 TUPLE_INDEX_BASE, field->is_nullable))
> -		return -1;
> -	mp_next(&pos);
> -	/* other fields...*/
> -	++field;
> -	uint32_t i = 1;
> -	uint32_t defined_field_count = MIN(field_count, format->field_count);
> -	if (field_count < format->index_field_count) {
> +	struct tuple_field *field = &format->fields[0];
> +	uint32_t i = 0;
> +	if (field_count < format->index_field_count ||
> +	    json_tree_node_children_count(&field->path_tree_node) == 0) {
>  		/*
> -		 * Nullify field map to be able to detect by 0,
> -		 * which key fields are absent in tuple_field().
> -		 */
> +		* Nullify field map to be able to detect by 0,
> +		* which key fields are absent in tuple_field().
> +		*/
>  		memset((char *)field_map - format->field_map_size, 0,
> -		       format->field_map_size);
> +		format->field_map_size);
>  	}
> -	for (; i < defined_field_count; ++i, ++field) {
> -		mp_type = mp_typeof(*pos);
> +	if (json_tree_node_children_count(&field->path_tree_node) == 0) {
> +		enum mp_type mp_type = mp_typeof(*pos);
>  		if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
> -					 i + TUPLE_INDEX_BASE,
> -					 field->is_nullable))
> +					TUPLE_INDEX_BASE, field->is_nullable))
> +			return -1;
> +		mp_next(&pos);
> +		++field;
> +		++i;
> +	}
> +	size_t off_stack_size =
> +		format->max_path_tree_depth * sizeof(const char *);
> +	const char **off_stack = region_alloc(&fiber()->gc, off_stack_size);
> +	if (off_stack == NULL) {
> +		diag_set(OutOfMemory, off_stack_size, "region_alloc",
> +			"off_stack");
> +		return -1;
> +	}
> +	uint32_t defined_field_count = MIN(field_count, format->field_count);
> +	for (; i < defined_field_count; ++i, ++field) {
> +		if (tuple_field_tree_parse_raw(field, pos, tuple, i, field_map,
> +					       off_stack) != 0)
>  			return -1;
> -		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
> -			field_map[field->offset_slot] =
> -				(uint32_t) (pos - tuple);
> -		}
>  		mp_next(&pos);
>  	}
>  	return 0;

> @@ -108,6 +109,10 @@ struct tuple_field {
>  	bool is_key_part;
>  	/** True, if a field can store NULL. */
>  	bool is_nullable;
> +	/** JSON path tree max depth. */
> +	uint32_t path_tree_depth;
> +	/** JSON root path tree node for registered indexes. */
> +	struct json_tree_node path_tree_node;
>  };

For the record. After a discussion with Kostja, we agreed that a json
tree should be rooted at struct tuple_format rather than at tuple_field,
so that top-level fields are accessed in the same fashion as nested
fields.

Also, a tuple field map should be initialized as the msgpack gets parsed
rather than by walking over the json tree and looking up fields in the
msgpack, because that would be more efficient.
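
A very rough sketch of that parse-driven approach (the helper name and
the index-base conversion are hypothetical, MP_MAP handling is elided;
struct tuple_field::path_tree_node and the offset slots are from the
quoted patch, msgpack decoding is from msgpuck.h):

    static void
    field_map_init_walk(struct json_tree_node *parent, const char **pos,
                        const char *tuple, uint32_t *field_map)
    {
        if (mp_typeof(**pos) != MP_ARRAY) {
            mp_next(pos);
            return;
        }
        uint32_t count = mp_decode_array(pos);
        for (uint32_t i = 0; i < count; i++) {
            struct json_path_node token;
            token.type = JSON_PATH_NUM;
            token.num = i; /* index-base conversion elided */
            struct json_tree_node *child =
                json_tree_lookup_by_path_node(parent, &token,
                                              json_path_node_hash(&token));
            if (child == NULL) {
                mp_next(pos); /* not indexed, skip */
                continue;
            }
            struct tuple_field *field =
                json_tree_entry(child, struct tuple_field,
                                path_tree_node);
            if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL)
                field_map[field->offset_slot] =
                    (uint32_t)(*pos - tuple);
            field_map_init_walk(child, pos, tuple, field_map);
        }
    }

Each msgpack element is decoded exactly once; the tree is only
consulted for elements that can possibly carry an offset slot.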

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-10-16  9:33 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-11  7:58 [PATCH v4 00/14] box: indexes by JSON path Kirill Shcherbatov
2018-10-11  7:58 ` [PATCH v4 01/14] box: refactor key_def_find routine Kirill Shcherbatov
2018-10-15 17:27   ` Vladimir Davydov
2018-10-11  7:58 ` [PATCH v4 10/14] box: introduce JSON indexes Kirill Shcherbatov
2018-10-16  9:33   ` Vladimir Davydov
2018-10-11  7:58 ` [PATCH v4 11/14] box: introduce has_json_paths flag in templates Kirill Shcherbatov
2018-10-11  7:58 ` [PATCH v4 12/14] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov
2018-10-11  7:58 ` [PATCH v4 13/14] box: introduce offset slot cache in key_part Kirill Shcherbatov
2018-10-11  7:58 ` [PATCH v4 14/14] box: specify indexes in user-friendly form Kirill Shcherbatov
2018-10-11  7:58 ` [PATCH v4 02/14] box: introduce key_def_parts_are_sequential Kirill Shcherbatov
2018-10-15 17:29   ` Vladimir Davydov
2018-10-11  7:58 ` [PATCH v4 03/14] box: introduce tuple_field_by_relative_path Kirill Shcherbatov
2018-10-15 17:46   ` Vladimir Davydov
2018-10-11  7:58 ` [PATCH v4 04/14] box: introduce tuple_format_add_key_part Kirill Shcherbatov
2018-10-15 19:39   ` Vladimir Davydov
2018-10-11  7:58 ` [tarantool-patches] [PATCH v4 05/14] box: introduce tuple_format_sizeof routine Kirill Shcherbatov
2018-10-15 17:52   ` Vladimir Davydov
2018-10-11  7:58 ` [PATCH v4 06/14] box: move tuple_field_go_to_{index,key} definition Kirill Shcherbatov
2018-10-16  8:15   ` Vladimir Davydov
2018-10-11  7:58 ` [PATCH v4 07/14] box: drop format const qualifier in *init_field_map Kirill Shcherbatov
2018-10-11  7:58 ` [PATCH v4 08/14] lib: implement JSON tree class for json library Kirill Shcherbatov
2018-10-16  8:26   ` Vladimir Davydov
2018-10-11  7:58 ` [PATCH v4 09/14] lib: introduce json_path_normalize routine Kirill Shcherbatov
2018-10-16  8:39   ` Vladimir Davydov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox