Tarantool development patches archive
* [PATCH v5 0/9] box: indexes by JSON path
@ 2018-11-26 10:49 Kirill Shcherbatov
  2018-11-26 10:49 ` [PATCH v5 1/9] box: refactor json_path_parser class Kirill Shcherbatov
                   ` (8 more replies)
  0 siblings, 9 replies; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-11-26 10:49 UTC (permalink / raw)
  To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov

http://github.com/tarantool/tarantool/tree/kshch/gh-1012-json-indexes
https://github.com/tarantool/tarantool/issues/1012

Sometimes field data has a complex document structure. When this
structure is consistent across the whole space, an index can be
created by a JSON path.
A new JSON tree class is used to manage the tuple_field(s) defined
for a format. This allows working with fields in a unified way.
To speed up data access by a JSON index, the key_part structure is
extended with an offset_slot cache that points to the field_map
item containing the data offset for the current tuple. The field
map is initialized by traversing the tree to detect vertices that
are missing in the msgpack.
The offset_slot_cache in key_part tunes data access for the
typical scenario of using tuples that have the same format.
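
The offset_slot idea can be sketched as follows. Everything below
is a simplified, hypothetical stand-in for illustration only; the
real key_part and field map live in src/box/key_def.h and
src/box/tuple_format.c and are keyed by the format pointer rather
than an epoch counter:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real Tarantool structures. */
struct key_part {
	uint32_t fieldno;
	int32_t offset_slot;	/* cached index into the tuple's field map */
	uint64_t format_epoch;	/* format identity the cache was resolved for */
};

struct tuple {
	uint64_t format_epoch;
	uint32_t field_map[8];	/* field_map[slot] = data offset of a field */
	const char *data;
};

static int resolve_calls = 0;

/* Stand-in for the slow path that walks the format's JSON tree. */
static int32_t
resolve_slot_example(uint32_t fieldno)
{
	++resolve_calls;
	return (int32_t)fieldno;
}

/*
 * Return field data for a key part: resolve the offset slot once
 * per format, then reuse it for every tuple sharing that format.
 */
static const char *
part_field(struct key_part *part, struct tuple *t,
	   int32_t (*resolve_slot)(uint32_t fieldno))
{
	if (part->format_epoch != t->format_epoch) {
		part->offset_slot = resolve_slot(part->fieldno);
		part->format_epoch = t->format_epoch;
	}
	return t->data + t->field_map[part->offset_slot];
}
```

With such a cache, comparing many tuples of one format pays the
tree-lookup cost only once per key part.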

Kirill Shcherbatov (9):
  box: refactor json_path_parser class
  lib: implement JSON tree class for json library
  box: manage format fields with JSON tree class
  lib: introduce json_path_cmp routine
  box: introduce JSON indexes
  box: introduce has_json_paths flag in templates
  box: tune tuple_field_raw_by_path for indexed data
  box: introduce offset slot cache in key_part
  box: specify indexes in user-friendly form

 src/box/alter.cc                |   7 +-
 src/box/blackhole.c             |   5 +-
 src/box/engine.h                |  11 +-
 src/box/errcode.h               |   2 +-
 src/box/index_def.c             |   8 +-
 src/box/key_def.c               | 171 +++++++++--
 src/box/key_def.h               |  31 +-
 src/box/lua/index.c             |  66 +++++
 src/box/lua/schema.lua          |  22 +-
 src/box/lua/space.cc            |   5 +
 src/box/lua/tuple.c             |   2 +-
 src/box/memtx_engine.c          |   7 +-
 src/box/memtx_space.c           |   5 +-
 src/box/memtx_space.h           |   2 +-
 src/box/schema.cc               |   4 +-
 src/box/space.c                 |   6 +-
 src/box/space.h                 |   8 +-
 src/box/sql.c                   |  17 +-
 src/box/sql/build.c             |   6 +-
 src/box/sql/select.c            |   6 +-
 src/box/sql/where.c             |   1 +
 src/box/sysview.c               |   3 +-
 src/box/tuple.c                 |  40 +--
 src/box/tuple_compare.cc        | 125 +++++---
 src/box/tuple_extract_key.cc    | 121 +++++---
 src/box/tuple_format.c          | 633 ++++++++++++++++++++++++++++++++--------
 src/box/tuple_format.h          |  80 ++++-
 src/box/tuple_hash.cc           |  47 ++-
 src/box/vinyl.c                 |  10 +-
 src/box/vy_log.c                |   3 +-
 src/box/vy_lsm.c                |   5 +-
 src/box/vy_point_lookup.c       |   2 -
 src/box/vy_stmt.c               | 168 +++++++++--
 src/lib/json/CMakeLists.txt     |   3 +-
 src/lib/json/json.c             | 537 ++++++++++++++++++++++++++++++++++
 src/lib/json/json.h             | 296 +++++++++++++++++++
 src/lib/json/path.c             | 244 ----------------
 src/lib/json/path.h             | 112 -------
 test/box/misc.result            |   1 +
 test/engine/tuple.result        | 462 +++++++++++++++++++++++++++++
 test/engine/tuple.test.lua      | 135 +++++++++
 test/unit/json_path.c           | 281 ++++++++++++++++--
 test/unit/json_path.result      |  51 +++-
 test/unit/vy_iterators_helper.c |   6 +-
 test/unit/vy_mem.c              |   2 +-
 test/unit/vy_point_lookup.c     |   2 +-
 46 files changed, 3006 insertions(+), 755 deletions(-)
 create mode 100644 src/lib/json/json.c
 create mode 100644 src/lib/json/json.h
 delete mode 100644 src/lib/json/path.c
 delete mode 100644 src/lib/json/path.h

-- 
2.7.4


* [PATCH v5 1/9] box: refactor json_path_parser class
  2018-11-26 10:49 [PATCH v5 0/9] box: indexes by JSON path Kirill Shcherbatov
@ 2018-11-26 10:49 ` Kirill Shcherbatov
  2018-11-26 12:53   ` [tarantool-patches] " Kirill Shcherbatov
  2018-11-26 10:49 ` [PATCH v5 2/9] lib: implement JSON tree class for json library Kirill Shcherbatov
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-11-26 10:49 UTC (permalink / raw)
  To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov

Renamed the json_path_node object to json_token and the
json_path_parser class to json_lexer.

Need for #1012
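
For illustration, here is a heavily simplified, ASCII-only sketch
of the renamed API shape. It is not the patched lexer: the real
one (below) uses ICU's U8_NEXT to read multi-byte symbols, tracks
a symbol_count, and reports exact 1-based error positions:

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

enum json_token_type { JSON_TOKEN_NUM, JSON_TOKEN_STR, JSON_TOKEN_END };

struct json_token {
	enum json_token_type type;
	const char *str;	/* JSON_TOKEN_STR: identifier start */
	int len;		/* JSON_TOKEN_STR: identifier length */
	uint64_t num;		/* JSON_TOKEN_NUM: array index */
};

struct json_lexer {
	const char *src;
	int src_len;
	int offset;
};

static void
json_lexer_create(struct json_lexer *lexer, const char *src, int src_len)
{
	lexer->src = src;
	lexer->src_len = src_len;
	lexer->offset = 0;
}

/* Return 0 and fill *token on success, > 0 on a syntax error. */
static int
json_lexer_next_token(struct json_lexer *lexer, struct json_token *token)
{
	if (lexer->offset == lexer->src_len) {
		token->type = JSON_TOKEN_END;
		return 0;
	}
	char c = lexer->src[lexer->offset];
	if (c == '[') {
		/* [123] - a numeric array index. */
		++lexer->offset;
		uint64_t value = 0;
		while (lexer->offset < lexer->src_len &&
		       isdigit((unsigned char)lexer->src[lexer->offset]))
			value = value * 10 + lexer->src[lexer->offset++] - '0';
		if (lexer->offset == lexer->src_len ||
		    lexer->src[lexer->offset] != ']')
			return lexer->offset + 1;
		++lexer->offset;
		token->type = JSON_TOKEN_NUM;
		token->num = value;
		return 0;
	}
	if (c == '.')
		/* .field - skip the dot, then read an identifier. */
		++lexer->offset;
	token->type = JSON_TOKEN_STR;
	token->str = lexer->src + lexer->offset;
	while (lexer->offset < lexer->src_len &&
	       (isalnum((unsigned char)lexer->src[lexer->offset]) ||
		lexer->src[lexer->offset] == '_'))
		++lexer->offset;
	token->len = (int)(lexer->src + lexer->offset - token->str);
	return 0;
}
```

A caller drives it as a plain token loop: create the lexer over a
path string, then call json_lexer_next_token() until it yields
JSON_TOKEN_END or a positive error position.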
---
 src/box/tuple_format.c |  51 +++++++++--------
 src/lib/json/path.c    | 151 ++++++++++++++++++++++++-------------------------
 src/lib/json/path.h    |  49 ++++++++--------
 test/unit/json_path.c  |  54 +++++++++---------
 4 files changed, 152 insertions(+), 153 deletions(-)

diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 5a2481f..cf05cc8 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -580,19 +580,19 @@ static int
 tuple_field_go_to_path(const char **data, const char *path, uint32_t path_len)
 {
 	int rc;
-	struct json_path_parser parser;
-	struct json_path_node node;
-	json_path_parser_create(&parser, path, path_len);
-	while ((rc = json_path_next(&parser, &node)) == 0) {
-		switch (node.type) {
-		case JSON_PATH_NUM:
-			rc = tuple_field_go_to_index(data, node.num);
+	struct json_lexer lexer;
+	struct json_token token;
+	json_lexer_create(&lexer, path, path_len);
+	while ((rc = json_lexer_next_token(&lexer, &token)) == 0) {
+		switch (token.type) {
+		case JSON_TOKEN_NUM:
+			rc = tuple_field_go_to_index(data, token.num);
 			break;
-		case JSON_PATH_STR:
-			rc = tuple_field_go_to_key(data, node.str, node.len);
+		case JSON_TOKEN_STR:
+			rc = tuple_field_go_to_key(data, token.str, token.len);
 			break;
 		default:
-			assert(node.type == JSON_PATH_END);
+			assert(token.type == JSON_TOKEN_END);
 			return 0;
 		}
 		if (rc != 0) {
@@ -622,15 +622,15 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 		*field = tuple_field_raw(format, tuple, field_map, fieldno);
 		return 0;
 	}
-	struct json_path_parser parser;
-	struct json_path_node node;
-	json_path_parser_create(&parser, path, path_len);
-	int rc = json_path_next(&parser, &node);
+	struct json_lexer lexer;
+	struct json_token token;
+	json_lexer_create(&lexer, path, path_len);
+	int rc = json_lexer_next_token(&lexer, &token);
 	if (rc != 0)
 		goto error;
-	switch(node.type) {
-	case JSON_PATH_NUM: {
-		int index = node.num;
+	switch(token.type) {
+	case JSON_TOKEN_NUM: {
+		int index = token.num;
 		if (index == 0) {
 			*field = NULL;
 			return 0;
@@ -641,10 +641,10 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 			return 0;
 		break;
 	}
-	case JSON_PATH_STR: {
+	case JSON_TOKEN_STR: {
 		/* First part of a path is a field name. */
 		uint32_t name_hash;
-		if (path_len == (uint32_t) node.len) {
+		if (path_len == (uint32_t) token.len) {
 			name_hash = path_hash;
 		} else {
 			/*
@@ -653,25 +653,26 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 			 * used. A tuple dictionary hashes only
 			 * name, not path.
 			 */
-			name_hash = field_name_hash(node.str, node.len);
+			name_hash = field_name_hash(token.str, token.len);
 		}
 		*field = tuple_field_raw_by_name(format, tuple, field_map,
-						 node.str, node.len, name_hash);
+						 token.str, token.len,
+						 name_hash);
 		if (*field == NULL)
 			return 0;
 		break;
 	}
 	default:
-		assert(node.type == JSON_PATH_END);
+		assert(token.type == JSON_TOKEN_END);
 		*field = NULL;
 		return 0;
 	}
-	rc = tuple_field_go_to_path(field, path + parser.offset,
-				    path_len - parser.offset);
+	rc = tuple_field_go_to_path(field, path + lexer.offset,
+				    path_len - lexer.offset);
 	if (rc == 0)
 		return 0;
 	/* Setup absolute error position. */
-	rc += parser.offset;
+	rc += lexer.offset;
 
 error:
 	assert(rc > 0);
diff --git a/src/lib/json/path.c b/src/lib/json/path.c
index 2e72930..dfd7d5c 100644
--- a/src/lib/json/path.c
+++ b/src/lib/json/path.c
@@ -38,82 +38,82 @@
 
 /**
  * Read a single symbol from a string starting from an offset.
- * @param parser JSON path parser.
+ * @param lexer JSON path lexer.
  * @param[out] UChar32 Read symbol.
  *
  * @retval   0 Success.
  * @retval > 0 1-based position of a syntax error.
  */
 static inline int
-json_read_symbol(struct json_path_parser *parser, UChar32 *out)
+json_read_symbol(struct json_lexer *lexer, UChar32 *out)
 {
-	if (parser->offset == parser->src_len) {
+	if (lexer->offset == lexer->src_len) {
 		*out = U_SENTINEL;
-		return parser->symbol_count + 1;
+		return lexer->symbol_count + 1;
 	}
-	U8_NEXT(parser->src, parser->offset, parser->src_len, *out);
+	U8_NEXT(lexer->src, lexer->offset, lexer->src_len, *out);
 	if (*out == U_SENTINEL)
-		return parser->symbol_count + 1;
-	++parser->symbol_count;
+		return lexer->symbol_count + 1;
+	++lexer->symbol_count;
 	return 0;
 }
 
 /**
  * Rollback one symbol offset.
- * @param parser JSON path parser.
+ * @param lexer JSON path lexer.
  * @param offset Offset to the previous symbol.
  */
 static inline void
-json_revert_symbol(struct json_path_parser *parser, int offset)
+json_revert_symbol(struct json_lexer *lexer, int offset)
 {
-	parser->offset = offset;
-	--parser->symbol_count;
+	lexer->offset = offset;
+	--lexer->symbol_count;
 }
 
 /** Fast forward when it is known that a symbol is 1-byte char. */
 static inline void
-json_skip_char(struct json_path_parser *parser)
+json_skip_char(struct json_lexer *lexer)
 {
-	++parser->offset;
-	++parser->symbol_count;
+	++lexer->offset;
+	++lexer->symbol_count;
 }
 
 /** Get a current symbol as a 1-byte char. */
 static inline char
-json_current_char(const struct json_path_parser *parser)
+json_current_char(const struct json_lexer *lexer)
 {
-	return *(parser->src + parser->offset);
+	return *(lexer->src + lexer->offset);
 }
 
 /**
- * Parse string identifier in quotes. Parser either stops right
+ * Parse string identifier in quotes. Lexer either stops right
  * after the closing quote, or returns an error position.
- * @param parser JSON path parser.
- * @param[out] node JSON node to store result.
+ * @param lexer JSON path lexer.
+ * @param[out] token JSON token to store result.
  * @param quote_type Quote by that a string must be terminated.
  *
  * @retval   0 Success.
  * @retval > 0 1-based position of a syntax error.
  */
 static inline int
-json_parse_string(struct json_path_parser *parser, struct json_path_node *node,
+json_parse_string(struct json_lexer *lexer, struct json_token *token,
 		  UChar32 quote_type)
 {
-	assert(parser->offset < parser->src_len);
-	assert(quote_type == json_current_char(parser));
+	assert(lexer->offset < lexer->src_len);
+	assert(quote_type == json_current_char(lexer));
 	/* The first symbol is always char  - ' or ". */
-	json_skip_char(parser);
-	int str_offset = parser->offset;
+	json_skip_char(lexer);
+	int str_offset = lexer->offset;
 	UChar32 c;
 	int rc;
-	while ((rc = json_read_symbol(parser, &c)) == 0) {
+	while ((rc = json_read_symbol(lexer, &c)) == 0) {
 		if (c == quote_type) {
-			int len = parser->offset - str_offset - 1;
+			int len = lexer->offset - str_offset - 1;
 			if (len == 0)
-				return parser->symbol_count;
-			node->type = JSON_PATH_STR;
-			node->str = parser->src + str_offset;
-			node->len = len;
+				return lexer->symbol_count;
+			token->type = JSON_TOKEN_STR;
+			token->str = lexer->src + str_offset;
+			token->len = len;
 			return 0;
 		}
 	}
@@ -122,32 +122,32 @@ json_parse_string(struct json_path_parser *parser, struct json_path_node *node,
 
 /**
  * Parse digit sequence into integer until non-digit is met.
- * Parser stops right after the last digit.
- * @param parser JSON parser.
- * @param[out] node JSON node to store result.
+ * Lexer stops right after the last digit.
+ * @param lexer JSON lexer.
+ * @param[out] token JSON token to store result.
  *
  * @retval   0 Success.
  * @retval > 0 1-based position of a syntax error.
  */
 static inline int
-json_parse_integer(struct json_path_parser *parser, struct json_path_node *node)
+json_parse_integer(struct json_lexer *lexer, struct json_token *token)
 {
-	const char *end = parser->src + parser->src_len;
-	const char *pos = parser->src + parser->offset;
+	const char *end = lexer->src + lexer->src_len;
+	const char *pos = lexer->src + lexer->offset;
 	assert(pos < end);
 	int len = 0;
 	uint64_t value = 0;
 	char c = *pos;
 	if (! isdigit(c))
-		return parser->symbol_count + 1;
+		return lexer->symbol_count + 1;
 	do {
 		value = value * 10 + c - (int)'0';
 		++len;
 	} while (++pos < end && isdigit((c = *pos)));
-	parser->offset += len;
-	parser->symbol_count += len;
-	node->type = JSON_PATH_NUM;
-	node->num = value;
+	lexer->offset += len;
+	lexer->symbol_count += len;
+	token->type = JSON_TOKEN_NUM;
+	token->num = value;
 	return 0;
 }
 
@@ -164,81 +164,80 @@ json_is_valid_identifier_symbol(UChar32 c)
 /**
  * Parse identifier out of quotes. It can contain only alphas,
  * digits and underscores. And can not contain digit at the first
- * position. Parser is stoped right after the last non-digit,
+ * position. Lexer is stoped right after the last non-digit,
  * non-alpha and non-underscore symbol.
- * @param parser JSON parser.
- * @param[out] node JSON node to store result.
+ * @param lexer JSON lexer.
+ * @param[out] token JSON token to store result.
  *
  * @retval   0 Success.
  * @retval > 0 1-based position of a syntax error.
  */
 static inline int
-json_parse_identifier(struct json_path_parser *parser,
-		      struct json_path_node *node)
+json_parse_identifier(struct json_lexer *lexer, struct json_token *token)
 {
-	assert(parser->offset < parser->src_len);
-	int str_offset = parser->offset;
+	assert(lexer->offset < lexer->src_len);
+	int str_offset = lexer->offset;
 	UChar32 c;
-	int rc = json_read_symbol(parser, &c);
+	int rc = json_read_symbol(lexer, &c);
 	if (rc != 0)
 		return rc;
 	/* First symbol can not be digit. */
 	if (!u_isalpha(c) && c != (UChar32)'_')
-		return parser->symbol_count;
-	int last_offset = parser->offset;
-	while ((rc = json_read_symbol(parser, &c)) == 0) {
+		return lexer->symbol_count;
+	int last_offset = lexer->offset;
+	while ((rc = json_read_symbol(lexer, &c)) == 0) {
 		if (! json_is_valid_identifier_symbol(c)) {
-			json_revert_symbol(parser, last_offset);
+			json_revert_symbol(lexer, last_offset);
 			break;
 		}
-		last_offset = parser->offset;
+		last_offset = lexer->offset;
 	}
-	node->type = JSON_PATH_STR;
-	node->str = parser->src + str_offset;
-	node->len = parser->offset - str_offset;
+	token->type = JSON_TOKEN_STR;
+	token->str = lexer->src + str_offset;
+	token->len = lexer->offset - str_offset;
 	return 0;
 }
 
 int
-json_path_next(struct json_path_parser *parser, struct json_path_node *node)
+json_lexer_next_token(struct json_lexer *lexer, struct json_token *token)
 {
-	if (parser->offset == parser->src_len) {
-		node->type = JSON_PATH_END;
+	if (lexer->offset == lexer->src_len) {
+		token->type = JSON_TOKEN_END;
 		return 0;
 	}
 	UChar32 c;
-	int last_offset = parser->offset;
-	int rc = json_read_symbol(parser, &c);
+	int last_offset = lexer->offset;
+	int rc = json_read_symbol(lexer, &c);
 	if (rc != 0)
 		return rc;
 	switch(c) {
 	case (UChar32)'[':
 		/* Error for '[\0'. */
-		if (parser->offset == parser->src_len)
-			return parser->symbol_count;
-		c = json_current_char(parser);
+		if (lexer->offset == lexer->src_len)
+			return lexer->symbol_count;
+		c = json_current_char(lexer);
 		if (c == '"' || c == '\'')
-			rc = json_parse_string(parser, node, c);
+			rc = json_parse_string(lexer, token, c);
 		else
-			rc = json_parse_integer(parser, node);
+			rc = json_parse_integer(lexer, token);
 		if (rc != 0)
 			return rc;
 		/*
 		 * Expression, started from [ must be finished
 		 * with ] regardless of its type.
 		 */
-		if (parser->offset == parser->src_len ||
-		    json_current_char(parser) != ']')
-			return parser->symbol_count + 1;
+		if (lexer->offset == lexer->src_len ||
+		    json_current_char(lexer) != ']')
+			return lexer->symbol_count + 1;
 		/* Skip ] - one byte char. */
-		json_skip_char(parser);
+		json_skip_char(lexer);
 		return 0;
 	case (UChar32)'.':
-		if (parser->offset == parser->src_len)
-			return parser->symbol_count + 1;
-		return json_parse_identifier(parser, node);
+		if (lexer->offset == lexer->src_len)
+			return lexer->symbol_count + 1;
+		return json_parse_identifier(lexer, token);
 	default:
-		json_revert_symbol(parser, last_offset);
-		return json_parse_identifier(parser, node);
+		json_revert_symbol(lexer, last_offset);
+		return json_parse_identifier(lexer, token);
 	}
 }
diff --git a/src/lib/json/path.h b/src/lib/json/path.h
index c3c381a..7f41fb4 100644
--- a/src/lib/json/path.h
+++ b/src/lib/json/path.h
@@ -37,25 +37,25 @@ extern "C" {
 #endif
 
 /**
- * Parser for JSON paths:
+ * Lexer for JSON paths:
  * <field>, <.field>, <[123]>, <['field']> and their combinations.
  */
-struct json_path_parser {
+struct json_lexer {
 	/** Source string. */
 	const char *src;
 	/** Length of string. */
 	int src_len;
-	/** Current parser's offset in bytes. */
+	/** Current lexer's offset in bytes. */
 	int offset;
-	/** Current parser's offset in symbols. */
+	/** Current lexer's offset in symbols. */
 	int symbol_count;
 };
 
-enum json_path_type {
-	JSON_PATH_NUM,
-	JSON_PATH_STR,
-	/** Parser reached end of path. */
-	JSON_PATH_END,
+enum json_token_type {
+	JSON_TOKEN_NUM,
+	JSON_TOKEN_STR,
+	/** Lexer reached end of path. */
+	JSON_TOKEN_END,
 };
 
 /**
@@ -63,8 +63,8 @@ enum json_path_type {
  * String idenfiers are in ["..."] and between dots. Numbers are
  * indexes in [...].
  */
-struct json_path_node {
-	enum json_path_type type;
+struct json_token {
+	enum json_token_type type;
 	union {
 		struct {
 			/** String identifier. */
@@ -78,32 +78,31 @@ struct json_path_node {
 };
 
 /**
- * Create @a parser.
- * @param[out] parser Parser to create.
+ * Create @a lexer.
+ * @param[out] lexer Lexer to create.
  * @param src Source string.
  * @param src_len Length of @a src.
  */
 static inline void
-json_path_parser_create(struct json_path_parser *parser, const char *src,
-                        int src_len)
+json_lexer_create(struct json_lexer *lexer, const char *src, int src_len)
 {
-	parser->src = src;
-	parser->src_len = src_len;
-	parser->offset = 0;
-	parser->symbol_count = 0;
+	lexer->src = src;
+	lexer->src_len = src_len;
+	lexer->offset = 0;
+	lexer->symbol_count = 0;
 }
 
 /**
- * Get a next path node.
- * @param parser Parser.
- * @param[out] node Node to store parsed result.
- * @retval   0 Success. For result see @a node.str, node.len,
- *             node.num.
+ * Get a next path token.
+ * @param lexer Lexer.
+ * @param[out] token Token to store parsed result.
+ * @retval   0 Success. For result see @a token.str, token.len,
+ *             token.num.
  * @retval > 0 Position of a syntax error. A position is 1-based
  *             and starts from a beginning of a source string.
  */
 int
-json_path_next(struct json_path_parser *parser, struct json_path_node *node);
+json_lexer_next_token(struct json_lexer *lexer, struct json_token *token);
 
 #ifdef __cplusplus
 }
diff --git a/test/unit/json_path.c b/test/unit/json_path.c
index 1d7e7d3..bb6e5ca 100644
--- a/test/unit/json_path.c
+++ b/test/unit/json_path.c
@@ -6,21 +6,21 @@
 #define reset_to_new_path(value) \
 	path = value; \
 	len = strlen(value); \
-	json_path_parser_create(&parser, path, len);
+	json_lexer_create(&lexer, path, len);
 
 #define is_next_index(value_len, value) \
-	path = parser.src + parser.offset; \
-	is(json_path_next(&parser, &node), 0, "parse <%." #value_len "s>", \
+	path = lexer.src + lexer.offset; \
+	is(json_lexer_next_token(&lexer, &token), 0, "parse <%." #value_len "s>", \
 	   path); \
-	is(node.type, JSON_PATH_NUM, "<%." #value_len "s> is num", path); \
-	is(node.num, value, "<%." #value_len "s> is " #value, path);
+	is(token.type, JSON_TOKEN_NUM, "<%." #value_len "s> is num", path); \
+	is(token.num, value, "<%." #value_len "s> is " #value, path);
 
 #define is_next_key(value) \
 	len = strlen(value); \
-	is(json_path_next(&parser, &node), 0, "parse <" value ">"); \
-	is(node.type, JSON_PATH_STR, "<" value "> is str"); \
-	is(node.len, len, "len is %d", len); \
-	is(strncmp(node.str, value, len), 0, "str is " value);
+	is(json_lexer_next_token(&lexer, &token), 0, "parse <" value ">"); \
+	is(token.type, JSON_TOKEN_STR, "<" value "> is str"); \
+	is(token.len, len, "len is %d", len); \
+	is(strncmp(token.str, value, len), 0, "str is " value);
 
 void
 test_basic()
@@ -29,8 +29,8 @@ test_basic()
 	plan(71);
 	const char *path;
 	int len;
-	struct json_path_parser parser;
-	struct json_path_node node;
+	struct json_lexer lexer;
+	struct json_token token;
 
 	reset_to_new_path("[0].field1.field2['field3'][5]");
 	is_next_index(3, 0);
@@ -61,8 +61,8 @@ test_basic()
 
 	/* Empty path. */
 	reset_to_new_path("");
-	is(json_path_next(&parser, &node), 0, "parse empty path");
-	is(node.type, JSON_PATH_END, "is str");
+	is(json_lexer_next_token(&lexer, &token), 0, "parse empty path");
+	is(token.type, JSON_TOKEN_END, "is str");
 
 	/* Path with no '.' at the beginning. */
 	reset_to_new_path("field1.field2");
@@ -81,8 +81,8 @@ test_basic()
 
 #define check_new_path_on_error(value, errpos) \
 	reset_to_new_path(value); \
-	struct json_path_node node; \
-	is(json_path_next(&parser, &node), errpos, "error on position %d" \
+	struct json_token token; \
+	is(json_lexer_next_token(&lexer, &token), errpos, "error on position %d" \
 	   " for <%s>", errpos, path);
 
 struct path_and_errpos {
@@ -97,7 +97,7 @@ test_errors()
 	plan(20);
 	const char *path;
 	int len;
-	struct json_path_parser parser;
+	struct json_lexer lexer;
 	const struct path_and_errpos errors[] = {
 		/* Double [[. */
 		{"[[", 2},
@@ -133,27 +133,27 @@ test_errors()
 	for (size_t i = 0; i < lengthof(errors); ++i) {
 		reset_to_new_path(errors[i].path);
 		int errpos = errors[i].errpos;
-		struct json_path_node node;
-		is(json_path_next(&parser, &node), errpos,
+		struct json_token token;
+		is(json_lexer_next_token(&lexer, &token), errpos,
 		   "error on position %d for <%s>", errpos, path);
 	}
 
 	reset_to_new_path("f.[2]")
-	struct json_path_node node;
-	json_path_next(&parser, &node);
-	is(json_path_next(&parser, &node), 3, "can not write <field.[index]>")
+	struct json_token token;
+	json_lexer_next_token(&lexer, &token);
+	is(json_lexer_next_token(&lexer, &token), 3, "can not write <field.[index]>")
 
 	reset_to_new_path("f.")
-	json_path_next(&parser, &node);
-	is(json_path_next(&parser, &node), 3, "error in leading <.>");
+	json_lexer_next_token(&lexer, &token);
+	is(json_lexer_next_token(&lexer, &token), 3, "error in leading <.>");
 
 	reset_to_new_path("fiel d1")
-	json_path_next(&parser, &node);
-	is(json_path_next(&parser, &node), 5, "space inside identifier");
+	json_lexer_next_token(&lexer, &token);
+	is(json_lexer_next_token(&lexer, &token), 5, "space inside identifier");
 
 	reset_to_new_path("field\t1")
-	json_path_next(&parser, &node);
-	is(json_path_next(&parser, &node), 6, "tab inside identifier");
+	json_lexer_next_token(&lexer, &token);
+	is(json_lexer_next_token(&lexer, &token), 6, "tab inside identifier");
 
 	check_plan();
 	footer();
-- 
2.7.4


* [PATCH v5 2/9] lib: implement JSON tree class for json library
  2018-11-26 10:49 [PATCH v5 0/9] box: indexes by JSON path Kirill Shcherbatov
  2018-11-26 10:49 ` [PATCH v5 1/9] box: refactor json_path_parser class Kirill Shcherbatov
@ 2018-11-26 10:49 ` Kirill Shcherbatov
  2018-11-26 12:53   ` [tarantool-patches] " Kirill Shcherbatov
  2018-11-26 10:49 ` [PATCH v5 3/9] box: manage format fields with JSON tree class Kirill Shcherbatov
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-11-26 10:49 UTC (permalink / raw)
  To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov

The new JSON tree class stores the JSON paths of tuple fields for
registered non-plain indexes. It is a hierarchical data structure
that organizes the JSON nodes produced by the parser. The class
provides an API to look up a node by path and to iterate over the
tree.
The JSON indexes patch requires this functionality to look up
tuple_fields by path, initialize the field map and build the vinyl
statement msgpack for a secondary index via JSON tree iteration.

Need for #1012
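
The lookup-by-path idea can be sketched with a toy tree. The node
layout and names below are illustrative only, not the API this
patch introduces (the real tree hashes children by a (parent,
token) pair and supports numeric as well as string tokens):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 4

/* Each node is reached from its parent by one path segment. */
struct tree_node {
	const char *key;	/* path segment leading to this node */
	struct tree_node *children[MAX_CHILDREN];
	int child_count;
	int field_no;		/* payload, e.g. a field id or offset slot */
};

static struct tree_node *
tree_child_by_key(struct tree_node *parent, const char *key)
{
	for (int i = 0; i < parent->child_count; i++) {
		if (strcmp(parent->children[i]->key, key) == 0)
			return parent->children[i];
	}
	return NULL;
}

/* Look a node up by a NULL-terminated array of path segments. */
static struct tree_node *
tree_lookup(struct tree_node *root, const char **segments)
{
	struct tree_node *node = root;
	for (; node != NULL && *segments != NULL; ++segments)
		node = tree_child_by_key(node, *segments);
	return node;
}
```

Format initialization registers one node per path segment of every
indexed path, so looking up a tuple field by "data.name" reduces
to a walk from the root through two children.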
---
 src/box/lua/tuple.c         |   2 +-
 src/box/tuple_format.c      |  24 ++-
 src/lib/json/CMakeLists.txt |   3 +-
 src/lib/json/json.c         | 509 ++++++++++++++++++++++++++++++++++++++++++++
 src/lib/json/json.h         | 285 +++++++++++++++++++++++++
 src/lib/json/path.c         | 243 ---------------------
 src/lib/json/path.h         | 111 ----------
 test/unit/json_path.c       | 210 +++++++++++++++++-
 test/unit/json_path.result  |  41 +++-
 9 files changed, 1052 insertions(+), 376 deletions(-)
 create mode 100644 src/lib/json/json.c
 create mode 100644 src/lib/json/json.h
 delete mode 100644 src/lib/json/path.c
 delete mode 100644 src/lib/json/path.h

diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c
index 65660ce..cbe71da 100644
--- a/src/box/lua/tuple.c
+++ b/src/box/lua/tuple.c
@@ -41,7 +41,7 @@
 #include "box/tuple.h"
 #include "box/tuple_convert.h"
 #include "box/errcode.h"
-#include "json/path.h"
+#include "json/json.h"
 #include "mpstream.h"
 
 /** {{{ box.tuple Lua library
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index cf05cc8..d184dba 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -28,7 +28,7 @@
  * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
-#include "json/path.h"
+#include "json/json.h"
 #include "tuple_format.h"
 #include "coll_id_cache.h"
 
@@ -584,15 +584,16 @@ tuple_field_go_to_path(const char **data, const char *path, uint32_t path_len)
 	struct json_token token;
 	json_lexer_create(&lexer, path, path_len);
 	while ((rc = json_lexer_next_token(&lexer, &token)) == 0) {
-		switch (token.type) {
+		switch (token.key.type) {
 		case JSON_TOKEN_NUM:
-			rc = tuple_field_go_to_index(data, token.num);
+			rc = tuple_field_go_to_index(data, token.key.num);
 			break;
 		case JSON_TOKEN_STR:
-			rc = tuple_field_go_to_key(data, token.str, token.len);
+			rc = tuple_field_go_to_key(data, token.key.str,
+						   token.key.len);
 			break;
 		default:
-			assert(token.type == JSON_TOKEN_END);
+			assert(token.key.type == JSON_TOKEN_END);
 			return 0;
 		}
 		if (rc != 0) {
@@ -628,9 +629,9 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 	int rc = json_lexer_next_token(&lexer, &token);
 	if (rc != 0)
 		goto error;
-	switch(token.type) {
+	switch(token.key.type) {
 	case JSON_TOKEN_NUM: {
-		int index = token.num;
+		int index = token.key.num;
 		if (index == 0) {
 			*field = NULL;
 			return 0;
@@ -644,7 +645,7 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 	case JSON_TOKEN_STR: {
 		/* First part of a path is a field name. */
 		uint32_t name_hash;
-		if (path_len == (uint32_t) token.len) {
+		if (path_len == (uint32_t) token.key.len) {
 			name_hash = path_hash;
 		} else {
 			/*
@@ -653,17 +654,18 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 			 * used. A tuple dictionary hashes only
 			 * name, not path.
 			 */
-			name_hash = field_name_hash(token.str, token.len);
+			name_hash = field_name_hash(token.key.str,
+						    token.key.len);
 		}
 		*field = tuple_field_raw_by_name(format, tuple, field_map,
-						 token.str, token.len,
+						 token.key.str, token.key.len,
 						 name_hash);
 		if (*field == NULL)
 			return 0;
 		break;
 	}
 	default:
-		assert(token.type == JSON_TOKEN_END);
+		assert(token.key.type == JSON_TOKEN_END);
 		*field = NULL;
 		return 0;
 	}
diff --git a/src/lib/json/CMakeLists.txt b/src/lib/json/CMakeLists.txt
index 203fe6f..51a1f02 100644
--- a/src/lib/json/CMakeLists.txt
+++ b/src/lib/json/CMakeLists.txt
@@ -1,6 +1,7 @@
 set(lib_sources
-    path.c
+    json.c
 )
 
 set_source_files_compile_flags(${lib_sources})
 add_library(json_path STATIC ${lib_sources})
+target_link_libraries(json_path misc)
diff --git a/src/lib/json/json.c b/src/lib/json/json.c
new file mode 100644
index 0000000..9198dca
--- /dev/null
+++ b/src/lib/json/json.c
@@ -0,0 +1,509 @@
+/*
+ * Copyright 2010-2018 Tarantool AUTHORS: please see AUTHORS file.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the
+ *    following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <ctype.h>
+#include <stdbool.h>
+#include <unicode/uchar.h>
+#include <unicode/utf8.h>
+#include "json.h"
+#include "trivia/util.h"
+#include "third_party/PMurHash.h"
+
+/**
+ * Read a single symbol from a string starting from an offset.
+ * @param lexer JSON path lexer.
+ * @param[out] UChar32 Read symbol.
+ *
+ * @retval   0 Success.
+ * @retval > 0 1-based position of a syntax error.
+ */
+static inline int
+json_read_symbol(struct json_lexer *lexer, UChar32 *out)
+{
+	if (lexer->offset == lexer->src_len) {
+		*out = U_SENTINEL;
+		return lexer->symbol_count + 1;
+	}
+	U8_NEXT(lexer->src, lexer->offset, lexer->src_len, *out);
+	if (*out == U_SENTINEL)
+		return lexer->symbol_count + 1;
+	++lexer->symbol_count;
+	return 0;
+}
+
+/**
+ * Rollback one symbol offset.
+ * @param lexer JSON path lexer.
+ * @param offset Offset to the previous symbol.
+ */
+static inline void
+json_revert_symbol(struct json_lexer *lexer, int offset)
+{
+	lexer->offset = offset;
+	--lexer->symbol_count;
+}
+
+/** Fast forward when it is known that a symbol is 1-byte char. */
+static inline void
+json_skip_char(struct json_lexer *lexer)
+{
+	++lexer->offset;
+	++lexer->symbol_count;
+}
+
+/** Get a current symbol as a 1-byte char. */
+static inline char
+json_current_char(const struct json_lexer *lexer)
+{
+	return *(lexer->src + lexer->offset);
+}
+
+/**
+ * Parse string identifier in quotes. Lexer either stops right
+ * after the closing quote, or returns an error position.
+ * @param lexer JSON path lexer.
+ * @param[out] token JSON token to store result.
+ * @param quote_type Quote by that a string must be terminated.
+ *
+ * @retval   0 Success.
+ * @retval > 0 1-based position of a syntax error.
+ */
+static inline int
+json_parse_string(struct json_lexer *lexer, struct json_token *token,
+		  UChar32 quote_type)
+{
+	assert(lexer->offset < lexer->src_len);
+	assert(quote_type == json_current_char(lexer));
+	/* The first symbol is always a 1-byte char: ' or ". */
+	json_skip_char(lexer);
+	int str_offset = lexer->offset;
+	UChar32 c;
+	int rc;
+	while ((rc = json_read_symbol(lexer, &c)) == 0) {
+		if (c == quote_type) {
+			int len = lexer->offset - str_offset - 1;
+			if (len == 0)
+				return lexer->symbol_count;
+			token->key.type = JSON_TOKEN_STR;
+			token->key.str = lexer->src + str_offset;
+			token->key.len = len;
+			return 0;
+		}
+	}
+	return rc;
+}
+
+/**
+ * Parse digit sequence into integer until non-digit is met.
+ * Lexer stops right after the last digit.
+ * @param lexer JSON lexer.
+ * @param[out] token JSON token to store result.
+ *
+ * @retval   0 Success.
+ * @retval > 0 1-based position of a syntax error.
+ */
+static inline int
+json_parse_integer(struct json_lexer *lexer, struct json_token *token)
+{
+	const char *end = lexer->src + lexer->src_len;
+	const char *pos = lexer->src + lexer->offset;
+	assert(pos < end);
+	int len = 0;
+	uint64_t value = 0;
+	char c = *pos;
+	if (! isdigit(c))
+		return lexer->symbol_count + 1;
+	do {
+		value = value * 10 + c - (int)'0';
+		++len;
+	} while (++pos < end && isdigit((c = *pos)));
+	lexer->offset += len;
+	lexer->symbol_count += len;
+	token->key.type = JSON_TOKEN_NUM;
+	token->key.num = value;
+	return 0;
+}
+
+/**
+ * Check that a symbol can be part of a JSON path not inside
+ * ["..."].
+ */
+static inline bool
+json_is_valid_identifier_symbol(UChar32 c)
+{
+	return u_isUAlphabetic(c) || c == (UChar32)'_' || u_isdigit(c);
+}
+
+/**
+ * Parse an identifier out of quotes. It can contain only
+ * alphas, digits and underscores, and can not start with a
+ * digit. The lexer is stopped right after the last valid
+ * identifier symbol.
+ * @param lexer JSON lexer.
+ * @param[out] token JSON token to store result.
+ *
+ * @retval   0 Success.
+ * @retval > 0 1-based position of a syntax error.
+ */
+static inline int
+json_parse_identifier(struct json_lexer *lexer, struct json_token *token)
+{
+	assert(lexer->offset < lexer->src_len);
+	int str_offset = lexer->offset;
+	UChar32 c;
+	int rc = json_read_symbol(lexer, &c);
+	if (rc != 0)
+		return rc;
+	/* The first symbol can not be a digit. */
+	if (!u_isalpha(c) && c != (UChar32)'_')
+		return lexer->symbol_count;
+	int last_offset = lexer->offset;
+	while ((rc = json_read_symbol(lexer, &c)) == 0) {
+		if (! json_is_valid_identifier_symbol(c)) {
+			json_revert_symbol(lexer, last_offset);
+			break;
+		}
+		last_offset = lexer->offset;
+	}
+	token->key.type = JSON_TOKEN_STR;
+	token->key.str = lexer->src + str_offset;
+	token->key.len = lexer->offset - str_offset;
+	return 0;
+}
+
+int
+json_lexer_next_token(struct json_lexer *lexer, struct json_token *token)
+{
+	if (lexer->offset == lexer->src_len) {
+		token->key.type = JSON_TOKEN_END;
+		return 0;
+	}
+	UChar32 c;
+	int last_offset = lexer->offset;
+	int rc = json_read_symbol(lexer, &c);
+	if (rc != 0)
+		return rc;
+	switch(c) {
+	case (UChar32)'[':
+		/* Error for '[\0'. */
+		if (lexer->offset == lexer->src_len)
+			return lexer->symbol_count;
+		c = json_current_char(lexer);
+		if (c == '"' || c == '\'')
+			rc = json_parse_string(lexer, token, c);
+		else
+			rc = json_parse_integer(lexer, token);
+		if (rc != 0)
+			return rc;
+		/*
+		 * An expression that starts with [ must be
+		 * closed with ] regardless of its type.
+		 */
+		if (lexer->offset == lexer->src_len ||
+		    json_current_char(lexer) != ']')
+			return lexer->symbol_count + 1;
+		/* Skip ] - one byte char. */
+		json_skip_char(lexer);
+		return 0;
+	case (UChar32)'.':
+		if (lexer->offset == lexer->src_len)
+			return lexer->symbol_count + 1;
+		return json_parse_identifier(lexer, token);
+	default:
+		json_revert_symbol(lexer, last_offset);
+		return json_parse_identifier(lexer, token);
+	}
+}
+
+/** Compare JSON token keys. */
+static int
+json_token_key_cmp(const struct json_token *a, const struct json_token *b)
+{
+	if (a->key.type != b->key.type)
+		return a->key.type - b->key.type;
+	int ret = 0;
+	if (a->key.type == JSON_TOKEN_STR) {
+		if (a->key.len != b->key.len)
+			return a->key.len - b->key.len;
+		ret = memcmp(a->key.str, b->key.str, a->key.len);
+	} else if (a->key.type == JSON_TOKEN_NUM) {
+		ret = a->key.num - b->key.num;
+	} else {
+		unreachable();
+	}
+	return ret;
+}
+
+/**
+ * Compare hash records of two json tree nodes. Return 0 if equal.
+ */
+static inline int
+mh_json_cmp(const struct json_token *a, const struct json_token *b)
+{
+	if (a->parent != b->parent)
+		return a->parent - b->parent;
+	return json_token_key_cmp(a, b);
+}
+
+#define MH_SOURCE 1
+#define mh_name _json
+#define mh_key_t struct json_token **
+#define mh_node_t struct json_token *
+#define mh_arg_t void *
+#define mh_hash(a, arg) ((*a)->rolling_hash)
+#define mh_hash_key(a, arg) ((*a)->rolling_hash)
+#define mh_cmp(a, b, arg) (mh_json_cmp((*a), (*b)))
+#define mh_cmp_key(a, b, arg) mh_cmp(a, b, arg)
+#include "salad/mhash.h"
+
+static const uint32_t hash_seed = 13U;
+
+/** Compute the hash value of a JSON token. */
+static uint32_t
+json_token_hash(struct json_token *token, uint32_t seed)
+{
+	uint32_t h = seed;
+	uint32_t carry = 0;
+	const void *data;
+	uint32_t data_size;
+	if (token->key.type == JSON_TOKEN_STR) {
+		data = token->key.str;
+		data_size = token->key.len;
+	} else if (token->key.type == JSON_TOKEN_NUM) {
+		data = &token->key.num;
+		data_size = sizeof(token->key.num);
+	} else {
+		unreachable();
+	}
+	PMurHash32_Process(&h, &carry, data, data_size);
+	return PMurHash32_Result(h, carry, data_size);
+}
+
+int
+json_tree_create(struct json_tree *tree)
+{
+	memset(tree, 0, sizeof(struct json_tree));
+	tree->root.rolling_hash = hash_seed;
+	tree->root.key.type = JSON_TOKEN_END;
+	tree->hash = mh_json_new();
+	return tree->hash == NULL ? -1 : 0;
+}
+
+static void
+json_token_destroy(struct json_token *token)
+{
+	free(token->children);
+}
+
+void
+json_tree_destroy(struct json_tree *tree)
+{
+	assert(tree->hash != NULL);
+	json_token_destroy(&tree->root);
+	mh_json_delete(tree->hash);
+}
+
+struct json_token *
+json_tree_lookup(struct json_tree *tree, struct json_token *parent,
+		 struct json_token *token)
+{
+	if (parent == NULL)
+		parent = &tree->root;
+	struct json_token *ret = NULL;
+	if (token->key.type == JSON_TOKEN_STR) {
+		struct json_token key = *token;
+		key.rolling_hash = json_token_hash(token, parent->rolling_hash);
+		key.parent = parent;
+		token = &key;
+		mh_int_t id = mh_json_find(tree->hash, &token, NULL);
+		if (unlikely(id == mh_end(tree->hash)))
+			return NULL;
+		struct json_token **entry = mh_json_node(tree->hash, id);
+		assert(entry == NULL || (*entry)->parent == parent);
+		return entry != NULL ? *entry : NULL;
+	} else if (token->key.type == JSON_TOKEN_NUM) {
+		uint32_t idx = token->key.num - 1;
+		ret = idx < parent->child_size ? parent->children[idx] : NULL;
+	} else {
+		unreachable();
+	}
+	return ret;
+}
+
+int
+json_tree_add(struct json_tree *tree, struct json_token *parent,
+	      struct json_token *token)
+{
+	if (parent == NULL)
+		parent = &tree->root;
+	uint32_t rolling_hash =
+	       json_token_hash(token, parent->rolling_hash);
+	assert(json_tree_lookup(tree, parent, token) == NULL);
+	uint32_t insert_idx = (token->key.type == JSON_TOKEN_NUM) ?
+			      (uint32_t)token->key.num - 1 :
+			      parent->child_size;
+	if (insert_idx >= parent->child_size) {
+		uint32_t new_size =
+			parent->child_size == 0 ? 1 : 2 * parent->child_size;
+		while (insert_idx >= new_size)
+			new_size *= 2;
+		struct json_token **children =
+			realloc(parent->children, new_size*sizeof(void *));
+		if (unlikely(children == NULL))
+			return -1;
+		memset(children + parent->child_size, 0,
+		       (new_size - parent->child_size)*sizeof(void *));
+		parent->children = children;
+		parent->child_size = new_size;
+	}
+	assert(parent->children[insert_idx] == NULL);
+	parent->children[insert_idx] = token;
+	parent->child_count = MAX(parent->child_count, insert_idx + 1);
+	token->sibling_idx = insert_idx;
+	token->rolling_hash = rolling_hash;
+	token->parent = parent;
+
+	const struct json_token **key =
+		(const struct json_token **)&token;
+	mh_int_t rc = mh_json_put(tree->hash, key, NULL, NULL);
+	if (unlikely(rc == mh_end(tree->hash))) {
+		parent->children[insert_idx] = NULL;
+		return -1;
+	}
+	assert(json_tree_lookup(tree, parent, token) == token);
+	return 0;
+}
+
+void
+json_tree_del(struct json_tree *tree, struct json_token *token)
+{
+	struct json_token *parent = token->parent;
+	assert(json_tree_lookup(tree, parent, token) == token);
+	struct json_token **child_slot = NULL;
+	if (token->key.type == JSON_TOKEN_NUM) {
+		child_slot = &parent->children[token->key.num - 1];
+	} else {
+		uint32_t idx = 0;
+		while (idx < parent->child_size &&
+		       parent->children[idx] != token)
+			idx++;
+		if (idx < parent->child_size)
+			child_slot = &parent->children[idx];
+	}
+	assert(child_slot != NULL && *child_slot == token);
+	*child_slot = NULL;
+
+	mh_int_t id = mh_json_find(tree->hash, &token, NULL);
+	assert(id != mh_end(tree->hash));
+	mh_json_del(tree->hash, id, NULL);
+	json_token_destroy(token);
+	assert(json_tree_lookup(tree, parent, token) == NULL);
+}
+
+struct json_token *
+json_tree_lookup_path(struct json_tree *tree, struct json_token *parent,
+		      const char *path, uint32_t path_len)
+{
+	int rc;
+	struct json_lexer lexer;
+	struct json_token token;
+	struct json_token *ret = parent != NULL ? parent : &tree->root;
+	json_lexer_create(&lexer, path, path_len);
+	while ((rc = json_lexer_next_token(&lexer, &token)) == 0 &&
+	       token.key.type != JSON_TOKEN_END && ret != NULL) {
+		ret = json_tree_lookup(tree, ret, &token);
+	}
+	if (rc != 0 || token.key.type != JSON_TOKEN_END)
+		return NULL;
+	return ret;
+}
+
+static struct json_token *
+json_tree_child_next(struct json_token *parent, struct json_token *pos)
+{
+	assert(pos == NULL || pos->parent == parent);
+	struct json_token **arr = parent->children;
+	uint32_t arr_size = parent->child_size;
+	if (arr == NULL)
+		return NULL;
+	uint32_t idx = pos != NULL ? pos->sibling_idx + 1 : 0;
+	while (idx < arr_size && arr[idx] == NULL)
+		idx++;
+	if (idx >= arr_size)
+		return NULL;
+	return arr[idx];
+}
+
+static struct json_token *
+json_tree_leftmost(struct json_token *pos)
+{
+	struct json_token *last;
+	do {
+		last = pos;
+		pos = json_tree_child_next(pos, NULL);
+	} while (pos != NULL);
+	return last;
+}
+
+struct json_token *
+json_tree_preorder_next(struct json_token *root, struct json_token *pos)
+{
+	if (pos == NULL)
+		pos = root;
+	struct json_token *next = json_tree_child_next(pos, NULL);
+	if (next != NULL)
+		return next;
+	while (pos != root) {
+		next = json_tree_child_next(pos->parent, pos);
+		if (next != NULL)
+			return next;
+		pos = pos->parent;
+	}
+	return NULL;
+}
+
+struct json_token *
+json_tree_postorder_next(struct json_token *root, struct json_token *pos)
+{
+	struct json_token *next;
+	if (pos == NULL) {
+		next = json_tree_leftmost(root);
+		return next != root ? next : NULL;
+	}
+	if (pos == root)
+		return NULL;
+	next = json_tree_child_next(pos->parent, pos);
+	if (next != NULL) {
+		next = json_tree_leftmost(next);
+		return next != root ? next : NULL;
+	}
+	return pos->parent != root ? pos->parent : NULL;
+}
diff --git a/src/lib/json/json.h b/src/lib/json/json.h
new file mode 100644
index 0000000..dd09f5a
--- /dev/null
+++ b/src/lib/json/json.h
@@ -0,0 +1,285 @@
+#ifndef TARANTOOL_JSON_JSON_H_INCLUDED
+#define TARANTOOL_JSON_JSON_H_INCLUDED
+/*
+ * Copyright 2010-2018 Tarantool AUTHORS: please see AUTHORS file.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above
+ *    copyright notice, this list of conditions and the
+ *    following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above
+ *    copyright notice, this list of conditions and the following
+ *    disclaimer in the documentation and/or other materials
+ *    provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
+ * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
+ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Lexer for JSON paths:
+ * <field>, <.field>, <[123]>, <['field']> and their combinations.
+ */
+struct json_lexer {
+	/** Source string. */
+	const char *src;
+	/** Length of string. */
+	int src_len;
+	/** Current lexer's offset in bytes. */
+	int offset;
+	/** Current lexer's offset in symbols. */
+	int symbol_count;
+};
+
+enum json_token_type {
+	JSON_TOKEN_NUM,
+	JSON_TOKEN_STR,
+	/** Lexer reached end of path. */
+	JSON_TOKEN_END,
+};
+
+/**
+ * Element of a JSON path. It can be either string or number.
+ * String identifiers are in ["..."] and between dots. Numbers are
+ * indexes in [...]. May be used to organize a JSON tree.
+ */
+struct json_token {
+	struct {
+		enum json_token_type type;
+		union {
+			struct {
+				/** String identifier. */
+				const char *str;
+				/** Length of @a str. */
+				int len;
+			};
+			/** Index value. */
+			uint64_t num;
+		};
+	} key;
+	/** Rolling hash for node used to lookup in json_tree. */
+	uint32_t rolling_hash;
+	/**
+	 * Array of child records. A JSON_TOKEN_NUM child is
+	 * stored at index key.num - 1, while JSON_TOKEN_STR
+	 * children are appended sequentially. New slots are
+	 * NULL-initialized on allocation.
+	 */
+	struct json_token **children;
+	/** Allocation size of children array. */
+	uint32_t child_size;
+	/**
+	 * Number of occupied slots in the children array:
+	 * equals the maximum inserted index plus one.
+	 */
+	uint32_t child_count;
+	/** Index of node in parent children array. */
+	uint32_t sibling_idx;
+	/** Pointer to parent node. */
+	struct json_token *parent;
+};
+
+struct mh_json_t;
+
+/** JSON tree object to manage tokens relations. */
+struct json_tree {
+	/** JSON tree root node. */
+	struct json_token root;
+	/** Hashtable of all tree nodes. */
+	struct mh_json_t *hash;
+};
+
+/**
+ * Create @a lexer.
+ * @param[out] lexer Lexer to create.
+ * @param src Source string.
+ * @param src_len Length of @a src.
+ */
+static inline void
+json_lexer_create(struct json_lexer *lexer, const char *src, int src_len)
+{
+	lexer->src = src;
+	lexer->src_len = src_len;
+	lexer->offset = 0;
+	lexer->symbol_count = 0;
+}
+
+/**
+ * Get a next path token.
+ * @param lexer Lexer.
+ * @param[out] token Token to store parsed result.
+ * @retval   0 Success. For result see @a token.str, token.len,
+ *             token.num.
+ * @retval > 0 Position of a syntax error. A position is 1-based
+ *             and starts from a beginning of a source string.
+ */
+int
+json_lexer_next_token(struct json_lexer *lexer, struct json_token *token);
+
+/** Create a JSON tree object to manage data relations. */
+int
+json_tree_create(struct json_tree *tree);
+
+/**
+ * Destroy a JSON tree object. This routine doesn't destroy the
+ * attached subtree, so it must be called only after all tokens
+ * have been deleted manually.
+ */
+void
+json_tree_destroy(struct json_tree *tree);
+
+/**
+ * Look up a child of the given parent by token in a JSON tree.
+ * The parent may be NULL to use the tree root record.
+ */
+struct json_token *
+json_tree_lookup(struct json_tree *tree, struct json_token *parent,
+		 struct json_token *token);
+
+/**
+ * Append a token to the given parent in a JSON tree. The parent
+ * must not already have a child with the same key. The parent
+ * may be NULL to use the tree root record.
+ */
+int
+json_tree_add(struct json_tree *tree, struct json_token *parent,
+	      struct json_token *token);
+
+/**
+ * Delete a token at the given position in a JSON tree. The
+ * token must not have a subtree.
+ */
+void
+json_tree_del(struct json_tree *tree, struct json_token *token);
+
+/** Make child lookup by path in JSON tree. */
+struct json_token *
+json_tree_lookup_path(struct json_tree *tree, struct json_token *parent,
+		      const char *path, uint32_t path_len);
+
+/** Make a pre-order traversal step in a JSON tree. */
+struct json_token *
+json_tree_preorder_next(struct json_token *root, struct json_token *pos);
+
+/** Make a post-order traversal step in a JSON tree. */
+struct json_token *
+json_tree_postorder_next(struct json_token *root, struct json_token *pos);
+
+/**
+ * Make safe post-order traversal in JSON tree.
+ * May be used for destructors.
+ */
+#define json_tree_foreach_safe(node, root)				     \
+for (struct json_token *__next = json_tree_postorder_next((root), NULL);     \
+     (((node) = __next) &&						     \
+     (__next = json_tree_postorder_next((root), (node))), (node) != NULL);)
+
+#ifndef typeof
+/* TODO: 'typeof' is a GNU extension */
+#define typeof __typeof__
+#endif
+
+/** Return the container entry by its json_token node. */
+#define json_tree_entry(node, type, member) ({ 				     \
+	const typeof( ((type *)0)->member ) *__mptr = (node);		     \
+	(type *)( (char *)__mptr - ((size_t) &((type *)0)->member) );	     \
+})
+
+/**
+ * Return container entry by json_tree_node or NULL if
+ * node is NULL.
+ */
+#define json_tree_entry_safe(node, type, member) ({			     \
+	(node) != NULL ? json_tree_entry((node), type, member) : NULL;	     \
+})
+
+/** Make one pre-order traversal step and return the entry. */
+#define json_tree_preorder_next_entry(node, root, type, member) ({	     \
+	struct json_token *__next =					     \
+		json_tree_preorder_next((root), (node));		     \
+	json_tree_entry_safe(__next, type, member);			     \
+})
+
+/** Make one post-order traversal step and return the entry. */
+#define json_tree_postorder_next_entry(node, root, type, member) ({	     \
+	struct json_token *__next =					     \
+		json_tree_postorder_next((root), (node));		     \
+	json_tree_entry_safe(__next, type, member);			     \
+})
+
+/** Make lookup in tree by path and return entry. */
+#define json_tree_lookup_path_entry(tree, parent, path, path_len, type,	     \
+				    member)				     \
+({struct json_token *__node =						     \
+	json_tree_lookup_path((tree), (parent), path, path_len);	     \
+	json_tree_entry_safe(__node, type, member); })
+
+/** Make lookup in tree by token and return entry. */
+#define json_tree_lookup_entry(tree, parent, token, type, member)	     \
+({struct json_token *__node =						     \
+	json_tree_lookup((tree), (parent), token);			     \
+	json_tree_entry_safe(__node, type, member);			     \
+})
+
+/** Make pre-order traversal in JSON tree. */
+#define json_tree_foreach_preorder(node, root)				     \
+for ((node) = json_tree_preorder_next((root), NULL); (node) != NULL;	     \
+     (node) = json_tree_preorder_next((root), (node)))
+
+/** Make post-order traversal in JSON tree. */
+#define json_tree_foreach_postorder(node, root)				     \
+for ((node) = json_tree_postorder_next((root), NULL); (node) != NULL;	     \
+     (node) = json_tree_postorder_next((root), (node)))
+
+/** Make pre-order traversal in JSON tree and return entry. */
+#define json_tree_foreach_entry_preorder(node, root, type, member)	     \
+for ((node) = json_tree_preorder_next_entry(NULL, (root), type, member);     \
+     (node) != NULL;							     \
+     (node) = json_tree_preorder_next_entry(&(node)->member, (root), type,   \
+					    member))
+
+/** Make post-order traversal in JSON tree and return entry. */
+#define json_tree_foreach_entry_postorder(node, root, type, member)	     \
+for ((node) = json_tree_postorder_next_entry(NULL, (root), type, member);    \
+     (node) != NULL;							     \
+     (node) = json_tree_postorder_next_entry(&(node)->member, (root), type,  \
+     					     member))
+
+/**
+ * Make safe post-order traversal in JSON tree and return entry.
+ */
+#define json_tree_foreach_entry_safe(node, root, type, member)		     \
+for (type *__next = json_tree_postorder_next_entry(NULL, (root), type,	     \
+						   member);		     \
+     (((node) = __next) &&						     \
+     (__next = json_tree_postorder_next_entry(&(node)->member, (root), type, \
+					      member)),			     \
+     (node) != NULL);)
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* TARANTOOL_JSON_JSON_H_INCLUDED */
diff --git a/src/lib/json/path.c b/src/lib/json/path.c
deleted file mode 100644
index dfd7d5c..0000000
--- a/src/lib/json/path.c
+++ /dev/null
@@ -1,243 +0,0 @@
-/*
- * Copyright 2010-2018 Tarantool AUTHORS: please see AUTHORS file.
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- * 1. Redistributions of source code must retain the above
- *    copyright notice, this list of conditions and the
- *    following disclaimer.
- *
- * 2. Redistributions in binary form must reproduce the above
- *    copyright notice, this list of conditions and the following
- *    disclaimer in the documentation and/or other materials
- *    provided with the distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
- * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
- * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
- * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
- * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
- * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
- * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
- * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
- * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- * SUCH DAMAGE.
- */
-
-#include "path.h"
-#include <ctype.h>
-#include <stdbool.h>
-#include <unicode/uchar.h>
-#include <unicode/utf8.h>
-#include "trivia/util.h"
-
-/**
- * Read a single symbol from a string starting from an offset.
- * @param lexer JSON path lexer.
- * @param[out] UChar32 Read symbol.
- *
- * @retval   0 Success.
- * @retval > 0 1-based position of a syntax error.
- */
-static inline int
-json_read_symbol(struct json_lexer *lexer, UChar32 *out)
-{
-	if (lexer->offset == lexer->src_len) {
-		*out = U_SENTINEL;
-		return lexer->symbol_count + 1;
-	}
-	U8_NEXT(lexer->src, lexer->offset, lexer->src_len, *out);
-	if (*out == U_SENTINEL)
-		return lexer->symbol_count + 1;
-	++lexer->symbol_count;
-	return 0;
-}
-
-/**
- * Rollback one symbol offset.
- * @param lexer JSON path lexer.
- * @param offset Offset to the previous symbol.
- */
-static inline void
-json_revert_symbol(struct json_lexer *lexer, int offset)
-{
-	lexer->offset = offset;
-	--lexer->symbol_count;
-}
-
-/** Fast forward when it is known that a symbol is 1-byte char. */
-static inline void
-json_skip_char(struct json_lexer *lexer)
-{
-	++lexer->offset;
-	++lexer->symbol_count;
-}
-
-/** Get a current symbol as a 1-byte char. */
-static inline char
-json_current_char(const struct json_lexer *lexer)
-{
-	return *(lexer->src + lexer->offset);
-}
-
-/**
- * Parse string identifier in quotes. Lexer either stops right
- * after the closing quote, or returns an error position.
- * @param lexer JSON path lexer.
- * @param[out] token JSON token to store result.
- * @param quote_type Quote by that a string must be terminated.
- *
- * @retval   0 Success.
- * @retval > 0 1-based position of a syntax error.
- */
-static inline int
-json_parse_string(struct json_lexer *lexer, struct json_token *token,
-		  UChar32 quote_type)
-{
-	assert(lexer->offset < lexer->src_len);
-	assert(quote_type == json_current_char(lexer));
-	/* The first symbol is always char  - ' or ". */
-	json_skip_char(lexer);
-	int str_offset = lexer->offset;
-	UChar32 c;
-	int rc;
-	while ((rc = json_read_symbol(lexer, &c)) == 0) {
-		if (c == quote_type) {
-			int len = lexer->offset - str_offset - 1;
-			if (len == 0)
-				return lexer->symbol_count;
-			token->type = JSON_TOKEN_STR;
-			token->str = lexer->src + str_offset;
-			token->len = len;
-			return 0;
-		}
-	}
-	return rc;
-}
-
-/**
- * Parse digit sequence into integer until non-digit is met.
- * Lexer stops right after the last digit.
- * @param lexer JSON lexer.
- * @param[out] token JSON token to store result.
- *
- * @retval   0 Success.
- * @retval > 0 1-based position of a syntax error.
- */
-static inline int
-json_parse_integer(struct json_lexer *lexer, struct json_token *token)
-{
-	const char *end = lexer->src + lexer->src_len;
-	const char *pos = lexer->src + lexer->offset;
-	assert(pos < end);
-	int len = 0;
-	uint64_t value = 0;
-	char c = *pos;
-	if (! isdigit(c))
-		return lexer->symbol_count + 1;
-	do {
-		value = value * 10 + c - (int)'0';
-		++len;
-	} while (++pos < end && isdigit((c = *pos)));
-	lexer->offset += len;
-	lexer->symbol_count += len;
-	token->type = JSON_TOKEN_NUM;
-	token->num = value;
-	return 0;
-}
-
-/**
- * Check that a symbol can be part of a JSON path not inside
- * ["..."].
- */
-static inline bool
-json_is_valid_identifier_symbol(UChar32 c)
-{
-	return u_isUAlphabetic(c) || c == (UChar32)'_' || u_isdigit(c);
-}
-
-/**
- * Parse identifier out of quotes. It can contain only alphas,
- * digits and underscores. And can not contain digit at the first
- * position. Lexer is stoped right after the last non-digit,
- * non-alpha and non-underscore symbol.
- * @param lexer JSON lexer.
- * @param[out] token JSON token to store result.
- *
- * @retval   0 Success.
- * @retval > 0 1-based position of a syntax error.
- */
-static inline int
-json_parse_identifier(struct json_lexer *lexer, struct json_token *token)
-{
-	assert(lexer->offset < lexer->src_len);
-	int str_offset = lexer->offset;
-	UChar32 c;
-	int rc = json_read_symbol(lexer, &c);
-	if (rc != 0)
-		return rc;
-	/* First symbol can not be digit. */
-	if (!u_isalpha(c) && c != (UChar32)'_')
-		return lexer->symbol_count;
-	int last_offset = lexer->offset;
-	while ((rc = json_read_symbol(lexer, &c)) == 0) {
-		if (! json_is_valid_identifier_symbol(c)) {
-			json_revert_symbol(lexer, last_offset);
-			break;
-		}
-		last_offset = lexer->offset;
-	}
-	token->type = JSON_TOKEN_STR;
-	token->str = lexer->src + str_offset;
-	token->len = lexer->offset - str_offset;
-	return 0;
-}
-
-int
-json_lexer_next_token(struct json_lexer *lexer, struct json_token *token)
-{
-	if (lexer->offset == lexer->src_len) {
-		token->type = JSON_TOKEN_END;
-		return 0;
-	}
-	UChar32 c;
-	int last_offset = lexer->offset;
-	int rc = json_read_symbol(lexer, &c);
-	if (rc != 0)
-		return rc;
-	switch(c) {
-	case (UChar32)'[':
-		/* Error for '[\0'. */
-		if (lexer->offset == lexer->src_len)
-			return lexer->symbol_count;
-		c = json_current_char(lexer);
-		if (c == '"' || c == '\'')
-			rc = json_parse_string(lexer, token, c);
-		else
-			rc = json_parse_integer(lexer, token);
-		if (rc != 0)
-			return rc;
-		/*
-		 * Expression, started from [ must be finished
-		 * with ] regardless of its type.
-		 */
-		if (lexer->offset == lexer->src_len ||
-		    json_current_char(lexer) != ']')
-			return lexer->symbol_count + 1;
-		/* Skip ] - one byte char. */
-		json_skip_char(lexer);
-		return 0;
-	case (UChar32)'.':
-		if (lexer->offset == lexer->src_len)
-			return lexer->symbol_count + 1;
-		return json_parse_identifier(lexer, token);
-	default:
-		json_revert_symbol(lexer, last_offset);
-		return json_parse_identifier(lexer, token);
-	}
-}
diff --git a/src/lib/json/path.h b/src/lib/json/path.h
deleted file mode 100644
index 7f41fb4..0000000
--- a/src/lib/json/path.h
+++ /dev/null
@@ -1,111 +0,0 @@
-#ifndef TARANTOOL_JSON_PATH_H_INCLUDED
-#define TARANTOOL_JSON_PATH_H_INCLUDED
-/*
- * Copyright 2010-2018 Tarantool AUTHORS: please see AUTHORS file.
- *
- * Redistribution and use in source and binary forms, with or
- * without modification, are permitted provided that the following
- * conditions are met:
- *
- * 1. Redistributions of source code must retain the above
- *    copyright notice, this list of conditions and the
- *    following disclaimer.
- *
- * 2. Redistributions in binary form must reproduce the above
- *    copyright notice, this list of conditions and the following
- *    disclaimer in the documentation and/or other materials
- *    provided with the distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
- * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
- * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
- * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
- * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
- * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
- * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
- * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
- * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- * SUCH DAMAGE.
- */
-#include <stdint.h>
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-/**
- * Lexer for JSON paths:
- * <field>, <.field>, <[123]>, <['field']> and their combinations.
- */
-struct json_lexer {
-	/** Source string. */
-	const char *src;
-	/** Length of string. */
-	int src_len;
-	/** Current lexer's offset in bytes. */
-	int offset;
-	/** Current lexer's offset in symbols. */
-	int symbol_count;
-};
-
-enum json_token_type {
-	JSON_TOKEN_NUM,
-	JSON_TOKEN_STR,
-	/** Lexer reached end of path. */
-	JSON_TOKEN_END,
-};
-
-/**
- * Element of a JSON path. It can be either string or number.
- * String idenfiers are in ["..."] and between dots. Numbers are
- * indexes in [...].
- */
-struct json_token {
-	enum json_token_type type;
-	union {
-		struct {
-			/** String identifier. */
-			const char *str;
-			/** Length of @a str. */
-			int len;
-		};
-		/** Index value. */
-		uint64_t num;
-	};
-};
-
-/**
- * Create @a lexer.
- * @param[out] lexer Lexer to create.
- * @param src Source string.
- * @param src_len Length of @a src.
- */
-static inline void
-json_lexer_create(struct json_lexer *lexer, const char *src, int src_len)
-{
-	lexer->src = src;
-	lexer->src_len = src_len;
-	lexer->offset = 0;
-	lexer->symbol_count = 0;
-}
-
-/**
- * Get a next path token.
- * @param lexer Lexer.
- * @param[out] token Token to store parsed result.
- * @retval   0 Success. For result see @a token.str, token.len,
- *             token.num.
- * @retval > 0 Position of a syntax error. A position is 1-based
- *             and starts from a beginning of a source string.
- */
-int
-json_lexer_next_token(struct json_lexer *lexer, struct json_token *token);
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* TARANTOOL_JSON_PATH_H_INCLUDED */
diff --git a/test/unit/json_path.c b/test/unit/json_path.c
index bb6e5ca..f6b0472 100644
--- a/test/unit/json_path.c
+++ b/test/unit/json_path.c
@@ -1,7 +1,8 @@
-#include "json/path.h"
+#include "json/json.h"
 #include "unit.h"
 #include "trivia/util.h"
 #include <string.h>
+#include <stdbool.h>
 
 #define reset_to_new_path(value) \
 	path = value; \
@@ -12,15 +13,15 @@
 	path = lexer.src + lexer.offset; \
 	is(json_lexer_next_token(&lexer, &token), 0, "parse <%." #value_len "s>", \
 	   path); \
-	is(token.type, JSON_TOKEN_NUM, "<%." #value_len "s> is num", path); \
-	is(token.num, value, "<%." #value_len "s> is " #value, path);
+	is(token.key.type, JSON_TOKEN_NUM, "<%." #value_len "s> is num", path); \
+	is(token.key.num, value, "<%." #value_len "s> is " #value, path);
 
 #define is_next_key(value) \
 	len = strlen(value); \
 	is(json_lexer_next_token(&lexer, &token), 0, "parse <" value ">"); \
-	is(token.type, JSON_TOKEN_STR, "<" value "> is str"); \
-	is(token.len, len, "len is %d", len); \
-	is(strncmp(token.str, value, len), 0, "str is " value);
+	is(token.key.type, JSON_TOKEN_STR, "<" value "> is str"); \
+	is(token.key.len, len, "len is %d", len); \
+	is(strncmp(token.key.str, value, len), 0, "str is " value);
 
 void
 test_basic()
@@ -62,7 +63,7 @@ test_basic()
 	/* Empty path. */
 	reset_to_new_path("");
 	is(json_lexer_next_token(&lexer, &token), 0, "parse empty path");
-	is(token.type, JSON_TOKEN_END, "is str");
+	is(token.key.type, JSON_TOKEN_END, "is str");
 
 	/* Path with no '.' at the beginning. */
 	reset_to_new_path("field1.field2");
@@ -159,14 +160,207 @@ test_errors()
 	footer();
 }
 
+struct test_struct {
+	int value;
+	struct json_token node;
+};
+
+struct test_struct *
+test_struct_alloc(struct test_struct *records_pool, int *pool_idx)
+{
+	struct test_struct *ret = &records_pool[*pool_idx];
+	*pool_idx = *pool_idx + 1;
+	memset(&ret->node, 0, sizeof(ret->node));
+	return ret;
+}
+
+struct test_struct *
+test_add_path(struct json_tree *tree, const char *path, uint32_t path_len,
+	      struct test_struct *records_pool, int *pool_idx)
+{
+	int rc;
+	struct json_lexer lexer;
+	struct json_token *parent = NULL;
+	json_lexer_create(&lexer, path, path_len);
+	struct test_struct *field = test_struct_alloc(records_pool, pool_idx);
+	while ((rc = json_lexer_next_token(&lexer, &field->node)) == 0 &&
+		field->node.key.type != JSON_TOKEN_END) {
+		struct json_token *next =
+			json_tree_lookup(tree, parent, &field->node);
+		if (next == NULL) {
+			rc = json_tree_add(tree, parent, &field->node);
+			fail_if(rc != 0);
+			next = &field->node;
+			field = test_struct_alloc(records_pool, pool_idx);
+		}
+		parent = next;
+	}
+	fail_if(rc != 0 || field->node.key.type != JSON_TOKEN_END);
+	*pool_idx = *pool_idx - 1;
+	/* release field */
+	return json_tree_entry(parent, struct test_struct, node);
+}
+
+void
+test_tree()
+{
+	header();
+	plan(35);
+
+	struct json_tree tree;
+	int rc = json_tree_create(&tree);
+	fail_if(rc != 0);
+
+	struct test_struct records[6];
+	for (int i = 0; i < 6; i++)
+		records[i].value = i;
+
+	const char *path1 = "[1][10]";
+	const char *path2 = "[1][20].file";
+	const char *path_unregistered = "[1][3]";
+
+	int records_idx = 1;
+	struct test_struct *node;
+	node = test_add_path(&tree, path1, strlen(path1), records,
+			     &records_idx);
+	is(node, &records[2], "add path '%s'", path1);
+
+	node = test_add_path(&tree, path2, strlen(path2), records,
+			     &records_idx);
+	is(node, &records[4], "add path '%s'", path2);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path1, strlen(path1),
+					   struct test_struct, node);
+	is(node, &records[2], "lookup path '%s'", path1);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path2, strlen(path2),
+					   struct test_struct, node);
+	is(node, &records[4], "lookup path '%s'", path2);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path_unregistered,
+					   strlen(path_unregistered),
+					   struct test_struct, node);
+	is(node, NULL, "lookup unregistered path '%s'", path_unregistered);
+
+	/* Test iterators. */
+	struct json_token *token = NULL;
+	const struct json_token *tokens_preorder[] =
+		{&records[1].node, &records[2].node,
+		 &records[3].node, &records[4].node};
+	int cnt = sizeof(tokens_preorder)/sizeof(tokens_preorder[0]);
+	int idx = 0;
+
+	json_tree_foreach_preorder(token, &tree.root) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(token, struct test_struct, node);
+		struct test_struct *t2 =
+			json_tree_entry(tokens_preorder[idx],
+					struct test_struct, node);
+		is(token, tokens_preorder[idx],
+		   "test foreach pre order %d: have %d expected of %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	const struct json_token *tree_nodes_postorder[] =
+		{&records[2].node, &records[4].node,
+		 &records[3].node, &records[1].node};
+	cnt = sizeof(tree_nodes_postorder)/sizeof(tree_nodes_postorder[0]);
+	idx = 0;
+	json_tree_foreach_postorder(token, &tree.root) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(token, struct test_struct, node);
+		struct test_struct *t2 =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, node);
+		is(token, tree_nodes_postorder[idx],
+		   "test foreach post order %d: have %d expected of %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_safe(token, &tree.root) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(token, struct test_struct, node);
+		struct test_struct *t2 =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, node);
+		is(token, tree_nodes_postorder[idx],
+		   "test foreach safe order %d: have %d expected of %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_entry_preorder(node, &tree.root, struct test_struct,
+					 node) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t =
+			json_tree_entry(tokens_preorder[idx],
+					struct test_struct, node);
+		is(&node->node, tokens_preorder[idx],
+		   "test foreach entry pre order %d: have %d expected of %d",
+		   idx, node->value, t->value);
+		idx++;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_entry_postorder(node, &tree.root, struct test_struct,
+					  node) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, node);
+		is(&node->node, tree_nodes_postorder[idx],
+		   "test foreach entry post order %d: have %d expected of %d",
+		   idx, node->value, t->value);
+		idx++;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_entry_safe(node, &tree.root, struct test_struct,
+				     node) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, node);
+		is(&node->node, tree_nodes_postorder[idx],
+		   "test foreach entry safe order %d: have %d expected of %d",
+		   idx, node->value, t->value);
+		json_tree_del(&tree, &node->node);
+		idx++;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+	json_tree_destroy(&tree);
+
+	check_plan();
+	footer();
+}
+
 int
 main()
 {
 	header();
-	plan(2);
+	plan(3);
 
 	test_basic();
 	test_errors();
+	test_tree();
 
 	int rc = check_plan();
 	footer();
diff --git a/test/unit/json_path.result b/test/unit/json_path.result
index a2a2f82..df68210 100644
--- a/test/unit/json_path.result
+++ b/test/unit/json_path.result
@@ -1,5 +1,5 @@
 	*** main ***
-1..2
+1..3
 	*** test_basic ***
     1..71
     ok 1 - parse <[0]>
@@ -99,4 +99,43 @@ ok 1 - subtests
     ok 20 - tab inside identifier
 ok 2 - subtests
 	*** test_errors: done ***
+	*** test_tree ***
+    1..35
+    ok 1 - add path '[1][10]'
+    ok 2 - add path '[1][20].file'
+    ok 3 - lookup path '[1][10]'
+    ok 4 - lookup path '[1][20].file'
+    ok 5 - lookup unregistered path '[1][3]'
+    ok 6 - test foreach pre order 0: have 1 expected of 1
+    ok 7 - test foreach pre order 1: have 2 expected of 2
+    ok 8 - test foreach pre order 2: have 3 expected of 3
+    ok 9 - test foreach pre order 3: have 4 expected of 4
+    ok 10 - records iterated count 4 of 4
+    ok 11 - test foreach post order 0: have 2 expected of 2
+    ok 12 - test foreach post order 1: have 4 expected of 4
+    ok 13 - test foreach post order 2: have 3 expected of 3
+    ok 14 - test foreach post order 3: have 1 expected of 1
+    ok 15 - records iterated count 4 of 4
+    ok 16 - test foreach safe order 0: have 2 expected of 2
+    ok 17 - test foreach safe order 1: have 4 expected of 4
+    ok 18 - test foreach safe order 2: have 3 expected of 3
+    ok 19 - test foreach safe order 3: have 1 expected of 1
+    ok 20 - records iterated count 4 of 4
+    ok 21 - test foreach entry pre order 0: have 1 expected of 1
+    ok 22 - test foreach entry pre order 1: have 2 expected of 2
+    ok 23 - test foreach entry pre order 2: have 3 expected of 3
+    ok 24 - test foreach entry pre order 3: have 4 expected of 4
+    ok 25 - records iterated count 4 of 4
+    ok 26 - test foreach entry post order 0: have 2 expected of 2
+    ok 27 - test foreach entry post order 1: have 4 expected of 4
+    ok 28 - test foreach entry post order 2: have 3 expected of 3
+    ok 29 - test foreach entry post order 3: have 1 expected of 1
+    ok 30 - records iterated count 4 of 4
+    ok 31 - test foreach entry safe order 0: have 2 expected of 2
+    ok 32 - test foreach entry safe order 1: have 4 expected of 4
+    ok 33 - test foreach entry safe order 2: have 3 expected of 3
+    ok 34 - test foreach entry safe order 3: have 1 expected of 1
+    ok 35 - records iterated count 4 of 4
+ok 3 - subtests
+	*** test_tree: done ***
 	*** main: done ***
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v5 3/9] box: manage format fields with JSON tree class
  2018-11-26 10:49 [PATCH v5 0/9] box: indexes by JSON path Kirill Shcherbatov
  2018-11-26 10:49 ` [PATCH v5 1/9] box: refactor json_path_parser class Kirill Shcherbatov
  2018-11-26 10:49 ` [PATCH v5 2/9] lib: implement JSON tree class for json library Kirill Shcherbatov
@ 2018-11-26 10:49 ` Kirill Shcherbatov
  2018-11-29 19:07   ` Vladimir Davydov
  2018-11-26 10:49 ` [PATCH v5 4/9] lib: introduce json_path_cmp routine Kirill Shcherbatov
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-11-26 10:49 UTC (permalink / raw)
  To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov

As we are going to work with format fields in a unified way,
we now use the JSON tree class to manage first-level format
fields.

Need for #1012
---
 src/box/sql.c          |  16 +++----
 src/box/sql/build.c    |   5 +-
 src/box/tuple.c        |  10 ++--
 src/box/tuple_format.c | 121 ++++++++++++++++++++++++++++++++++---------------
 src/box/tuple_format.h |  36 ++++++++++++---
 src/box/vy_stmt.c      |   4 +-
 6 files changed, 133 insertions(+), 59 deletions(-)

diff --git a/src/box/sql.c b/src/box/sql.c
index c341800..0e4e0f4 100644
--- a/src/box/sql.c
+++ b/src/box/sql.c
@@ -198,7 +198,8 @@ tarantoolSqlite3TupleColumnFast(BtCursor *pCur, u32 fieldno, u32 *field_size)
 	struct tuple_format *format = tuple_format(pCur->last_tuple);
 	assert(format->exact_field_count == 0
 	       || fieldno < format->exact_field_count);
-	if (format->fields[fieldno].offset_slot == TUPLE_OFFSET_SLOT_NIL)
+	if (tuple_format_field(format, fieldno)->offset_slot ==
+	    TUPLE_OFFSET_SLOT_NIL)
 		return NULL;
 	const char *field = tuple_field(pCur->last_tuple, fieldno);
 	const char *end = field;
@@ -893,7 +894,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
 	struct key_def *key_def;
 	const struct tuple *tuple;
 	const char *base;
-	const struct tuple_format *format;
+	struct tuple_format *format;
 	const uint32_t *field_map;
 	uint32_t field_count, next_fieldno = 0;
 	const char *p, *field0;
@@ -911,7 +912,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
 	base = tuple_data(tuple);
 	format = tuple_format(tuple);
 	field_map = tuple_field_map(tuple);
-	field_count = format->field_count;
+	field_count = tuple_format_field_count(format);
 	field0 = base; mp_decode_array(&field0); p = field0;
 	for (i = 0; i < n; i++) {
 		/*
@@ -929,9 +930,10 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
 		uint32_t fieldno = key_def->parts[i].fieldno;
 
 		if (fieldno != next_fieldno) {
+			struct tuple_field *field =
+				tuple_format_field(format, fieldno);
 			if (fieldno >= field_count ||
-			    format->fields[fieldno].offset_slot ==
-			    TUPLE_OFFSET_SLOT_NIL) {
+			    field->offset_slot == TUPLE_OFFSET_SLOT_NIL) {
 				/* Outdated field_map. */
 				uint32_t j = 0;
 
@@ -939,9 +941,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
 				while (j++ != fieldno)
 					mp_next(&p);
 			} else {
-				p = base + field_map[
-					format->fields[fieldno].offset_slot
-					];
+				p = base + field_map[field->offset_slot];
 			}
 		}
 		next_fieldno = fieldno + 1;
diff --git a/src/box/sql/build.c b/src/box/sql/build.c
index 52f0bde..b5abaee 100644
--- a/src/box/sql/build.c
+++ b/src/box/sql/build.c
@@ -936,8 +936,9 @@ sql_column_collation(struct space_def *def, uint32_t column, uint32_t *coll_id)
 		struct coll_id *collation = coll_by_id(*coll_id);
 		return collation != NULL ? collation->coll : NULL;
 	}
-	*coll_id = space->format->fields[column].coll_id;
-	return space->format->fields[column].coll;
+	struct tuple_field *field = tuple_format_field(space->format, column);
+	*coll_id = field->coll_id;
+	return field->coll;
 }
 
 struct ExprList *
diff --git a/src/box/tuple.c b/src/box/tuple.c
index ef4d16f..aae1c3c 100644
--- a/src/box/tuple.c
+++ b/src/box/tuple.c
@@ -138,7 +138,7 @@ runtime_tuple_delete(struct tuple_format *format, struct tuple *tuple)
 int
 tuple_validate_raw(struct tuple_format *format, const char *tuple)
 {
-	if (format->field_count == 0)
+	if (tuple_format_field_count(format) == 0)
 		return 0; /* Nothing to check */
 
 	/* Check to see if the tuple has a sufficient number of fields. */
@@ -158,10 +158,12 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple)
 	}
 
 	/* Check field types */
-	struct tuple_field *field = &format->fields[0];
+	struct tuple_field *field = tuple_format_field(format, 0);
 	uint32_t i = 0;
-	uint32_t defined_field_count = MIN(field_count, format->field_count);
-	for (; i < defined_field_count; ++i, ++field) {
+	uint32_t defined_field_count =
+		MIN(field_count, tuple_format_field_count(format));
+	for (; i < defined_field_count; ++i) {
+		field = tuple_format_field(format, i);
 		if (key_mp_type_validate(field->type, mp_typeof(*tuple),
 					 ER_FIELD_TYPE, i + TUPLE_INDEX_BASE,
 					 tuple_field_is_nullable(field)))
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index d184dba..92028c5 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -38,10 +38,28 @@ static intptr_t recycled_format_ids = FORMAT_ID_NIL;
 
 static uint32_t formats_size = 0, formats_capacity = 0;
 
-static const struct tuple_field tuple_field_default = {
-	FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false,
-	ON_CONFLICT_ACTION_NONE, NULL, COLL_NONE,
-};
+static struct tuple_field *
+tuple_field_create(struct json_token *token)
+{
+	struct tuple_field *ret = calloc(1, sizeof(struct tuple_field));
+	if (ret == NULL) {
+		diag_set(OutOfMemory, sizeof(struct tuple_field), "malloc",
+			 "ret");
+		return NULL;
+	}
+	ret->type = FIELD_TYPE_ANY;
+	ret->offset_slot = TUPLE_OFFSET_SLOT_NIL;
+	ret->coll_id = COLL_NONE;
+	ret->nullable_action = ON_CONFLICT_ACTION_NONE;
+	ret->token = *token;
+	return ret;
+}
+
+static void
+tuple_field_destroy(struct tuple_field *field)
+{
+	free(field);
+}
 
 static int
 tuple_format_use_key_part(struct tuple_format *format,
@@ -49,8 +67,8 @@ tuple_format_use_key_part(struct tuple_format *format,
 			  const struct key_part *part, bool is_sequential,
 			  int *current_slot)
 {
-	assert(part->fieldno < format->field_count);
-	struct tuple_field *field = &format->fields[part->fieldno];
+	assert(part->fieldno < tuple_format_field_count(format));
+	struct tuple_field *field = tuple_format_field(format, part->fieldno);
 	/*
 		* If a field is not present in the space format,
 		* inherit nullable action of the first key part
@@ -138,16 +156,15 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 	format->min_field_count =
 		tuple_format_min_field_count(keys, key_count, fields,
 					     field_count);
-	if (format->field_count == 0) {
+	if (tuple_format_field_count(format) == 0) {
 		format->field_map_size = 0;
 		return 0;
 	}
 	/* Initialize defined fields */
 	for (uint32_t i = 0; i < field_count; ++i) {
-		format->fields[i].is_key_part = false;
-		format->fields[i].type = fields[i].type;
-		format->fields[i].offset_slot = TUPLE_OFFSET_SLOT_NIL;
-		format->fields[i].nullable_action = fields[i].nullable_action;
+		struct tuple_field *field = tuple_format_field(format, i);
+		field->type = fields[i].type;
+		field->nullable_action = fields[i].nullable_action;
 		struct coll *coll = NULL;
 		uint32_t cid = fields[i].coll_id;
 		if (cid != COLL_NONE) {
@@ -159,12 +176,9 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 			}
 			coll = coll_id->coll;
 		}
-		format->fields[i].coll = coll;
-		format->fields[i].coll_id = cid;
+		field->coll = coll;
+		field->coll_id = cid;
 	}
-	/* Initialize remaining fields */
-	for (uint32_t i = field_count; i < format->field_count; i++)
-		format->fields[i] = tuple_field_default;
 
 	int current_slot = 0;
 
@@ -184,7 +198,8 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 		}
 	}
 
-	assert(format->fields[0].offset_slot == TUPLE_OFFSET_SLOT_NIL);
+	assert(tuple_format_field(format, 0)->offset_slot ==
+		TUPLE_OFFSET_SLOT_NIL);
 	size_t field_map_size = -current_slot * sizeof(uint32_t);
 	if (field_map_size > UINT16_MAX) {
 		/** tuple->data_offset is 16 bits */
@@ -258,39 +273,68 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		}
 	}
 	uint32_t field_count = MAX(space_field_count, index_field_count);
-	uint32_t total = sizeof(struct tuple_format) +
-			 field_count * sizeof(struct tuple_field);
 
-	struct tuple_format *format = (struct tuple_format *) malloc(total);
+	struct tuple_format *format = malloc(sizeof(struct tuple_format));
 	if (format == NULL) {
 		diag_set(OutOfMemory, sizeof(struct tuple_format), "malloc",
 			 "tuple format");
 		return NULL;
 	}
+	if (json_tree_create(&format->tree) != 0) {
+		free(format);
+		return NULL;
+	}
+	struct json_token token;
+	memset(&token, 0, sizeof(token));
+	token.key.type = JSON_TOKEN_NUM;
+	for (token.key.num = TUPLE_INDEX_BASE;
+	     token.key.num < field_count + TUPLE_INDEX_BASE; token.key.num++) {
+		struct tuple_field *field = tuple_field_create(&token);
+		if (field == NULL)
+			goto error;
+		if (json_tree_add(&format->tree, NULL, &field->token) != 0) {
+			tuple_field_destroy(field);
+			goto error;
+		}
+	}
 	if (dict == NULL) {
 		assert(space_field_count == 0);
 		format->dict = tuple_dictionary_new(NULL, 0);
-		if (format->dict == NULL) {
-			free(format);
-			return NULL;
-		}
+		if (format->dict == NULL)
+			goto error;
 	} else {
 		format->dict = dict;
 		tuple_dictionary_ref(dict);
 	}
 	format->refs = 0;
 	format->id = FORMAT_ID_NIL;
-	format->field_count = field_count;
 	format->index_field_count = index_field_count;
 	format->exact_field_count = 0;
 	format->min_field_count = 0;
 	return format;
+error:;
+	struct tuple_field *field;
+	json_tree_foreach_entry_safe(field, &format->tree.root,
+				     struct tuple_field, token) {
+		json_tree_del(&format->tree, &field->token);
+		tuple_field_destroy(field);
+	}
+	json_tree_destroy(&format->tree);
+	free(format);
+	return NULL;
 }
 
 /** Free tuple format resources, doesn't unregister. */
 static inline void
 tuple_format_destroy(struct tuple_format *format)
 {
+	struct tuple_field *field;
+	json_tree_foreach_entry_safe(field, &format->tree.root,
+				     struct tuple_field, token) {
+		json_tree_del(&format->tree, &field->token);
+		tuple_field_destroy(field);
+	}
+	json_tree_destroy(&format->tree);
 	tuple_dictionary_unref(format->dict);
 }
 
@@ -328,18 +372,21 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
 }
 
 bool
-tuple_format1_can_store_format2_tuples(const struct tuple_format *format1,
-				       const struct tuple_format *format2)
+tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
+				       struct tuple_format *format2)
 {
 	if (format1->exact_field_count != format2->exact_field_count)
 		return false;
-	for (uint32_t i = 0; i < format1->field_count; ++i) {
-		const struct tuple_field *field1 = &format1->fields[i];
+	uint32_t format1_field_count = tuple_format_field_count(format1);
+	uint32_t format2_field_count = tuple_format_field_count(format2);
+	for (uint32_t i = 0; i < format1_field_count; ++i) {
+		const struct tuple_field *field1 =
+			tuple_format_field(format1, i);
 		/*
 		 * The field has a data type in format1, but has
 		 * no data type in format2.
 		 */
-		if (i >= format2->field_count) {
+		if (i >= format2_field_count) {
 			/*
 			 * The field can get a name added
 			 * for it, and this doesn't require a data
@@ -355,7 +402,8 @@ tuple_format1_can_store_format2_tuples(const struct tuple_format *format1,
 			else
 				return false;
 		}
-		const struct tuple_field *field2 = &format2->fields[i];
+		const struct tuple_field *field2 =
+			tuple_format_field(format2, i);
 		if (! field_type1_contains_type2(field1->type, field2->type))
 			return false;
 		/*
@@ -374,7 +422,7 @@ int
 tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
 		     const char *tuple, bool validate)
 {
-	if (format->field_count == 0)
+	if (tuple_format_field_count(format) == 0)
 		return 0; /* Nothing to initialize */
 
 	const char *pos = tuple;
@@ -397,17 +445,17 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
 
 	/* first field is simply accessible, so we do not store offset to it */
 	enum mp_type mp_type = mp_typeof(*pos);
-	const struct tuple_field *field = &format->fields[0];
+	const struct tuple_field *field =
+		tuple_format_field((struct tuple_format *)format, 0);
 	if (validate &&
 	    key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
 				 TUPLE_INDEX_BASE, tuple_field_is_nullable(field)))
 		return -1;
 	mp_next(&pos);
 	/* other fields...*/
-	++field;
 	uint32_t i = 1;
 	uint32_t defined_field_count = MIN(field_count, validate ?
-					   format->field_count :
+					   tuple_format_field_count(format) :
 					   format->index_field_count);
 	if (field_count < format->index_field_count) {
 		/*
@@ -417,7 +465,8 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
 		memset((char *)field_map - format->field_map_size, 0,
 		       format->field_map_size);
 	}
-	for (; i < defined_field_count; ++i, ++field) {
+	for (; i < defined_field_count; ++i) {
+		field = tuple_format_field((struct tuple_format *)format, i);
 		mp_type = mp_typeof(*pos);
 		if (validate &&
 		    key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index 232df22..2da773b 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -34,6 +34,7 @@
 #include "key_def.h"
 #include "field_def.h"
 #include "errinj.h"
+#include "json/json.h"
 #include "tuple_dictionary.h"
 
 #if defined(__cplusplus)
@@ -113,6 +114,8 @@ struct tuple_field {
 	struct coll *coll;
 	/** Collation identifier. */
 	uint32_t coll_id;
+	/** A JSON token used to organize the field tree. */
+	struct json_token token;
 };
 
 /**
@@ -166,16 +169,33 @@ struct tuple_format {
 	 * index_field_count <= min_field_count <= field_count.
 	 */
 	uint32_t min_field_count;
-	/* Length of 'fields' array. */
-	uint32_t field_count;
 	/**
 	 * Shared names storage used by all formats of a space.
 	 */
 	struct tuple_dictionary *dict;
-	/* Formats of the fields */
-	struct tuple_field fields[0];
+	/** JSON tree of fields. */
+	struct json_tree tree;
 };
 
+
+static inline uint32_t
+tuple_format_field_count(const struct tuple_format *format)
+{
+	return format->tree.root.child_count;
+}
+
+static inline struct tuple_field *
+tuple_format_field(struct tuple_format *format, uint32_t fieldno)
+{
+	assert(fieldno < tuple_format_field_count(format));
+	struct json_token token = {
+		.key.type = JSON_TOKEN_NUM,
+		.key.num = fieldno + TUPLE_INDEX_BASE
+	};
+	return json_tree_lookup_entry(&format->tree, NULL, &token,
+				      struct tuple_field, token);
+}
+
 extern struct tuple_format **tuple_formats;
 
 static inline uint32_t
@@ -238,8 +258,8 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
  * @retval True, if @a format1 can store any tuples of @a format2.
  */
 bool
-tuple_format1_can_store_format2_tuples(const struct tuple_format *format1,
-				       const struct tuple_format *format2);
+tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
+				       struct tuple_format *format2);
 
 /**
  * Calculate minimal field count of tuples with specified keys and
@@ -333,7 +353,9 @@ tuple_field_raw(const struct tuple_format *format, const char *tuple,
 			return tuple;
 		}
 
-		int32_t offset_slot = format->fields[field_no].offset_slot;
+		int32_t offset_slot =
+			tuple_format_field((struct tuple_format *)format,
+					   field_no)->offset_slot;
 		if (offset_slot != TUPLE_OFFSET_SLOT_NIL) {
 			if (field_map[offset_slot] != 0)
 				return tuple + field_map[offset_slot];
diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c
index d838404..3e60fec 100644
--- a/src/box/vy_stmt.c
+++ b/src/box/vy_stmt.c
@@ -411,7 +411,7 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
 	uint32_t *field_map = (uint32_t *) raw;
 	char *wpos = mp_encode_array(raw, field_count);
 	for (uint32_t i = 0; i < field_count; ++i) {
-		const struct tuple_field *field = &format->fields[i];
+		const struct tuple_field *field = tuple_format_field(format, i);
 		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL)
 			field_map[field->offset_slot] = wpos - raw;
 		if (iov[i].iov_base == NULL) {
@@ -465,7 +465,7 @@ vy_stmt_new_surrogate_delete_raw(struct tuple_format *format,
 	}
 	char *pos = mp_encode_array(data, field_count);
 	for (uint32_t i = 0; i < field_count; ++i) {
-		const struct tuple_field *field = &format->fields[i];
+		const struct tuple_field *field = tuple_format_field(format, i);
 		if (! field->is_key_part) {
 			/* Unindexed field - write NIL. */
 			assert(i < src_count);
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v5 4/9] lib: introduce json_path_cmp routine
  2018-11-26 10:49 [PATCH v5 0/9] box: indexes by JSON path Kirill Shcherbatov
                   ` (2 preceding siblings ...)
  2018-11-26 10:49 ` [PATCH v5 3/9] box: manage format fields with JSON tree class Kirill Shcherbatov
@ 2018-11-26 10:49 ` Kirill Shcherbatov
  2018-11-30 10:46   ` Vladimir Davydov
  2018-11-26 10:49 ` [tarantool-patches] [PATCH v5 5/9] box: introduce JSON indexes Kirill Shcherbatov
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-11-26 10:49 UTC (permalink / raw)
  To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov

Introduce a new json_path_cmp routine as part of the JSON library
to compare JSON paths that may have different representations.

Need for #1012
---
 src/lib/json/json.c        | 28 ++++++++++++++++++++++++++++
 src/lib/json/json.h        | 11 +++++++++++
 test/unit/json_path.c      | 31 ++++++++++++++++++++++++++++++-
 test/unit/json_path.result | 12 +++++++++++-
 4 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/src/lib/json/json.c b/src/lib/json/json.c
index 9198dca..20fbbba 100644
--- a/src/lib/json/json.c
+++ b/src/lib/json/json.c
@@ -507,3 +507,31 @@ json_tree_postorder_next(struct json_token *root, struct json_token *pos)
 	}
 	return pos->parent != root ? pos->parent : NULL;
 }
+
+int
+json_path_cmp(const char *a, uint32_t a_len, const char *b, uint32_t b_len)
+{
+	struct json_lexer lexer_a, lexer_b;
+	json_lexer_create(&lexer_a, a, a_len);
+	json_lexer_create(&lexer_b, b, b_len);
+	struct json_token token_a, token_b;
+	int rc_a, rc_b;
+	while ((rc_a = json_lexer_next_token(&lexer_a, &token_a)) == 0 &&
+		(rc_b = json_lexer_next_token(&lexer_b, &token_b)) == 0 &&
+		token_a.key.type != JSON_TOKEN_END &&
+		token_b.key.type != JSON_TOKEN_END) {
+		int rc = json_token_key_cmp(&token_a, &token_b);
+		if (rc != 0)
+			return rc;
+	}
+	/* Path "a" should be valid. */
+	assert(rc_a == 0);
+	if (rc_b != 0)
+		return rc_b;
+	/*
+	 * The parser stopped because the end of one of the paths
+	 * was reached. As JSON_TOKEN_END > JSON_TOKEN_{NUM, STR},
+	 * the path having more tokens has lower key.type value.
+	 */
+	return token_b.key.type - token_a.key.type;
+}
diff --git a/src/lib/json/json.h b/src/lib/json/json.h
index dd09f5a..7d46601 100644
--- a/src/lib/json/json.h
+++ b/src/lib/json/json.h
@@ -137,6 +137,17 @@ json_lexer_create(struct json_lexer *lexer, const char *src, int src_len)
 int
 json_lexer_next_token(struct json_lexer *lexer, struct json_token *token);
 
+/**
+ * Compare two JSON paths using the lexer.
+ * - path @a a must be valid;
+ * - if the paths share the same token-sequence prefix,
+ *   the path having more tokens is assumed to be greater;
+ * - if path @a b contains an error, path @a a is assumed to
+ *   be greater.
+ */
+int
+json_path_cmp(const char *a, uint32_t a_len, const char *b, uint32_t b_len);
+
 /** Create a JSON tree object to manage data relations. */
 int
 json_tree_create(struct json_tree *tree);
diff --git a/test/unit/json_path.c b/test/unit/json_path.c
index f6b0472..35c2164 100644
--- a/test/unit/json_path.c
+++ b/test/unit/json_path.c
@@ -352,15 +352,44 @@ test_tree()
 	footer();
 }
 
+void
+test_path_cmp()
+{
+	const char *a = "Data[1][\"FIO\"].fname";
+	uint32_t a_len = strlen(a);
+	const struct path_and_errpos rc[] = {
+		{a, 0},
+		{"[\"Data\"][1].FIO[\"fname\"]", 0},
+		{"Data[[1][\"FIO\"].fname", 6},
+		{"Data[1]", 1},
+		{"Data[1][\"FIO\"].fname[1]", -2},
+		{"Data[1][\"Info\"].fname[1]", -1},
+	};
+	header();
+	plan(lengthof(rc));
+
+	for (size_t i = 0; i < lengthof(rc); ++i) {
+		const char *path = rc[i].path;
+		int errpos = rc[i].errpos;
+		int rc = json_path_cmp(a, a_len, path, strlen(path));
+		is(rc, errpos, "path cmp result \"%s\" with \"%s\": "
+		   "have %d, expected %d", a, path, rc, errpos);
+	}
+
+	check_plan();
+	footer();
+}
+
 int
 main()
 {
 	header();
-	plan(3);
+	plan(4);
 
 	test_basic();
 	test_errors();
 	test_tree();
+	test_path_cmp();
 
 	int rc = check_plan();
 	footer();
diff --git a/test/unit/json_path.result b/test/unit/json_path.result
index df68210..5c1de38 100644
--- a/test/unit/json_path.result
+++ b/test/unit/json_path.result
@@ -1,5 +1,5 @@
 	*** main ***
-1..3
+1..4
 	*** test_basic ***
     1..71
     ok 1 - parse <[0]>
@@ -138,4 +138,14 @@ ok 2 - subtests
     ok 35 - records iterated count 4 of 4
 ok 3 - subtests
 	*** test_tree: done ***
+	*** test_path_cmp ***
+    1..6
+    ok 1 - path cmp result "Data[1]["FIO"].fname" with "Data[1]["FIO"].fname": have 0, expected 0
+    ok 2 - path cmp result "Data[1]["FIO"].fname" with "["Data"][1].FIO["fname"]": have 0, expected 0
+    ok 3 - path cmp result "Data[1]["FIO"].fname" with "Data[[1]["FIO"].fname": have 6, expected 6
+    ok 4 - path cmp result "Data[1]["FIO"].fname" with "Data[1]": have 1, expected 1
+    ok 5 - path cmp result "Data[1]["FIO"].fname" with "Data[1]["FIO"].fname[1]": have -2, expected -2
+    ok 6 - path cmp result "Data[1]["FIO"].fname" with "Data[1]["Info"].fname[1]": have -1, expected -1
+ok 4 - subtests
+	*** test_path_cmp: done ***
 	*** main: done ***
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tarantool-patches] [PATCH v5 5/9] box: introduce JSON indexes
  2018-11-26 10:49 [PATCH v5 0/9] box: indexes by JSON path Kirill Shcherbatov
                   ` (3 preceding siblings ...)
  2018-11-26 10:49 ` [PATCH v5 4/9] lib: introduce json_path_cmp routine Kirill Shcherbatov
@ 2018-11-26 10:49 ` Kirill Shcherbatov
  2018-11-30 21:28   ` Vladimir Davydov
  2018-11-26 10:49 ` [PATCH v5 6/9] box: introduce has_json_paths flag in templates Kirill Shcherbatov
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-11-26 10:49 UTC (permalink / raw)
  To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov

New JSON-path-based indexes allow indexing document content.
As we need to store a user-defined JSON path in key_part
and key_part_def, new path and path_len fields have been
introduced. A JSON path is verified and transformed to
canonical form when the index msgpack is unpacked.
The path string is stored as part of the key_def allocation:

+-------+---------+-------+---------+-------+-------+-------+
|key_def|key_part1|  ...  |key_partN| path1 | pathK | pathN |
+-------+---------+-------+---------+-------+-------+-------+
          |                         ^
          |-> path _________________|

On format creation, JSON paths are stored at the end of the
format allocation:
+------------+------------+-------+------------+-------+
|tuple_format|tuple_field1|  ...  |tuple_fieldN| pathK |
+------------+------------+-------+------------+-------+

Part of #1012
---
 src/box/errcode.h            |   2 +-
 src/box/index_def.c          |   8 +-
 src/box/key_def.c            | 164 +++++++++++++---
 src/box/key_def.h            |  23 ++-
 src/box/lua/space.cc         |   5 +
 src/box/memtx_engine.c       |   3 +
 src/box/sql.c                |   1 +
 src/box/sql/build.c          |   1 +
 src/box/sql/select.c         |   6 +-
 src/box/sql/where.c          |   1 +
 src/box/tuple.c              |  38 +---
 src/box/tuple_compare.cc     |  13 +-
 src/box/tuple_extract_key.cc |  21 ++-
 src/box/tuple_format.c       | 439 ++++++++++++++++++++++++++++++++++++-------
 src/box/tuple_format.h       |  38 +++-
 src/box/tuple_hash.cc        |   2 +-
 src/box/vinyl.c              |   3 +
 src/box/vy_log.c             |   3 +-
 src/box/vy_point_lookup.c    |   2 -
 src/box/vy_stmt.c            | 166 +++++++++++++---
 test/box/misc.result         |   1 +
 test/engine/tuple.result     | 416 ++++++++++++++++++++++++++++++++++++++++
 test/engine/tuple.test.lua   | 121 ++++++++++++
 23 files changed, 1306 insertions(+), 171 deletions(-)

diff --git a/src/box/errcode.h b/src/box/errcode.h
index 73359eb..2f979ab 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -138,7 +138,7 @@ struct errcode_record {
 	/* 83 */_(ER_ROLE_EXISTS,		"Role '%s' already exists") \
 	/* 84 */_(ER_CREATE_ROLE,		"Failed to create role '%s': %s") \
 	/* 85 */_(ER_INDEX_EXISTS,		"Index '%s' already exists") \
-	/* 86 */_(ER_UNUSED6,			"") \
+	/* 86 */_(ER_DATA_STRUCTURE_MISMATCH,	"Tuple doesn't match document structure: %s") \
 	/* 87 */_(ER_ROLE_LOOP,			"Granting role '%s' to role '%s' would create a loop") \
 	/* 88 */_(ER_GRANT,			"Incorrect grant arguments: %s") \
 	/* 89 */_(ER_PRIV_GRANTED,		"User '%s' already has %s access on %s '%s'") \
diff --git a/src/box/index_def.c b/src/box/index_def.c
index 45c74d9..de4ea85 100644
--- a/src/box/index_def.c
+++ b/src/box/index_def.c
@@ -31,6 +31,7 @@
 #include "index_def.h"
 #include "schema_def.h"
 #include "identifier.h"
+#include "json/json.h"
 
 const char *index_type_strs[] = { "HASH", "TREE", "BITSET", "RTREE" };
 
@@ -298,8 +299,11 @@ index_def_is_valid(struct index_def *index_def, const char *space_name)
 			 * Courtesy to a user who could have made
 			 * a typo.
 			 */
-			if (index_def->key_def->parts[i].fieldno ==
-			    index_def->key_def->parts[j].fieldno) {
+			struct key_part *part_a = &index_def->key_def->parts[i];
+			struct key_part *part_b = &index_def->key_def->parts[j];
+			if (part_a->fieldno == part_b->fieldno &&
+			    json_path_cmp(part_a->path, part_a->path_len,
+					  part_b->path, part_b->path_len) == 0) {
 				diag_set(ClientError, ER_MODIFY_INDEX,
 					 index_def->name, space_name,
 					 "same key part is indexed twice");
diff --git a/src/box/key_def.c b/src/box/key_def.c
index 2119ca3..bc6cecd 100644
--- a/src/box/key_def.c
+++ b/src/box/key_def.c
@@ -28,6 +28,8 @@
  * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
+#include "fiber.h"
+#include "json/json.h"
 #include "key_def.h"
 #include "tuple_compare.h"
 #include "tuple_extract_key.h"
@@ -44,7 +46,8 @@ const struct key_part_def key_part_def_default = {
 	COLL_NONE,
 	false,
 	ON_CONFLICT_ACTION_DEFAULT,
-	SORT_ORDER_ASC
+	SORT_ORDER_ASC,
+	NULL
 };
 
 static int64_t
@@ -59,6 +62,7 @@ part_type_by_name_wrapper(const char *str, uint32_t len)
 #define PART_OPT_NULLABILITY	 "is_nullable"
 #define PART_OPT_NULLABLE_ACTION "nullable_action"
 #define PART_OPT_SORT_ORDER	 "sort_order"
+#define PART_OPT_PATH		 "path"
 
 const struct opt_def part_def_reg[] = {
 	OPT_DEF_ENUM(PART_OPT_TYPE, field_type, struct key_part_def, type,
@@ -71,6 +75,7 @@ const struct opt_def part_def_reg[] = {
 		     struct key_part_def, nullable_action, NULL),
 	OPT_DEF_ENUM(PART_OPT_SORT_ORDER, sort_order, struct key_part_def,
 		     sort_order, NULL),
+	OPT_DEF(PART_OPT_PATH, OPT_STRPTR, struct key_part_def, path),
 	OPT_END,
 };
 
@@ -106,13 +111,25 @@ const uint32_t key_mp_type[] = {
 struct key_def *
 key_def_dup(const struct key_def *src)
 {
-	size_t sz = key_def_sizeof(src->part_count);
-	struct key_def *res = (struct key_def *)malloc(sz);
+	const struct key_part *parts = src->parts;
+	const struct key_part *parts_end = parts + src->part_count;
+	size_t sz = 0;
+	for (; parts < parts_end; parts++)
+		sz += parts->path != NULL ? parts->path_len + 1 : 0;
+	sz = key_def_sizeof(src->part_count, sz);
+	struct key_def *res = (struct key_def *)calloc(1, sz);
 	if (res == NULL) {
 		diag_set(OutOfMemory, sz, "malloc", "res");
 		return NULL;
 	}
 	memcpy(res, src, sz);
+	/* Update paths to point to the new memory chunk. */
+	for (uint32_t i = 0; i < src->part_count; i++) {
+		if (src->parts[i].path == NULL)
+			continue;
+		size_t path_offset = src->parts[i].path - (char *)src;
+		res->parts[i].path = (char *)res + path_offset;
+	}
 	return res;
 }
 
@@ -120,8 +137,23 @@ void
 key_def_swap(struct key_def *old_def, struct key_def *new_def)
 {
 	assert(old_def->part_count == new_def->part_count);
-	for (uint32_t i = 0; i < new_def->part_count; i++)
-		SWAP(old_def->parts[i], new_def->parts[i]);
+	for (uint32_t i = 0; i < new_def->part_count; i++) {
+		if (old_def->parts[i].path == NULL) {
+			SWAP(old_def->parts[i], new_def->parts[i]);
+		} else {
+			/*
+			 * Since the paths are laid out in
+			 * memory in the same order (otherwise
+			 * a rebuild would be required), just
+			 * update the pointers.
+			 */
+			size_t path_offset =
+				old_def->parts[i].path - (char *)old_def;
+			SWAP(old_def->parts[i], new_def->parts[i]);
+			old_def->parts[i].path = (char *)old_def + path_offset;
+			new_def->parts[i].path = (char *)new_def + path_offset;
+		}
+	}
 	SWAP(*old_def, *new_def);
 }
 
@@ -144,24 +176,38 @@ static void
 key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
 		 enum field_type type, enum on_conflict_action nullable_action,
 		 struct coll *coll, uint32_t coll_id,
-		 enum sort_order sort_order)
+		 enum sort_order sort_order, const char *path,
+		 uint32_t path_len)
 {
 	assert(part_no < def->part_count);
 	assert(type < field_type_MAX);
 	def->is_nullable |= (nullable_action == ON_CONFLICT_ACTION_NONE);
+	def->has_json_paths |= path != NULL;
 	def->parts[part_no].nullable_action = nullable_action;
 	def->parts[part_no].fieldno = fieldno;
 	def->parts[part_no].type = type;
 	def->parts[part_no].coll = coll;
 	def->parts[part_no].coll_id = coll_id;
 	def->parts[part_no].sort_order = sort_order;
+	if (path != NULL) {
+		def->parts[part_no].path_len = path_len;
+		assert(def->parts[part_no].path != NULL);
+		memcpy(def->parts[part_no].path, path, path_len);
+		def->parts[part_no].path[path_len] = '\0';
+	} else {
+		def->parts[part_no].path_len = 0;
+		def->parts[part_no].path = NULL;
+	}
 	column_mask_set_fieldno(&def->column_mask, fieldno);
 }
 
 struct key_def *
 key_def_new(const struct key_part_def *parts, uint32_t part_count)
 {
-	size_t sz = key_def_sizeof(part_count);
+	ssize_t sz = 0;
+	for (uint32_t i = 0; i < part_count; i++)
+		sz += parts[i].path != NULL ? strlen(parts[i].path) + 1 : 0;
+	sz = key_def_sizeof(part_count, sz);
 	struct key_def *def = calloc(1, sz);
 	if (def == NULL) {
 		diag_set(OutOfMemory, sz, "malloc", "struct key_def");
@@ -171,6 +217,7 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count)
 	def->part_count = part_count;
 	def->unique_part_count = part_count;
 
+	char *data = (char *)def + key_def_sizeof(part_count, 0);
 	for (uint32_t i = 0; i < part_count; i++) {
 		const struct key_part_def *part = &parts[i];
 		struct coll *coll = NULL;
@@ -184,16 +231,23 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count)
 			}
 			coll = coll_id->coll;
 		}
+		uint32_t path_len = 0;
+		if (part->path != NULL) {
+			path_len = strlen(part->path);
+			def->parts[i].path = data;
+			data += path_len + 1;
+		}
 		key_def_set_part(def, i, part->fieldno, part->type,
 				 part->nullable_action, coll, part->coll_id,
-				 part->sort_order);
+				 part->sort_order, part->path, path_len);
 	}
 	key_def_set_cmp(def);
 	return def;
 }
 
-void
-key_def_dump_parts(const struct key_def *def, struct key_part_def *parts)
+int
+key_def_dump_parts(struct region *pool, const struct key_def *def,
+		   struct key_part_def *parts)
 {
 	for (uint32_t i = 0; i < def->part_count; i++) {
 		const struct key_part *part = &def->parts[i];
@@ -203,13 +257,27 @@ key_def_dump_parts(const struct key_def *def, struct key_part_def *parts)
 		part_def->is_nullable = key_part_is_nullable(part);
 		part_def->nullable_action = part->nullable_action;
 		part_def->coll_id = part->coll_id;
+		if (part->path != NULL) {
+			char *path  = region_alloc(pool, part->path_len + 1);
+			if (path == NULL) {
+				diag_set(OutOfMemory, part->path_len + 1,
+					 "region_alloc", "part_def->path");
+				return -1;
+			}
+			memcpy(path, part->path, part->path_len);
+			path[part->path_len] = '\0';
+			part_def->path = path;
+		} else {
+			part_def->path = NULL;
+		}
 	}
+	return 0;
 }
 
 box_key_def_t *
 box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count)
 {
-	size_t sz = key_def_sizeof(part_count);
+	size_t sz = key_def_sizeof(part_count, 0);
 	struct key_def *key_def = calloc(1, sz);
 	if (key_def == NULL) {
 		diag_set(OutOfMemory, sz, "malloc", "struct key_def");
@@ -223,7 +291,7 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count)
 		key_def_set_part(key_def, item, fields[item],
 				 (enum field_type)types[item],
 				 ON_CONFLICT_ACTION_DEFAULT,
-				 NULL, COLL_NONE, SORT_ORDER_ASC);
+				 NULL, COLL_NONE, SORT_ORDER_ASC, NULL, 0);
 	}
 	key_def_set_cmp(key_def);
 	return key_def;
@@ -272,6 +340,10 @@ key_part_cmp(const struct key_part *parts1, uint32_t part_count1,
 		if (key_part_is_nullable(part1) != key_part_is_nullable(part2))
 			return key_part_is_nullable(part1) <
 			       key_part_is_nullable(part2) ? -1 : 1;
+		int rc;
+		if ((rc = json_path_cmp(part1->path, part1->path_len,
+					part2->path, part2->path_len)) != 0)
+			return rc;
 	}
 	return part_count1 < part_count2 ? -1 : part_count1 > part_count2;
 }
@@ -303,8 +375,15 @@ key_def_snprint_parts(char *buf, int size, const struct key_part_def *parts,
 	for (uint32_t i = 0; i < part_count; i++) {
 		const struct key_part_def *part = &parts[i];
 		assert(part->type < field_type_MAX);
-		SNPRINT(total, snprintf, buf, size, "%d, '%s'",
-			(int)part->fieldno, field_type_strs[part->type]);
+		if (part->path != NULL) {
+			SNPRINT(total, snprintf, buf, size, "%d, '%s', '%s'",
+				(int)part->fieldno, part->path,
+				field_type_strs[part->type]);
+		} else {
+			SNPRINT(total, snprintf, buf, size, "%d, '%s'",
+				(int)part->fieldno,
+				field_type_strs[part->type]);
+		}
 		if (i < part_count - 1)
 			SNPRINT(total, snprintf, buf, size, ", ");
 	}
@@ -323,6 +402,8 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
 			count++;
 		if (part->is_nullable)
 			count++;
+		if (part->path != NULL)
+			count++;
 		size += mp_sizeof_map(count);
 		size += mp_sizeof_str(strlen(PART_OPT_FIELD));
 		size += mp_sizeof_uint(part->fieldno);
@@ -337,6 +418,10 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
 			size += mp_sizeof_str(strlen(PART_OPT_NULLABILITY));
 			size += mp_sizeof_bool(part->is_nullable);
 		}
+		if (part->path != NULL) {
+			size += mp_sizeof_str(strlen(PART_OPT_PATH));
+			size += mp_sizeof_str(strlen(part->path));
+		}
 	}
 	return size;
 }
@@ -352,6 +437,8 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
 			count++;
 		if (part->is_nullable)
 			count++;
+		if (part->path != NULL)
+			count++;
 		data = mp_encode_map(data, count);
 		data = mp_encode_str(data, PART_OPT_FIELD,
 				     strlen(PART_OPT_FIELD));
@@ -371,6 +458,12 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
 					     strlen(PART_OPT_NULLABILITY));
 			data = mp_encode_bool(data, part->is_nullable);
 		}
+		if (part->path != NULL) {
+			data = mp_encode_str(data, PART_OPT_PATH,
+					     strlen(PART_OPT_PATH));
+			data = mp_encode_str(data, part->path,
+					     strlen(part->path));
+		}
 	}
 	return data;
 }
@@ -432,6 +525,7 @@ key_def_decode_parts_166(struct key_part_def *parts, uint32_t part_count,
 				     fields[part->fieldno].is_nullable :
 				     key_part_def_default.is_nullable);
 		part->coll_id = COLL_NONE;
+		part->path = NULL;
 	}
 	return 0;
 }
@@ -445,6 +539,7 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
 		return key_def_decode_parts_166(parts, part_count, data,
 						fields, field_count);
 	}
+	struct region *region = &fiber()->gc;
 	for (uint32_t i = 0; i < part_count; i++) {
 		struct key_part_def *part = &parts[i];
 		if (mp_typeof(**data) != MP_MAP) {
@@ -468,7 +563,7 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
 			const char *key = mp_decode_str(data, &key_len);
 			if (opts_parse_key(part, part_def_reg, key, key_len, data,
 					   ER_WRONG_INDEX_OPTIONS,
-					   i + TUPLE_INDEX_BASE, NULL,
+					   i + TUPLE_INDEX_BASE, region,
 					   false) != 0)
 				return -1;
 			if (is_action_missing &&
@@ -533,7 +628,9 @@ key_def_find(const struct key_def *key_def, const struct key_part *to_find)
 	const struct key_part *part = key_def->parts;
 	const struct key_part *end = part + key_def->part_count;
 	for (; part != end; part++) {
-		if (part->fieldno == to_find->fieldno)
+		if (part->fieldno == to_find->fieldno &&
+		    json_path_cmp(part->path, part->path_len,
+				  to_find->path, to_find->path_len) == 0)
 			return part;
 	}
 	return NULL;
@@ -559,18 +656,27 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
 	 * Find and remove part duplicates, i.e. parts counted
 	 * twice since they are present in both key defs.
 	 */
-	const struct key_part *part = second->parts;
-	const struct key_part *end = part + second->part_count;
+	size_t sz = 0;
+	const struct key_part *part = first->parts;
+	const struct key_part *end = part + first->part_count;
+	for (; part != end; part++) {
+		if (part->path != NULL)
+			sz += part->path_len + 1;
+	}
+	part = second->parts;
+	end = part + second->part_count;
 	for (; part != end; part++) {
 		if (key_def_find(first, part) != NULL)
 			--new_part_count;
+		else if (part->path != NULL)
+			sz += part->path_len + 1;
 	}
 
+	sz = key_def_sizeof(new_part_count, sz);
 	struct key_def *new_def;
-	new_def =  (struct key_def *)calloc(1, key_def_sizeof(new_part_count));
+	new_def =  (struct key_def *)calloc(1, sz);
 	if (new_def == NULL) {
-		diag_set(OutOfMemory, key_def_sizeof(new_part_count), "malloc",
-			 "new_def");
+		diag_set(OutOfMemory, sz, "malloc", "new_def");
 		return NULL;
 	}
 	new_def->part_count = new_part_count;
@@ -578,15 +684,22 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
 	new_def->is_nullable = first->is_nullable || second->is_nullable;
 	new_def->has_optional_parts = first->has_optional_parts ||
 				      second->has_optional_parts;
+	/* Path data write position in the new key_def. */
+	char *data = (char *)new_def + key_def_sizeof(new_part_count, 0);
 	/* Write position in the new key def. */
 	uint32_t pos = 0;
 	/* Append first key def's parts to the new index_def. */
 	part = first->parts;
 	end = part + first->part_count;
 	for (; part != end; part++) {
+		if (part->path != NULL) {
+			new_def->parts[pos].path = data;
+			data += part->path_len + 1;
+		}
 		key_def_set_part(new_def, pos++, part->fieldno, part->type,
 				 part->nullable_action, part->coll,
-				 part->coll_id, part->sort_order);
+				 part->coll_id, part->sort_order, part->path,
+				 part->path_len);
 	}
 
 	/* Set-append second key def's part to the new key def. */
@@ -595,9 +708,14 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
 	for (; part != end; part++) {
 		if (key_def_find(first, part) != NULL)
 			continue;
+		if (part->path != NULL) {
+			new_def->parts[pos].path = data;
+			data += part->path_len + 1;
+		}
 		key_def_set_part(new_def, pos++, part->fieldno, part->type,
 				 part->nullable_action, part->coll,
-				 part->coll_id, part->sort_order);
+				 part->coll_id, part->sort_order, part->path,
+				 part->path_len);
 	}
 	key_def_set_cmp(new_def);
 	return new_def;
diff --git a/src/box/key_def.h b/src/box/key_def.h
index d4da6c5..7731e48 100644
--- a/src/box/key_def.h
+++ b/src/box/key_def.h
@@ -68,6 +68,8 @@ struct key_part_def {
 	enum on_conflict_action nullable_action;
 	/** Part sort order. */
 	enum sort_order sort_order;
+	/** JSON path to data. */
+	const char *path;
 };
 
 extern const struct key_part_def key_part_def_default;
@@ -86,6 +88,13 @@ struct key_part {
 	enum on_conflict_action nullable_action;
 	/** Part sort order. */
 	enum sort_order sort_order;
+	/**
+	 * JSON path to data in 'canonical' form.
+	 * See json_path_normalize for details.
+	 */
+	char *path;
+	/** The length of JSON path. */
+	uint32_t path_len;
 };
 
 struct key_def;
@@ -152,6 +161,8 @@ struct key_def {
 	uint32_t unique_part_count;
 	/** True, if at least one part can store NULL. */
 	bool is_nullable;
+	/** True, if some key part has JSON path. */
+	bool has_json_paths;
 	/**
 	 * True, if some key parts can be absent in a tuple. These
 	 * fields assumed to be MP_NIL.
@@ -245,9 +256,10 @@ box_tuple_compare_with_key(const box_tuple_t *tuple_a, const char *key_b,
 /** \endcond public */
 
 static inline size_t
-key_def_sizeof(uint32_t part_count)
+key_def_sizeof(uint32_t part_count, uint32_t paths_size)
 {
-	return sizeof(struct key_def) + sizeof(struct key_part) * part_count;
+	return sizeof(struct key_def) + sizeof(struct key_part) * part_count +
+	       paths_size;
 }
 
 /**
@@ -260,8 +272,9 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count);
 /**
  * Dump part definitions of the given key def.
  */
-void
-key_def_dump_parts(const struct key_def *def, struct key_part_def *parts);
+int
+key_def_dump_parts(struct region *pool, const struct key_def *def,
+		   struct key_part_def *parts);
 
 /**
  * Update 'has_optional_parts' of @a key_def with correspondence
@@ -368,6 +381,8 @@ key_validate_parts(const struct key_def *key_def, const char *key,
 static inline bool
 key_def_is_sequential(const struct key_def *key_def)
 {
+	if (key_def->has_json_paths)
+		return false;
 	for (uint32_t part_id = 0; part_id < key_def->part_count; part_id++) {
 		if (key_def->parts[part_id].fieldno != part_id)
 			return false;
diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc
index 7cae436..a882a9d 100644
--- a/src/box/lua/space.cc
+++ b/src/box/lua/space.cc
@@ -296,6 +296,11 @@ lbox_fillspace(struct lua_State *L, struct space *space, int i)
 			lua_pushnumber(L, part->fieldno + TUPLE_INDEX_BASE);
 			lua_setfield(L, -2, "fieldno");
 
+			if (part->path != NULL) {
+				lua_pushstring(L, part->path);
+				lua_setfield(L, -2, "path");
+			}
+
 			lua_pushboolean(L, key_part_is_nullable(part));
 			lua_setfield(L, -2, "is_nullable");
 
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 28afb32..1bc46c6 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -1316,6 +1316,9 @@ memtx_index_def_change_requires_rebuild(struct index *index,
 			return true;
 		if (old_part->coll != new_part->coll)
 			return true;
+		if (json_path_cmp(old_part->path, old_part->path_len,
+				  new_part->path, new_part->path_len) != 0)
+			return true;
 	}
 	return false;
 }
diff --git a/src/box/sql.c b/src/box/sql.c
index 0e4e0f4..d199171 100644
--- a/src/box/sql.c
+++ b/src/box/sql.c
@@ -378,6 +378,7 @@ sql_ephemeral_space_create(uint32_t field_count, struct sql_key_info *key_info)
 		part->nullable_action = ON_CONFLICT_ACTION_NONE;
 		part->is_nullable = true;
 		part->sort_order = SORT_ORDER_ASC;
+		part->path = NULL;
 		if (def != NULL && i < def->part_count)
 			part->coll_id = def->parts[i].coll_id;
 		else
diff --git a/src/box/sql/build.c b/src/box/sql/build.c
index b5abaee..9f5d5aa 100644
--- a/src/box/sql/build.c
+++ b/src/box/sql/build.c
@@ -2423,6 +2423,7 @@ index_fill_def(struct Parse *parse, struct index *index,
 		part->is_nullable = part->nullable_action == ON_CONFLICT_ACTION_NONE;
 		part->sort_order = SORT_ORDER_ASC;
 		part->coll_id = coll_id;
+		part->path = NULL;
 	}
 	key_def = key_def_new(key_parts, expr_list->nExpr);
 	if (key_def == NULL)
diff --git a/src/box/sql/select.c b/src/box/sql/select.c
index ca709b4..0734712 100644
--- a/src/box/sql/select.c
+++ b/src/box/sql/select.c
@@ -1349,6 +1349,7 @@ sql_key_info_new(sqlite3 *db, uint32_t part_count)
 		part->is_nullable = false;
 		part->nullable_action = ON_CONFLICT_ACTION_ABORT;
 		part->sort_order = SORT_ORDER_ASC;
+		part->path = NULL;
 	}
 	return key_info;
 }
@@ -1356,6 +1357,9 @@ sql_key_info_new(sqlite3 *db, uint32_t part_count)
 struct sql_key_info *
 sql_key_info_new_from_key_def(sqlite3 *db, const struct key_def *key_def)
 {
+	/* SQL key_parts cannot have JSON paths. */
+	for (uint32_t i = 0; i < key_def->part_count; i++)
+		assert(key_def->parts[i].path == NULL);
 	struct sql_key_info *key_info = sqlite3DbMallocRawNN(db,
 				sql_key_info_sizeof(key_def->part_count));
 	if (key_info == NULL) {
@@ -1366,7 +1370,7 @@ sql_key_info_new_from_key_def(sqlite3 *db, const struct key_def *key_def)
 	key_info->key_def = NULL;
 	key_info->refs = 1;
 	key_info->part_count = key_def->part_count;
-	key_def_dump_parts(key_def, key_info->parts);
+	key_def_dump_parts(&fiber()->gc, key_def, key_info->parts);
 	return key_info;
 }
 
diff --git a/src/box/sql/where.c b/src/box/sql/where.c
index 9c3462b..78f70f4 100644
--- a/src/box/sql/where.c
+++ b/src/box/sql/where.c
@@ -2807,6 +2807,7 @@ whereLoopAddBtree(WhereLoopBuilder * pBuilder,	/* WHERE clause information */
 		part.is_nullable = false;
 		part.sort_order = SORT_ORDER_ASC;
 		part.coll_id = COLL_NONE;
+		part.path = NULL;
 
 		struct key_def *key_def = key_def_new(&part, 1);
 		if (key_def == NULL) {
diff --git a/src/box/tuple.c b/src/box/tuple.c
index aae1c3c..62e06e7 100644
--- a/src/box/tuple.c
+++ b/src/box/tuple.c
@@ -138,38 +138,18 @@ runtime_tuple_delete(struct tuple_format *format, struct tuple *tuple)
 int
 tuple_validate_raw(struct tuple_format *format, const char *tuple)
 {
-	if (tuple_format_field_count(format) == 0)
-		return 0; /* Nothing to check */
-
-	/* Check to see if the tuple has a sufficient number of fields. */
-	uint32_t field_count = mp_decode_array(&tuple);
-	if (format->exact_field_count > 0 &&
-	    format->exact_field_count != field_count) {
-		diag_set(ClientError, ER_EXACT_FIELD_COUNT,
-			 (unsigned) field_count,
-			 (unsigned) format->exact_field_count);
+	struct region *region = &fiber()->gc;
+	uint32_t used = region_used(region);
+	uint32_t *field_map = region_alloc(region, format->field_map_size);
+	if (field_map == NULL) {
+		diag_set(OutOfMemory, format->field_map_size, "region_alloc",
+			 "field_map");
 		return -1;
 	}
-	if (unlikely(field_count < format->min_field_count)) {
-		diag_set(ClientError, ER_MIN_FIELD_COUNT,
-			 (unsigned) field_count,
-			 (unsigned) format->min_field_count);
+	field_map = (uint32_t *)((char *)field_map + format->field_map_size);
+	if (tuple_init_field_map(format, field_map, tuple, true) != 0)
 		return -1;
-	}
-
-	/* Check field types */
-	struct tuple_field *field = tuple_format_field(format, 0);
-	uint32_t i = 0;
-	uint32_t defined_field_count =
-		MIN(field_count, tuple_format_field_count(format));
-	for (; i < defined_field_count; ++i) {
-		field = tuple_format_field(format, i);
-		if (key_mp_type_validate(field->type, mp_typeof(*tuple),
-					 ER_FIELD_TYPE, i + TUPLE_INDEX_BASE,
-					 tuple_field_is_nullable(field)))
-			return -1;
-		mp_next(&tuple);
-	}
+	region_truncate(region, used);
 	return 0;
 }
 
diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
index e21b009..554c29f 100644
--- a/src/box/tuple_compare.cc
+++ b/src/box/tuple_compare.cc
@@ -469,7 +469,8 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 	struct key_part *part = key_def->parts;
 	const char *tuple_a_raw = tuple_data(tuple_a);
 	const char *tuple_b_raw = tuple_data(tuple_b);
-	if (key_def->part_count == 1 && part->fieldno == 0) {
+	if (key_def->part_count == 1 && part->fieldno == 0 &&
+	    part->path == NULL) {
 		/*
 		 * First field can not be optional - empty tuples
 		 * can not exist.
@@ -493,8 +494,8 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 	}
 
 	bool was_null_met = false;
-	const struct tuple_format *format_a = tuple_format(tuple_a);
-	const struct tuple_format *format_b = tuple_format(tuple_b);
+	struct tuple_format *format_a = tuple_format(tuple_a);
+	struct tuple_format *format_b = tuple_format(tuple_b);
 	const uint32_t *field_map_a = tuple_field_map(tuple_a);
 	const uint32_t *field_map_b = tuple_field_map(tuple_b);
 	struct key_part *end;
@@ -585,7 +586,7 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
 	assert(key != NULL || part_count == 0);
 	assert(part_count <= key_def->part_count);
 	struct key_part *part = key_def->parts;
-	const struct tuple_format *format = tuple_format(tuple);
+	struct tuple_format *format = tuple_format(tuple);
 	const char *tuple_raw = tuple_data(tuple);
 	const uint32_t *field_map = tuple_field_map(tuple);
 	enum mp_type a_type, b_type;
@@ -1027,7 +1028,7 @@ tuple_compare_create(const struct key_def *def)
 		}
 	}
 	assert(! def->has_optional_parts);
-	if (!key_def_has_collation(def)) {
+	if (!key_def_has_collation(def) && !def->has_json_paths) {
 		/* Precalculated comparators don't use collation */
 		for (uint32_t k = 0;
 		     k < sizeof(cmp_arr) / sizeof(cmp_arr[0]); k++) {
@@ -1247,7 +1248,7 @@ tuple_compare_with_key_create(const struct key_def *def)
 		}
 	}
 	assert(! def->has_optional_parts);
-	if (!key_def_has_collation(def)) {
+	if (!key_def_has_collation(def) && !def->has_json_paths) {
 		/* Precalculated comparators don't use collation */
 		for (uint32_t k = 0;
 		     k < sizeof(cmp_wk_arr) / sizeof(cmp_wk_arr[0]);
diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
index e9d7cac..04c5463 100644
--- a/src/box/tuple_extract_key.cc
+++ b/src/box/tuple_extract_key.cc
@@ -10,7 +10,8 @@ key_def_parts_are_sequential(const struct key_def *def, int i)
 {
 	uint32_t fieldno1 = def->parts[i].fieldno + 1;
 	uint32_t fieldno2 = def->parts[i + 1].fieldno;
-	return fieldno1 == fieldno2;
+	return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
+	       def->parts[i + 1].path == NULL;
 }
 
 /** True, if a key can contain two or more parts in sequence. */
@@ -111,7 +112,7 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
 	const char *data = tuple_data(tuple);
 	uint32_t part_count = key_def->part_count;
 	uint32_t bsize = mp_sizeof_array(part_count);
-	const struct tuple_format *format = tuple_format(tuple);
+	struct tuple_format *format = tuple_format(tuple);
 	const uint32_t *field_map = tuple_field_map(tuple);
 	const char *tuple_end = data + tuple->bsize;
 
@@ -241,7 +242,8 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 			if (!key_def_parts_are_sequential(key_def, i))
 				break;
 		}
-		uint32_t end_fieldno = key_def->parts[i].fieldno;
+		const struct key_part *part = &key_def->parts[i];
+		uint32_t end_fieldno = part->fieldno;
 
 		if (fieldno < current_fieldno) {
 			/* Rewind. */
@@ -283,6 +285,15 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 				current_fieldno++;
 			}
 		}
+		const char *field_last, *field_end_last;
+		if (part->path != NULL) {
+			field_last = field;
+			field_end_last = field_end;
+			(void)tuple_field_go_to_path(&field, part->path,
+						     part->path_len);
+			field_end = field;
+			mp_next(&field_end);
+		}
 		memcpy(key_buf, field, field_end - field);
 		key_buf += field_end - field;
 		if (has_optional_parts && null_count != 0) {
@@ -291,6 +302,10 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 		} else {
 			assert(key_buf - key <= data_end - data);
 		}
+		if (part->path != NULL) {
+			field = field_last;
+			field_end = field_end_last;
+		}
 	}
 	if (key_size != NULL)
 		*key_size = (uint32_t)(key_buf - key);
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 92028c5..193d0d8 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -28,6 +28,7 @@
  * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
+#include "fiber.h"
 #include "json/json.h"
 #include "tuple_format.h"
 #include "coll_id_cache.h"
@@ -51,7 +52,8 @@ tuple_field_create(struct json_token *token)
 	ret->offset_slot = TUPLE_OFFSET_SLOT_NIL;
 	ret->coll_id = COLL_NONE;
 	ret->nullable_action = ON_CONFLICT_ACTION_NONE;
-	ret->token = *token;
+	if (token != NULL)
+		ret->token = *token;
 	return ret;
 }
 
@@ -61,14 +63,114 @@ tuple_field_destroy(struct tuple_field *field)
 	free(field);
 }
 
+/** Add a JSON path to the format's field tree; return its leaf field. */
+static struct tuple_field *
+tuple_field_tree_add_path(struct tuple_format *format, const char *path,
+			  uint32_t path_len, uint32_t fieldno)
+{
+	int rc = 0;
+	struct json_tree *tree = &format->tree;
+	struct tuple_field *parent = tuple_format_field(format, fieldno);
+	struct tuple_field *field = tuple_field_create(NULL);
+	if (unlikely(field == NULL))
+		goto end;
+
+	struct json_lexer lexer;
+	bool is_last_new = false;
+	json_lexer_create(&lexer, path, path_len);
+	while ((rc = json_lexer_next_token(&lexer, &field->token)) == 0 &&
+	       field->token.key.type != JSON_TOKEN_END) {
+		enum field_type iterm_node_type =
+			field->token.key.type == JSON_TOKEN_STR ?
+			FIELD_TYPE_MAP : FIELD_TYPE_ARRAY;
+		if (parent->type != FIELD_TYPE_ANY &&
+		    parent->type != iterm_node_type) {
+			const char *name =
+				tt_sprintf("[%d]%.*s", fieldno, path_len, path);
+			diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH, name,
+				 field_type_strs[parent->type],
+				 field_type_strs[iterm_node_type]);
+			parent = NULL;
+			goto end;
+		}
+		struct tuple_field *next =
+			json_tree_lookup_entry(tree, &parent->token,
+					       &field->token,
+					       struct tuple_field, token);
+		if (next == NULL) {
+			rc = json_tree_add(tree, &parent->token, &field->token);
+			if (unlikely(rc != 0)) {
+				diag_set(OutOfMemory, sizeof(struct json_token),
+					 "json_tree_add", "tree");
+				parent = NULL;
+				goto end;
+			}
+			next = field;
+			is_last_new = true;
+			field = tuple_field_create(NULL);
+			if (unlikely(field == NULL))
+				return NULL;
+		} else {
+			is_last_new = false;
+		}
+		parent->type = iterm_node_type;
+		parent = next;
+	}
+	if (rc != 0 || field->token.key.type != JSON_TOKEN_END) {
+		const char *err_msg =
+			tt_sprintf("invalid JSON path '%s': path has invalid "
+				   "structure (error at position %d)", path,
+				   rc);
+		diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
+			 fieldno + TUPLE_INDEX_BASE, err_msg);
+		parent = NULL;
+		goto end;
+	}
+	assert(parent != NULL);
+	/* Update tree depth information. */
+	if (is_last_new) {
+		uint32_t depth = 1;
+		for (struct json_token *iter = parent->token.parent;
+		     iter != &format->tree.root; iter = iter->parent, ++depth) {
+			struct tuple_field *record =
+				json_tree_entry(iter, struct tuple_field,
+						token);
+			record->subtree_depth =
+				MAX(record->subtree_depth, depth);
+		}
+	}
+end:
+	tuple_field_destroy(field);
+	return parent;
+}
+
 static int
 tuple_format_use_key_part(struct tuple_format *format,
 			  const struct field_def *fields, uint32_t field_count,
 			  const struct key_part *part, bool is_sequential,
-			  int *current_slot)
+			  int *current_slot, char **path_data)
 {
 	assert(part->fieldno < tuple_format_field_count(format));
 	struct tuple_field *field = tuple_format_field(format, part->fieldno);
+	if (unlikely(part->path != NULL)) {
+		assert(!is_sequential);
+		/*
+		 * Copy the JSON path data to the reserved area at
+		 * the end of the format allocation.
+		 */
+		memcpy(*path_data, part->path, part->path_len);
+		(*path_data)[part->path_len] = '\0';
+		struct tuple_field *root = field;
+		field = tuple_field_tree_add_path(format, *path_data,
+						  part->path_len,
+						  part->fieldno);
+		if (field == NULL)
+			return -1;
+		format->subtree_depth =
+			MAX(format->subtree_depth, root->subtree_depth + 1);
+		field->is_key_part = true;
+		*path_data += part->path_len + 1;
+	}
 	/*
 		* If a field is not present in the space format,
 		* inherit nullable action of the first key part
@@ -113,7 +215,10 @@ tuple_format_use_key_part(struct tuple_format *format,
 					       field->type)) {
 		const char *name;
 		int fieldno = part->fieldno + TUPLE_INDEX_BASE;
-		if (part->fieldno >= field_count) {
+		if (unlikely(part->path != NULL)) {
+			name = tt_sprintf("[%d]%.*s", fieldno, part->path_len,
+					  part->path);
+		} else if (part->fieldno >= field_count) {
 			name = tt_sprintf("%d", fieldno);
 		} else {
 			const struct field_def *def =
@@ -137,10 +242,9 @@ tuple_format_use_key_part(struct tuple_format *format,
 	 * simply accessible, so we don't store an offset for it.
 	 */
 	if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL &&
-	    is_sequential == false && part->fieldno > 0) {
-		*current_slot = *current_slot - 1;
-		field->offset_slot = *current_slot;
-	}
+	    is_sequential == false &&
+	    (part->fieldno > 0 || part->path != NULL))
+		field->offset_slot = (*current_slot = *current_slot - 1);
 	return 0;
 }
 
@@ -181,7 +285,7 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 	}
 
 	int current_slot = 0;
-
+	char *paths_data = (char *)format + sizeof(struct tuple_format);
 	/* extract field type info */
 	for (uint16_t key_no = 0; key_no < key_count; ++key_no) {
 		const struct key_def *key_def = keys[key_no];
@@ -193,7 +297,8 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 			if (tuple_format_use_key_part(format, fields,
 						      field_count, part,
 						      is_sequential,
-						      &current_slot) != 0)
+						      &current_slot,
+						      &paths_data) != 0)
 				return -1;
 		}
 	}
@@ -261,6 +366,8 @@ static struct tuple_format *
 tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		   uint32_t space_field_count, struct tuple_dictionary *dict)
 {
+	/* Size of area to store paths. */
+	uint32_t paths_size = 0;
 	uint32_t index_field_count = 0;
 	/* find max max field no */
 	for (uint16_t key_no = 0; key_no < key_count; ++key_no) {
@@ -270,13 +377,16 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		for (; part < pend; part++) {
 			index_field_count = MAX(index_field_count,
 						part->fieldno + 1);
+			if (part->path != NULL)
+				paths_size += part->path_len + 1;
 		}
 	}
 	uint32_t field_count = MAX(space_field_count, index_field_count);
 
-	struct tuple_format *format = malloc(sizeof(struct tuple_format));
+	uint32_t allocation_size = sizeof(struct tuple_format) + paths_size;
+	struct tuple_format *format = malloc(allocation_size);
 	if (format == NULL) {
-		diag_set(OutOfMemory, sizeof(struct tuple_format), "malloc",
+		diag_set(OutOfMemory, allocation_size, "malloc",
 			 "tuple format");
 		return NULL;
 	}
@@ -284,6 +394,7 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		free(format);
 		return NULL;
 	}
+	format->subtree_depth = 1;
 	struct json_token token;
 	memset(&token, 0, sizeof(token));
 	token.key.type = JSON_TOKEN_NUM;
@@ -306,6 +417,7 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		format->dict = dict;
 		tuple_dictionary_ref(dict);
 	}
+	format->allocation_size = allocation_size;
 	format->refs = 0;
 	format->id = FORMAT_ID_NIL;
 	format->index_field_count = index_field_count;
@@ -377,16 +489,37 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
 {
 	if (format1->exact_field_count != format2->exact_field_count)
 		return false;
-	uint32_t format1_field_count = tuple_format_field_count(format1);
-	uint32_t format2_field_count = tuple_format_field_count(format2);
-	for (uint32_t i = 0; i < format1_field_count; ++i) {
-		const struct tuple_field *field1 =
-			tuple_format_field(format1, i);
+	struct tuple_field *field1;
+	struct json_token *field2_prev_token = NULL;
+	struct json_token *skip_root_token = NULL;
+	struct json_token *field1_prev_token = &format1->tree.root;
+	json_tree_foreach_entry_preorder(field1, &format1->tree.root,
+					 struct tuple_field, token) {
+		/* Test if subtree skip is required. */
+		if (skip_root_token != NULL) {
+			struct json_token *tmp = &field1->token;
+			while (tmp->parent != NULL &&
+			       tmp->parent != skip_root_token)
+				tmp = tmp->parent;
+			if (tmp->parent == skip_root_token)
+				continue;
+		}
+		skip_root_token = NULL;
+		/* Look up a valid parent node in the new tree. */
+		while (field1_prev_token != field1->token.parent) {
+			field1_prev_token = field1_prev_token->parent;
+			field2_prev_token = field2_prev_token->parent;
+			assert(field1_prev_token != NULL);
+		}
+		struct tuple_field *field2 =
+			json_tree_lookup_entry(&format2->tree, field2_prev_token,
+						&field1->token,
+						struct tuple_field, token);
 		/*
 		 * The field has a data type in format1, but has
 		 * no data type in format2.
 		 */
-		if (i >= format2_field_count) {
+		if (field2 == NULL) {
 			/*
 			 * The field can get a name added
 			 * for it, and this doesn't require a data
@@ -397,13 +530,13 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
 			 * NULLs or miss the subject field.
 			 */
 			if (field1->type == FIELD_TYPE_ANY &&
-			    tuple_field_is_nullable(field1))
+			    tuple_field_is_nullable(field1)) {
+				skip_root_token = &field1->token;
 				continue;
-			else
+			} else {
 				return false;
+			}
 		}
-		const struct tuple_field *field2 =
-			tuple_format_field(format2, i);
 		if (! field_type1_contains_type2(field1->type, field2->type))
 			return false;
 		/*
@@ -413,10 +546,82 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
 		if (tuple_field_is_nullable(field2) &&
 		    !tuple_field_is_nullable(field1))
 			return false;
+
+		field2_prev_token = &field2->token;
+		field1_prev_token = &field1->token;
 	}
 	return true;
 }
 
+/** Find a field in format by offset slot. */
+static struct tuple_field *
+tuple_field_by_offset_slot(const struct tuple_format *format,
+			   int32_t offset_slot)
+{
+	struct tuple_field *field;
+	struct json_token *root = (struct json_token *)&format->tree.root;
+	json_tree_foreach_entry_preorder(field, root, struct tuple_field,
+					 token) {
+		if (field->offset_slot == offset_slot)
+			return field;
+	}
+	return NULL;
+}
+
+/**
+ * Verify the field_map and raise an error if some indexed
+ * field has not been initialized. The routine relies on the
+ * field_map having been filled with the UINT32_MAX marker
+ * beforehand.
+ */
+static int
+tuple_field_map_validate(const struct tuple_format *format, uint32_t *field_map)
+{
+	struct json_token *tree_node = (struct json_token *)&format->tree.root;
+	/* Look for absent non-nullable fields. */
+	int32_t field_map_items =
+		(int32_t)(format->field_map_size/sizeof(field_map[0]));
+	for (int32_t i = -1; i >= -field_map_items; i--) {
+		if (field_map[i] != UINT32_MAX)
+			continue;
+
+		struct tuple_field *field =
+			tuple_field_by_offset_slot(format, i);
+		assert(field != NULL);
+		/* Look up the field number in the tree. */
+		struct json_token *parent = &field->token;
+		while (parent->parent != &format->tree.root)
+			parent = parent->parent;
+		assert(parent->key.type == JSON_TOKEN_NUM);
+		uint32_t fieldno = parent->key.num;
+
+		tree_node = &field->token;
+		const char *err_msg;
+		if (field->token.key.type == JSON_TOKEN_STR) {
+			err_msg = tt_sprintf("invalid field %d document "
+					     "content: map doesn't contain a "
+					     "key '%.*s' defined in index",
+					     fieldno, tree_node->key.len,
+					     tree_node->key.str);
+		} else if (field->token.key.type == JSON_TOKEN_NUM) {
+			err_msg = tt_sprintf("invalid field %d document "
+					     "content: array size %d is less "
+					     "than size %d defined in index",
+					     fieldno, tree_node->key.num,
+					     tree_node->parent->child_count);
+		}
+		diag_set(ClientError, ER_DATA_STRUCTURE_MISMATCH, err_msg);
+		return -1;
+	}
+	return 0;
+}
+
+struct parse_ctx {
+	enum json_token_type child_type;
+	uint32_t items;
+	uint32_t curr;
+};
+
 /** @sa declaration for details. */
 int
 tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
@@ -442,44 +647,123 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
 			 (unsigned) format->min_field_count);
 		return -1;
 	}
-
-	/* first field is simply accessible, so we do not store offset to it */
-	enum mp_type mp_type = mp_typeof(*pos);
-	const struct tuple_field *field =
-		tuple_format_field((struct tuple_format *)format, 0);
-	if (validate &&
-	    key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
-				 TUPLE_INDEX_BASE, tuple_field_is_nullable(field)))
-		return -1;
-	mp_next(&pos);
-	/* other fields...*/
-	uint32_t i = 1;
 	uint32_t defined_field_count = MIN(field_count, validate ?
 					   tuple_format_field_count(format) :
 					   format->index_field_count);
-	if (field_count < format->index_field_count) {
-		/*
-		 * Nullify field map to be able to detect by 0,
-		 * which key fields are absent in tuple_field().
-		 */
-		memset((char *)field_map - format->field_map_size, 0,
-		       format->field_map_size);
-	}
-	for (; i < defined_field_count; ++i) {
-		field = tuple_format_field((struct tuple_format *)format, i);
-		mp_type = mp_typeof(*pos);
-		if (validate &&
-		    key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
-					 i + TUPLE_INDEX_BASE,
-					 tuple_field_is_nullable(field)))
-			return -1;
-		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
-			field_map[field->offset_slot] =
-				(uint32_t) (pos - tuple);
+	/*
+	 * Fill the field_map with a marker so that the routine
+	 * tuple_field_map_validate can detect absent fields.
+	 */
+	memset((char *)field_map - format->field_map_size,
+		validate ? UINT32_MAX : 0, format->field_map_size);
+
+	struct region *region = &fiber()->gc;
+	uint32_t mp_stack_items = format->subtree_depth + 1;
+	uint32_t mp_stack_size = mp_stack_items * sizeof(struct parse_ctx);
+	struct parse_ctx *mp_stack = region_alloc(region, mp_stack_size);
+	if (unlikely(mp_stack == NULL)) {
+		diag_set(OutOfMemory, mp_stack_size, "region_alloc",
+			 "mp_stack");
+		return -1;
+	}
+	mp_stack[0] = (struct parse_ctx){
+		.child_type = JSON_TOKEN_NUM,
+		.items = defined_field_count,
+		.curr = 0,
+	};
+	uint32_t mp_stack_idx = 0;
+	struct json_tree *tree = (struct json_tree *)&format->tree;
+	struct json_token *parent = &tree->root;
+	while (mp_stack[0].curr <= mp_stack[0].items) {
+		/* Prepare key for tree lookup. */
+		struct json_token token;
+		token.key.type = mp_stack[mp_stack_idx].child_type;
+		++mp_stack[mp_stack_idx].curr;
+		if (token.key.type == JSON_TOKEN_NUM) {
+			token.key.num = mp_stack[mp_stack_idx].curr;
+		} else if (token.key.type == JSON_TOKEN_STR) {
+			if (mp_typeof(*pos) != MP_STR) {
+				/*
+				 * We do not support non-string
+				 * keys in maps.
+				 */
+				mp_next(&pos);
+				mp_next(&pos);
+				continue;
+			}
+			token.key.str =
+				mp_decode_str(&pos, (uint32_t *)&token.key.len);
+		} else {
+			unreachable();
+		}
+		struct tuple_field *field =
+			json_tree_lookup_entry(tree, parent, &token,
+					       struct tuple_field, token);
+		enum mp_type type = mp_typeof(*pos);
+		if (field != NULL) {
+			bool is_nullable = tuple_field_is_nullable(field);
+			if (validate &&
+			    key_mp_type_validate(field->type, type,
+						 ER_FIELD_TYPE,
+						 mp_stack[0].curr,
+						 is_nullable) != 0)
+				return -1;
+			if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
+				field_map[field->offset_slot] =
+					(uint32_t)(pos - tuple);
+			}
+		}
+		/* Prepare stack info for next iteration. */
+		if (field != NULL && type == MP_ARRAY &&
+		    mp_stack_idx + 1 < format->subtree_depth) {
+			uint32_t size = mp_decode_array(&pos);
+			if (unlikely(size == 0))
+				continue;
+			parent = &field->token;
+			mp_stack[++mp_stack_idx] = (struct parse_ctx){
+				.child_type = JSON_TOKEN_NUM,
+				.items = size,
+				.curr = 0,
+			};
+		} else if (field != NULL && type == MP_MAP &&
+			   mp_stack_idx + 1 < format->subtree_depth) {
+			uint32_t size = mp_decode_map(&pos);
+			if (unlikely(size == 0))
+				continue;
+			parent = &field->token;
+			mp_stack[++mp_stack_idx] = (struct parse_ctx){
+				.child_type = JSON_TOKEN_STR,
+				.items = size,
+				.curr = 0,
+			};
+		} else {
+			mp_next(&pos);
+			while (mp_stack[mp_stack_idx].curr >=
+			       mp_stack[mp_stack_idx].items) {
+				assert(parent != NULL);
+				parent = parent->parent;
+				if (mp_stack_idx-- == 0)
+					goto end;
+			}
 		}
-		mp_next(&pos);
+	};
+end:;
+	/*
+	 * Field map has already been initialized with zeros when
+	 * no validation is required.
+	 */
+	if (!validate)
+		return 0;
+	struct tuple_field *field;
+	struct json_token *root = (struct json_token *)&format->tree.root;
+	json_tree_foreach_entry_preorder(field, root, struct tuple_field,
+					 token) {
+		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL &&
+			tuple_field_is_nullable(field) &&
+			field_map[field->offset_slot] == UINT32_MAX)
+			field_map[field->offset_slot] = 0;
 	}
-	return 0;
+	return tuple_field_map_validate(format, field_map);
 }
 
 uint32_t
@@ -617,15 +901,7 @@ tuple_field_go_to_key(const char **field, const char *key, int len)
 	return -1;
 }
 
-/**
- * Retrieve msgpack data by JSON path.
- * @param data Pointer to msgpack with data.
- * @param path The path to process.
- * @param path_len The length of the @path.
- * @retval 0 On success.
- * @retval >0 On path parsing error, invalid character position.
- */
-static int
+int
 tuple_field_go_to_path(const char **data, const char *path, uint32_t path_len)
 {
 	int rc;
@@ -731,3 +1007,40 @@ error:
 		 tt_sprintf("error in path on position %d", rc));
 	return -1;
 }
+
+const char *
+tuple_field_by_part_raw(struct tuple_format *format, const char *data,
+			const uint32_t *field_map, struct key_part *part)
+{
+	if (likely(part->path == NULL))
+		return tuple_field_raw(format, data, field_map, part->fieldno);
+
+	uint32_t field_count = tuple_format_field_count(format);
+	struct tuple_field *root_field =
+		likely(part->fieldno < field_count) ?
+		tuple_format_field(format, part->fieldno) : NULL;
+	struct tuple_field *field =
+		unlikely(root_field == NULL) ? NULL:
+		tuple_format_field_by_path(format, root_field, part->path,
+					   part->path_len);
+	if (unlikely(field == NULL)) {
+		/*
+		 * A legacy tuple that has no field map entry for
+		 * the JSON index requires a full path parse.
+		 */
+		const char *field_raw =
+			tuple_field_raw(format, data, field_map, part->fieldno);
+		if (unlikely(field_raw == NULL))
+			return NULL;
+		if (tuple_field_go_to_path(&field_raw, part->path,
+					   part->path_len) != 0)
+			return NULL;
+		return field_raw;
+	}
+	int32_t offset_slot = field->offset_slot;
+	assert(offset_slot < 0);
+	assert(-offset_slot * sizeof(uint32_t) <= format->field_map_size);
+	if (unlikely(field_map[offset_slot] == 0))
+		return NULL;
+	return data + field_map[offset_slot];
+}
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index 2da773b..860f052 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -116,6 +116,8 @@ struct tuple_field {
 	uint32_t coll_id;
 	/** An JSON entry to organize tree. */
 	struct json_token token;
+	/** The maximum depth of the field's subtree. */
+	uint32_t subtree_depth;
 };
 
 /**
@@ -169,12 +171,16 @@ struct tuple_format {
 	 * index_field_count <= min_field_count <= field_count.
 	 */
 	uint32_t min_field_count;
+	/** Size of format allocation. */
+	uint32_t allocation_size;
 	/**
 	 * Shared names storage used by all formats of a space.
 	 */
 	struct tuple_dictionary *dict;
 	/** JSON tree of fields. */
 	struct json_tree tree;
+	/** The maximum depth of the format's field subtrees. */
+	uint32_t subtree_depth;
 };
 
 
@@ -196,6 +202,17 @@ tuple_format_field(struct tuple_format *format, uint32_t fieldno)
 				      struct tuple_field, token);
 }
 
+static inline struct tuple_field *
+tuple_format_field_by_path(struct tuple_format *format,
+			   struct tuple_field *root, const char *path,
+			   uint32_t path_len)
+{
+	return json_tree_lookup_path_entry(&format->tree, &root->token,
+					   path, path_len, struct tuple_field,
+					   token);
+}
+
+
 extern struct tuple_format **tuple_formats;
 
 static inline uint32_t
@@ -397,6 +414,18 @@ tuple_field_raw_by_name(struct tuple_format *format, const char *tuple,
 }
 
 /**
+ * Retrieve msgpack data by JSON path.
+ * @param data Pointer to msgpack with data.
+ * @param path The path to process.
+ * @param path_len The length of the @path.
+ * @retval 0 On success.
+ * @retval >0 On path parsing error, invalid character position.
+ */
+int
+tuple_field_go_to_path(const char **data, const char *path,
+		       uint32_t path_len);
+
+/**
  * Get tuple field by its path.
  * @param format Tuple format.
  * @param tuple MessagePack tuple's body.
@@ -423,12 +452,9 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
  * @param part Index part to use.
  * @retval Field data if the field exists or NULL.
  */
-static inline const char *
-tuple_field_by_part_raw(const struct tuple_format *format, const char *data,
-			const uint32_t *field_map, struct key_part *part)
-{
-	return tuple_field_raw(format, data, field_map, part->fieldno);
-}
+const char *
+tuple_field_by_part_raw(struct tuple_format *format, const char *data,
+			const uint32_t *field_map, struct key_part *part);
 
 #if defined(__cplusplus)
 } /* extern "C" */
diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc
index b394804..3486ce1 100644
--- a/src/box/tuple_hash.cc
+++ b/src/box/tuple_hash.cc
@@ -222,7 +222,7 @@ key_hash_slowpath(const char *key, struct key_def *key_def);
 
 void
 tuple_hash_func_set(struct key_def *key_def) {
-	if (key_def->is_nullable)
+	if (key_def->is_nullable || key_def->has_json_paths)
 		goto slowpath;
 	/*
 	 * Check that key_def defines sequential a key without holes
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index ce81c6a..3c9fbf8 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -982,6 +982,9 @@ vinyl_index_def_change_requires_rebuild(struct index *index,
 			return true;
 		if (!field_type1_contains_type2(new_part->type, old_part->type))
 			return true;
+		if (json_path_cmp(old_part->path, old_part->path_len,
+				  new_part->path, new_part->path_len) != 0)
+			return true;
 	}
 	return false;
 }
diff --git a/src/box/vy_log.c b/src/box/vy_log.c
index 8a8f9d7..0550144 100644
--- a/src/box/vy_log.c
+++ b/src/box/vy_log.c
@@ -711,7 +711,8 @@ vy_log_record_dup(struct region *pool, const struct vy_log_record *src)
 				 "struct key_part_def");
 			goto err;
 		}
-		key_def_dump_parts(src->key_def, dst->key_parts);
+		if (key_def_dump_parts(pool, src->key_def, dst->key_parts) != 0)
+			goto err;
 		dst->key_part_count = src->key_def->part_count;
 		dst->key_def = NULL;
 	}
diff --git a/src/box/vy_point_lookup.c b/src/box/vy_point_lookup.c
index 7b704b8..9d5e220 100644
--- a/src/box/vy_point_lookup.c
+++ b/src/box/vy_point_lookup.c
@@ -196,8 +196,6 @@ vy_point_lookup(struct vy_lsm *lsm, struct vy_tx *tx,
 		const struct vy_read_view **rv,
 		struct tuple *key, struct tuple **ret)
 {
-	assert(tuple_field_count(key) >= lsm->cmp_def->part_count);
-
 	*ret = NULL;
 	double start_time = ev_monotonic_now(loop());
 	int rc = 0;
diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c
index 3e60fec..2f35284 100644
--- a/src/box/vy_stmt.c
+++ b/src/box/vy_stmt.c
@@ -29,6 +29,7 @@
  * SUCH DAMAGE.
  */
 
+#include "assoc.h"
 #include "vy_stmt.h"
 
 #include <stdlib.h>
@@ -370,6 +371,85 @@ vy_stmt_replace_from_upsert(const struct tuple *upsert)
 	return replace;
 }
 
+/**
+ * Construct a tuple or calculate its size. The fields_iov_ht
+ * is a hashtable that maps leaf field records of the field
+ * path tree to iovs containing the raw data. The function also
+ * fills the tuple field_map when the write_data flag is true.
+ */
+static void
+vy_stmt_tuple_restore_raw(struct tuple_format *format, char *tuple_raw,
+			  uint32_t *field_map, char **offset,
+			  struct mh_i64ptr_t *fields_iov_ht, bool write_data)
+{
+	struct tuple_field *prev = NULL;
+	struct tuple_field *curr;
+	json_tree_foreach_entry_preorder(curr, &format->tree.root,
+					 struct tuple_field, token) {
+		struct json_token *curr_node = &curr->token;
+		struct tuple_field *parent =
+			curr_node->parent == NULL ? NULL :
+			json_tree_entry(curr_node->parent, struct tuple_field,
+					token);
+		if (parent != NULL && parent->type == FIELD_TYPE_ARRAY &&
+		    curr_node->sibling_idx > 0) {
+			/*
+			 * Fill unindexed array items with nils.
+			 * The gap size is calculated as the
+			 * distance between sibling nodes.
+			 */
+			for (uint32_t i = curr_node->sibling_idx - 1;
+			     curr_node->parent->children[i] == NULL &&
+			     i > 0; i--) {
+				*offset = !write_data ?
+					  (*offset += mp_sizeof_nil()) :
+					  mp_encode_nil(*offset);
+			}
+		} else if (parent != NULL && parent->type == FIELD_TYPE_MAP) {
+			/* Set map key. */
+			const char *str = curr_node->key.str;
+			uint32_t len = curr_node->key.len;
+			*offset = !write_data ?
+				  (*offset += mp_sizeof_str(len)) :
+				  mp_encode_str(*offset, str, len);
+		}
+		/* Fill data. */
+		uint32_t children_count = curr_node->child_count;
+		if (curr->type == FIELD_TYPE_ARRAY) {
+			*offset = !write_data ?
+				  (*offset += mp_sizeof_array(children_count)) :
+				  mp_encode_array(*offset, children_count);
+		} else if (curr->type == FIELD_TYPE_MAP) {
+			*offset = !write_data ?
+				  (*offset += mp_sizeof_map(children_count)) :
+				  mp_encode_map(*offset, children_count);
+		} else {
+			/* Leaf record. */
+			mh_int_t k = mh_i64ptr_find(fields_iov_ht,
+						    (uint64_t)curr, NULL);
+			struct iovec *iov =
+				k != mh_end(fields_iov_ht) ?
+				mh_i64ptr_node(fields_iov_ht, k)->val : NULL;
+			if (iov == NULL) {
+				*offset = !write_data ?
+					  (*offset += mp_sizeof_nil()) :
+					  mp_encode_nil(*offset);
+			} else {
+				uint32_t data_offset = *offset - tuple_raw;
+				int32_t slot = curr->offset_slot;
+				if (write_data) {
+					memcpy(*offset, iov->iov_base,
+					       iov->iov_len);
+					if (slot != TUPLE_OFFSET_SLOT_NIL)
+						field_map[slot] = data_offset;
+				}
+				*offset += iov->iov_len;
+			}
+		}
+		prev = curr;
+	}
+}
+
 static struct tuple *
 vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
 			       const struct key_def *cmp_def,
@@ -378,51 +458,79 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
 	/* UPSERT can't be surrogate. */
 	assert(type != IPROTO_UPSERT);
 	struct region *region = &fiber()->gc;
+	struct tuple *stmt = NULL;
 
 	uint32_t field_count = format->index_field_count;
-	struct iovec *iov = region_alloc(region, sizeof(*iov) * field_count);
+	uint32_t part_count = mp_decode_array(&key);
+	assert(part_count == cmp_def->part_count);
+	struct iovec *iov = region_alloc(region, sizeof(*iov) * part_count);
 	if (iov == NULL) {
-		diag_set(OutOfMemory, sizeof(*iov) * field_count,
-			 "region", "iov for surrogate key");
+		diag_set(OutOfMemory, sizeof(*iov) * part_count, "region",
+			"iov for surrogate key");
 		return NULL;
 	}
-	memset(iov, 0, sizeof(*iov) * field_count);
-	uint32_t part_count = mp_decode_array(&key);
-	assert(part_count == cmp_def->part_count);
-	assert(part_count <= field_count);
-	uint32_t nulls_count = field_count - cmp_def->part_count;
-	uint32_t bsize = mp_sizeof_array(field_count) +
-			 mp_sizeof_nil() * nulls_count;
-	for (uint32_t i = 0; i < part_count; ++i) {
-		const struct key_part *part = &cmp_def->parts[i];
+	/* Hashtable linking a leaf field to its corresponding iov. */
+	struct mh_i64ptr_t *fields_iov_ht = mh_i64ptr_new();
+	if (fields_iov_ht == NULL) {
+		diag_set(OutOfMemory, sizeof(struct mh_i64ptr_t),
+			 "mh_i64ptr_new", "fields_iov_ht");
+		return NULL;
+	}
+	if (mh_i64ptr_reserve(fields_iov_ht, part_count, NULL) != 0) {
+		diag_set(OutOfMemory, part_count, "mh_i64ptr_reserve",
+			 "fields_iov_ht");
+		goto end;
+	}
+	memset(iov, 0, sizeof(*iov) * part_count);
+	const struct key_part *part = cmp_def->parts;
+	for (uint32_t i = 0; i < part_count; ++i, ++part) {
 		assert(part->fieldno < field_count);
 		const char *svp = key;
-		iov[part->fieldno].iov_base = (char *) key;
+		iov[i].iov_base = (char *) key;
 		mp_next(&key);
-		iov[part->fieldno].iov_len = key - svp;
-		bsize += key - svp;
+		iov[i].iov_len = key - svp;
+		struct tuple_field *field;
+		field = tuple_format_field(format, part->fieldno);
+		assert(field != NULL);
+		if (unlikely(part->path != NULL)) {
+			field = tuple_format_field_by_path(format, field,
+							   part->path,
+							   part->path_len);
+		}
+		assert(field != NULL);
+		struct mh_i64ptr_node_t node = {(uint64_t)field, &iov[i]};
+		mh_int_t k = mh_i64ptr_put(fields_iov_ht, &node, NULL, NULL);
+		if (unlikely(k == mh_end(fields_iov_ht))) {
+			diag_set(OutOfMemory, part_count, "mh_i64ptr_put",
+				 "fields_iov_ht");
+			goto end;
+		}
+		k = mh_i64ptr_find(fields_iov_ht, (uint64_t)field, NULL);
+		assert(k != mh_end(fields_iov_ht));
 	}
+	/* Calculate the tuple size for the allocation. */
+	char *data = NULL;
+	vy_stmt_tuple_restore_raw(format, NULL, NULL, &data, fields_iov_ht,
+				  false);
+	uint32_t bsize = mp_sizeof_array(field_count) + data - (char *)NULL;
 
-	struct tuple *stmt = vy_stmt_alloc(format, bsize);
+	stmt = vy_stmt_alloc(format, bsize);
 	if (stmt == NULL)
-		return NULL;
+		goto end;
 
+	/* Construct tuple. */
 	char *raw = (char *) tuple_data(stmt);
 	uint32_t *field_map = (uint32_t *) raw;
+	memset((char *)field_map - format->field_map_size, 0,
+	       format->field_map_size);
 	char *wpos = mp_encode_array(raw, field_count);
-	for (uint32_t i = 0; i < field_count; ++i) {
-		const struct tuple_field *field = tuple_format_field(format, i);
-		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL)
-			field_map[field->offset_slot] = wpos - raw;
-		if (iov[i].iov_base == NULL) {
-			wpos = mp_encode_nil(wpos);
-		} else {
-			memcpy(wpos, iov[i].iov_base, iov[i].iov_len);
-			wpos += iov[i].iov_len;
-		}
-	}
-	assert(wpos == raw + bsize);
+	vy_stmt_tuple_restore_raw(format, raw, field_map, &wpos, fields_iov_ht,
+				  true);
+
+	assert(wpos <= raw + bsize);
 	vy_stmt_set_type(stmt, type);
+end:
+	mh_i64ptr_delete(fields_iov_ht);
 	return stmt;
 }
 
diff --git a/test/box/misc.result b/test/box/misc.result
index 9f863d9..97070f3 100644
--- a/test/box/misc.result
+++ b/test/box/misc.result
@@ -415,6 +415,7 @@ t;
   83: box.error.ROLE_EXISTS
   84: box.error.CREATE_ROLE
   85: box.error.INDEX_EXISTS
+  86: box.error.DATA_STRUCTURE_MISMATCH
   87: box.error.ROLE_LOOP
   88: box.error.GRANT
   89: box.error.PRIV_GRANTED
diff --git a/test/engine/tuple.result b/test/engine/tuple.result
index 35c700e..322821e 100644
--- a/test/engine/tuple.result
+++ b/test/engine/tuple.result
@@ -954,6 +954,422 @@ type(tuple:tomap().fourth)
 s:drop()
 ---
 ...
+--
+-- gh-1012: Indexes for JSON-defined paths.
+--
+box.cfg()
+---
+...
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO["fname"]'}, {3, 'str', path = '["FIO"].fname'}}})
+---
+- error: 'Can''t create or modify index ''test1'' in space ''withdata'': same key
+    part is indexed twice'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '["FIO"]["fname"]'}}})
+---
+- error: 'Wrong index options (field 2): ''path'' must be string'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = 'FIO'}}})
+---
+- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type
+    ''map'' is not supported'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[1]'}}})
+---
+- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type
+    ''array'' is not supported'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO'}, {3, 'str', path = '["FIO"].fname'}}})
+---
+- error: Field [2]["FIO"].fname has type 'string' in one index, but type 'map' in
+    another
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[1].sname'}, {3, 'str', path = '["FIO"].fname'}}})
+---
+- error: Field [2]["FIO"].fname has type 'array' in one index, but type 'map' in another
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO....fname'}}})
+---
+- error: 'Wrong index options (field 3): invalid JSON path ''FIO....fname'': path
+    has invalid structure (error at position 5)'
+...
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO.fname'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+...
+assert(idx ~= nil)
+---
+- true
+...
+assert(idx.parts[2].path == "FIO.fname")
+---
+- true
+...
+s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
+---
+- error: 'Tuple field 3 type does not match one required by operation: expected map'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5}
+---
+- error: 'Tuple field 3 type does not match one required by operation: expected string'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5}
+---
+- error: 'Tuple doesn''t math document structure: invalid field 3 document content:
+    map doesn''t contain a key ''sname'' defined in index'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- error: Duplicate key exists in unique index 'test1' in space 'withdata'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond', data = "extra"}}, 4, 5}
+---
+- error: Duplicate key exists in unique index 'test1' in space 'withdata'
+...
+s:insert{7, 7, {town = 'Moscow', FIO = {fname = 'Max', sname = 'Isaev', data = "extra"}}, 4, 5}
+---
+- [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}},
+  4, 5]
+...
+idx:select()
+---
+- - [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+  - [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}},
+    4, 5]
+...
+idx:min()
+---
+- [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+idx:max()
+---
+- [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}},
+  4, 5]
+...
+s:drop()
+---
+...
+s = box.schema.create_space('withdata', {engine = engine})
+---
+...
+parts = {}
+---
+...
+parts[1] = {1, 'unsigned', path='[2]'}
+---
+...
+pk = s:create_index('pk', {parts = parts})
+---
+...
+s:insert{{1, 2}, 3}
+---
+- [[1, 2], 3]
+...
+s:upsert({{box.null, 2}}, {{'+', 2, 5}})
+---
+...
+s:get(2)
+---
+- [[1, 2], 8]
+...
+s:drop()
+---
+...
+-- Create index on space with data
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+pk = s:create_index('primary', { type = 'tree' })
+---
+...
+s:insert{1, 7, {town = 'London', FIO = 1234}, 4, 5}
+---
+- [1, 7, {'town': 'London', 'FIO': 1234}, 4, 5]
+...
+s:insert{2, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- [2, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+s:insert{3, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+s:insert{4, 7, {town = 'London', FIO = {1,2,3}}, 4, 5}
+---
+- [4, 7, {'town': 'London', 'FIO': [1, 2, 3]}, 4, 5]
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+- error: 'Tuple field 3 type does not match one required by operation: expected map'
+...
+_ = s:delete(1)
+---
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+- error: Duplicate key exists in unique index 'test1' in space 'withdata'
+...
+_ = s:delete(2)
+---
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+- error: 'Tuple field 3 type does not match one required by operation: expected map'
+...
+_ = s:delete(4)
+---
+...
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]', is_nullable = true}, {3, 'str', path = '["FIO"]["sname"]'}, {3, 'str', path = '["FIO"]["extra"]', is_nullable = true}}})
+---
+...
+assert(idx ~= nil)
+---
+- true
+...
+s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '["FIO"]["fname"]'}}})
+---
+- error: Field [3]["FIO"]["fname"] has type 'string' in one index, but type 'number'
+    in another
+...
+idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}}})
+---
+...
+assert(idx2 ~= nil)
+---
+- true
+...
+t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
+---
+...
+idx:select()
+---
+- - [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5]
+  - [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+idx:min()
+---
+- [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5]
+...
+idx:max()
+---
+- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+idx:drop()
+---
+...
+s:drop()
+---
+...
+-- Test complex JSON indexes
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+parts = {}
+---
+...
+parts[1] = {1, 'str', path='[3][2].a'}
+---
+...
+parts[2] = {1, 'unsigned', path = '[3][1]'}
+---
+...
+parts[3] = {2, 'str', path = '[2].d[1]'}
+---
+...
+pk = s:create_index('primary', { type = 'tree', parts =  parts})
+---
+...
+s:insert{{1, 2, {3, {3, a = 'str', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {1, 2, 3}}
+---
+- [[1, 2, [3, {1: 3, 'a': 'str', 'b': 5}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6,
+  [1, 2, 3]]
+...
+s:insert{{1, 2, {3, {a = 'str', b = 1}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6}
+---
+- error: Duplicate key exists in unique index 'primary' in space 'withdata'
+...
+parts = {}
+---
+...
+parts[1] = {4, 'unsigned', path='[1]', is_nullable = false}
+---
+...
+parts[2] = {4, 'unsigned', path='[2]', is_nullable = true}
+---
+...
+parts[3] = {4, 'unsigned', path='[4]', is_nullable = true}
+---
+...
+trap_idx = s:create_index('trap', { type = 'tree', parts = parts})
+---
+...
+s:insert{{1, 2, {3, {3, a = 'str2', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {}}
+---
+- error: 'Tuple doesn''t math document structure: invalid field 4 document content:
+    array size 1 is less than size 4 defined in index'
+...
+parts = {}
+---
+...
+parts[1] = {1, 'unsigned', path='[3][2].b' }
+---
+...
+parts[2] = {3, 'unsigned'}
+---
+...
+crosspart_idx = s:create_index('crosspart', { parts =  parts})
+---
+...
+s:insert{{1, 2, {3, {a = 'str2', b = 2}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {9, 2, 3}}
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9,
+    2, 3]]
+...
+parts = {}
+---
+...
+parts[1] = {1, 'unsigned', path='[3][2].b'}
+---
+...
+num_idx = s:create_index('numeric', {parts =  parts})
+---
+...
+s:insert{{1, 2, {3, {a = 'str3', b = 9}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {0}}
+---
+- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]]
+...
+num_idx:get(2)
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9,
+    2, 3]]
+...
+num_idx:select()
+---
+- - [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [
+      9, 2, 3]]
+  - [[1, 2, [3, {1: 3, 'a': 'str', 'b': 5}]], ['c', {'d': ['e', 'f'], 'e': 'g'}],
+    6, [1, 2, 3]]
+  - [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [
+      0]]
+...
+num_idx:max()
+---
+- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]]
+...
+num_idx:min()
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9,
+    2, 3]]
+...
+assert(crosspart_idx:max() == num_idx:max())
+---
+- true
+...
+assert(crosspart_idx:min() == num_idx:min())
+---
+- true
+...
+trap_idx:max()
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9,
+    2, 3]]
+...
+trap_idx:min()
+---
+- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]]
+...
+s:drop()
+---
+...
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+pk_simplified = s:create_index('primary', { type = 'tree',  parts = {{1, 'unsigned'}}})
+---
+...
+assert(pk_simplified.path == box.NULL)
+---
+- true
+...
+idx = s:create_index('idx', {parts = {{2, 'integer', path = 'a'}}})
+---
+...
+s:insert{31, {a = 1, aa = -1}}
+---
+- [31, {'a': 1, 'aa': -1}]
+...
+s:insert{22, {a = 2, aa = -2}}
+---
+- [22, {'a': 2, 'aa': -2}]
+...
+s:insert{13, {a = 3, aa = -3}}
+---
+- [13, {'a': 3, 'aa': -3}]
+...
+idx:select()
+---
+- - [31, {'a': 1, 'aa': -1}]
+  - [22, {'a': 2, 'aa': -2}]
+  - [13, {'a': 3, 'aa': -3}]
+...
+idx:alter({parts = {{2, 'integer', path = 'aa'}}})
+---
+...
+idx:select()
+---
+- - [13, {'a': 3, 'aa': -3}]
+  - [22, {'a': 2, 'aa': -2}]
+  - [31, {'a': 1, 'aa': -1}]
+...
+s:drop()
+---
+...
+-- incompatible format change
+s = box.schema.space.create('test')
+---
+...
+i = s:create_index('pk', {parts = {{1, 'integer', path = '[1]'}}})
+---
+...
+s:insert{{-1}}
+---
+- [[-1]]
+...
+i:alter{parts = {{1, 'string', path = '[1]'}}}
+---
+- error: 'Tuple field 1 type does not match one required by operation: expected string'
+...
+s:insert{{'a'}}
+---
+- error: 'Tuple field 1 type does not match one required by operation: expected integer'
+...
+i:drop()
+---
+...
+i = s:create_index('pk', {parts = {{1, 'integer', path = '[1].FIO'}}})
+---
+...
+s:insert{{{FIO=-1}}}
+---
+- [[{'FIO': -1}]]
+...
+i:alter{parts = {{1, 'integer', path = '[1][1]'}}}
+---
+- error: 'Tuple field 1 type does not match one required by operation: expected array'
+...
+i:alter{parts = {{1, 'integer', path = '[1].FIO[1]'}}}
+---
+- error: 'Tuple field 1 type does not match one required by operation: expected array'
+...
+s:drop()
+---
+...
 engine = nil
 ---
 ...
diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua
index edc3dab..d53ab42 100644
--- a/test/engine/tuple.test.lua
+++ b/test/engine/tuple.test.lua
@@ -312,5 +312,126 @@ tuple:tomap().fourth
 type(tuple:tomap().fourth)
 s:drop()
 
+--
+-- gh-1012: Indexes for JSON-defined paths.
+--
+box.cfg()
+s = box.schema.space.create('withdata', {engine = engine})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO["fname"]'}, {3, 'str', path = '["FIO"].fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '["FIO"]["fname"]'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = 'FIO'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[1]'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO'}, {3, 'str', path = '["FIO"].fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[1].sname'}, {3, 'str', path = '["FIO"].fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO....fname'}}})
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO.fname'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+assert(idx ~= nil)
+assert(idx.parts[2].path == "FIO.fname")
+s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond', data = "extra"}}, 4, 5}
+s:insert{7, 7, {town = 'Moscow', FIO = {fname = 'Max', sname = 'Isaev', data = "extra"}}, 4, 5}
+idx:select()
+idx:min()
+idx:max()
+s:drop()
+
+s = box.schema.create_space('withdata', {engine = engine})
+parts = {}
+parts[1] = {1, 'unsigned', path='[2]'}
+pk = s:create_index('pk', {parts = parts})
+s:insert{{1, 2}, 3}
+s:upsert({{box.null, 2}}, {{'+', 2, 5}})
+s:get(2)
+s:drop()
+
+-- Create index on space with data
+s = box.schema.space.create('withdata', {engine = engine})
+pk = s:create_index('primary', { type = 'tree' })
+s:insert{1, 7, {town = 'London', FIO = 1234}, 4, 5}
+s:insert{2, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{3, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{4, 7, {town = 'London', FIO = {1,2,3}}, 4, 5}
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+_ = s:delete(1)
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+_ = s:delete(2)
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+_ = s:delete(4)
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]', is_nullable = true}, {3, 'str', path = '["FIO"]["sname"]'}, {3, 'str', path = '["FIO"]["extra"]', is_nullable = true}}})
+assert(idx ~= nil)
+s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '["FIO"]["fname"]'}}})
+idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}}})
+assert(idx2 ~= nil)
+t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
+idx:select()
+idx:min()
+idx:max()
+idx:drop()
+s:drop()
+
+-- Test complex JSON indexes
+s = box.schema.space.create('withdata', {engine = engine})
+parts = {}
+parts[1] = {1, 'str', path='[3][2].a'}
+parts[2] = {1, 'unsigned', path = '[3][1]'}
+parts[3] = {2, 'str', path = '[2].d[1]'}
+pk = s:create_index('primary', { type = 'tree', parts =  parts})
+s:insert{{1, 2, {3, {3, a = 'str', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {1, 2, 3}}
+s:insert{{1, 2, {3, {a = 'str', b = 1}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6}
+parts = {}
+parts[1] = {4, 'unsigned', path='[1]', is_nullable = false}
+parts[2] = {4, 'unsigned', path='[2]', is_nullable = true}
+parts[3] = {4, 'unsigned', path='[4]', is_nullable = true}
+trap_idx = s:create_index('trap', { type = 'tree', parts = parts})
+s:insert{{1, 2, {3, {3, a = 'str2', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {}}
+parts = {}
+parts[1] = {1, 'unsigned', path='[3][2].b' }
+parts[2] = {3, 'unsigned'}
+crosspart_idx = s:create_index('crosspart', { parts =  parts})
+s:insert{{1, 2, {3, {a = 'str2', b = 2}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {9, 2, 3}}
+parts = {}
+parts[1] = {1, 'unsigned', path='[3][2].b'}
+num_idx = s:create_index('numeric', {parts =  parts})
+s:insert{{1, 2, {3, {a = 'str3', b = 9}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {0}}
+num_idx:get(2)
+num_idx:select()
+num_idx:max()
+num_idx:min()
+assert(crosspart_idx:max() == num_idx:max())
+assert(crosspart_idx:min() == num_idx:min())
+trap_idx:max()
+trap_idx:min()
+s:drop()
+
+s = box.schema.space.create('withdata', {engine = engine})
+pk_simplified = s:create_index('primary', { type = 'tree',  parts = {{1, 'unsigned'}}})
+assert(pk_simplified.path == box.NULL)
+idx = s:create_index('idx', {parts = {{2, 'integer', path = 'a'}}})
+s:insert{31, {a = 1, aa = -1}}
+s:insert{22, {a = 2, aa = -2}}
+s:insert{13, {a = 3, aa = -3}}
+idx:select()
+idx:alter({parts = {{2, 'integer', path = 'aa'}}})
+idx:select()
+s:drop()
+
+-- incompatible format change
+s = box.schema.space.create('test')
+i = s:create_index('pk', {parts = {{1, 'integer', path = '[1]'}}})
+s:insert{{-1}}
+i:alter{parts = {{1, 'string', path = '[1]'}}}
+s:insert{{'a'}}
+i:drop()
+i = s:create_index('pk', {parts = {{1, 'integer', path = '[1].FIO'}}})
+s:insert{{{FIO=-1}}}
+i:alter{parts = {{1, 'integer', path = '[1][1]'}}}
+i:alter{parts = {{1, 'integer', path = '[1].FIO[1]'}}}
+s:drop()
+
+
 engine = nil
 test_run = nil
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread
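
The transcript above accepts both path notations for the same field — dot notation (`'FIO.fname'`) and bracket notation (`'["FIO"]["fname"]'`, `'[3][2].a'` with 1-based array indices) — and rejects malformed paths such as `'FIO....fname'`. As an illustration only, here is a toy Python model of those path semantics; the `resolve` helper and `TOKEN` regex are invented names for this sketch, not Tarantool's C json library:

```python
import re

# One token per path step: '.name' / 'name', '[N]', or '["name"]'.
TOKEN = re.compile(r'\.?([A-Za-z_]\w*)|\[(\d+)\]|\["([^"]+)"\]')

def resolve(doc, path):
    """Walk `doc` along `path`; raise ValueError on a malformed path."""
    pos = 0
    for m in TOKEN.finditer(path):
        if m.start() != pos:  # a gap means garbage like 'FIO....fname'
            raise ValueError("invalid JSON path structure in %r" % path)
        pos = m.end()
        name, index, quoted = m.groups()
        if index is not None:
            doc = doc[int(index) - 1]  # Lua arrays are 1-based
        else:
            doc = doc[name if name is not None else quoted]
    if pos != len(path):
        raise ValueError("invalid JSON path structure in %r" % path)
    return doc

doc = {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}
assert resolve(doc, 'FIO.fname') == resolve(doc, '["FIO"]["fname"]') == 'James'
assert resolve([1, 2, [3, {'a': 'str'}]], '[3][2].a') == 'str'
```

The real parser also reports the exact error position (see the `error at position 5` message above); this sketch only reproduces accept/reject behavior.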

* [PATCH v5 6/9] box: introduce has_json_paths flag in templates
  2018-11-26 10:49 [PATCH v5 0/9] box: indexes by JSON path Kirill Shcherbatov
                   ` (4 preceding siblings ...)
  2018-11-26 10:49 ` [tarantool-patches] [PATCH v5 5/9] box: introduce JSON indexes Kirill Shcherbatov
@ 2018-11-26 10:49 ` Kirill Shcherbatov
  2018-11-26 10:49 ` [PATCH v5 7/9] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-11-26 10:49 UTC (permalink / raw)
  To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov

Introduced a has_json_paths flag for the compare, hash and
key-extraction functions (which are really hot) so that flat indexes
without any JSON paths do not need to look at the path field.

Part of #1012
---
 src/box/tuple_compare.cc     | 112 +++++++++++++++++++++++++++++++------------
 src/box/tuple_extract_key.cc | 104 ++++++++++++++++++++++++++--------------
 src/box/tuple_hash.cc        |  45 ++++++++++++-----
 3 files changed, 182 insertions(+), 79 deletions(-)

diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
index 554c29f..97963d0 100644
--- a/src/box/tuple_compare.cc
+++ b/src/box/tuple_compare.cc
@@ -458,11 +458,12 @@ tuple_common_key_parts(const struct tuple *tuple_a, const struct tuple *tuple_b,
 	return i;
 }
 
-template<bool is_nullable, bool has_optional_parts>
+template<bool is_nullable, bool has_optional_parts, bool has_json_path>
 static inline int
 tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 		       struct key_def *key_def)
 {
+	assert(has_json_path == key_def->has_json_paths);
 	assert(!has_optional_parts || is_nullable);
 	assert(is_nullable == key_def->is_nullable);
 	assert(has_optional_parts == key_def->has_optional_parts);
@@ -508,10 +509,19 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 		end = part + key_def->part_count;
 
 	for (; part < end; part++) {
-		field_a = tuple_field_by_part_raw(format_a, tuple_a_raw,
-						  field_map_a, part);
-		field_b = tuple_field_by_part_raw(format_b, tuple_b_raw,
-						  field_map_b, part);
+		if (!has_json_path) {
+			field_a = tuple_field_raw(format_a, tuple_a_raw,
+						  field_map_a,
+						  part->fieldno);
+			field_b = tuple_field_raw(format_b, tuple_b_raw,
+						  field_map_b,
+						  part->fieldno);
+		} else {
+			field_a = tuple_field_by_part_raw(format_a, tuple_a_raw,
+							  field_map_a, part);
+			field_b = tuple_field_by_part_raw(format_b, tuple_b_raw,
+							  field_map_b, part);
+		}
 		assert(has_optional_parts ||
 		       (field_a != NULL && field_b != NULL));
 		if (! is_nullable) {
@@ -558,10 +568,19 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 	 */
 	end = key_def->parts + key_def->part_count;
 	for (; part < end; ++part) {
-		field_a = tuple_field_by_part_raw(format_a, tuple_a_raw,
-						  field_map_a, part);
-		field_b = tuple_field_by_part_raw(format_b, tuple_b_raw,
-						  field_map_b, part);
+		if (!has_json_path) {
+			field_a = tuple_field_raw(format_a, tuple_a_raw,
+						  field_map_a,
+						  part->fieldno);
+			field_b = tuple_field_raw(format_b, tuple_b_raw,
+						  field_map_b,
+						  part->fieldno);
+		} else {
+			field_a = tuple_field_by_part_raw(format_a, tuple_a_raw,
+							  field_map_a, part);
+			field_b = tuple_field_by_part_raw(format_b, tuple_b_raw,
+							  field_map_b, part);
+		}
 		/*
 		 * Extended parts are primary, and they can not
 		 * be absent or be NULLs.
@@ -575,11 +594,12 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
 	return 0;
 }
 
-template<bool is_nullable, bool has_optional_parts>
+template<bool is_nullable, bool has_optional_parts, bool has_json_paths>
 static inline int
 tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
 				uint32_t part_count, struct key_def *key_def)
 {
+	assert(has_json_paths == key_def->has_json_paths);
 	assert(!has_optional_parts || is_nullable);
 	assert(is_nullable == key_def->is_nullable);
 	assert(has_optional_parts == key_def->has_optional_parts);
@@ -591,9 +611,14 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
 	const uint32_t *field_map = tuple_field_map(tuple);
 	enum mp_type a_type, b_type;
 	if (likely(part_count == 1)) {
-		const char *field =
-			tuple_field_by_part_raw(format, tuple_raw, field_map,
-						part);
+		const char *field;
+		if (!has_json_paths) {
+			field = tuple_field_raw(format, tuple_raw, field_map,
+						part->fieldno);
+		} else {
+			field = tuple_field_by_part_raw(format, tuple_raw,
+							field_map, part);
+		}
 		if (! is_nullable) {
 			return tuple_compare_field(field, key, part->type,
 						   part->coll);
@@ -617,9 +642,14 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
 	struct key_part *end = part + part_count;
 	int rc;
 	for (; part < end; ++part, mp_next(&key)) {
-		const char *field =
-			tuple_field_by_part_raw(format, tuple_raw,
-						field_map, part);
+		const char *field;
+		if (!has_json_paths) {
+			field = tuple_field_raw(format, tuple_raw, field_map,
+						part->fieldno);
+		} else {
+			field = tuple_field_by_part_raw(format, tuple_raw,
+							field_map, part);
+		}
 		if (! is_nullable) {
 			rc = tuple_compare_field(field, key, part->type,
 						 part->coll);
@@ -1012,19 +1042,31 @@ static const comparator_signature cmp_arr[] = {
 
 #undef COMPARATOR
 
+static const tuple_compare_t compare_slowpath_funcs[] = {
+	tuple_compare_slowpath<false, false, false>,
+	tuple_compare_slowpath<true, false, false>,
+	tuple_compare_slowpath<false, true, false>,
+	tuple_compare_slowpath<true, true, false>,
+	tuple_compare_slowpath<false, false, true>,
+	tuple_compare_slowpath<true, false, true>,
+	tuple_compare_slowpath<false, true, true>,
+	tuple_compare_slowpath<true, true, true>
+};
+
 tuple_compare_t
 tuple_compare_create(const struct key_def *def)
 {
+	int cmp_func_idx = (def->is_nullable ? 1 : 0) +
+			   2 * (def->has_optional_parts ? 1 : 0) +
+			   4 * (def->has_json_paths ? 1 : 0);
 	if (def->is_nullable) {
 		if (key_def_is_sequential(def)) {
 			if (def->has_optional_parts)
 				return tuple_compare_sequential<true, true>;
 			else
 				return tuple_compare_sequential<true, false>;
-		} else if (def->has_optional_parts) {
-			return tuple_compare_slowpath<true, true>;
 		} else {
-			return tuple_compare_slowpath<true, false>;
+			return compare_slowpath_funcs[cmp_func_idx];
 		}
 	}
 	assert(! def->has_optional_parts);
@@ -1044,10 +1086,9 @@ tuple_compare_create(const struct key_def *def)
 				return cmp_arr[k].f;
 		}
 	}
-	if (key_def_is_sequential(def))
-		return tuple_compare_sequential<false, false>;
-	else
-		return tuple_compare_slowpath<false, false>;
+	return key_def_is_sequential(def) ?
+	       tuple_compare_sequential<false, false> :
+	       compare_slowpath_funcs[cmp_func_idx];
 }
 
 /* }}} tuple_compare */
@@ -1229,9 +1270,23 @@ static const comparator_with_key_signature cmp_wk_arr[] = {
 
 #undef KEY_COMPARATOR
 
+static const tuple_compare_with_key_t compare_with_key_slowpath_funcs[] = {
+	tuple_compare_with_key_slowpath<false, false, false>,
+	tuple_compare_with_key_slowpath<true, false, false>,
+	tuple_compare_with_key_slowpath<false, true, false>,
+	tuple_compare_with_key_slowpath<true, true, false>,
+	tuple_compare_with_key_slowpath<false, false, true>,
+	tuple_compare_with_key_slowpath<true, false, true>,
+	tuple_compare_with_key_slowpath<false, true, true>,
+	tuple_compare_with_key_slowpath<true, true, true>
+};
+
 tuple_compare_with_key_t
 tuple_compare_with_key_create(const struct key_def *def)
 {
+	int cmp_func_idx = (def->is_nullable ? 1 : 0) +
+			   2 * (def->has_optional_parts ? 1 : 0) +
+			   4 * (def->has_json_paths ? 1 : 0);
 	if (def->is_nullable) {
 		if (key_def_is_sequential(def)) {
 			if (def->has_optional_parts) {
@@ -1241,10 +1296,8 @@ tuple_compare_with_key_create(const struct key_def *def)
 				return tuple_compare_with_key_sequential<true,
 									 false>;
 			}
-		} else if (def->has_optional_parts) {
-			return tuple_compare_with_key_slowpath<true, true>;
 		} else {
-			return tuple_compare_with_key_slowpath<true, false>;
+			return compare_with_key_slowpath_funcs[cmp_func_idx];
 		}
 	}
 	assert(! def->has_optional_parts);
@@ -1267,10 +1320,9 @@ tuple_compare_with_key_create(const struct key_def *def)
 				return cmp_wk_arr[k].f;
 		}
 	}
-	if (key_def_is_sequential(def))
-		return tuple_compare_with_key_sequential<false, false>;
-	else
-		return tuple_compare_with_key_slowpath<false, false>;
+	return key_def_is_sequential(def) ?
+	       tuple_compare_with_key_sequential<false, false> :
+	       compare_with_key_slowpath_funcs[cmp_func_idx];
 }
 
 /* }}} tuple_compare_with_key */
diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
index 04c5463..63ad970 100644
--- a/src/box/tuple_extract_key.cc
+++ b/src/box/tuple_extract_key.cc
@@ -5,13 +5,18 @@
 enum { MSGPACK_NULL = 0xc0 };
 
 /** True if key part i and i+1 are sequential. */
+template <bool has_json_paths>
 static inline bool
 key_def_parts_are_sequential(const struct key_def *def, int i)
 {
 	uint32_t fieldno1 = def->parts[i].fieldno + 1;
 	uint32_t fieldno2 = def->parts[i + 1].fieldno;
-	return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
-	       def->parts[i + 1].path == NULL;
+	if (!has_json_paths) {
+		return fieldno1 == fieldno2;
+	} else {
+		return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
+		       def->parts[i + 1].path == NULL;
+	}
 }
 
 /** True, if a key con contain two or more parts in sequence. */
@@ -19,7 +24,7 @@ static bool
 key_def_contains_sequential_parts(const struct key_def *def)
 {
 	for (uint32_t i = 0; i < def->part_count - 1; ++i) {
-		if (key_def_parts_are_sequential(def, i))
+		if (key_def_parts_are_sequential<true>(def, i))
 			return true;
 	}
 	return false;
@@ -99,11 +104,13 @@ tuple_extract_key_sequential(const struct tuple *tuple, struct key_def *key_def,
  * General-purpose implementation of tuple_extract_key()
  * @copydoc tuple_extract_key()
  */
-template <bool contains_sequential_parts, bool has_optional_parts>
+template <bool contains_sequential_parts, bool has_optional_parts,
+	  bool has_json_paths>
 static char *
 tuple_extract_key_slowpath(const struct tuple *tuple,
 			   struct key_def *key_def, uint32_t *key_size)
 {
+	assert(has_json_paths == key_def->has_json_paths);
 	assert(!has_optional_parts || key_def->is_nullable);
 	assert(has_optional_parts == key_def->has_optional_parts);
 	assert(contains_sequential_parts ==
@@ -118,9 +125,14 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
 
 	/* Calculate the key size. */
 	for (uint32_t i = 0; i < part_count; ++i) {
-		const char *field =
-			tuple_field_by_part_raw(format, data, field_map,
-						&key_def->parts[i]);
+		const char *field;
+		if (!has_json_paths) {
+			field = tuple_field_raw(format, data, field_map,
+						key_def->parts[i].fieldno);
+		} else {
+			field = tuple_field_by_part_raw(format, data, field_map,
+							&key_def->parts[i]);
+		}
 		if (has_optional_parts && field == NULL) {
 			bsize += mp_sizeof_nil();
 			continue;
@@ -133,7 +145,8 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
 			 * minimize tuple_field_raw() calls.
 			 */
 			for (; i < part_count - 1; i++) {
-				if (!key_def_parts_are_sequential(key_def, i)) {
+				if (!key_def_parts_are_sequential
+				        <has_json_paths>(key_def, i)) {
 					/*
 					 * End of sequential part.
 					 */
@@ -159,9 +172,14 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
 	}
 	char *key_buf = mp_encode_array(key, part_count);
 	for (uint32_t i = 0; i < part_count; ++i) {
-		const char *field =
-			tuple_field_by_part_raw(format, data, field_map,
-						&key_def->parts[i]);
+		const char *field;
+		if (!has_json_paths) {
+			field = tuple_field_raw(format, data, field_map,
+						key_def->parts[i].fieldno);
+		} else {
+			field = tuple_field_by_part_raw(format, data, field_map,
+							&key_def->parts[i]);
+		}
 		if (has_optional_parts && field == NULL) {
 			key_buf = mp_encode_nil(key_buf);
 			continue;
@@ -174,7 +192,8 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
 			 * minimize tuple_field_raw() calls.
 			 */
 			for (; i < part_count - 1; i++) {
-				if (!key_def_parts_are_sequential(key_def, i)) {
+				if (!key_def_parts_are_sequential
+				        <has_json_paths>(key_def, i)) {
 					/*
 					 * End of sequential part.
 					 */
@@ -207,11 +226,12 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
  * General-purpose version of tuple_extract_key_raw()
  * @copydoc tuple_extract_key_raw()
  */
-template <bool has_optional_parts>
+template <bool has_optional_parts, bool has_json_paths>
 static char *
 tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 			       struct key_def *key_def, uint32_t *key_size)
 {
+	assert(has_json_paths == key_def->has_json_paths);
 	assert(!has_optional_parts || key_def->is_nullable);
 	assert(has_optional_parts == key_def->has_optional_parts);
 	assert(mp_sizeof_nil() == 1);
@@ -239,7 +259,8 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 		uint32_t fieldno = key_def->parts[i].fieldno;
 		uint32_t null_count = 0;
 		for (; i < key_def->part_count - 1; i++) {
-			if (!key_def_parts_are_sequential(key_def, i))
+			if (!key_def_parts_are_sequential
+			        <has_json_paths>(key_def, i))
 				break;
 		}
 		const struct key_part *part = &key_def->parts[i];
@@ -312,6 +333,17 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
 	return key;
 }
 
+static const tuple_extract_key_t extract_key_slowpath_funcs[] = {
+	tuple_extract_key_slowpath<false, false, false>,
+	tuple_extract_key_slowpath<true, false, false>,
+	tuple_extract_key_slowpath<false, true, false>,
+	tuple_extract_key_slowpath<true, true, false>,
+	tuple_extract_key_slowpath<false, false, true>,
+	tuple_extract_key_slowpath<true, false, true>,
+	tuple_extract_key_slowpath<false, true, true>,
+	tuple_extract_key_slowpath<true, true, true>
+};
+
 /**
  * Initialize tuple_extract_key() and tuple_extract_key_raw()
  */
@@ -332,32 +364,30 @@ tuple_extract_key_set(struct key_def *key_def)
 				tuple_extract_key_sequential_raw<false>;
 		}
 	} else {
-		if (key_def->has_optional_parts) {
-			assert(key_def->is_nullable);
-			if (key_def_contains_sequential_parts(key_def)) {
-				key_def->tuple_extract_key =
-					tuple_extract_key_slowpath<true, true>;
-			} else {
-				key_def->tuple_extract_key =
-					tuple_extract_key_slowpath<false, true>;
-			}
-		} else {
-			if (key_def_contains_sequential_parts(key_def)) {
-				key_def->tuple_extract_key =
-					tuple_extract_key_slowpath<true, false>;
-			} else {
-				key_def->tuple_extract_key =
-					tuple_extract_key_slowpath<false,
-								   false>;
-			}
-		}
+		int func_idx =
+			(key_def_contains_sequential_parts(key_def) ? 1 : 0) +
+			2 * (key_def->has_optional_parts ? 1 : 0) +
+			4 * (key_def->has_json_paths ? 1 : 0);
+		key_def->tuple_extract_key =
+			extract_key_slowpath_funcs[func_idx];
+		assert(!key_def->has_optional_parts || key_def->is_nullable);
 	}
 	if (key_def->has_optional_parts) {
 		assert(key_def->is_nullable);
-		key_def->tuple_extract_key_raw =
-			tuple_extract_key_slowpath_raw<true>;
+		if (key_def->has_json_paths) {
+			key_def->tuple_extract_key_raw =
+				tuple_extract_key_slowpath_raw<true, true>;
+		} else {
+			key_def->tuple_extract_key_raw =
+				tuple_extract_key_slowpath_raw<true, false>;
+		}
 	} else {
-		key_def->tuple_extract_key_raw =
-			tuple_extract_key_slowpath_raw<false>;
+		if (key_def->has_json_paths) {
+			key_def->tuple_extract_key_raw =
+				tuple_extract_key_slowpath_raw<false, true>;
+		} else {
+			key_def->tuple_extract_key_raw =
+				tuple_extract_key_slowpath_raw<false, false>;
+		}
 	}
 }
diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc
index 3486ce1..8ede290 100644
--- a/src/box/tuple_hash.cc
+++ b/src/box/tuple_hash.cc
@@ -213,7 +213,7 @@ static const hasher_signature hash_arr[] = {
 
 #undef HASHER
 
-template <bool has_optional_parts>
+template <bool has_optional_parts, bool has_json_paths>
 uint32_t
 tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def);
 
@@ -256,10 +256,17 @@ tuple_hash_func_set(struct key_def *key_def) {
 	}
 
 slowpath:
-	if (key_def->has_optional_parts)
-		key_def->tuple_hash = tuple_hash_slowpath<true>;
-	else
-		key_def->tuple_hash = tuple_hash_slowpath<false>;
+	if (key_def->has_optional_parts) {
+		if (key_def->has_json_paths)
+			key_def->tuple_hash = tuple_hash_slowpath<true, true>;
+		else
+			key_def->tuple_hash = tuple_hash_slowpath<true, false>;
+	} else {
+		if (key_def->has_json_paths)
+			key_def->tuple_hash = tuple_hash_slowpath<false, true>;
+		else
+			key_def->tuple_hash = tuple_hash_slowpath<false, false>;
+	}
 	key_def->key_hash = key_hash_slowpath;
 }
 
@@ -319,10 +326,11 @@ tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry, const struct tuple *tuple,
 	return tuple_hash_field(ph1, pcarry, &field, part->coll);
 }
 
-template <bool has_optional_parts>
+template <bool has_optional_parts, bool has_json_paths>
 uint32_t
 tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def)
 {
+	assert(has_json_paths == key_def->has_json_paths);
 	assert(has_optional_parts == key_def->has_optional_parts);
 	uint32_t h = HASH_SEED;
 	uint32_t carry = 0;
@@ -331,9 +339,13 @@ tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def)
 	struct tuple_format *format = tuple_format(tuple);
 	const char *tuple_raw = tuple_data(tuple);
 	const uint32_t *field_map = tuple_field_map(tuple);
-	const char *field =
-		tuple_field_by_part_raw(format, tuple_raw, field_map,
-					key_def->parts);
+	const char *field;
+	if (!has_json_paths) {
+		field = tuple_field(tuple, prev_fieldno);
+	} else {
+		field = tuple_field_by_part_raw(format, tuple_raw, field_map,
+						key_def->parts);
+	}
 	const char *end = (char *)tuple + tuple_size(tuple);
 	if (has_optional_parts && field == NULL) {
 		total_size += tuple_hash_null(&h, &carry);
@@ -347,9 +359,18 @@ tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def)
 		 * need of tuple_field
 		 */
 		if (prev_fieldno + 1 != key_def->parts[part_id].fieldno) {
-			struct key_part *part = &key_def->parts[part_id];
-			field = tuple_field_by_part_raw(format, tuple_raw,
-							field_map, part);
+			if (!has_json_paths) {
+				field = tuple_field(tuple,
+						    key_def->parts[part_id].
+						    fieldno);
+			} else {
+				struct key_part *part =
+					&key_def->parts[part_id];
+				field = tuple_field_by_part_raw(format,
+								tuple_raw,
+								field_map,
+								part);
+			}
 		}
 		if (has_optional_parts && (field == NULL || field >= end)) {
 			total_size += tuple_hash_null(&h, &carry);
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v5 7/9] box: tune tuple_field_raw_by_path for indexed data
  2018-11-26 10:49 [PATCH v5 0/9] box: indexes by JSON path Kirill Shcherbatov
                   ` (5 preceding siblings ...)
  2018-11-26 10:49 ` [PATCH v5 6/9] box: introduce has_json_paths flag in templates Kirill Shcherbatov
@ 2018-11-26 10:49 ` Kirill Shcherbatov
  2018-12-01 17:20   ` Vladimir Davydov
  2018-11-26 10:49 ` [PATCH v5 8/9] box: introduce offset slot cache in key_part Kirill Shcherbatov
  2018-11-26 10:49 ` [PATCH v5 9/9] box: specify indexes in user-friendly form Kirill Shcherbatov
  8 siblings, 1 reply; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-11-26 10:49 UTC (permalink / raw)
  To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov

We don't need to parse the tuple in tuple_field_raw_by_path if
the required field has been indexed. Instead of parsing the whole
tuple, we look the path up in the tree of indexed JSON paths and
return the data by its offset from field_map.

Part of #1012
---
 src/box/tuple_format.c     | 34 ++++++++++++++++++++++++----------
 test/engine/tuple.result   |  5 +++++
 test/engine/tuple.test.lua |  2 ++
 3 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 193d0d8..be89764 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -956,15 +956,12 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 		goto error;
 	switch(token.key.type) {
 	case JSON_TOKEN_NUM: {
-		int index = token.key.num;
-		if (index == 0) {
+		fieldno = token.key.num;
+		if (fieldno == 0) {
 			*field = NULL;
 			return 0;
 		}
-		index -= TUPLE_INDEX_BASE;
-		*field = tuple_field_raw(format, tuple, field_map, index);
-		if (*field == NULL)
-			return 0;
+		fieldno -= TUPLE_INDEX_BASE;
 		break;
 	}
 	case JSON_TOKEN_STR: {
@@ -982,10 +979,9 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 			name_hash = field_name_hash(token.key.str,
 						    token.key.len);
 		}
-		*field = tuple_field_raw_by_name(format, tuple, field_map,
-						 token.key.str, token.key.len,
-						 name_hash);
-		if (*field == NULL)
+		if (tuple_fieldno_by_name(format->dict, token.key.str,
+					  token.key.len, name_hash,
+					  &fieldno) != 0)
 			return 0;
 		break;
 	}
@@ -994,6 +990,24 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 		*field = NULL;
 		return 0;
 	}
+	/* Optimize indexed JSON field data access. */
+	assert(field != NULL);
+	struct tuple_field *indexed_field =
+		unlikely(fieldno >= tuple_format_field_count(format)) ? NULL :
+		tuple_format_field_by_path(format,
+					   tuple_format_field(format, fieldno),
+					   path + lexer.offset,
+					   path_len - lexer.offset);
+	if (indexed_field != NULL &&
+	    indexed_field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
+		*field = tuple + field_map[indexed_field->offset_slot];
+		return 0;
+	}
+
+	/* No such field in index. Continue parsing JSON path. */
+	*field = tuple_field_raw(format, tuple, field_map, fieldno);
+	if (*field == NULL)
+		return 0;
 	rc = tuple_field_go_to_path(field, path + lexer.offset,
 				    path_len - lexer.offset);
 	if (rc == 0)
diff --git a/test/engine/tuple.result b/test/engine/tuple.result
index 322821e..a07e23c 100644
--- a/test/engine/tuple.result
+++ b/test/engine/tuple.result
@@ -1147,6 +1147,11 @@ assert(idx2 ~= nil)
 t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
 ---
 ...
+-- Test that the tuple's field_map speeds up access by an indexed path.
+t["[3][\"FIO\"][\"fname\"]"]
+---
+- Agent
+...
 idx:select()
 ---
 - - [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5]
diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua
index d53ab42..8630850 100644
--- a/test/engine/tuple.test.lua
+++ b/test/engine/tuple.test.lua
@@ -367,6 +367,8 @@ s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '["FIO"]["
 idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}}})
 assert(idx2 ~= nil)
 t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
+-- Test that the tuple's field_map speeds up access by an indexed path.
+t["[3][\"FIO\"][\"fname\"]"]
 idx:select()
 idx:min()
 idx:max()
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v5 8/9] box: introduce offset slot cache in key_part
  2018-11-26 10:49 [PATCH v5 0/9] box: indexes by JSON path Kirill Shcherbatov
                   ` (6 preceding siblings ...)
  2018-11-26 10:49 ` [PATCH v5 7/9] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov
@ 2018-11-26 10:49 ` Kirill Shcherbatov
  2018-12-03 21:04   ` Vladimir Davydov
  2018-11-26 10:49 ` [PATCH v5 9/9] box: specify indexes in user-friendly form Kirill Shcherbatov
  8 siblings, 1 reply; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-11-26 10:49 UTC (permalink / raw)
  To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov

Tuned the tuple_field_by_part_raw routine with the key_part's
offset_slot_cache. Introduced a tuple_format epoch to test the
validity of this cache. A key_part caches the source format of its
last offset_slot in addition to making the epoch comparison, because
the same space may have multiple formats of the same epoch with
different key_parts and, therefore, a different distribution of
offset_slots.

Part of #1012
---
 src/box/alter.cc                |  7 +++--
 src/box/blackhole.c             |  5 ++--
 src/box/engine.h                | 11 ++++----
 src/box/key_def.c               | 17 +++++++----
 src/box/key_def.h               |  8 ++++++
 src/box/memtx_engine.c          |  4 +--
 src/box/memtx_space.c           |  5 ++--
 src/box/memtx_space.h           |  2 +-
 src/box/schema.cc               |  4 +--
 src/box/space.c                 |  6 ++--
 src/box/space.h                 |  8 ++++--
 src/box/sysview.c               |  3 +-
 src/box/tuple.c                 |  4 +--
 src/box/tuple_format.c          | 62 ++++++++++++++++++++++++++---------------
 src/box/tuple_format.h          |  6 +++-
 src/box/vinyl.c                 |  7 +++--
 src/box/vy_lsm.c                |  5 ++--
 test/unit/vy_iterators_helper.c |  6 ++--
 test/unit/vy_mem.c              |  2 +-
 test/unit/vy_point_lookup.c     |  2 +-
 20 files changed, 110 insertions(+), 64 deletions(-)

diff --git a/src/box/alter.cc b/src/box/alter.cc
index 029da02..6291159 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -856,7 +856,10 @@ alter_space_do(struct txn *txn, struct alter_space *alter)
 	 * Create a new (empty) space for the new definition.
 	 * Sic: the triggers are not moved over yet.
 	 */
-	alter->new_space = space_new_xc(alter->space_def, &alter->key_list);
+	alter->new_space =
+		space_new_xc(alter->space_def, &alter->key_list,
+			     alter->old_space->format != NULL ?
+			     alter->old_space->format->epoch + 1 : 1);
 	/*
 	 * Copy the replace function, the new space is at the same recovery
 	 * phase as the old one. This hack is especially necessary for
@@ -1667,7 +1670,7 @@ on_replace_dd_space(struct trigger * /* trigger */, void *event)
 		access_check_ddl(def->name, def->id, def->uid, SC_SPACE,
 				 PRIV_C);
 		RLIST_HEAD(empty_list);
-		struct space *space = space_new_xc(def, &empty_list);
+		struct space *space = space_new_xc(def, &empty_list, 0);
 		/**
 		 * The new space must be inserted in the space
 		 * cache right away to achieve linearisable
diff --git a/src/box/blackhole.c b/src/box/blackhole.c
index 0412ffe..2727d12 100644
--- a/src/box/blackhole.c
+++ b/src/box/blackhole.c
@@ -139,7 +139,7 @@ blackhole_engine_shutdown(struct engine *engine)
 
 static struct space *
 blackhole_engine_create_space(struct engine *engine, struct space_def *def,
-			      struct rlist *key_list)
+			      struct rlist *key_list, uint64_t epoch)
 {
 	if (!rlist_empty(key_list)) {
 		diag_set(ClientError, ER_UNSUPPORTED, "Blackhole", "indexes");
@@ -156,7 +156,8 @@ blackhole_engine_create_space(struct engine *engine, struct space_def *def,
 	/* Allocate tuples on runtime arena, but check space format. */
 	struct tuple_format *format;
 	format = tuple_format_new(&tuple_format_runtime->vtab, NULL, 0,
-				  def->fields, def->field_count, def->dict);
+				  def->fields, def->field_count, def->dict,
+				  epoch);
 	if (format == NULL) {
 		free(space);
 		return NULL;
diff --git a/src/box/engine.h b/src/box/engine.h
index 5b96c74..0e8c76c 100644
--- a/src/box/engine.h
+++ b/src/box/engine.h
@@ -72,7 +72,8 @@ struct engine_vtab {
 	void (*shutdown)(struct engine *);
 	/** Allocate a new space instance. */
 	struct space *(*create_space)(struct engine *engine,
-			struct space_def *def, struct rlist *key_list);
+			struct space_def *def, struct rlist *key_list,
+			uint64_t epoch);
 	/**
 	 * Write statements stored in checkpoint @vclock to @stream.
 	 */
@@ -237,9 +238,9 @@ engine_find(const char *name)
 
 static inline struct space *
 engine_create_space(struct engine *engine, struct space_def *def,
-		    struct rlist *key_list)
+		    struct rlist *key_list, uint64_t epoch)
 {
-	return engine->vtab->create_space(engine, def, key_list);
+	return engine->vtab->create_space(engine, def, key_list, epoch);
 }
 
 static inline int
@@ -390,9 +391,9 @@ engine_find_xc(const char *name)
 
 static inline struct space *
 engine_create_space_xc(struct engine *engine, struct space_def *def,
-		    struct rlist *key_list)
+		    struct rlist *key_list, uint64_t epoch)
 {
-	struct space *space = engine_create_space(engine, def, key_list);
+	struct space *space = engine_create_space(engine, def, key_list, epoch);
 	if (space == NULL)
 		diag_raise();
 	return space;
diff --git a/src/box/key_def.c b/src/box/key_def.c
index bc6cecd..2953bfa 100644
--- a/src/box/key_def.c
+++ b/src/box/key_def.c
@@ -177,7 +177,8 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
 		 enum field_type type, enum on_conflict_action nullable_action,
 		 struct coll *coll, uint32_t coll_id,
 		 enum sort_order sort_order, const char *path,
-		 uint32_t path_len)
+		 uint32_t path_len, int32_t offset_slot_cache,
+		 struct tuple_format *format_cache)
 {
 	assert(part_no < def->part_count);
 	assert(type < field_type_MAX);
@@ -189,6 +190,8 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
 	def->parts[part_no].coll = coll;
 	def->parts[part_no].coll_id = coll_id;
 	def->parts[part_no].sort_order = sort_order;
+	def->parts[part_no].offset_slot_cache = offset_slot_cache;
+	def->parts[part_no].format_cache = format_cache;
 	if (path != NULL) {
 		def->parts[part_no].path_len = path_len;
 		assert(def->parts[part_no].path != NULL);
@@ -239,7 +242,8 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count)
 		}
 		key_def_set_part(def, i, part->fieldno, part->type,
 				 part->nullable_action, coll, part->coll_id,
-				 part->sort_order, part->path, path_len);
+				 part->sort_order, part->path, path_len,
+				 TUPLE_OFFSET_SLOT_NIL, NULL);
 	}
 	key_def_set_cmp(def);
 	return def;
@@ -291,7 +295,8 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count)
 		key_def_set_part(key_def, item, fields[item],
 				 (enum field_type)types[item],
 				 ON_CONFLICT_ACTION_DEFAULT,
-				 NULL, COLL_NONE, SORT_ORDER_ASC, NULL, 0);
+				 NULL, COLL_NONE, SORT_ORDER_ASC, NULL, 0,
+				 TUPLE_OFFSET_SLOT_NIL, NULL);
 	}
 	key_def_set_cmp(key_def);
 	return key_def;
@@ -699,7 +704,8 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
 		key_def_set_part(new_def, pos++, part->fieldno, part->type,
 				 part->nullable_action, part->coll,
 				 part->coll_id, part->sort_order, part->path,
-				 part->path_len);
+				 part->path_len, part->offset_slot_cache,
+				 part->format_cache);
 	}
 
 	/* Set-append second key def's part to the new key def. */
@@ -715,7 +721,8 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
 		key_def_set_part(new_def, pos++, part->fieldno, part->type,
 				 part->nullable_action, part->coll,
 				 part->coll_id, part->sort_order, part->path,
-				 part->path_len);
+				 part->path_len, part->offset_slot_cache,
+				 part->format_cache);
 	}
 	key_def_set_cmp(new_def);
 	return new_def;
diff --git a/src/box/key_def.h b/src/box/key_def.h
index 7731e48..3e08eb4 100644
--- a/src/box/key_def.h
+++ b/src/box/key_def.h
@@ -95,6 +95,14 @@ struct key_part {
 	char *path;
 	/** The length of JSON path. */
 	uint32_t path_len;
+	/**
+	 * Source format for offset_slot_cache hit validations.
+	 * Cache is expected to use "the format with the newest
+	 * epoch is most relevant" strategy.
+	 */
+	struct tuple_format *format_cache;
+	/** Cache with format's field offset slot. */
+	int32_t offset_slot_cache;
 };
 
 struct key_def;
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 1bc46c6..caafef0 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -358,10 +358,10 @@ memtx_engine_end_recovery(struct engine *engine)
 
 static struct space *
 memtx_engine_create_space(struct engine *engine, struct space_def *def,
-			  struct rlist *key_list)
+			  struct rlist *key_list, uint64_t epoch)
 {
 	struct memtx_engine *memtx = (struct memtx_engine *)engine;
-	return memtx_space_new(memtx, def, key_list);
+	return memtx_space_new(memtx, def, key_list, epoch);
 }
 
 static int
diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c
index eb790a6..38c9de0 100644
--- a/src/box/memtx_space.c
+++ b/src/box/memtx_space.c
@@ -965,7 +965,7 @@ static const struct space_vtab memtx_space_vtab = {
 
 struct space *
 memtx_space_new(struct memtx_engine *memtx,
-		struct space_def *def, struct rlist *key_list)
+		struct space_def *def, struct rlist *key_list, uint64_t epoch)
 {
 	struct memtx_space *memtx_space = malloc(sizeof(*memtx_space));
 	if (memtx_space == NULL) {
@@ -991,7 +991,8 @@ memtx_space_new(struct memtx_engine *memtx,
 
 	struct tuple_format *format =
 		tuple_format_new(&memtx_tuple_format_vtab, keys, key_count,
-				 def->fields, def->field_count, def->dict);
+				 def->fields, def->field_count, def->dict,
+				 epoch);
 	if (format == NULL) {
 		free(memtx_space);
 		return NULL;
diff --git a/src/box/memtx_space.h b/src/box/memtx_space.h
index 5325383..bad6917 100644
--- a/src/box/memtx_space.h
+++ b/src/box/memtx_space.h
@@ -86,7 +86,7 @@ memtx_space_replace_all_keys(struct space *, struct tuple *, struct tuple *,
 
 struct space *
 memtx_space_new(struct memtx_engine *memtx,
-		struct space_def *def, struct rlist *key_list);
+		struct space_def *def, struct rlist *key_list, uint64_t epoch);
 
 #if defined(__cplusplus)
 } /* extern "C" */
diff --git a/src/box/schema.cc b/src/box/schema.cc
index 8625d92..865e07e 100644
--- a/src/box/schema.cc
+++ b/src/box/schema.cc
@@ -283,7 +283,7 @@ sc_space_new(uint32_t id, const char *name,
 	struct rlist key_list;
 	rlist_create(&key_list);
 	rlist_add_entry(&key_list, index_def, link);
-	struct space *space = space_new_xc(def, &key_list);
+	struct space *space = space_new_xc(def, &key_list, 0);
 	space_cache_replace(NULL, space);
 	if (replace_trigger)
 		trigger_add(&space->on_replace, replace_trigger);
@@ -495,7 +495,7 @@ schema_init()
 			space_def_delete(def);
 		});
 		RLIST_HEAD(key_list);
-		struct space *space = space_new_xc(def, &key_list);
+		struct space *space = space_new_xc(def, &key_list, 0);
 		space_cache_replace(NULL, space);
 		init_system_space(space);
 		trigger_run_xc(&on_alter_space, space);
diff --git a/src/box/space.c b/src/box/space.c
index 4d174f7..0a23cf8 100644
--- a/src/box/space.c
+++ b/src/box/space.c
@@ -183,18 +183,18 @@ fail:
 }
 
 struct space *
-space_new(struct space_def *def, struct rlist *key_list)
+space_new(struct space_def *def, struct rlist *key_list, uint64_t epoch)
 {
 	struct engine *engine = engine_find(def->engine_name);
 	if (engine == NULL)
 		return NULL;
-	return engine_create_space(engine, def, key_list);
+	return engine_create_space(engine, def, key_list, epoch);
 }
 
 struct space *
 space_new_ephemeral(struct space_def *def, struct rlist *key_list)
 {
-	struct space *space = space_new(def, key_list);
+	struct space *space = space_new(def, key_list, 0);
 	if (space == NULL)
 		return NULL;
 	space->def->opts.is_temporary = true;
diff --git a/src/box/space.h b/src/box/space.h
index 7eb7ae2..99eff48 100644
--- a/src/box/space.h
+++ b/src/box/space.h
@@ -419,10 +419,11 @@ struct field_def;
  * Allocate and initialize a space.
  * @param space_def Space definition.
  * @param key_list List of index_defs.
+ * @param epoch Last epoch to initialize format.
  * @retval Space object.
  */
 struct space *
-space_new(struct space_def *space_def, struct rlist *key_list);
+space_new(struct space_def *space_def, struct rlist *key_list, uint64_t epoch);
 
 /**
  * Create an ephemeral space.
@@ -474,9 +475,10 @@ int generic_space_prepare_alter(struct space *, struct space *);
 } /* extern "C" */
 
 static inline struct space *
-space_new_xc(struct space_def *space_def, struct rlist *key_list)
+space_new_xc(struct space_def *space_def, struct rlist *key_list,
+	     uint64_t epoch)
 {
-	struct space *space = space_new(space_def, key_list);
+	struct space *space = space_new(space_def, key_list, epoch);
 	if (space == NULL)
 		diag_raise();
 	return space;
diff --git a/src/box/sysview.c b/src/box/sysview.c
index 29de430..64106c0 100644
--- a/src/box/sysview.c
+++ b/src/box/sysview.c
@@ -508,8 +508,9 @@ sysview_engine_shutdown(struct engine *engine)
 
 static struct space *
 sysview_engine_create_space(struct engine *engine, struct space_def *def,
-			    struct rlist *key_list)
+			    struct rlist *key_list, uint64_t epoch)
 {
+	(void)epoch;
 	struct space *space = (struct space *)calloc(1, sizeof(*space));
 	if (space == NULL) {
 		diag_set(OutOfMemory, sizeof(*space),
diff --git a/src/box/tuple.c b/src/box/tuple.c
index 62e06e7..d8cf517 100644
--- a/src/box/tuple.c
+++ b/src/box/tuple.c
@@ -205,7 +205,7 @@ tuple_init(field_name_hash_f hash)
 	 * Create a format for runtime tuples
 	 */
 	tuple_format_runtime = tuple_format_new(&tuple_format_runtime_vtab,
-						NULL, 0, NULL, 0, NULL);
+						NULL, 0, NULL, 0, NULL, 0);
 	if (tuple_format_runtime == NULL)
 		return -1;
 
@@ -377,7 +377,7 @@ box_tuple_format_new(struct key_def **keys, uint16_t key_count)
 {
 	box_tuple_format_t *format =
 		tuple_format_new(&tuple_format_runtime_vtab,
-				 keys, key_count, NULL, 0, NULL);
+				 keys, key_count, NULL, 0, NULL, 0);
 	if (format != NULL)
 		tuple_format_ref(format);
 	return format;
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index be89764..5d90632 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -461,7 +461,8 @@ tuple_format_delete(struct tuple_format *format)
 struct tuple_format *
 tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
 		 uint16_t key_count, const struct field_def *space_fields,
-		 uint32_t space_field_count, struct tuple_dictionary *dict)
+		 uint32_t space_field_count, struct tuple_dictionary *dict,
+		 uint64_t epoch)
 {
 	struct tuple_format *format =
 		tuple_format_alloc(keys, key_count, space_field_count, dict);
@@ -470,6 +471,7 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
 	format->vtab = *vtab;
 	format->engine = NULL;
 	format->is_temporary = false;
+	format->epoch = epoch;
 	if (tuple_format_register(format) < 0) {
 		tuple_format_destroy(format);
 		free(format);
@@ -1029,29 +1031,43 @@ tuple_field_by_part_raw(struct tuple_format *format, const char *data,
 	if (likely(part->path == NULL))
 		return tuple_field_raw(format, data, field_map, part->fieldno);
 
-	uint32_t field_count = tuple_format_field_count(format);
-	struct tuple_field *root_field =
-		likely(part->fieldno < field_count) ?
-		tuple_format_field(format, part->fieldno) : NULL;
-	struct tuple_field *field =
-		unlikely(root_field == NULL) ? NULL:
-		tuple_format_field_by_path(format, root_field, part->path,
-					   part->path_len);
-	if (unlikely(field == NULL)) {
-		/*
-		 * Legacy tuple having no field map for JSON
-		 * index require full path parse.
-		 */
-		const char *field_raw =
-			tuple_field_raw(format, data, field_map, part->fieldno);
-		if (unlikely(field_raw == NULL))
-			return NULL;
-		if (tuple_field_go_to_path(&field_raw, part->path,
-					   part->path_len) != 0)
-			return NULL;
-		return field_raw;
+	int32_t offset_slot;
+	if (likely(part->format_cache == format)) {
+		assert(format->epoch != 0);
+		offset_slot = part->offset_slot_cache;
+	} else {
+		uint32_t field_count = tuple_format_field_count(format);
+		struct tuple_field *root_field =
+			likely(part->fieldno < field_count) ?
+			tuple_format_field(format, part->fieldno) : NULL;
+		struct tuple_field *field =
+			unlikely(root_field == NULL) ? NULL:
+			tuple_format_field_by_path(format, root_field, part->path,
+						part->path_len);
+		if (unlikely(field == NULL)) {
+			/*
+			* Legacy tuple having no field map for JSON
+			* index require full path parse.
+			*/
+			const char *field_raw =
+				tuple_field_raw(format, data, field_map, part->fieldno);
+			if (unlikely(field_raw == NULL))
+				return NULL;
+			if (tuple_field_go_to_path(&field_raw, part->path,
+						part->path_len) != 0)
+				return NULL;
+			return field_raw;
+		}
+		offset_slot = field->offset_slot;
+		/* Cache offset_slot if required. */
+		if (part->format_cache != format &&
+		    (part->format_cache == NULL ||
+		     part->format_cache->epoch < format->epoch)) {
+			assert(format->epoch != 0);
+			part->offset_slot_cache = offset_slot;
+			part->format_cache = format;
+		}
 	}
-	int32_t offset_slot = field->offset_slot;
 	assert(offset_slot < 0);
 	assert(-offset_slot * sizeof(uint32_t) <= format->field_map_size);
 	if (unlikely(field_map[offset_slot] == 0))
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index 860f052..8a7ebfa 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -137,6 +137,8 @@ tuple_field_is_nullable(const struct tuple_field *tuple_field)
  * Tuple format describes how tuple is stored and information about its fields
  */
 struct tuple_format {
+	/** Counter that grows incrementally on space rebuild. */
+	uint64_t epoch;
 	/** Virtual function table */
 	struct tuple_format_vtab vtab;
 	/** Pointer to engine-specific data. */
@@ -254,6 +256,7 @@ tuple_format_unref(struct tuple_format *format)
  * @param key_count The number of keys in @a keys array.
  * @param space_fields Array of fields, defined in a space format.
  * @param space_field_count Length of @a space_fields.
+ * @param epoch Epoch of new format.
  *
  * @retval not NULL Tuple format.
  * @retval     NULL Memory error.
@@ -261,7 +264,8 @@ tuple_format_unref(struct tuple_format *format)
 struct tuple_format *
 tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
 		 uint16_t key_count, const struct field_def *space_fields,
-		 uint32_t space_field_count, struct tuple_dictionary *dict);
+		 uint32_t space_field_count, struct tuple_dictionary *dict,
+		 uint64_t epoch);
 
 /**
  * Check, if @a format1 can store any tuples of @a format2. For
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index 3c9fbf8..ebb1301 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -584,7 +584,7 @@ vinyl_engine_check_space_def(struct space_def *def)
 
 static struct space *
 vinyl_engine_create_space(struct engine *engine, struct space_def *def,
-			  struct rlist *key_list)
+			  struct rlist *key_list, uint64_t epoch)
 {
 	struct space *space = malloc(sizeof(*space));
 	if (space == NULL) {
@@ -610,7 +610,8 @@ vinyl_engine_create_space(struct engine *engine, struct space_def *def,
 
 	struct tuple_format *format =
 		tuple_format_new(&vy_tuple_format_vtab, keys, key_count,
-				 def->fields, def->field_count, def->dict);
+				 def->fields, def->field_count, def->dict,
+				 epoch);
 	if (format == NULL) {
 		free(space);
 		return NULL;
@@ -3017,7 +3018,7 @@ vy_send_lsm(struct vy_join_ctx *ctx, struct vy_lsm_recovery_info *lsm_info)
 	if (ctx->key_def == NULL)
 		goto out;
 	ctx->format = tuple_format_new(&vy_tuple_format_vtab, &ctx->key_def,
-				       1, NULL, 0, NULL);
+				       1, NULL, 0, NULL, 0);
 	if (ctx->format == NULL)
 		goto out_free_key_def;
 	tuple_format_ref(ctx->format);
diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c
index 681b165..e57f864 100644
--- a/src/box/vy_lsm.c
+++ b/src/box/vy_lsm.c
@@ -61,7 +61,7 @@ vy_lsm_env_create(struct vy_lsm_env *env, const char *path,
 		  void *upsert_thresh_arg)
 {
 	env->key_format = tuple_format_new(&vy_tuple_format_vtab,
-					   NULL, 0, NULL, 0, NULL);
+					   NULL, 0, NULL, 0, NULL, 0);
 	if (env->key_format == NULL)
 		return -1;
 	tuple_format_ref(env->key_format);
@@ -154,7 +154,8 @@ vy_lsm_new(struct vy_lsm_env *lsm_env, struct vy_cache_env *cache_env,
 		lsm->disk_format = format;
 	} else {
 		lsm->disk_format = tuple_format_new(&vy_tuple_format_vtab,
-						    &cmp_def, 1, NULL, 0, NULL);
+						    &cmp_def, 1, NULL, 0, NULL,
+						    format->epoch);
 		if (lsm->disk_format == NULL)
 			goto fail_format;
 	}
diff --git a/test/unit/vy_iterators_helper.c b/test/unit/vy_iterators_helper.c
index 7fad560..bbb3149 100644
--- a/test/unit/vy_iterators_helper.c
+++ b/test/unit/vy_iterators_helper.c
@@ -22,7 +22,7 @@ vy_iterator_C_test_init(size_t cache_size)
 	vy_cache_env_create(&cache_env, cord_slab_cache());
 	vy_cache_env_set_quota(&cache_env, cache_size);
 	vy_key_format = tuple_format_new(&vy_tuple_format_vtab, NULL, 0,
-					 NULL, 0, NULL);
+					 NULL, 0, NULL, 0);
 	tuple_format_ref(vy_key_format);
 
 	size_t mem_size = 64 * 1024 * 1024;
@@ -202,7 +202,7 @@ create_test_mem(struct key_def *def)
 	struct key_def * const defs[] = { def };
 	struct tuple_format *format =
 		tuple_format_new(&vy_tuple_format_vtab, defs, def->part_count,
-				 NULL, 0, NULL);
+				 NULL, 0, NULL, 0);
 	fail_if(format == NULL);
 
 	/* Create mem */
@@ -220,7 +220,7 @@ create_test_cache(uint32_t *fields, uint32_t *types,
 	assert(*def != NULL);
 	vy_cache_create(cache, &cache_env, *def, true);
 	*format = tuple_format_new(&vy_tuple_format_vtab, def, 1, NULL, 0,
-				   NULL);
+				   NULL, 0);
 	tuple_format_ref(*format);
 }
 
diff --git a/test/unit/vy_mem.c b/test/unit/vy_mem.c
index ebf3fbc..325c7cc 100644
--- a/test/unit/vy_mem.c
+++ b/test/unit/vy_mem.c
@@ -78,7 +78,7 @@ test_iterator_restore_after_insertion()
 	/* Create format */
 	struct tuple_format *format = tuple_format_new(&vy_tuple_format_vtab,
 						       &key_def, 1, NULL, 0,
-						       NULL);
+						       NULL, 0);
 	assert(format != NULL);
 	tuple_format_ref(format);
 
diff --git a/test/unit/vy_point_lookup.c b/test/unit/vy_point_lookup.c
index 65dafcb..d8e3a9e 100644
--- a/test/unit/vy_point_lookup.c
+++ b/test/unit/vy_point_lookup.c
@@ -85,7 +85,7 @@ test_basic()
 	vy_cache_create(&cache, &cache_env, key_def, true);
 	struct tuple_format *format = tuple_format_new(&vy_tuple_format_vtab,
 						       &key_def, 1, NULL, 0,
-						       NULL);
+						       NULL, 0);
 	isnt(format, NULL, "tuple_format_new is not NULL");
 	tuple_format_ref(format);
 
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v5 9/9] box: specify indexes in user-friendly form
  2018-11-26 10:49 [PATCH v5 0/9] box: indexes by JSON path Kirill Shcherbatov
                   ` (7 preceding siblings ...)
  2018-11-26 10:49 ` [PATCH v5 8/9] box: introduce offset slot cache in key_part Kirill Shcherbatov
@ 2018-11-26 10:49 ` Kirill Shcherbatov
  2018-12-04 12:22   ` Vladimir Davydov
  8 siblings, 1 reply; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-11-26 10:49 UTC (permalink / raw)
  To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov

It is now possible to create indexes by JSON path, using field
names specified in the space format.

Closes #1012

@TarantoolBot document
Title: Indexes by JSON path
Sometimes field data has a complex document structure.
When this structure is consistent across all tuples,
you can create an index by JSON path.

Example:
s:create_index('json_index', {parts = {{'FIO["fname"]', 'str'}}})
---
 src/box/lua/index.c        | 66 ++++++++++++++++++++++++++++++++++++++++++++++
 src/box/lua/schema.lua     | 22 ++++++++--------
 test/engine/tuple.result   | 41 ++++++++++++++++++++++++++++
 test/engine/tuple.test.lua | 12 +++++++++
 4 files changed, 130 insertions(+), 11 deletions(-)

diff --git a/src/box/lua/index.c b/src/box/lua/index.c
index ef89c39..ef81ab2 100644
--- a/src/box/lua/index.c
+++ b/src/box/lua/index.c
@@ -35,6 +35,9 @@
 #include "box/info.h"
 #include "box/lua/info.h"
 #include "box/lua/tuple.h"
+#include "box/schema.h"
+#include "box/tuple_format.h"
+#include "json/json.h"
 #include "box/lua/misc.h" /* lbox_encode_tuple_on_gc() */
 
 /** {{{ box.index Lua library: access to spaces and indexes
@@ -328,6 +331,68 @@ lbox_index_compact(lua_State *L)
 	return 0;
 }
 
+/**
+ * Resolve field index by absolute JSON path first component and
+ * return relative JSON path.
+ */
+static int
+lbox_index_path_resolve(struct lua_State *L)
+{
+	if (lua_gettop(L) != 3 ||
+	    !lua_isnumber(L, 1) || !lua_isnumber(L, 2) || !lua_isstring(L, 3)) {
+		return luaL_error(L, "Usage box.internal."
+				     "path_resolve(part_id, space_id, path)");
+	}
+	uint32_t part_id = lua_tonumber(L, 1);
+	uint32_t space_id = lua_tonumber(L, 2);
+	size_t path_len;
+	const char *path = lua_tolstring(L, 3, &path_len);
+	struct space *space = space_cache_find(space_id);
+	if (space == NULL)
+		return luaT_error(L);
+	struct json_lexer lexer;
+	struct json_token token;
+	json_lexer_create(&lexer, path, path_len);
+	int rc = json_lexer_next_token(&lexer, &token);
+	if (rc != 0) {
+		const char *err_msg =
+			tt_sprintf("options.parts[%d]: error in path on "
+				   "position %d", part_id, rc);
+		diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg);
+		return luaT_error(L);
+	}
+	assert(space->format != NULL && space->format->dict != NULL);
+	uint32_t fieldno;
+	uint32_t field_count = tuple_format_field_count(space->format);
+	if (token.key.type == JSON_TOKEN_NUM &&
+	    (fieldno = token.key.num - TUPLE_INDEX_BASE) >= field_count) {
+		const char *err_msg =
+			tt_sprintf("options.parts[%d]: field '%d' referenced "
+				   "in path is greater than format field "
+				   "count %d", part_id,
+				   fieldno + TUPLE_INDEX_BASE, field_count);
+		diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg);
+		return luaT_error(L);
+	} else if (token.key.type == JSON_TOKEN_STR &&
+		   tuple_fieldno_by_name(space->format->dict, token.key.str,
+					 token.key.len,
+					 field_name_hash(token.key.str,
+					 		 token.key.len),
+					 &fieldno) != 0) {
+		const char *err_msg =
+			tt_sprintf("options.parts[%d]: field was not found by "
+				   "name '%.*s'", part_id, token.key.len,
+				   token.key.str);
+		diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg);
+		return luaT_error(L);
+	}
+	fieldno += TUPLE_INDEX_BASE;
+	path += lexer.offset;
+	lua_pushnumber(L, fieldno);
+	lua_pushstring(L, path);
+	return 2;
+}
+
 /* }}} */
 
 void
@@ -365,6 +430,7 @@ box_lua_index_init(struct lua_State *L)
 		{"truncate", lbox_truncate},
 		{"stat", lbox_index_stat},
 		{"compact", lbox_index_compact},
+		{"path_resolve", lbox_index_path_resolve},
 		{NULL, NULL}
 	};
 
diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua
index 8a804f0..497cf19 100644
--- a/src/box/lua/schema.lua
+++ b/src/box/lua/schema.lua
@@ -575,7 +575,7 @@ local function update_index_parts_1_6_0(parts)
     return result
 end
 
-local function update_index_parts(format, parts)
+local function update_index_parts(format, parts, space_id)
     if type(parts) ~= "table" then
         box.error(box.error.ILLEGAL_PARAMS,
         "options.parts parameter should be a table")
@@ -626,16 +626,16 @@ local function update_index_parts(format, parts)
             box.error(box.error.ILLEGAL_PARAMS,
                       "options.parts[" .. i .. "]: field (name or number) is expected")
         elseif type(part.field) == 'string' then
-            for k,v in pairs(format) do
-                if v.name == part.field then
-                    part.field = k
-                    break
-                end
-            end
-            if type(part.field) == 'string' then
+            local idx, path = box.internal.path_resolve(i, space_id, part.field)
+            if part.path ~= nil and part.path ~= path then
                 box.error(box.error.ILLEGAL_PARAMS,
-                          "options.parts[" .. i .. "]: field was not found by name '" .. part.field .. "'")
+                          "options.parts[" .. i .. "]: field path '"..
+                          part.path.."' doesn't match the path '" ..
+                          part.field .. "'")
             end
+            parts_can_be_simplified = parts_can_be_simplified and path == nil
+            part.field = idx
+            part.path = path or part.path
         elseif part.field == 0 then
             box.error(box.error.ILLEGAL_PARAMS,
                       "options.parts[" .. i .. "]: field (number) must be one-based")
@@ -792,7 +792,7 @@ box.schema.index.create = function(space_id, name, options)
         end
     end
     local parts, parts_can_be_simplified =
-        update_index_parts(format, options.parts)
+        update_index_parts(format, options.parts, space_id)
     -- create_index() options contains type, parts, etc,
     -- stored separately. Remove these members from index_opts
     local index_opts = {
@@ -959,7 +959,7 @@ box.schema.index.alter = function(space_id, index_id, options)
     if options.parts then
         local parts_can_be_simplified
         parts, parts_can_be_simplified =
-            update_index_parts(format, options.parts)
+            update_index_parts(format, options.parts, space_id)
         -- save parts in old format if possible
         if parts_can_be_simplified then
             parts = simplify_index_parts(parts)
diff --git a/test/engine/tuple.result b/test/engine/tuple.result
index a07e23c..44455bf 100644
--- a/test/engine/tuple.result
+++ b/test/engine/tuple.result
@@ -1007,6 +1007,47 @@ assert(idx.parts[2].path == "FIO.fname")
 ---
 - true
 ...
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'array'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}}
+---
+...
+s:format(format)
+---
+- error: Field [2]FIO.fname has type 'array' in one index, but type 'map' in another
+...
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'map'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}}
+---
+...
+s:format(format)
+---
+...
+s:create_index('test3', {parts = {{2, 'number'}, {']sad.FIO["fname"]', 'str'}}})
+---
+- error: 'Illegal parameters, options.parts[2]: error in path on position 1'
+...
+s:create_index('test3', {parts = {{2, 'number'}, {'[666].FIO["fname"]', 'str'}}})
+---
+- error: 'Illegal parameters, options.parts[2]: field ''666'' referenced in path is
+    greater than format field count 5'
+...
+s:create_index('test3', {parts = {{2, 'number'}, {'invalid.FIO["fname"]', 'str'}}})
+---
+- error: 'Illegal parameters, options.parts[2]: field was not found by name ''invalid'''
+...
+idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}})
+---
+...
+assert(idx3 ~= nil)
+---
+- true
+...
+assert(idx3.parts[2].path == ".FIO[\"fname\"]")
+---
+- true
+...
+-- Vinyl has optimizations that omit index checks, so errors could differ.
+idx3:drop()
+---
+...
 s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
 ---
 - error: 'Tuple field 3 type does not match one required by operation: expected map'
diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua
index 8630850..fb366c8 100644
--- a/test/engine/tuple.test.lua
+++ b/test/engine/tuple.test.lua
@@ -327,6 +327,18 @@ s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO....fname
 idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO.fname'}, {3, 'str', path = '["FIO"]["sname"]'}}})
 assert(idx ~= nil)
 assert(idx.parts[2].path == "FIO.fname")
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'array'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}}
+s:format(format)
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'map'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}}
+s:format(format)
+s:create_index('test3', {parts = {{2, 'number'}, {']sad.FIO["fname"]', 'str'}}})
+s:create_index('test3', {parts = {{2, 'number'}, {'[666].FIO["fname"]', 'str'}}})
+s:create_index('test3', {parts = {{2, 'number'}, {'invalid.FIO["fname"]', 'str'}}})
+idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}})
+assert(idx3 ~= nil)
+assert(idx3.parts[2].path == ".FIO[\"fname\"]")
+-- Vinyl has optimizations that omit index checks, so errors could differ.
+idx3:drop()
 s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
 s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5}
 s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5}
-- 
2.7.4

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tarantool-patches] [PATCH v5 2/9] lib: implement JSON tree class for json library
  2018-11-26 10:49 ` [PATCH v5 2/9] lib: implement JSON tree class for json library Kirill Shcherbatov
@ 2018-11-26 12:53   ` Kirill Shcherbatov
  2018-11-29 17:38     ` Vladimir Davydov
  0 siblings, 1 reply; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-11-26 12:53 UTC (permalink / raw)
  To: tarantool-patches, Vladimir Davydov; +Cc: Kostya Osipov

The new JSON tree class stores the JSON paths of tuple fields
for registered non-plain indexes. It is a hierarchical data
structure that organizes the JSON nodes produced by the parser
and provides an API to look up a node by path and to iterate
over the tree.
The JSON indexes patch requires this functionality to look up
tuple fields by path, to initialize the field map and to build
vinyl statement msgpack for secondary indexes via JSON tree
iteration.

Need for #1012
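The tree iteration described above can be sketched with a toy, self-contained analogue: a node keeps a possibly sparse array of child pointers plus a parent link, and pre-order traversal steps through it. The names below (node, child_next, preorder_next, demo) are illustrative, not the patch's API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy analogue of a tree node with a sparse children array. */
struct node {
	struct node *parent;
	struct node **children;
	size_t child_size;  /* allocated slots, may contain NULLs */
	size_t sibling_idx; /* index of this node in parent->children */
};

/* First non-NULL child after position pos (NULL = from the start). */
static struct node *
child_next(struct node *parent, struct node *pos)
{
	size_t idx = pos != NULL ? pos->sibling_idx + 1 : 0;
	while (idx < parent->child_size && parent->children[idx] == NULL)
		idx++;
	return idx < parent->child_size ? parent->children[idx] : NULL;
}

/* Pre-order successor of pos in the tree rooted at root. */
static struct node *
preorder_next(struct node *root, struct node *pos)
{
	if (pos == NULL)
		pos = root;
	struct node *next = child_next(pos, NULL);
	if (next != NULL)
		return next;
	while (pos != root) {
		next = child_next(pos->parent, pos);
		if (next != NULL)
			return next;
		pos = pos->parent;
	}
	return NULL;
}

/* Build root -> {a, <hole>, b}, a -> {c}; expect order a, c, b. */
static int
demo(void)
{
	struct node root = {0}, a = {0}, b = {0}, c = {0};
	struct node *root_kids[3] = { &a, NULL, &b };
	struct node *a_kids[1] = { &c };
	root.children = root_kids; root.child_size = 3;
	a.parent = &root; a.sibling_idx = 0;
	a.children = a_kids; a.child_size = 1;
	b.parent = &root; b.sibling_idx = 2;
	c.parent = &a; c.sibling_idx = 0;
	struct node *expect[] = { &a, &c, &b };
	struct node *pos = NULL;
	for (int i = 0; i < 3; i++) {
		pos = preorder_next(&root, pos);
		if (pos != expect[i])
			return 0;
	}
	return preorder_next(&root, pos) == NULL;
}
```

Note how NULL holes in the children array are skipped, which is what allows sparse JSON_TOKEN_NUM slots.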
---
 src/box/tuple_format.c      |  22 +--
 src/lib/json/CMakeLists.txt |   1 +
 src/lib/json/json.c         | 284 ++++++++++++++++++++++++++++++++++--
 src/lib/json/json.h         | 196 +++++++++++++++++++++++--
 test/unit/json_path.c       | 208 +++++++++++++++++++++++++-
 test/unit/json_path.result  |  41 +++++-
 6 files changed, 714 insertions(+), 38 deletions(-)

diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 661cfdc94..d184dbae1 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -584,15 +584,16 @@ tuple_field_go_to_path(const char **data, const char *path, uint32_t path_len)
 	struct json_token token;
 	json_lexer_create(&lexer, path, path_len);
 	while ((rc = json_lexer_next_token(&lexer, &token)) == 0) {
-		switch (token.type) {
+		switch (token.key.type) {
 		case JSON_TOKEN_NUM:
-			rc = tuple_field_go_to_index(data, token.num);
+			rc = tuple_field_go_to_index(data, token.key.num);
 			break;
 		case JSON_TOKEN_STR:
-			rc = tuple_field_go_to_key(data, token.str, token.len);
+			rc = tuple_field_go_to_key(data, token.key.str,
+						   token.key.len);
 			break;
 		default:
-			assert(token.type == JSON_TOKEN_END);
+			assert(token.key.type == JSON_TOKEN_END);
 			return 0;
 		}
 		if (rc != 0) {
@@ -628,9 +629,9 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 	int rc = json_lexer_next_token(&lexer, &token);
 	if (rc != 0)
 		goto error;
-	switch(token.type) {
+	switch(token.key.type) {
 	case JSON_TOKEN_NUM: {
-		int index = token.num;
+		int index = token.key.num;
 		if (index == 0) {
 			*field = NULL;
 			return 0;
@@ -644,7 +645,7 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 	case JSON_TOKEN_STR: {
 		/* First part of a path is a field name. */
 		uint32_t name_hash;
-		if (path_len == (uint32_t) token.len) {
+		if (path_len == (uint32_t) token.key.len) {
 			name_hash = path_hash;
 		} else {
 			/*
@@ -653,17 +654,18 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 			 * used. A tuple dictionary hashes only
 			 * name, not path.
 			 */
-			name_hash = field_name_hash(token.str, token.len);
+			name_hash = field_name_hash(token.key.str,
+						    token.key.len);
 		}
 		*field = tuple_field_raw_by_name(format, tuple, field_map,
-						 token.str, token.len,
+						 token.key.str, token.key.len,
 						 name_hash);
 		if (*field == NULL)
 			return 0;
 		break;
 	}
 	default:
-		assert(token.type == JSON_TOKEN_END);
+		assert(token.key.type == JSON_TOKEN_END);
 		*field = NULL;
 		return 0;
 	}
diff --git a/src/lib/json/CMakeLists.txt b/src/lib/json/CMakeLists.txt
index 0f0739620..51a1f027a 100644
--- a/src/lib/json/CMakeLists.txt
+++ b/src/lib/json/CMakeLists.txt
@@ -4,3 +4,4 @@ set(lib_sources
 
 set_source_files_compile_flags(${lib_sources})
 add_library(json_path STATIC ${lib_sources})
+target_link_libraries(json_path misc)
diff --git a/src/lib/json/json.c b/src/lib/json/json.c
index eb80e4bbc..6bc74507f 100644
--- a/src/lib/json/json.c
+++ b/src/lib/json/json.c
@@ -30,6 +30,7 @@
  */
 
 #include "json.h"
+#include "third_party/PMurHash.h"
 #include <ctype.h>
 #include <stdbool.h>
 #include <unicode/uchar.h>
@@ -111,9 +112,9 @@ json_parse_string(struct json_lexer *lexer, struct json_token *token,
 			int len = lexer->offset - str_offset - 1;
 			if (len == 0)
 				return lexer->symbol_count;
-			token->type = JSON_TOKEN_STR;
-			token->str = lexer->src + str_offset;
-			token->len = len;
+			token->key.type = JSON_TOKEN_STR;
+			token->key.str = lexer->src + str_offset;
+			token->key.len = len;
 			return 0;
 		}
 	}
@@ -146,8 +147,8 @@ json_parse_integer(struct json_lexer *lexer, struct json_token *token)
 	} while (++pos < end && isdigit((c = *pos)));
 	lexer->offset += len;
 	lexer->symbol_count += len;
-	token->type = JSON_TOKEN_NUM;
-	token->num = value;
+	token->key.type = JSON_TOKEN_NUM;
+	token->key.num = value;
 	return 0;
 }
 
@@ -192,9 +193,9 @@ json_parse_identifier(struct json_lexer *lexer, struct json_token *token)
 		}
 		last_offset = lexer->offset;
 	}
-	token->type = JSON_TOKEN_STR;
-	token->str = lexer->src + str_offset;
-	token->len = lexer->offset - str_offset;
+	token->key.type = JSON_TOKEN_STR;
+	token->key.str = lexer->src + str_offset;
+	token->key.len = lexer->offset - str_offset;
 	return 0;
 }
 
@@ -202,7 +203,7 @@ int
 json_lexer_next_token(struct json_lexer *lexer, struct json_token *token)
 {
 	if (lexer->offset == lexer->src_len) {
-		token->type = JSON_TOKEN_END;
+		token->key.type = JSON_TOKEN_END;
 		return 0;
 	}
 	UChar32 c;
@@ -241,3 +242,268 @@ json_lexer_next_token(struct json_lexer *lexer, struct json_token *token)
 		return json_parse_identifier(lexer, token);
 	}
 }
+
+/** Compare the keys of two JSON tokens. */
+static int
+json_token_key_cmp(const struct json_token *a, const struct json_token *b)
+{
+	if (a->key.type != b->key.type)
+		return a->key.type - b->key.type;
+	int ret = 0;
+	if (a->key.type == JSON_TOKEN_STR) {
+		if (a->key.len != b->key.len)
+			return a->key.len - b->key.len;
+		ret = memcmp(a->key.str, b->key.str, a->key.len);
+	} else if (a->key.type == JSON_TOKEN_NUM) {
+		ret = a->key.num - b->key.num;
+	} else {
+		unreachable();
+	}
+	return ret;
+}
+
+/**
+ * Compare hash records of two json tree nodes. Return 0 if equal.
+ */
+static inline int
+mh_json_cmp(const struct json_token *a, const struct json_token *b)
+{
+	if (a->parent != b->parent)
+		return a->parent - b->parent;
+	return json_token_key_cmp(a, b);
+}
+
+#define MH_SOURCE 1
+#define mh_name _json
+#define mh_key_t struct json_token **
+#define mh_node_t struct json_token *
+#define mh_arg_t void *
+#define mh_hash(a, arg) ((*a)->rolling_hash)
+#define mh_hash_key(a, arg) ((*a)->rolling_hash)
+#define mh_cmp(a, b, arg) (mh_json_cmp((*a), (*b)))
+#define mh_cmp_key(a, b, arg) mh_cmp(a, b, arg)
+#include "salad/mhash.h"
+
+static const uint32_t hash_seed = 13U;
+
+/** Compute the hash value of a JSON token. */
+static uint32_t
+json_token_hash(struct json_token *token, uint32_t seed)
+{
+	uint32_t h = seed;
+	uint32_t carry = 0;
+	const void *data;
+	uint32_t data_size;
+	if (token->key.type == JSON_TOKEN_STR) {
+		data = token->key.str;
+		data_size = token->key.len;
+	} else if (token->key.type == JSON_TOKEN_NUM) {
+		data = &token->key.num;
+		data_size = sizeof(token->key.num);
+	} else {
+		unreachable();
+	}
+	PMurHash32_Process(&h, &carry, data, data_size);
+	return PMurHash32_Result(h, carry, data_size);
+}
+
+int
+json_tree_create(struct json_tree *tree)
+{
+	memset(tree, 0, sizeof(struct json_tree));
+	tree->root.rolling_hash = hash_seed;
+	tree->root.key.type = JSON_TOKEN_END;
+	tree->hash = mh_json_new();
+	return tree->hash == NULL ? -1 : 0;
+}
+
+static void
+json_token_destroy(struct json_token *token)
+{
+	free(token->children);
+}
+
+void
+json_tree_destroy(struct json_tree *tree)
+{
+	assert(tree->hash != NULL);
+	json_token_destroy(&tree->root);
+	mh_json_delete(tree->hash);
+}
+
+struct json_token *
+json_tree_lookup(struct json_tree *tree, struct json_token *parent,
+		 struct json_token *token)
+{
+	if (parent == NULL)
+		parent = &tree->root;
+	struct json_token *ret = NULL;
+	if (token->key.type == JSON_TOKEN_STR) {
+		struct json_token key = *token;
+		key.rolling_hash = json_token_hash(token, parent->rolling_hash);
+		key.parent = parent;
+		token = &key;
+		mh_int_t id = mh_json_find(tree->hash, &token, NULL);
+		if (unlikely(id == mh_end(tree->hash)))
+			return NULL;
+		struct json_token **entry = mh_json_node(tree->hash, id);
+		assert(entry == NULL || (*entry)->parent == parent);
+		return entry != NULL ? *entry : NULL;
+	} else if (token->key.type == JSON_TOKEN_NUM) {
+		uint32_t idx = token->key.num - 1;
+		ret = idx < parent->child_size ? parent->children[idx] : NULL;
+	} else {
+		unreachable();
+	}
+	return ret;
+}
+
+int
+json_tree_add(struct json_tree *tree, struct json_token *parent,
+	      struct json_token *token)
+{
+	if (parent == NULL)
+		parent = &tree->root;
+	uint32_t rolling_hash =
+	       json_token_hash(token, parent->rolling_hash);
+	assert(json_tree_lookup(tree, parent, token) == NULL);
+	uint32_t insert_idx = (token->key.type == JSON_TOKEN_NUM) ?
+			      (uint32_t)token->key.num - 1 :
+			      parent->child_size;
+	if (insert_idx >= parent->child_size) {
+		uint32_t new_size =
+			parent->child_size == 0 ? 1 : 2 * parent->child_size;
+		while (insert_idx >= new_size)
+			new_size *= 2;
+		struct json_token **children =
+			realloc(parent->children, new_size*sizeof(void *));
+		if (unlikely(children == NULL))
+			return -1;
+		memset(children + parent->child_size, 0,
+		       (new_size - parent->child_size)*sizeof(void *));
+		parent->children = children;
+		parent->child_size = new_size;
+	}
+	assert(parent->children[insert_idx] == NULL);
+	parent->children[insert_idx] = token;
+	parent->child_count = MAX(parent->child_count, insert_idx + 1);
+	token->sibling_idx = insert_idx;
+	token->rolling_hash = rolling_hash;
+	token->parent = parent;
+
+	const struct json_token **key =
+		(const struct json_token **)&token;
+	mh_int_t rc = mh_json_put(tree->hash, key, NULL, NULL);
+	if (unlikely(rc == mh_end(tree->hash))) {
+		parent->children[insert_idx] = NULL;
+		return -1;
+	}
+	assert(json_tree_lookup(tree, parent, token) == token);
+	return 0;
+}
+
+void
+json_tree_del(struct json_tree *tree, struct json_token *token)
+{
+	struct json_token *parent = token->parent;
+	assert(json_tree_lookup(tree, parent, token) == token);
+	struct json_token **child_slot = NULL;
+	if (token->key.type == JSON_TOKEN_NUM) {
+		child_slot = &parent->children[token->key.num - 1];
+	} else {
+		uint32_t idx = 0;
+		while (idx < parent->child_size &&
+		       parent->children[idx] != token) { idx++; }
+		if (idx < parent->child_size &&
+		       parent->children[idx] == token)
+			child_slot = &parent->children[idx];
+	}
+	assert(child_slot != NULL && *child_slot == token);
+	*child_slot = NULL;
+
+	mh_int_t id = mh_json_find(tree->hash, &token, NULL);
+	assert(id != mh_end(tree->hash));
+	mh_json_del(tree->hash, id, NULL);
+	json_token_destroy(token);
+	assert(json_tree_lookup(tree, parent, token) == NULL);
+}
+
+struct json_token *
+json_tree_lookup_path(struct json_tree *tree, struct json_token *parent,
+		      const char *path, uint32_t path_len)
+{
+	int rc;
+	struct json_lexer lexer;
+	struct json_token token;
+	struct json_token *ret = parent != NULL ? parent : &tree->root;
+	json_lexer_create(&lexer, path, path_len);
+	while ((rc = json_lexer_next_token(&lexer, &token)) == 0 &&
+	       token.key.type != JSON_TOKEN_END && ret != NULL) {
+		ret = json_tree_lookup(tree, ret, &token);
+	}
+	if (rc != 0 || token.key.type != JSON_TOKEN_END)
+		return NULL;
+	return ret;
+}
+
+static struct json_token *
+json_tree_child_next(struct json_token *parent, struct json_token *pos)
+{
+	assert(pos == NULL || pos->parent == parent);
+	struct json_token **arr = parent->children;
+	uint32_t arr_size = parent->child_size;
+	if (arr == NULL)
+		return NULL;
+	uint32_t idx = pos != NULL ? pos->sibling_idx + 1 : 0;
+	while (idx < arr_size && arr[idx] == NULL)
+		idx++;
+	if (idx >= arr_size)
+		return NULL;
+	return arr[idx];
+}
+
+static struct json_token *
+json_tree_leftmost(struct json_token *pos)
+{
+	struct json_token *last;
+	do {
+		last = pos;
+		pos = json_tree_child_next(pos, NULL);
+	} while (pos != NULL);
+	return last;
+}
+
+struct json_token *
+json_tree_preorder_next(struct json_token *root, struct json_token *pos)
+{
+	if (pos == NULL)
+		pos = root;
+	struct json_token *next = json_tree_child_next(pos, NULL);
+	if (next != NULL)
+		return next;
+	while (pos != root) {
+		next = json_tree_child_next(pos->parent, pos);
+		if (next != NULL)
+			return next;
+		pos = pos->parent;
+	}
+	return NULL;
+}
+
+struct json_token *
+json_tree_postorder_next(struct json_token *root, struct json_token *pos)
+{
+	struct json_token *next;
+	if (pos == NULL) {
+		next = json_tree_leftmost(root);
+		return next != root ? next : NULL;
+	}
+	if (pos == root)
+		return NULL;
+	next = json_tree_child_next(pos->parent, pos);
+	if (next != NULL) {
+		next = json_tree_leftmost(next);
+		return next != root ? next : NULL;
+	}
+	return pos->parent != root ? pos->parent : NULL;
+}
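The rolling-hash idea used above can be illustrated with a seed-chained FNV-1a: each path component is hashed with the parent's hash as the seed, so every node's hash covers its full path and one hashtable can distinguish identical keys under different parents. This is a self-contained sketch, not the patch's code (the patch uses PMurHash32); component_hash and path_hash are illustrative names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a over one path component, seeded with the parent's hash. */
static uint32_t
component_hash(const char *key, size_t len, uint32_t seed)
{
	uint32_t h = seed;
	for (size_t i = 0; i < len; i++) {
		h ^= (uint8_t)key[i];
		h *= 16777619u; /* FNV prime */
	}
	return h;
}

/* Chain component hashes so the result depends on the whole path. */
static uint32_t
path_hash(const char **components, size_t count, uint32_t root_seed)
{
	uint32_t h = root_seed;
	for (size_t i = 0; i < count; i++)
		h = component_hash(components[i], strlen(components[i]), h);
	return h;
}
```

Each FNV-1a step h -> (h ^ byte) * prime is a bijection on 32-bit values, so distinct parent seeds are guaranteed to yield distinct hashes for the same key, which is exactly the property the per-node rolling hash relies on.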
diff --git a/src/lib/json/json.h b/src/lib/json/json.h
index ead446878..2bc159ff8 100644
--- a/src/lib/json/json.h
+++ b/src/lib/json/json.h
@@ -61,20 +61,53 @@ enum json_token_type {
 /**
  * Element of a JSON path. It can be either string or number.
 * String identifiers are in ["..."] and between dots. Numbers are
- * indexes in [...].
+ * indexes in [...]. Tokens may be organized into a JSON tree.
  */
 struct json_token {
-	enum json_token_type type;
-	union {
-		struct {
-			/** String identifier. */
-			const char *str;
-			/** Length of @a str. */
-			int len;
+	struct {
+		enum json_token_type type;
+		union {
+			struct {
+				/** String identifier. */
+				const char *str;
+				/** Length of @a str. */
+				int len;
+			};
+			/** Index value. */
+			uint64_t num;
 		};
-		/** Index value. */
-		uint64_t num;
-	};
+	} key;
+	/** Rolling hash of the node, used for lookups in json_tree. */
+	uint32_t rolling_hash;
+	/**
+	 * Array of child records. For a JSON_TOKEN_NUM child
+	 * token the slot index is the token's array index minus
+	 * one; slots for JSON_TOKEN_STR child tokens are
+	 * allocated sequentially. New entries are
+	 * NULL-initialized on allocation.
+	 */
+	struct json_token **children;
+	/** Allocation size of children array. */
+	uint32_t child_size;
+	/**
+	 * Count of defined children array items; equals the
+	 * maximum inserted slot index plus one.
+	 */
+	uint32_t child_count;
+	/** Index of node in parent children array. */
+	uint32_t sibling_idx;
+	/** Pointer to parent node. */
+	struct json_token *parent;
+};
+
+struct mh_json_t;
+
+/** JSON tree object to manage tokens relations. */
+struct json_tree {
+	/** JSON tree root node. */
+	struct json_token root;
+	/** Hashtable of all tree nodes. */
+	struct mh_json_t *hash;
 };
 
 /**
@@ -104,6 +137,147 @@ json_lexer_create(struct json_lexer *lexer, const char *src, int src_len)
 int
 json_lexer_next_token(struct json_lexer *lexer, struct json_token *token);
 
+/** Create a JSON tree object to manage data relations. */
+int
+json_tree_create(struct json_tree *tree);
+
+/**
+ * Destroy a JSON tree object. This routine doesn't free the
+ * attached subtrees, so call it after destroying them manually.
+ */
+void
+json_tree_destroy(struct json_tree *tree);
+
+/**
+ * Look up a child of the given parent node in a JSON tree
+ * by token. The parent may be set to NULL to use the tree
+ * root record.
+ */
+struct json_token *
+json_tree_lookup(struct json_tree *tree, struct json_token *parent,
+		 struct json_token *token);
+
+/**
+ * Append a token at the given parent position in a JSON tree.
+ * The parent mustn't already have a child with the same
+ * content. The parent may be set to NULL to use the tree root.
+ */
+int
+json_tree_add(struct json_tree *tree, struct json_token *parent,
+	      struct json_token *token);
+
+/**
+ * Delete a token from the given parent position in a JSON
+ * tree. The token entry must not have a subtree.
+ */
+void
+json_tree_del(struct json_tree *tree, struct json_token *token);
+
+/** Look up a node in a JSON tree by path. */
+struct json_token *
+json_tree_lookup_path(struct json_tree *tree, struct json_token *parent,
+		      const char *path, uint32_t path_len);
+
+/** Return the next node of a pre-order traversal of a JSON tree. */
+struct json_token *
+json_tree_preorder_next(struct json_token *root, struct json_token *pos);
+
+/** Return the next node of a post-order traversal of a JSON tree. */
+struct json_token *
+json_tree_postorder_next(struct json_token *root, struct json_token *pos);
+
+/**
+ * Make a safe post-order traversal of a JSON tree.
+ * May be used in destructors: the current node may be freed.
+ */
+#define json_tree_foreach_safe(node, root)				     \
+for (struct json_token *__next = json_tree_postorder_next((root), NULL);     \
+     (((node) = __next) &&						     \
+     (__next = json_tree_postorder_next((root), (node))), (node) != NULL);)
+
+#ifndef typeof
+/* TODO: 'typeof' is a GNU extension */
+#define typeof __typeof__
+#endif
+
+/** Return container entry by json_tree_node node. */
+#define json_tree_entry(node, type, member) ({ 				     \
+	const typeof( ((type *)0)->member ) *__mptr = (node);		     \
+	(type *)( (char *)__mptr - ((size_t) &((type *)0)->member) );	     \
+})
+
+/**
+ * Return container entry by json_tree_node or NULL if
+ * node is NULL.
+ */
+#define json_tree_entry_safe(node, type, member) ({			     \
+	(node) != NULL ? json_tree_entry((node), type, member) : NULL;	     \
+})
+
+/** Make a pre-order traversal step and return the entry. */
+#define json_tree_preorder_next_entry(node, root, type, member) ({	     \
+	struct json_token *__next =					     \
+		json_tree_preorder_next((root), (node));		     \
+	json_tree_entry_safe(__next, type, member);			     \
+})
+
+/** Make a post-order traversal step and return the entry. */
+#define json_tree_postorder_next_entry(node, root, type, member) ({	     \
+	struct json_token *__next =					     \
+		json_tree_postorder_next((root), (node));		     \
+	json_tree_entry_safe(__next, type, member);			     \
+})
+
+/** Make lookup in tree by path and return entry. */
+#define json_tree_lookup_path_entry(tree, parent, path, path_len, type,	     \
+				    member)				     \
+({struct json_token *__node =						     \
+	json_tree_lookup_path((tree), (parent), path, path_len);	     \
+	json_tree_entry_safe(__node, type, member); })
+
+/** Make lookup in tree by token and return entry. */
+#define json_tree_lookup_entry(tree, parent, token, type, member)	     \
+({struct json_token *__node =						     \
+	json_tree_lookup((tree), (parent), token);			     \
+	json_tree_entry_safe(__node, type, member);			     \
+})
+
+/** Make pre-order traversal in JSON tree. */
+#define json_tree_foreach_preorder(node, root)				     \
+for ((node) = json_tree_preorder_next((root), NULL); (node) != NULL;	     \
+     (node) = json_tree_preorder_next((root), (node)))
+
+/** Make post-order traversal in JSON tree. */
+#define json_tree_foreach_postorder(node, root)				     \
+for ((node) = json_tree_postorder_next((root), NULL); (node) != NULL;	     \
+     (node) = json_tree_postorder_next((root), (node)))
+
+/** Make pre-order traversal in JSON tree and return entry. */
+#define json_tree_foreach_entry_preorder(node, root, type, member)	     \
+for ((node) = json_tree_preorder_next_entry(NULL, (root), type, member);     \
+     (node) != NULL;							     \
+     (node) = json_tree_preorder_next_entry(&(node)->member, (root), type,   \
+					    member))
+
+/** Make post-order traversal in JSON tree and return entry. */
+#define json_tree_foreach_entry_postorder(node, root, type, member)	     \
+for ((node) = json_tree_postorder_next_entry(NULL, (root), type, member);    \
+     (node) != NULL;							     \
+     (node) = json_tree_postorder_next_entry(&(node)->member, (root), type,  \
+					     member))
+
+/**
+ * Make a safe post-order traversal in JSON tree and return entry.
+ */
+#define json_tree_foreach_entry_safe(node, root, type, member)		     \
+for (type *__next = json_tree_postorder_next_entry(NULL, (root), type,	     \
+						   member);		     \
+     (((node) = __next) &&						     \
+     (__next = json_tree_postorder_next_entry(&(node)->member, (root), type, \
+					      member)),			     \
+     (node) != NULL);)
+
+
 #ifdef __cplusplus
 }
 #endif
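The json_tree_entry() macro above follows the classic "container of" idiom: given a pointer to a member embedded in a larger struct, subtract the member's offset to recover the enclosing object. A minimal self-contained sketch (the names entry_of, token and field are illustrative, mirroring how test_struct embeds a json_token):

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing struct from a pointer to one of its
 * members, as json_tree_entry() does for an embedded json_token. */
#define entry_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Illustrative analogue of test_struct embedding a json_token. */
struct token { int type; };

struct field {
	int value;
	struct token node; /* embedded tree node */
};
```

This lets the tree code store and traverse bare struct token pointers while callers get their own struct field objects back without any extra back-pointer.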
diff --git a/test/unit/json_path.c b/test/unit/json_path.c
index a5f90ad98..f6b0472f4 100644
--- a/test/unit/json_path.c
+++ b/test/unit/json_path.c
@@ -2,6 +2,7 @@
 #include "unit.h"
 #include "trivia/util.h"
 #include <string.h>
+#include <stdbool.h>
 
 #define reset_to_new_path(value) \
 	path = value; \
@@ -12,15 +13,15 @@
 	path = lexer.src + lexer.offset; \
 	is(json_lexer_next_token(&lexer, &token), 0, "parse <%." #value_len "s>", \
 	   path); \
-	is(token.type, JSON_TOKEN_NUM, "<%." #value_len "s> is num", path); \
-	is(token.num, value, "<%." #value_len "s> is " #value, path);
+	is(token.key.type, JSON_TOKEN_NUM, "<%." #value_len "s> is num", path); \
+	is(token.key.num, value, "<%." #value_len "s> is " #value, path);
 
 #define is_next_key(value) \
 	len = strlen(value); \
 	is(json_lexer_next_token(&lexer, &token), 0, "parse <" value ">"); \
-	is(token.type, JSON_TOKEN_STR, "<" value "> is str"); \
-	is(token.len, len, "len is %d", len); \
-	is(strncmp(token.str, value, len), 0, "str is " value);
+	is(token.key.type, JSON_TOKEN_STR, "<" value "> is str"); \
+	is(token.key.len, len, "len is %d", len); \
+	is(strncmp(token.key.str, value, len), 0, "str is " value);
 
 void
 test_basic()
@@ -62,7 +63,7 @@ test_basic()
 	/* Empty path. */
 	reset_to_new_path("");
 	is(json_lexer_next_token(&lexer, &token), 0, "parse empty path");
-	is(token.type, JSON_TOKEN_END, "is str");
+	is(token.key.type, JSON_TOKEN_END, "is str");
 
 	/* Path with no '.' at the beginning. */
 	reset_to_new_path("field1.field2");
@@ -159,14 +160,207 @@ test_errors()
 	footer();
 }
 
+struct test_struct {
+	int value;
+	struct json_token node;
+};
+
+struct test_struct *
+test_struct_alloc(struct test_struct *records_pool, int *pool_idx)
+{
+	struct test_struct *ret = &records_pool[*pool_idx];
+	*pool_idx = *pool_idx + 1;
+	memset(&ret->node, 0, sizeof(ret->node));
+	return ret;
+}
+
+struct test_struct *
+test_add_path(struct json_tree *tree, const char *path, uint32_t path_len,
+	      struct test_struct *records_pool, int *pool_idx)
+{
+	int rc;
+	struct json_lexer lexer;
+	struct json_token *parent = NULL;
+	json_lexer_create(&lexer, path, path_len);
+	struct test_struct *field = test_struct_alloc(records_pool, pool_idx);
+	while ((rc = json_lexer_next_token(&lexer, &field->node)) == 0 &&
+		field->node.key.type != JSON_TOKEN_END) {
+		struct json_token *next =
+			json_tree_lookup(tree, parent, &field->node);
+		if (next == NULL) {
+			rc = json_tree_add(tree, parent, &field->node);
+			fail_if(rc != 0);
+			next = &field->node;
+			field = test_struct_alloc(records_pool, pool_idx);
+		}
+		parent = next;
+	}
+	fail_if(rc != 0 || field->node.key.type != JSON_TOKEN_END);
+	*pool_idx = *pool_idx - 1;
+	/* release field */
+	return json_tree_entry(parent, struct test_struct, node);
+}
+
+void
+test_tree()
+{
+	header();
+	plan(35);
+
+	struct json_tree tree;
+	int rc = json_tree_create(&tree);
+	fail_if(rc != 0);
+
+	struct test_struct records[6];
+	for (int i = 0; i < 6; i++)
+		records[i].value = i;
+
+	const char *path1 = "[1][10]";
+	const char *path2 = "[1][20].file";
+	const char *path_unregistered = "[1][3]";
+
+	int records_idx = 1;
+	struct test_struct *node;
+	node = test_add_path(&tree, path1, strlen(path1), records,
+			     &records_idx);
+	is(node, &records[2], "add path '%s'", path1);
+
+	node = test_add_path(&tree, path2, strlen(path2), records,
+			     &records_idx);
+	is(node, &records[4], "add path '%s'", path2);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path1, strlen(path1),
+					   struct test_struct, node);
+	is(node, &records[2], "lookup path '%s'", path1);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path2, strlen(path2),
+					   struct test_struct, node);
+	is(node, &records[4], "lookup path '%s'", path2);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path_unregistered,
+					   strlen(path_unregistered),
+					   struct test_struct, node);
+	is(node, NULL, "lookup unregistered path '%s'", path_unregistered);
+
+	/* Test iterators. */
+	struct json_token *token = NULL;
+	const struct json_token *tokens_preorder[] =
+		{&records[1].node, &records[2].node,
+		 &records[3].node, &records[4].node};
+	int cnt = sizeof(tokens_preorder)/sizeof(tokens_preorder[0]);
+	int idx = 0;
+
+	json_tree_foreach_preorder(token, &tree.root) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(token, struct test_struct, node);
+		struct test_struct *t2 =
+			json_tree_entry(tokens_preorder[idx],
+					struct test_struct, node);
+		is(token, tokens_preorder[idx],
+		   "test foreach pre order %d: have %d expected of %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	const struct json_token *tree_nodes_postorder[] =
+		{&records[2].node, &records[4].node,
+		 &records[3].node, &records[1].node};
+	cnt = sizeof(tree_nodes_postorder)/sizeof(tree_nodes_postorder[0]);
+	idx = 0;
+	json_tree_foreach_postorder(token, &tree.root) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(token, struct test_struct, node);
+		struct test_struct *t2 =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, node);
+		is(token, tree_nodes_postorder[idx],
+		   "test foreach post order %d: have %d expected of %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_safe(token, &tree.root) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(token, struct test_struct, node);
+		struct test_struct *t2 =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, node);
+		is(token, tree_nodes_postorder[idx],
+		   "test foreach safe order %d: have %d expected of %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_entry_preorder(node, &tree.root, struct test_struct,
+					 node) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t =
+			json_tree_entry(tokens_preorder[idx],
+					struct test_struct, node);
+		is(&node->node, tokens_preorder[idx],
+		   "test foreach entry pre order %d: have %d expected of %d",
+		   idx, node->value, t->value);
+		idx++;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_entry_postorder(node, &tree.root, struct test_struct,
+					  node) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, node);
+		is(&node->node, tree_nodes_postorder[idx],
+		   "test foreach entry post order %d: have %d expected of %d",
+		   idx, node->value, t->value);
+		idx++;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_entry_safe(node, &tree.root, struct test_struct,
+				     node) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, node);
+		is(&node->node, tree_nodes_postorder[idx],
+		   "test foreach entry safe order %d: have %d expected of %d",
+		   idx, node->value, t->value);
+		json_tree_del(&tree, &node->node);
+		idx++;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+	json_tree_destroy(&tree);
+
+	check_plan();
+	footer();
+}
+
 int
 main()
 {
 	header();
-	plan(2);
+	plan(3);
 
 	test_basic();
 	test_errors();
+	test_tree();
 
 	int rc = check_plan();
 	footer();
diff --git a/test/unit/json_path.result b/test/unit/json_path.result
index a2a2f829f..df682109c 100644
--- a/test/unit/json_path.result
+++ b/test/unit/json_path.result
@@ -1,5 +1,5 @@
 	*** main ***
-1..2
+1..3
 	*** test_basic ***
     1..71
     ok 1 - parse <[0]>
@@ -99,4 +99,43 @@ ok 1 - subtests
     ok 20 - tab inside identifier
 ok 2 - subtests
 	*** test_errors: done ***
+	*** test_tree ***
+    1..35
+    ok 1 - add path '[1][10]'
+    ok 2 - add path '[1][20].file'
+    ok 3 - lookup path '[1][10]'
+    ok 4 - lookup path '[1][20].file'
+    ok 5 - lookup unregistered path '[1][3]'
+    ok 6 - test foreach pre order 0: have 1 expected of 1
+    ok 7 - test foreach pre order 1: have 2 expected of 2
+    ok 8 - test foreach pre order 2: have 3 expected of 3
+    ok 9 - test foreach pre order 3: have 4 expected of 4
+    ok 10 - records iterated count 4 of 4
+    ok 11 - test foreach post order 0: have 2 expected of 2
+    ok 12 - test foreach post order 1: have 4 expected of 4
+    ok 13 - test foreach post order 2: have 3 expected of 3
+    ok 14 - test foreach post order 3: have 1 expected of 1
+    ok 15 - records iterated count 4 of 4
+    ok 16 - test foreach safe order 0: have 2 expected of 2
+    ok 17 - test foreach safe order 1: have 4 expected of 4
+    ok 18 - test foreach safe order 2: have 3 expected of 3
+    ok 19 - test foreach safe order 3: have 1 expected of 1
+    ok 20 - records iterated count 4 of 4
+    ok 21 - test foreach entry pre order 0: have 1 expected of 1
+    ok 22 - test foreach entry pre order 1: have 2 expected of 2
+    ok 23 - test foreach entry pre order 2: have 3 expected of 3
+    ok 24 - test foreach entry pre order 3: have 4 expected of 4
+    ok 25 - records iterated count 4 of 4
+    ok 26 - test foreach entry post order 0: have 2 expected of 2
+    ok 27 - test foreach entry post order 1: have 4 expected of 4
+    ok 28 - test foreach entry post order 2: have 3 expected of 3
+    ok 29 - test foreach entry post order 3: have 1 expected of 1
+    ok 30 - records iterated count 4 of 4
+    ok 31 - test foreach entry safe order 0: have 2 expected of 2
+    ok 32 - test foreach entry safe order 1: have 4 expected of 4
+    ok 33 - test foreach entry safe order 2: have 3 expected of 3
+    ok 34 - test foreach entry safe order 3: have 1 expected of 1
+    ok 35 - records iterated count 4 of 4
+ok 3 - subtests
+	*** test_tree: done ***
 	*** main: done ***
-- 
2.19.2

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tarantool-patches] [PATCH v5 1/9] box: refactor json_path_parser class
  2018-11-26 10:49 ` [PATCH v5 1/9] box: refactor json_path_parser class Kirill Shcherbatov
@ 2018-11-26 12:53   ` Kirill Shcherbatov
  2018-11-29 15:39     ` Vladimir Davydov
  0 siblings, 1 reply; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-11-26 12:53 UTC (permalink / raw)
  To: tarantool-patches, Vladimir Davydov; +Cc: Kostya Osipov

Renamed the json_path_node object to json_token and the
json_path_parser class to json_lexer.

Need for #1012
---
 src/box/lua/tuple.c             |   2 +-
 src/box/tuple_format.c          |  53 +++++------
 src/lib/json/CMakeLists.txt     |   2 +-
 src/lib/json/{path.c => json.c} | 153 ++++++++++++++++----------------
 src/lib/json/{path.h => json.h} |  55 ++++++------
 test/unit/json_path.c           |  56 ++++++------
 6 files changed, 160 insertions(+), 161 deletions(-)
 rename src/lib/json/{path.c => json.c} (55%)
 rename src/lib/json/{path.h => json.h} (70%)

diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c
index 65660ce7a..cbe71da18 100644
--- a/src/box/lua/tuple.c
+++ b/src/box/lua/tuple.c
@@ -41,7 +41,7 @@
 #include "box/tuple.h"
 #include "box/tuple_convert.h"
 #include "box/errcode.h"
-#include "json/path.h"
+#include "json/json.h"
 #include "mpstream.h"
 
 /** {{{ box.tuple Lua library
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 5a2481fd6..661cfdc94 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -28,7 +28,7 @@
  * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
-#include "json/path.h"
+#include "json/json.h"
 #include "tuple_format.h"
 #include "coll_id_cache.h"
 
@@ -580,19 +580,19 @@ static int
 tuple_field_go_to_path(const char **data, const char *path, uint32_t path_len)
 {
 	int rc;
-	struct json_path_parser parser;
-	struct json_path_node node;
-	json_path_parser_create(&parser, path, path_len);
-	while ((rc = json_path_next(&parser, &node)) == 0) {
-		switch (node.type) {
-		case JSON_PATH_NUM:
-			rc = tuple_field_go_to_index(data, node.num);
+	struct json_lexer lexer;
+	struct json_token token;
+	json_lexer_create(&lexer, path, path_len);
+	while ((rc = json_lexer_next_token(&lexer, &token)) == 0) {
+		switch (token.type) {
+		case JSON_TOKEN_NUM:
+			rc = tuple_field_go_to_index(data, token.num);
 			break;
-		case JSON_PATH_STR:
-			rc = tuple_field_go_to_key(data, node.str, node.len);
+		case JSON_TOKEN_STR:
+			rc = tuple_field_go_to_key(data, token.str, token.len);
 			break;
 		default:
-			assert(node.type == JSON_PATH_END);
+			assert(token.type == JSON_TOKEN_END);
 			return 0;
 		}
 		if (rc != 0) {
@@ -622,15 +622,15 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 		*field = tuple_field_raw(format, tuple, field_map, fieldno);
 		return 0;
 	}
-	struct json_path_parser parser;
-	struct json_path_node node;
-	json_path_parser_create(&parser, path, path_len);
-	int rc = json_path_next(&parser, &node);
+	struct json_lexer lexer;
+	struct json_token token;
+	json_lexer_create(&lexer, path, path_len);
+	int rc = json_lexer_next_token(&lexer, &token);
 	if (rc != 0)
 		goto error;
-	switch(node.type) {
-	case JSON_PATH_NUM: {
-		int index = node.num;
+	switch(token.type) {
+	case JSON_TOKEN_NUM: {
+		int index = token.num;
 		if (index == 0) {
 			*field = NULL;
 			return 0;
@@ -641,10 +641,10 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 			return 0;
 		break;
 	}
-	case JSON_PATH_STR: {
+	case JSON_TOKEN_STR: {
 		/* First part of a path is a field name. */
 		uint32_t name_hash;
-		if (path_len == (uint32_t) node.len) {
+		if (path_len == (uint32_t) token.len) {
 			name_hash = path_hash;
 		} else {
 			/*
@@ -653,25 +653,26 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 			 * used. A tuple dictionary hashes only
 			 * name, not path.
 			 */
-			name_hash = field_name_hash(node.str, node.len);
+			name_hash = field_name_hash(token.str, token.len);
 		}
 		*field = tuple_field_raw_by_name(format, tuple, field_map,
-						 node.str, node.len, name_hash);
+						 token.str, token.len,
+						 name_hash);
 		if (*field == NULL)
 			return 0;
 		break;
 	}
 	default:
-		assert(node.type == JSON_PATH_END);
+		assert(token.type == JSON_TOKEN_END);
 		*field = NULL;
 		return 0;
 	}
-	rc = tuple_field_go_to_path(field, path + parser.offset,
-				    path_len - parser.offset);
+	rc = tuple_field_go_to_path(field, path + lexer.offset,
+				    path_len - lexer.offset);
 	if (rc == 0)
 		return 0;
 	/* Setup absolute error position. */
-	rc += parser.offset;
+	rc += lexer.offset;
 
 error:
 	assert(rc > 0);
diff --git a/src/lib/json/CMakeLists.txt b/src/lib/json/CMakeLists.txt
index 203fe6f42..0f0739620 100644
--- a/src/lib/json/CMakeLists.txt
+++ b/src/lib/json/CMakeLists.txt
@@ -1,5 +1,5 @@
 set(lib_sources
-    path.c
+    json.c
 )
 
 set_source_files_compile_flags(${lib_sources})
diff --git a/src/lib/json/path.c b/src/lib/json/json.c
similarity index 55%
rename from src/lib/json/path.c
rename to src/lib/json/json.c
index 2e72930a6..eb80e4bbc 100644
--- a/src/lib/json/path.c
+++ b/src/lib/json/json.c
@@ -29,7 +29,7 @@
  * SUCH DAMAGE.
  */
 
-#include "path.h"
+#include "json.h"
 #include <ctype.h>
 #include <stdbool.h>
 #include <unicode/uchar.h>
@@ -38,82 +38,82 @@
 
 /**
  * Read a single symbol from a string starting from an offset.
- * @param parser JSON path parser.
+ * @param lexer JSON path lexer.
  * @param[out] UChar32 Read symbol.
  *
  * @retval   0 Success.
  * @retval > 0 1-based position of a syntax error.
  */
 static inline int
-json_read_symbol(struct json_path_parser *parser, UChar32 *out)
+json_read_symbol(struct json_lexer *lexer, UChar32 *out)
 {
-	if (parser->offset == parser->src_len) {
+	if (lexer->offset == lexer->src_len) {
 		*out = U_SENTINEL;
-		return parser->symbol_count + 1;
+		return lexer->symbol_count + 1;
 	}
-	U8_NEXT(parser->src, parser->offset, parser->src_len, *out);
+	U8_NEXT(lexer->src, lexer->offset, lexer->src_len, *out);
 	if (*out == U_SENTINEL)
-		return parser->symbol_count + 1;
-	++parser->symbol_count;
+		return lexer->symbol_count + 1;
+	++lexer->symbol_count;
 	return 0;
 }
 
 /**
  * Rollback one symbol offset.
- * @param parser JSON path parser.
+ * @param lexer JSON path lexer.
  * @param offset Offset to the previous symbol.
  */
 static inline void
-json_revert_symbol(struct json_path_parser *parser, int offset)
+json_revert_symbol(struct json_lexer *lexer, int offset)
 {
-	parser->offset = offset;
-	--parser->symbol_count;
+	lexer->offset = offset;
+	--lexer->symbol_count;
 }
 
 /** Fast forward when it is known that a symbol is 1-byte char. */
 static inline void
-json_skip_char(struct json_path_parser *parser)
+json_skip_char(struct json_lexer *lexer)
 {
-	++parser->offset;
-	++parser->symbol_count;
+	++lexer->offset;
+	++lexer->symbol_count;
 }
 
 /** Get a current symbol as a 1-byte char. */
 static inline char
-json_current_char(const struct json_path_parser *parser)
+json_current_char(const struct json_lexer *lexer)
 {
-	return *(parser->src + parser->offset);
+	return *(lexer->src + lexer->offset);
 }
 
 /**
- * Parse string identifier in quotes. Parser either stops right
+ * Parse string identifier in quotes. Lexer either stops right
  * after the closing quote, or returns an error position.
- * @param parser JSON path parser.
- * @param[out] node JSON node to store result.
+ * @param lexer JSON path lexer.
+ * @param[out] token JSON token to store result.
  * @param quote_type Quote by that a string must be terminated.
  *
  * @retval   0 Success.
  * @retval > 0 1-based position of a syntax error.
  */
 static inline int
-json_parse_string(struct json_path_parser *parser, struct json_path_node *node,
+json_parse_string(struct json_lexer *lexer, struct json_token *token,
 		  UChar32 quote_type)
 {
-	assert(parser->offset < parser->src_len);
-	assert(quote_type == json_current_char(parser));
+	assert(lexer->offset < lexer->src_len);
+	assert(quote_type == json_current_char(lexer));
 	/* The first symbol is always char  - ' or ". */
-	json_skip_char(parser);
-	int str_offset = parser->offset;
+	json_skip_char(lexer);
+	int str_offset = lexer->offset;
 	UChar32 c;
 	int rc;
-	while ((rc = json_read_symbol(parser, &c)) == 0) {
+	while ((rc = json_read_symbol(lexer, &c)) == 0) {
 		if (c == quote_type) {
-			int len = parser->offset - str_offset - 1;
+			int len = lexer->offset - str_offset - 1;
 			if (len == 0)
-				return parser->symbol_count;
-			node->type = JSON_PATH_STR;
-			node->str = parser->src + str_offset;
-			node->len = len;
+				return lexer->symbol_count;
+			token->type = JSON_TOKEN_STR;
+			token->str = lexer->src + str_offset;
+			token->len = len;
 			return 0;
 		}
 	}
@@ -122,32 +122,32 @@ json_parse_string(struct json_path_parser *parser, struct json_path_node *node,
 
 /**
  * Parse digit sequence into integer until non-digit is met.
- * Parser stops right after the last digit.
- * @param parser JSON parser.
- * @param[out] node JSON node to store result.
+ * Lexer stops right after the last digit.
+ * @param lexer JSON lexer.
+ * @param[out] token JSON token to store result.
  *
  * @retval   0 Success.
  * @retval > 0 1-based position of a syntax error.
  */
 static inline int
-json_parse_integer(struct json_path_parser *parser, struct json_path_node *node)
+json_parse_integer(struct json_lexer *lexer, struct json_token *token)
 {
-	const char *end = parser->src + parser->src_len;
-	const char *pos = parser->src + parser->offset;
+	const char *end = lexer->src + lexer->src_len;
+	const char *pos = lexer->src + lexer->offset;
 	assert(pos < end);
 	int len = 0;
 	uint64_t value = 0;
 	char c = *pos;
 	if (! isdigit(c))
-		return parser->symbol_count + 1;
+		return lexer->symbol_count + 1;
 	do {
 		value = value * 10 + c - (int)'0';
 		++len;
 	} while (++pos < end && isdigit((c = *pos)));
-	parser->offset += len;
-	parser->symbol_count += len;
-	node->type = JSON_PATH_NUM;
-	node->num = value;
+	lexer->offset += len;
+	lexer->symbol_count += len;
+	token->type = JSON_TOKEN_NUM;
+	token->num = value;
 	return 0;
 }
 
@@ -164,81 +164,80 @@ json_is_valid_identifier_symbol(UChar32 c)
 /**
  * Parse identifier out of quotes. It can contain only alphas,
  * digits and underscores. And can not contain digit at the first
- * position. Parser is stoped right after the last non-digit,
+ * position. Lexer is stoped right after the last non-digit,
  * non-alpha and non-underscore symbol.
- * @param parser JSON parser.
- * @param[out] node JSON node to store result.
+ * @param lexer JSON lexer.
+ * @param[out] token JSON token to store result.
  *
  * @retval   0 Success.
  * @retval > 0 1-based position of a syntax error.
  */
 static inline int
-json_parse_identifier(struct json_path_parser *parser,
-		      struct json_path_node *node)
+json_parse_identifier(struct json_lexer *lexer, struct json_token *token)
 {
-	assert(parser->offset < parser->src_len);
-	int str_offset = parser->offset;
+	assert(lexer->offset < lexer->src_len);
+	int str_offset = lexer->offset;
 	UChar32 c;
-	int rc = json_read_symbol(parser, &c);
+	int rc = json_read_symbol(lexer, &c);
 	if (rc != 0)
 		return rc;
 	/* First symbol can not be digit. */
 	if (!u_isalpha(c) && c != (UChar32)'_')
-		return parser->symbol_count;
-	int last_offset = parser->offset;
-	while ((rc = json_read_symbol(parser, &c)) == 0) {
+		return lexer->symbol_count;
+	int last_offset = lexer->offset;
+	while ((rc = json_read_symbol(lexer, &c)) == 0) {
 		if (! json_is_valid_identifier_symbol(c)) {
-			json_revert_symbol(parser, last_offset);
+			json_revert_symbol(lexer, last_offset);
 			break;
 		}
-		last_offset = parser->offset;
+		last_offset = lexer->offset;
 	}
-	node->type = JSON_PATH_STR;
-	node->str = parser->src + str_offset;
-	node->len = parser->offset - str_offset;
+	token->type = JSON_TOKEN_STR;
+	token->str = lexer->src + str_offset;
+	token->len = lexer->offset - str_offset;
 	return 0;
 }
 
 int
-json_path_next(struct json_path_parser *parser, struct json_path_node *node)
+json_lexer_next_token(struct json_lexer *lexer, struct json_token *token)
 {
-	if (parser->offset == parser->src_len) {
-		node->type = JSON_PATH_END;
+	if (lexer->offset == lexer->src_len) {
+		token->type = JSON_TOKEN_END;
 		return 0;
 	}
 	UChar32 c;
-	int last_offset = parser->offset;
-	int rc = json_read_symbol(parser, &c);
+	int last_offset = lexer->offset;
+	int rc = json_read_symbol(lexer, &c);
 	if (rc != 0)
 		return rc;
 	switch(c) {
 	case (UChar32)'[':
 		/* Error for '[\0'. */
-		if (parser->offset == parser->src_len)
-			return parser->symbol_count;
-		c = json_current_char(parser);
+		if (lexer->offset == lexer->src_len)
+			return lexer->symbol_count;
+		c = json_current_char(lexer);
 		if (c == '"' || c == '\'')
-			rc = json_parse_string(parser, node, c);
+			rc = json_parse_string(lexer, token, c);
 		else
-			rc = json_parse_integer(parser, node);
+			rc = json_parse_integer(lexer, token);
 		if (rc != 0)
 			return rc;
 		/*
 		 * Expression, started from [ must be finished
 		 * with ] regardless of its type.
 		 */
-		if (parser->offset == parser->src_len ||
-		    json_current_char(parser) != ']')
-			return parser->symbol_count + 1;
+		if (lexer->offset == lexer->src_len ||
+		    json_current_char(lexer) != ']')
+			return lexer->symbol_count + 1;
 		/* Skip ] - one byte char. */
-		json_skip_char(parser);
+		json_skip_char(lexer);
 		return 0;
 	case (UChar32)'.':
-		if (parser->offset == parser->src_len)
-			return parser->symbol_count + 1;
-		return json_parse_identifier(parser, node);
+		if (lexer->offset == lexer->src_len)
+			return lexer->symbol_count + 1;
+		return json_parse_identifier(lexer, token);
 	default:
-		json_revert_symbol(parser, last_offset);
-		return json_parse_identifier(parser, node);
+		json_revert_symbol(lexer, last_offset);
+		return json_parse_identifier(lexer, token);
 	}
 }
diff --git a/src/lib/json/path.h b/src/lib/json/json.h
similarity index 70%
rename from src/lib/json/path.h
rename to src/lib/json/json.h
index c3c381a14..ead446878 100644
--- a/src/lib/json/path.h
+++ b/src/lib/json/json.h
@@ -1,5 +1,5 @@
-#ifndef TARANTOOL_JSON_PATH_H_INCLUDED
-#define TARANTOOL_JSON_PATH_H_INCLUDED
+#ifndef TARANTOOL_JSON_JSON_H_INCLUDED
+#define TARANTOOL_JSON_JSON_H_INCLUDED
 /*
  * Copyright 2010-2018 Tarantool AUTHORS: please see AUTHORS file.
  *
@@ -37,25 +37,25 @@ extern "C" {
 #endif
 
 /**
- * Parser for JSON paths:
+ * Lexer for JSON paths:
  * <field>, <.field>, <[123]>, <['field']> and their combinations.
  */
-struct json_path_parser {
+struct json_lexer {
 	/** Source string. */
 	const char *src;
 	/** Length of string. */
 	int src_len;
-	/** Current parser's offset in bytes. */
+	/** Current lexer's offset in bytes. */
 	int offset;
-	/** Current parser's offset in symbols. */
+	/** Current lexer's offset in symbols. */
 	int symbol_count;
 };
 
-enum json_path_type {
-	JSON_PATH_NUM,
-	JSON_PATH_STR,
-	/** Parser reached end of path. */
-	JSON_PATH_END,
+enum json_token_type {
+	JSON_TOKEN_NUM,
+	JSON_TOKEN_STR,
+	/** Lexer reached end of path. */
+	JSON_TOKEN_END,
 };
 
 /**
@@ -63,8 +63,8 @@ enum json_path_type {
  * String idenfiers are in ["..."] and between dots. Numbers are
  * indexes in [...].
  */
-struct json_path_node {
-	enum json_path_type type;
+struct json_token {
+	enum json_token_type type;
 	union {
 		struct {
 			/** String identifier. */
@@ -78,35 +78,34 @@ struct json_path_node {
 };
 
 /**
- * Create @a parser.
- * @param[out] parser Parser to create.
+ * Create @a lexer.
+ * @param[out] lexer Lexer to create.
  * @param src Source string.
  * @param src_len Length of @a src.
  */
 static inline void
-json_path_parser_create(struct json_path_parser *parser, const char *src,
-                        int src_len)
+json_lexer_create(struct json_lexer *lexer, const char *src, int src_len)
 {
-	parser->src = src;
-	parser->src_len = src_len;
-	parser->offset = 0;
-	parser->symbol_count = 0;
+	lexer->src = src;
+	lexer->src_len = src_len;
+	lexer->offset = 0;
+	lexer->symbol_count = 0;
 }
 
 /**
- * Get a next path node.
- * @param parser Parser.
- * @param[out] node Node to store parsed result.
- * @retval   0 Success. For result see @a node.str, node.len,
- *             node.num.
+ * Get a next path token.
+ * @param lexer Lexer.
+ * @param[out] token Token to store parsed result.
+ * @retval   0 Success. For result see @a token.str, token.len,
+ *             token.num.
  * @retval > 0 Position of a syntax error. A position is 1-based
  *             and starts from a beginning of a source string.
  */
 int
-json_path_next(struct json_path_parser *parser, struct json_path_node *node);
+json_lexer_next_token(struct json_lexer *lexer, struct json_token *token);
 
 #ifdef __cplusplus
 }
 #endif
 
-#endif /* TARANTOOL_JSON_PATH_H_INCLUDED */
+#endif /* TARANTOOL_JSON_JSON_H_INCLUDED */
diff --git a/test/unit/json_path.c b/test/unit/json_path.c
index 1d7e7d372..a5f90ad98 100644
--- a/test/unit/json_path.c
+++ b/test/unit/json_path.c
@@ -1,4 +1,4 @@
-#include "json/path.h"
+#include "json/json.h"
 #include "unit.h"
 #include "trivia/util.h"
 #include <string.h>
@@ -6,21 +6,21 @@
 #define reset_to_new_path(value) \
 	path = value; \
 	len = strlen(value); \
-	json_path_parser_create(&parser, path, len);
+	json_lexer_create(&lexer, path, len);
 
 #define is_next_index(value_len, value) \
-	path = parser.src + parser.offset; \
-	is(json_path_next(&parser, &node), 0, "parse <%." #value_len "s>", \
+	path = lexer.src + lexer.offset; \
+	is(json_lexer_next_token(&lexer, &token), 0, "parse <%." #value_len "s>", \
 	   path); \
-	is(node.type, JSON_PATH_NUM, "<%." #value_len "s> is num", path); \
-	is(node.num, value, "<%." #value_len "s> is " #value, path);
+	is(token.type, JSON_TOKEN_NUM, "<%." #value_len "s> is num", path); \
+	is(token.num, value, "<%." #value_len "s> is " #value, path);
 
 #define is_next_key(value) \
 	len = strlen(value); \
-	is(json_path_next(&parser, &node), 0, "parse <" value ">"); \
-	is(node.type, JSON_PATH_STR, "<" value "> is str"); \
-	is(node.len, len, "len is %d", len); \
-	is(strncmp(node.str, value, len), 0, "str is " value);
+	is(json_lexer_next_token(&lexer, &token), 0, "parse <" value ">"); \
+	is(token.type, JSON_TOKEN_STR, "<" value "> is str"); \
+	is(token.len, len, "len is %d", len); \
+	is(strncmp(token.str, value, len), 0, "str is " value);
 
 void
 test_basic()
@@ -29,8 +29,8 @@ test_basic()
 	plan(71);
 	const char *path;
 	int len;
-	struct json_path_parser parser;
-	struct json_path_node node;
+	struct json_lexer lexer;
+	struct json_token token;
 
 	reset_to_new_path("[0].field1.field2['field3'][5]");
 	is_next_index(3, 0);
@@ -61,8 +61,8 @@ test_basic()
 
 	/* Empty path. */
 	reset_to_new_path("");
-	is(json_path_next(&parser, &node), 0, "parse empty path");
-	is(node.type, JSON_PATH_END, "is str");
+	is(json_lexer_next_token(&lexer, &token), 0, "parse empty path");
+	is(token.type, JSON_TOKEN_END, "is str");
 
 	/* Path with no '.' at the beginning. */
 	reset_to_new_path("field1.field2");
@@ -81,8 +81,8 @@ test_basic()
 
 #define check_new_path_on_error(value, errpos) \
 	reset_to_new_path(value); \
-	struct json_path_node node; \
-	is(json_path_next(&parser, &node), errpos, "error on position %d" \
+	struct json_token token; \
+	is(json_lexer_next_token(&lexer, &token), errpos, "error on position %d" \
 	   " for <%s>", errpos, path);
 
 struct path_and_errpos {
@@ -97,7 +97,7 @@ test_errors()
 	plan(20);
 	const char *path;
 	int len;
-	struct json_path_parser parser;
+	struct json_lexer lexer;
 	const struct path_and_errpos errors[] = {
 		/* Double [[. */
 		{"[[", 2},
@@ -133,27 +133,27 @@ test_errors()
 	for (size_t i = 0; i < lengthof(errors); ++i) {
 		reset_to_new_path(errors[i].path);
 		int errpos = errors[i].errpos;
-		struct json_path_node node;
-		is(json_path_next(&parser, &node), errpos,
+		struct json_token token;
+		is(json_lexer_next_token(&lexer, &token), errpos,
 		   "error on position %d for <%s>", errpos, path);
 	}
 
 	reset_to_new_path("f.[2]")
-	struct json_path_node node;
-	json_path_next(&parser, &node);
-	is(json_path_next(&parser, &node), 3, "can not write <field.[index]>")
+	struct json_token token;
+	json_lexer_next_token(&lexer, &token);
+	is(json_lexer_next_token(&lexer, &token), 3, "can not write <field.[index]>")
 
 	reset_to_new_path("f.")
-	json_path_next(&parser, &node);
-	is(json_path_next(&parser, &node), 3, "error in leading <.>");
+	json_lexer_next_token(&lexer, &token);
+	is(json_lexer_next_token(&lexer, &token), 3, "error in leading <.>");
 
 	reset_to_new_path("fiel d1")
-	json_path_next(&parser, &node);
-	is(json_path_next(&parser, &node), 5, "space inside identifier");
+	json_lexer_next_token(&lexer, &token);
+	is(json_lexer_next_token(&lexer, &token), 5, "space inside identifier");
 
 	reset_to_new_path("field\t1")
-	json_path_next(&parser, &node);
-	is(json_path_next(&parser, &node), 6, "tab inside identifier");
+	json_lexer_next_token(&lexer, &token);
+	is(json_lexer_next_token(&lexer, &token), 6, "tab inside identifier");
 
 	check_plan();
 	footer();
-- 
2.19.2

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] [PATCH v5 1/9] box: refactor json_path_parser class
  2018-11-26 12:53   ` [tarantool-patches] " Kirill Shcherbatov
@ 2018-11-29 15:39     ` Vladimir Davydov
  0 siblings, 0 replies; 41+ messages in thread
From: Vladimir Davydov @ 2018-11-29 15:39 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, Kostya Osipov

On Mon, Nov 26, 2018 at 03:53:06PM +0300, Kirill Shcherbatov wrote:
> Renamed object json_path_node to json_token and
> json_path_parser class to json_lexer.

Please write more descriptive comments, explaining not just what you're
doing, but why.

> 
> Need for #1012

Should be

Needed for #1012

I changed the comment to:

    json: some renames

    We are planning to link json_path_node objects in a tree and attach some
    extra information to them so that they could be used to describe a json
    document structure. Let's rename it to json_token as it sounds more
    appropriate for the purpose.

    Also, rename json_path_parser to json_lexer as it isn't a parser,
    really, it's rather a tokenizer or lexer. Besides, the new name is
    shorter.

    Needed for #1012

and pushed the patch to 2.1.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] [PATCH v5 2/9] lib: implement JSON tree class for json library
  2018-11-26 12:53   ` [tarantool-patches] " Kirill Shcherbatov
@ 2018-11-29 17:38     ` Vladimir Davydov
  2018-11-29 17:50       ` Vladimir Davydov
                         ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Vladimir Davydov @ 2018-11-29 17:38 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, Kostya Osipov

On Mon, Nov 26, 2018 at 03:53:03PM +0300, Kirill Shcherbatov wrote:
> The new JSON tree class stores the JSON paths of tuple fields
> for registered non-plain indexes. It is a hierarchical data
> structure that organizes the JSON nodes produced by the parser.
> The class provides an API to look up a node by path and to
> iterate over the tree.
> The JSON indexes patch requires such functionality to look up
> tuple_fields by path, initialize the field map, and build
> vinyl_stmt msgpack for a secondary index via JSON tree
> iteration.
> 
> Need for #1012

In general looks OK, please see a few comments below.

> diff --git a/src/lib/json/json.c b/src/lib/json/json.c
> index eb80e4bbc..6bc74507f 100644
> --- a/src/lib/json/json.c
> +++ b/src/lib/json/json.c
> @@ -241,3 +242,268 @@ json_lexer_next_token(struct json_lexer *lexer, struct json_token *token)
>  		return json_parse_identifier(lexer, token);
>  	}
>  }
> +
> +/** Compare JSON tokens keys. */
> +static int
> +json_token_key_cmp(const struct json_token *a, const struct json_token *b)
> +{
> +	if (a->key.type != b->key.type)
> +		return a->key.type - b->key.type;
> +	int ret = 0;
> +	if (a->key.type == JSON_TOKEN_STR) {
> +		if (a->key.len != b->key.len)
> +			return a->key.len - b->key.len;
> +		ret = memcmp(a->key.str, b->key.str, a->key.len);
> +	} else if (a->key.type == JSON_TOKEN_NUM) {
> +		ret = a->key.num - b->key.num;
> +	} else {
> +		unreachable();
> +	}
> +	return ret;
> +}
> +
> +/**
> + * Compare hash records of two json tree nodes. Return 0 if equal.
> + */
> +static inline int
> +mh_json_cmp(const struct json_token *a, const struct json_token *b)
> +{
> +	if (a->parent != b->parent)
> +		return a->parent - b->parent;
> +	return json_token_key_cmp(a, b);
> +}

Please merge the two functions - the more functions we have for doing
roughly the same thing (token comparison in this case), the more
difficult it gets to follow the code.

I understand that you intend to use json_token_key_cmp (without parent
comparison) to compare paths in a future patch, but IMO you'd better
nullify token->parent in json_path_cmp before token comparison instead
of having two functions.

The resulting function should be called json_token_cmp.
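For illustration, the merged comparator could take roughly the following shape. This is a sketch only: the struct below keeps just the fields the comparator touches (layout assumed from the diff above, with key.num simplified to int), and the parent comparison uses an ordered pointer compare instead of the pointer subtraction in the original, which is undefined for unrelated pointers:

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct json_token from src/lib/json/json.h. */
enum json_token_type { JSON_TOKEN_NUM, JSON_TOKEN_STR, JSON_TOKEN_END };

struct json_token {
	struct {
		enum json_token_type type;
		const char *str;
		int len;
		int num;
	} key;
	struct json_token *parent;
};

/*
 * Single comparator for both use cases: hash lookups compare
 * parents first, while json_path_cmp can zero out 'parent' in the
 * tokens it builds to get a pure key comparison.
 */
static int
json_token_cmp(const struct json_token *a, const struct json_token *b)
{
	if (a->parent != b->parent)
		return a->parent < b->parent ? -1 : 1;
	if (a->key.type != b->key.type)
		return a->key.type - b->key.type;
	if (a->key.type == JSON_TOKEN_STR) {
		if (a->key.len != b->key.len)
			return a->key.len - b->key.len;
		return memcmp(a->key.str, b->key.str, a->key.len);
	} else if (a->key.type == JSON_TOKEN_NUM) {
		return a->key.num - b->key.num;
	}
	return 0;
}
```

The mhash instantiation would then wire mh_cmp to json_token_cmp directly, dropping the extra wrapper.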

> +
> +#define MH_SOURCE 1
> +#define mh_name _json
> +#define mh_key_t struct json_token **
> +#define mh_node_t struct json_token *
> +#define mh_arg_t void *
> +#define mh_hash(a, arg) ((*a)->rolling_hash)
> +#define mh_hash_key(a, arg) ((*a)->rolling_hash)
> +#define mh_cmp(a, b, arg) (mh_json_cmp((*a), (*b)))
> +#define mh_cmp_key(a, b, arg) mh_cmp(a, b, arg)
> +#include "salad/mhash.h"
> +
> +static const uint32_t hash_seed = 13U;
> +
> +/** Compute the hash value of a JSON token. */
> +static uint32_t
> +json_token_hash(struct json_token *token, uint32_t seed)

You always pass token->parent->hash to this function so the 'seed'
argument is not really needed.

> +{
> +	uint32_t h = seed;
> +	uint32_t carry = 0;
> +	const void *data;
> +	uint32_t data_size;
> +	if (token->key.type == JSON_TOKEN_STR) {
> +		data = token->key.str;
> +		data_size = token->key.len;
> +	} else if (token->key.type == JSON_TOKEN_NUM) {
> +		data = &token->key.num;
> +		data_size = sizeof(token->key.num);
> +	} else {
> +		unreachable();
> +	}
> +	PMurHash32_Process(&h, &carry, data, data_size);
> +	return PMurHash32_Result(h, carry, data_size);
> +}
> +
> +int
> +json_tree_create(struct json_tree *tree)
> +{
> +	memset(tree, 0, sizeof(struct json_tree));
> +	tree->root.rolling_hash = hash_seed;
> +	tree->root.key.type = JSON_TOKEN_END;
> +	tree->hash = mh_json_new();
> +	return tree->hash == NULL ? -1 : 0;
> +}
> +
> +static void
> +json_token_destroy(struct json_token *token)
> +{
> +	free(token->children);
> +}
> +
> +void
> +json_tree_destroy(struct json_tree *tree)
> +{
> +	assert(tree->hash != NULL);

This is a rather pointless assertion as mh_json_delete will crash anyway
if tree->hash is NULL. Besides, how would anyone ever step on it? By
passing an uninitialized json_tree to this function? This will hardly
ever happen.

OTOH it would be good to ensure that the tree is empty here, because
this is a subtle part of the function protocol.

> +	json_token_destroy(&tree->root);
> +	mh_json_delete(tree->hash);
> +}
> +
> +struct json_token *
> +json_tree_lookup(struct json_tree *tree, struct json_token *parent,
> +		 struct json_token *token)

'token' should be marked const here.

> +{
> +	if (parent == NULL)
> +		parent = &tree->root;

I'd rather require the caller to pass tree->root explicitly.
This would save us a comparison.

> +	struct json_token *ret = NULL;
> +	if (token->key.type == JSON_TOKEN_STR) {
> +		struct json_token key = *token;

Please don't assign a whole object when you need just a few fields:

		struct json_token key;
		key.str = token->str;
		key.len = token->len;
		key.parent = parent;
		key.hash = json_token_hash(token);

This is a warm path after all.

> +		key.rolling_hash = json_token_hash(token, parent->rolling_hash);
> +		key.parent = parent;
> +		token = &key;
> +		mh_int_t id = mh_json_find(tree->hash, &token, NULL);
> +		if (unlikely(id == mh_end(tree->hash)))

unlikely() isn't needed here - there are plenty of function calls so it
will hardly make any difference. Besides, why would one expect that the
key is likely to be present in the tree?

> +			return NULL;
> +		struct json_token **entry = mh_json_node(tree->hash, id);
> +		assert(entry == NULL || (*entry)->parent == parent);
> +		return entry != NULL ? *entry : NULL;
> +	} else if (token->key.type == JSON_TOKEN_NUM) {

> +		uint32_t idx =  token->key.num - 1;
> +		ret = idx < parent->child_size ? parent->children[idx] : NULL;

Maybe it's worth making this case (JSON_TOKEN_NUM) inline, to save a
function call in case of a top-level tuple field lookup? BTW likely()
would be appropriate there.

> +	} else {
> +		unreachable();
> +	}
> +	return ret;
> +}
> +
> +int
> +json_tree_add(struct json_tree *tree, struct json_token *parent,
> +	      struct json_token *token)
> +{
> +	if (parent == NULL)
> +		parent = &tree->root;
> +	uint32_t rolling_hash =
> +	       json_token_hash(token, parent->rolling_hash);
> +	assert(json_tree_lookup(tree, parent, token) == NULL);
> +	uint32_t insert_idx = (token->key.type == JSON_TOKEN_NUM) ?
> +			      (uint32_t)token->key.num - 1 :
> +			      parent->child_size;
> +	if (insert_idx >= parent->child_size) {
> +		uint32_t new_size =
> +			parent->child_size == 0 ? 1 : 2 * parent->child_size;
> +		while (insert_idx >= new_size)
> +			new_size *= 2;
> +		struct json_token **children =
> +			realloc(parent->children, new_size*sizeof(void *));
> +		if (unlikely(children == NULL))

Useless unlikely().

> +			return -1;
> +		memset(children + parent->child_size, 0,
> +		       (new_size - parent->child_size)*sizeof(void *));
> +		parent->children = children;
> +		parent->child_size = new_size;

Ouch, this looks much more complex than it was in the previous version.
Please revert. Quoting myself:

} > +     if (insert_idx >= parent->children_count) {
} > +             uint32_t new_size = insert_idx + 1;
} 
} We usually double the size with each allocation. If this is intentional,
} please add a comment.

I didn't push you to change that. I just wanted you to add a comment
saying that we can afford quadratic algorithmic complexity here, because
this function is far from a hot path and we expect the tree to be rather
small.

> +	}
> +	assert(parent->children[insert_idx] == NULL);
> +	parent->children[insert_idx] = token;
> +	parent->child_count = MAX(parent->child_count, insert_idx + 1);
> +	token->sibling_idx = insert_idx;
> +	token->rolling_hash = rolling_hash;
> +	token->parent = parent;
> +
> +	const struct json_token **key =
> +		(const struct json_token **)&token;
> +	mh_int_t rc = mh_json_put(tree->hash, key, NULL, NULL);
> +	if (unlikely(rc == mh_end(tree->hash))) {

Again, unlikely() is pointless here.

> +		parent->children[insert_idx] = NULL;
> +		return -1;
> +	}
> +	assert(json_tree_lookup(tree, parent, token) == token);
> +	return 0;
> +}
> +
> +void
> +json_tree_del(struct json_tree *tree, struct json_token *token)
> +{
> +	struct json_token *parent = token->parent;
> +	assert(json_tree_lookup(tree, parent, token) == token);
> +	struct json_token **child_slot = NULL;
> +	if (token->key.type == JSON_TOKEN_NUM) {
> +		child_slot = &parent->children[token->key.num - 1];
> +	} else {
> +		uint32_t idx = 0;
> +		while (idx < parent->child_size &&
> +		       parent->children[idx] != token) { idx++; }

What's this loop for? Can't you just use sibling_idx here?

> +		if (idx < parent->child_size &&
> +		       parent->children[idx] == token)

Indentation.

> +			child_slot = &parent->children[idx];
> +	}
> +	assert(child_slot != NULL && *child_slot == token);
> +	*child_slot = NULL;
> +
> +	mh_int_t id = mh_json_find(tree->hash, &token, NULL);
> +	assert(id != mh_end(tree->hash));
> +	mh_json_del(tree->hash, id, NULL);
> +	json_token_destroy(token);
> +	assert(json_tree_lookup(tree, parent, token) == NULL);
> +}
> +
> +struct json_token *
> +json_tree_lookup_path(struct json_tree *tree, struct json_token *parent,
> +		      const char *path, uint32_t path_len)
> +{
> +	int rc;
> +	struct json_lexer lexer;
> +	struct json_token token;
> +	struct json_token *ret = parent != NULL ? parent : &tree->root;
> +	json_lexer_create(&lexer, path, path_len);
> +	while ((rc = json_lexer_next_token(&lexer, &token)) == 0 &&
> +	       token.key.type != JSON_TOKEN_END && ret != NULL) {
> +		ret = json_tree_lookup(tree, ret, &token);
> +	}
> +	if (rc != 0 || token.key.type != JSON_TOKEN_END)
> +		return NULL;
> +	return ret;
> +}
> +
> +static struct json_token *
> +json_tree_child_next(struct json_token *parent, struct json_token *pos)
> +{
> +	assert(pos == NULL || pos->parent == parent);
> +	struct json_token **arr = parent->children;
> +	uint32_t arr_size = parent->child_size;
> +	if (arr == NULL)
> +		return NULL;
> +	uint32_t idx = pos != NULL ? pos->sibling_idx + 1 : 0;
> +	while (idx < arr_size && arr[idx] == NULL)
> +		idx++;

This function may be called many times from json_tree_foreach helper.
The latter is invoked at each tuple creation so we must strive to make
it as fast as we can. Now, suppose you have JSON arrays of 100 elements
with the last one indexed. You'll be iterating over 99 elements for
nothing. Is it really that bad? I'm not sure, because the iteration
should finish pretty quickly, but it looks rather ugly to me.

A suggestion: maybe we could use rlist, as you did before, for
iteration and the children array for lookup in an array? That is the
children array wouldn't be allocated at all for JSON_TOKEN_STR, only for
JSON_TOKEN_NUM, and it would be used solely for quick access, not for
iteration. BTW sibling_idx wouldn't be needed then. Would it make sense?
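A rough sketch of that layout — a plain next pointer stands in for rlist here, and everything is pared down to the iteration/lookup split (all names are illustrative):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Siblings are linked in a list used only for iteration; the
 * children array would exist only under JSON_TOKEN_NUM parents
 * and be used only for O(1) lookup by index.
 */
struct node {
	struct node *parent;
	struct node *next_sibling;	/* iteration */
	struct node *first_child;
	struct node **children;		/* lookup only, may stay NULL */
	int child_cap;
};

static void
node_link_child(struct node *parent, struct node *child)
{
	child->parent = parent;
	child->next_sibling = parent->first_child;
	parent->first_child = child;
}

/* O(1) step no matter how sparse the children array is. */
static struct node *
node_child_next(struct node *parent, struct node *pos)
{
	return pos == NULL ? parent->first_child : pos->next_sibling;
}

/* Count children by walking the sibling list. */
static int
node_child_count(struct node *parent)
{
	int n = 0;
	for (struct node *c = node_child_next(parent, NULL); c != NULL;
	     c = node_child_next(parent, c))
		n++;
	return n;
}
```

With this split, iterating a parent with one indexed child at position 100 visits exactly one node instead of scanning 100 array slots.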

> +	if (idx >= arr_size)
> +		return NULL;
> +	return arr[idx];
> +}
> +
> +static struct json_token *
> +json_tree_leftmost(struct json_token *pos)
> +{
> +	struct json_token *last;
> +	do {
> +		last = pos;
> +		pos = json_tree_child_next(pos, NULL);
> +	} while (pos != NULL);
> +	return last;
> +}
> +
> +struct json_token *
> +json_tree_preorder_next(struct json_token *root, struct json_token *pos)
> +{
> +	if (pos == NULL)
> +		pos = root;
> +	struct json_token *next = json_tree_child_next(pos, NULL);
> +	if (next != NULL)
> +		return next;
> +	while (pos != root) {
> +		next = json_tree_child_next(pos->parent, pos);
> +		if (next != NULL)
> +			return next;
> +		pos = pos->parent;
> +	}
> +	return NULL;
> +}
> +
> +struct json_token *
> +json_tree_postorder_next(struct json_token *root, struct json_token *pos)
> +{
> +	struct json_token *next;
> +	if (pos == NULL) {
> +		next = json_tree_leftmost(root);
> +		return next != root ? next : NULL;
> +	}
> +	if (pos == root)
> +		return NULL;
> +	next = json_tree_child_next(pos->parent, pos);
> +	if (next != NULL) {
> +		next = json_tree_leftmost(next);
> +		return next != root ? next : NULL;
> +	}
> +	return pos->parent != root ? pos->parent : NULL;
> +}
> diff --git a/src/lib/json/json.h b/src/lib/json/json.h
> index ead446878..2bc159ff8 100644
> --- a/src/lib/json/json.h
> +++ b/src/lib/json/json.h
> @@ -61,20 +61,53 @@ enum json_token_type {
>  /**
>   * Element of a JSON path. It can be either string or number.
>   * String idenfiers are in ["..."] and between dots. Numbers are
> - * indexes in [...].
> + * indexes in [...]. May be organized in a JSON tree.

This is such a lame comment :-(

Can we say a few words here about the json_token lifetime: how it is
first filled by lexer, how it then can be passed to json_tree_add to
form a tree, how the tree is organized... Maybe even add a code
snippet to the comment.

>   */
>  struct json_token {
> -	enum json_token_type type;
> -	union {
> -		struct {
> -			/** String identifier. */
> -			const char *str;
> -			/** Length of @a str. */
> -			int len;
> +	struct {
> +		enum json_token_type type;
> +		union {
> +			struct {
> +				/** String identifier. */
> +				const char *str;
> +				/** Length of @a str. */
> +				int len;
> +			};
> +			/** Index value. */
> +			uint64_t num;
>  		};
> -		/** Index value. */
> -		uint64_t num;
> -	};
> +	} key;

After some consideration, I don't think that it's worth moving the token
identifier to anonymous struct 'key' - it doesn't conflict with other
members and using 'num' or 'str/len' directly is shorter than 'key.num'
and 'key.str/len'. Also, the 'type' still has type json_token_type, not
json_token_key_type, which suggests that it should be under token.type,
not under token.key.type.

> +	/** Rolling hash for node used to lookup in json_tree. */

Quoting myself:

} My another concern is comments to the code and function members. They
} look better than they used to be, but still not perfect. E.g. take this
} comment:
} 
} > +     /**
} > +      * Rolling hash for node calculated with
} > +      * json_path_node_hash(key, parent).
} > +      */
} > +     uint32_t rolling_hash;
} 
} It took me a while to realize how you calculate rolling hash - I had to
} dive deep into the implementation.

Alas, the new comment is no better than the previous one.

> +	uint32_t rolling_hash;

Let's call it simply 'hash', short and clear. The rolling nature of the
hash should be explained in the comment.

> +	/**
> +	 * Array of child records. Indexes in this array
> +	 * follows array indexe-1 for JSON_TOKEN_NUM token type

typo: indexe -> index

BTW, JSON arrays start indexing from 0, not 1 AFAIK. Starting indexing
from 1 looks weird to me.

> +	 * and are allocated sequently for JSON_TOKEN_NUM child

typo: sequently -> sequentially

Please enable spell checker.

> +	 * tokens. NULLs initializations are performed with new
> +	 * entries allocation.
> +	 */
> +	struct json_token **children;
> +	/** Allocation size of children array. */
> +	uint32_t child_size;
> +	/**
> +	 * Count of defined children array items. Equal the
> +	 * maximum index of inserted item.
> +	 */
> +	uint32_t child_count;
> +	/** Index of node in parent children array. */
> +	uint32_t sibling_idx;
> +	/** Pointer to parent node. */
> +	struct json_token *parent;
> +};
> +
> +struct mh_json_t;
> +
> +/** JSON tree object to manage tokens relations. */
> +struct json_tree {
> +	/** JSON tree root node. */
> +	struct json_token root;
> +	/** Hashtable af all tree nodes. */

How does it work? In a year nobody will remember, and to understand how
this thing operates we will have to dive deep into the code...

> +	struct mh_json_t *hash;
>  };
>  
>  /**
> @@ -104,6 +137,147 @@ json_lexer_create(struct json_lexer *lexer, const char *src, int src_len)
>  int
>  json_lexer_next_token(struct json_lexer *lexer, struct json_token *token);
>  
> +/** Create a JSON tree object to manage data relations. */
> +int
> +json_tree_create(struct json_tree *tree);
> +
> +/**
> + * Destroy JSON tree object. This routine doesn't destroy attached
> + * subtree so it should be called at the end of manual destroy.
> + */
> +void
> +json_tree_destroy(struct json_tree *tree);
> +
> +/**
> + * Make child lookup in JSON tree by token at position specified
> + * with parent. The parent may be set NULL to use tree root
> + * record.
> + */
> +struct json_token *
> +json_tree_lookup(struct json_tree *tree, struct json_token *parent,
> +		 struct json_token *token);
> +
> +/**
> + * Append token to the given parent position in a JSON tree. The
> + * parent mustn't have a child with such content. The parent may
> + * be set NULL to use tree root record.
> + */
> +int
> +json_tree_add(struct json_tree *tree, struct json_token *parent,
> +	      struct json_token *token);
> +
> +/**
> + * Delete a JSON tree token at the given parent position in JSON
> + * tree. Token entry shouldn't have subtree.
> + */
> +void
> +json_tree_del(struct json_tree *tree, struct json_token *token);
> +
> +/** Make child lookup by path in JSON tree. */
> +struct json_token *
> +json_tree_lookup_path(struct json_tree *tree, struct json_token *parent,
> +		      const char *path, uint32_t path_len);
> +
> +/** Make pre order traversal in JSON tree. */
> +struct json_token *
> +json_tree_preorder_next(struct json_token *root, struct json_token *pos);
> +
> +/** Make post order traversal in JSON tree. */
> +struct json_token *
> +json_tree_postorder_next(struct json_token *root, struct json_token *pos);

It's definitely worth mentioning that these functions don't visit the
root node.

> +
> +/**
> + * Make safe post-order traversal in JSON tree.
> + * May be used for destructors.
> + */
> +#define json_tree_foreach_safe(node, root)				     \
> +for (struct json_token *__next = json_tree_postorder_next((root), NULL);     \
> +     (((node) = __next) &&						     \
> +     (__next = json_tree_postorder_next((root), (node))), (node) != NULL);)

IMHO it's rather difficult to understand. Can we rewrite it so that
it uses for() loop as expected, i.e. initializes the loop variable in
the first part, checks for a loop condition in the second part, and
advances the loop variable in the third part? Something like:

#define json_tree_foreach_safe(curr, root, next)
	for (curr = json_tree_postorder_next(root, NULL),
	     next = curr ? json_tree_postorder_next(root, curr) : NULL;
	     curr != NULL;
	     curr = next,
	     next = curr ? json_tree_postorder_next(root, curr) : NULL)
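That shape does work; here is a quick standalone sanity check of the same macro structure, with a stand-in iterator over a heap-allocated singly-linked list so every visited node can be freed from the loop body:

```c
#include <assert.h>
#include <stdlib.h>

struct item {
	struct item *next;
};

static struct item *
iter_next(struct item *head, struct item *pos)
{
	return pos == NULL ? head : pos->next;
}

/* Same three-part for() structure as proposed above. */
#define foreach_safe(curr, head, next)					\
	for (curr = iter_next(head, NULL),				\
	     next = curr ? iter_next(head, curr) : NULL;		\
	     curr != NULL;						\
	     curr = next,						\
	     next = curr ? iter_next(head, curr) : NULL)

/* Build a 3-item list and free every node from inside the loop. */
static int
foreach_safe_demo(void)
{
	struct item *head = NULL;
	for (int i = 0; i < 3; i++) {
		struct item *it = malloc(sizeof(*it));
		it->next = head;
		head = it;
	}
	int visited = 0;
	struct item *curr, *next;
	foreach_safe(curr, head, next) {
		free(curr);	/* safe: 'next' was fetched beforehand */
		visited++;
	}
	return visited;
}
```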

> +
> +#ifndef typeof
> +/* TODO: 'typeof' is a GNU extension */
> +#define typeof __typeof__
> +#endif
> +
> +/** Return container entry by json_tree_node node. */
> +#define json_tree_entry(node, type, member) ({ 				     \
> +	const typeof( ((type *)0)->member ) *__mptr = (node);		     \
> +	(type *)( (char *)__mptr - ((size_t) &((type *)0)->member) );	     \

Use container_of please.
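For reference, the standard definition, equivalent to the open-coded version being replaced:

```c
#include <assert.h>
#include <stddef.h>

#ifndef container_of
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#endif

/* Example pair: an entry embedding a link member. */
struct entry {
	int value;
	int link;
};
```

Given a pointer to `link`, the macro recovers the enclosing `struct entry`.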

> +})
> +
> +/**
> + * Return container entry by json_tree_node or NULL if
> + * node is NULL.
> + */
> +#define json_tree_entry_safe(node, type, member) ({			     \
> +	(node) != NULL ? json_tree_entry((node), type, member) : NULL;	     \
> +})
> +
> +/** Make entry pre order traversal in JSON tree  */
> +#define json_tree_preorder_next_entry(node, root, type, member) ({	     \
> +	struct json_token *__next =					     \
> +		json_tree_preorder_next((root), (node));		     \
> +	json_tree_entry_safe(__next, type, member);			     \
> +})
> +
> +/** Make entry post order traversal in JSON tree  */
> +#define json_tree_postorder_next_entry(node, root, type, member) ({	     \
> +	struct json_token *__next =					     \
> +		json_tree_postorder_next((root), (node));		     \
> +	json_tree_entry_safe(__next, type, member);			     \
> +})
> +
> +/** Make lookup in tree by path and return entry. */
> +#define json_tree_lookup_path_entry(tree, parent, path, path_len, type,	     \
> +				    member)				     \
> +({struct json_token *__node =						     \
> +	json_tree_lookup_path((tree), (parent), path, path_len);	     \
> +	json_tree_entry_safe(__node, type, member); })
> +
> +/** Make lookup in tree by token and return entry. */
> +#define json_tree_lookup_entry(tree, parent, token, type, member)	     \
> +({struct json_token *__node =						     \
> +	json_tree_lookup((tree), (parent), token);			     \

Sometimes you wrap macro arguments in parentheses (tree, parent),
sometimes you don't (token, path, path_len). Please fix.

> +	json_tree_entry_safe(__node, type, member);			     \
> +})
> +
> +/** Make pre-order traversal in JSON tree. */
> +#define json_tree_foreach_preorder(node, root)				     \
> +for ((node) = json_tree_preorder_next((root), NULL); (node) != NULL;	     \
> +     (node) = json_tree_preorder_next((root), (node)))
> +
> +/** Make post-order traversal in JSON tree. */
> +#define json_tree_foreach_postorder(node, root)				     \
> +for ((node) = json_tree_postorder_next((root), NULL); (node) != NULL;	     \
> +     (node) = json_tree_postorder_next((root), (node)))
> +
> +/** Make post-order traversal in JSON tree and return entry. */
> +#define json_tree_foreach_entry_preorder(node, root, type, member)	     \
> +for ((node) = json_tree_preorder_next_entry(NULL, (root), type, member);     \
> +     (node) != NULL;							     \
> +     (node) = json_tree_preorder_next_entry(&(node)->member, (root), type,   \
> +					    member))
> +
> +/** Make pre-order traversal in JSON tree and return entry. */
> +#define json_tree_foreach_entry_postorder(node, root, type, member)	     \
> +for ((node) = json_tree_postorder_next_entry(NULL, (root), type, member);    \
> +     (node) != NULL;							     \
> +     (node) = json_tree_postorder_next_entry(&(node)->member, (root), type,  \
> +					     member))
> +
> +/**
> + * Make secure post-order traversal in JSON tree and return entry.
> + */
> +#define json_tree_foreach_entry_safe(node, root, type, member)		     \
> +for (type *__next = json_tree_postorder_next_entry(NULL, (root), type,	     \
> +						   member);		     \
> +     (((node) = __next) &&						     \
> +     (__next = json_tree_postorder_next_entry(&(node)->member, (root), type, \
> +					      member)),			     \
> +     (node) != NULL);)
> +
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/test/unit/json_path.c b/test/unit/json_path.c
> index a5f90ad98..f6b0472f4 100644
> --- a/test/unit/json_path.c
> +++ b/test/unit/json_path.c
> @@ -159,14 +160,207 @@ test_errors()
>  	footer();
>  }
>  
> +struct test_struct {
> +	int value;
> +	struct json_token node;
> +};
> +
> +struct test_struct *
> +test_struct_alloc(struct test_struct *records_pool, int *pool_idx)
> +{
> +	struct test_struct *ret = &records_pool[*pool_idx];
> +	*pool_idx = *pool_idx + 1;
> +	memset(&ret->node, 0, sizeof(ret->node));
> +	return ret;
> +}
> +
> +struct test_struct *
> +test_add_path(struct json_tree *tree, const char *path, uint32_t path_len,
> +	      struct test_struct *records_pool, int *pool_idx)
> +{
> +	int rc;
> +	struct json_lexer lexer;
> +	struct json_token *parent = NULL;
> +	json_lexer_create(&lexer, path, path_len);
> +	struct test_struct *field = test_struct_alloc(records_pool, pool_idx);
> +	while ((rc = json_lexer_next_token(&lexer, &field->node)) == 0 &&
> +		field->node.key.type != JSON_TOKEN_END) {
> +		struct json_token *next =
> +			json_tree_lookup(tree, parent, &field->node);
> +		if (next == NULL) {
> +			rc = json_tree_add(tree, parent, &field->node);
> +			fail_if(rc != 0);
> +			next = &field->node;
> +			field = test_struct_alloc(records_pool, pool_idx);
> +		}
> +		parent = next;
> +	}
> +	fail_if(rc != 0 || field->node.key.type != JSON_TOKEN_END);
> +	*pool_idx = *pool_idx - 1;
> +	/* release field */
> +	return json_tree_entry(parent, struct test_struct, node);
> +}
> +
> +void
> +test_tree()
> +{
> +	header();
> +	plan(35);
> +
> +	struct json_tree tree;
> +	int rc = json_tree_create(&tree);
> +	fail_if(rc != 0);
> +
> +	struct test_struct records[6];
> +	for (int i = 0; i < 6; i++)
> +		records[i].value = i;
> +
> +	const char *path1 = "[1][10]";
> +	const char *path2 = "[1][20].file";
> +	const char *path_unregistered = "[1][3]";

Please add more paths. I'd like to see the following cases covered as
well:

 - JSON_TOKEN_STR as an intermediate node
 - JSON_TOKEN_STR as a common node for two paths
 - coinciding paths
 - one path is a sub-path of another
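For instance (illustrative paths only, one per case, alongside the existing path1/path2 in the test):

```c
/* Candidate additions covering the requested cases. */
static const char *extra_paths[] = {
	"name.first",		/* JSON_TOKEN_STR as an intermediate node */
	"name.last",		/* shares the "name" string node with the above */
	"[1][10]",		/* coincides with path1 */
	"[1][20].file.size",	/* path2 is a sub-path of this one */
};
```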

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] [PATCH v5 2/9] lib: implement JSON tree class for json library
  2018-11-29 17:38     ` Vladimir Davydov
@ 2018-11-29 17:50       ` Vladimir Davydov
  2018-12-04 15:22       ` Vladimir Davydov
  2018-12-04 15:47       ` [tarantool-patches] " Kirill Shcherbatov
  2 siblings, 0 replies; 41+ messages in thread
From: Vladimir Davydov @ 2018-11-29 17:50 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, Kostya Osipov

On Thu, Nov 29, 2018 at 08:38:16PM +0300, Vladimir Davydov wrote:
> On Mon, Nov 26, 2018 at 03:53:03PM +0300, Kirill Shcherbatov wrote:
> > +int
> > +json_tree_add(struct json_tree *tree, struct json_token *parent,
> > +	      struct json_token *token)
> > +{
> > +	if (parent == NULL)
> > +		parent = &tree->root;
> > +	uint32_t rolling_hash =
> > +	       json_token_hash(token, parent->rolling_hash);
> > +	assert(json_tree_lookup(tree, parent, token) == NULL);
> > +	uint32_t insert_idx = (token->key.type == JSON_TOKEN_NUM) ?
> > +			      (uint32_t)token->key.num - 1 :
> > +			      parent->child_size;
> > +	if (insert_idx >= parent->child_size) {
> > +		uint32_t new_size =
> > +			parent->child_size == 0 ? 1 : 2 * parent->child_size;
> > +		while (insert_idx >= new_size)
> > +			new_size *= 2;
> > +		struct json_token **children =
> > +			realloc(parent->children, new_size*sizeof(void *));
> > +		if (unlikely(children == NULL))
> > +			return -1;
> > +		memset(children + parent->child_size, 0,
> > +		       (new_size - parent->child_size)*sizeof(void *));
> > +		parent->children = children;
> > +		parent->child_size = new_size;
> 
> Ouch, this looks much more complex than it was in the previous version.
> Please revert. Quoting myself:
> 
> } > +     if (insert_idx >= parent->children_count) {
> } > +             uint32_t new_size = insert_idx + 1;
> } 
> } We usually double the size with each allocation. If this is intentional,
> } please add a comment.
> 
> I didn't push you to change that. I just wanted you to add a comment
> saying that we can afford quadratic algorithmic complexity here, because
> this function is far from a hot path and we expect the tree to be rather
> small.

Come to think of it, don't bother. Doubling the array will probably
increase the chances that the allocation will stay close to the node
itself, and it generally looks saner. Let's please just rename
child_size to child_count_max and increase the initial allocation up to
say 8, OK?
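i.e. something along these lines for the growth computation (illustrative helper, not from the patch):

```c
#include <assert.h>

/* Double the capacity on growth, starting from 8 slots. */
static unsigned
next_capacity(unsigned cur, unsigned insert_idx)
{
	unsigned cap = cur == 0 ? 8 : cur;
	while (insert_idx >= cap)
		cap *= 2;
	return cap;
}
```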


* Re: [PATCH v5 3/9] box: manage format fields with JSON tree class
  2018-11-26 10:49 ` [PATCH v5 3/9] box: manage format fields with JSON tree class Kirill Shcherbatov
@ 2018-11-29 19:07   ` Vladimir Davydov
  2018-12-04 15:47     ` [tarantool-patches] " Kirill Shcherbatov
  0 siblings, 1 reply; 41+ messages in thread
From: Vladimir Davydov @ 2018-11-29 19:07 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja

On Mon, Nov 26, 2018 at 01:49:37PM +0300, Kirill Shcherbatov wrote:
> diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
> index d184dba..92028c5 100644
> --- a/src/box/tuple_format.c
> +++ b/src/box/tuple_format.c
> @@ -38,10 +38,28 @@ static intptr_t recycled_format_ids = FORMAT_ID_NIL;
>  
>  static uint32_t formats_size = 0, formats_capacity = 0;
>  
> -static const struct tuple_field tuple_field_default = {
> -	FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false,
> -	ON_CONFLICT_ACTION_NONE, NULL, COLL_NONE,
> -};
> +static struct tuple_field *
> +tuple_field_create(struct json_token *token)
> +{
> +	struct tuple_field *ret = calloc(1, sizeof(struct tuple_field));
> +	if (ret == NULL) {
> +		diag_set(OutOfMemory, sizeof(struct tuple_field), "malloc",
> +			 "ret");
> +		return NULL;
> +	}
> +	ret->type = FIELD_TYPE_ANY;
> +	ret->offset_slot = TUPLE_OFFSET_SLOT_NIL;
> +	ret->coll_id = COLL_NONE;
> +	ret->nullable_action = ON_CONFLICT_ACTION_NONE;
> +	ret->token = *token;
> +	return ret;
> +}
> +
> +static void
> +tuple_field_destroy(struct tuple_field *field)
> +{
> +	free(field);
> +}

We usually call methods that allocate/free an object _new/delete, not
_create/destroy. Please fix.

> @@ -258,39 +273,68 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
>  		}
>  	}
>  	uint32_t field_count = MAX(space_field_count, index_field_count);
> -	uint32_t total = sizeof(struct tuple_format) +
> -			 field_count * sizeof(struct tuple_field);
>  
> -	struct tuple_format *format = (struct tuple_format *) malloc(total);
> +	struct tuple_format *format = malloc(sizeof(struct tuple_format));
>  	if (format == NULL) {
>  		diag_set(OutOfMemory, sizeof(struct tuple_format), "malloc",
>  			 "tuple format");
>  		return NULL;
>  	}
> +	if (json_tree_create(&format->tree) != 0) {
> +		free(format);
> +		return NULL;
> +	}
> +	struct json_token token;
> +	memset(&token, 0, sizeof(token));
> +	token.key.type = JSON_TOKEN_NUM;
> +	for (token.key.num = TUPLE_INDEX_BASE;
> +	     token.key.num < field_count + TUPLE_INDEX_BASE; token.key.num++) {
> +		struct tuple_field *field = tuple_field_create(&token);

I've looked at your following patches - this is the only place where you
pass a non-NULL token to tuple_field_create. Let's not pass token to
this function at all, and initialize field->token directly here.

> +		if (field == NULL)
> +			goto error;
> +		if (json_tree_add(&format->tree, NULL, &field->token) != 0) {

diag_set missing

> +			tuple_field_destroy(field);
> +			goto error;
> +		}
> +	}
>  	if (dict == NULL) {
>  		assert(space_field_count == 0);
>  		format->dict = tuple_dictionary_new(NULL, 0);
> -		if (format->dict == NULL) {
> -			free(format);
> -			return NULL;
> -		}
> +		if (format->dict == NULL)
> +			goto error;
>  	} else {
>  		format->dict = dict;
>  		tuple_dictionary_ref(dict);
>  	}
>  	format->refs = 0;
>  	format->id = FORMAT_ID_NIL;
> -	format->field_count = field_count;
>  	format->index_field_count = index_field_count;
>  	format->exact_field_count = 0;
>  	format->min_field_count = 0;
>  	return format;
> +error:;
> +	struct tuple_field *field;
> +	json_tree_foreach_entry_safe(field, &format->tree.root,
> +				     struct tuple_field, token) {
> +		json_tree_del(&format->tree, &field->token);
> +		tuple_field_destroy(field);
> +	}
> +	json_tree_destroy(&format->tree);

And you have exactly the same piece of code in tuple_format_destroy.
Factor it out to a helper function?

> +	free(format);
> +	return NULL;
>  }
>  
>  /** Free tuple format resources, doesn't unregister. */
>  static inline void
>  tuple_format_destroy(struct tuple_format *format)
>  {
> +	struct tuple_field *field;
> +	json_tree_foreach_entry_safe(field, &format->tree.root,
> +				     struct tuple_field, token) {
> +		json_tree_del(&format->tree, &field->token);
> +		tuple_field_destroy(field);
> +	}
> +	json_tree_destroy(&format->tree);
>  	tuple_dictionary_unref(format->dict);
>  }
>  
> diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
> index 232df22..2da773b 100644
> --- a/src/box/tuple_format.h
> +++ b/src/box/tuple_format.h
> @@ -34,6 +34,7 @@
>  #include "key_def.h"
>  #include "field_def.h"
>  #include "errinj.h"
> +#include "json/json.h"
>  #include "tuple_dictionary.h"
>  
>  #if defined(__cplusplus)
> @@ -113,6 +114,8 @@ struct tuple_field {
>  	struct coll *coll;
>  	/** Collation identifier. */
>  	uint32_t coll_id;
> +	/** An JSON entry to organize tree. */

	/** Link in tuple_format::fields. */

would be more useful.

> +	struct json_token token;
>  };
>  
>  /**
> @@ -166,16 +169,33 @@ struct tuple_format {
>  	 * index_field_count <= min_field_count <= field_count.
>  	 */
>  	uint32_t min_field_count;
> -	/* Length of 'fields' array. */
> -	uint32_t field_count;
>  	/**
>  	 * Shared names storage used by all formats of a space.
>  	 */
>  	struct tuple_dictionary *dict;
> -	/* Formats of the fields */
> -	struct tuple_field fields[0];
> +	/** JSON tree of fields. */

Would be prudent to say a few words about the tree structure, e.g.

	/**
	 * Fields comprising the format, organized in a tree.
	 * First level nodes correspond to tuple fields.
	 * Deeper levels define indexed JSON paths within
	 * tuple fields. Nodes of the tree are linked by
	 * tuple_field::token.
	 */

I hate to repeat myself, but you ought to pay more attention to comments
in the code.

> +	struct json_tree tree;

format->tree - what tree?

Why not leave the same name - 'fields'?

>  };
>  
> +
> +static inline uint32_t
> +tuple_format_field_count(const struct tuple_format *format)
> +{
> +	return format->tree.root.child_count;
> +}
> +

Comments, comments...

> +static inline struct tuple_field *
> +tuple_format_field(struct tuple_format *format, uint32_t fieldno)
> +{
> +	assert(fieldno < tuple_format_field_count(format));
> +	struct json_token token = {
> +		.key.type = JSON_TOKEN_NUM,
> +		.key.num = fieldno + TUPLE_INDEX_BASE
> +	};
> +	return json_tree_lookup_entry(&format->tree, NULL, &token,
> +				      struct tuple_field, token);
> +}
> +
>  extern struct tuple_format **tuple_formats;
>  
>  static inline uint32_t


* Re: [PATCH v5 4/9] lib: introduce json_path_cmp routine
  2018-11-26 10:49 ` [PATCH v5 4/9] lib: introduce json_path_cmp routine Kirill Shcherbatov
@ 2018-11-30 10:46   ` Vladimir Davydov
  2018-12-03 17:37     ` [tarantool-patches] " Konstantin Osipov
  0 siblings, 1 reply; 41+ messages in thread
From: Vladimir Davydov @ 2018-11-30 10:46 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja

On Mon, Nov 26, 2018 at 01:49:38PM +0300, Kirill Shcherbatov wrote:
> diff --git a/src/lib/json/json.h b/src/lib/json/json.h
> index dd09f5a..7d46601 100644
> --- a/src/lib/json/json.h
> +++ b/src/lib/json/json.h
> @@ -137,6 +137,17 @@ json_lexer_create(struct json_lexer *lexer, const char *src, int src_len)
>  int
>  json_lexer_next_token(struct json_lexer *lexer, struct json_token *token);
>  
> +/**
> + * Compare two JSON paths using Lexer class.
> + * - @a path must be valid
> + * - at the case of paths that have same token-sequence prefix,
> + *   the path having more tokens is assumed to be greater
> + * - when @b path contains an error, the path "a" is assumed to
> + *   be greater
> + */
> +int
> +json_path_cmp(const char *a, uint32_t a_len, const char *b, uint32_t b_len);
> +

One typically expects cmp(a, b) to be equivalent to -cmp(b, a).
Can't we make json_path_cmp satisfy this property, for example, by
requiring both strings to be valid json paths with an assertion?
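To illustrate the property: once both inputs are required to be valid, a token-by-token comparison is naturally antisymmetric. A standalone sketch over pre-tokenized paths (stand-in types, not the json library's):

```c
#include <assert.h>
#include <string.h>

struct tok {
	int is_str;		/* 1: string key, 0: numeric index */
	long num;
	const char *str;
};

static int
sgn(int x)
{
	return (x > 0) - (x < 0);
}

/* Both sequences assumed valid; a proper prefix compares lower. */
static int
path_cmp(const struct tok *a, int na, const struct tok *b, int nb)
{
	int n = na < nb ? na : nb;
	for (int i = 0; i < n; i++) {
		if (a[i].is_str != b[i].is_str)
			return a[i].is_str - b[i].is_str;
		int rc = a[i].is_str ? sgn(strcmp(a[i].str, b[i].str)) :
			 (a[i].num < b[i].num ? -1 : a[i].num > b[i].num);
		if (rc != 0)
			return rc;
	}
	return sgn(na - nb);
}
```

With no error return value mixed in, path_cmp(a, b) == -path_cmp(b, a) holds for any pair of valid inputs.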

>  /** Create a JSON tree object to manage data relations. */
>  int
>  json_tree_create(struct json_tree *tree);
> diff --git a/test/unit/json_path.c b/test/unit/json_path.c
> index f6b0472..35c2164 100644
> --- a/test/unit/json_path.c
> +++ b/test/unit/json_path.c
> @@ -352,15 +352,44 @@ test_tree()
>  	footer();
>  }
>  
> +void
> +test_path_cmp()
> +{
> +	const char *a = "Data[1][\"FIO\"].fname";
> +	uint32_t a_len = strlen(a);
> +	const struct path_and_errpos rc[] = {
> +		{a, 0},
> +		{"[\"Data\"][1].FIO[\"fname\"]", 0},
> +		{"Data[[1][\"FIO\"].fname", 6},
> +		{"Data[1]", 1},
> +		{"Data[1][\"FIO\"].fname[1]", -2},
> +		{"Data[1][\"Info\"].fname[1]", -1},

Should we reorder the values of the json_token_type enum, we would
have to fix the test. Let's please rewrite the test so that it doesn't
depend on
the json_lexer implementation, in other words just check that the sign
of the return value is correct.

> +	};
> +	header();
> +	plan(lengthof(rc));
> +
> +	for (size_t i = 0; i < lengthof(rc); ++i) {
> +		const char *path = rc[i].path;
> +		int errpos = rc[i].errpos;
> +		int rc = json_path_cmp(a, a_len, path, strlen(path));
> +		is(rc, errpos, "path cmp result \"%s\" with \"%s\": "
> +		   "have %d, expected %d", a, path, rc, errpos);
> +	}
> +
> +	check_plan();
> +	footer();
> +}


* Re: [tarantool-patches] [PATCH v5 5/9] box: introduce JSON indexes
  2018-11-26 10:49 ` [tarantool-patches] [PATCH v5 5/9] box: introduce JSON indexes Kirill Shcherbatov
@ 2018-11-30 21:28   ` Vladimir Davydov
  2018-12-01 16:49     ` Vladimir Davydov
  0 siblings, 1 reply; 41+ messages in thread
From: Vladimir Davydov @ 2018-11-30 21:28 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja

On Mon, Nov 26, 2018 at 01:49:39PM +0300, Kirill Shcherbatov wrote:
> New JSON-path-based indexes allows to index documents content.
> As we need to store user-defined JSON path in key_part
> and key_part_def, we have introduced path and path_len
> fields. JSON path is verified and transformed to canonical
> form on index msgpack unpack.

It is not anymore.

> Path string stored as a part of the key_def allocation:
> 
> +-------+---------+-------+---------+-------+-------+-------+
> |key_def|key_part1|  ...  |key_partN| path1 | pathK | pathN |
> +-------+---------+-------+---------+-------+-------+-------+
>           |                         ^
>           |-> path _________________|
> 
> With format creation JSON paths are stored at the end of format
> allocation:
> +------------+------------+-------+------------+-------+
> |tuple_format|tuple_field1|  ...  |tuple_fieldN| pathK |
> +------------+------------+-------+------------+-------+

I wonder what made you put this in the commit message. Is the way paths
are allocated that important? I don't think so. OTOH it would be nice to
say a few words here about the user API and some subtleties in the
implementation, e.g. about tuple_init_field_map.

> 
> Part of #1012
> ---
>  src/box/errcode.h            |   2 +-
>  src/box/index_def.c          |   8 +-
>  src/box/key_def.c            | 164 +++++++++++++---
>  src/box/key_def.h            |  23 ++-
>  src/box/lua/space.cc         |   5 +
>  src/box/memtx_engine.c       |   3 +
>  src/box/sql.c                |   1 +
>  src/box/sql/build.c          |   1 +
>  src/box/sql/select.c         |   6 +-
>  src/box/sql/where.c          |   1 +
>  src/box/tuple.c              |  38 +---
>  src/box/tuple_compare.cc     |  13 +-
>  src/box/tuple_extract_key.cc |  21 ++-
>  src/box/tuple_format.c       | 439 ++++++++++++++++++++++++++++++++++++-------
>  src/box/tuple_format.h       |  38 +++-
>  src/box/tuple_hash.cc        |   2 +-
>  src/box/vinyl.c              |   3 +
>  src/box/vy_log.c             |   3 +-
>  src/box/vy_point_lookup.c    |   2 -
>  src/box/vy_stmt.c            | 166 +++++++++++++---
>  test/box/misc.result         |   1 +
>  test/engine/tuple.result     | 416 ++++++++++++++++++++++++++++++++++++++++
>  test/engine/tuple.test.lua   | 121 ++++++++++++
>  23 files changed, 1306 insertions(+), 171 deletions(-)
> 
> diff --git a/src/box/key_def.c b/src/box/key_def.c
> index 2119ca3..bc6cecd 100644
> --- a/src/box/key_def.c
> +++ b/src/box/key_def.c
> @@ -106,13 +111,25 @@ const uint32_t key_mp_type[] = {
>  struct key_def *
>  key_def_dup(const struct key_def *src)
>  {
> -	size_t sz = key_def_sizeof(src->part_count);
> -	struct key_def *res = (struct key_def *)malloc(sz);
> +	const struct key_part *parts = src->parts;
> +	const struct key_part *parts_end = parts + src->part_count;
> +	size_t sz = 0;
> +	for (; parts < parts_end; parts++)
> +		sz += parts->path != NULL ? parts->path_len + 1 : 0;

Why do you require key_part::path to be nul-terminated?
It looks strange as you store the string length anyway.

Without this requirement, the code would look more straightforward as
there would be less branching. E.g. this piece of code would look like:

	for (uint32_t i = 0; i < src->part_count; i++)
		sz += src->parts[i].path_len;

> +	sz = key_def_sizeof(src->part_count, sz);
> +	struct key_def *res = (struct key_def *)calloc(1, sz);
>  	if (res == NULL) {
>  		diag_set(OutOfMemory, sz, "malloc", "res");
>  		return NULL;
>  	}
>  	memcpy(res, src, sz);
> +	/* Update paths to point to the new memory chunk.*/
> +	for (uint32_t i = 0; i < src->part_count; i++) {
> +		if (src->parts[i].path == NULL)
> +			continue;
> +		size_t path_offset = src->parts[i].path - (char *)src;
> +		res->parts[i].path = (char *)res + path_offset;
> +	}
>  	return res;
>  }
>  
> @@ -120,8 +137,23 @@ void
>  key_def_swap(struct key_def *old_def, struct key_def *new_def)
>  {
>  	assert(old_def->part_count == new_def->part_count);
> -	for (uint32_t i = 0; i < new_def->part_count; i++)
> -		SWAP(old_def->parts[i], new_def->parts[i]);
> +	for (uint32_t i = 0; i < new_def->part_count; i++) {
> +		if (old_def->parts[i].path == NULL) {
> +			SWAP(old_def->parts[i], new_def->parts[i]);
> +		} else {
> +			/*
> +			 * Since the data is located in  memory
> +			 * in the same order (otherwise rebuild
> +			 * would be called), just update the
> +			 * pointers.
> +			 */
> +			size_t path_offset =
> +				old_def->parts[i].path - (char *)old_def;
> +			SWAP(old_def->parts[i], new_def->parts[i]);
> +			old_def->parts[i].path = (char *)old_def + path_offset;
> +			new_def->parts[i].path = (char *)new_def + path_offset;
> +		}
> +	}

This would be shorter:

	for (uint32_t i = 0; i < new_def->part_count; i++) {
		SWAP(old_def->parts[i], new_def->parts[i]);
		/*
		 * Paths are allocated as a part of key_def so
		 * we need to swap path pointers back - it's OK
		 * as paths aren't supposed to change.
		 */
		assert(old_def->parts[i].path_len == new_def->parts[i].path_len);
		SWAP(old_def->parts[i].path, new_def->parts[i].path);
	}

>  	SWAP(*old_def, *new_def);
>  }
>  
> @@ -184,16 +231,23 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count)
>  			}
>  			coll = coll_id->coll;
>  		}
> +		uint32_t path_len = 0;
> +		if (part->path != NULL) {
> +			path_len = strlen(part->path);
> +			def->parts[i].path = data;
> +			data += path_len + 1;
> +		}
>  		key_def_set_part(def, i, part->fieldno, part->type,
>  				 part->nullable_action, coll, part->coll_id,
> -				 part->sort_order);
> +				 part->sort_order, part->path, path_len);
>  	}
>  	key_def_set_cmp(def);
>  	return def;
>  }
>  
> -void
> -key_def_dump_parts(const struct key_def *def, struct key_part_def *parts)
> +int
> +key_def_dump_parts(struct region *pool, const struct key_def *def,
> +		   struct key_part_def *parts)

Region should be last in the argument list, because it's just an
auxiliary object. We should emphasize def and parts here.

>  {
>  	for (uint32_t i = 0; i < def->part_count; i++) {
>  		const struct key_part *part = &def->parts[i];
> @@ -445,6 +539,7 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
>  		return key_def_decode_parts_166(parts, part_count, data,
>  						fields, field_count);
>  	}
> +	struct region *region = &fiber()->gc;

I don't think it's worth making key_def.c depend on fiber.h only because
we need to use a region here. Let's pass it to key_def_decode_parts in
an extra argument as we do in case of key_def_dump_parts.

>  	for (uint32_t i = 0; i < part_count; i++) {
>  		struct key_part_def *part = &parts[i];
>  		if (mp_typeof(**data) != MP_MAP) {
> @@ -595,9 +708,14 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
>  	for (; part != end; part++) {
>  		if (key_def_find(first, part) != NULL)
>  			continue;
> +		if (part->path != NULL) {
> +			new_def->parts[pos].path = data;
> +			data += part->path_len + 1;
> +		}

I counted three places where you initialize part->path before passing
it to key_def_set_part. I think it would be nice to move part->path
initialization to key_def_set_part.
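Schematically (a simplified model with made-up struct and function names, not the real tarantool code): if the setter owns copying the path into the arena that trails the allocation and advances the bump pointer itself, the three call sites no longer duplicate that bookkeeping.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for struct key_part. */
struct part {
	const char *path;
	unsigned path_len;
};

/*
 * set_part copies the path into the trailing arena and advances the
 * caller's bump pointer, so callers only pass the source string.
 */
static void
set_part(struct part *p, const char *path, unsigned path_len, char **arena)
{
	p->path_len = path_len;
	if (path == NULL) {
		p->path = NULL;
		return;
	}
	memcpy(*arena, path, path_len);
	p->path = *arena;
	*arena += path_len;
}
```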

>  		key_def_set_part(new_def, pos++, part->fieldno, part->type,
>  				 part->nullable_action, part->coll,
> -				 part->coll_id, part->sort_order);
> +				 part->coll_id, part->sort_order, part->path,
> +				 part->path_len);
>  	}
>  	key_def_set_cmp(new_def);
>  	return new_def;
> diff --git a/src/box/key_def.h b/src/box/key_def.h
> index d4da6c5..7731e48 100644
> --- a/src/box/key_def.h
> +++ b/src/box/key_def.h
> @@ -68,6 +68,8 @@ struct key_part_def {
>  	enum on_conflict_action nullable_action;
>  	/** Part sort order. */
>  	enum sort_order sort_order;
> +	/** JSON path to data. */
> +	const char *path;

It would be nice to make the comment a little bit more thorough:

	/**
	 * JSON path to indexed data, relative to the field number,
	 * or NULL if this key part indexes a top-level field.
	 */

>  };
>  
>  extern const struct key_part_def key_part_def_default;
> @@ -86,6 +88,13 @@ struct key_part {
>  	enum on_conflict_action nullable_action;
>  	/** Part sort order. */
>  	enum sort_order sort_order;
> +	/**
> +	 * JSON path to data in 'canonical' form.
> +	 * Read json_path_normalize to get more details.
> +	 */

There's no json_path_normalize anymore. Please fix the comment.

Also, please mention in the comment whether the string is nul-terminated
or not, because the presence of @path_len suggests it isn't. And say a
few words on how these paths are allocated.

> +	char *path;
> +	/** The length of JSON path. */
> +	uint32_t path_len;
>  };
>  
>  struct key_def;
> @@ -260,8 +272,9 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count);
>  /**
>   * Dump part definitions of the given key def.
>   */
> -void
> -key_def_dump_parts(const struct key_def *def, struct key_part_def *parts);
> +int
> +key_def_dump_parts(struct region *pool, const struct key_def *def,
> +		   struct key_part_def *parts);

Please update the comment as well - explain what the region is used for,
why this function can fail.

>  
>  /**
>   * Update 'has_optional_parts' of @a key_def with correspondence
> diff --git a/src/box/sql/select.c b/src/box/sql/select.c
> index ca709b4..0734712 100644
> --- a/src/box/sql/select.c
> +++ b/src/box/sql/select.c
> @@ -1356,6 +1357,9 @@ sql_key_info_new(sqlite3 *db, uint32_t part_count)
>  struct sql_key_info *
>  sql_key_info_new_from_key_def(sqlite3 *db, const struct key_def *key_def)
>  {
> +	/** SQL key_parts could not have JSON paths. */
> +	for (uint32_t i = 0; i < key_def->part_count; i++)
> +		assert(key_def->parts[i].path == NULL);
>  	struct sql_key_info *key_info = sqlite3DbMallocRawNN(db,
>  				sql_key_info_sizeof(key_def->part_count));
>  	if (key_info == NULL) {
> @@ -1366,7 +1370,7 @@ sql_key_info_new_from_key_def(sqlite3 *db, const struct key_def *key_def)
>  	key_info->key_def = NULL;
>  	key_info->refs = 1;
>  	key_info->part_count = key_def->part_count;
> -	key_def_dump_parts(key_def, key_info->parts);
> +	key_def_dump_parts(&fiber()->gc, key_def, key_info->parts);

If you passed region=NULL to this function, you wouldn't need the
assertion above.

>  	return key_info;
>  }
>  
> diff --git a/src/box/tuple.c b/src/box/tuple.c
> index aae1c3c..62e06e7 100644
> --- a/src/box/tuple.c
> +++ b/src/box/tuple.c
> @@ -138,38 +138,18 @@ runtime_tuple_delete(struct tuple_format *format, struct tuple *tuple)
>  int
>  tuple_validate_raw(struct tuple_format *format, const char *tuple)
>  {
> -	if (tuple_format_field_count(format) == 0)
> -		return 0; /* Nothing to check */
> -
> -	/* Check to see if the tuple has a sufficient number of fields. */
> -	uint32_t field_count = mp_decode_array(&tuple);
> -	if (format->exact_field_count > 0 &&
> -	    format->exact_field_count != field_count) {
> -		diag_set(ClientError, ER_EXACT_FIELD_COUNT,
> -			 (unsigned) field_count,
> -			 (unsigned) format->exact_field_count);
> +	struct region *region = &fiber()->gc;
> +	uint32_t used = region_used(region);
> +	uint32_t *field_map = region_alloc(region, format->field_map_size);
> +	if (field_map == NULL) {
> +		diag_set(OutOfMemory, format->field_map_size, "region_alloc",
> +			 "field_map");
>  		return -1;
>  	}
> -	if (unlikely(field_count < format->min_field_count)) {
> -		diag_set(ClientError, ER_MIN_FIELD_COUNT,
> -			 (unsigned) field_count,
> -			 (unsigned) format->min_field_count);
> +	field_map = (uint32_t *)((char *)field_map + format->field_map_size);
> +	if (tuple_init_field_map(format, field_map, tuple, true) != 0)
>  		return -1;
> -	}
> -
> -	/* Check field types */
> -	struct tuple_field *field = tuple_format_field(format, 0);
> -	uint32_t i = 0;
> -	uint32_t defined_field_count =
> -		MIN(field_count, tuple_format_field_count(format));
> -	for (; i < defined_field_count; ++i) {
> -		field = tuple_format_field(format, i);
> -		if (key_mp_type_validate(field->type, mp_typeof(*tuple),
> -					 ER_FIELD_TYPE, i + TUPLE_INDEX_BASE,
> -					 tuple_field_is_nullable(field)))
> -			return -1;
> -		mp_next(&tuple);
> -	}
> +	region_truncate(region, used);
>  	return 0;

This could be done in a separate patch with a proper justification.

>  }
>  
> diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
> index e21b009..554c29f 100644
> --- a/src/box/tuple_compare.cc
> +++ b/src/box/tuple_compare.cc
> @@ -283,6 +285,15 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
>  				current_fieldno++;
>  			}
>  		}
> +		const char *field_last, *field_end_last;
> +		if (part->path != NULL) {
> +			field_last = field;
> +			field_end_last = field_end;
> +			(void)tuple_field_go_to_path(&field, part->path,
> +						     part->path_len);

Please add an assertion so that the reader would clearly see you don't
expect an error here.

> +			field_end = field;
> +			mp_next(&field_end);
> +		}
>  		memcpy(key_buf, field, field_end - field);
>  		key_buf += field_end - field;
>  		if (has_optional_parts && null_count != 0) {
> diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
> index 92028c5..193d0d8 100644
> --- a/src/box/tuple_format.c
> +++ b/src/box/tuple_format.c
> @@ -51,7 +52,8 @@ tuple_field_create(struct json_token *token)
>  	ret->offset_slot = TUPLE_OFFSET_SLOT_NIL;
>  	ret->coll_id = COLL_NONE;
>  	ret->nullable_action = ON_CONFLICT_ACTION_NONE;
> -	ret->token = *token;
> +	if (token != NULL)
> +		ret->token = *token;

As I mentioned in my review to the previous patch, I think it would be
better not to pass token to this function, because you typically pass
NULL. Instead set token->type to JSON_TOKEN_END and initialize it
properly afterwards.
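Something along these lines (types here are simplified stand-ins, not the real ones): the constructor takes no token argument and leaves the embedded token marked JSON_TOKEN_END, and the caller fills it in before linking the field into the tree.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the real json/tuple_field types. */
enum json_token_type { JSON_TOKEN_END = 0, JSON_TOKEN_NUM, JSON_TOKEN_STR };

struct json_token {
	enum json_token_type type;
};

struct tuple_field {
	struct json_token token;
};

/*
 * No token argument: the new field starts with JSON_TOKEN_END and the
 * caller initializes the token properly before json_tree_add.
 */
static struct tuple_field *
tuple_field_new(void)
{
	struct tuple_field *f = calloc(1, sizeof(*f));
	if (f == NULL)
		return NULL;
	f->token.type = JSON_TOKEN_END;
	return f;
}
```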

>  	return ret;
>  }
>  
> @@ -61,14 +63,114 @@ tuple_field_destroy(struct tuple_field *field)
>  	free(field);
>  }
>  
> +/** Build a JSON tree path for specified path. */
> +static struct tuple_field *
> +tuple_field_tree_add_path(struct tuple_format *format, const char *path,
> +			  uint32_t path_len, uint32_t fieldno)
> +{
> +	int rc = 0;
> +	struct json_tree *tree = &format->tree;
> +	struct tuple_field *parent = tuple_format_field(format, fieldno);
> +	struct tuple_field *field = tuple_field_create(NULL);
> +	if (unlikely(field == NULL))

Please don't use likely/unlikely in tuple format constructor - this path
is far from hot.

> +		goto end;
> +
> +	struct json_lexer lexer;
> +	bool is_last_new = false;
> +	json_lexer_create(&lexer, path, path_len);
> +	while ((rc = json_lexer_next_token(&lexer, &field->token)) == 0 &&
> +	       field->token.key.type != JSON_TOKEN_END) {
> +		enum field_type iterm_node_type =

iterm_node_type? What is that supposed to mean? Maybe call it simply
expected_type?

> +			field->token.key.type == JSON_TOKEN_STR ?
> +			FIELD_TYPE_MAP : FIELD_TYPE_ARRAY;
> +		if (parent->type != FIELD_TYPE_ANY &&
> +		    parent->type != iterm_node_type) {
> +			const char *name =
> +				tt_sprintf("[%d]%.*s", fieldno, path_len, path);
> +			diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH, name,
> +				 field_type_strs[parent->type],
> +				 field_type_strs[iterm_node_type]);
> +			parent = NULL;
> +			goto end;
> +		}
> +		struct tuple_field *next =
> +			json_tree_lookup_entry(tree, &parent->token,
> +					       &field->token,
> +					       struct tuple_field, token);
> +		if (next == NULL) {
> +			rc = json_tree_add(tree, &parent->token, &field->token);
> +			if (unlikely(rc != 0)) {
> +				diag_set(OutOfMemory, sizeof(struct json_token),
> +					 "json_tree_add", "tree");
> +				parent = NULL;
> +				goto end;
> +			}
> +			next = field;
> +			is_last_new = true;
> +			field = tuple_field_create(NULL);
> +			if (unlikely(next == NULL))

Looks like you forgot to set 'parent' to NULL here. Let's add a new
label, say 'fail', which would set 'parent' to NULL and then jump to
'end'?

> +				goto end;
> +		} else {
> +			is_last_new = false;
> +		}
> +		parent->type = iterm_node_type;
> +		parent = next;
> +	}
> +	if (rc != 0 || field->token.key.type != JSON_TOKEN_END) {

We break the loop if either 'rc' is not 0 or token.type ==
JSON_TOKEN_END. Therefore you don't need to check token.type here.

> +		const char *err_msg =
> +			tt_sprintf("invalid JSON path '%s': path has invalid "
> +				   "structure (error at position %d)", path,
> +				   rc);
> +		diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
> +			 fieldno + TUPLE_INDEX_BASE, err_msg);

IMO it would be better to move path validation to key_def_decode_parts -
it would be more obvious that way, because we do most of the checks
there. Besides, what's the point to go as far as to format creation if
we could return an error right when we parsed the input?

Moving the check to key_def_decode_parts would mean we had to parse the
path one more time, but who cares - index creation is a rare event. If
you agree, then please add json_path_validate in the same patch where
you add json_path_cmp.
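As a sketch of the idea only (a toy validator accepting ident, .ident, [num] and ["key"] segments — the real json_path_validate would of course reuse json_lexer from src/lib/json): it returns 0 for a well-formed path or the 1-based position of the first offending character, which is exactly what key_def_decode_parts needs for its error message.

```c
#include <ctype.h>

/*
 * Toy path validator: returns 0 on success, or the 1-based position
 * of the first bad character. Escapes inside quoted keys and other
 * corner cases are deliberately ignored in this sketch.
 */
static int
path_validate(const char *p)
{
	const char *s = p;
	while (*s != '\0') {
		if (*s == '.') {
			s++;
			if (!isalpha((unsigned char)*s) && *s != '_')
				return (int)(s - p) + 1;
		} else if (*s == '[') {
			s++;
			if (*s == '"') {
				s++;
				while (*s != '\0' && *s != '"')
					s++;
				if (*s != '"')
					return (int)(s - p) + 1;
				s++;
			} else if (isdigit((unsigned char)*s)) {
				while (isdigit((unsigned char)*s))
					s++;
			} else {
				return (int)(s - p) + 1;
			}
			if (*s != ']')
				return (int)(s - p) + 1;
			s++;
		} else if (isalnum((unsigned char)*s) || *s == '_') {
			s++;
		} else {
			return (int)(s - p) + 1;
		}
	}
	return 0;
}
```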

> +		parent = NULL;
> +		goto end;
> +	}
> +	assert(parent != NULL);
> +	/* Update tree depth information. */
> +	if (is_last_new) {
> +		uint32_t depth = 1;
> +		for (struct json_token *iter = parent->token.parent;
> +		     iter != &format->tree.root; iter = iter->parent, ++depth) {
> +			struct tuple_field *record =
> +				json_tree_entry(iter, struct tuple_field,
> +						token);
> +			record->subtree_depth =
> +				MAX(record->subtree_depth, depth);

Why do you need to maintain subtree_depth for each tuple_field?
You never use it after the format is created.

To initialize format->subtree_depth you could simply count number of
tokens in the path locally and then do

	format->subtree_depth = MAX(format->subtree_depth, token_count);

no?

BTW subtree_depth doesn't look like a descriptive name. Perhaps, we
should call it max_path_tokens to reflect its true nature and use 0 if
there is no JSON fields?

> +		}
> +	}
> +end:
> +	tuple_field_destroy(field);
> +	return parent;
> +}
> +
>  static int
>  tuple_format_use_key_part(struct tuple_format *format,
>  			  const struct field_def *fields, uint32_t field_count,
>  			  const struct key_part *part, bool is_sequential,
> -			  int *current_slot)
> +			  int *current_slot, char **path_data)
>  {
>  	assert(part->fieldno < tuple_format_field_count(format));
>  	struct tuple_field *field = tuple_format_field(format, part->fieldno);
> +	if (unlikely(part->path != NULL)) {
> +		assert(!is_sequential);
> +		/**
> +		 * Copy JSON path data to reserved area at the
> +		 * end of format allocation.
> +		 */
> +		memcpy(*path_data, part->path, part->path_len);
> +		(*path_data)[part->path_len] = '\0';
> +		struct tuple_field *root = field;
> +		field = tuple_field_tree_add_path(format, *path_data,
> +						  part->path_len,
> +						  part->fieldno);
> +		if (field == NULL)
> +			return -1;
> +		format->subtree_depth =
> +			MAX(format->subtree_depth, root->subtree_depth + 1);
> +		field->is_key_part = true;

Why do you set this flag here? There's code below in this function that
does the same.

> +		*path_data += part->path_len + 1;
> +	}
>  	/*
>  		* If a field is not present in the space format,
>  		* inherit nullable action of the first key part
> @@ -113,7 +215,10 @@ tuple_format_use_key_part(struct tuple_format *format,
>  					       field->type)) {
>  		const char *name;
>  		int fieldno = part->fieldno + TUPLE_INDEX_BASE;
> -		if (part->fieldno >= field_count) {
> +		if (unlikely(part->path != NULL)) {
> +			name = tt_sprintf("[%d]%.*s", fieldno, part->path_len,
> +					  part->path);

For regular fields we try to find field name when reporting an error
(the code is right below), but for JSON fields, we don't. Fix that
please.

> +		} else if (part->fieldno >= field_count) {
>  			name = tt_sprintf("%d", fieldno);
>  		} else {
>  			const struct field_def *def =
> @@ -137,10 +242,9 @@ tuple_format_use_key_part(struct tuple_format *format,
>  	 * simply accessible, so we don't store an offset for it.
>  	 */
>  	if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL &&
> -	    is_sequential == false && part->fieldno > 0) {
> -		*current_slot = *current_slot - 1;
> -		field->offset_slot = *current_slot;
> -	}
> +	    is_sequential == false &&
> +	    (part->fieldno > 0 || part->path != NULL))

> +		field->offset_slot = (*current_slot = *current_slot - 1);

Why did you change this assignment? It looked cleaner before.

>  	return 0;
>  }
>  
> @@ -377,16 +489,37 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
>  {
>  	if (format1->exact_field_count != format2->exact_field_count)
>  		return false;
> -	uint32_t format1_field_count = tuple_format_field_count(format1);
> -	uint32_t format2_field_count = tuple_format_field_count(format2);
> -	for (uint32_t i = 0; i < format1_field_count; ++i) {
> -		const struct tuple_field *field1 =
> -			tuple_format_field(format1, i);
> +	struct tuple_field *field1;
> +	struct json_token *field2_prev_token = NULL;
> +	struct json_token *skip_root_token = NULL;
> +	struct json_token *field1_prev_token = &format1->tree.root;
> +	json_tree_foreach_entry_preorder(field1, &format1->tree.root,
> +					 struct tuple_field, token) {
> +		/* Test if subtree skip is required. */
> +		if (skip_root_token != NULL) {
> +			struct json_token *tmp = &field1->token;
> +			while (tmp->parent != NULL &&
> +			       tmp->parent != skip_root_token)
> +				tmp = tmp->parent;
> +			if (tmp->parent == skip_root_token)
> +				continue;
> +		}
> +		skip_root_token = NULL;
> +		/* Lookup for a valid parent node in new tree. */
> +		while (field1_prev_token != field1->token.parent) {
> +			field1_prev_token = field1_prev_token->parent;
> +			field2_prev_token = field2_prev_token->parent;
> +			assert(field1_prev_token != NULL);
> +		}
> +		struct tuple_field *field2 =
> +			json_tree_lookup_entry(&format2->tree, field2_prev_token,
> +						&field1->token,
> +						struct tuple_field, token);

I fail to understand this code. Can we simplify it somehow?
Maybe we could just walk over all format1->fields and try
to look up the corresponding format2 field? That would be
suboptimal, but this function is called only on DDL so
understandability is top-prio here.

Another option is to try to make this code generic and move
it to the json library, augmenting it with some commentary -
IMHO it looks too complex for tuple_format.c, but in json.c
it would probably be OK, if wrapped properly.

>  		/*
>  		 * The field has a data type in format1, but has
>  		 * no data type in format2.
>  		 */
> -		if (i >= format2_field_count) {
> +		if (field2 == NULL) {
>  			/*
>  			 * The field can get a name added
>  			 * for it, and this doesn't require a data
> @@ -397,13 +530,13 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
>  			 * NULLs or miss the subject field.
>  			 */
>  			if (field1->type == FIELD_TYPE_ANY &&
> -			    tuple_field_is_nullable(field1))
> +			    tuple_field_is_nullable(field1)) {
> +				skip_root_token = &field1->token;
>  				continue;
> -			else
> +			} else {
>  				return false;
> +			}
>  		}
> -		const struct tuple_field *field2 =
> -			tuple_format_field(format2, i);
>  		if (! field_type1_contains_type2(field1->type, field2->type))
>  			return false;
>  		/*
> @@ -413,10 +546,82 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
>  		if (tuple_field_is_nullable(field2) &&
>  		    !tuple_field_is_nullable(field1))
>  			return false;
> +
> +		field2_prev_token = &field2->token;
> +		field1_prev_token = &field1->token;
>  	}
>  	return true;
>  }
>  
> +/** Find a field in format by offset slot. */
> +static struct tuple_field *
> +tuple_field_by_offset_slot(const struct tuple_format *format,
> +			   int32_t offset_slot)
> +{
> +	struct tuple_field *field;
> +	struct json_token *root = (struct json_token *)&format->tree.root;
> +	json_tree_foreach_entry_preorder(field, root, struct tuple_field,
> +					 token) {
> +		if (field->offset_slot == offset_slot)
> +			return field;
> +	}
> +	return NULL;
> +}
> +
> +/**
> + * Verify field_map and raise error on some indexed field has
> + * not been initialized. Routine rely on field_map has been
> + * initialized with UINT32_MAX marker before field_map
> + * initialization.
> + */
> +static int
> +tuple_field_map_validate(const struct tuple_format *format, uint32_t *field_map)
> +{
> +	struct json_token *tree_node = (struct json_token *)&format->tree.root;
> +	/* Lookup for absent not-nullable fields. */
> +	int32_t field_map_items =
> +		(int32_t)(format->field_map_size/sizeof(field_map[0]));
> +	for (int32_t i = -1; i >= -field_map_items; i--) {
> +		if (field_map[i] != UINT32_MAX)
> +			continue;
> +
> +		struct tuple_field *field =
> +			tuple_field_by_offset_slot(format, i);
> +		assert(field != NULL);
> +		/* Lookup for field number in tree. */
> +		struct json_token *parent = &field->token;
> +		while (parent->parent != &format->tree.root)
> +			parent = parent->parent;
> +		assert(parent->key.type == JSON_TOKEN_NUM);
> +		uint32_t fieldno = parent->key.num;
> +
> +		tree_node = &field->token;
> +		const char *err_msg;
> +		if (field->token.key.type == JSON_TOKEN_STR) {
> +			err_msg = tt_sprintf("invalid field %d document "
> +					     "content: map doesn't contain a "
> +					     "key '%.*s' defined in index",
> +					     fieldno, tree_node->key.len,
> +					     tree_node->key.str);
> +		} else if (field->token.key.type == JSON_TOKEN_NUM) {
> +			err_msg = tt_sprintf("invalid field %d document "
> +					     "content: array size %d is less "
> +					     "than size %d defined in index",
> +					     fieldno, tree_node->key.num,
> +					     tree_node->parent->child_count);
> +		}
> +		diag_set(ClientError, ER_DATA_STRUCTURE_MISMATCH, err_msg);
> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +struct parse_ctx {

The struct name is too generic.

> +	enum json_token_type child_type;
> +	uint32_t items;
> +	uint32_t curr;

Comments...

> +};
> +
>  /** @sa declaration for details. */
>  int
>  tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
> @@ -442,44 +647,123 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
>  			 (unsigned) format->min_field_count);
>  		return -1;
>  	}
> -
> -	/* first field is simply accessible, so we do not store offset to it */
> -	enum mp_type mp_type = mp_typeof(*pos);
> -	const struct tuple_field *field =
> -		tuple_format_field((struct tuple_format *)format, 0);
> -	if (validate &&
> -	    key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
> -				 TUPLE_INDEX_BASE, tuple_field_is_nullable(field)))
> -		return -1;
> -	mp_next(&pos);
> -	/* other fields...*/
> -	uint32_t i = 1;
>  	uint32_t defined_field_count = MIN(field_count, validate ?
>  					   tuple_format_field_count(format) :
>  					   format->index_field_count);
> -	if (field_count < format->index_field_count) {
> -		/*
> -		 * Nullify field map to be able to detect by 0,
> -		 * which key fields are absent in tuple_field().
> -		 */
> -		memset((char *)field_map - format->field_map_size, 0,
> -		       format->field_map_size);
> -	}
> -	for (; i < defined_field_count; ++i) {
> -		field = tuple_format_field((struct tuple_format *)format, i);
> -		mp_type = mp_typeof(*pos);
> -		if (validate &&
> -		    key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
> -					 i + TUPLE_INDEX_BASE,
> -					 tuple_field_is_nullable(field)))
> -			return -1;
> -		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
> -			field_map[field->offset_slot] =
> -				(uint32_t) (pos - tuple);
> +	/*
> +	 * Fill field_map with marker for toutine

Typo: toutine -> routine

> +	 * tuple_field_map_validate to detect absent fields.
> +	 */
> +	memset((char *)field_map - format->field_map_size,
> +		validate ? UINT32_MAX : 0, format->field_map_size);
> +
> +	struct region *region = &fiber()->gc;
> +	uint32_t mp_stack_items = format->subtree_depth + 1;
> +	uint32_t mp_stack_size = mp_stack_items * sizeof(struct parse_ctx);
> +	struct parse_ctx *mp_stack = region_alloc(region, mp_stack_size);
> +	if (unlikely(mp_stack == NULL)) {
> +		diag_set(OutOfMemory, mp_stack_size, "region_alloc",
> +			 "mp_stack");
> +		return -1;
> +	}
> +	mp_stack[0] = (struct parse_ctx){
> +		.child_type = JSON_TOKEN_NUM,
> +		.items = defined_field_count,
> +		.curr = 0,
> +	};
> +	uint32_t mp_stack_idx = 0;
> +	struct json_tree *tree = (struct json_tree *)&format->tree;
> +	struct json_token *parent = &tree->root;
> +	while (mp_stack[0].curr <= mp_stack[0].items) {
> +		/* Prepare key for tree lookup. */
> +		struct json_token token;
> +		token.key.type = mp_stack[mp_stack_idx].child_type;
> +		++mp_stack[mp_stack_idx].curr;
> +		if (token.key.type == JSON_TOKEN_NUM) {
> +			token.key.num = mp_stack[mp_stack_idx].curr;
> +		} else if (token.key.type == JSON_TOKEN_STR) {
> +			if (mp_typeof(*pos) != MP_STR) {
> +				/*
> +				 * We do not support non-string
> +				 * keys in maps.
> +				 */
> +				mp_next(&pos);
> +				mp_next(&pos);
> +				continue;
> +			}
> +			token.key.str =
> +				mp_decode_str(&pos, (uint32_t *)&token.key.len);
> +		} else {
> +			unreachable();
> +		}
> +		struct tuple_field *field =
> +			json_tree_lookup_entry(tree, parent, &token,
> +					       struct tuple_field, token);
> +		enum mp_type type = mp_typeof(*pos);
> +		if (field != NULL) {
> +			bool is_nullable = tuple_field_is_nullable(field);
> +			if (validate &&
> +			    key_mp_type_validate(field->type, type,
> +						 ER_FIELD_TYPE,
> +						 mp_stack[0].curr,
> +						 is_nullable) != 0)
> +				return -1;
> +			if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
> +				field_map[field->offset_slot] =
> +					(uint32_t)(pos - tuple);
> +			}
> +		}
> +		/* Prepare stack info for next iteration. */
> +		if (field != NULL && type == MP_ARRAY &&
> +		    mp_stack_idx + 1 < format->subtree_depth) {
> +			uint32_t size = mp_decode_array(&pos);
> +			if (unlikely(size == 0))
> +				continue;
> +			parent = &field->token;
> +			mp_stack[++mp_stack_idx] = (struct parse_ctx){
> +				.child_type = JSON_TOKEN_NUM,
> +				.items = size,
> +				.curr = 0,
> +			};
> +		} else if (field != NULL && type == MP_MAP &&
> +			   mp_stack_idx + 1 < format->subtree_depth) {
> +			uint32_t size = mp_decode_map(&pos);
> +			if (unlikely(size == 0))
> +				continue;
> +			parent = &field->token;
> +			mp_stack[++mp_stack_idx] = (struct parse_ctx){
> +				.child_type = JSON_TOKEN_STR,
> +				.items = size,
> +				.curr = 0,
> +			};
> +		} else {
> +			mp_next(&pos);
> +			while (mp_stack[mp_stack_idx].curr >=
> +			       mp_stack[mp_stack_idx].items) {
> +				assert(parent != NULL);
> +				parent = parent->parent;
> +				if (mp_stack_idx-- == 0)
> +					goto end;
> +			}

This loop looks sloppy. We need to rewrite it somehow to make it more
compact and straightforward. I think it's possible - there's quite a bit
of code duplication. May be factor out something in a helper function.
And add some comments to make it easier for understanding.

>  		}
> -		mp_next(&pos);
> +	};
> +end:;
> +	/*
> +	 * Field map has already been initialized with zeros when
> +	 * no validation is required.
> +	 */
> +	if (!validate)
> +		return 0;
> +	struct tuple_field *field;
> +	struct json_token *root = (struct json_token *)&format->tree.root;
> +	json_tree_foreach_entry_preorder(field, root, struct tuple_field,
> +					 token) {
> +		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL &&
> +			tuple_field_is_nullable(field) &&
> +			field_map[field->offset_slot] == UINT32_MAX)
> +			field_map[field->offset_slot] = 0;

Bad indentation.

Anyway, can't we eliminate this extra walk over the JSON tree somehow?
For example, store the initial field map in tuple_format with nullable
fields already set to zeros and others set to UINT32_MAX and use it for
initialization.
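
The template approach could look roughly like this (a minimal sketch;
`format_sketch` and `field_map_template` are hypothetical names, not
present in the patch):

```c
#include <stdint.h>
#include <string.h>

/*
 * Sketch: the format keeps a precomputed field map template with
 * nullable fields already set to 0 and all others to UINT32_MAX, so
 * tuple construction copies it once instead of walking the JSON tree
 * after parsing. Reduced to the members this idea needs.
 */
struct format_sketch {
	uint32_t field_map_size;        /* map size in bytes */
	uint32_t field_map_template[4]; /* precomputed at construction */
};

/* Initialize a tuple's field map from the template in one memcpy(). */
static void
field_map_init(const struct format_sketch *format, uint32_t *field_map)
{
	memcpy(field_map, format->field_map_template, format->field_map_size);
}
```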

>  	}
> -	return 0;
> +	return tuple_field_map_validate(format, field_map);
>  }
>  
>  uint32_t
> @@ -731,3 +1007,40 @@ error:
>  		 tt_sprintf("error in path on position %d", rc));
>  	return -1;
>  }
> +
> +const char *
> +tuple_field_by_part_raw(struct tuple_format *format, const char *data,
> +			const uint32_t *field_map, struct key_part *part)
> +{
> +	if (likely(part->path == NULL))
> +		return tuple_field_raw(format, data, field_map, part->fieldno);

I think we need to make the lookup by fieldno inline and do a function
call only if part->path != NULL.
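
A rough sketch of that split (all names and the stub bodies below are
illustrative, not the real tuple API):

```c
#include <stddef.h>
#include <stdint.h>

struct stub_part {
	const char *path; /* JSON path, NULL for a plain field */
	uint32_t fieldno;
};

/* Stub for tuple_field_raw(): direct lookup by field number. */
static const char *
field_by_fieldno(const char *data, uint32_t fieldno)
{
	return data + fieldno;
}

/* Stub for the out-of-line JSON-path slow path. */
static const char *
field_by_path_slow(const char *data, const struct stub_part *part)
{
	(void)data;
	(void)part;
	return NULL;
}

/*
 * The common plain-fieldno case stays inline (e.g. in the header) and
 * costs no function call; only JSON-path parts fall through to the
 * out-of-line slow path.
 */
static inline const char *
field_by_part(const char *data, const struct stub_part *part)
{
	if (part->path == NULL)
		return field_by_fieldno(data, part->fieldno);
	return field_by_path_slow(data, part);
}
```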

> +
> +	uint32_t field_count = tuple_format_field_count(format);
> +	struct tuple_field *root_field =
> +		likely(part->fieldno < field_count) ?
> +		tuple_format_field(format, part->fieldno) : NULL;
> +	struct tuple_field *field =
> +		unlikely(root_field == NULL) ? NULL:

Please stop using likely/unlikely. I struggle to read the code because
of them and I don't understand why you use them anyway.

> +		tuple_format_field_by_path(format, root_field, part->path,
> +					   part->path_len);

Can't you move top-level field lookup to tuple_format_field_by_path?
Both places in the code that use this function do the lookup by
themselves anyway.

> +	if (unlikely(field == NULL)) {
> +		/*
> +		 * Legacy tuple having no field map for JSON
> +		 * index require full path parse.

Legacy? What does it mean? Improve the comment please.

> +		 */
> +		const char *field_raw =
> +			tuple_field_raw(format, data, field_map, part->fieldno);
> +		if (unlikely(field_raw == NULL))
> +			return NULL;
> +		if (tuple_field_go_to_path(&field_raw, part->path,
> +					   part->path_len) != 0)

tuple_field_go_to_path can't return a non-zero code here.
Please add an assertion.

> +			return NULL;
> +		return field_raw;
> +	}
> +	int32_t offset_slot = field->offset_slot;
> +	assert(offset_slot < 0);
> +	assert(-offset_slot * sizeof(uint32_t) <= format->field_map_size);
> +	if (unlikely(field_map[offset_slot] == 0))
> +		return NULL;
> +	return data + field_map[offset_slot];

Better move this optimistic path above, before the pessimistic path
handling "legacy" tuples, because one usually expects optimistic paths
to come first in a function.

> +}
> diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
> index 2da773b..860f052 100644
> --- a/src/box/tuple_format.h
> +++ b/src/box/tuple_format.h
> @@ -116,6 +116,8 @@ struct tuple_field {
>  	uint32_t coll_id;
>  	/** A JSON entry to organize tree. */
>  	struct json_token token;
> +	/** A maximum depth of field subtree. */
> +	uint32_t subtree_depth;
>  };
>  
>  /**
> @@ -169,12 +171,16 @@ struct tuple_format {
>  	 * index_field_count <= min_field_count <= field_count.
>  	 */
>  	uint32_t min_field_count;
> +	/** Size of format allocation. */
> +	uint32_t allocation_size;

I don't see it used anywhere after format construction.
Why do you need it?

>  	/**
>  	 * Shared names storage used by all formats of a space.
>  	 */
>  	struct tuple_dictionary *dict;
>  	/** JSON tree of fields. */
>  	struct json_tree tree;
> +	/** A maximum depth of fields subtree. */
> +	uint32_t subtree_depth;

Please add a comment explaining why you need it.

>  };
>  
>  
> diff --git a/src/box/vy_point_lookup.c b/src/box/vy_point_lookup.c
> index 7b704b8..9d5e220 100644
> --- a/src/box/vy_point_lookup.c
> +++ b/src/box/vy_point_lookup.c
> @@ -196,8 +196,6 @@ vy_point_lookup(struct vy_lsm *lsm, struct vy_tx *tx,
>  		const struct vy_read_view **rv,
>  		struct tuple *key, struct tuple **ret)
>  {
> -	assert(tuple_field_count(key) >= lsm->cmp_def->part_count);
> -

I don't like that you simply throw away this assertion. Please think how
we can keep it.

>  	*ret = NULL;
>  	double start_time = ev_monotonic_now(loop());
>  	int rc = 0;
> diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c
> index 3e60fec..2f35284 100644
> --- a/src/box/vy_stmt.c
> +++ b/src/box/vy_stmt.c
> @@ -29,6 +29,7 @@
>   * SUCH DAMAGE.
>   */
>  
> +#include "assoc.h"
>  #include "vy_stmt.h"
>  
>  #include <stdlib.h>
> @@ -370,6 +371,85 @@ vy_stmt_replace_from_upsert(const struct tuple *upsert)
>  	return replace;
>  }
>  
> +/**
> + * Construct tuple or calculate its size. The fields_iov_ht
> + * is a hashtable that links leaf field records of field path
> + * tree and iovs that contain raw data. Function also fills the
> + * tuple field_map when write_data flag is set true.
> + */
> +static void
> +vy_stmt_tuple_restore_raw(struct tuple_format *format, char *tuple_raw,
> +			  uint32_t *field_map, char **offset,
> +			  struct mh_i64ptr_t *fields_iov_ht, bool write_data)
> +{
> +	struct tuple_field *prev = NULL;
> +	struct tuple_field *curr;
> +	json_tree_foreach_entry_preorder(curr, &format->tree.root,
> +					 struct tuple_field, token) {
> +		struct json_token *curr_node = &curr->token;
> +		struct tuple_field *parent =
> +			curr_node->parent == NULL ? NULL :
> +			json_tree_entry(curr_node->parent, struct tuple_field,
> +					token);
> +		if (parent != NULL && parent->type == FIELD_TYPE_ARRAY &&
> +		    curr_node->sibling_idx > 0) {
> +			/*
> +			 * Fill unindexed array items with nulls.
> +			 * Gaps size calculated as a difference
> +			 * between sibling nodes.
> +			 */
> +			for (uint32_t i = curr_node->sibling_idx - 1;
> +			     curr_node->parent->children[i] == NULL &&
> +			     i > 0; i--) {
> +				*offset = !write_data ?
> +					  (*offset += mp_sizeof_nil()) :
> +					  mp_encode_nil(*offset);
> +			}
> +		} else if (parent != NULL && parent->type == FIELD_TYPE_MAP) {
> +			/* Set map key. */
> +			const char *str = curr_node->key.str;
> +			uint32_t len = curr_node->key.len;
> +			*offset = !write_data ?
> +				  (*offset += mp_sizeof_str(len)) :
> +				  mp_encode_str(*offset, str, len);
> +		}
> +		/* Fill data. */
> +		uint32_t children_count = curr_node->child_count;
> +		if (curr->type == FIELD_TYPE_ARRAY) {
> +			*offset = !write_data ?
> +				  (*offset += mp_sizeof_array(children_count)) :
> +				  mp_encode_array(*offset, children_count);
> +		} else if (curr->type == FIELD_TYPE_MAP) {
> +			*offset = !write_data ?
> +				  (*offset += mp_sizeof_map(children_count)) :
> +				  mp_encode_map(*offset, children_count);
> +		} else {
> +			/* Leaf record. */
> +			mh_int_t k = mh_i64ptr_find(fields_iov_ht,
> +						    (uint64_t)curr, NULL);
> +			struct iovec *iov =
> +				k != mh_end(fields_iov_ht) ?
> +				mh_i64ptr_node(fields_iov_ht, k)->val : NULL;
> +			if (iov == NULL) {
> +				*offset = !write_data ?
> +					  (*offset += mp_sizeof_nil()) :
> +					  mp_encode_nil(*offset);
> +			} else {
> +				uint32_t data_offset = *offset - tuple_raw;
> +				int32_t slot = curr->offset_slot;
> +				if (write_data) {
> +					memcpy(*offset, iov->iov_base,
> +					       iov->iov_len);
> +					if (slot != TUPLE_OFFSET_SLOT_NIL)
> +						field_map[slot] = data_offset;
> +				}
> +				*offset += iov->iov_len;
> +			}
> +		}
> +		prev = curr;
> +	}
> +}
> +
>  static struct tuple *
>  vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
>  			       const struct key_def *cmp_def,
> @@ -378,51 +458,79 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
>  	/* UPSERT can't be surrogate. */
>  	assert(type != IPROTO_UPSERT);
>  	struct region *region = &fiber()->gc;
> +	struct tuple *stmt = NULL;
>  
>  	uint32_t field_count = format->index_field_count;
> -	struct iovec *iov = region_alloc(region, sizeof(*iov) * field_count);
> +	uint32_t part_count = mp_decode_array(&key);
> +	assert(part_count == cmp_def->part_count);
> +	struct iovec *iov = region_alloc(region, sizeof(*iov) * part_count);
>  	if (iov == NULL) {
> -		diag_set(OutOfMemory, sizeof(*iov) * field_count,
> -			 "region", "iov for surrogate key");
> +		diag_set(OutOfMemory, sizeof(*iov) * part_count, "region",
> +			"iov for surrogate key");
>  		return NULL;
>  	}
> -	memset(iov, 0, sizeof(*iov) * field_count);
> -	uint32_t part_count = mp_decode_array(&key);
> -	assert(part_count == cmp_def->part_count);
> -	assert(part_count <= field_count);
> -	uint32_t nulls_count = field_count - cmp_def->part_count;
> -	uint32_t bsize = mp_sizeof_array(field_count) +
> -			 mp_sizeof_nil() * nulls_count;
> -	for (uint32_t i = 0; i < part_count; ++i) {
> -		const struct key_part *part = &cmp_def->parts[i];
> +	/* Hashtable linking leaf field and corresponding iov. */
> +	struct mh_i64ptr_t *fields_iov_ht = mh_i64ptr_new();

Allocating a new hash map for each statement allocation looks very bad.
I haven't looked into vy_stmt_tuple_restore_raw yet - I will do that a
bit later and try to figure out if we can actually live without it - but
anyway I don't think we can afford it. Please think of a better way
to implement this.

> +	if (fields_iov_ht == NULL) {
> +		diag_set(OutOfMemory, sizeof(struct mh_i64ptr_t),
> +			 "mh_i64ptr_new", "fields_iov_ht");
> +		return NULL;
> +	}
> +	if (mh_i64ptr_reserve(fields_iov_ht, part_count, NULL) != 0) {
> +		diag_set(OutOfMemory, part_count, "mh_i64ptr_reserve",
> +			 "fields_iov_ht");
> +		goto end;
> +	}
> +	memset(iov, 0, sizeof(*iov) * part_count);
> +	const struct key_part *part = cmp_def->parts;
> +	for (uint32_t i = 0; i < part_count; ++i, ++part) {
>  		assert(part->fieldno < field_count);
>  		const char *svp = key;
> -		iov[part->fieldno].iov_base = (char *) key;
> +		iov[i].iov_base = (char *) key;
>  		mp_next(&key);
> -		iov[part->fieldno].iov_len = key - svp;
> -		bsize += key - svp;
> +		iov[i].iov_len = key - svp;
> +		struct tuple_field *field;
> +		field = tuple_format_field(format, part->fieldno);
> +		assert(field != NULL);
> +		if (unlikely(part->path != NULL)) {
> +			field = tuple_format_field_by_path(format, field,
> +							   part->path,
> +							   part->path_len);
> +		}
> +		assert(field != NULL);
> +		struct mh_i64ptr_node_t node = {(uint64_t)field, &iov[i]};
> +		mh_int_t k = mh_i64ptr_put(fields_iov_ht, &node, NULL, NULL);
> +		if (unlikely(k == mh_end(fields_iov_ht))) {
> +			diag_set(OutOfMemory, part_count, "mh_i64ptr_put",
> +				 "fields_iov_ht");
> +			goto end;
> +		}
> +		k = mh_i64ptr_find(fields_iov_ht, (uint64_t)field, NULL);
> +		assert(k != mh_end(fields_iov_ht));
>  	}
> +	/* Calculate tuple size to make allocation. */
> +	char *data = NULL;
> +	vy_stmt_tuple_restore_raw(format, NULL, NULL, &data, fields_iov_ht,
> +				  false);
> +	uint32_t bsize = mp_sizeof_array(field_count) + data - (char *)NULL;
>  
> -	struct tuple *stmt = vy_stmt_alloc(format, bsize);
> +	stmt = vy_stmt_alloc(format, bsize);
>  	if (stmt == NULL)
> -		return NULL;
> +		goto end;
>  
> +	/* Construct tuple. */
>  	char *raw = (char *) tuple_data(stmt);
>  	uint32_t *field_map = (uint32_t *) raw;
> +	memset((char *)field_map - format->field_map_size, 0,
> +	       format->field_map_size);
>  	char *wpos = mp_encode_array(raw, field_count);
> -	for (uint32_t i = 0; i < field_count; ++i) {
> -		const struct tuple_field *field = tuple_format_field(format, i);
> -		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL)
> -			field_map[field->offset_slot] = wpos - raw;
> -		if (iov[i].iov_base == NULL) {
> -			wpos = mp_encode_nil(wpos);
> -		} else {
> -			memcpy(wpos, iov[i].iov_base, iov[i].iov_len);
> -			wpos += iov[i].iov_len;
> -		}
> -	}
> -	assert(wpos == raw + bsize);
> +	vy_stmt_tuple_restore_raw(format, raw, field_map, &wpos, fields_iov_ht,
> +				  true);
> +
> +	assert(wpos <= raw + bsize);
>  	vy_stmt_set_type(stmt, type);
> +end:
> +	mh_i64ptr_delete(fields_iov_ht);
>  	return stmt;
>  }
>  
> diff --git a/test/engine/tuple.result b/test/engine/tuple.result
> index 35c700e..322821e 100644
> --- a/test/engine/tuple.result
> +++ b/test/engine/tuple.result
> @@ -954,6 +954,422 @@ type(tuple:tomap().fourth)
>  s:drop()
>  ---
>  ...
> +--
> +-- gh-1012: Indexes for JSON-defined paths.
> +--

Add a new file for this test please.

> +box.cfg()
> +---
> +...
> +s = box.schema.space.create('withdata', {engine = engine})
> +---
> +...
> +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO["fname"]'}, {3, 'str', path = '["FIO"].fname'}}})
> +---
> +- error: 'Can''t create or modify index ''test1'' in space ''withdata'': same key
> +    part is indexed twice'
> +...
> +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '["FIO"]["fname"]'}}})
> +---
> +- error: 'Wrong index options (field 2): ''path'' must be string'
> +...
> +s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = 'FIO'}}})
> +---
> +- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type
> +    ''map'' is not supported'
> +...
> +s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[1]'}}})
> +---
> +- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type
> +    ''array'' is not supported'
> +...
> +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO'}, {3, 'str', path = '["FIO"].fname'}}})
> +---
> +- error: Field [2]["FIO"].fname has type 'string' in one index, but type 'map' in
> +    another
> +...

If I change the path from '["FIO"].fname' to 'FIO.fname' in the index
definition, I will get the following error:

 error: Field [2]FIO.fname has type 'string' in one index, but type 'map' in another

Note, [2]FIO.fname isn't a valid JSON path. Please fix.

> +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[1].sname'}, {3, 'str', path = '["FIO"].fname'}}})
> +---
> +- error: Field [2]["FIO"].fname has type 'array' in one index, but type 'map' in another
> +...
> +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO....fname'}}})
> +---
> +- error: 'Wrong index options (field 3): invalid JSON path ''FIO....fname'': path
> +    has invalid structure (error at position 5)'
> +...
> +idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO.fname'}, {3, 'str', path = '["FIO"]["sname"]'}}})
> +---
> +...
> +assert(idx ~= nil)
> +---
> +- true
> +...
> +assert(idx.parts[2].path == "FIO.fname")
> +---
> +- true
> +...
> +s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
> +---
> +- error: 'Tuple field 3 type does not match one required by operation: expected map'
> +...

But field 3 does have type 'map'. The error message should clarify that
it's not field 3 as a whole that has an invalid type, but the path FIO
within field 3.

> +s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5}
> +---
> +- error: 'Tuple field 3 type does not match one required by operation: expected string'
> +...
> +s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5}
> +---
> +- error: 'Tuple doesn''t math document structure: invalid field 3 document content:
> +    map doesn''t contain a key ''sname'' defined in index'
> +...

Again, it's not quite clear what's missing, [3].sname or [3].FIO.sname.
Please improve this kind of error message as well.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] [PATCH v5 5/9] box: introduce JSON indexes
  2018-11-30 21:28   ` Vladimir Davydov
@ 2018-12-01 16:49     ` Vladimir Davydov
  0 siblings, 0 replies; 41+ messages in thread
From: Vladimir Davydov @ 2018-12-01 16:49 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja

On Sat, Dec 01, 2018 at 12:28:32AM +0300, Vladimir Davydov wrote:
> > diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c
> > index 3e60fec..2f35284 100644
> > --- a/src/box/vy_stmt.c
> > +++ b/src/box/vy_stmt.c
> > @@ -29,6 +29,7 @@
> >   * SUCH DAMAGE.
> >   */
> >  
> > +#include "assoc.h"
> >  #include "vy_stmt.h"
> >  
> >  #include <stdlib.h>
> > @@ -370,6 +371,85 @@ vy_stmt_replace_from_upsert(const struct tuple *upsert)
> >  	return replace;
> >  }
> >  
> > +/**
> > + * Construct tuple or calculate its size. The fields_iov_ht
> > + * is a hashtable that links leaf field records of field path
> > + * tree and iovs that contain raw data. Function also fills the
> > + * tuple field_map when write_data flag is set true.
> > + */
> > +static void
> > +vy_stmt_tuple_restore_raw(struct tuple_format *format, char *tuple_raw,
> > +			  uint32_t *field_map, char **offset,
> > +			  struct mh_i64ptr_t *fields_iov_ht, bool write_data)
> > +{
> > +	struct tuple_field *prev = NULL;
> > +	struct tuple_field *curr;
> > +	json_tree_foreach_entry_preorder(curr, &format->tree.root,
> > +					 struct tuple_field, token) {
> > +		struct json_token *curr_node = &curr->token;
> > +		struct tuple_field *parent =
> > +			curr_node->parent == NULL ? NULL :
> > +			json_tree_entry(curr_node->parent, struct tuple_field,
> > +					token);
> > +		if (parent != NULL && parent->type == FIELD_TYPE_ARRAY &&
> > +		    curr_node->sibling_idx > 0) {
> > +			/*
> > +			 * Fill unindexed array items with nulls.
> > +			 * Gaps size calculated as a difference
> > +			 * between sibling nodes.
> > +			 */
> > +			for (uint32_t i = curr_node->sibling_idx - 1;
> > +			     curr_node->parent->children[i] == NULL &&
> > +			     i > 0; i--) {
> > +				*offset = !write_data ?
> > +					  (*offset += mp_sizeof_nil()) :
> > +					  mp_encode_nil(*offset);
> > +			}
> > +		} else if (parent != NULL && parent->type == FIELD_TYPE_MAP) {
> > +			/* Set map key. */
> > +			const char *str = curr_node->key.str;
> > +			uint32_t len = curr_node->key.len;
> > +			*offset = !write_data ?
> > +				  (*offset += mp_sizeof_str(len)) :
> > +				  mp_encode_str(*offset, str, len);
> > +		}
> > +		/* Fill data. */
> > +		uint32_t children_count = curr_node->child_count;
> > +		if (curr->type == FIELD_TYPE_ARRAY) {
> > +			*offset = !write_data ?
> > +				  (*offset += mp_sizeof_array(children_count)) :
> > +				  mp_encode_array(*offset, children_count);
> > +		} else if (curr->type == FIELD_TYPE_MAP) {
> > +			*offset = !write_data ?
> > +				  (*offset += mp_sizeof_map(children_count)) :
> > +				  mp_encode_map(*offset, children_count);
> > +		} else {
> > +			/* Leaf record. */
> > +			mh_int_t k = mh_i64ptr_find(fields_iov_ht,
> > +						    (uint64_t)curr, NULL);
> > +			struct iovec *iov =
> > +				k != mh_end(fields_iov_ht) ?
> > +				mh_i64ptr_node(fields_iov_ht, k)->val : NULL;
> > +			if (iov == NULL) {
> > +				*offset = !write_data ?
> > +					  (*offset += mp_sizeof_nil()) :
> > +					  mp_encode_nil(*offset);
> > +			} else {
> > +				uint32_t data_offset = *offset - tuple_raw;
> > +				int32_t slot = curr->offset_slot;
> > +				if (write_data) {
> > +					memcpy(*offset, iov->iov_base,
> > +					       iov->iov_len);
> > +					if (slot != TUPLE_OFFSET_SLOT_NIL)
> > +						field_map[slot] = data_offset;
> > +				}
> > +				*offset += iov->iov_len;
> > +			}
> > +		}
> > +		prev = curr;
> > +	}
> > +}
> > +
> >  static struct tuple *
> >  vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
> >  			       const struct key_def *cmp_def,
> > @@ -378,51 +458,79 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
> >  	/* UPSERT can't be surrogate. */
> >  	assert(type != IPROTO_UPSERT);
> >  	struct region *region = &fiber()->gc;
> > +	struct tuple *stmt = NULL;
> >  
> >  	uint32_t field_count = format->index_field_count;
> > -	struct iovec *iov = region_alloc(region, sizeof(*iov) * field_count);
> > +	uint32_t part_count = mp_decode_array(&key);
> > +	assert(part_count == cmp_def->part_count);
> > +	struct iovec *iov = region_alloc(region, sizeof(*iov) * part_count);
> >  	if (iov == NULL) {
> > -		diag_set(OutOfMemory, sizeof(*iov) * field_count,
> > -			 "region", "iov for surrogate key");
> > +		diag_set(OutOfMemory, sizeof(*iov) * part_count, "region",
> > +			"iov for surrogate key");
> >  		return NULL;
> >  	}
> > -	memset(iov, 0, sizeof(*iov) * field_count);
> > -	uint32_t part_count = mp_decode_array(&key);
> > -	assert(part_count == cmp_def->part_count);
> > -	assert(part_count <= field_count);
> > -	uint32_t nulls_count = field_count - cmp_def->part_count;
> > -	uint32_t bsize = mp_sizeof_array(field_count) +
> > -			 mp_sizeof_nil() * nulls_count;
> > -	for (uint32_t i = 0; i < part_count; ++i) {
> > -		const struct key_part *part = &cmp_def->parts[i];
> > +	/* Hashtable linking leaf field and corresponding iov. */
> > +	struct mh_i64ptr_t *fields_iov_ht = mh_i64ptr_new();
> 
> Allocating a new hash map for each statement allocation looks very bad.
> I haven't looked into vy_stmt_tuple_restore_raw yet - I will do that a
> bit later and try to figure out if we can actually live without it - but
> anyway I don't think we can afford it. Please think of a better way
> to implement this.

Yes, I don't think we need a hash table to build a surrogate tuple from
a key. We can simply assign an ordinal number to each leaf tuple field
on format construction and use them as indexes into the iov array.

Also, I don't like that you traverse the tree twice - first to calculate
the surrogate tuple size and then to build the tuple. I think we can
avoid the first traversal. All we need to do is traverse the tree once
on format construction and calculate the size of a surrogate tuple with
all leaf fields set to null. We could then use this information to
quickly estimate the size of a surrogate tuple with the given number of
non-null fields.
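
The ordinal-number idea could be sketched like this (reduced structs and
illustrative names, not the actual patch code):

```c
#include <sys/uio.h>

/*
 * Sketch: every leaf tuple field gets an ordinal id once, at format
 * construction time, so a key part maps to its iov slot by plain array
 * indexing - no per-statement hash table needed.
 */
struct leaf_field {
	int id; /* ordinal assigned at format construction */
};

/* At format construction: number the leaves in tree order. */
static void
assign_leaf_ids(struct leaf_field *leaves, int count)
{
	for (int i = 0; i < count; i++)
		leaves[i].id = i;
}

/* At surrogate tuple construction: constant-time slot access. */
static struct iovec *
leaf_iov(struct iovec *iov, const struct leaf_field *leaf)
{
	return &iov[leaf->id];
}
```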

> 
> > +	if (fields_iov_ht == NULL) {
> > +		diag_set(OutOfMemory, sizeof(struct mh_i64ptr_t),
> > +			 "mh_i64ptr_new", "fields_iov_ht");
> > +		return NULL;
> > +	}
> > +	if (mh_i64ptr_reserve(fields_iov_ht, part_count, NULL) != 0) {
> > +		diag_set(OutOfMemory, part_count, "mh_i64ptr_reserve",
> > +			 "fields_iov_ht");
> > +		goto end;
> > +	}
> > +	memset(iov, 0, sizeof(*iov) * part_count);
> > +	const struct key_part *part = cmp_def->parts;
> > +	for (uint32_t i = 0; i < part_count; ++i, ++part) {
> >  		assert(part->fieldno < field_count);
> >  		const char *svp = key;
> > -		iov[part->fieldno].iov_base = (char *) key;
> > +		iov[i].iov_base = (char *) key;
> >  		mp_next(&key);
> > -		iov[part->fieldno].iov_len = key - svp;
> > -		bsize += key - svp;
> > +		iov[i].iov_len = key - svp;
> > +		struct tuple_field *field;
> > +		field = tuple_format_field(format, part->fieldno);
> > +		assert(field != NULL);
> > +		if (unlikely(part->path != NULL)) {
> > +			field = tuple_format_field_by_path(format, field,
> > +							   part->path,
> > +							   part->path_len);
> > +		}
> > +		assert(field != NULL);
> > +		struct mh_i64ptr_node_t node = {(uint64_t)field, &iov[i]};
> > +		mh_int_t k = mh_i64ptr_put(fields_iov_ht, &node, NULL, NULL);
> > +		if (unlikely(k == mh_end(fields_iov_ht))) {
> > +			diag_set(OutOfMemory, part_count, "mh_i64ptr_put",
> > +				 "fields_iov_ht");
> > +			goto end;
> > +		}
> > +		k = mh_i64ptr_find(fields_iov_ht, (uint64_t)field, NULL);
> > +		assert(k != mh_end(fields_iov_ht));
> >  	}
> > +	/* Calculate tuple size to make allocation. */
> > +	char *data = NULL;
> > +	vy_stmt_tuple_restore_raw(format, NULL, NULL, &data, fields_iov_ht,
> > +				  false);
> > +	uint32_t bsize = mp_sizeof_array(field_count) + data - (char *)NULL;
> >  
> > -	struct tuple *stmt = vy_stmt_alloc(format, bsize);
> > +	stmt = vy_stmt_alloc(format, bsize);
> >  	if (stmt == NULL)
> > -		return NULL;
> > +		goto end;
> >  
> > +	/* Construct tuple. */
> >  	char *raw = (char *) tuple_data(stmt);
> >  	uint32_t *field_map = (uint32_t *) raw;
> > +	memset((char *)field_map - format->field_map_size, 0,
> > +	       format->field_map_size);
> >  	char *wpos = mp_encode_array(raw, field_count);
> > -	for (uint32_t i = 0; i < field_count; ++i) {
> > -		const struct tuple_field *field = tuple_format_field(format, i);
> > -		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL)
> > -			field_map[field->offset_slot] = wpos - raw;
> > -		if (iov[i].iov_base == NULL) {
> > -			wpos = mp_encode_nil(wpos);
> > -		} else {
> > -			memcpy(wpos, iov[i].iov_base, iov[i].iov_len);
> > -			wpos += iov[i].iov_len;
> > -		}
> > -	}
> > -	assert(wpos == raw + bsize);
> > +	vy_stmt_tuple_restore_raw(format, raw, field_map, &wpos, fields_iov_ht,
> > +				  true);
> > +
> > +	assert(wpos <= raw + bsize);
> >  	vy_stmt_set_type(stmt, type);
> > +end:
> > +	mh_i64ptr_delete(fields_iov_ht);
> >  	return stmt;
> >  }
> >  

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 7/9] box: tune tuple_field_raw_by_path for indexed data
  2018-11-26 10:49 ` [PATCH v5 7/9] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov
@ 2018-12-01 17:20   ` Vladimir Davydov
  0 siblings, 0 replies; 41+ messages in thread
From: Vladimir Davydov @ 2018-12-01 17:20 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja

On Mon, Nov 26, 2018 at 01:49:41PM +0300, Kirill Shcherbatov wrote:
> We don't need to parse the tuple in tuple_field_raw_by_path if the
> required field has been indexed. We do a path lookup in the field tree
> of JSON paths and return data by its offset from field_map instead of
> parsing the whole tuple.
> 
> Part of #1012
> ---
>  src/box/tuple_format.c     | 34 ++++++++++++++++++++++++----------
>  test/engine/tuple.result   |  5 +++++
>  test/engine/tuple.test.lua |  2 ++
>  3 files changed, 31 insertions(+), 10 deletions(-)
> 
> diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
> index 193d0d8..be89764 100644
> --- a/src/box/tuple_format.c
> +++ b/src/box/tuple_format.c
> @@ -956,15 +956,12 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
>  		goto error;
>  	switch(token.key.type) {
>  	case JSON_TOKEN_NUM: {
> -		int index = token.key.num;
> -		if (index == 0) {
> +		fieldno = token.key.num;
> +		if (fieldno == 0) {
>  			*field = NULL;
>  			return 0;
>  		}
> -		index -= TUPLE_INDEX_BASE;
> -		*field = tuple_field_raw(format, tuple, field_map, index);
> -		if (*field == NULL)
> -			return 0;
> +		fieldno -= TUPLE_INDEX_BASE;
>  		break;
>  	}
>  	case JSON_TOKEN_STR: {
> @@ -982,10 +979,9 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
>  			name_hash = field_name_hash(token.key.str,
>  						    token.key.len);
>  		}
> -		*field = tuple_field_raw_by_name(format, tuple, field_map,
> -						 token.key.str, token.key.len,
> -						 name_hash);
> -		if (*field == NULL)

After this patch tuple_field_raw_by_name and tuple_field_by_name are not
used. Please remove them in this patch.

> +		if (tuple_fieldno_by_name(format->dict, token.key.str,
> +					  token.key.len, name_hash,
> +					  &fieldno) != 0)
>  			return 0;
>  		break;
>  	}
> @@ -994,6 +990,24 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
>  		*field = NULL;
>  		return 0;
>  	}
> +	/* Optimize indexed JSON field data access. */
> +	assert(field != NULL);
> +	struct tuple_field *indexed_field =
> +		unlikely(fieldno >= tuple_format_field_count(format)) ? NULL :
> +		tuple_format_field_by_path(format,
> +					   tuple_format_field(format, fieldno),
> +					   path + lexer.offset,
> +					   path_len - lexer.offset);
> +	if (indexed_field != NULL &&
> +	    indexed_field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
> +		*field = tuple + field_map[indexed_field->offset_slot];
> +		return 0;
> +	}
> +
> +	/* No such field in index. Continue parsing JSON path. */
> +	*field = tuple_field_raw(format, tuple, field_map, fieldno);
> +	if (*field == NULL)
> +		return 0;

This code essentially does the same thing as tuple_field_by_part_raw.
Please reuse.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tarantool-patches] Re: [PATCH v5 4/9] lib: introduce json_path_cmp routine
  2018-11-30 10:46   ` Vladimir Davydov
@ 2018-12-03 17:37     ` Konstantin Osipov
  2018-12-03 18:48       ` Vladimir Davydov
  0 siblings, 1 reply; 41+ messages in thread
From: Konstantin Osipov @ 2018-12-03 17:37 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Kirill Shcherbatov

* Vladimir Davydov <vdavydov.dev@gmail.com> [18/11/30 17:01]:
> > + * Compare two JSON paths using Lexer class.
> > + * - @a path must be valid
> > + * - in the case of paths that have the same token-sequence prefix,
> > + *   the path having more tokens is assumed to be greater
> > + * - when @b path contains an error, the path "a" is assumed to
> > + *   be greater
> > + */
> > +int
> > +json_path_cmp(const char *a, uint32_t a_len, const char *b, uint32_t b_len);
> > +
> 
> One typically expects cmp(a, b) to be equivalent to -cmp(b, a).
> Can't we make json_path_cmp satisfy this property, for example, by
> requiring both strings to be valid json paths with an assertion?

Why bother?


-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v5 4/9] lib: introduce json_path_cmp routine
  2018-12-03 17:37     ` [tarantool-patches] " Konstantin Osipov
@ 2018-12-03 18:48       ` Vladimir Davydov
  2018-12-03 20:14         ` Konstantin Osipov
  0 siblings, 1 reply; 41+ messages in thread
From: Vladimir Davydov @ 2018-12-03 18:48 UTC (permalink / raw)
  To: Konstantin Osipov; +Cc: tarantool-patches, Kirill Shcherbatov

On Mon, Dec 03, 2018 at 08:37:41PM +0300, Konstantin Osipov wrote:
> * Vladimir Davydov <vdavydov.dev@gmail.com> [18/11/30 17:01]:
> > > + * Compare two JSON paths using Lexer class.
> > > + * - @a path must be valid
> > > + * - in the case of paths that have the same token-sequence prefix,
> > > + *   the path having more tokens is assumed to be greater
> > > + * - when @b path contains an error, the path "a" is assumed to
> > > + *   be greater
> > > + */
> > > +int
> > > +json_path_cmp(const char *a, uint32_t a_len, const char *b, uint32_t b_len);
> > > +
> > 
> > One typically expects cmp(a, b) to be equivalent to -cmp(b, a).
> > Can't we make json_path_cmp satisfy this property, for example, by
> > requiring both strings to be valid json paths with an assertion?
> 
> Why bother?

To simplify the function protocol. Currently, it's kinda lopsided: path
'a' must be valid, which is checked by an assertion, while path 'b' may
not be valid, in which case special comparison rules are applied. I find
such an asymmetry rather unnatural for a comparison function. I'd prefer
if we either allowed both paths to be invalid or required them both to
be valid.
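
For illustration, a toy model of the symmetric contract, with paths
reduced to arrays of token strings and validity enforced by assertion
(this is not the real lexer-based implementation):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Both paths must be valid; tokens are compared pairwise, and when one
 * path is a proper prefix of the other the longer one compares greater.
 * With this contract cmp(a, b) == -cmp(b, a) holds for all inputs.
 */
static int
path_cmp(const char *const *a, int a_len, const char *const *b, int b_len)
{
	assert(a != NULL && b != NULL);
	int n = a_len < b_len ? a_len : b_len;
	for (int i = 0; i < n; i++) {
		int rc = strcmp(a[i], b[i]);
		if (rc != 0)
			return rc > 0 ? 1 : -1;
	}
	return a_len == b_len ? 0 : (a_len > b_len ? 1 : -1);
}
```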

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tarantool-patches] Re: [PATCH v5 4/9] lib: introduce json_path_cmp routine
  2018-12-03 18:48       ` Vladimir Davydov
@ 2018-12-03 20:14         ` Konstantin Osipov
  2018-12-06  7:56           ` [tarantool-patches] Re: [PATCH v5 4/9] lib: introduce json_path_cmp, json_path_validate Kirill Shcherbatov
  0 siblings, 1 reply; 41+ messages in thread
From: Konstantin Osipov @ 2018-12-03 20:14 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Kirill Shcherbatov

* Vladimir Davydov <vdavydov.dev@gmail.com> [18/12/03 21:51]:
> On Mon, Dec 03, 2018 at 08:37:41PM +0300, Konstantin Osipov wrote:
> > * Vladimir Davydov <vdavydov.dev@gmail.com> [18/11/30 17:01]:
> > > > + * Compare two JSON paths using Lexer class.
> > > > + * - @a path must be valid
> > > > + * - in the case of paths that have the same token-sequence prefix,
> > > > + *   the path having more tokens is assumed to be greater
> > > > + * - when @b path contains an error, the path "a" is assumed to
> > > > + *   be greater
> > > > + */
> > > > +int
> > > > +json_path_cmp(const char *a, uint32_t a_len, const char *b, uint32_t b_len);
> > > > +
> > > 
> > > One typically expects cmp(a, b) to be equivalent to -cmp(b, a).
> > > Can't we make json_path_cmp satisfy this property, for example, by
> > > requiring both strings to be valid json paths with an assertion?
> > 
> > Why bother?
> 
> To simplify the function protocol. Currently, it's kinda lopsided: path
> 'a' must be valid, which is checked by an assertion, while path 'b' may
> not be valid, in which case special comparison rules are applied. I find
> such an asymmetry rather unnatural for a comparison function. I'd prefer
> if we either allowed both paths to be invalid or required them both to
> be valid.

I agree, but perhaps we could change the function logic instead: there
would be an extra comparison in the bad cases, when our assumption
about which path is valid turned out to be wrong. I haven't checked
whether it is really possible, though.

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 8/9] box: introduce offset slot cache in key_part
  2018-11-26 10:49 ` [PATCH v5 8/9] box: introduce offset slot cache in key_part Kirill Shcherbatov
@ 2018-12-03 21:04   ` Vladimir Davydov
  2018-12-04 15:51     ` Vladimir Davydov
  0 siblings, 1 reply; 41+ messages in thread
From: Vladimir Davydov @ 2018-12-03 21:04 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja

On Mon, Nov 26, 2018 at 01:49:42PM +0300, Kirill Shcherbatov wrote:
> Tuned tuple_field_by_part_raw routine with key_part's
> offset_slot_cache. Introduced tuple_format epoch to test validity
> of this cache. The key_part caches last offset_slot source
> format to make epoch comparison because same space may have
> multiple format of same epoch that have different key_parts and
> related offset_slots distribution.
> 
> Part of #1012

If I didn't know what this patch is about, I'd be puzzled after reading
this comment. I'd put it like this:

  tuple_field_by_part looks up the tuple_field corresponding to the
  given key part in tuple_format in order to quickly retrieve the offset
  of indexed data from the tuple field map. For regular indexes this
  operation is blazing fast; however, for JSON indexes it is not, as we
  have to parse the path to data and then do multiple lookups in a JSON
  tree. Since tuple_field_by_part is used by comparators, we should
  strive to make this routine as fast as possible for all kinds of
  indexes.

  This patch introduces an optimization that is supposed to make
  tuple_field_by_part for JSON indexes as fast as it is for regular
  indexes in most cases. We do that by caching the offset slot right in
  key_part. There's a catch here however - we create a new format
  whenever an index is dropped or created and we don't reindex old
  tuples. As a result, there may be several generations of tuples in the
  same space, all using different formats while there's the only key_def
  used for comparison.

  To overcome this problem, we introduce the notion of tuple_format
  epoch. This is a counter incremented each time a new format is
  created. We store it in tuple_format and key_def, and we only use
  the offset slot cached in a key_def if its epoch coincides with the
  epoch of the tuple format. If they don't, we look up a tuple_field as
  before, and then update the cached value provided the epoch of the
  tuple format is newer than the key_def's.

> diff --git a/src/box/alter.cc b/src/box/alter.cc
> index 029da02..6291159 100644
> --- a/src/box/alter.cc
> +++ b/src/box/alter.cc
> @@ -856,7 +856,10 @@ alter_space_do(struct txn *txn, struct alter_space *alter)
>  	 * Create a new (empty) space for the new definition.
>  	 * Sic: the triggers are not moved over yet.
>  	 */
> -	alter->new_space = space_new_xc(alter->space_def, &alter->key_list);
> +	alter->new_space =
> +		space_new_xc(alter->space_def, &alter->key_list,
> +			     alter->old_space->format != NULL ?
> +			     alter->old_space->format->epoch + 1 : 1);

Passing the epoch around looks ugly. Let's introduce a global counter
and bump it right in tuple_format_new(). If you worry about disk_format
and mem_format having different epochs in vinyl, it's not really a
problem: make epoch an optional argument to tuple_format_new(); if the
caller passes an epoch, use it, otherwise generate a new one; in vinyl
reuse mem_format->epoch for disk_format.

> diff --git a/src/box/key_def.h b/src/box/key_def.h
> index 7731e48..3e08eb4 100644
> --- a/src/box/key_def.h
> +++ b/src/box/key_def.h
> @@ -95,6 +95,14 @@ struct key_part {
>  	char *path;
>  	/** The length of JSON path. */
>  	uint32_t path_len;
> +	/**
> +	 * Source format for offset_slot_cache hit validations.
> +	 * Cache is expected to use "the format with the newest
> +	 * epoch is most relevant" strategy.
> +	 */
> +	struct tuple_format *format_cache;

Why did you decide to store tuple_format in key_part instead of epoch,
as you did originally? I liked that more.

> +	/** Cache with format's field offset slot. */

	/**
	 * Cached value of the offset slot corresponding to
	 * the indexed field (tuple_field::offset_slot).
	 * Valid only if key_def::epoch equals the epoch of
	 * the tuple format. Updated in tuple_field_by_part
	 * to always store the offset corresponding to the
	 * most recent tuple format (the one with the greatest
	 * epoch value).
	 */

> +	int32_t offset_slot_cache;

> diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
> index 860f052..8a7ebfa 100644
> --- a/src/box/tuple_format.h
> +++ b/src/box/tuple_format.h
> @@ -137,6 +137,8 @@ tuple_field_is_nullable(const struct tuple_field *tuple_field)
>   * Tuple format describes how tuple is stored and information about its fields
>   */
>  struct tuple_format {
> +	/** Counter that grows incrementally on space rebuild. */

... used for caching offset slot in key_part, for more details
see key_part::offset_slot_cache

> +	uint64_t epoch;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 9/9] box: specify indexes in user-friendly form
  2018-11-26 10:49 ` [PATCH v5 9/9] box: specify indexes in user-friendly form Kirill Shcherbatov
@ 2018-12-04 12:22   ` Vladimir Davydov
  0 siblings, 0 replies; 41+ messages in thread
From: Vladimir Davydov @ 2018-12-04 12:22 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja

On Mon, Nov 26, 2018 at 01:49:43PM +0300, Kirill Shcherbatov wrote:
> It is now possible to create indexes by JSON path and using
> field names specified in the space format.
> 
> Closes #1012
> 
> @TarantoolBot document
> Title: Indexes by JSON path
> Sometimes field data could have complex document structure.
> When this structure is consistent across whole document,

whole document => whole space

> you are able to create an index by JSON path.
> 
> Example:
> s:create_index('json_index', {parts = {{'FIO["fname"]', 'str'}}})

Please describe an alternative way of creating an index (via path).


> @@ -626,16 +626,16 @@ local function update_index_parts(format, parts)
>              box.error(box.error.ILLEGAL_PARAMS,
>                        "options.parts[" .. i .. "]: field (name or number) is expected")
>          elseif type(part.field) == 'string' then
> -            for k,v in pairs(format) do
> -                if v.name == part.field then
> -                    part.field = k
> -                    break
> -                end
> -            end
> -            if type(part.field) == 'string' then
> +            local idx, path = box.internal.path_resolve(i, space_id, part.field)
> +            if part.path ~= nil and part.path ~= path then

I'd prefer it if we didn't involve a space in this function: currently
update_index_parts is a neat, independent function that takes a format
and does some index part transformations. After your patch it requires
both space_id and format, which looks like an encapsulation violation
to me. It would be great if we could rewrite it without involving the
space.

After all, all we need to do is extract the first component of a json
path and I think this should be fairly easy to implement in Lua.

>                  box.error(box.error.ILLEGAL_PARAMS,
> -                          "options.parts[" .. i .. "]: field was not found by name '" .. part.field .. "'")
> +                          "options.parts[" .. i .. "]: field path '"..
> +                          part.path.." doesn't math the path '" ..

math => match

Anyway, I think that you should raise an error unconditionally if the
user tries to pass 'path' without a field number. Also, this error path
isn't covered by the test. Please fix.

> +                          part.field .. "'")
>              end
> +            parts_can_be_simplified = parts_can_be_simplified and path == nil
> +            part.field = idx
> +            part.path = path or part.path
>          elseif part.field == 0 then
>              box.error(box.error.ILLEGAL_PARAMS,
>                        "options.parts[" .. i .. "]: field (number) must be one-based")
> @@ -792,7 +792,7 @@ box.schema.index.create = function(space_id, name, options)
>          end
>      end
>      local parts, parts_can_be_simplified =
> -        update_index_parts(format, options.parts)
> +        update_index_parts(format, options.parts, space_id)
>      -- create_index() options contains type, parts, etc,
>      -- stored separately. Remove these members from index_opts
>      local index_opts = {
> @@ -959,7 +959,7 @@ box.schema.index.alter = function(space_id, index_id, options)
>      if options.parts then
>          local parts_can_be_simplified
>          parts, parts_can_be_simplified =
> -            update_index_parts(format, options.parts)
> +            update_index_parts(format, options.parts, space_id)
>          -- save parts in old format if possible
>          if parts_can_be_simplified then
>              parts = simplify_index_parts(parts)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] [PATCH v5 2/9] lib: implement JSON tree class for json library
  2018-11-29 17:38     ` Vladimir Davydov
  2018-11-29 17:50       ` Vladimir Davydov
@ 2018-12-04 15:22       ` Vladimir Davydov
  2018-12-04 15:47       ` [tarantool-patches] " Kirill Shcherbatov
  2 siblings, 0 replies; 41+ messages in thread
From: Vladimir Davydov @ 2018-12-04 15:22 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, Kostya Osipov

On Thu, Nov 29, 2018 at 08:38:16PM +0300, Vladimir Davydov wrote:
> > +/** JSON tree object to manage tokens relations. */
> > +struct json_tree {
> > +	/** JSON tree root node. */
> > +	struct json_token root;
> > +	/** Hashtable of all tree nodes. */
> > +	struct mh_json_t *hash;
> 
> How does it work? In a year nobody will remember, and to understand how
> this thing operates we will have to dive deep into the code...


/**
 * Element of a JSON path. It can be either string or number.
 * String identifiers are in ["..."] and between dots. Numbers are
 * indexes in [...].
 *
 * May be organized in a tree-like structure reflecting a JSON
 * document structure, for more details see the comment to struct
 * json_tree.
 */
struct json_token {
	...
};

/**
 * This structure is used for organizing JSON tokens produced
 * by a lexer in a tree-like structure reflecting a JSON document
 * structure.
 *
 * Each intermediate node of the tree corresponds to either
 * a JSON map or an array, depending on the key type used by
 * its children (JSON_TOKEN_STR or JSON_TOKEN_NUM, respectively).
 * Leaf nodes may represent both complex JSON structures and
 * final values - it is not mandated by the JSON tree design.
 * The root of the tree doesn't have a key and is preallocated
 * when the tree is created.
 *
 * The json_token structure is intrusive by design, i.e. to store
 * arbitrary information in a JSON tree, one has to incorporate it
 * into a user defined structure.
 *
 * Example:
 *
 *   struct data {
 *           ...
 *           struct json_token token;
 *   };
 *
 *   struct json_tree tree;
 *   json_tree_create(&tree);
 *   struct json_token *parent = &tree->root;
 *
 *   // Add a path to the tree.
 *   struct data *data = data_new();
 *   struct json_lexer lexer;
 *   json_lexer_create(&lexer, path, path_len);
 *   json_lexer_next_token(&lexer, &data->token);
 *   while (data->token.type != JSON_TOKEN_END) {
 *           json_tree_add(&tree, parent, &data->token);
 *           parent = &data->token;
 *           data = data_new();
 *           json_lexer_next_token(&lexer, &data->token);
 *   }
 *   data_delete(data);
 *
 *   // Look up a path in the tree.
 *   data = json_tree_lookup_path(&tree, &tree.root, path, path_len);
 */
struct json_tree {
	/**
	 * Preallocated token corresponding to the JSON tree root.
	 * It doesn't have a key (set to JSON_TOKEN_END).
	 */
	struct json_token root;
	/**
	 * Hash table that is used to quickly look up a token
	 * corresponding to a JSON map item given a key and
	 * a parent token. We store all tokens that have type
	 * JSON_TOKEN_STR in this hash table. Apparently, we
	 * don't need to store JSON_TOKEN_NUM tokens as we can
	 * quickly look them up in the children array anyway.
	 *
	 * The hash table uses pair <parent, key> as key, so
	 * even tokens that happen to have the same key will
	 * have different keys in the hash. To look up a tree
	 * node corresponding to a particular path, we split
	 * the path into tokens and look up the first token
	 * in the root node and each following token in the
	 * node returned at the previous step.
	 *
	 * We compute a hash value for a token by hashing its
	 * key using the hash value of its parent as seed. This
	 * is equivalent to computing hash for the path leading
	 * to the token. However, we don't need to recompute
	 * hash starting from the root at each step as we
	 * descend the tree looking for a specific path, and we
	 * can start descent from any node, not only from the root.
	 *
	 * As a consequence of this hashing technique, even
	 * though we don't need to store JSON_TOKEN_NUM tokens
	 * in the hash table, we still have to compute hash
	 * values for them.
	 */
	struct mh_json_t *hash;
};

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v5 3/9] box: manage format fields with JSON tree class
  2018-11-29 19:07   ` Vladimir Davydov
@ 2018-12-04 15:47     ` Kirill Shcherbatov
  2018-12-04 16:09       ` Vladimir Davydov
  0 siblings, 1 reply; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-12-04 15:47 UTC (permalink / raw)
  To: tarantool-patches, Vladimir Davydov; +Cc: Kostya Osipov

> We usually call methods that allocate/free an object _new/delete, not
> _create/destroy. Please fix.
Ok, done
> I've looked at your following patches - this is the only place where you
> pass a non-NULL token to tuple_field_create. Let's not pass token to
> this function at all, and initialize field->token directly here.
Ok, done
> diag_set missing
Ok, done
> And you have exactly the same piece of code in tuple_format_destroy.
> Factor it out to a helper function?
Ok,
static inline void
tuple_format_field_tree_destroy(struct tuple_format *format)

> 	/** Link in tuple_format::fields. */
> 
> would be more useful.
Ok, done

>> +	/** JSON tree of fields. */
> 
> Would be prudent to say a few words about the tree structure, e.g.
> 
> 	/**
> 	 * Fields comprising the format, organized in a tree.
> 	 * First level nodes correspond to tuple fields.
> 	 * Deeper levels define indexed JSON paths within
> 	 * tuple fields. Nodes of the tree are linked by
> 	 * tuple_field::token.
> 	 */
Ok, done

> format->tree - what tree?
> 
> Why not leave the same name - 'fields'?
Ok, done

> Comments, comments...
/**
 * Return the number of first-level nodes, which correspond to
 * tuple fields.
 */
static inline uint32_t
tuple_format_field_count(const struct tuple_format *format)

/**
 * Get the first-level node corresponding to a tuple field by its
 * fieldno.
 */
static inline struct tuple_field *
tuple_format_field(struct tuple_format *format, uint32_t fieldno)

=========================================

As we are going to work with format fields in a unified way, we
start to use the JSON tree class for working with first-level
format fields.

Need for #1012
---
 src/box/sql.c          |  16 +++---
 src/box/sql/build.c    |   5 +-
 src/box/tuple.c        |  10 ++--
 src/box/tuple_format.c | 125 +++++++++++++++++++++++++++++------------
 src/box/tuple_format.h |  49 +++++++++++++---
 src/box/vy_stmt.c      |   4 +-
 6 files changed, 150 insertions(+), 59 deletions(-)

diff --git a/src/box/sql.c b/src/box/sql.c
index 7b41c9926..9effe1eb2 100644
--- a/src/box/sql.c
+++ b/src/box/sql.c
@@ -201,7 +201,8 @@ tarantoolSqlite3TupleColumnFast(BtCursor *pCur, u32 fieldno, u32 *field_size)
 	struct tuple_format *format = tuple_format(pCur->last_tuple);
 	assert(format->exact_field_count == 0
 	       || fieldno < format->exact_field_count);
-	if (format->fields[fieldno].offset_slot == TUPLE_OFFSET_SLOT_NIL)
+	if (tuple_format_field(format, fieldno)->offset_slot ==
+	    TUPLE_OFFSET_SLOT_NIL)
 		return NULL;
 	const char *field = tuple_field(pCur->last_tuple, fieldno);
 	const char *end = field;
@@ -896,7 +897,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
 	struct key_def *key_def;
 	const struct tuple *tuple;
 	const char *base;
-	const struct tuple_format *format;
+	struct tuple_format *format;
 	const uint32_t *field_map;
 	uint32_t field_count, next_fieldno = 0;
 	const char *p, *field0;
@@ -914,7 +915,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
 	base = tuple_data(tuple);
 	format = tuple_format(tuple);
 	field_map = tuple_field_map(tuple);
-	field_count = format->field_count;
+	field_count = tuple_format_field_count(format);
 	field0 = base; mp_decode_array(&field0); p = field0;
 	for (i = 0; i < n; i++) {
 		/*
@@ -932,9 +933,10 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
 		uint32_t fieldno = key_def->parts[i].fieldno;
  		if (fieldno != next_fieldno) {
+			struct tuple_field *field =
+				tuple_format_field(format, fieldno);
 			if (fieldno >= field_count ||
-			    format->fields[fieldno].offset_slot ==
-			    TUPLE_OFFSET_SLOT_NIL) {
+			    field->offset_slot == TUPLE_OFFSET_SLOT_NIL) {
 				/* Outdated field_map. */
 				uint32_t j = 0;
 
@@ -942,9 +944,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
 				while (j++ != fieldno)
 					mp_next(&p);
 			} else {
-				p = base + field_map[
-					format->fields[fieldno].offset_slot
-					];
+				p = base + field_map[field->offset_slot];
 			}
 		}
 		next_fieldno = fieldno + 1;
diff --git a/src/box/sql/build.c b/src/box/sql/build.c
index 52f0bde15..b5abaeeda 100644
--- a/src/box/sql/build.c
+++ b/src/box/sql/build.c
@@ -936,8 +936,9 @@ sql_column_collation(struct space_def *def, uint32_t column, uint32_t *coll_id)
 		struct coll_id *collation = coll_by_id(*coll_id);
 		return collation != NULL ? collation->coll : NULL;
 	}
-	*coll_id = space->format->fields[column].coll_id;
-	return space->format->fields[column].coll;
+	struct tuple_field *field = tuple_format_field(space->format, column);
+	*coll_id = field->coll_id;
+	return field->coll;
 }
  struct ExprList *
diff --git a/src/box/tuple.c b/src/box/tuple.c
index ef4d16f39..aae1c3cdd 100644
--- a/src/box/tuple.c
+++ b/src/box/tuple.c
@@ -138,7 +138,7 @@ runtime_tuple_delete(struct tuple_format *format, struct tuple *tuple)
 int
 tuple_validate_raw(struct tuple_format *format, const char *tuple)
 {
-	if (format->field_count == 0)
+	if (tuple_format_field_count(format) == 0)
 		return 0; /* Nothing to check */
  	/* Check to see if the tuple has a sufficient number of fields. */
@@ -158,10 +158,12 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple)
 	}
  	/* Check field types */
-	struct tuple_field *field = &format->fields[0];
+	struct tuple_field *field = tuple_format_field(format, 0);
 	uint32_t i = 0;
-	uint32_t defined_field_count = MIN(field_count, format->field_count);
-	for (; i < defined_field_count; ++i, ++field) {
+	uint32_t defined_field_count =
+		MIN(field_count, tuple_format_field_count(format));
+	for (; i < defined_field_count; ++i) {
+		field = tuple_format_field(format, i);
 		if (key_mp_type_validate(field->type, mp_typeof(*tuple),
 					 ER_FIELD_TYPE, i + TUPLE_INDEX_BASE,
 					 tuple_field_is_nullable(field)))
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 661cfdc94..b801a0eb4 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -38,10 +38,27 @@ static intptr_t recycled_format_ids = FORMAT_ID_NIL;
  static uint32_t formats_size = 0, formats_capacity = 0;
 -static const struct tuple_field tuple_field_default = {
-	FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false,
-	ON_CONFLICT_ACTION_NONE, NULL, COLL_NONE,
-};
+static struct tuple_field *
+tuple_field_new(void)
+{
+	struct tuple_field *ret = calloc(1, sizeof(struct tuple_field));
+	if (ret == NULL) {
+		diag_set(OutOfMemory, sizeof(struct tuple_field), "malloc",
+			 "ret");
+		return NULL;
+	}
+	ret->type = FIELD_TYPE_ANY;
+	ret->offset_slot = TUPLE_OFFSET_SLOT_NIL;
+	ret->coll_id = COLL_NONE;
+	ret->nullable_action = ON_CONFLICT_ACTION_NONE;
+	return ret;
+}
+
+static void
+tuple_field_delete(struct tuple_field *field)
+{
+	free(field);
+}
  static int
 tuple_format_use_key_part(struct tuple_format *format,
@@ -49,8 +66,8 @@ tuple_format_use_key_part(struct tuple_format *format,
 			  const struct key_part *part, bool is_sequential,
 			  int *current_slot)
 {
-	assert(part->fieldno < format->field_count);
-	struct tuple_field *field = &format->fields[part->fieldno];
+	assert(part->fieldno < tuple_format_field_count(format));
+	struct tuple_field *field = tuple_format_field(format, part->fieldno);
 	/*
 		* If a field is not present in the space format,
 		* inherit nullable action of the first key part
@@ -138,16 +155,15 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 	format->min_field_count =
 		tuple_format_min_field_count(keys, key_count, fields,
 					     field_count);
-	if (format->field_count == 0) {
+	if (tuple_format_field_count(format) == 0) {
 		format->field_map_size = 0;
 		return 0;
 	}
 	/* Initialize defined fields */
 	for (uint32_t i = 0; i < field_count; ++i) {
-		format->fields[i].is_key_part = false;
-		format->fields[i].type = fields[i].type;
-		format->fields[i].offset_slot = TUPLE_OFFSET_SLOT_NIL;
-		format->fields[i].nullable_action = fields[i].nullable_action;
+		struct tuple_field *field = tuple_format_field(format, i);
+		field->type = fields[i].type;
+		field->nullable_action = fields[i].nullable_action;
 		struct coll *coll = NULL;
 		uint32_t cid = fields[i].coll_id;
 		if (cid != COLL_NONE) {
@@ -159,12 +175,9 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 			}
 			coll = coll_id->coll;
 		}
-		format->fields[i].coll = coll;
-		format->fields[i].coll_id = cid;
+		field->coll = coll;
+		field->coll_id = cid;
 	}
-	/* Initialize remaining fields */
-	for (uint32_t i = field_count; i < format->field_count; i++)
-		format->fields[i] = tuple_field_default;
  	int current_slot = 0;
 
@@ -184,7 +197,8 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 		}
 	}
 -	assert(format->fields[0].offset_slot == TUPLE_OFFSET_SLOT_NIL);
+	assert(tuple_format_field(format, 0)->offset_slot ==
+		TUPLE_OFFSET_SLOT_NIL);
 	size_t field_map_size = -current_slot * sizeof(uint32_t);
 	if (field_map_size > UINT16_MAX) {
 		/** tuple->data_offset is 16 bits */
@@ -242,6 +256,19 @@ tuple_format_deregister(struct tuple_format *format)
 	format->id = FORMAT_ID_NIL;
 }
 +/** Destroy field JSON tree and release allocated memory. */
+static inline void
+tuple_format_field_tree_destroy(struct tuple_format *format)
+{
+	struct tuple_field *field, *tmp;
+	json_tree_foreach_entry_safe(&format->fields.root, field,
+				     struct tuple_field, token, tmp) {
+		json_tree_del(&format->fields, &field->token);
+		tuple_field_delete(field);
+	}
+	json_tree_destroy(&format->fields);
+}
+
 static struct tuple_format *
 tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		   uint32_t space_field_count, struct tuple_dictionary *dict)
@@ -258,39 +285,60 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		}
 	}
 	uint32_t field_count = MAX(space_field_count, index_field_count);
-	uint32_t total = sizeof(struct tuple_format) +
-			 field_count * sizeof(struct tuple_field);
 -	struct tuple_format *format = (struct tuple_format *) malloc(total);
+	struct tuple_format *format = malloc(sizeof(struct tuple_format));
 	if (format == NULL) {
 		diag_set(OutOfMemory, sizeof(struct tuple_format), "malloc",
 			 "tuple format");
 		return NULL;
 	}
+	if (json_tree_create(&format->fields) != 0) {
+		free(format);
+		return NULL;
+	}
+	struct json_token token;
+	memset(&token, 0, sizeof(token));
+	token.type = JSON_TOKEN_NUM;
+	for (token.num = TUPLE_INDEX_BASE;
+	     token.num < field_count + TUPLE_INDEX_BASE; token.num++) {
+		struct tuple_field *field = tuple_field_new();
+		if (field == NULL)
+			goto error;
+		field->token = token;
+		if (json_tree_add(&format->fields, &format->fields.root,
+				  &field->token) != 0) {
+			diag_set(OutOfMemory, 0, "json_tree_add",
+				 "&format->tree");
+			tuple_field_delete(field);
+			goto error;
+		}
+	}
 	if (dict == NULL) {
 		assert(space_field_count == 0);
 		format->dict = tuple_dictionary_new(NULL, 0);
-		if (format->dict == NULL) {
-			free(format);
-			return NULL;
-		}
+		if (format->dict == NULL)
+			goto error;
 	} else {
 		format->dict = dict;
 		tuple_dictionary_ref(dict);
 	}
 	format->refs = 0;
 	format->id = FORMAT_ID_NIL;
-	format->field_count = field_count;
 	format->index_field_count = index_field_count;
 	format->exact_field_count = 0;
 	format->min_field_count = 0;
 	return format;
+error:;
+	tuple_format_field_tree_destroy(format);
+	free(format);
+	return NULL;
 }
  /** Free tuple format resources, doesn't unregister. */
 static inline void
 tuple_format_destroy(struct tuple_format *format)
 {
+	tuple_format_field_tree_destroy(format);
 	tuple_dictionary_unref(format->dict);
 }
 
@@ -328,18 +376,21 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
 }
  bool
-tuple_format1_can_store_format2_tuples(const struct tuple_format *format1,
-				       const struct tuple_format *format2)
+tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
+				       struct tuple_format *format2)
 {
 	if (format1->exact_field_count != format2->exact_field_count)
 		return false;
-	for (uint32_t i = 0; i < format1->field_count; ++i) {
-		const struct tuple_field *field1 = &format1->fields[i];
+	uint32_t format1_field_count = tuple_format_field_count(format1);
+	uint32_t format2_field_count = tuple_format_field_count(format2);
+	for (uint32_t i = 0; i < format1_field_count; ++i) {
+		const struct tuple_field *field1 =
+			tuple_format_field(format1, i);
 		/*
 		 * The field has a data type in format1, but has
 		 * no data type in format2.
 		 */
-		if (i >= format2->field_count) {
+		if (i >= format2_field_count) {
 			/*
 			 * The field can get a name added
 			 * for it, and this doesn't require a data
@@ -355,7 +406,8 @@ tuple_format1_can_store_format2_tuples(const struct tuple_format *format1,
 			else
 				return false;
 		}
-		const struct tuple_field *field2 = &format2->fields[i];
+		const struct tuple_field *field2 =
+			tuple_format_field(format2, i);
 		if (! field_type1_contains_type2(field1->type, field2->type))
 			return false;
 		/*
@@ -374,7 +426,7 @@ int
 tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
 		     const char *tuple, bool validate)
 {
-	if (format->field_count == 0)
+	if (tuple_format_field_count(format) == 0)
 		return 0; /* Nothing to initialize */
  	const char *pos = tuple;
@@ -397,17 +449,17 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
  	/* first field is simply accessible, so we do not store offset to it */
 	enum mp_type mp_type = mp_typeof(*pos);
-	const struct tuple_field *field = &format->fields[0];
+	const struct tuple_field *field =
+		tuple_format_field((struct tuple_format *)format, 0);
 	if (validate &&
 	    key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
 				 TUPLE_INDEX_BASE, tuple_field_is_nullable(field)))
 		return -1;
 	mp_next(&pos);
 	/* other fields...*/
-	++field;
 	uint32_t i = 1;
 	uint32_t defined_field_count = MIN(field_count, validate ?
-					   format->field_count :
+					   tuple_format_field_count(format) :
 					   format->index_field_count);
 	if (field_count < format->index_field_count) {
 		/*
@@ -417,7 +469,8 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
 		memset((char *)field_map - format->field_map_size, 0,
 		       format->field_map_size);
 	}
-	for (; i < defined_field_count; ++i, ++field) {
+	for (; i < defined_field_count; ++i) {
+		field = tuple_format_field((struct tuple_format *)format, i);
 		mp_type = mp_typeof(*pos);
 		if (validate &&
 		    key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index 232df22b2..623f4e25f 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -34,6 +34,7 @@
 #include "key_def.h"
 #include "field_def.h"
 #include "errinj.h"
+#include "json/json.h"
 #include "tuple_dictionary.h"
  #if defined(__cplusplus)
@@ -113,6 +114,8 @@ struct tuple_field {
 	struct coll *coll;
 	/** Collation identifier. */
 	uint32_t coll_id;
+	/** Link in tuple_format::fields. */
+	struct json_token token;
 };
  /**
@@ -166,16 +169,46 @@ struct tuple_format {
 	 * index_field_count <= min_field_count <= field_count.
 	 */
 	uint32_t min_field_count;
-	/* Length of 'fields' array. */
-	uint32_t field_count;
 	/**
 	 * Shared names storage used by all formats of a space.
 	 */
 	struct tuple_dictionary *dict;
-	/* Formats of the fields */
-	struct tuple_field fields[0];
+	/**
+	 * Fields comprising the format, organized in a tree.
+	 * First level nodes correspond to tuple fields.
+	 * Deeper levels define indexed JSON paths within
+	 * tuple fields. Nodes of the tree are linked by
+	 * tuple_field::token.
+	 */
+	struct json_tree fields;
 };
 +/**
+ * Return the number of first-level nodes, which correspond to
+ * tuple fields.
+ */
+static inline uint32_t
+tuple_format_field_count(const struct tuple_format *format)
+{
+	return format->fields.root.child_count;
+}
+
+/**
+ * Get the first-level node corresponding to a tuple field by its
+ * fieldno.
+ */
+static inline struct tuple_field *
+tuple_format_field(struct tuple_format *format, uint32_t fieldno)
+{
+	assert(fieldno < tuple_format_field_count(format));
+	struct json_token token = {
+		.type = JSON_TOKEN_NUM,
+		.num = fieldno + TUPLE_INDEX_BASE
+	};
+	return json_tree_lookup_entry(&format->fields, &format->fields.root,
+				      &token, struct tuple_field, token);
+}
+
 extern struct tuple_format **tuple_formats;
  static inline uint32_t
@@ -238,8 +271,8 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
  * @retval True, if @a format1 can store any tuples of @a format2.
  */
 bool
-tuple_format1_can_store_format2_tuples(const struct tuple_format *format1,
-				       const struct tuple_format *format2);
+tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
+				       struct tuple_format *format2);
  /**
  * Calculate minimal field count of tuples with specified keys and
@@ -333,7 +366,9 @@ tuple_field_raw(const struct tuple_format *format, const char *tuple,
 			return tuple;
 		}
 -		int32_t offset_slot = format->fields[field_no].offset_slot;
+		int32_t offset_slot =
+			tuple_format_field((struct tuple_format *)format,
+					   field_no)->offset_slot;
 		if (offset_slot != TUPLE_OFFSET_SLOT_NIL) {
 			if (field_map[offset_slot] != 0)
 				return tuple + field_map[offset_slot];
diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c
index d83840406..3e60fece9 100644
--- a/src/box/vy_stmt.c
+++ b/src/box/vy_stmt.c
@@ -411,7 +411,7 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
 	uint32_t *field_map = (uint32_t *) raw;
 	char *wpos = mp_encode_array(raw, field_count);
 	for (uint32_t i = 0; i < field_count; ++i) {
-		const struct tuple_field *field = &format->fields[i];
+		const struct tuple_field *field = tuple_format_field(format, i);
 		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL)
 			field_map[field->offset_slot] = wpos - raw;
 		if (iov[i].iov_base == NULL) {
@@ -465,7 +465,7 @@ vy_stmt_new_surrogate_delete_raw(struct tuple_format *format,
 	}
 	char *pos = mp_encode_array(data, field_count);
 	for (uint32_t i = 0; i < field_count; ++i) {
-		const struct tuple_field *field = &format->fields[i];
+		const struct tuple_field *field = tuple_format_field(format, i);
 		if (! field->is_key_part) {
 			/* Unindexed field - write NIL. */
 			assert(i < src_count);
-- 
2.19.2

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v5 2/9] lib: implement JSON tree class for json library
  2018-11-29 17:38     ` Vladimir Davydov
  2018-11-29 17:50       ` Vladimir Davydov
  2018-12-04 15:22       ` Vladimir Davydov
@ 2018-12-04 15:47       ` Kirill Shcherbatov
  2018-12-04 17:54         ` Vladimir Davydov
  2 siblings, 1 reply; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-12-04 15:47 UTC (permalink / raw)
  To: tarantool-patches, Vladimir Davydov; +Cc: Kostya Osipov

> In general looks OK, please see a few comments below.
:)

> Please merge the two functions - the more functions we have for doing
> roughly the same thing (token comparison in this case), the more
> difficult it gets to follow the code.
> 
> I understand that you intend to use json_token_key_cmp (without parent
> comparison) to compare paths in a future patch, but IMO you'd better
> nullify token->parent in json_path_cmp before token comparison instead
> of having two functions.
> 
> The resulting function should be called json_token_cmp.
Ok, done.

> You always pass token->parent->hash to this function so the 'seed'
> argument is not really needed.
Ok, done.

> This is a rather pointless assertion as mh_json_delete will crash anyway
> if tree->hash is NULL. Besides, how would anyone ever step on it? By
> passing an uninitialized json_tree to this function? This will hardly
> ever happen.
> 
> OTOH it would be good to ensure that the tree is empty here, because
> this is a subtle part of the function protocol.
Ok, done.
> 'token' should be marked const here.
Ok, done.

> I'd rather require the caller to pass tree->root explicitly.
> This would save us a comparison.

Ok, done.

> Please don't assign a whole object when you need just a few fields.
> This is a warm path after all.
Ok, done.

> unlikely() isn't needed here - there are plenty of function calls so it
> will hardly make any difference. Besides, why would one expect that the
> key is likely to be present in the tree?
Ok, fixed.

> May be, it's worth making this case (JSON_TOKEN_NUM) inline, to save a
> function call in case of a top-level tuple field lookup? BTW likely()
> would be appropriate there.
Introduced json_tree_lookup_slowpath and an inline wrapper json_tree_lookup.

> Useless unlikely().
Ok, done
>>
>> Ouch, this looks much more complex than it was in the previous version.
>> Please revert. Quoting myself:
>>
>> } > +     if (insert_idx >= parent->children_count) {
>> } > +             uint32_t new_size = insert_idx + 1;
>> } 
>> } We usually double the size with each allocation. If this is intentional,
>> } please add a comment.
>>
>> I didn't push you to change that. I just wanted you to add a comment
>> saying that we can afford quadratic algorithmic complexity here, because
>> this function is far from a hot path and we expect the tree to be rather
>> small.
>Come to think of it, don't bother. Doubling the array will probably
>increases chances that the allocation will stay close to the node
>itself, and it looks saner in a common sense. Let's please just rename
>child_size to child_count_max and increase the initial allocation up to
>say 8, OK?
Ok, done.
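The agreed-on growth policy can be sketched in isolation as follows. This is an illustration only, not the patch code: the function names are local to this sketch, and `struct json_token` is left incomplete since only pointers to it are stored.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct json_token;

/*
 * Growth policy as agreed above: start with 8 slots and keep
 * doubling until insert_idx fits.
 */
static uint32_t
grow_capacity(uint32_t child_count_max, uint32_t insert_idx)
{
	uint32_t new_size = child_count_max == 0 ? 8 : 2 * child_count_max;
	while (insert_idx >= new_size)
		new_size *= 2;
	return new_size;
}

/*
 * Reallocate the children array to the new capacity and zero the
 * fresh slots so that they read as "no child".
 */
static int
children_reserve(struct json_token ***children, uint32_t *child_count_max,
		 uint32_t insert_idx)
{
	if (insert_idx < *child_count_max)
		return 0;
	uint32_t new_size = grow_capacity(*child_count_max, insert_idx);
	struct json_token **arr = realloc(*children,
					  new_size * sizeof(*arr));
	if (arr == NULL)
		return -1;
	memset(arr + *child_count_max, 0,
	       (new_size - *child_count_max) * sizeof(*arr));
	*children = arr;
	*child_count_max = new_size;
	return 0;
}
```

Doubling keeps the amortized cost of insertions constant while the initial size of 8 avoids a realloc per child for small nodes.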

> Again, unlikely() is pointless here.
Ok, done.

> What's this loop for? Can't you just use sibling_idx here?
> Indentation.
You are right, that was legacy code; reworked.

> This function may be called many times from json_tree_foreach helper.
> The latter is invoked at each tuple creation so we must strive to make
> it as fast as we can. Now, suppose you have JSON arrays of 100 elements
> with the last one indexed. You'll be iterating over 99 elements for
> nothing. Is it really that bad? I'm not sure, because the iteration
> should finish pretty quickly, but it looks rather ugly to me.
> 
> A suggestion: may be, we could use rlist, as you did it before, for
> iteration and the children array for lookup in an array? That is the
> children array wouldn't be allocated at all for JSON_TOKEN_STR, only for
> JSON_TOKEN_NUM, and it would be used solely for quick access, not for
> iteration. BTW sibling_idx wouldn't be needed then. Would it make sense?
As we discussed verbally, this is not a hot path for our typical
scenarios: tuple_format_create, tuple_format_delete, and (not so
cold, but still) vy_stmt_new_surrogate_from_key need not be
extremely fast.
Also, this memory region would already be in cache, and a linear
scan over a hypothetical hundred of NULL slots is not really
time-consuming.
Moreover, we are moving in the "flattening" direction, where all
fields would be present in the JSON tree.
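For reference, the linear scan being discussed looks like this in isolation. This is a simplified stand-in for the array walk in json_tree_child_next, reduced to a plain pointer array; it is not the patch code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Skip NULL slots in a sparse children array starting at 'start'
 * and return the index of the next non-NULL slot, or -1 if the
 * array is exhausted.
 */
static int
next_child_idx(void *const *arr, uint32_t size, uint32_t start)
{
	for (uint32_t i = start; i < size; i++) {
		if (arr[i] != NULL)
			return (int)i;
	}
	return -1;
}

/* Exercise the scan on the sparse array {NULL, x, NULL, x}. */
static int
scan_demo(void)
{
	int x;
	void *arr[4] = {NULL, &x, NULL, &x};
	/* First hit from 0 is slot 1; first hit from 2 is slot 3. */
	return next_child_idx(arr, 4, 0) * 10 + next_child_idx(arr, 4, 2);
}
```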
>>  /**
>>   * Element of a JSON path. It can be either string or number.
>>   * String idenfiers are in ["..."] and between dots. Numbers are
>> - * indexes in [...].
>> + * indexes in [...]. May be organized in a JSON tree.
> 
> This is such a lame comment :-(
> 
> Can we say a few words here about the json_token lifetime: how it is
> first filled by lexer, how it then can be passed to json_tree_add to
> form a tree, how the tree is organized... May be, even add a code
> snippet to the comment.

>> +	} key;
> 
> After some consideration, I don't think that it's worth moving the token
> identifier to anonymous struct 'key' - it doesn't conflict with other
> members and using 'num' or 'str/len' directly is shorter than 'key.num'
> and 'key.str/len'. Also, the 'type' still has type json_token_type, not
> json_token_key_type, which suggests that it should be under token.type,
> not under token.key.type.
Ok, done

> } It took me a while to realize how you calculate rolling hash - I had to
> } dive deep into the implementation.
> 
> Alas, the new comment isn't a bit better than the previous one.
> 
>> +	uint32_t rolling_hash;
> 
> Let's call it simply 'hash', short and clear. The rolling nature of the
> hash should be explained in the comment.
Ok, done
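The property being discussed — seeding a token's hash with its parent's hash so that a node's hash effectively covers the whole path to it — can be sketched with any seeded hash. Below, a seeded FNV-1a stands in for PMurHash32, which the patch actually uses; the function names are local to the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Seeded FNV-1a; an illustration stand-in for PMurHash32. */
static uint32_t
hash_with_seed(const char *data, uint32_t size, uint32_t seed)
{
	uint32_t h = seed ^ 2166136261u;
	for (uint32_t i = 0; i < size; i++) {
		h ^= (unsigned char)data[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Hash of the path "foo.bar" computed the way the tree does it:
 * each token is hashed with its parent's hash as the seed, so
 * nothing has to be recomputed from the root while descending.
 */
static uint32_t
path_hash_stepwise(uint32_t root_seed)
{
	uint32_t foo = hash_with_seed("foo", 3, root_seed);
	return hash_with_seed("bar", 3, foo);
}
```

Because the step-by-step result equals hashing the suffix with the prefix hash as seed, descent can start from any already-hashed node, not only from the root.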

> typo: indexe -> index
> 
> BTW, json array start indexing from 0, not 1 AFAIK. Starting indexing
> from 1 looks weird to me.
> 
>> +	 * and are allocated sequently for JSON_TOKEN_NUM child
> 
> typo: sequently -> sequentially
Ok, done.

> +	/** Hashtable af all tree nodes. */
> 
> How does it work? In a year nobody will remember, and to understand how
> this thing operates we will have to dive deep into the code...
	/**
	 * Hashtable of all tree nodes.
	 * The uniqueness of the records in the table is
	 * provided by:
	 * 1) the value of json_token initialized by the parser,
	 *    i.e. <type, num | str + len>;
	 * 2) the pointer to the parent json_token;
	 * 3) a private hash function value calculated for
	 *    json_token (1) using the parent's (2) hash value
	 *    as a seed.
	 * In fact, only JSON_TOKEN_STR json_tokens are physically
	 * added to the hashtable, because JSON_TOKEN_NUM tokens
	 * don't require a supporting structure for lookup.
	 * However, each json_tree token must have a valid hash
	 * value (3), as it may serve as a seed for the next
	 * subtree level.
	 */
	struct mh_json_t *hash;


> It's definitely worth mentioning that these functions don't visit the
> root node.
Ok.

> 
>> +
>> +/**
>> + * Make safe post-order traversal in JSON tree.
>> + * May be used for destructors.
>> + */
>> +#define json_tree_foreach_safe(node, root)				     \
>> +for (struct json_token *__next = json_tree_postorder_next((root), NULL);     \
>> +     (((node) = __next) &&						     \
>> +     (__next = json_tree_postorder_next((root), (node))), (node) != NULL);)
> 
> IMHO it's rather difficult for understanding. Can we rewrite it so that
> it uses for() loop as expected, i.e. initializes the loop variable in
> the first part, checks for a loop condition in the second part, and
> advances the loop variable in the third part? Something like:
> 
> #define json_tree_foreach_safe(curr, root, next)
> 	for (curr = json_tree_postorder_next(root, NULL),
> 	     next = curr ? json_tree_postorder_next(root, curr) : NULL;
> 	     curr != NULL;
> 	     curr = next,
> 	     next = curr ? json_tree_postorder_next(root, curr) : NULL)
Ok, done.
#define json_tree_foreach_safe(root, node, tmp)				     \
	for ((node) = json_tree_postorder_next((root), NULL);		     \
	     (node) != (root) &&					     \
	     ((tmp) =  json_tree_postorder_next((root), (node)));	     \
	     (node) = (tmp))
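The reason the extra tmp variable makes this loop deletion-safe is that the successor is fetched before the body may free the current node. The same pattern on a plain singly-linked list, as an illustration only (none of these names are in the patch):

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	struct node *next;
};

/* Fetch the successor before visiting, as json_tree_foreach_safe does. */
#define list_foreach_safe(head, cur, tmp)			\
	for ((cur) = (head);					\
	     (cur) != NULL && ((tmp) = (cur)->next, 1);		\
	     (cur) = (tmp))

/* Free every node; safe because tmp already holds cur->next. */
static int
destroy_all(struct node *head)
{
	struct node *cur, *tmp;
	int freed = 0;
	list_foreach_safe(head, cur, tmp) {
		free(cur);
		freed++;
	}
	return freed;
}

/* Build a list of n nodes, then destroy it, returning the count. */
static int
build_and_destroy(int n)
{
	struct node *head = NULL;
	for (int i = 0; i < n; i++) {
		struct node *x = malloc(sizeof(*x));
		if (x == NULL)
			return -1;
		x->next = head;
		head = x;
	}
	return destroy_all(head);
}
```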

> Use container_of please.
Ok, done
#define json_tree_entry(node, type, member) \
	container_of((node), type, member)
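For readers unfamiliar with the idiom: container_of recovers the enclosing structure from a pointer to an embedded member, which is exactly what makes the intrusive json_token design work. A self-contained illustration; the struct names below are stand-ins, not patch code:

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for a json_token-like node. */
struct token {
	int num;
};

/* Stand-in for a structure embedding the node, like tuple_field. */
struct field {
	int value;
	struct token token;	/* Intrusive link. */
};

/* What json_tree_entry(t, struct field, token) expands to. */
static struct field *
field_by_token(struct token *t)
{
	return container_of(t, struct field, token);
}

/* Round-trip: member pointer -> enclosing struct -> same object. */
static int
roundtrip_ok(void)
{
	struct field f = { .value = 42, .token = { .num = 7 } };
	struct field *back = field_by_token(&f.token);
	return back == &f && back->value == 42;
}
```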

> Sometimes you wrap macro arguments in parentheses (tree, parent),
> sometimes you don't (token, path, path_len). Please fix.
Ok, done.

> Please add more paths. I'd like to see the following cases covered as
> well:
> 
>  - JSON_TOKEN_STR as an intermediate node
>  - JSON_TOKEN_STR as a common node for two paths
>  - coinciding paths
>  - one path is a sub-path of another

	const char *path1 = "[1][10]";
	const char *path2 = "[1][20].file";
	const char *path3 = "[1][20].file[2]";
	const char *path4 = "[1][20].file[4]";
	const char *path4_copy = "[1][20][\"file\"][4]";
	const char *path_unregistered = "[1][3]";

====================================================================



The new JSON tree class stores the JSON paths of tuple fields
for registered non-plain indexes. It is a hierarchical data
structure that organizes the JSON nodes produced by the parser.
The class provides an API to look up a node by path and to
iterate over the tree.
The JSON indexes patch requires this functionality to look up
tuple_fields by path, to initialize the field map, and to build
vinyl statement msgpack for a secondary index via JSON tree
iteration.

Need for #1012
---
src/lib/json/CMakeLists.txt | 1 +
src/lib/json/json.c | 268 +++++++++++++++++++++++++++++++++
src/lib/json/json.h | 293 +++++++++++++++++++++++++++++++++++-
test/unit/json_path.c | 211 +++++++++++++++++++++++++-
test/unit/json_path.result | 56 ++++++-
5 files changed, 826 insertions(+), 3 deletions(-)

diff --git a/src/lib/json/CMakeLists.txt b/src/lib/json/CMakeLists.txt
index 0f0739620..51a1f027a 100644
--- a/src/lib/json/CMakeLists.txt
+++ b/src/lib/json/CMakeLists.txt
@@ -4,3 +4,4 @@ set(lib_sources

set_source_files_compile_flags(${lib_sources})
add_library(json_path STATIC ${lib_sources})
+target_link_libraries(json_path misc)
diff --git a/src/lib/json/json.c b/src/lib/json/json.c
index eb80e4bbc..58a842ef1 100644
--- a/src/lib/json/json.c
+++ b/src/lib/json/json.c
@@ -30,6 +30,7 @@
*/

#include "json.h"
+#include "third_party/PMurHash.h"
#include <ctype.h>
#include <stdbool.h>
#include <unicode/uchar.h>
@@ -241,3 +242,270 @@ json_lexer_next_token(struct json_lexer *lexer, struct json_token *token)
return json_parse_identifier(lexer, token);
}
}
+
+/** Compare JSON token keys. */
+static int
+json_token_cmp(const struct json_token *a, const struct json_token *b)
+{
+	if (a->parent != b->parent)
+		return a->parent - b->parent;
+	if (a->type != b->type)
+		return a->type - b->type;
+	int ret = 0;
+	if (a->type == JSON_TOKEN_STR) {
+		if (a->len != b->len)
+			return a->len - b->len;
+		ret = memcmp(a->str, b->str, a->len);
+	} else if (a->type == JSON_TOKEN_NUM) {
+		ret = a->num - b->num;
+	} else {
+		unreachable();
+	}
+	return ret;
+}
+
+#define MH_SOURCE 1
+#define mh_name _json
+#define mh_key_t struct json_token **
+#define mh_node_t struct json_token *
+#define mh_arg_t void *
+#define mh_hash(a, arg) ((*a)->hash)
+#define mh_hash_key(a, arg) ((*a)->hash)
+#define mh_cmp(a, b, arg) (json_token_cmp((*a), (*b)))
+#define mh_cmp_key(a, b, arg) mh_cmp(a, b, arg)
+#include "salad/mhash.h"
+
+static const uint32_t hash_seed = 13U;
+
+/** Compute the hash value of a JSON token. */
+static uint32_t
+json_token_hash(struct json_token *token)
+{
+	uint32_t h = token->parent->hash;
+	uint32_t carry = 0;
+	const void *data;
+	uint32_t data_size;
+	if (token->type == JSON_TOKEN_STR) {
+		data = token->str;
+		data_size = token->len;
+	} else if (token->type == JSON_TOKEN_NUM) {
+		data = &token->num;
+		data_size = sizeof(token->num);
+	} else {
+		unreachable();
+	}
+	PMurHash32_Process(&h, &carry, data, data_size);
+	return PMurHash32_Result(h, carry, data_size);
+}
+
+int
+json_tree_create(struct json_tree *tree)
+{
+	memset(tree, 0, sizeof(struct json_tree));
+	tree->root.hash = hash_seed;
+	tree->root.type = JSON_TOKEN_END;
+	tree->hash = mh_json_new();
+	return tree->hash == NULL ? -1 : 0;
+}
+
+static void
+json_token_destroy(struct json_token *token)
+{
+#ifndef NDEBUG
+	/* The token mustn't have a JSON subtree. */
+	struct json_token *iter;
+	uint32_t nodes = 0;
+	json_tree_foreach_preorder(token, iter)
+		nodes++;
+	assert(nodes == 0);
+#endif /* NDEBUG */
+	free(token->children);
+}
+
+void
+json_tree_destroy(struct json_tree *tree)
+{
+#ifndef NDEBUG
+	/* The tree must be empty. */
+	struct json_token *iter;
+	uint32_t nodes = 0;
+	json_tree_foreach_preorder(&tree->root, iter)
+		nodes++;
+	assert(nodes == 0);
+#endif /* NDEBUG */
+	json_token_destroy(&tree->root);
+	mh_json_delete(tree->hash);
+}
+
+struct json_token *
+json_tree_lookup_slowpath(struct json_tree *tree, struct json_token *parent,
+			  const struct json_token *token)
+{
+	assert(parent != NULL);
+	if (likely(token->type == JSON_TOKEN_STR)) {
+		struct json_token key, *key_ptr;
+		key.type = token->type;
+		key.str = token->str;
+		key.len = token->len;
+		key.parent = parent;
+		key.hash = json_token_hash(&key);
+		key_ptr = &key;
+		mh_int_t id = mh_json_find(tree->hash, &key_ptr, NULL);
+		if (id == mh_end(tree->hash))
+			return NULL;
+		struct json_token **entry = mh_json_node(tree->hash, id);
+		assert(entry == NULL || (*entry)->parent == parent);
+		return entry != NULL ? *entry : NULL;
+	} else if (token->type == JSON_TOKEN_NUM) {
+		uint32_t idx = token->num - 1;
+		return likely(idx < parent->child_count) ?
+		       parent->children[idx] : NULL;
+	}
+	unreachable();
+	return NULL;
+}
+
+int
+json_tree_add(struct json_tree *tree, struct json_token *parent,
+	      struct json_token *token)
+{
+	assert(parent != NULL);
+	token->parent = parent;
+	uint32_t hash = json_token_hash(token);
+	assert(json_tree_lookup(tree, parent, token) == NULL);
+	uint32_t insert_idx = token->type == JSON_TOKEN_NUM ?
+			      (uint32_t)token->num - 1 :
+			      parent->child_count_max;
+	if (insert_idx >= parent->child_count_max) {
+		uint32_t new_size = parent->child_count_max == 0 ?
+				    8 : 2 * parent->child_count_max;
+		while (insert_idx >= new_size)
+			new_size *= 2;
+		struct json_token **children =
+			realloc(parent->children, new_size * sizeof(void *));
+		if (children == NULL)
+			return -1;
+		memset(children + parent->child_count_max, 0,
+		       (new_size - parent->child_count_max) * sizeof(void *));
+		parent->children = children;
+		parent->child_count_max = new_size;
+	}
+	assert(parent->children[insert_idx] == NULL);
+	parent->children[insert_idx] = token;
+	parent->child_count = MAX(parent->child_count, insert_idx + 1);
+	token->sibling_idx = insert_idx;
+	token->hash = hash;
+	token->parent = parent;
+	if (token->type != JSON_TOKEN_STR)
+		goto end;
+	/*
+	 * Add a string token to the json_tree hash to allow
+	 * lookup by name.
+	 */
+	const struct json_token **key =
+		(const struct json_token **)&token;
+	mh_int_t rc = mh_json_put(tree->hash, key, NULL, NULL);
+	if (rc == mh_end(tree->hash)) {
+		parent->children[insert_idx] = NULL;
+		return -1;
+	}
+end:
+	assert(json_tree_lookup(tree, parent, token) == token);
+	return 0;
+}
+
+void
+json_tree_del(struct json_tree *tree, struct json_token *token)
+{
+	struct json_token *parent = token->parent;
+	assert(json_tree_lookup(tree, parent, token) == token);
+	struct json_token **child_slot = &parent->children[token->sibling_idx];
+	assert(child_slot != NULL && *child_slot == token);
+	*child_slot = NULL;
+
+	if (token->type != JSON_TOKEN_STR)
+		goto end;
+	/* Remove the string token from the json_tree hash. */
+	mh_int_t id = mh_json_find(tree->hash, &token, NULL);
+	assert(id != mh_end(tree->hash));
+	mh_json_del(tree->hash, id, NULL);
+	json_token_destroy(token);
+end:
+	assert(json_tree_lookup(tree, parent, token) == NULL);
+}
+
+struct json_token *
+json_tree_lookup_path(struct json_tree *tree, struct json_token *parent,
+		      const char *path, uint32_t path_len)
+{
+	int rc;
+	struct json_lexer lexer;
+	struct json_token token;
+	struct json_token *ret = parent != NULL ? parent : &tree->root;
+	json_lexer_create(&lexer, path, path_len);
+	while ((rc = json_lexer_next_token(&lexer, &token)) == 0 &&
+	       token.type != JSON_TOKEN_END && ret != NULL) {
+		ret = json_tree_lookup(tree, ret, &token);
+	}
+	if (rc != 0 || token.type != JSON_TOKEN_END)
+		return NULL;
+	return ret;
+}
+
+static struct json_token *
+json_tree_child_next(struct json_token *parent, struct json_token *pos)
+{
+	assert(pos == NULL || pos->parent == parent);
+	struct json_token **arr = parent->children;
+	uint32_t arr_size = parent->child_count;
+	if (arr == NULL)
+		return NULL;
+	uint32_t idx = pos != NULL ? pos->sibling_idx + 1 : 0;
+	while (idx < arr_size && arr[idx] == NULL)
+		idx++;
+	if (idx >= arr_size)
+		return NULL;
+	return arr[idx];
+}
+
+static struct json_token *
+json_tree_leftmost(struct json_token *pos)
+{
+	struct json_token *last;
+	do {
+		last = pos;
+		pos = json_tree_child_next(pos, NULL);
+	} while (pos != NULL);
+	return last;
+}
+
+struct json_token *
+json_tree_preorder_next(struct json_token *root, struct json_token *pos)
+{
+	struct json_token *next = json_tree_child_next(pos, NULL);
+	if (next != NULL)
+		return next;
+	while (pos != root) {
+		next = json_tree_child_next(pos->parent, pos);
+		if (next != NULL)
+			return next;
+		pos = pos->parent;
+	}
+	return NULL;
+}
+
+struct json_token *
+json_tree_postorder_next(struct json_token *root, struct json_token *pos)
+{
+	struct json_token *next;
+	if (pos == NULL)
+		return json_tree_leftmost(root);
+	if (pos == root)
+		return NULL;
+	next = json_tree_child_next(pos->parent, pos);
+	if (next != NULL)
+		return json_tree_leftmost(next);
+	return pos->parent;
+}
diff --git a/src/lib/json/json.h b/src/lib/json/json.h
index ead446878..948fcdb73 100644
--- a/src/lib/json/json.h
+++ b/src/lib/json/json.h
@@ -30,7 +30,7 @@
* THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
-#include <stdint.h>
+#include "trivia/util.h"

#ifdef __cplusplus
extern "C" {
@@ -62,6 +62,10 @@ enum json_token_type {
* Element of a JSON path. It can be either string or number.
* String idenfiers are in ["..."] and between dots. Numbers are
* indexes in [...].
+ *
+ * May be organized in a tree-like structure reflecting a JSON
+ * document structure, for more details see the comment to struct
+ * json_tree.
*/
struct json_token {
enum json_token_type type;
@@ -75,6 +79,114 @@ struct json_token {
/** Index value. */
uint64_t num;
};
+	/**
+	 * Hash value of the token. Used for lookups in a JSON
+	 * tree. For more details, see the comment to
+	 * json_tree::hash.
+	 */
+	uint32_t hash;
+	/**
+	 * Array of child records. An index in this array matches
+	 * the token number (minus one) for JSON_TOKEN_NUM child
+	 * tokens, while slots for JSON_TOKEN_STR child tokens
+	 * are allocated sequentially.
+	 */
+	struct json_token **children;
+	/** Allocation size of the children array. */
+	uint32_t child_count_max;
+	/**
+	 * Count of the defined children array items, i.e. the
+	 * maximum index of an inserted item plus one.
+	 */
+	uint32_t child_count;
+	/** Index of this node in the parent's children array. */
+	uint32_t sibling_idx;
+	/** Pointer to the parent node. */
+	struct json_token *parent;
+};
+
+struct mh_json_t;
+
+/**
+ * This structure is used for organizing JSON tokens produced
+ * by a lexer in a tree-like structure reflecting a JSON document
+ * structure.
+ *
+ * Each intermediate node of the tree corresponds to either
+ * a JSON map or an array, depending on the key type used by
+ * its children (JSON_TOKEN_STR or JSON_TOKEN_NUM, respectively).
+ * Leaf nodes may represent both complex JSON structures and
+ * final values - it is not mandated by the JSON tree design.
+ * The root of the tree doesn't have a key and is preallocated
+ * when the tree is created.
+ *
+ * The json_token structure is intrusive by design, i.e. to store
+ * arbitrary information in a JSON tree, one has to incorporate it
+ * into a user defined structure.
+ *
+ * Example:
+ *
+ *   struct data {
+ *           ...
+ *           struct json_token token;
+ *   };
+ *
+ *   struct json_tree tree;
+ *   json_tree_create(&tree);
+ *   struct json_token *parent = &tree.root;
+ *
+ *   // Add a path to the tree.
+ *   struct data *data = data_new();
+ *   struct json_lexer lexer;
+ *   json_lexer_create(&lexer, path, path_len);
+ *   json_lexer_next_token(&lexer, &data->token);
+ *   while (data->token.type != JSON_TOKEN_END) {
+ *           json_tree_add(&tree, parent, &data->token);
+ *           parent = &data->token;
+ *           data = data_new();
+ *           json_lexer_next_token(&lexer, &data->token);
+ *   }
+ *   data_delete(data);
+ *
+ *   // Look up a path in the tree.
+ *   data = json_tree_lookup_path_entry(&tree, &tree.root,
+ *                                      path, path_len,
+ *                                      struct data, token);
+ */
+struct json_tree {
+	/**
+	 * Preallocated token corresponding to the JSON tree root.
+	 * It doesn't have a key (its type is JSON_TOKEN_END).
+	 */
+	struct json_token root;
+	/**
+	 * Hash table that is used to quickly look up a token
+	 * corresponding to a JSON map item given a key and
+	 * a parent token. We store all tokens that have type
+	 * JSON_TOKEN_STR in this hash table. Apparently, we
+	 * don't need to store JSON_TOKEN_NUM tokens as we can
+	 * quickly look them up in the children array anyway.
+	 *
+	 * The hash table uses the pair <parent, key> as a key,
+	 * so even tokens that happen to have the same key will
+	 * have different keys in the hash. To look up a tree
+	 * node corresponding to a particular path, we split
+	 * the path into tokens and look up the first token
+	 * in the root node and each following token in the
+	 * node returned at the previous step.
+	 *
+	 * We compute a hash value for a token by hashing its
+	 * key using the hash value of its parent as a seed. This
+	 * is equivalent to computing a hash for the path leading
+	 * to the token. However, we don't need to recompute the
+	 * hash starting from the root at each step as we descend
+	 * the tree looking for a specific path, and we can start
+	 * the descent from any node, not only from the root.
+	 *
+	 * As a consequence of this hashing technique, even
+	 * though we don't need to store JSON_TOKEN_NUM tokens
+	 * in the hash table, we still have to compute hash
+	 * values for them.
+	 */
+	struct mh_json_t *hash;
};

/**
@@ -104,6 +216,185 @@ json_lexer_create(struct json_lexer *lexer, const char *src, int src_len)
int
json_lexer_next_token(struct json_lexer *lexer, struct json_token *token);

+/** Initialize a JSON tree object. Returns 0 on success, -1 on OOM. */
+int
+json_tree_create(struct json_tree *tree);
+
+/**
+ * Destroy a JSON tree object. This routine doesn't destroy the
+ * attached subtree, so it should be called only after the
+ * subtree has been destroyed manually.
+ */
+void
+json_tree_destroy(struct json_tree *tree);
+
+/**
+ * Look up a child of the given parent by a token in a JSON tree.
+ */
+struct json_token *
+json_tree_lookup_slowpath(struct json_tree *tree, struct json_token *parent,
+			  const struct json_token *token);
+
+/**
+ * Look up a child of the given parent by a token in a JSON tree,
+ * avoiding a function call in the common case (JSON_TOKEN_NUM).
+ */
+static inline struct json_token *
+json_tree_lookup(struct json_tree *tree, struct json_token *parent,
+		 const struct json_token *token)
+{
+	struct json_token *ret = NULL;
+	if (token->type == JSON_TOKEN_NUM) {
+		uint32_t idx = token->num - 1;
+		ret = likely(idx < parent->child_count_max) ?
+		      parent->children[idx] : NULL;
+	} else {
+		ret = json_tree_lookup_slowpath(tree, parent, token);
+	}
+	return ret;
+}
+
+/**
+ * Append a token to the given parent in a JSON tree. The parent
+ * must not already have a child with the same key.
+ */
+int
+json_tree_add(struct json_tree *tree, struct json_token *parent,
+	      struct json_token *token);
+
+/**
+ * Delete a token from a JSON tree at its parent position. The
+ * token must not have a subtree.
+ */
+void
+json_tree_del(struct json_tree *tree, struct json_token *token);
+
+/** Look up a node in a JSON tree by path. */
+struct json_token *
+json_tree_lookup_path(struct json_tree *tree, struct json_token *parent,
+		      const char *path, uint32_t path_len);
+
+/** Return the next node of a pre-order traversal of a JSON tree. */
+struct json_token *
+json_tree_preorder_next(struct json_token *root, struct json_token *pos);
+
+/** Return the next node of a post-order traversal of a JSON tree. */
+struct json_token *
+json_tree_postorder_next(struct json_token *root, struct json_token *pos);
+
+#ifndef typeof
+#define typeof __typeof__
+#endif
+
+/** Return a container entry by a json_tree node. */
+#define json_tree_entry(node, type, member)				     \
+	container_of((node), type, member)
+/**
+ * Return a container entry by a json_tree node or NULL if
+ * the node is NULL.
+ */
+#define json_tree_entry_safe(node, type, member) ({			     \
+	(node) != NULL ? json_tree_entry((node), type, member) : NULL;	     \
+})
+
+/** Return the next entry of a pre-order traversal in a JSON tree. */
+#define json_tree_preorder_next_entry(root, node, type, member) ({	     \
+	struct json_token *__next =					     \
+		json_tree_preorder_next((root), (node));		     \
+	json_tree_entry_safe(__next, type, member);			     \
+})
+
+/**
+ * Return the next entry of a post-order traversal in a JSON
+ * tree. The traversal doesn't visit the root node.
+ */
+#define json_tree_postorder_next_entry(root, node, type, member) ({	     \
+	struct json_token *__next =					     \
+		json_tree_postorder_next((root), (node));		     \
+	json_tree_entry_safe(__next, type, member);			     \
+})
+
+/** Look up an entry in a JSON tree by path. */
+#define json_tree_lookup_path_entry(tree, parent, path, path_len, type,      \
+				    member) ({				     \
+	struct json_token *__node =					     \
+		json_tree_lookup_path((tree), (parent), (path), (path_len)); \
+	json_tree_entry_safe(__node, type, member);			     \
+})
+
+/** Look up an entry in a JSON tree by token. */
+#define json_tree_lookup_entry(tree, parent, token, type, member) ({	     \
+	struct json_token *__node =					     \
+		json_tree_lookup((tree), (parent), (token));		     \
+	json_tree_entry_safe(__node, type, member);			     \
+})
+
+/**
+ * Make a pre-order traversal in a JSON tree.
+ * This cycle doesn't visit the root node.
+ */
+#define json_tree_foreach_preorder(root, node)				     \
+	for ((node) = json_tree_preorder_next((root), (root));		     \
+	     (node) != NULL;						     \
+	     (node) = json_tree_preorder_next((root), (node)))
+
+/**
+ * Make a post-order traversal in a JSON tree.
+ * This cycle doesn't visit the root node.
+ */
+#define json_tree_foreach_postorder(root, node)				     \
+	for ((node) = json_tree_postorder_next((root), NULL);		     \
+	     (node) != (root);						     \
+	     (node) = json_tree_postorder_next((root), (node)))
+
+/**
+ * Make a safe post-order traversal in a JSON tree.
+ * May be used for destructors.
+ * This cycle doesn't visit the root node.
+ */
+#define json_tree_foreach_safe(root, node, tmp)				     \
+	for ((node) = json_tree_postorder_next((root), NULL);		     \
+	     (node) != (root) &&					     \
+	     ((tmp) = json_tree_postorder_next((root), (node)));	     \
+	     (node) = (tmp))
+
+/**
+ * Make a pre-order traversal in a JSON tree and return entries.
+ * This cycle doesn't visit the root node.
+ */
+#define json_tree_foreach_entry_preorder(root, node, type, member)	     \
+	for ((node) = json_tree_preorder_next_entry((root), (root),	     \
+						    type, member);	     \
+	     (node) != NULL;						     \
+	     (node) = json_tree_preorder_next_entry((root), &(node)->member, \
+						    type, member))
+
+/**
+ * Make a post-order traversal in a JSON tree and return entries.
+ * This cycle doesn't visit the root node.
+ */
+#define json_tree_foreach_entry_postorder(root, node, type, member)	     \
+	for ((node) = json_tree_postorder_next_entry((root), NULL, type,     \
+						     member);		     \
+	     &(node)->member != (root);					     \
+	     (node) = json_tree_postorder_next_entry((root), &(node)->member,\
+						     type, member))
+
+/**
+ * Make a safe post-order traversal in a JSON tree and return
+ * entries. May be used for destructors.
+ * This cycle doesn't visit the root node.
+ */
+#define json_tree_foreach_entry_safe(root, node, type, member, tmp)	     \
+	for ((node) = json_tree_postorder_next_entry((root), NULL,	     \
+						     type, member);	     \
+	     &(node)->member != (root) &&				     \
+	     ((tmp) = json_tree_postorder_next_entry((root),		     \
+						     &(node)->member,	     \
+						     type, member));	     \
+	     (node) = (tmp))
+
#ifdef __cplusplus
}
#endif
diff --git a/test/unit/json_path.c b/test/unit/json_path.c
index a5f90ad98..1e1011237 100644
--- a/test/unit/json_path.c
+++ b/test/unit/json_path.c
@@ -2,6 +2,7 @@
#include "unit.h"
#include "trivia/util.h"
#include <string.h>
+#include <stdbool.h>

#define reset_to_new_path(value) \
path = value; \
@@ -159,14 +160,222 @@ test_errors()
footer();
}

+struct test_struct {
+	int value;
+	struct json_token node;
+};
+
+struct test_struct *
+test_struct_alloc(struct test_struct *records_pool, int *pool_idx)
+{
+	struct test_struct *ret = &records_pool[*pool_idx];
+	*pool_idx = *pool_idx + 1;
+	memset(&ret->node, 0, sizeof(ret->node));
+	return ret;
+}
+
+struct test_struct *
+test_add_path(struct json_tree *tree, const char *path, uint32_t path_len,
+	      struct test_struct *records_pool, int *pool_idx)
+{
+	int rc;
+	struct json_lexer lexer;
+	struct json_token *parent = &tree->root;
+	json_lexer_create(&lexer, path, path_len);
+	struct test_struct *field = test_struct_alloc(records_pool, pool_idx);
+	while ((rc = json_lexer_next_token(&lexer, &field->node)) == 0 &&
+	       field->node.type != JSON_TOKEN_END) {
+		struct json_token *next =
+			json_tree_lookup(tree, parent, &field->node);
+		if (next == NULL) {
+			rc = json_tree_add(tree, parent, &field->node);
+			fail_if(rc != 0);
+			next = &field->node;
+			field = test_struct_alloc(records_pool, pool_idx);
+		}
+		parent = next;
+	}
+	fail_if(rc != 0 || field->node.type != JSON_TOKEN_END);
+	/* Release the spare field allocated in advance. */
+	*pool_idx = *pool_idx - 1;
+	return json_tree_entry(parent, struct test_struct, node);
+}
+
+void
+test_tree()
+{
+	header();
+	plan(50);
+
+	struct json_tree tree;
+	int rc = json_tree_create(&tree);
+	fail_if(rc != 0);
+
+	struct test_struct records[7];
+	for (int i = 0; i < 6; i++)
+		records[i].value = i;
+
+	const char *path1 = "[1][10]";
+	const char *path2 = "[1][20].file";
+	const char *path3 = "[1][20].file[2]";
+	const char *path4 = "[1][20].file[4]";
+	const char *path4_copy = "[1][20][\"file\"][4]";
+	const char *path_unregistered = "[1][3]";
+
+	int records_idx = 0;
+	struct test_struct *node, *node_tmp;
+	node = test_add_path(&tree, path1, strlen(path1), records,
+			     &records_idx);
+	is(node, &records[1], "add path '%s'", path1);
+
+	node = test_add_path(&tree, path2, strlen(path2), records,
+			     &records_idx);
+	is(node, &records[3], "add path '%s'", path2);
+
+	node = test_add_path(&tree, path3, strlen(path3), records,
+			     &records_idx);
+	is(node, &records[4], "add path '%s'", path3);
+
+	node = test_add_path(&tree, path4, strlen(path4), records,
+			     &records_idx);
+	is(node, &records[5], "add path '%s'", path4);
+
+	node = test_add_path(&tree, path4_copy, strlen(path4_copy), records,
+			     &records_idx);
+	is(node, &records[5], "add path '%s'", path4_copy);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path1, strlen(path1),
+					   struct test_struct, node);
+	is(node, &records[1], "lookup path '%s'", path1);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path2, strlen(path2),
+					   struct test_struct, node);
+	is(node, &records[3], "lookup path '%s'", path2);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path_unregistered,
+					   strlen(path_unregistered),
+					   struct test_struct, node);
+	is(node, NULL, "lookup unregistered path '%s'", path_unregistered);
+
+	/* Test iterators. */
+	struct json_token *token = NULL, *tmp;
+	const struct json_token *tokens_preorder[] =
+		{&records[0].node, &records[1].node, &records[2].node,
+		 &records[3].node, &records[4].node, &records[5].node};
+	int cnt = sizeof(tokens_preorder) / sizeof(tokens_preorder[0]);
+	int idx = 0;
+
+	json_tree_foreach_preorder(&tree.root, token) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(token, struct test_struct, node);
+		struct test_struct *t2 =
+			json_tree_entry(tokens_preorder[idx],
+					struct test_struct, node);
+		is(token, tokens_preorder[idx],
+		   "test foreach pre order %d: have %d expected of %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	const struct json_token *tree_nodes_postorder[] =
+		{&records[1].node, &records[4].node, &records[5].node,
+		 &records[3].node, &records[2].node, &records[0].node};
+	cnt = sizeof(tree_nodes_postorder) / sizeof(tree_nodes_postorder[0]);
+	idx = 0;
+	json_tree_foreach_postorder(&tree.root, token) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(token, struct test_struct, node);
+		struct test_struct *t2 =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, node);
+		is(token, tree_nodes_postorder[idx],
+		   "test foreach post order %d: have %d expected of %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+ idx = 0;
+ json_tree_foreach_safe(&tree.root, token, tmp) {
+ if (idx >= cnt)
+ break;
+ struct test_struct *t1 =
+ json_tree_entry(token, struct test_struct, node);
+ struct test_struct *t2 =
+ json_tree_entry(tree_nodes_postorder[idx],
+ struct test_struct, node);
+ is(token, tree_nodes_postorder[idx],
+ "test foreach safe order %d: have %d expected of %d",
+ idx, t1->value, t2->value);
+ ++idx;
+ }
+ is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+ idx = 0;
+ json_tree_foreach_entry_preorder(&tree.root, node, struct test_struct,
+ node) {
+ if (idx >= cnt)
+ break;
+ struct test_struct *t =
+ json_tree_entry(tokens_preorder[idx],
+ struct test_struct, node);
+ is(&node->node, tokens_preorder[idx],
+ "test foreach entry pre order %d: have %d expected of %d",
+ idx, node->value, t->value);
+ idx++;
+ }
+ is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+ idx = 0;
+ json_tree_foreach_entry_postorder(&tree.root, node, struct test_struct,
+ node) {
+ if (idx >= cnt)
+ break;
+ struct test_struct *t =
+ json_tree_entry(tree_nodes_postorder[idx],
+ struct test_struct, node);
+ is(&node->node, tree_nodes_postorder[idx],
+ "test foreach entry post order %d: have %d expected of %d",
+ idx, node->value, t->value);
+ idx++;
+ }
+ is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+ idx = 0;
+ json_tree_foreach_entry_safe(&tree.root, node, struct test_struct,
+ node, node_tmp) {
+ if (idx >= cnt)
+ break;
+ struct test_struct *t =
+ json_tree_entry(tree_nodes_postorder[idx],
+ struct test_struct, node);
+ is(&node->node, tree_nodes_postorder[idx],
+ "test foreach entry safe order %d: have %d expected of %d",
+ idx, node->value, t->value);
+ json_tree_del(&tree, &node->node);
+ idx++;
+ }
+ is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+ json_tree_destroy(&tree);
+
+ check_plan();
+ footer();
+}
+
int
main()
{
header();
- plan(2);
+ plan(3);

test_basic();
test_errors();
+ test_tree();

int rc = check_plan();
footer();
diff --git a/test/unit/json_path.result b/test/unit/json_path.result
index a2a2f829f..ea5fbd317 100644
--- a/test/unit/json_path.result
+++ b/test/unit/json_path.result
@@ -1,5 +1,5 @@
*** main ***
-1..2
+1..3
*** test_basic ***
1..71
ok 1 - parse <[0]>
@@ -99,4 +99,58 @@ ok 1 - subtests
ok 20 - tab inside identifier
ok 2 - subtests
*** test_errors: done ***
+ *** test_tree ***
+ 1..50
+ ok 1 - add path '[1][10]'
+ ok 2 - add path '[1][20].file'
+ ok 3 - add path '[1][20].file[2]'
+ ok 4 - add path '[1][20].file[4]'
+ ok 5 - add path '[1][20]["file"][4]'
+ ok 6 - lookup path '[1][10]'
+ ok 7 - lookup path '[1][20].file'
+ ok 8 - lookup unregistered path '[1][3]'
+ ok 9 - test foreach pre order 0: have 0 expected of 0
+ ok 10 - test foreach pre order 1: have 1 expected of 1
+ ok 11 - test foreach pre order 2: have 2 expected of 2
+ ok 12 - test foreach pre order 3: have 3 expected of 3
+ ok 13 - test foreach pre order 4: have 4 expected of 4
+ ok 14 - test foreach pre order 5: have 5 expected of 5
+ ok 15 - records iterated count 6 of 6
+ ok 16 - test foreach post order 0: have 1 expected of 1
+ ok 17 - test foreach post order 1: have 4 expected of 4
+ ok 18 - test foreach post order 2: have 5 expected of 5
+ ok 19 - test foreach post order 3: have 3 expected of 3
+ ok 20 - test foreach post order 4: have 2 expected of 2
+ ok 21 - test foreach post order 5: have 0 expected of 0
+ ok 22 - records iterated count 6 of 6
+ ok 23 - test foreach safe order 0: have 1 expected of 1
+ ok 24 - test foreach safe order 1: have 4 expected of 4
+ ok 25 - test foreach safe order 2: have 5 expected of 5
+ ok 26 - test foreach safe order 3: have 3 expected of 3
+ ok 27 - test foreach safe order 4: have 2 expected of 2
+ ok 28 - test foreach safe order 5: have 0 expected of 0
+ ok 29 - records iterated count 6 of 6
+ ok 30 - test foreach entry pre order 0: have 0 expected of 0
+ ok 31 - test foreach entry pre order 1: have 1 expected of 1
+ ok 32 - test foreach entry pre order 2: have 2 expected of 2
+ ok 33 - test foreach entry pre order 3: have 3 expected of 3
+ ok 34 - test foreach entry pre order 4: have 4 expected of 4
+ ok 35 - test foreach entry pre order 5: have 5 expected of 5
+ ok 36 - records iterated count 6 of 6
+ ok 37 - test foreach entry post order 0: have 1 expected of 1
+ ok 38 - test foreach entry post order 1: have 4 expected of 4
+ ok 39 - test foreach entry post order 2: have 5 expected of 5
+ ok 40 - test foreach entry post order 3: have 3 expected of 3
+ ok 41 - test foreach entry post order 4: have 2 expected of 2
+ ok 42 - test foreach entry post order 5: have 0 expected of 0
+ ok 43 - records iterated count 6 of 6
+ ok 44 - test foreach entry safe order 0: have 1 expected of 1
+ ok 45 - test foreach entry safe order 1: have 4 expected of 4
+ ok 46 - test foreach entry safe order 2: have 5 expected of 5
+ ok 47 - test foreach entry safe order 3: have 3 expected of 3
+ ok 48 - test foreach entry safe order 4: have 2 expected of 2
+ ok 49 - test foreach entry safe order 5: have 0 expected of 0
+ ok 50 - records iterated count 6 of 6
+ok 3 - subtests
+ *** test_tree: done ***
*** main: done ***
-- 
2.19.2

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v5 8/9] box: introduce offset slot cache in key_part
  2018-12-03 21:04   ` Vladimir Davydov
@ 2018-12-04 15:51     ` Vladimir Davydov
  0 siblings, 0 replies; 41+ messages in thread
From: Vladimir Davydov @ 2018-12-04 15:51 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja

On Tue, Dec 04, 2018 at 12:04:18AM +0300, Vladimir Davydov wrote:
> On Mon, Nov 26, 2018 at 01:49:42PM +0300, Kirill Shcherbatov wrote:
> > diff --git a/src/box/alter.cc b/src/box/alter.cc
> > index 029da02..6291159 100644
> > --- a/src/box/alter.cc
> > +++ b/src/box/alter.cc
> > @@ -856,7 +856,10 @@ alter_space_do(struct txn *txn, struct alter_space *alter)
> >  	 * Create a new (empty) space for the new definition.
> >  	 * Sic: the triggers are not moved over yet.
> >  	 */
> > -	alter->new_space = space_new_xc(alter->space_def, &alter->key_list);
> > +	alter->new_space =
> > +		space_new_xc(alter->space_def, &alter->key_list,
> > +			     alter->old_space->format != NULL ?
> > +			     alter->old_space->format->epoch + 1 : 1);
> 
> Passing the epoch around looks ugly. Let's introduce a global counter
> and bump it right in tuple_format_new(). If you worry about disk_format
> and mem_format having different epochs in vinyl, it's not really a
> problem: make epoch an optional argument to tuple_format_new(); if the
> caller passes an epoch, use it, otherwise generate a new one; in vinyl
> reuse mem_format->epoch for disk_format.
> 
> > diff --git a/src/box/key_def.h b/src/box/key_def.h
> > index 7731e48..3e08eb4 100644
> > --- a/src/box/key_def.h
> > +++ b/src/box/key_def.h
> > @@ -95,6 +95,14 @@ struct key_part {
> >  	char *path;
> >  	/** The length of JSON path. */
> >  	uint32_t path_len;
> > +	/**
> > +	 * Source format for offset_slot_cache hit validations.
> > +	 * Cache is expected to use "the format with the newest
> > +	 * epoch is most relevant" strategy.
> > +	 */
> > +	struct tuple_format *format_cache;
> 
> Why did you decide to store tuple_format in key_part instead of epoch,
> as you did originally? I liked that more.

Discussed verbally, agreed that:
 - Storing a pointer to the format in key_part is potentially dangerous,
   because pointers may coincide after reallocation, resulting in
   invalid behavior.
 - We should make the epoch counter global rather than passing it around
   on space creation. That is, tuple_format_new() will bump a global
   int64_t counter and assign the new value to the new format.
 - In vinyl disk_format and mem_format must have different epochs,
   because they have different field maps, so no optional argument
   is required for tuple_format_new() - it will always assign a
   new epoch to the new format.
 - 'Epoch' isn't a very good name. We should probably call it 'version'
   or 'format_version' or 'tuple_format_version', depending on the
   context.
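The scheme agreed on above can be sketched roughly like this (the struct and
function names are illustrative stand-ins, not the actual Tarantool API):

```c
#include <stdint.h>

/*
 * Illustrative sketch only: a process-global, monotonically
 * growing version counter bumped by every format constructor,
 * so no version has to be passed around on space creation.
 * Names here are hypothetical.
 */
static int64_t format_version_counter;

struct tuple_format_stub {
	int64_t version;
};

static void
tuple_format_stub_new(struct tuple_format_stub *format)
{
	/* Every new format gets a strictly newer version. */
	format->version = ++format_version_counter;
}
```

With a global counter, mem_format and disk_format in vinyl automatically get
distinct versions, which matches the agreement above.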

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v5 3/9] box: manage format fields with JSON tree class
  2018-12-04 15:47     ` [tarantool-patches] " Kirill Shcherbatov
@ 2018-12-04 16:09       ` Vladimir Davydov
  2018-12-04 16:32         ` Kirill Shcherbatov
                           ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Vladimir Davydov @ 2018-12-04 16:09 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, Kostya Osipov

On Tue, Dec 04, 2018 at 06:47:16PM +0300, Kirill Shcherbatov wrote:
> As we are going to work with format fields in a unified way, we
> start using the JSON tree class for managing first-level
> format fields.
> 
> Need for #1012
> ---
>  src/box/sql.c          |  16 +++---
>  src/box/sql/build.c    |   5 +-
>  src/box/tuple.c        |  10 ++--
>  src/box/tuple_format.c | 125 +++++++++++++++++++++++++++++------------
>  src/box/tuple_format.h |  49 +++++++++++++---
>  src/box/vy_stmt.c      |   4 +-
>  6 files changed, 150 insertions(+), 59 deletions(-)

This patch doesn't compile on my laptop. Please fix.

In file included from /home/vlad/src/tarantool/src/box/tuple.h:38:0,
                 from /home/vlad/src/tarantool/src/box/tuple_compare.h:38,
                 from /home/vlad/src/tarantool/src/box/tuple_compare.cc:31:
/home/vlad/src/tarantool/src/box/tuple_format.h: In function ‘tuple_field* tuple_format_field(tuple_format*, uint32_t)’:
/home/vlad/src/tarantool/src/box/tuple_format.h:207:2: sorry, unimplemented: non-trivial designated initializers not supported
  };
  ^
/home/vlad/src/tarantool/src/box/tuple_format.h:207:2: error: missing initializer for member ‘json_token::hash’ [-Werror=missing-field-initializers]
/home/vlad/src/tarantool/src/box/tuple_format.h:207:2: error: missing initializer for member ‘json_token::children’ [-Werror=missing-field-initializers]
/home/vlad/src/tarantool/src/box/tuple_format.h:207:2: error: missing initializer for member ‘json_token::child_count_max’ [-Werror=missing-field-initializers]
/home/vlad/src/tarantool/src/box/tuple_format.h:207:2: error: missing initializer for member ‘json_token::child_count’ [-Werror=missing-field-initializers]
/home/vlad/src/tarantool/src/box/tuple_format.h:207:2: error: missing initializer for member ‘json_token::sibling_idx’ [-Werror=missing-field-initializers]
/home/vlad/src/tarantool/src/box/tuple_format.h:207:2: error: missing initializer for member ‘json_token::parent’ [-Werror=missing-field-initializers]
At global scope:
cc1plus: error: unrecognized command line option ‘-Wno-cast-function-type’ [-Werror]
cc1plus: error: unrecognized command line option ‘-Wno-format-truncation’ [-Werror]
cc1plus: all warnings being treated as errors

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v5 3/9] box: manage format fields with JSON tree class
  2018-12-04 16:09       ` Vladimir Davydov
@ 2018-12-04 16:32         ` Kirill Shcherbatov
  2018-12-05  8:37         ` Kirill Shcherbatov
  2018-12-06  7:56         ` Kirill Shcherbatov
  2 siblings, 0 replies; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-12-04 16:32 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches, Kostya Osipov

> This patch doesn't compile on my laptop. Please fix.
> 
> In file included from /home/vlad/src/tarantool/src/box/tuple.h:38:0,
>                  from /home/vlad/src/tarantool/src/box/tuple_compare.h:38,
>                  from /home/vlad/src/tarantool/src/box/tuple_compare.cc:31:
> /home/vlad/src/tarantool/src/box/tuple_format.h: In function ‘tuple_field* tuple_format_field(tuple_format*, uint32_t)’:
> /home/vlad/src/tarantool/src/box/tuple_format.h:207:2: sorry, unimplemented: non-trivial designated initializers not supported
>   };
>   ^
> /home/vlad/src/tarantool/src/box/tuple_format.h:207:2: error: missing initializer for member ‘json_token::hash’ [-Werror=missing-field-initializers]
> /home/vlad/src/tarantool/src/box/tuple_format.h:207:2: error: missing initializer for member ‘json_token::children’ [-Werror=missing-field-initializers]
> /home/vlad/src/tarantool/src/box/tuple_format.h:207:2: error: missing initializer for member ‘json_token::child_count_max’ [-Werror=missing-field-initializers]
> /home/vlad/src/tarantool/src/box/tuple_format.h:207:2: error: missing initializer for member ‘json_token::child_count’ [-Werror=missing-field-initializers]
> /home/vlad/src/tarantool/src/box/tuple_format.h:207:2: error: missing initializer for member ‘json_token::sibling_idx’ [-Werror=missing-field-initializers]
> /home/vlad/src/tarantool/src/box/tuple_format.h:207:2: error: missing initializer for member ‘json_token::parent’ [-Werror=missing-field-initializers]
> At global scope:
> cc1plus: error: unrecognized command line option ‘-Wno-cast-function-type’ [-Werror]
> cc1plus: error: unrecognized command line option ‘-Wno-format-truncation’ [-Werror]
> cc1plus: all warnings being treated as errors

@@ -201,10 +201,9 @@ static inline struct tuple_field *
 tuple_format_field(struct tuple_format *format, uint32_t fieldno)
 {
        assert(fieldno < tuple_format_field_count(format));
-       struct json_token token = {
-               .type = JSON_TOKEN_NUM,
-               .num = fieldno + TUPLE_INDEX_BASE
-       };
+       struct json_token token;
+       token.type = JSON_TOKEN_NUM;
+       token.num = fieldno + TUPLE_INDEX_BASE;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v5 2/9] lib: implement JSON tree class for json library
  2018-12-04 15:47       ` [tarantool-patches] " Kirill Shcherbatov
@ 2018-12-04 17:54         ` Vladimir Davydov
  2018-12-05  8:37           ` Kirill Shcherbatov
  0 siblings, 1 reply; 41+ messages in thread
From: Vladimir Davydov @ 2018-12-04 17:54 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, Kostya Osipov

On Tue, Dec 04, 2018 at 06:47:27PM +0300, Kirill Shcherbatov wrote:
> >> +	uint32_t rolling_hash;
> > 
> > Let's call it simply 'hash', short and clear. The rolling nature of the
> > hash should be explained in the comment.
> Ok, done
> 
> > typo: indexe -> index
> > 
> > BTW, json array start indexing from 0, not 1 AFAIK. Starting indexing
> > from 1 looks weird to me.

You left this comment from my previous review unattended.

> > 
> >> +	 * and are allocated sequently for JSON_TOKEN_NUM child
> > 
> > typo: sequently -> sequentially
> Ok, done.

See below for my comments to the new version of the patch.

> From c4e0001ecfd0987fffa2ef5f747ef6f3c016dae7 Mon Sep 17 00:00:00 2001
> From: Kirill Shcherbatov <kshcherbatov@tarantool.org>
> Date: Mon, 1 Oct 2018 15:10:19 +0300
> Subject: [PATCH] lib: implement JSON tree class for json library
> 
> The new JSON tree class stores JSON paths for tuple fields of
> registered non-plain indexes. It is a hierarchical data
> structure that organizes JSON nodes produced by the parser.
> The class provides an API to look up a node by path and to
> iterate over the tree.
> The JSON indexes patch requires such functionality to look up
> tuple_fields by path, initialize the field map and build
> vinyl statement msgpack for a secondary index via JSON tree
> iteration.
> 
> Need for #1012

As I've already told you, should be

Needed for #1012

> diff --git a/src/lib/json/json.c b/src/lib/json/json.c
> index eb80e4bb..58a842ef 100644
> --- a/src/lib/json/json.c
> +++ b/src/lib/json/json.c
> +static void
> +json_token_destroy(struct json_token *token)
> +{
> +	/* Token mustn't have JSON subtree. */
> +	#ifndef NDEBUG

#ifndef/endif shouldn't be indented.

> +	struct json_token *iter;
> +	uint32_t nodes = 0;
> +	json_tree_foreach_preorder(token, iter)
> +		nodes++;
> +	assert(nodes == 0);
> +	#endif /* NDEBUG */

I'd prefer to change this to something simpler, like

	assert(token->child_count == 0);

but now I realize that child_count isn't actually the number of
children, as I thought, but the max id of any child that ever existed.
This is confusing. We need to do something about it.

What about?

	/**
	 * Allocation size of the children array.
	 */
	int children_capacity;
	/**
	 * Max occupied index in the children array.
	 */
	int max_child_idx;

and update max_child_idx on json_tree_del() as well
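A minimal sketch of that delete-side bookkeeping (hypothetical stub types,
not the patch code): when a child slot is cleared, scan down from the removed
index to the closest occupied slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy node with a fixed-capacity children array, for illustration. */
#define MAX_CHILDREN 8

struct node_stub {
	void *children[MAX_CHILDREN];
	uint32_t max_child_idx;
};

static void
node_stub_del(struct node_stub *node, uint32_t idx)
{
	node->children[idx] = NULL;
	/* Only the removal of the rightmost child can lower the max. */
	if (idx == node->max_child_idx && idx > 0) {
		while (idx > 0 && node->children[idx] == NULL)
			idx--;
		node->max_child_idx = idx;
	}
}
```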

> +
> +	free(token->children);
> +}
> +
> +void
> +json_tree_destroy(struct json_tree *tree)
> +{
> +	/* Tree must be empty. */
> +	#ifndef NDEBUG
> +	struct json_token *iter;
> +	uint32_t nodes = 0;
> +	json_tree_foreach_preorder(&tree->root, iter)
> +		nodes++;
> +	assert(nodes == 0);
> +	#endif /* NDEBUG */

This check is pointless as the same check is done by json_token_destroy
called right below.

> +
> +	json_token_destroy(&tree->root);
> +	mh_json_delete(tree->hash);
> +}
> +
> +struct json_token *
> +json_tree_lookup_slowpath(struct json_tree *tree, struct json_token *parent,
> +			  const struct json_token *token)
> +{
> +	assert(parent != NULL);

This particular assertion is pointless. You could as well add

	assert(tree != NULL);
	assert(token != NULL);

but why? Such assertions wouldn't enlighten the reader while the program
would crash anyway while trying to dereference NULL. An assertion should
either ensure some non-trivial condition, to prevent the program from
running any further and increasing the mess, or tip the reader what's
going on here.

> +	if (likely(token->type == JSON_TOKEN_STR)) {
> +		struct json_token key, *key_ptr;
> +		key.type = token->type;
> +		key.str = token->str;
> +		key.len = token->len;
> +		key.parent = parent;
> +		key.hash = json_token_hash(&key);
> +		key_ptr = &key;
> +		mh_int_t id = mh_json_find(tree->hash, &key_ptr, NULL);

You pass token** to mh_json_find instead of token*. I haven't noticed
that before, but turns out that

> +#define mh_key_t struct json_token **

This looks weird. Why not

 #define mh_key_t struct json_token *

?

> +		if (id == mh_end(tree->hash))
> +			return NULL;
> +		struct json_token **entry = mh_json_node(tree->hash, id);
> +		assert(entry == NULL || (*entry)->parent == parent);
> +		return entry != NULL ? *entry : NULL;

AFAIU entry can't be NULL here.

> +	} else if (token->type == JSON_TOKEN_NUM) {
> +		uint32_t idx =  token->num - 1;
> +		return likely(idx < parent->child_count) ?
> +		       parent->children[idx] : NULL;
> +	}

What's the point to handle JSON_TOKEN_NUM here? Nobody is supposed to
call json_tree_lookup_slowpath() directly. Everyone should use
json_tree_lookup() instead.

Please change to an assertion ensuring that token->type is NUM and add
a comment to json_tree_lookup_slowpath() saying that it's an internal
function that shouldn't be used directly.

> diff --git a/src/lib/json/json.h b/src/lib/json/json.h
> index ead44687..948fcdb7 100644
> --- a/src/lib/json/json.h
> +++ b/src/lib/json/json.h

> +/**
> + * Make child lookup in JSON tree by token at position specified
> + * with parent.
> + */
> +struct json_token *
> +json_tree_lookup_slowpath(struct json_tree *tree, struct json_token *parent,
> +			  const struct json_token *token);

The comment to this function could be as short as:

/**
 * Internal function, use json_tree_lookup instead.
 */

> +
> +/**
> + * Make child lookup in JSON tree by token at position specified

They don't usually say "make lookup". It's "do lookup" or, even better,
simply "look up a token in a tree". "Make" is more like "build" or
"construct".

> + * with parent without function call in the best-case. */

Comment style.

> +static inline struct json_token *
> +json_tree_lookup(struct json_tree *tree, struct json_token *parent,
> +		 const struct json_token *token)
> +{
> +	struct json_token *ret = NULL;
> +	if (token->type == JSON_TOKEN_NUM) {
> +		uint32_t idx =  token->num - 1;
> +		ret = likely(idx < parent->child_count_max) ?
> +		      parent->children[idx] : NULL;
> +	} else {
> +		ret = json_tree_lookup_slowpath(tree, parent, token);
> +	}
> +	return ret;
> +}

> +/**
> + * Make secure post-order traversal in JSON tree and return entry.
> + * This cycle doesn't visit root node.
> + */
> +#define json_tree_foreach_entry_safe(root, node, type, member, tmp)	     \
> +	for ((node) = json_tree_postorder_next_entry((root), NULL,	     \
> +						     type, member);	     \
> +	     &(node)->member != (root) &&				     \
> +	     ((tmp) =  json_tree_postorder_next_entry((root),		     \

Extra space.

> +	     					      &(node)->member,	     \
> +	     					      type, member));	     \

Mixed tabs and spaces. There are more things like that in this patch.
Please carefully self-review your patch next time to make sure it's
neatly formatted.

> +	     (node) = (tmp))
> +
>  #ifdef __cplusplus
>  }
>  #endif

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v5 2/9] lib: implement JSON tree class for json library
  2018-12-04 17:54         ` Vladimir Davydov
@ 2018-12-05  8:37           ` Kirill Shcherbatov
  2018-12-05  9:07             ` Vladimir Davydov
  0 siblings, 1 reply; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-12-05  8:37 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches, Kostya Osipov

Hi! Thank you for review.

>>> BTW, json array start indexing from 0, not 1 AFAIK. Starting indexing
>>> from 1 looks weird to me.
> 
> You left this comment from my previous review unattended.

In fact, it is not so; we use [token.num - 1] to retrieve field.
Let's better describe it in comment:
	/**
	 * Array of child records. Indexes in this array
	 * match the [token.num - 1] index for JSON_TOKEN_NUM type
	 * and are allocated sequentially for JSON_TOKEN_STR child
	 * tokens.
	 */
	struct json_token **children;
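For illustration, the convention can be shown with a toy lookup (hypothetical
names, fixed capacity for brevity): a numeric token [N] is stored at
children[N - 1], so the lookup is a bounds-checked array access.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy token, not the library struct. */
#define CAPACITY 8

struct toy_token {
	int num;	/* 1-based index for numeric tokens. */
	struct toy_token *children[CAPACITY];
};

static struct toy_token *
toy_lookup_num(struct toy_token *parent, int num)
{
	/* [N] lives in slot N - 1; anything out of range misses. */
	uint32_t idx = (uint32_t)num - 1;
	return idx < CAPACITY ? parent->children[idx] : NULL;
}
```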

> As I've already told you, should be
> Needed for #1012

> #ifndef/endif shouldn't be indented.

> I'd prefer to change this to something simpler, like
> 
> 	assert(token->child_count == 0);
> 
> but now I realize that child_count isn't actually the number of
> children, as I thought, but the max id of ever existed child.
> This is confusing. We need to do something about it.
> 
> What about?
> 
> 	/**
> 	 * Allocation size of the children array.
> 	 */
> 	int children_capacity;
> 	/**
> 	 * Max occupied index in the children array.
> 	 */
> 	int max_child_idx;
> 
> and update max_child_idx on json_tree_del() as well

> You pass token** to mh_json_find instead of token*. I haven't noticed
> that before, but turns out that
> 
>> +#define mh_key_t struct json_token **
> 
> This looks weird. Why not
> 
>  #define mh_key_t struct json_token *

Ok

>> +		return entry != NULL ? *entry : NULL;
> 
> AFAIU entry can't be NULL here.

		assert(entry != NULL);
		return *entry;

>> +#define json_tree_foreach_entry_safe(root, node, type, member, tmp)	     \
>> +	for ((node) = json_tree_postorder_next_entry((root), NULL,	     \
>> +						     type, member);	     \
>> +	     &(node)->member != (root) &&				     \
>> +	     ((tmp) =  json_tree_postorder_next_entry((root),		     \
> 
> Extra space.

Ok

===============================================

The new JSON tree class stores JSON paths for tuple fields of
registered non-plain indexes. It is a hierarchical data
structure that organizes JSON nodes produced by the parser.
The class provides an API to look up a node by path and to
iterate over the tree.
The JSON indexes patch requires such functionality to look up
tuple_fields by path, initialize the field map and build
vinyl statement msgpack for a secondary index via JSON tree
iteration.

Needed for #1012
---
 src/lib/json/CMakeLists.txt |   1 +
 src/lib/json/json.c         | 257 ++++++++++++++++++++++++++++++++
 src/lib/json/json.h         | 287 +++++++++++++++++++++++++++++++++++-
 test/unit/json_path.c       | 237 ++++++++++++++++++++++++++++-
 test/unit/json_path.result  |  60 +++++++-
 5 files changed, 839 insertions(+), 3 deletions(-)

diff --git a/src/lib/json/CMakeLists.txt b/src/lib/json/CMakeLists.txt
index 0f0739620..51a1f027a 100644
--- a/src/lib/json/CMakeLists.txt
+++ b/src/lib/json/CMakeLists.txt
@@ -4,3 +4,4 @@ set(lib_sources
 
 set_source_files_compile_flags(${lib_sources})
 add_library(json_path STATIC ${lib_sources})
+target_link_libraries(json_path misc)
diff --git a/src/lib/json/json.c b/src/lib/json/json.c
index eb80e4bbc..f809d79c7 100644
--- a/src/lib/json/json.c
+++ b/src/lib/json/json.c
@@ -30,6 +30,7 @@
  */
 
 #include "json.h"
+#include "third_party/PMurHash.h"
 #include <ctype.h>
 #include <stdbool.h>
 #include <unicode/uchar.h>
@@ -241,3 +242,259 @@ json_lexer_next_token(struct json_lexer *lexer, struct json_token *token)
 		return json_parse_identifier(lexer, token);
 	}
 }
+
+/** Compare JSON tokens keys. */
+static int
+json_token_cmp(const struct json_token *a, const struct json_token *b)
+{
+	if (a->parent != b->parent)
+		return a->parent - b->parent;
+	if (a->type != b->type)
+		return a->type - b->type;
+	int ret = 0;
+	if (a->type == JSON_TOKEN_STR) {
+		if (a->len != b->len)
+			return a->len - b->len;
+		ret = memcmp(a->str, b->str, a->len);
+	} else if (a->type == JSON_TOKEN_NUM) {
+		ret = a->num - b->num;
+	} else {
+		unreachable();
+	}
+	return ret;
+}
+
+#define MH_SOURCE 1
+#define mh_name _json
+#define mh_key_t struct json_token *
+#define mh_node_t struct json_token *
+#define mh_arg_t void *
+#define mh_hash(a, arg) ((*(a))->hash)
+#define mh_hash_key(a, arg) ((a)->hash)
+#define mh_cmp(a, b, arg) (json_token_cmp(*(a), *(b)))
+#define mh_cmp_key(a, b, arg) (json_token_cmp((a), *(b)))
+#include "salad/mhash.h"
+
+static const uint32_t hash_seed = 13U;
+
+/** Compute the hash value of a JSON token. */
+static uint32_t
+json_token_hash(struct json_token *token)
+{
+	uint32_t h = token->parent->hash;
+	uint32_t carry = 0;
+	const void *data;
+	uint32_t data_size;
+	if (token->type == JSON_TOKEN_STR) {
+		data = token->str;
+		data_size = token->len;
+	} else if (token->type == JSON_TOKEN_NUM) {
+		data = &token->num;
+		data_size = sizeof(token->num);
+	} else {
+		unreachable();
+	}
+	PMurHash32_Process(&h, &carry, data, data_size);
+	return PMurHash32_Result(h, carry, data_size);
+}
+
+int
+json_tree_create(struct json_tree *tree)
+{
+	memset(tree, 0, sizeof(struct json_tree));
+	tree->root.hash = hash_seed;
+	tree->root.type = JSON_TOKEN_END;
+	tree->hash = mh_json_new();
+	return tree->hash == NULL ? -1 : 0;
+}
+
+static void
+json_token_destroy(struct json_token *token)
+{
+	/* Token mustn't have JSON subtree. */
+#ifndef NDEBUG
+	struct json_token *iter;
+	uint32_t nodes = 0;
+	json_tree_foreach_preorder(token, iter)
+		nodes++;
+	assert(nodes == 0);
+#endif /* NDEBUG */
+
+	free(token->children);
+}
+
+void
+json_tree_destroy(struct json_tree *tree)
+{
+	json_token_destroy(&tree->root);
+	mh_json_delete(tree->hash);
+}
+
+struct json_token *
+json_tree_lookup_slowpath(struct json_tree *tree, struct json_token *parent,
+			  const struct json_token *token)
+{
+	assert(token->type == JSON_TOKEN_STR);
+	struct json_token key;
+	key.type = token->type;
+	key.str = token->str;
+	key.len = token->len;
+	key.parent = parent;
+	key.hash = json_token_hash(&key);
+	mh_int_t id = mh_json_find(tree->hash, &key, NULL);
+	if (id == mh_end(tree->hash))
+		return NULL;
+	struct json_token **entry = mh_json_node(tree->hash, id);
+	assert(entry != NULL);
+	return *entry;
+}
+
+int
+json_tree_add(struct json_tree *tree, struct json_token *parent,
+	      struct json_token *token)
+{
+	int ret = 0;
+	token->parent = parent;
+	uint32_t hash = json_token_hash(token);
+	assert(json_tree_lookup(tree, parent, token) == NULL);
+	uint32_t insert_idx = token->type == JSON_TOKEN_NUM ?
+			      (uint32_t)token->num - 1 :
+			      parent->children_capacity;
+	if (insert_idx >= parent->children_capacity) {
+		uint32_t new_size = parent->children_capacity == 0 ?
+				    8 : 2 * parent->children_capacity;
+		while (insert_idx >= new_size)
+			new_size *= 2;
+		struct json_token **children =
+			realloc(parent->children, new_size * sizeof(void *));
+		if (children == NULL) {
+			ret = -1;
+			goto end;
+		}
+		memset(children + parent->children_capacity, 0,
+		       (new_size - parent->children_capacity) * sizeof(void *));
+		parent->children = children;
+		parent->children_capacity = new_size;
+	}
+	assert(parent->children[insert_idx] == NULL);
+	parent->children[insert_idx] = token;
+	parent->max_child_idx = MAX(parent->max_child_idx, insert_idx);
+	token->sibling_idx = insert_idx;
+	token->hash = hash;
+	token->parent = parent;
+	if (token->type != JSON_TOKEN_STR)
+		goto end;
+	/*
+	 * Add string token to json_tree hash to make lookup
+	 * by name.
+	 */
+	mh_int_t rc =
+		mh_json_put(tree->hash, (const struct json_token **)&token,
+			    NULL, NULL);
+	if (rc == mh_end(tree->hash)) {
+		parent->children[insert_idx] = NULL;
+		ret = -1;
+		goto end;
+	}
+end:
+	assert(json_tree_lookup(tree, parent, token) == token);
+	return ret;
+}
+
+void
+json_tree_del(struct json_tree *tree, struct json_token *token)
+{
+	struct json_token *parent = token->parent;
+	assert(json_tree_lookup(tree, parent, token) == token);
+	struct json_token **child_slot = &parent->children[token->sibling_idx];
+	assert(child_slot != NULL && *child_slot == token);
+	*child_slot = NULL;
+	/* The max_child_idx field may require update. */
+	if (token->sibling_idx == parent->max_child_idx &&
+	    parent->max_child_idx > 0) {
+		uint32_t idx = token->sibling_idx - 1;
+		while (idx > 0 && parent->children[idx] == NULL)
+			idx--;
+		parent->max_child_idx = idx;
+	}
+	if (token->type != JSON_TOKEN_STR)
+		goto end;
+	/* Remove string token from json_tree hash. */
+	mh_int_t id = mh_json_find(tree->hash, token, NULL);
+	assert(id != mh_end(tree->hash));
+	mh_json_del(tree->hash, id, NULL);
+	json_token_destroy(token);
+end:
+	assert(json_tree_lookup(tree, parent, token) == NULL);
+}
+
+struct json_token *
+json_tree_lookup_path(struct json_tree *tree, struct json_token *parent,
+		      const char *path, uint32_t path_len)
+{
+	int rc;
+	struct json_lexer lexer;
+	struct json_token token;
+	struct json_token *ret = parent != NULL ? parent : &tree->root;
+	json_lexer_create(&lexer, path, path_len);
+	while ((rc = json_lexer_next_token(&lexer, &token)) == 0 &&
+	       token.type != JSON_TOKEN_END && ret != NULL) {
+		ret = json_tree_lookup(tree, ret, &token);
+	}
+	if (rc != 0 || token.type != JSON_TOKEN_END)
+		return NULL;
+	return ret;
+}
+
+static struct json_token *
+json_tree_child_next(struct json_token *parent, struct json_token *pos)
+{
+	assert(pos == NULL || pos->parent == parent);
+	struct json_token **arr = parent->children;
+	if (arr == NULL)
+		return NULL;
+	uint32_t idx = pos != NULL ? pos->sibling_idx + 1 : 0;
+	while (idx <= parent->max_child_idx && arr[idx] == NULL)
+		idx++;
+	return idx <= parent->max_child_idx ? arr[idx] : NULL;
+}
+
+static struct json_token *
+json_tree_leftmost(struct json_token *pos)
+{
+	struct json_token *last;
+	do {
+		last = pos;
+		pos = json_tree_child_next(pos, NULL);
+	} while (pos != NULL);
+	return last;
+}
+
+struct json_token *
+json_tree_preorder_next(struct json_token *root, struct json_token *pos)
+{
+	struct json_token *next = json_tree_child_next(pos, NULL);
+	if (next != NULL)
+		return next;
+	while (pos != root) {
+		next = json_tree_child_next(pos->parent, pos);
+		if (next != NULL)
+			return next;
+		pos = pos->parent;
+	}
+	return NULL;
+}
+
+struct json_token *
+json_tree_postorder_next(struct json_token *root, struct json_token *pos)
+{
+	struct json_token *next;
+	if (pos == NULL)
+		return json_tree_leftmost(root);
+	if (pos == root)
+		return NULL;
+	next = json_tree_child_next(pos->parent, pos);
+	if (next != NULL)
+		return json_tree_leftmost(next);
+	return pos->parent;
+}
diff --git a/src/lib/json/json.h b/src/lib/json/json.h
index ead446878..43cee65d4 100644
--- a/src/lib/json/json.h
+++ b/src/lib/json/json.h
@@ -30,7 +30,7 @@
  * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
-#include <stdint.h>
+#include "trivia/util.h"
 
 #ifdef __cplusplus
 extern "C" {
@@ -62,6 +62,10 @@ enum json_token_type {
  * Element of a JSON path. It can be either string or number.
  * String identifiers are in ["..."] and between dots. Numbers are
  * indexes in [...].
+ *
+ * May be organized in a tree-like structure reflecting a JSON
+ * document structure, for more details see the comment to struct
+ * json_tree.
  */
 struct json_token {
 	enum json_token_type type;
@@ -75,6 +79,111 @@ struct json_token {
 		/** Index value. */
 		uint64_t num;
 	};
+	/**
+	 * Hash value of the token. Used for lookups in a JSON tree.
+	 * For more details, see the comment to json_tree::hash.
+	 */
+	uint32_t hash;
+	/**
+	 * Array of child records. Indexes in this array
+	 * match [token.num-1] index for JSON_TOKEN_NUM  type
+	 * and are allocated sequentially for JSON_TOKEN_STR child
+	 * tokens.
+	 */
+	struct json_token **children;
+	/** Allocation size of children array. */
+	uint32_t children_capacity;
+	/** Max occupied index in the children array. */
+	uint32_t max_child_idx;
+	/** Index of node in parent children array. */
+	uint32_t sibling_idx;
+	/** Pointer to parent node. */
+	struct json_token *parent;
+};
+
+struct mh_json_t;
+
+/**
+ * This structure is used for organizing JSON tokens produced
+ * by a lexer in a tree-like structure reflecting a JSON document
+ * structure.
+ *
+ * Each intermediate node of the tree corresponds to either
+ * a JSON map or an array, depending on the key type used by
+ * its children (JSON_TOKEN_STR or JSON_TOKEN_NUM, respectively).
+ * Leaf nodes may represent both complex JSON structures and
+ * final values - it is not mandated by the JSON tree design.
+ * The root of the tree doesn't have a key and is preallocated
+ * when the tree is created.
+ *
+ * The json_token structure is intrusive by design, i.e. to store
+ * arbitrary information in a JSON tree, one has to incorporate it
+ * into a user defined structure.
+ *
+ * Example:
+ *
+ *   struct data {
+ *           ...
+ *           struct json_token token;
+ *   };
+ *
+ *   struct json_tree tree;
+ *   json_tree_create(&tree);
+ *   struct json_token *parent = &tree->root;
+ *
+ *   // Add a path to the tree.
+ *   struct data *data = data_new();
+ *   struct json_lexer lexer;
+ *   json_lexer_create(&lexer, path, path_len);
+ *   json_lexer_next_token(&lexer, &data->token);
+ *   while (data->token.type != JSON_TOKEN_END) {
+ *           json_tree_add(&tree, parent, &data->token);
+ *           parent = &data->token;
+ *           data = data_new();
+ *           json_lexer_next_token(&lexer, &data->token);
+ *   }
+ *   data_delete(data);
+ *
+ *   // Look up a path in the tree.
+ *   data = json_tree_lookup_path(&tree, &tree.root,
+ *                                path, path_len);
+ */
+struct json_tree {
+	/**
+	 * Preallocated token corresponding to the JSON tree root.
+	 * It doesn't have a key (set to JSON_TOKEN_END).
+	 */
+	struct json_token root;
+	/**
+	 * Hash table that is used to quickly look up a token
+	 * corresponding to a JSON map item given a key and
+	 * a parent token. We store all tokens that have type
+	 * JSON_TOKEN_STR in this hash table. Note that we don't
+	 * need to store JSON_TOKEN_NUM tokens, as we can quickly
+	 * look them up in the children array anyway.
+	 *
+	 * The hash table uses pair <parent, key> as key, so
+	 * even tokens that happen to have the same key will
+	 * have different keys in the hash. To look up a tree
+	 * node corresponding to a particular path, we split
+	 * the path into tokens and look up the first token
+	 * in the root node and each following token in the
+	 * node returned at the previous step.
+	 *
+	 * We compute a hash value for a token by hashing its
+	 * key using the hash value of its parent as seed. This
+	 * is equivalent to computing hash for the path leading
+	 * to the token. However, we don't need to recompute
+	 * hash starting from the root at each step as we
+	 * descend the tree looking for a specific path, and we
+	 * can start descent from any node, not only from the root.
+	 *
+	 * As a consequence of this hashing technique, even
+	 * though we don't need to store JSON_TOKEN_NUM tokens
+	 * in the hash table, we still have to compute hash
+	 * values for them.
+	 */
+	struct mh_json_t *hash;
 };
 
 /**
@@ -104,6 +213,182 @@ json_lexer_create(struct json_lexer *lexer, const char *src, int src_len)
 int
 json_lexer_next_token(struct json_lexer *lexer, struct json_token *token);
 
+/** Initialize a JSON tree object. Returns 0 on success, -1 on OOM. */
+int
+json_tree_create(struct json_tree *tree);
+
+/**
+ * Destroy JSON tree object. This routine doesn't destroy attached
+ * subtree so it should be called at the end of manual destroy.
+ */
+void
+json_tree_destroy(struct json_tree *tree);
+
+/** Internal function, use json_tree_lookup instead. */
+struct json_token *
+json_tree_lookup_slowpath(struct json_tree *tree, struct json_token *parent,
+			  const struct json_token *token);
+
+/**
+ * Look up a token in a tree at position specified with parent.
+ */
+static inline struct json_token *
+json_tree_lookup(struct json_tree *tree, struct json_token *parent,
+		 const struct json_token *token)
+{
+	struct json_token *ret = NULL;
+	if (token->type == JSON_TOKEN_NUM) {
+		uint32_t idx = token->num - 1;
+		ret = likely(idx < parent->children_capacity) ?
+		      parent->children[idx] : NULL;
+	} else {
+		ret = json_tree_lookup_slowpath(tree, parent, token);
+	}
+	return ret;
+}
+
+/**
+ * Append token to the given parent position in a JSON tree. The
+ * parent mustn't have a child with such content.
+ */
+int
+json_tree_add(struct json_tree *tree, struct json_token *parent,
+	      struct json_token *token);
+
+/**
+ * Delete a JSON tree token at the given parent position in JSON
+ * tree. Token entry shouldn't have subtree.
+ */
+void
+json_tree_del(struct json_tree *tree, struct json_token *token);
+
+/** Look up a descendant of parent by a JSON path. */
+struct json_token *
+json_tree_lookup_path(struct json_tree *tree, struct json_token *parent,
+		      const char *path, uint32_t path_len);
+
+/** Return the next token of a pre-order traversal of a JSON tree. */
+struct json_token *
+json_tree_preorder_next(struct json_token *root, struct json_token *pos);
+
+/** Return the next token of a post-order traversal of a JSON tree. */
+struct json_token *
+json_tree_postorder_next(struct json_token *root, struct json_token *pos);
+
+#ifndef typeof
+#define typeof __typeof__
+#endif
+
+/** Return the container entry of a json_token node. */
+#define json_tree_entry(node, type, member) \
+	container_of((node), type, member)
+/**
+ * Return container entry by json_tree_node or NULL if
+ * node is NULL.
+ */
+#define json_tree_entry_safe(node, type, member) ({			     \
+	(node) != NULL ? json_tree_entry((node), type, member) : NULL;	     \
+})
+
+/** Return the entry of the next token of a pre-order traversal. */
+#define json_tree_preorder_next_entry(root, node, type, member) ({	     \
+	struct json_token *__next =					     \
+		json_tree_preorder_next((root), (node));		     \
+	json_tree_entry_safe(__next, type, member);			     \
+})
+
+/**
+ * Make entry post order traversal in JSON tree.
+ * This cycle doesn't visit root node.
+ */
+#define json_tree_postorder_next_entry(root, node, type, member) ({	     \
+	struct json_token *__next =					     \
+		json_tree_postorder_next((root), (node));		     \
+	json_tree_entry_safe(__next, type, member);			     \
+})
+
+/**
+ * Make lookup in tree by path and return entry.
+ * This cycle doesn't visit root node.
+ */
+#define json_tree_lookup_path_entry(tree, parent, path, path_len, type,	     \
+				    member) ({				     \
+	struct json_token *__node =					     \
+		json_tree_lookup_path((tree), (parent), path, path_len);     \
+	json_tree_entry_safe(__node, type, member);			     \
+})
+
+/** Look up an entry in a tree by token. */
+#define json_tree_lookup_entry(tree, parent, token, type, member) ({	     \
+	struct json_token *__node =					     \
+		json_tree_lookup((tree), (parent), token);		     \
+	json_tree_entry_safe(__node, type, member);			     \
+})
+
+/**
+ * Make pre-order traversal in JSON tree.
+ * This cycle doesn't visit root node.
+ */
+#define json_tree_foreach_preorder(root, node)				     \
+	for ((node) = json_tree_preorder_next((root), (root));		     \
+	     (node) != NULL;						     \
+	     (node) = json_tree_preorder_next((root), (node)))
+
+/**
+ * Make post-order traversal in JSON tree.
+ * This cycle doesn't visit root node.
+ */
+#define json_tree_foreach_postorder(root, node)				     \
+	for ((node) = json_tree_postorder_next((root), NULL);		     \
+	     (node) != (root);						     \
+	     (node) = json_tree_postorder_next((root), (node)))
+
+/**
+ * Make safe post-order traversal in JSON tree.
+ * May be used for destructors.
+ * This cycle doesn't visit root node.
+ */
+#define json_tree_foreach_safe(root, node, tmp)				     \
+	for ((node) = json_tree_postorder_next((root), NULL);		     \
+	     (node) != (root) &&					     \
+	     ((tmp) =  json_tree_postorder_next((root), (node)));	     \
+	     (node) = (tmp))
+
+/**
+ * Make post-order traversal in JSON tree and return entry.
+ * This cycle doesn't visit root node.
+ */
+#define json_tree_foreach_entry_preorder(root, node, type, member)	     \
+	for ((node) = json_tree_preorder_next_entry((root), (root),	     \
+						    type, member);	     \
+	     (node) != NULL;						     \
+	     (node) = json_tree_preorder_next_entry((root), &(node)->member, \
+						    type, member))
+
+/**
+ * Make pre-order traversal in JSON tree and return entry.
+ * This cycle doesn't visit root node.
+ */
+#define json_tree_foreach_entry_postorder(root, node, type, member)	     \
+	for ((node) = json_tree_postorder_next_entry((root), NULL, type,     \
+						     member);		     \
+	     &(node)->member != (root);					     \
+	     (node) = json_tree_postorder_next_entry((root), &(node)->member,\
+						     type, member))
+
+/**
+ * Make secure post-order traversal in JSON tree and return entry.
+ * This cycle doesn't visit root node.
+ */
+#define json_tree_foreach_entry_safe(root, node, type, member, tmp)	     \
+	for ((node) = json_tree_postorder_next_entry((root), NULL,	     \
+						     type, member);	     \
+	     &(node)->member != (root) &&				     \
+	     ((tmp) =  json_tree_postorder_next_entry((root),		     \
+						      &(node)->member,	     \
+						      type, member));	     \
+	     (node) = (tmp))
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/test/unit/json_path.c b/test/unit/json_path.c
index a5f90ad98..563d61d1b 100644
--- a/test/unit/json_path.c
+++ b/test/unit/json_path.c
@@ -2,6 +2,7 @@
 #include "unit.h"
 #include "trivia/util.h"
 #include <string.h>
+#include <stdbool.h>
 
 #define reset_to_new_path(value) \
 	path = value; \
@@ -159,14 +160,248 @@ test_errors()
 	footer();
 }
 
+struct test_struct {
+	int value;
+	struct json_token node;
+};
+
+struct test_struct *
+test_struct_alloc(struct test_struct *records_pool, int *pool_idx)
+{
+	struct test_struct *ret = &records_pool[*pool_idx];
+	*pool_idx = *pool_idx + 1;
+	memset(&ret->node, 0, sizeof(ret->node));
+	return ret;
+}
+
+struct test_struct *
+test_add_path(struct json_tree *tree, const char *path, uint32_t path_len,
+	      struct test_struct *records_pool, int *pool_idx)
+{
+	int rc;
+	struct json_lexer lexer;
+	struct json_token *parent = &tree->root;
+	json_lexer_create(&lexer, path, path_len);
+	struct test_struct *field = test_struct_alloc(records_pool, pool_idx);
+	while ((rc = json_lexer_next_token(&lexer, &field->node)) == 0 &&
+		field->node.type != JSON_TOKEN_END) {
+		struct json_token *next =
+			json_tree_lookup(tree, parent, &field->node);
+		if (next == NULL) {
+			rc = json_tree_add(tree, parent, &field->node);
+			fail_if(rc != 0);
+			next = &field->node;
+			field = test_struct_alloc(records_pool, pool_idx);
+		}
+		parent = next;
+	}
+	fail_if(rc != 0 || field->node.type != JSON_TOKEN_END);
+	/* Release field. */
+	*pool_idx = *pool_idx - 1;
+	return json_tree_entry(parent, struct test_struct, node);
+}
+
+void
+test_tree()
+{
+	header();
+	plan(54);
+
+	struct json_tree tree;
+	int rc = json_tree_create(&tree);
+	fail_if(rc != 0);
+
+	struct test_struct records[7];
+	for (int i = 0; i < 6; i++)
+		records[i].value = i;
+
+	const char *path1 = "[1][10]";
+	const char *path2 = "[1][20].file";
+	const char *path3 = "[1][20].file[2]";
+	const char *path4 = "[1][20].file[8]";
+	const char *path4_copy = "[1][20][\"file\"][8]";
+	const char *path_unregistered = "[1][3]";
+
+	int records_idx = 0;
+	struct test_struct *node, *node_tmp;
+	node = test_add_path(&tree, path1, strlen(path1), records,
+			     &records_idx);
+	is(node, &records[1], "add path '%s'", path1);
+
+	node = test_add_path(&tree, path2, strlen(path2), records,
+			     &records_idx);
+	is(node, &records[3], "add path '%s'", path2);
+
+	node = test_add_path(&tree, path3, strlen(path3), records,
+			     &records_idx);
+	is(node, &records[4], "add path '%s'", path3);
+
+	node = test_add_path(&tree, path4, strlen(path4), records,
+			     &records_idx);
+	is(node, &records[5], "add path '%s'", path4);
+
+	node = test_add_path(&tree, path4_copy, strlen(path4_copy), records,
+			     &records_idx);
+	is(node, &records[5], "add path '%s'", path4_copy);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path1, strlen(path1),
+					   struct test_struct, node);
+	is(node, &records[1], "lookup path '%s'", path1);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path2, strlen(path2),
+					   struct test_struct, node);
+	is(node, &records[3], "lookup path '%s'", path2);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path_unregistered,
+					   strlen(path_unregistered),
+					   struct test_struct, node);
+	is(node, NULL, "lookup unregistered path '%s'", path_unregistered);
+
+	/* Test iterators. */
+	struct json_token *token = NULL, *tmp;
+	const struct json_token *tokens_preorder[] =
+		{&records[0].node, &records[1].node, &records[2].node,
+		 &records[3].node, &records[4].node, &records[5].node};
+	int cnt = sizeof(tokens_preorder)/sizeof(tokens_preorder[0]);
+	int idx = 0;
+
+	json_tree_foreach_preorder(&tree.root, token) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(token, struct test_struct, node);
+		struct test_struct *t2 =
+			json_tree_entry(tokens_preorder[idx],
+					struct test_struct, node);
+		is(token, tokens_preorder[idx],
+		   "test foreach pre order %d: have %d expected of %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	const struct json_token *tree_nodes_postorder[] =
+		{&records[1].node, &records[4].node, &records[5].node,
+		 &records[3].node, &records[2].node, &records[0].node};
+	cnt = sizeof(tree_nodes_postorder)/sizeof(tree_nodes_postorder[0]);
+	idx = 0;
+	json_tree_foreach_postorder(&tree.root, token) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(token, struct test_struct, node);
+		struct test_struct *t2 =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, node);
+		is(token, tree_nodes_postorder[idx],
+		   "test foreach post order %d: have %d expected of %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_safe(&tree.root, token, tmp) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(token, struct test_struct, node);
+		struct test_struct *t2 =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, node);
+		is(token, tree_nodes_postorder[idx],
+		   "test foreach safe order %d: have %d expected of %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_entry_preorder(&tree.root, node, struct test_struct,
+					 node) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t =
+			json_tree_entry(tokens_preorder[idx],
+					struct test_struct, node);
+		is(&node->node, tokens_preorder[idx],
+		   "test foreach entry pre order %d: have %d expected of %d",
+		   idx, node->value, t->value);
+		idx++;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_entry_postorder(&tree.root, node, struct test_struct,
+					  node) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, node);
+		is(&node->node, tree_nodes_postorder[idx],
+		   "test foreach entry post order %d: have %d expected of %d",
+		   idx, node->value, t->value);
+		idx++;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	/* Test record deletion. */
+	is(records[3].node.max_child_idx, 7, "max_child_index %d expected of %d",
+	   records[3].node.max_child_idx, 7);
+	json_tree_del(&tree, &records[5].node);
+	is(records[3].node.max_child_idx, 1, "max_child_index %d expected of %d",
+	   records[3].node.max_child_idx, 1);
+	json_tree_del(&tree, &records[4].node);
+	is(records[3].node.max_child_idx, 0, "max_child_index %d expected of %d",
+	   records[3].node.max_child_idx, 0);
+	node = json_tree_lookup_path_entry(&tree, NULL, path3, strlen(path3),
+					   struct test_struct, node);
+	is(node, NULL, "lookup removed path '%s'", path3);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path4, strlen(path4),
+					   struct test_struct, node);
+	is(node, NULL, "lookup removed path '%s'", path4);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path2, strlen(path2),
+					   struct test_struct, node);
+	is(node, &records[3], "lookup path was not corrupted '%s'", path2);
+
+	const struct json_token *tree_nodes_postorder_new[] =
+		{&records[1].node, &records[3].node,
+		 &records[2].node, &records[0].node};
+	cnt = sizeof(tree_nodes_postorder_new) /
+	      sizeof(tree_nodes_postorder_new[0]);
+	idx = 0;
+	json_tree_foreach_entry_safe(&tree.root, node, struct test_struct,
+				     node, node_tmp) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t =
+			json_tree_entry(tree_nodes_postorder_new[idx],
+					struct test_struct, node);
+		is(&node->node, tree_nodes_postorder_new[idx],
+		   "test foreach entry safe order %d: have %d expected of %d",
+		   idx, node->value, t->value);
+		json_tree_del(&tree, &node->node);
+		idx++;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+	json_tree_destroy(&tree);
+
+	check_plan();
+	footer();
+}
+
 int
 main()
 {
 	header();
-	plan(2);
+	plan(3);
 
 	test_basic();
 	test_errors();
+	test_tree();
 
 	int rc = check_plan();
 	footer();
diff --git a/test/unit/json_path.result b/test/unit/json_path.result
index a2a2f829f..241c7a960 100644
--- a/test/unit/json_path.result
+++ b/test/unit/json_path.result
@@ -1,5 +1,5 @@
 	*** main ***
-1..2
+1..3
 	*** test_basic ***
     1..71
     ok 1 - parse <[0]>
@@ -99,4 +99,62 @@ ok 1 - subtests
     ok 20 - tab inside identifier
 ok 2 - subtests
 	*** test_errors: done ***
+	*** test_tree ***
+    1..54
+    ok 1 - add path '[1][10]'
+    ok 2 - add path '[1][20].file'
+    ok 3 - add path '[1][20].file[2]'
+    ok 4 - add path '[1][20].file[8]'
+    ok 5 - add path '[1][20]["file"][8]'
+    ok 6 - lookup path '[1][10]'
+    ok 7 - lookup path '[1][20].file'
+    ok 8 - lookup unregistered path '[1][3]'
+    ok 9 - test foreach pre order 0: have 0 expected of 0
+    ok 10 - test foreach pre order 1: have 1 expected of 1
+    ok 11 - test foreach pre order 2: have 2 expected of 2
+    ok 12 - test foreach pre order 3: have 3 expected of 3
+    ok 13 - test foreach pre order 4: have 4 expected of 4
+    ok 14 - test foreach pre order 5: have 5 expected of 5
+    ok 15 - records iterated count 6 of 6
+    ok 16 - test foreach post order 0: have 1 expected of 1
+    ok 17 - test foreach post order 1: have 4 expected of 4
+    ok 18 - test foreach post order 2: have 5 expected of 5
+    ok 19 - test foreach post order 3: have 3 expected of 3
+    ok 20 - test foreach post order 4: have 2 expected of 2
+    ok 21 - test foreach post order 5: have 0 expected of 0
+    ok 22 - records iterated count 6 of 6
+    ok 23 - test foreach safe order 0: have 1 expected of 1
+    ok 24 - test foreach safe order 1: have 4 expected of 4
+    ok 25 - test foreach safe order 2: have 5 expected of 5
+    ok 26 - test foreach safe order 3: have 3 expected of 3
+    ok 27 - test foreach safe order 4: have 2 expected of 2
+    ok 28 - test foreach safe order 5: have 0 expected of 0
+    ok 29 - records iterated count 6 of 6
+    ok 30 - test foreach entry pre order 0: have 0 expected of 0
+    ok 31 - test foreach entry pre order 1: have 1 expected of 1
+    ok 32 - test foreach entry pre order 2: have 2 expected of 2
+    ok 33 - test foreach entry pre order 3: have 3 expected of 3
+    ok 34 - test foreach entry pre order 4: have 4 expected of 4
+    ok 35 - test foreach entry pre order 5: have 5 expected of 5
+    ok 36 - records iterated count 6 of 6
+    ok 37 - test foreach entry post order 0: have 1 expected of 1
+    ok 38 - test foreach entry post order 1: have 4 expected of 4
+    ok 39 - test foreach entry post order 2: have 5 expected of 5
+    ok 40 - test foreach entry post order 3: have 3 expected of 3
+    ok 41 - test foreach entry post order 4: have 2 expected of 2
+    ok 42 - test foreach entry post order 5: have 0 expected of 0
+    ok 43 - records iterated count 6 of 6
+    ok 44 - max_child_index 7 expected of 7
+    ok 45 - max_child_index 1 expected of 1
+    ok 46 - max_child_index 0 expected of 0
+    ok 47 - lookup removed path '[1][20].file[2]'
+    ok 48 - lookup removed path '[1][20].file[8]'
+    ok 49 - lookup path was not corrupted '[1][20].file'
+    ok 50 - test foreach entry safe order 0: have 1 expected of 1
+    ok 51 - test foreach entry safe order 1: have 3 expected of 3
+    ok 52 - test foreach entry safe order 2: have 2 expected of 2
+    ok 53 - test foreach entry safe order 3: have 0 expected of 0
+    ok 54 - records iterated count 4 of 4
+ok 3 - subtests
+	*** test_tree: done ***
 	*** main: done ***
-- 
2.19.2

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v5 3/9] box: manage format fields with JSON tree class
  2018-12-04 16:09       ` Vladimir Davydov
  2018-12-04 16:32         ` Kirill Shcherbatov
@ 2018-12-05  8:37         ` Kirill Shcherbatov
  2018-12-06  7:56         ` Kirill Shcherbatov
  2 siblings, 0 replies; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-12-05  8:37 UTC (permalink / raw)
  To: Vladimir Davydov, tarantool-patches; +Cc: Kostya Osipov

> /home/vlad/src/tarantool/src/box/tuple_format.h: In function ‘tuple_field* tuple_format_field(tuple_format*, uint32_t)’:
> /home/vlad/src/tarantool/src/box/tuple_format.h:207:2: sorry, unimplemented: non-trivial designated initializers not supported

Fixed on the branch. I had used a GCC extension (designated
initializers), which g++ rejects in non-trivial cases.
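
For readers unfamiliar with the error above: g++ only accepts C99
designated initializers as a limited GNU extension, so a header shared
between C and C++ translation units (tuple_format.h is included from
alter.cc) can't rely on them. A minimal sketch of the portable
workaround; the struct and function names are stand-ins, not the
actual patch code:

```c
#include <string.h>

/* Simplified stand-in for the token type used in the patch. */
struct token {
	int type;
	unsigned num;
};

/*
 * A C99 designated initializer like
 *     struct token t = {.type = 1, .num = fieldno + 1};
 * makes g++ fail with "sorry, unimplemented: non-trivial
 * designated initializers not supported" in cases it considers
 * non-trivial. Zeroing the object and assigning members
 * explicitly compiles as both C and C++.
 */
static struct token
make_num_token(unsigned fieldno)
{
	struct token t;
	memset(&t, 0, sizeof(t));
	t.type = 1;		/* stands in for JSON_TOKEN_NUM */
	t.num = fieldno + 1;	/* tokens use 1-based indexes */
	return t;
}
```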

=========================================

As we are going to work with format fields in a unified way, we
start using the JSON tree class to manage first-level format
fields.

Needed for #1012
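
The mechanics of the change above can be sketched with a toy model: a
tuple_field embeds a json_token, and the enclosing structure is
recovered from the intrusive member the way json_tree_entry() does it.
The names below mirror the patch, but the code is a simplified sketch,
not the actual implementation:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the structures used in the patch. */
struct json_token {
	uint32_t sibling_idx;
	struct json_token *parent;
};

struct tuple_field {
	int type;
	struct json_token token;	/* intrusive tree node */
};

/* Local copy of the classic container_of() idiom. */
#define my_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Model of tuple_format_field(): a first-level field is located
 * by its number among the root's children, and the enclosing
 * tuple_field is recovered from the intrusive json_token member.
 */
static struct tuple_field *
field_by_token(struct json_token *token)
{
	return my_container_of(token, struct tuple_field, token);
}
```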
---
 src/box/sql.c          |  16 +++---
 src/box/sql/build.c    |   5 +-
 src/box/tuple.c        |  10 ++--
 src/box/tuple_format.c | 125 +++++++++++++++++++++++++++++------------
 src/box/tuple_format.h |  49 +++++++++++++---
 src/box/vy_stmt.c      |   4 +-
 6 files changed, 150 insertions(+), 59 deletions(-)

diff --git a/src/box/sql.c b/src/box/sql.c
index 7b41c9926..9effe1eb2 100644
--- a/src/box/sql.c
+++ b/src/box/sql.c
@@ -201,7 +201,8 @@ tarantoolSqlite3TupleColumnFast(BtCursor *pCur, u32 fieldno, u32 *field_size)
 	struct tuple_format *format = tuple_format(pCur->last_tuple);
 	assert(format->exact_field_count == 0
 	       || fieldno < format->exact_field_count);
-	if (format->fields[fieldno].offset_slot == TUPLE_OFFSET_SLOT_NIL)
+	if (tuple_format_field(format, fieldno)->offset_slot ==
+	    TUPLE_OFFSET_SLOT_NIL)
 		return NULL;
 	const char *field = tuple_field(pCur->last_tuple, fieldno);
 	const char *end = field;
@@ -896,7 +897,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
 	struct key_def *key_def;
 	const struct tuple *tuple;
 	const char *base;
-	const struct tuple_format *format;
+	struct tuple_format *format;
 	const uint32_t *field_map;
 	uint32_t field_count, next_fieldno = 0;
 	const char *p, *field0;
@@ -914,7 +915,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
 	base = tuple_data(tuple);
 	format = tuple_format(tuple);
 	field_map = tuple_field_map(tuple);
-	field_count = format->field_count;
+	field_count = tuple_format_field_count(format);
 	field0 = base; mp_decode_array(&field0); p = field0;
 	for (i = 0; i < n; i++) {
 		/*
@@ -932,9 +933,10 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
 		uint32_t fieldno = key_def->parts[i].fieldno;
 
 		if (fieldno != next_fieldno) {
+			struct tuple_field *field =
+				tuple_format_field(format, fieldno);
 			if (fieldno >= field_count ||
-			    format->fields[fieldno].offset_slot ==
-			    TUPLE_OFFSET_SLOT_NIL) {
+			    field->offset_slot == TUPLE_OFFSET_SLOT_NIL) {
 				/* Outdated field_map. */
 				uint32_t j = 0;
 
@@ -942,9 +944,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
 				while (j++ != fieldno)
 					mp_next(&p);
 			} else {
-				p = base + field_map[
-					format->fields[fieldno].offset_slot
-					];
+				p = base + field_map[field->offset_slot];
 			}
 		}
 		next_fieldno = fieldno + 1;
diff --git a/src/box/sql/build.c b/src/box/sql/build.c
index 52f0bde15..b5abaeeda 100644
--- a/src/box/sql/build.c
+++ b/src/box/sql/build.c
@@ -936,8 +936,9 @@ sql_column_collation(struct space_def *def, uint32_t column, uint32_t *coll_id)
 		struct coll_id *collation = coll_by_id(*coll_id);
 		return collation != NULL ? collation->coll : NULL;
 	}
-	*coll_id = space->format->fields[column].coll_id;
-	return space->format->fields[column].coll;
+	struct tuple_field *field = tuple_format_field(space->format, column);
+	*coll_id = field->coll_id;
+	return field->coll;
 }
 
 struct ExprList *
diff --git a/src/box/tuple.c b/src/box/tuple.c
index ef4d16f39..aae1c3cdd 100644
--- a/src/box/tuple.c
+++ b/src/box/tuple.c
@@ -138,7 +138,7 @@ runtime_tuple_delete(struct tuple_format *format, struct tuple *tuple)
 int
 tuple_validate_raw(struct tuple_format *format, const char *tuple)
 {
-	if (format->field_count == 0)
+	if (tuple_format_field_count(format) == 0)
 		return 0; /* Nothing to check */
 
 	/* Check to see if the tuple has a sufficient number of fields. */
@@ -158,10 +158,12 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple)
 	}
 
 	/* Check field types */
-	struct tuple_field *field = &format->fields[0];
+	struct tuple_field *field = tuple_format_field(format, 0);
 	uint32_t i = 0;
-	uint32_t defined_field_count = MIN(field_count, format->field_count);
-	for (; i < defined_field_count; ++i, ++field) {
+	uint32_t defined_field_count =
+		MIN(field_count, tuple_format_field_count(format));
+	for (; i < defined_field_count; ++i) {
+		field = tuple_format_field(format, i);
 		if (key_mp_type_validate(field->type, mp_typeof(*tuple),
 					 ER_FIELD_TYPE, i + TUPLE_INDEX_BASE,
 					 tuple_field_is_nullable(field)))
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 661cfdc94..b801a0eb4 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -38,10 +38,27 @@ static intptr_t recycled_format_ids = FORMAT_ID_NIL;
 
 static uint32_t formats_size = 0, formats_capacity = 0;
 
-static const struct tuple_field tuple_field_default = {
-	FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false,
-	ON_CONFLICT_ACTION_NONE, NULL, COLL_NONE,
-};
+static struct tuple_field *
+tuple_field_new(void)
+{
+	struct tuple_field *ret = calloc(1, sizeof(struct tuple_field));
+	if (ret == NULL) {
+		diag_set(OutOfMemory, sizeof(struct tuple_field), "calloc",
+			 "tuple_field");
+		return NULL;
+	}
+	ret->type = FIELD_TYPE_ANY;
+	ret->offset_slot = TUPLE_OFFSET_SLOT_NIL;
+	ret->coll_id = COLL_NONE;
+	ret->nullable_action = ON_CONFLICT_ACTION_NONE;
+	return ret;
+}
+
+static void
+tuple_field_delete(struct tuple_field *field)
+{
+	free(field);
+}
 
 static int
 tuple_format_use_key_part(struct tuple_format *format,
@@ -49,8 +66,8 @@ tuple_format_use_key_part(struct tuple_format *format,
 			  const struct key_part *part, bool is_sequential,
 			  int *current_slot)
 {
-	assert(part->fieldno < format->field_count);
-	struct tuple_field *field = &format->fields[part->fieldno];
+	assert(part->fieldno < tuple_format_field_count(format));
+	struct tuple_field *field = tuple_format_field(format, part->fieldno);
 	/*
 		* If a field is not present in the space format,
 		* inherit nullable action of the first key part
@@ -138,16 +155,15 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 	format->min_field_count =
 		tuple_format_min_field_count(keys, key_count, fields,
 					     field_count);
-	if (format->field_count == 0) {
+	if (tuple_format_field_count(format) == 0) {
 		format->field_map_size = 0;
 		return 0;
 	}
 	/* Initialize defined fields */
 	for (uint32_t i = 0; i < field_count; ++i) {
-		format->fields[i].is_key_part = false;
-		format->fields[i].type = fields[i].type;
-		format->fields[i].offset_slot = TUPLE_OFFSET_SLOT_NIL;
-		format->fields[i].nullable_action = fields[i].nullable_action;
+		struct tuple_field *field = tuple_format_field(format, i);
+		field->type = fields[i].type;
+		field->nullable_action = fields[i].nullable_action;
 		struct coll *coll = NULL;
 		uint32_t cid = fields[i].coll_id;
 		if (cid != COLL_NONE) {
@@ -159,12 +175,9 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 			}
 			coll = coll_id->coll;
 		}
-		format->fields[i].coll = coll;
-		format->fields[i].coll_id = cid;
+		field->coll = coll;
+		field->coll_id = cid;
 	}
-	/* Initialize remaining fields */
-	for (uint32_t i = field_count; i < format->field_count; i++)
-		format->fields[i] = tuple_field_default;
 
 	int current_slot = 0;
 
@@ -184,7 +197,8 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 		}
 	}
 
-	assert(format->fields[0].offset_slot == TUPLE_OFFSET_SLOT_NIL);
+	assert(tuple_format_field(format, 0)->offset_slot ==
+		TUPLE_OFFSET_SLOT_NIL);
 	size_t field_map_size = -current_slot * sizeof(uint32_t);
 	if (field_map_size > UINT16_MAX) {
 		/** tuple->data_offset is 16 bits */
@@ -242,6 +256,19 @@ tuple_format_deregister(struct tuple_format *format)
 	format->id = FORMAT_ID_NIL;
 }
 
+/** Destroy field JSON tree and release allocated memory. */
+static inline void
+tuple_format_field_tree_destroy(struct tuple_format *format)
+{
+	struct tuple_field *field, *tmp;
+	json_tree_foreach_entry_safe(&format->fields.root, field,
+				     struct tuple_field, token, tmp) {
+		json_tree_del(&format->fields, &field->token);
+		tuple_field_delete(field);
+	}
+	json_tree_destroy(&format->fields);
+}
+
 static struct tuple_format *
 tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		   uint32_t space_field_count, struct tuple_dictionary *dict)
@@ -258,39 +285,60 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		}
 	}
 	uint32_t field_count = MAX(space_field_count, index_field_count);
-	uint32_t total = sizeof(struct tuple_format) +
-			 field_count * sizeof(struct tuple_field);
 
-	struct tuple_format *format = (struct tuple_format *) malloc(total);
+	struct tuple_format *format = malloc(sizeof(struct tuple_format));
 	if (format == NULL) {
 		diag_set(OutOfMemory, sizeof(struct tuple_format), "malloc",
 			 "tuple format");
 		return NULL;
 	}
+	if (json_tree_create(&format->fields) != 0) {
+		free(format);
+		return NULL;
+	}
+	struct json_token token;
+	memset(&token, 0, sizeof(token));
+	token.type = JSON_TOKEN_NUM;
+	for (token.num = TUPLE_INDEX_BASE;
+	     token.num < field_count + TUPLE_INDEX_BASE; token.num++) {
+		struct tuple_field *field = tuple_field_new();
+		if (field == NULL)
+			goto error;
+		field->token = token;
+		if (json_tree_add(&format->fields, &format->fields.root,
+				  &field->token) != 0) {
+			diag_set(OutOfMemory, 0, "json_tree_add",
+				 "&format->tree");
+			tuple_field_delete(field);
+			goto error;
+		}
+	}
 	if (dict == NULL) {
 		assert(space_field_count == 0);
 		format->dict = tuple_dictionary_new(NULL, 0);
-		if (format->dict == NULL) {
-			free(format);
-			return NULL;
-		}
+		if (format->dict == NULL)
+			goto error;
 	} else {
 		format->dict = dict;
 		tuple_dictionary_ref(dict);
 	}
 	format->refs = 0;
 	format->id = FORMAT_ID_NIL;
-	format->field_count = field_count;
 	format->index_field_count = index_field_count;
 	format->exact_field_count = 0;
 	format->min_field_count = 0;
 	return format;
+error:;
+	tuple_format_field_tree_destroy(format);
+	free(format);
+	return NULL;
 }
 
 /** Free tuple format resources, doesn't unregister. */
 static inline void
 tuple_format_destroy(struct tuple_format *format)
 {
+	tuple_format_field_tree_destroy(format);
 	tuple_dictionary_unref(format->dict);
 }
 
@@ -328,18 +376,21 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
 }
 
 bool
-tuple_format1_can_store_format2_tuples(const struct tuple_format *format1,
-				       const struct tuple_format *format2)
+tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
+				       struct tuple_format *format2)
 {
 	if (format1->exact_field_count != format2->exact_field_count)
 		return false;
-	for (uint32_t i = 0; i < format1->field_count; ++i) {
-		const struct tuple_field *field1 = &format1->fields[i];
+	uint32_t format1_field_count = tuple_format_field_count(format1);
+	uint32_t format2_field_count = tuple_format_field_count(format2);
+	for (uint32_t i = 0; i < format1_field_count; ++i) {
+		const struct tuple_field *field1 =
+			tuple_format_field(format1, i);
 		/*
 		 * The field has a data type in format1, but has
 		 * no data type in format2.
 		 */
-		if (i >= format2->field_count) {
+		if (i >= format2_field_count) {
 			/*
 			 * The field can get a name added
 			 * for it, and this doesn't require a data
@@ -355,7 +406,8 @@ tuple_format1_can_store_format2_tuples(const struct tuple_format *format1,
 			else
 				return false;
 		}
-		const struct tuple_field *field2 = &format2->fields[i];
+		const struct tuple_field *field2 =
+			tuple_format_field(format2, i);
 		if (! field_type1_contains_type2(field1->type, field2->type))
 			return false;
 		/*
@@ -374,7 +426,7 @@ int
 tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
 		     const char *tuple, bool validate)
 {
-	if (format->field_count == 0)
+	if (tuple_format_field_count(format) == 0)
 		return 0; /* Nothing to initialize */
 
 	const char *pos = tuple;
@@ -397,17 +449,17 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
 
 	/* first field is simply accessible, so we do not store offset to it */
 	enum mp_type mp_type = mp_typeof(*pos);
-	const struct tuple_field *field = &format->fields[0];
+	const struct tuple_field *field =
+		tuple_format_field((struct tuple_format *)format, 0);
 	if (validate &&
 	    key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
 				 TUPLE_INDEX_BASE, tuple_field_is_nullable(field)))
 		return -1;
 	mp_next(&pos);
 	/* other fields...*/
-	++field;
 	uint32_t i = 1;
 	uint32_t defined_field_count = MIN(field_count, validate ?
-					   format->field_count :
+					   tuple_format_field_count(format) :
 					   format->index_field_count);
 	if (field_count < format->index_field_count) {
 		/*
@@ -417,7 +469,8 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
 		memset((char *)field_map - format->field_map_size, 0,
 		       format->field_map_size);
 	}
-	for (; i < defined_field_count; ++i, ++field) {
+	for (; i < defined_field_count; ++i) {
+		field = tuple_format_field((struct tuple_format *)format, i);
 		mp_type = mp_typeof(*pos);
 		if (validate &&
 		    key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index 232df22b2..d84feb685 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -34,6 +34,7 @@
 #include "key_def.h"
 #include "field_def.h"
 #include "errinj.h"
+#include "json/json.h"
 #include "tuple_dictionary.h"
 
 #if defined(__cplusplus)
@@ -113,6 +114,8 @@ struct tuple_field {
 	struct coll *coll;
 	/** Collation identifier. */
 	uint32_t coll_id;
+	/** Link in tuple_format::fields. */
+	struct json_token token;
 };
 
 /**
@@ -166,16 +169,46 @@ struct tuple_format {
 	 * index_field_count <= min_field_count <= field_count.
 	 */
 	uint32_t min_field_count;
-	/* Length of 'fields' array. */
-	uint32_t field_count;
 	/**
 	 * Shared names storage used by all formats of a space.
 	 */
 	struct tuple_dictionary *dict;
-	/* Formats of the fields */
-	struct tuple_field fields[0];
+	/**
+	 * Fields comprising the format, organized in a tree.
+	 * First level nodes correspond to tuple fields.
+	 * Deeper levels define indexed JSON paths within
+	 * tuple fields. Nodes of the tree are linked by
+	 * tuple_field::token.
+	 */
+	struct json_tree fields;
 };
 
+/**
+ * Return the number of first-level nodes, which correspond to
+ * tuple fields.
+ */
+static inline uint32_t
+tuple_format_field_count(const struct tuple_format *format)
+{
+	const struct json_token *root = &format->fields.root;
+	return root->children != NULL ? root->max_child_idx + 1 : 0;
+}
+
+/**
+ * Get the first-level node corresponding to a tuple field by
+ * its fieldno.
+ */
+static inline struct tuple_field *
+tuple_format_field(struct tuple_format *format, uint32_t fieldno)
+{
+	assert(fieldno < tuple_format_field_count(format));
+	struct json_token token;
+	token.type = JSON_TOKEN_NUM;
+	token.num = fieldno + TUPLE_INDEX_BASE;
+	return json_tree_lookup_entry(&format->fields, &format->fields.root,
+				      &token, struct tuple_field, token);
+}
+
 extern struct tuple_format **tuple_formats;
 
 static inline uint32_t
@@ -238,8 +271,8 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
  * @retval True, if @a format1 can store any tuples of @a format2.
  */
 bool
-tuple_format1_can_store_format2_tuples(const struct tuple_format *format1,
-				       const struct tuple_format *format2);
+tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
+				       struct tuple_format *format2);
 
 /**
  * Calculate minimal field count of tuples with specified keys and
@@ -333,7 +366,9 @@ tuple_field_raw(const struct tuple_format *format, const char *tuple,
 			return tuple;
 		}
 
-		int32_t offset_slot = format->fields[field_no].offset_slot;
+		int32_t offset_slot =
+			tuple_format_field((struct tuple_format *)format,
+					   field_no)->offset_slot;
 		if (offset_slot != TUPLE_OFFSET_SLOT_NIL) {
 			if (field_map[offset_slot] != 0)
 				return tuple + field_map[offset_slot];
diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c
index d83840406..3e60fece9 100644
--- a/src/box/vy_stmt.c
+++ b/src/box/vy_stmt.c
@@ -411,7 +411,7 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
 	uint32_t *field_map = (uint32_t *) raw;
 	char *wpos = mp_encode_array(raw, field_count);
 	for (uint32_t i = 0; i < field_count; ++i) {
-		const struct tuple_field *field = &format->fields[i];
+		const struct tuple_field *field = tuple_format_field(format, i);
 		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL)
 			field_map[field->offset_slot] = wpos - raw;
 		if (iov[i].iov_base == NULL) {
@@ -465,7 +465,7 @@ vy_stmt_new_surrogate_delete_raw(struct tuple_format *format,
 	}
 	char *pos = mp_encode_array(data, field_count);
 	for (uint32_t i = 0; i < field_count; ++i) {
-		const struct tuple_field *field = &format->fields[i];
+		const struct tuple_field *field = tuple_format_field(format, i);
 		if (! field->is_key_part) {
 			/* Unindexed field - write NIL. */
 			assert(i < src_count);
-- 
2.19.2

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v5 2/9] lib: implement JSON tree class for json library
  2018-12-05  8:37           ` Kirill Shcherbatov
@ 2018-12-05  9:07             ` Vladimir Davydov
  2018-12-05  9:52               ` Vladimir Davydov
  0 siblings, 1 reply; 41+ messages in thread
From: Vladimir Davydov @ 2018-12-05  9:07 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, Kostya Osipov

On Wed, Dec 05, 2018 at 11:37:06AM +0300, Kirill Shcherbatov wrote:
> Hi! Thank you for review.
> 
> >>> BTW, json array start indexing from 0, not 1 AFAIK. Starting indexing
> >>> from 1 looks weird to me.
> > 
> > You left this comment from my previous review unattended.
> 
> In fact, it is not so; we use [token.num - 1] to retrieve field.
> Let's better describe it in comment:
> 	/**
> 	 * Array of child records. Indexes in this array
> 	 * match [token.num-1] index for JSON_TOKEN_NUM  type
> 	 * and are allocated sequentially for JSON_TOKEN_STR child
> 	 * tokens.
> 	 */
> 	struct json_token **children;

This is weird: AFAIU json_lexer may return token.num equal to 0.
What happens if we try to insert such a token into a tree? I think 
we should insert a token at children[token.num], not [token.num-1].

> 
> > As I've already told you, should be
> > Needed for #1012
> 
> > #ifndef/endif shouldn't be indented.
> 
> > I'd prefer to change this to something simpler, like
> > 
> > 	assert(token->child_count == 0);
> > 
> > but now I realize that child_count isn't actually the number of
> > children, as I thought, but the max id of ever existed child.
> > This is confusing. We need to do something about it.
> > 
> > What about?
> > 
> > 	/**
> > 	 * Allocation size of the children array.
> > 	 */
> > 	int children_capacity;
> > 	/**
> > 	 * Max occupied index in the children array.
> > 	 */
> > 	int max_child_idx;
> > 
> > and update max_child_idx on json_tree_del() as well
> 
> > You pass token** to mh_json_find instead of token*. I haven't noticed
> > that before, but turns out that
> > 
> >> +#define mh_key_t struct json_token **
> > 
> > This looks weird. Why not
> > 
> >  #define mh_key_t struct json_token *
> 
> Ok
> 
> >> +		return entry != NULL ? *entry : NULL;
> > 
> > AFAIU entry can't be NULL here.
> 
> 		assert(entry != NULL);
> 		return *entry;
> 
> >> +#define json_tree_foreach_entry_safe(root, node, type, member, tmp)	     \
> >> +	for ((node) = json_tree_postorder_next_entry((root), NULL,	     \
> >> +						     type, member);	     \
> >> +	     &(node)->member != (root) &&				     \
> >> +	     ((tmp) =  json_tree_postorder_next_entry((root),		     \
> > 
> > Extra space.
> 
> Ok

These 'Ok'-s only clutter the email. If you agree with all my other
comments, you can simply write "Agreed with everything else" instead
of quoting them.

Also, please don't re-push and re-send the patch until we've agreed on
all points. In this particular case the question about whether we should
start indexing from 0 or 1 remains.

> 
> ===============================================
> 
> The new JSON tree class stores JSON paths for tuple fields of
> registered non-plain indexes. It is a hierarchical data
> structure that organizes JSON nodes produced by the parser.
> The class provides an API to look up a node by path and to
> iterate over the tree.
> The JSON indexes patch requires this functionality to look up
> tuple_fields by path, initialize the field map, and build
> vinyl_stmt msgpack for a secondary index via JSON tree
> iteration.
> 
> Needed for #1012
> ---
>  src/lib/json/CMakeLists.txt |   1 +
>  src/lib/json/json.c         | 257 ++++++++++++++++++++++++++++++++
>  src/lib/json/json.h         | 287 +++++++++++++++++++++++++++++++++++-
>  test/unit/json_path.c       | 237 ++++++++++++++++++++++++++++-
>  test/unit/json_path.result  |  60 +++++++-
>  5 files changed, 839 insertions(+), 3 deletions(-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v5 2/9] lib: implement JSON tree class for json library
  2018-12-05  9:07             ` Vladimir Davydov
@ 2018-12-05  9:52               ` Vladimir Davydov
  2018-12-06  7:56                 ` Kirill Shcherbatov
  2018-12-06  7:56                 ` [tarantool-patches] Re: [PATCH v5 2/9] lib: make index_base support for json_lexer Kirill Shcherbatov
  0 siblings, 2 replies; 41+ messages in thread
From: Vladimir Davydov @ 2018-12-05  9:52 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, Kostya Osipov

On Wed, Dec 05, 2018 at 12:07:40PM +0300, Vladimir Davydov wrote:
> On Wed, Dec 05, 2018 at 11:37:06AM +0300, Kirill Shcherbatov wrote:
> > Hi! Thank you for review.
> > 
> > >>> BTW, json array start indexing from 0, not 1 AFAIK. Starting indexing
> > >>> from 1 looks weird to me.
> > > 
> > > You left this comment from my previous review unattended.
> > 
> > In fact, it is not so; we use [token.num - 1] to retrieve field.
> > Let's better describe it in comment:
> > 	/**
> > 	 * Array of child records. Indexes in this array
> > 	 * match [token.num-1] index for JSON_TOKEN_NUM  type
> > 	 * and are allocated sequentially for JSON_TOKEN_STR child
> > 	 * tokens.
> > 	 */
> > 	struct json_token **children;
> 
> This is weird: AFAIU json_lexer may return token.num equal to 0.
> What happens if we try to insert such a token into a tree? I think 
> we should insert a token at children[token.num], not [token.num-1].

Discussed verbally. Agreed that we should pass index_base to
json_lexer_create, as we do, for example, in case of tuple_update.
json_lexer_next will subtract index_base from token.num before
returning the token. json_tree will use token.num as is when
inserting it into the children array. Tuple-related routines
will pass TUPLE_INDEX_BASE to json_lexer_create when parsing user
input so that Lua 1-base indexes are converted to 0-base.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v5 4/9] lib: introduce json_path_cmp, json_path_validate
  2018-12-03 20:14         ` Konstantin Osipov
@ 2018-12-06  7:56           ` Kirill Shcherbatov
  0 siblings, 0 replies; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-12-06  7:56 UTC (permalink / raw)
  To: tarantool-patches, Vladimir Davydov, Kostya Osipov

Introduced the json_path_validate routine to ensure a
user-defined JSON path is valid. It will be used to raise an
error when an incorrect user-defined JSON path is detected.

Introduced the json_path_cmp routine to compare JSON paths that
may have different representations.
Note that:
 - when two paths share the same token-sequence prefix,
   the path having more tokens is assumed to be greater
 - both paths being compared must be valid

Needed for #1012
---
 src/lib/json/json.c        | 29 +++++++++++++++++++++++++++++
 src/lib/json/json.h        | 28 ++++++++++++++++++++++++++++
 test/unit/json_path.c      | 37 ++++++++++++++++++++++++++++++++++++-
 test/unit/json_path.result | 13 ++++++++++++-
 4 files changed, 105 insertions(+), 2 deletions(-)

diff --git a/src/lib/json/json.c b/src/lib/json/json.c
index 65169b047..5586e59fc 100644
--- a/src/lib/json/json.c
+++ b/src/lib/json/json.c
@@ -500,3 +500,32 @@ json_tree_postorder_next(struct json_token *root, struct json_token *pos)
 		return json_tree_leftmost(next);
 	return pos->parent;
 }
+
+int
+json_path_cmp(const char *a, uint32_t a_len, const char *b, uint32_t b_len,
+	      uint32_t index_base)
+{
+	struct json_lexer lexer_a, lexer_b;
+	json_lexer_create(&lexer_a, a, a_len, index_base);
+	json_lexer_create(&lexer_b, b, b_len, index_base);
+	struct json_token token_a, token_b;
+	token_a.parent = NULL;
+	token_b.parent = NULL;
+	int rc_a, rc_b;
+	while ((rc_a = json_lexer_next_token(&lexer_a, &token_a)) == 0 &&
+		(rc_b = json_lexer_next_token(&lexer_b, &token_b)) == 0 &&
+		token_a.type != JSON_TOKEN_END &&
+		token_b.type != JSON_TOKEN_END) {
+		int rc = json_token_cmp(&token_a, &token_b);
+		if (rc != 0)
+			return rc;
+	}
+	/* Paths a and b should be valid. */
+	assert(rc_a == 0 && rc_b == 0);
+	/*
+	 * The parser stopped because the end of one of the paths
+	 * was reached. As JSON_TOKEN_END > JSON_TOKEN_{NUM, STR},
+	 * the path having more tokens has the lower token.type value.
+	 */
+	return token_b.type - token_a.type;
+}
diff --git a/src/lib/json/json.h b/src/lib/json/json.h
index 594d59a08..baf26143d 100644
--- a/src/lib/json/json.h
+++ b/src/lib/json/json.h
@@ -30,6 +30,7 @@
  * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
+#include <stdbool.h>
 #include "trivia/util.h"
 
 #ifdef __cplusplus
@@ -222,6 +223,33 @@ json_lexer_create(struct json_lexer *lexer, const char *src, int src_len,
 int
 json_lexer_next_token(struct json_lexer *lexer, struct json_token *token);
 
+/**
+ * Compare two JSON paths using Lexer class.
+ * - in case of paths that have same token-sequence prefix,
+ *   the path having more tokens is assumed to be greater
+ * - both paths should be valid
+ *   (may be tested with json_path_validate).
+ */
+int
+json_path_cmp(const char *a, uint32_t a_len, const char *b, uint32_t b_len,
+	      uint32_t index_base);
+
+/**
+ * Check if the passed JSON path is valid.
+ * Return 0 for valid path and error position for invalid.
+ */
+static inline int
+json_path_validate(const char *path, uint32_t path_len, uint32_t index_base)
+{
+	struct json_lexer lexer;
+	json_lexer_create(&lexer, path, path_len, index_base);
+	struct json_token token;
+	int rc;
+	while ((rc = json_lexer_next_token(&lexer, &token)) == 0 &&
+		token.type != JSON_TOKEN_END) {};
+	return rc;
+}
+
 /** Create a JSON tree object to manage data relations. */
 int
 json_tree_create(struct json_tree *tree);
diff --git a/test/unit/json_path.c b/test/unit/json_path.c
index 1b224f9c2..72e50e143 100644
--- a/test/unit/json_path.c
+++ b/test/unit/json_path.c
@@ -405,15 +405,50 @@ test_tree()
 	footer();
 }
 
+void
+test_path_cmp()
+{
+	const char *a = "Data[1][\"FIO\"].fname";
+	uint32_t a_len = strlen(a);
+	const struct path_and_errpos rc[] = {
+		{a, 0},
+		{"[\"Data\"][1].FIO[\"fname\"]", 0},
+		{"Data[1]", 1},
+		{"Data[1][\"FIO\"].fname[1]", -1},
+		{"Data[1][\"Info\"].fname[1]", -1},
+	};
+	header();
+	plan(lengthof(rc) + 2);
+	for (size_t i = 0; i < lengthof(rc); ++i) {
+		const char *path = rc[i].path;
+		int errpos = rc[i].errpos;
+		int rc = json_path_cmp(a, a_len, path, strlen(path),
+				       TUPLE_INDEX_BASE);
+		if (rc > 0) rc = 1;
+		if (rc < 0) rc = -1;
+		is(rc, errpos, "path cmp result \"%s\" with \"%s\": "
+		   "have %d, expected %d", a, path, rc, errpos);
+	}
+	const char *invalid = "Data[[1][\"FIO\"].fname";
+	int ret = json_path_validate(a, strlen(a), TUPLE_INDEX_BASE);
+	is(ret, 0, "path %s is valid", a);
+	ret = json_path_validate(invalid, strlen(invalid), TUPLE_INDEX_BASE);
+	is(ret, 6, "path %s error pos %d expected %d", invalid, ret, 6);
+
+	check_plan();
+	footer();
+}
+
 int
 main()
 {
 	header();
-	plan(3);
+	plan(4);
 
 	test_basic();
 	test_errors();
 	test_tree();
+	test_path_cmp();
 
 	int rc = check_plan();
 	footer();
diff --git a/test/unit/json_path.result b/test/unit/json_path.result
index 0ee970c8c..cf0fa51c4 100644
--- a/test/unit/json_path.result
+++ b/test/unit/json_path.result
@@ -1,5 +1,5 @@
 	*** main ***
-1..3
+1..4
 	*** test_basic ***
     1..71
     ok 1 - parse <[1]>
@@ -158,4 +158,15 @@ ok 2 - subtests
     ok 54 - records iterated count 4 of 4
 ok 3 - subtests
 	*** test_tree: done ***
+	*** test_path_cmp ***
+    1..7
+    ok 1 - path cmp result "Data[1]["FIO"].fname" with "Data[1]["FIO"].fname": have 0, expected 0
+    ok 2 - path cmp result "Data[1]["FIO"].fname" with "["Data"][1].FIO["fname"]": have 0, expected 0
+    ok 3 - path cmp result "Data[1]["FIO"].fname" with "Data[1]": have 1, expected 1
+    ok 4 - path cmp result "Data[1]["FIO"].fname" with "Data[1]["FIO"].fname[1]": have -1, expected -1
+    ok 5 - path cmp result "Data[1]["FIO"].fname" with "Data[1]["Info"].fname[1]": have -1, expected -1
+    ok 6 - path Data[1]["FIO"].fname is valid
+    ok 7 - path Data[[1]["FIO"].fname error pos 6 expected 6
+ok 4 - subtests
+	*** test_path_cmp: done ***
 	*** main: done ***
-- 
2.19.2

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v5 3/9] box: manage format fields with JSON tree class
  2018-12-04 16:09       ` Vladimir Davydov
  2018-12-04 16:32         ` Kirill Shcherbatov
  2018-12-05  8:37         ` Kirill Shcherbatov
@ 2018-12-06  7:56         ` Kirill Shcherbatov
  2018-12-06  8:06           ` Vladimir Davydov
  2 siblings, 1 reply; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-12-06  7:56 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: tarantool-patches, Kostya Osipov

As we are going to work with format fields in a unified way, we
start using the JSON tree class to manage first-level format
fields.

Needed for #1012
---
 src/box/sql.c          |  18 +++---
 src/box/sql/build.c    |   5 +-
 src/box/tuple.c        |  10 ++--
 src/box/tuple_format.c | 122 +++++++++++++++++++++++++++++------------
 src/box/tuple_format.h |  49 ++++++++++++++---
 src/box/vy_stmt.c      |   4 +-
 6 files changed, 148 insertions(+), 60 deletions(-)

diff --git a/src/box/sql.c b/src/box/sql.c
index 25b3213fd..2cb0edbff 100644
--- a/src/box/sql.c
+++ b/src/box/sql.c
@@ -199,8 +199,9 @@ tarantoolSqlite3TupleColumnFast(BtCursor *pCur, u32 fieldno, u32 *field_size)
 	assert(pCur->last_tuple != NULL);
 
 	struct tuple_format *format = tuple_format(pCur->last_tuple);
-	if (fieldno >= format->field_count ||
-	    format->fields[fieldno].offset_slot == TUPLE_OFFSET_SLOT_NIL)
+	if (fieldno >= tuple_format_field_count(format) ||
+	    tuple_format_field(format, fieldno)->offset_slot ==
+	    TUPLE_OFFSET_SLOT_NIL)
 		return NULL;
 	const char *field = tuple_field(pCur->last_tuple, fieldno);
 	const char *end = field;
@@ -895,7 +896,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
 	struct key_def *key_def;
 	const struct tuple *tuple;
 	const char *base;
-	const struct tuple_format *format;
+	struct tuple_format *format;
 	const uint32_t *field_map;
 	uint32_t field_count, next_fieldno = 0;
 	const char *p, *field0;
@@ -913,7 +914,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
 	base = tuple_data(tuple);
 	format = tuple_format(tuple);
 	field_map = tuple_field_map(tuple);
-	field_count = format->field_count;
+	field_count = tuple_format_field_count(format);
 	field0 = base; mp_decode_array(&field0); p = field0;
 	for (i = 0; i < n; i++) {
 		/*
@@ -931,9 +932,10 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
 		uint32_t fieldno = key_def->parts[i].fieldno;
 
 		if (fieldno != next_fieldno) {
+			struct tuple_field *field =
+				tuple_format_field(format, fieldno);
 			if (fieldno >= field_count ||
-			    format->fields[fieldno].offset_slot ==
-			    TUPLE_OFFSET_SLOT_NIL) {
+			    field->offset_slot == TUPLE_OFFSET_SLOT_NIL) {
 				/* Outdated field_map. */
 				uint32_t j = 0;
 
@@ -941,9 +943,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor,
 				while (j++ != fieldno)
 					mp_next(&p);
 			} else {
-				p = base + field_map[
-					format->fields[fieldno].offset_slot
-					];
+				p = base + field_map[field->offset_slot];
 			}
 		}
 		next_fieldno = fieldno + 1;
diff --git a/src/box/sql/build.c b/src/box/sql/build.c
index 52f0bde15..b5abaeeda 100644
--- a/src/box/sql/build.c
+++ b/src/box/sql/build.c
@@ -936,8 +936,9 @@ sql_column_collation(struct space_def *def, uint32_t column, uint32_t *coll_id)
 		struct coll_id *collation = coll_by_id(*coll_id);
 		return collation != NULL ? collation->coll : NULL;
 	}
-	*coll_id = space->format->fields[column].coll_id;
-	return space->format->fields[column].coll;
+	struct tuple_field *field = tuple_format_field(space->format, column);
+	*coll_id = field->coll_id;
+	return field->coll;
 }
 
 struct ExprList *
diff --git a/src/box/tuple.c b/src/box/tuple.c
index ef4d16f39..aae1c3cdd 100644
--- a/src/box/tuple.c
+++ b/src/box/tuple.c
@@ -138,7 +138,7 @@ runtime_tuple_delete(struct tuple_format *format, struct tuple *tuple)
 int
 tuple_validate_raw(struct tuple_format *format, const char *tuple)
 {
-	if (format->field_count == 0)
+	if (tuple_format_field_count(format) == 0)
 		return 0; /* Nothing to check */
 
 	/* Check to see if the tuple has a sufficient number of fields. */
@@ -158,10 +158,12 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple)
 	}
 
 	/* Check field types */
-	struct tuple_field *field = &format->fields[0];
+	struct tuple_field *field = tuple_format_field(format, 0);
 	uint32_t i = 0;
-	uint32_t defined_field_count = MIN(field_count, format->field_count);
-	for (; i < defined_field_count; ++i, ++field) {
+	uint32_t defined_field_count =
+		MIN(field_count, tuple_format_field_count(format));
+	for (; i < defined_field_count; ++i) {
+		field = tuple_format_field(format, i);
 		if (key_mp_type_validate(field->type, mp_typeof(*tuple),
 					 ER_FIELD_TYPE, i + TUPLE_INDEX_BASE,
 					 tuple_field_is_nullable(field)))
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 149248144..eeb68f5dd 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -38,10 +38,27 @@ static intptr_t recycled_format_ids = FORMAT_ID_NIL;
 
 static uint32_t formats_size = 0, formats_capacity = 0;
 
-static const struct tuple_field tuple_field_default = {
-	FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false,
-	ON_CONFLICT_ACTION_NONE, NULL, COLL_NONE,
-};
+static struct tuple_field *
+tuple_field_new(void)
+{
+	struct tuple_field *ret = calloc(1, sizeof(struct tuple_field));
+	if (ret == NULL) {
+		diag_set(OutOfMemory, sizeof(struct tuple_field), "malloc",
+			 "ret");
+		return NULL;
+	}
+	ret->type = FIELD_TYPE_ANY;
+	ret->offset_slot = TUPLE_OFFSET_SLOT_NIL;
+	ret->coll_id = COLL_NONE;
+	ret->nullable_action = ON_CONFLICT_ACTION_NONE;
+	return ret;
+}
+
+static void
+tuple_field_delete(struct tuple_field *field)
+{
+	free(field);
+}
 
 static int
 tuple_format_use_key_part(struct tuple_format *format,
@@ -49,8 +66,8 @@ tuple_format_use_key_part(struct tuple_format *format,
 			  const struct key_part *part, bool is_sequential,
 			  int *current_slot)
 {
-	assert(part->fieldno < format->field_count);
-	struct tuple_field *field = &format->fields[part->fieldno];
+	assert(part->fieldno < tuple_format_field_count(format));
+	struct tuple_field *field = tuple_format_field(format, part->fieldno);
 	/*
 		* If a field is not present in the space format,
 		* inherit nullable action of the first key part
@@ -138,16 +155,15 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 	format->min_field_count =
 		tuple_format_min_field_count(keys, key_count, fields,
 					     field_count);
-	if (format->field_count == 0) {
+	if (tuple_format_field_count(format) == 0) {
 		format->field_map_size = 0;
 		return 0;
 	}
 	/* Initialize defined fields */
 	for (uint32_t i = 0; i < field_count; ++i) {
-		format->fields[i].is_key_part = false;
-		format->fields[i].type = fields[i].type;
-		format->fields[i].offset_slot = TUPLE_OFFSET_SLOT_NIL;
-		format->fields[i].nullable_action = fields[i].nullable_action;
+		struct tuple_field *field = tuple_format_field(format, i);
+		field->type = fields[i].type;
+		field->nullable_action = fields[i].nullable_action;
 		struct coll *coll = NULL;
 		uint32_t cid = fields[i].coll_id;
 		if (cid != COLL_NONE) {
@@ -159,12 +175,9 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 			}
 			coll = coll_id->coll;
 		}
-		format->fields[i].coll = coll;
-		format->fields[i].coll_id = cid;
+		field->coll = coll;
+		field->coll_id = cid;
 	}
-	/* Initialize remaining fields */
-	for (uint32_t i = field_count; i < format->field_count; i++)
-		format->fields[i] = tuple_field_default;
 
 	int current_slot = 0;
 
@@ -184,7 +197,8 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
 		}
 	}
 
-	assert(format->fields[0].offset_slot == TUPLE_OFFSET_SLOT_NIL);
+	assert(tuple_format_field(format, 0)->offset_slot ==
+		TUPLE_OFFSET_SLOT_NIL);
 	size_t field_map_size = -current_slot * sizeof(uint32_t);
 	if (field_map_size > UINT16_MAX) {
 		/** tuple->data_offset is 16 bits */
@@ -242,6 +256,19 @@ tuple_format_deregister(struct tuple_format *format)
 	format->id = FORMAT_ID_NIL;
 }
 
+/** Destroy field JSON tree and release allocated memory. */
+static inline void
+tuple_format_fields_destroy(struct tuple_format *format)
+{
+	struct tuple_field *field, *tmp;
+	json_tree_foreach_entry_safe(&format->fields.root, field,
+				     struct tuple_field, token, tmp) {
+		json_tree_del(&format->fields, &field->token);
+		tuple_field_delete(field);
+	}
+	json_tree_destroy(&format->fields);
+}
+
 static struct tuple_format *
 tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		   uint32_t space_field_count, struct tuple_dictionary *dict)
@@ -258,39 +285,57 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
 		}
 	}
 	uint32_t field_count = MAX(space_field_count, index_field_count);
-	uint32_t total = sizeof(struct tuple_format) +
-			 field_count * sizeof(struct tuple_field);
 
-	struct tuple_format *format = (struct tuple_format *) malloc(total);
+	struct tuple_format *format = malloc(sizeof(struct tuple_format));
 	if (format == NULL) {
 		diag_set(OutOfMemory, sizeof(struct tuple_format), "malloc",
 			 "tuple format");
 		return NULL;
 	}
+	if (json_tree_create(&format->fields) != 0) {
+		free(format);
+		return NULL;
+	}
+	for (uint32_t fieldno = 0; fieldno < field_count; fieldno++) {
+		struct tuple_field *field = tuple_field_new();
+		if (field == NULL)
+			goto error;
+		field->token.num = fieldno;
+		field->token.type = JSON_TOKEN_NUM;
+		if (json_tree_add(&format->fields, &format->fields.root,
+				  &field->token) != 0) {
+			diag_set(OutOfMemory, 0, "json_tree_add",
+				 "&format->tree");
+			tuple_field_delete(field);
+			goto error;
+		}
+	}
 	if (dict == NULL) {
 		assert(space_field_count == 0);
 		format->dict = tuple_dictionary_new(NULL, 0);
-		if (format->dict == NULL) {
-			free(format);
-			return NULL;
-		}
+		if (format->dict == NULL)
+			goto error;
 	} else {
 		format->dict = dict;
 		tuple_dictionary_ref(dict);
 	}
 	format->refs = 0;
 	format->id = FORMAT_ID_NIL;
-	format->field_count = field_count;
 	format->index_field_count = index_field_count;
 	format->exact_field_count = 0;
 	format->min_field_count = 0;
 	return format;
+error:;
+	tuple_format_fields_destroy(format);
+	free(format);
+	return NULL;
 }
 
 /** Free tuple format resources, doesn't unregister. */
 static inline void
 tuple_format_destroy(struct tuple_format *format)
 {
+	tuple_format_fields_destroy(format);
 	tuple_dictionary_unref(format->dict);
 }
 
@@ -328,18 +373,21 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
 }
 
 bool
-tuple_format1_can_store_format2_tuples(const struct tuple_format *format1,
-				       const struct tuple_format *format2)
+tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
+				       struct tuple_format *format2)
 {
 	if (format1->exact_field_count != format2->exact_field_count)
 		return false;
-	for (uint32_t i = 0; i < format1->field_count; ++i) {
-		const struct tuple_field *field1 = &format1->fields[i];
+	uint32_t format1_field_count = tuple_format_field_count(format1);
+	uint32_t format2_field_count = tuple_format_field_count(format2);
+	for (uint32_t i = 0; i < format1_field_count; ++i) {
+		const struct tuple_field *field1 =
+			tuple_format_field(format1, i);
 		/*
 		 * The field has a data type in format1, but has
 		 * no data type in format2.
 		 */
-		if (i >= format2->field_count) {
+		if (i >= format2_field_count) {
 			/*
 			 * The field can get a name added
 			 * for it, and this doesn't require a data
@@ -355,7 +403,8 @@ tuple_format1_can_store_format2_tuples(const struct tuple_format *format1,
 			else
 				return false;
 		}
-		const struct tuple_field *field2 = &format2->fields[i];
+		const struct tuple_field *field2 =
+			tuple_format_field(format2, i);
 		if (! field_type1_contains_type2(field1->type, field2->type))
 			return false;
 		/*
@@ -374,7 +423,7 @@ int
 tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
 		     const char *tuple, bool validate)
 {
-	if (format->field_count == 0)
+	if (tuple_format_field_count(format) == 0)
 		return 0; /* Nothing to initialize */
 
 	const char *pos = tuple;
@@ -397,17 +446,17 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
 
 	/* first field is simply accessible, so we do not store offset to it */
 	enum mp_type mp_type = mp_typeof(*pos);
-	const struct tuple_field *field = &format->fields[0];
+	const struct tuple_field *field =
+		tuple_format_field((struct tuple_format *)format, 0);
 	if (validate &&
 	    key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
 				 TUPLE_INDEX_BASE, tuple_field_is_nullable(field)))
 		return -1;
 	mp_next(&pos);
 	/* other fields...*/
-	++field;
 	uint32_t i = 1;
 	uint32_t defined_field_count = MIN(field_count, validate ?
-					   format->field_count :
+					   tuple_format_field_count(format) :
 					   format->index_field_count);
 	if (field_count < format->index_field_count) {
 		/*
@@ -417,7 +466,8 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map,
 		memset((char *)field_map - format->field_map_size, 0,
 		       format->field_map_size);
 	}
-	for (; i < defined_field_count; ++i, ++field) {
+	for (; i < defined_field_count; ++i) {
+		field = tuple_format_field((struct tuple_format *)format, i);
 		mp_type = mp_typeof(*pos);
 		if (validate &&
 		    key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE,
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index 232df22b2..68e9205b9 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -34,6 +34,7 @@
 #include "key_def.h"
 #include "field_def.h"
 #include "errinj.h"
+#include "json/json.h"
 #include "tuple_dictionary.h"
 
 #if defined(__cplusplus)
@@ -113,6 +114,8 @@ struct tuple_field {
 	struct coll *coll;
 	/** Collation identifier. */
 	uint32_t coll_id;
+	/** Link in tuple_format::fields. */
+	struct json_token token;
 };
 
 /**
@@ -166,16 +169,46 @@ struct tuple_format {
 	 * index_field_count <= min_field_count <= field_count.
 	 */
 	uint32_t min_field_count;
-	/* Length of 'fields' array. */
-	uint32_t field_count;
 	/**
 	 * Shared names storage used by all formats of a space.
 	 */
 	struct tuple_dictionary *dict;
-	/* Formats of the fields */
-	struct tuple_field fields[0];
+	/**
+	 * Fields comprising the format, organized in a tree.
+	 * First level nodes correspond to tuple fields.
+	 * Deeper levels define indexed JSON paths within
+	 * tuple fields. Nodes of the tree are linked by
+	 * tuple_field::token.
+	 */
+	struct json_tree fields;
 };
 
+/**
+ * Return the number of first-level nodes, which correspond to
+ * tuple fields.
+ */
+static inline uint32_t
+tuple_format_field_count(const struct tuple_format *format)
+{
+	const struct json_token *root = &format->fields.root;
+	return root->children != NULL ? root->max_child_idx + 1 : 0;
+}
+
+/**
+ * Get the first-level node corresponding to a tuple field by
+ * its fieldno.
+ */
+static inline struct tuple_field *
+tuple_format_field(struct tuple_format *format, uint32_t fieldno)
+{
+	assert(fieldno < tuple_format_field_count(format));
+	struct json_token token;
+	token.type = JSON_TOKEN_NUM;
+	token.num = fieldno;
+	return json_tree_lookup_entry(&format->fields, &format->fields.root,
+				      &token, struct tuple_field, token);
+}
+
 extern struct tuple_format **tuple_formats;
 
 static inline uint32_t
@@ -238,8 +271,8 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
  * @retval True, if @a format1 can store any tuples of @a format2.
  */
 bool
-tuple_format1_can_store_format2_tuples(const struct tuple_format *format1,
-				       const struct tuple_format *format2);
+tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
+				       struct tuple_format *format2);
 
 /**
  * Calculate minimal field count of tuples with specified keys and
@@ -333,7 +366,9 @@ tuple_field_raw(const struct tuple_format *format, const char *tuple,
 			return tuple;
 		}
 
-		int32_t offset_slot = format->fields[field_no].offset_slot;
+		int32_t offset_slot =
+			tuple_format_field((struct tuple_format *)format,
+					   field_no)->offset_slot;
 		if (offset_slot != TUPLE_OFFSET_SLOT_NIL) {
 			if (field_map[offset_slot] != 0)
 				return tuple + field_map[offset_slot];
diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c
index d83840406..3e60fece9 100644
--- a/src/box/vy_stmt.c
+++ b/src/box/vy_stmt.c
@@ -411,7 +411,7 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
 	uint32_t *field_map = (uint32_t *) raw;
 	char *wpos = mp_encode_array(raw, field_count);
 	for (uint32_t i = 0; i < field_count; ++i) {
-		const struct tuple_field *field = &format->fields[i];
+		const struct tuple_field *field = tuple_format_field(format, i);
 		if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL)
 			field_map[field->offset_slot] = wpos - raw;
 		if (iov[i].iov_base == NULL) {
@@ -465,7 +465,7 @@ vy_stmt_new_surrogate_delete_raw(struct tuple_format *format,
 	}
 	char *pos = mp_encode_array(data, field_count);
 	for (uint32_t i = 0; i < field_count; ++i) {
-		const struct tuple_field *field = &format->fields[i];
+		const struct tuple_field *field = tuple_format_field(format, i);
 		if (! field->is_key_part) {
 			/* Unindexed field - write NIL. */
 			assert(i < src_count);
-- 
2.19.2

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v5 2/9] lib: implement JSON tree class for json library
  2018-12-05  9:52               ` Vladimir Davydov
@ 2018-12-06  7:56                 ` Kirill Shcherbatov
  2018-12-06  7:56                 ` [tarantool-patches] Re: [PATCH v5 2/9] lib: make index_base support for json_lexer Kirill Shcherbatov
  1 sibling, 0 replies; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-12-06  7:56 UTC (permalink / raw)
  To: tarantool-patches, Vladimir Davydov, Kostya Osipov

The new JSON tree class stores JSON paths of tuple fields for
registered non-plain indexes. It is a hierarchical data
structure that organizes JSON nodes produced by the parser.
The class provides an API to look up a node by path and to
iterate over the tree.
The JSON indexes patch requires this functionality to look up
tuple_fields by path, to initialize the field map and to build
vinyl_stmt msgpack for a secondary index via JSON tree
iteration.

Needed for #1012
---
 src/lib/json/CMakeLists.txt |   1 +
 src/lib/json/json.c         | 257 ++++++++++++++++++++++++++++++++
 src/lib/json/json.h         | 287 +++++++++++++++++++++++++++++++++++-
 test/unit/json_path.c       | 243 +++++++++++++++++++++++++++++-
 test/unit/json_path.result  |  60 +++++++-
 5 files changed, 845 insertions(+), 3 deletions(-)

diff --git a/src/lib/json/CMakeLists.txt b/src/lib/json/CMakeLists.txt
index 0f0739620..51a1f027a 100644
--- a/src/lib/json/CMakeLists.txt
+++ b/src/lib/json/CMakeLists.txt
@@ -4,3 +4,4 @@ set(lib_sources
 
 set_source_files_compile_flags(${lib_sources})
 add_library(json_path STATIC ${lib_sources})
+target_link_libraries(json_path misc)
diff --git a/src/lib/json/json.c b/src/lib/json/json.c
index 81b291127..65169b047 100644
--- a/src/lib/json/json.c
+++ b/src/lib/json/json.c
@@ -30,6 +30,7 @@
  */
 
 #include "json.h"
+#include "third_party/PMurHash.h"
 #include <ctype.h>
 #include <stdbool.h>
 #include <unicode/uchar.h>
@@ -243,3 +244,259 @@ json_lexer_next_token(struct json_lexer *lexer, struct json_token *token)
 		return json_parse_identifier(lexer, token);
 	}
 }
+
+/** Compare JSON token keys. */
+static int
+json_token_cmp(const struct json_token *a, const struct json_token *b)
+{
+	if (a->parent != b->parent)
+		return a->parent - b->parent;
+	if (a->type != b->type)
+		return a->type - b->type;
+	int ret = 0;
+	if (a->type == JSON_TOKEN_STR) {
+		if (a->len != b->len)
+			return a->len - b->len;
+		ret = memcmp(a->str, b->str, a->len);
+	} else if (a->type == JSON_TOKEN_NUM) {
+		ret = a->num - b->num;
+	} else {
+		unreachable();
+	}
+	return ret;
+}
+
+#define MH_SOURCE 1
+#define mh_name _json
+#define mh_key_t struct json_token *
+#define mh_node_t struct json_token *
+#define mh_arg_t void *
+#define mh_hash(a, arg) ((*(a))->hash)
+#define mh_hash_key(a, arg) ((a)->hash)
+#define mh_cmp(a, b, arg) (json_token_cmp(*(a), *(b)))
+#define mh_cmp_key(a, b, arg) (json_token_cmp((a), *(b)))
+#include "salad/mhash.h"
+
+static const uint32_t hash_seed = 13U;
+
+/** Compute the hash value of a JSON token. */
+static uint32_t
+json_token_hash(struct json_token *token)
+{
+	uint32_t h = token->parent->hash;
+	uint32_t carry = 0;
+	const void *data;
+	uint32_t data_size;
+	if (token->type == JSON_TOKEN_STR) {
+		data = token->str;
+		data_size = token->len;
+	} else if (token->type == JSON_TOKEN_NUM) {
+		data = &token->num;
+		data_size = sizeof(token->num);
+	} else {
+		unreachable();
+	}
+	PMurHash32_Process(&h, &carry, data, data_size);
+	return PMurHash32_Result(h, carry, data_size);
+}
+
+int
+json_tree_create(struct json_tree *tree)
+{
+	memset(tree, 0, sizeof(struct json_tree));
+	tree->root.hash = hash_seed;
+	tree->root.type = JSON_TOKEN_END;
+	tree->hash = mh_json_new();
+	return tree->hash == NULL ? -1 : 0;
+}
+
+static void
+json_token_destroy(struct json_token *token)
+{
+#ifndef NDEBUG
+	/* Token mustn't have JSON subtree. */
+	struct json_token *iter;
+	uint32_t nodes = 0;
+	json_tree_foreach_preorder(token, iter)
+		nodes++;
+	assert(nodes == 0);
+#endif /* NDEBUG */
+
+	free(token->children);
+}
+
+void
+json_tree_destroy(struct json_tree *tree)
+{
+	json_token_destroy(&tree->root);
+	mh_json_delete(tree->hash);
+}
+
+struct json_token *
+json_tree_lookup_slowpath(struct json_tree *tree, struct json_token *parent,
+			  const struct json_token *token)
+{
+	assert(token->type == JSON_TOKEN_STR);
+	struct json_token key;
+	key.type = token->type;
+	key.str = token->str;
+	key.len = token->len;
+	key.parent = parent;
+	key.hash = json_token_hash(&key);
+	mh_int_t id = mh_json_find(tree->hash, &key, NULL);
+	if (id == mh_end(tree->hash))
+		return NULL;
+	struct json_token **entry = mh_json_node(tree->hash, id);
+	assert(entry != NULL);
+	return *entry;
+}
+
+int
+json_tree_add(struct json_tree *tree, struct json_token *parent,
+	      struct json_token *token)
+{
+	int ret = 0;
+	token->parent = parent;
+	uint32_t hash = json_token_hash(token);
+	assert(json_tree_lookup(tree, parent, token) == NULL);
+	uint32_t insert_idx = token->type == JSON_TOKEN_NUM ?
+			      (uint32_t)token->num :
+			      parent->children_capacity;
+	if (insert_idx >= parent->children_capacity) {
+		uint32_t new_size = parent->children_capacity == 0 ?
+				    8 : 2 * parent->children_capacity;
+		while (insert_idx >= new_size)
+			new_size *= 2;
+		struct json_token **children =
+			realloc(parent->children, new_size*sizeof(void *));
+		if (children == NULL) {
+			ret = -1;
+			goto end;
+		}
+		memset(children + parent->children_capacity, 0,
+		       (new_size - parent->children_capacity)*sizeof(void *));
+		parent->children = children;
+		parent->children_capacity = new_size;
+	}
+	assert(parent->children[insert_idx] == NULL);
+	parent->children[insert_idx] = token;
+	parent->max_child_idx = MAX(parent->max_child_idx, insert_idx);
+	token->sibling_idx = insert_idx;
+	token->hash = hash;
+	token->parent = parent;
+	if (token->type != JSON_TOKEN_STR)
+		goto end;
+	/*
+	 * Add a string token to the json_tree hash table to
+	 * enable lookups by name.
+	 */
+	mh_int_t rc =
+		mh_json_put(tree->hash, (const struct json_token **)&token,
+			    NULL, NULL);
+	if (rc == mh_end(tree->hash)) {
+		parent->children[insert_idx] = NULL;
+		ret = -1;
+		goto end;
+	}
+end:
+	assert(json_tree_lookup(tree, parent, token) == token);
+	return ret;
+}
+
+void
+json_tree_del(struct json_tree *tree, struct json_token *token)
+{
+	struct json_token *parent = token->parent;
+	assert(json_tree_lookup(tree, parent, token) == token);
+	struct json_token **child_slot = &parent->children[token->sibling_idx];
+	assert(child_slot != NULL && *child_slot == token);
+	*child_slot = NULL;
+	/* The max_child_idx field may require update. */
+	if (token->sibling_idx == parent->max_child_idx &&
+	    parent->max_child_idx > 0) {
+		uint32_t idx = token->sibling_idx - 1;
+		while (idx > 0 && parent->children[idx] == 0)
+			idx--;
+		parent->max_child_idx = idx;
+	}
+	if (token->type != JSON_TOKEN_STR)
+		goto end;
+	/* Remove string token from json_tree hash. */
+	mh_int_t id = mh_json_find(tree->hash, token, NULL);
+	assert(id != mh_end(tree->hash));
+	mh_json_del(tree->hash, id, NULL);
+	json_token_destroy(token);
+end:
+	assert(json_tree_lookup(tree, parent, token) == NULL);
+}
+
+struct json_token *
+json_tree_lookup_path(struct json_tree *tree, struct json_token *parent,
+		      const char *path, uint32_t path_len, uint32_t index_base)
+{
+	int rc;
+	struct json_lexer lexer;
+	struct json_token token;
+	struct json_token *ret = parent != NULL ? parent : &tree->root;
+	json_lexer_create(&lexer, path, path_len, index_base);
+	while ((rc = json_lexer_next_token(&lexer, &token)) == 0 &&
+	       token.type != JSON_TOKEN_END && ret != NULL) {
+		ret = json_tree_lookup(tree, ret, &token);
+	}
+	if (rc != 0 || token.type != JSON_TOKEN_END)
+		return NULL;
+	return ret;
+}
+
+static struct json_token *
+json_tree_child_next(struct json_token *parent, struct json_token *pos)
+{
+	assert(pos == NULL || pos->parent == parent);
+	struct json_token **arr = parent->children;
+	if (arr == NULL)
+		return NULL;
+	uint32_t idx = pos != NULL ? pos->sibling_idx + 1 : 0;
+	while (idx <= parent->max_child_idx && arr[idx] == NULL)
+		idx++;
+	return idx <= parent->max_child_idx ? arr[idx] : NULL;
+}
+
+static struct json_token *
+json_tree_leftmost(struct json_token *pos)
+{
+	struct json_token *last;
+	do {
+		last = pos;
+		pos = json_tree_child_next(pos, NULL);
+	} while (pos != NULL);
+	return last;
+}
+
+struct json_token *
+json_tree_preorder_next(struct json_token *root, struct json_token *pos)
+{
+	struct json_token *next = json_tree_child_next(pos, NULL);
+	if (next != NULL)
+		return next;
+	while (pos != root) {
+		next = json_tree_child_next(pos->parent, pos);
+		if (next != NULL)
+			return next;
+		pos = pos->parent;
+	}
+	return NULL;
+}
+
+struct json_token *
+json_tree_postorder_next(struct json_token *root, struct json_token *pos)
+{
+	struct json_token *next;
+	if (pos == NULL)
+		return json_tree_leftmost(root);
+	if (pos == root)
+		return NULL;
+	next = json_tree_child_next(pos->parent, pos);
+	if (next != NULL)
+		return json_tree_leftmost(next);
+	return pos->parent;
+}
diff --git a/src/lib/json/json.h b/src/lib/json/json.h
index 5c8d973e5..594d59a08 100644
--- a/src/lib/json/json.h
+++ b/src/lib/json/json.h
@@ -30,7 +30,7 @@
  * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
-#include <stdint.h>
+#include "trivia/util.h"
 
 #ifdef __cplusplus
 extern "C" {
@@ -67,6 +67,10 @@ enum json_token_type {
  * Element of a JSON path. It can be either string or number.
  * String idenfiers are in ["..."] and between dots. Numbers are
  * indexes in [...].
+ *
+ * May be organized in a tree-like structure reflecting a JSON
+ * document structure, for more details see the comment to struct
+ * json_tree.
  */
 struct json_token {
 	enum json_token_type type;
@@ -80,6 +84,111 @@ struct json_token {
 		/** Index value. */
 		uint64_t num;
 	};
+	/**
+	 * Hash value of the token. Used for lookups in a JSON tree.
+	 * For more details, see the comment to json_tree::hash.
+	 */
+	uint32_t hash;
+	/**
+	 * Array of child records. Indexes in this array
+	 * match [token.num] index for JSON_TOKEN_NUM type and
+	 * are allocated sequentially for JSON_TOKEN_STR child
+	 * tokens.
+	 */
+	struct json_token **children;
+	/** Allocation size of children array. */
+	uint32_t children_capacity;
+	/** Max occupied index in the children array. */
+	uint32_t max_child_idx;
+	/** Index of node in parent children array. */
+	uint32_t sibling_idx;
+	/** Pointer to parent node. */
+	struct json_token *parent;
+};
+
+struct mh_json_t;
+
+/**
+ * This structure is used for organizing JSON tokens produced
+ * by a lexer in a tree-like structure reflecting a JSON document
+ * structure.
+ *
+ * Each intermediate node of the tree corresponds to either
+ * a JSON map or an array, depending on the key type used by
+ * its children (JSON_TOKEN_STR or JSON_TOKEN_NUM, respectively).
+ * Leaf nodes may represent both complex JSON structures and
+ * final values - it is not mandated by the JSON tree design.
+ * The root of the tree doesn't have a key and is preallocated
+ * when the tree is created.
+ *
+ * The json_token structure is intrusive by design, i.e. to store
+ * arbitrary information in a JSON tree, one has to incorporate it
+ * into a user defined structure.
+ *
+ * Example:
+ *
+ *   struct data {
+ *           ...
+ *           struct json_token token;
+ *   };
+ *
+ *   struct json_tree tree;
+ *   json_tree_create(&tree);
+ *   struct json_token *parent = &tree->root;
+ *
+ *   // Add a path to the tree.
+ *   struct data *data = data_new();
+ *   struct json_lexer lexer;
+ *   json_lexer_create(&lexer, path, path_len);
+ *   json_lexer_next_token(&lexer, &data->token);
+ *   while (data->token.type != JSON_TOKEN_END) {
+ *           json_tree_add(&tree, parent, &data->token);
+ *           parent = &data->token;
+ *           data = data_new();
+ *           json_lexer_next_token(&lexer, &data->token);
+ *   }
+ *   data_delete(data);
+ *
+ *   // Look up a path in the tree.
+ *   data = json_tree_lookup_path(&tree, &tree.root,
+ *                                path, path_len);
+ */
+struct json_tree {
+	/**
+	 * Preallocated token corresponding to the JSON tree root.
+	 * It doesn't have a key (set to JSON_TOKEN_END).
+	 */
+	struct json_token root;
+	/**
+	 * Hash table that is used to quickly look up a token
+	 * corresponding to a JSON map item given a key and
+	 * a parent token. We store all tokens that have type
+	 * JSON_TOKEN_STR in this hash table. Apparently, we
+	 * don't need to store JSON_TOKEN_NUM tokens as we can
+	 * quickly look them up in the children array anyway.
+	 *
+	 * The hash table uses pair <parent, key> as key, so
+	 * even tokens that happen to have the same key will
+	 * have different keys in the hash. To look up a tree
+	 * node corresponding to a particular path, we split
+	 * the path into tokens and look up the first token
+	 * in the root node and each following token in the
+	 * node returned at the previous step.
+	 *
+	 * We compute a hash value for a token by hashing its
+	 * key using the hash value of its parent as seed. This
+	 * is equivalent to computing hash for the path leading
+	 * to the token. However, we don't need to recompute
+	 * hash starting from the root at each step as we
+	 * descend the tree looking for a specific path, and we
+	 * can start descent from any node, not only from the root.
+	 *
+	 * As a consequence of this hashing technique, even
+	 * though we don't need to store JSON_TOKEN_NUM tokens
+	 * in the hash table, we still have to compute hash
+	 * values for them.
+	 */
+	struct mh_json_t *hash;
 };
 
 /**
@@ -113,6 +222,182 @@ json_lexer_create(struct json_lexer *lexer, const char *src, int src_len,
 int
 json_lexer_next_token(struct json_lexer *lexer, struct json_token *token);
 
+/** Initialize a JSON tree object. Return 0 on success, -1 on OOM. */
+int
+json_tree_create(struct json_tree *tree);
+
+/**
+ * Destroy a JSON tree object. This routine doesn't destroy the
+ * attached subtree, so destroy the subtree manually first.
+ */
+void
+json_tree_destroy(struct json_tree *tree);
+
+/** Internal function, use json_tree_lookup instead. */
+struct json_token *
+json_tree_lookup_slowpath(struct json_tree *tree, struct json_token *parent,
+			  const struct json_token *token);
+
+/**
+ * Look up a token among the children of the given parent.
+ */
+static inline struct json_token *
+json_tree_lookup(struct json_tree *tree, struct json_token *parent,
+		 const struct json_token *token)
+{
+	struct json_token *ret = NULL;
+	if (token->type == JSON_TOKEN_NUM) {
+		ret = likely(token->num < parent->children_capacity) ?
+		      parent->children[token->num] : NULL;
+	} else {
+		ret = json_tree_lookup_slowpath(tree, parent, token);
+	}
+	return ret;
+}
+
+/**
+ * Insert a token at the given parent position in a JSON tree.
+ * The parent mustn't already have a child with the same content.
+ */
+int
+json_tree_add(struct json_tree *tree, struct json_token *parent,
+	      struct json_token *token);
+
+/**
+ * Delete a token at the given parent position in a JSON tree.
+ * The token mustn't have a subtree.
+ */
+void
+json_tree_del(struct json_tree *tree, struct json_token *token);
+
+/** Look up a child token by path in a JSON tree. */
+struct json_token *
+json_tree_lookup_path(struct json_tree *tree, struct json_token *parent,
+		      const char *path, uint32_t path_len, uint32_t index_base);
+
+/** Return the next token of a pre-order traversal of a JSON tree. */
+struct json_token *
+json_tree_preorder_next(struct json_token *root, struct json_token *pos);
+
+/** Return the next token of a post-order traversal of a JSON tree. */
+struct json_token *
+json_tree_postorder_next(struct json_token *root, struct json_token *pos);
+
+#ifndef typeof
+#define typeof __typeof__
+#endif
+
+/** Return the container entry of a json_token node. */
+#define json_tree_entry(node, type, member) \
+	container_of((node), type, member)
+/**
+ * Return the container entry of a json_token, or NULL if
+ * the node is NULL.
+ */
+#define json_tree_entry_safe(node, type, member) ({			     \
+	(node) != NULL ? json_tree_entry((node), type, member) : NULL;	     \
+})
+
+/** Return the next entry of a pre-order traversal of a JSON tree. */
+#define json_tree_preorder_next_entry(root, node, type, member) ({	     \
+	struct json_token *__next =					     \
+		json_tree_preorder_next((root), (node));		     \
+	json_tree_entry_safe(__next, type, member);			     \
+})
+
+/**
+ * Return the next entry of a post-order traversal of a JSON tree.
+ * The traversal doesn't visit the root node.
+ */
+#define json_tree_postorder_next_entry(root, node, type, member) ({	     \
+	struct json_token *__next =					     \
+		json_tree_postorder_next((root), (node));		     \
+	json_tree_entry_safe(__next, type, member);			     \
+})
+
+/**
+ * Look up an entry in a tree by path.
+ * Return NULL if the path isn't present in the tree.
+ */
+#define json_tree_lookup_path_entry(tree, parent, path, path_len,	     \
+				    index_base, type, member) ({	     \
+	struct json_token *__node =					     \
+		json_tree_lookup_path((tree), (parent), path, path_len,	     \
+				      index_base);			     \
+	json_tree_entry_safe(__node, type, member);			     \
+})
+
+/** Look up an entry in a tree by token. */
+#define json_tree_lookup_entry(tree, parent, token, type, member) ({	     \
+	struct json_token *__node =					     \
+		json_tree_lookup((tree), (parent), token);		     \
+	json_tree_entry_safe(__node, type, member);			     \
+})
+
+/**
+ * Make pre-order traversal in JSON tree.
+ * This cycle doesn't visit root node.
+ */
+#define json_tree_foreach_preorder(root, node)				     \
+	for ((node) = json_tree_preorder_next((root), (root));		     \
+	     (node) != NULL;						     \
+	     (node) = json_tree_preorder_next((root), (node)))
+
+/**
+ * Make post-order traversal in JSON tree.
+ * This cycle doesn't visit root node.
+ */
+#define json_tree_foreach_postorder(root, node)				     \
+	for ((node) = json_tree_postorder_next((root), NULL);		     \
+	     (node) != (root);						     \
+	     (node) = json_tree_postorder_next((root), (node)))
+
+/**
+ * Make safe post-order traversal in JSON tree.
+ * May be used for destructors.
+ * This cycle doesn't visit root node.
+ */
+#define json_tree_foreach_safe(root, node, tmp)				     \
+	for ((node) = json_tree_postorder_next((root), NULL);		     \
+	     (node) != (root) &&					     \
+	     ((tmp) =  json_tree_postorder_next((root), (node)));	     \
+	     (node) = (tmp))
+
+/**
+ * Make pre-order traversal in JSON tree and return entry.
+ * This cycle doesn't visit the root node.
+ */
+#define json_tree_foreach_entry_preorder(root, node, type, member)	     \
+	for ((node) = json_tree_preorder_next_entry((root), (root),	     \
+						    type, member);	     \
+	     (node) != NULL;						     \
+	     (node) = json_tree_preorder_next_entry((root), &(node)->member, \
+						    type, member))
+
+/**
+ * Make post-order traversal in JSON tree and return entry.
+ * This cycle doesn't visit the root node.
+ */
+#define json_tree_foreach_entry_postorder(root, node, type, member)	     \
+	for ((node) = json_tree_postorder_next_entry((root), NULL, type,     \
+						     member);		     \
+	     &(node)->member != (root);					     \
+	     (node) = json_tree_postorder_next_entry((root), &(node)->member,\
+						     type, member))
+
+/**
+ * Make safe post-order traversal in JSON tree and return entry.
+ * This cycle doesn't visit root node.
+ */
+#define json_tree_foreach_entry_safe(root, node, type, member, tmp)	     \
+	for ((node) = json_tree_postorder_next_entry((root), NULL,	     \
+						     type, member);	     \
+	     &(node)->member != (root) &&				     \
+	     ((tmp) =  json_tree_postorder_next_entry((root),		     \
+						      &(node)->member,	     \
+						      type, member));	     \
+	     (node) = (tmp))
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/test/unit/json_path.c b/test/unit/json_path.c
index 1d7707ee6..1b224f9c2 100644
--- a/test/unit/json_path.c
+++ b/test/unit/json_path.c
@@ -2,6 +2,7 @@
 #include "unit.h"
 #include "trivia/util.h"
 #include <string.h>
+#include <stdbool.h>
 
 #define TUPLE_INDEX_BASE 1
 
@@ -165,14 +166,254 @@ test_errors()
 	footer();
 }
 
+struct test_struct {
+	int value;
+	struct json_token node;
+};
+
+struct test_struct *
+test_struct_alloc(struct test_struct *records_pool, int *pool_idx)
+{
+	struct test_struct *ret = &records_pool[*pool_idx];
+	*pool_idx = *pool_idx + 1;
+	memset(&ret->node, 0, sizeof(ret->node));
+	return ret;
+}
+
+struct test_struct *
+test_add_path(struct json_tree *tree, const char *path, uint32_t path_len,
+	      struct test_struct *records_pool, int *pool_idx)
+{
+	int rc;
+	struct json_lexer lexer;
+	struct json_token *parent = &tree->root;
+	json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE);
+	struct test_struct *field = test_struct_alloc(records_pool, pool_idx);
+	while ((rc = json_lexer_next_token(&lexer, &field->node)) == 0 &&
+		field->node.type != JSON_TOKEN_END) {
+		struct json_token *next =
+			json_tree_lookup(tree, parent, &field->node);
+		if (next == NULL) {
+			rc = json_tree_add(tree, parent, &field->node);
+			fail_if(rc != 0);
+			next = &field->node;
+			field = test_struct_alloc(records_pool, pool_idx);
+		}
+		parent = next;
+	}
+	fail_if(rc != 0 || field->node.type != JSON_TOKEN_END);
+	/* Release field. */
+	*pool_idx = *pool_idx - 1;
+	return json_tree_entry(parent, struct test_struct, node);
+}
+
+void
+test_tree()
+{
+	header();
+	plan(54);
+
+	struct json_tree tree;
+	int rc = json_tree_create(&tree);
+	fail_if(rc != 0);
+
+	struct test_struct records[7];
+	for (int i = 0; i < 6; i++)
+		records[i].value = i;
+
+	const char *path1 = "[1][10]";
+	const char *path2 = "[1][20].file";
+	const char *path3 = "[1][20].file[2]";
+	const char *path4 = "[1][20].file[8]";
+	const char *path4_copy = "[1][20][\"file\"][8]";
+	const char *path_unregistered = "[1][3]";
+
+	int records_idx = 0;
+	struct test_struct *node, *node_tmp;
+	node = test_add_path(&tree, path1, strlen(path1), records,
+			     &records_idx);
+	is(node, &records[1], "add path '%s'", path1);
+
+	node = test_add_path(&tree, path2, strlen(path2), records,
+			     &records_idx);
+	is(node, &records[3], "add path '%s'", path2);
+
+	node = test_add_path(&tree, path3, strlen(path3), records,
+			     &records_idx);
+	is(node, &records[4], "add path '%s'", path3);
+
+	node = test_add_path(&tree, path4, strlen(path4), records,
+			     &records_idx);
+	is(node, &records[5], "add path '%s'", path4);
+
+	node = test_add_path(&tree, path4_copy, strlen(path4_copy), records,
+			     &records_idx);
+	is(node, &records[5], "add path '%s'", path4_copy);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path1, strlen(path1),
+					   TUPLE_INDEX_BASE, struct test_struct,
+					   node);
+	is(node, &records[1], "lookup path '%s'", path1);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path2, strlen(path2),
+					   TUPLE_INDEX_BASE, struct test_struct,
+					   node);
+	is(node, &records[3], "lookup path '%s'", path2);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path_unregistered,
+					   strlen(path_unregistered),
+					   TUPLE_INDEX_BASE, struct test_struct,
+					   node);
+	is(node, NULL, "lookup unregistered path '%s'", path_unregistered);
+
+	/* Test iterators. */
+	struct json_token *token = NULL, *tmp;
+	const struct json_token *tokens_preorder[] =
+		{&records[0].node, &records[1].node, &records[2].node,
+		 &records[3].node, &records[4].node, &records[5].node};
+	int cnt = sizeof(tokens_preorder)/sizeof(tokens_preorder[0]);
+	int idx = 0;
+
+	json_tree_foreach_preorder(&tree.root, token) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(token, struct test_struct, node);
+		struct test_struct *t2 =
+			json_tree_entry(tokens_preorder[idx],
+					struct test_struct, node);
+		is(token, tokens_preorder[idx],
+		   "test foreach pre order %d: have %d expected of %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	const struct json_token *tree_nodes_postorder[] =
+		{&records[1].node, &records[4].node, &records[5].node,
+		 &records[3].node, &records[2].node, &records[0].node};
+	cnt = sizeof(tree_nodes_postorder)/sizeof(tree_nodes_postorder[0]);
+	idx = 0;
+	json_tree_foreach_postorder(&tree.root, token) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(token, struct test_struct, node);
+		struct test_struct *t2 =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, node);
+		is(token, tree_nodes_postorder[idx],
+		   "test foreach post order %d: have %d expected of %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_safe(&tree.root, token, tmp) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t1 =
+			json_tree_entry(token, struct test_struct, node);
+		struct test_struct *t2 =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, node);
+		is(token, tree_nodes_postorder[idx],
+		   "test foreach safe order %d: have %d expected of %d",
+		   idx, t1->value, t2->value);
+		++idx;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_entry_preorder(&tree.root, node, struct test_struct,
+					 node) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t =
+			json_tree_entry(tokens_preorder[idx],
+					struct test_struct, node);
+		is(&node->node, tokens_preorder[idx],
+		   "test foreach entry pre order %d: have %d expected of %d",
+		   idx, node->value, t->value);
+		idx++;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	idx = 0;
+	json_tree_foreach_entry_postorder(&tree.root, node, struct test_struct,
+					  node) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t =
+			json_tree_entry(tree_nodes_postorder[idx],
+					struct test_struct, node);
+		is(&node->node, tree_nodes_postorder[idx],
+		   "test foreach entry post order %d: have %d expected of %d",
+		   idx, node->value, t->value);
+		idx++;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+
+	/* Test record deletion. */
+	is(records[3].node.max_child_idx, 7, "max_child_index %d expected of %d",
+	   records[3].node.max_child_idx, 7);
+	json_tree_del(&tree, &records[5].node);
+	is(records[3].node.max_child_idx, 1, "max_child_index %d expected of %d",
+	   records[3].node.max_child_idx, 1);
+	json_tree_del(&tree, &records[4].node);
+	is(records[3].node.max_child_idx, 0, "max_child_index %d expected of %d",
+	   records[3].node.max_child_idx, 0);
+	node = json_tree_lookup_path_entry(&tree, NULL, path3, strlen(path3),
+					   TUPLE_INDEX_BASE, struct test_struct,
+					   node);
+	is(node, NULL, "lookup removed path '%s'", path3);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path4, strlen(path4),
+					   TUPLE_INDEX_BASE, struct test_struct,
+					   node);
+	is(node, NULL, "lookup removed path '%s'", path4);
+
+	node = json_tree_lookup_path_entry(&tree, NULL, path2, strlen(path2),
+					   TUPLE_INDEX_BASE, struct test_struct,
+					   node);
+	is(node, &records[3], "lookup path was not corrupted '%s'", path2);
+
+	const struct json_token *tree_nodes_postorder_new[] =
+		{&records[1].node, &records[3].node,
+		 &records[2].node, &records[0].node};
+	cnt = sizeof(tree_nodes_postorder_new) /
+	      sizeof(tree_nodes_postorder_new[0]);
+	idx = 0;
+	json_tree_foreach_entry_safe(&tree.root, node, struct test_struct,
+				     node, node_tmp) {
+		if (idx >= cnt)
+			break;
+		struct test_struct *t =
+			json_tree_entry(tree_nodes_postorder_new[idx],
+					struct test_struct, node);
+		is(&node->node, tree_nodes_postorder_new[idx],
+		   "test foreach entry safe order %d: have %d expected of %d",
+		   idx, node->value, t->value);
+		json_tree_del(&tree, &node->node);
+		idx++;
+	}
+	is(idx, cnt, "records iterated count %d of %d", idx, cnt);
+	json_tree_destroy(&tree);
+
+	check_plan();
+	footer();
+}
+
 int
 main()
 {
 	header();
-	plan(2);
+	plan(3);
 
 	test_basic();
 	test_errors();
+	test_tree();
 
 	int rc = check_plan();
 	footer();
diff --git a/test/unit/json_path.result b/test/unit/json_path.result
index ad6f07e5a..0ee970c8c 100644
--- a/test/unit/json_path.result
+++ b/test/unit/json_path.result
@@ -1,5 +1,5 @@
 	*** main ***
-1..2
+1..3
 	*** test_basic ***
     1..71
     ok 1 - parse <[1]>
@@ -100,4 +100,62 @@ ok 1 - subtests
     ok 21 - invalid token for index_base 1
 ok 2 - subtests
 	*** test_errors: done ***
+	*** test_tree ***
+    1..54
+    ok 1 - add path '[1][10]'
+    ok 2 - add path '[1][20].file'
+    ok 3 - add path '[1][20].file[2]'
+    ok 4 - add path '[1][20].file[8]'
+    ok 5 - add path '[1][20]["file"][8]'
+    ok 6 - lookup path '[1][10]'
+    ok 7 - lookup path '[1][20].file'
+    ok 8 - lookup unregistered path '[1][3]'
+    ok 9 - test foreach pre order 0: have 0 expected of 0
+    ok 10 - test foreach pre order 1: have 1 expected of 1
+    ok 11 - test foreach pre order 2: have 2 expected of 2
+    ok 12 - test foreach pre order 3: have 3 expected of 3
+    ok 13 - test foreach pre order 4: have 4 expected of 4
+    ok 14 - test foreach pre order 5: have 5 expected of 5
+    ok 15 - records iterated count 6 of 6
+    ok 16 - test foreach post order 0: have 1 expected of 1
+    ok 17 - test foreach post order 1: have 4 expected of 4
+    ok 18 - test foreach post order 2: have 5 expected of 5
+    ok 19 - test foreach post order 3: have 3 expected of 3
+    ok 20 - test foreach post order 4: have 2 expected of 2
+    ok 21 - test foreach post order 5: have 0 expected of 0
+    ok 22 - records iterated count 6 of 6
+    ok 23 - test foreach safe order 0: have 1 expected of 1
+    ok 24 - test foreach safe order 1: have 4 expected of 4
+    ok 25 - test foreach safe order 2: have 5 expected of 5
+    ok 26 - test foreach safe order 3: have 3 expected of 3
+    ok 27 - test foreach safe order 4: have 2 expected of 2
+    ok 28 - test foreach safe order 5: have 0 expected of 0
+    ok 29 - records iterated count 6 of 6
+    ok 30 - test foreach entry pre order 0: have 0 expected of 0
+    ok 31 - test foreach entry pre order 1: have 1 expected of 1
+    ok 32 - test foreach entry pre order 2: have 2 expected of 2
+    ok 33 - test foreach entry pre order 3: have 3 expected of 3
+    ok 34 - test foreach entry pre order 4: have 4 expected of 4
+    ok 35 - test foreach entry pre order 5: have 5 expected of 5
+    ok 36 - records iterated count 6 of 6
+    ok 37 - test foreach entry post order 0: have 1 expected of 1
+    ok 38 - test foreach entry post order 1: have 4 expected of 4
+    ok 39 - test foreach entry post order 2: have 5 expected of 5
+    ok 40 - test foreach entry post order 3: have 3 expected of 3
+    ok 41 - test foreach entry post order 4: have 2 expected of 2
+    ok 42 - test foreach entry post order 5: have 0 expected of 0
+    ok 43 - records iterated count 6 of 6
+    ok 44 - max_child_index 7 expected of 7
+    ok 45 - max_child_index 1 expected of 1
+    ok 46 - max_child_index 0 expected of 0
+    ok 47 - lookup removed path '[1][20].file[2]'
+    ok 48 - lookup removed path '[1][20].file[8]'
+    ok 49 - lookup path was not corrupted '[1][20].file'
+    ok 50 - test foreach entry safe order 0: have 1 expected of 1
+    ok 51 - test foreach entry safe order 1: have 3 expected of 3
+    ok 52 - test foreach entry safe order 2: have 2 expected of 2
+    ok 53 - test foreach entry safe order 3: have 0 expected of 0
+    ok 54 - records iterated count 4 of 4
+ok 3 - subtests
+	*** test_tree: done ***
 	*** main: done ***
-- 
2.19.2
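The traversal orders asserted in the test above follow the usual recursive definitions. A minimal standalone sketch (hypothetical `jt_node` layout, not the library's `json_token`) reproduces the tree built by the test, where `records[0]` is "[1]", `records[1]` is "[10]", `records[2]` is "[20]", `records[3]` is "file", and `records[4]`/`records[5]` are "[2]"/"[8]":

```c
#include <assert.h>

/*
 * Sketch only: root r0("[1]") has children r1("[10]") and r2("[20]");
 * r2 has r3("file"); r3 has r4("[2]") and r5("[8]").
 */
struct jt_node {
	int id;
	struct jt_node *child[4];
	int nchild;
};

/* Build the six-record tree from the test; n must hold 6 entries. */
static void
demo_build(struct jt_node *n)
{
	for (int i = 0; i < 6; i++) {
		n[i].id = i;
		n[i].nchild = 0;
	}
	n[0].child[n[0].nchild++] = &n[1];
	n[0].child[n[0].nchild++] = &n[2];
	n[2].child[n[2].nchild++] = &n[3];
	n[3].child[n[3].nchild++] = &n[4];
	n[3].child[n[3].nchild++] = &n[5];
}

/* Parent before children: yields 0 1 2 3 4 5 for the tree above. */
static void
demo_preorder(struct jt_node *n, int *out, int *cnt)
{
	out[(*cnt)++] = n->id;
	for (int i = 0; i < n->nchild; i++)
		demo_preorder(n->child[i], out, cnt);
}

/* Children before parent: yields 1 4 5 3 2 0 for the tree above. */
static void
demo_postorder(struct jt_node *n, int *out, int *cnt)
{
	for (int i = 0; i < n->nchild; i++)
		demo_postorder(n->child[i], out, cnt);
	out[(*cnt)++] = n->id;
}
```

The postorder sequence is also the one the safe iterators must produce, since a node may only be freed after all of its children were visited.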

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v5 2/9] lib: make index_base support for json_lexer
  2018-12-05  9:52               ` Vladimir Davydov
  2018-12-06  7:56                 ` Kirill Shcherbatov
@ 2018-12-06  7:56                 ` Kirill Shcherbatov
  1 sibling, 0 replies; 41+ messages in thread
From: Kirill Shcherbatov @ 2018-12-06  7:56 UTC (permalink / raw)
  To: tarantool-patches, Vladimir Davydov, Kostya Osipov

Introduced a new index_base field in the json_lexer class: this
value is the base field offset for emitted JSON_TOKEN_NUM tokens.
It lets us get rid of manual conversions with the
TUPLE_INDEX_BASE constant in most cases and ensures that
extracted tokens are inserted at the correct numeric level of
the JSON tree.

Needed for #1012
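The rebasing rule this patch adds to json_parse_integer can be sketched standalone (demo_rebase_index is a hypothetical helper for illustration, not the json library API): the lexer parses the literal number from the path and subtracts index_base before emitting JSON_TOKEN_NUM; values below index_base are rejected as a syntax error, which is what the new "invalid token for index_base 1" case checks.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the rule in json_parse_integer:
 * reject literals below index_base (e.g. "[0]" when index_base is 1,
 * as in Lua paths), otherwise emit a 0-based index for the C internals.
 */
static int
demo_rebase_index(long value, unsigned index_base, long *out)
{
	if (value < (long)index_base)
		return -1;		/* syntax error, like "[0]" for Lua */
	*out = value - index_base;	/* 0-based index */
	return 0;
}
```

With index_base = 1 the path "[5]" yields token number 4, matching the updated expectations in test_basic; with index_base = 0 the literal passes through unchanged.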
---
 src/box/tuple_format.c     | 16 ++++------------
 src/lib/json/json.c        |  4 +++-
 src/lib/json/json.h        | 11 ++++++++++-
 test/engine/tuple.result   |  4 ++--
 test/unit/json_path.c      | 24 +++++++++++++++---------
 test/unit/json_path.result | 21 +++++++++++----------
 6 files changed, 45 insertions(+), 35 deletions(-)

diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 661cfdc94..149248144 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -491,7 +491,7 @@ box_tuple_format_unref(box_tuple_format_t *format)
 /**
  * Propagate @a field to MessagePack(field)[index].
  * @param[in][out] field Field to propagate.
- * @param index 1-based index to propagate to.
+ * @param index 0-based index to propagate to.
  *
  * @retval  0 Success, the index was found.
  * @retval -1 Not found.
@@ -501,10 +501,6 @@ tuple_field_go_to_index(const char **field, uint64_t index)
 {
 	enum mp_type type = mp_typeof(**field);
 	if (type == MP_ARRAY) {
-		if (index == 0)
-			return -1;
-		/* Make index 0-based. */
-		index -= TUPLE_INDEX_BASE;
 		uint32_t count = mp_decode_array(field);
 		if (index >= count)
 			return -1;
@@ -512,6 +508,7 @@ tuple_field_go_to_index(const char **field, uint64_t index)
 			mp_next(field);
 		return 0;
 	} else if (type == MP_MAP) {
+		index += TUPLE_INDEX_BASE;
 		uint64_t count = mp_decode_map(field);
 		for (; count > 0; --count) {
 			type = mp_typeof(**field);
@@ -582,7 +579,7 @@ tuple_field_go_to_path(const char **data, const char *path, uint32_t path_len)
 	int rc;
 	struct json_lexer lexer;
 	struct json_token token;
-	json_lexer_create(&lexer, path, path_len);
+	json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE);
 	while ((rc = json_lexer_next_token(&lexer, &token)) == 0) {
 		switch (token.type) {
 		case JSON_TOKEN_NUM:
@@ -624,18 +621,13 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
 	}
 	struct json_lexer lexer;
 	struct json_token token;
-	json_lexer_create(&lexer, path, path_len);
+	json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE);
 	int rc = json_lexer_next_token(&lexer, &token);
 	if (rc != 0)
 		goto error;
 	switch(token.type) {
 	case JSON_TOKEN_NUM: {
 		int index = token.num;
-		if (index == 0) {
-			*field = NULL;
-			return 0;
-		}
-		index -= TUPLE_INDEX_BASE;
 		*field = tuple_field_raw(format, tuple, field_map, index);
 		if (*field == NULL)
 			return 0;
diff --git a/src/lib/json/json.c b/src/lib/json/json.c
index eb80e4bbc..81b291127 100644
--- a/src/lib/json/json.c
+++ b/src/lib/json/json.c
@@ -144,10 +144,12 @@ json_parse_integer(struct json_lexer *lexer, struct json_token *token)
 		value = value * 10 + c - (int)'0';
 		++len;
 	} while (++pos < end && isdigit((c = *pos)));
+	if (value < lexer->index_base)
+		return lexer->symbol_count + 1;
 	lexer->offset += len;
 	lexer->symbol_count += len;
 	token->type = JSON_TOKEN_NUM;
-	token->num = value;
+	token->num = value - lexer->index_base;
 	return 0;
 }
 
diff --git a/src/lib/json/json.h b/src/lib/json/json.h
index ead446878..5c8d973e5 100644
--- a/src/lib/json/json.h
+++ b/src/lib/json/json.h
@@ -49,6 +49,11 @@ struct json_lexer {
 	int offset;
 	/** Current lexer's offset in symbols. */
 	int symbol_count;
+	/**
+	 * Base field offset for emitted JSON_TOKEN_NUM tokens,
+	 * e.g. 0 for C and 1 for Lua.
+	 */
+	unsigned index_base;
 };
 
 enum json_token_type {
@@ -82,14 +87,18 @@ struct json_token {
  * @param[out] lexer Lexer to create.
  * @param src Source string.
  * @param src_len Length of @a src.
+ * @param index_base Base field offset for emitted JSON_TOKEN_NUM
+ *                   tokens e.g. 0 for C and 1 for Lua.
  */
 static inline void
-json_lexer_create(struct json_lexer *lexer, const char *src, int src_len)
+json_lexer_create(struct json_lexer *lexer, const char *src, int src_len,
+		  unsigned index_base)
 {
 	lexer->src = src;
 	lexer->src_len = src_len;
 	lexer->offset = 0;
 	lexer->symbol_count = 0;
+	lexer->index_base = index_base;
 }
 
 /**
diff --git a/test/engine/tuple.result b/test/engine/tuple.result
index 35c700e16..7ca3985c7 100644
--- a/test/engine/tuple.result
+++ b/test/engine/tuple.result
@@ -823,7 +823,7 @@ t[0]
 ...
 t["[0]"]
 ---
-- null
+- error: Illegal parameters, error in path on position 2
 ...
 t["[1000]"]
 ---
@@ -847,7 +847,7 @@ t["[2][6].key100"]
 ...
 t["[2][0]"] -- 0-based index in array.
 ---
-- null
+- error: Illegal parameters, error in path on position 5
 ...
 t["[4][3]"] -- Can not index string.
 ---
diff --git a/test/unit/json_path.c b/test/unit/json_path.c
index a5f90ad98..1d7707ee6 100644
--- a/test/unit/json_path.c
+++ b/test/unit/json_path.c
@@ -3,10 +3,12 @@
 #include "trivia/util.h"
 #include <string.h>
 
+#define TUPLE_INDEX_BASE 1
+
 #define reset_to_new_path(value) \
 	path = value; \
 	len = strlen(value); \
-	json_lexer_create(&lexer, path, len);
+	json_lexer_create(&lexer, path, len, TUPLE_INDEX_BASE);
 
 #define is_next_index(value_len, value) \
 	path = lexer.src + lexer.offset; \
@@ -32,18 +34,18 @@ test_basic()
 	struct json_lexer lexer;
 	struct json_token token;
 
-	reset_to_new_path("[0].field1.field2['field3'][5]");
+	reset_to_new_path("[1].field1.field2['field3'][5]");
 	is_next_index(3, 0);
 	is_next_key("field1");
 	is_next_key("field2");
 	is_next_key("field3");
-	is_next_index(3, 5);
+	is_next_index(3, 4);
 
 	reset_to_new_path("[3].field[2].field")
-	is_next_index(3, 3);
-	is_next_key("field");
 	is_next_index(3, 2);
 	is_next_key("field");
+	is_next_index(3, 1);
+	is_next_key("field");
 
 	reset_to_new_path("[\"f1\"][\"f2'3'\"]");
 	is_next_key("f1");
@@ -57,7 +59,7 @@ test_basic()
 
 	/* Long number. */
 	reset_to_new_path("[1234]");
-	is_next_index(6, 1234);
+	is_next_index(6, 1233);
 
 	/* Empty path. */
 	reset_to_new_path("");
@@ -70,8 +72,8 @@ test_basic()
 
 	/* Unicode. */
 	reset_to_new_path("[2][6]['привет中国world']['中国a']");
-	is_next_index(3, 2);
-	is_next_index(3, 6);
+	is_next_index(3, 1);
+	is_next_index(3, 5);
 	is_next_key("привет中国world");
 	is_next_key("中国a");
 
@@ -94,7 +96,7 @@ void
 test_errors()
 {
 	header();
-	plan(20);
+	plan(21);
 	const char *path;
 	int len;
 	struct json_lexer lexer;
@@ -155,6 +157,10 @@ test_errors()
 	json_lexer_next_token(&lexer, &token);
 	is(json_lexer_next_token(&lexer, &token), 6, "tab inside identifier");
 
+	reset_to_new_path("[0]");
+	is(json_lexer_next_token(&lexer, &token), 2,
+	   "invalid token for index_base %d", TUPLE_INDEX_BASE);
+
 	check_plan();
 	footer();
 }
diff --git a/test/unit/json_path.result b/test/unit/json_path.result
index a2a2f829f..ad6f07e5a 100644
--- a/test/unit/json_path.result
+++ b/test/unit/json_path.result
@@ -2,9 +2,9 @@
 1..2
 	*** test_basic ***
     1..71
-    ok 1 - parse <[0]>
-    ok 2 - <[0]> is num
-    ok 3 - <[0]> is 0
+    ok 1 - parse <[1]>
+    ok 2 - <[1]> is num
+    ok 3 - <[1]> is 0
     ok 4 - parse <field1>
     ok 5 - <field1> is str
     ok 6 - len is 6
@@ -19,17 +19,17 @@
     ok 15 - str is field3
     ok 16 - parse <[5]>
     ok 17 - <[5]> is num
-    ok 18 - <[5]> is 5
+    ok 18 - <[5]> is 4
     ok 19 - parse <[3]>
     ok 20 - <[3]> is num
-    ok 21 - <[3]> is 3
+    ok 21 - <[3]> is 2
     ok 22 - parse <field>
     ok 23 - <field> is str
     ok 24 - len is 5
     ok 25 - str is field
     ok 26 - parse <[2]>
     ok 27 - <[2]> is num
-    ok 28 - <[2]> is 2
+    ok 28 - <[2]> is 1
     ok 29 - parse <field>
     ok 30 - <field> is str
     ok 31 - len is 5
@@ -52,7 +52,7 @@
     ok 48 - str is field1
     ok 49 - parse <[1234]>
     ok 50 - <[1234]> is num
-    ok 51 - <[1234]> is 1234
+    ok 51 - <[1234]> is 1233
     ok 52 - parse empty path
     ok 53 - is str
     ok 54 - parse <field1>
@@ -61,10 +61,10 @@
     ok 57 - str is field1
     ok 58 - parse <[2]>
     ok 59 - <[2]> is num
-    ok 60 - <[2]> is 2
+    ok 60 - <[2]> is 1
     ok 61 - parse <[6]>
     ok 62 - <[6]> is num
-    ok 63 - <[6]> is 6
+    ok 63 - <[6]> is 5
     ok 64 - parse <привет中国world>
     ok 65 - <привет中国world> is str
     ok 66 - len is 23
@@ -76,7 +76,7 @@
 ok 1 - subtests
 	*** test_basic: done ***
 	*** test_errors ***
-    1..20
+    1..21
     ok 1 - error on position 2 for <[[>
     ok 2 - error on position 2 for <[field]>
     ok 3 - error on position 1 for <'field1'.field2>
@@ -97,6 +97,7 @@ ok 1 - subtests
     ok 18 - error in leading <.>
     ok 19 - space inside identifier
     ok 20 - tab inside identifier
+    ok 21 - invalid token for index_base 1
 ok 2 - subtests
 	*** test_errors: done ***
 	*** main: done ***
-- 
2.19.2

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [tarantool-patches] Re: [PATCH v5 3/9] box: manage format fields with JSON tree class
  2018-12-06  7:56         ` Kirill Shcherbatov
@ 2018-12-06  8:06           ` Vladimir Davydov
  0 siblings, 0 replies; 41+ messages in thread
From: Vladimir Davydov @ 2018-12-06  8:06 UTC (permalink / raw)
  To: Kirill Shcherbatov; +Cc: tarantool-patches, Kostya Osipov

On Thu, Dec 06, 2018 at 10:56:55AM +0300, Kirill Shcherbatov wrote:
> As we are going to work with format fields in a unified way, we
> now use the JSON tree class to manage first-level format
> fields.
> 
> Needed for #1012
> ---
>  src/box/sql.c          |  18 +++---
>  src/box/sql/build.c    |   5 +-
>  src/box/tuple.c        |  10 ++--
>  src/box/tuple_format.c | 122 +++++++++++++++++++++++++++++------------
>  src/box/tuple_format.h |  49 ++++++++++++++---
>  src/box/vy_stmt.c      |   4 +-
>  6 files changed, 148 insertions(+), 60 deletions(-)

I don't understand what has changed and why you have to resend this
patch. Please push the patches you think are ready to a separate branch
and resend them as a separate thread. And please write a change log
briefly describing what has changed since the last version.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2018-12-06  8:06 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-26 10:49 [PATCH v5 0/9] box: indexes by JSON path Kirill Shcherbatov
2018-11-26 10:49 ` [PATCH v5 1/9] box: refactor json_path_parser class Kirill Shcherbatov
2018-11-26 12:53   ` [tarantool-patches] " Kirill Shcherbatov
2018-11-29 15:39     ` Vladimir Davydov
2018-11-26 10:49 ` [PATCH v5 2/9] lib: implement JSON tree class for json library Kirill Shcherbatov
2018-11-26 12:53   ` [tarantool-patches] " Kirill Shcherbatov
2018-11-29 17:38     ` Vladimir Davydov
2018-11-29 17:50       ` Vladimir Davydov
2018-12-04 15:22       ` Vladimir Davydov
2018-12-04 15:47       ` [tarantool-patches] " Kirill Shcherbatov
2018-12-04 17:54         ` Vladimir Davydov
2018-12-05  8:37           ` Kirill Shcherbatov
2018-12-05  9:07             ` Vladimir Davydov
2018-12-05  9:52               ` Vladimir Davydov
2018-12-06  7:56                 ` Kirill Shcherbatov
2018-12-06  7:56                 ` [tarantool-patches] Re: [PATCH v5 2/9] lib: make index_base support for json_lexer Kirill Shcherbatov
2018-11-26 10:49 ` [PATCH v5 3/9] box: manage format fields with JSON tree class Kirill Shcherbatov
2018-11-29 19:07   ` Vladimir Davydov
2018-12-04 15:47     ` [tarantool-patches] " Kirill Shcherbatov
2018-12-04 16:09       ` Vladimir Davydov
2018-12-04 16:32         ` Kirill Shcherbatov
2018-12-05  8:37         ` Kirill Shcherbatov
2018-12-06  7:56         ` Kirill Shcherbatov
2018-12-06  8:06           ` Vladimir Davydov
2018-11-26 10:49 ` [PATCH v5 4/9] lib: introduce json_path_cmp routine Kirill Shcherbatov
2018-11-30 10:46   ` Vladimir Davydov
2018-12-03 17:37     ` [tarantool-patches] " Konstantin Osipov
2018-12-03 18:48       ` Vladimir Davydov
2018-12-03 20:14         ` Konstantin Osipov
2018-12-06  7:56           ` [tarantool-patches] Re: [PATCH v5 4/9] lib: introduce json_path_cmp, json_path_validate Kirill Shcherbatov
2018-11-26 10:49 ` [tarantool-patches] [PATCH v5 5/9] box: introduce JSON indexes Kirill Shcherbatov
2018-11-30 21:28   ` Vladimir Davydov
2018-12-01 16:49     ` Vladimir Davydov
2018-11-26 10:49 ` [PATCH v5 6/9] box: introduce has_json_paths flag in templates Kirill Shcherbatov
2018-11-26 10:49 ` [PATCH v5 7/9] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov
2018-12-01 17:20   ` Vladimir Davydov
2018-11-26 10:49 ` [PATCH v5 8/9] box: introduce offset slot cache in key_part Kirill Shcherbatov
2018-12-03 21:04   ` Vladimir Davydov
2018-12-04 15:51     ` Vladimir Davydov
2018-11-26 10:49 ` [PATCH v5 9/9] box: specify indexes in user-friendly form Kirill Shcherbatov
2018-12-04 12:22   ` Vladimir Davydov
