* [PATCH v6 0/4] box: JSON tree class @ 2018-12-06 8:42 Kirill Shcherbatov 2018-12-06 8:42 ` [PATCH v6 1/4] lib: make index_base support for json_lexer Kirill Shcherbatov ` (3 more replies) 0 siblings, 4 replies; 9+ messages in thread From: Kirill Shcherbatov @ 2018-12-06 8:42 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov http://github.com/tarantool/tarantool/tree/kshch/gh-1012-json-tree https://github.com/tarantool/tarantool/issues/1012 Preparatory patch set for JSON indexes story. Implemented JSON tree class is used for organizing JSON tokens produced by a lexer in a tree-like structure reflecting a JSON document structure. Refactored json_lexer class to use index_base to emmit tokens being ready to insert in JSON tree. Refactored tuple_format to use the tree JSON class for working with first-level fields to works with all fields in unified way. Introduced help routines json_path_validate and json_path_cmp. The third path "box: manage format fields with JSON tree class" triggers a bug #3772 that is already on review kshch/gh-3772-dirty-memory-access. Source branch has cherry-picked fix. General information of difference from previous version of patch-set: - new patch "make index_base support for json_lexer" allows to do not carry for LUA index base during token insertion - significantly improved comments, many minor refactoring - new consistency assertions, e.g. that no fields attached during token detach. - reworked iterators - _slowpath version for json_tree_lookup to improve performance for numerical levels - new tests - reworked json_path_cmp, introduced json_path_validate - rebased on up-to-date 2.1 and cherry-picked #3772 fix Kirill Shcherbatov (4): lib: make index_base support for json_lexer lib: implement JSON tree class for json library box: manage format fields with JSON tree class lib: introduce json_path_cmp, json_path_validate src/box/sql.c | 18 +- src/box/sql/build.c | 5 +- src/box/tuple.c | 10 +- src/box/tuple_format.c | 138 +++++++++------ src/box/tuple_format.h | 49 +++++- src/box/vy_stmt.c | 4 +- src/lib/json/CMakeLists.txt | 1 + src/lib/json/json.c | 290 +++++++++++++++++++++++++++++++- src/lib/json/json.h | 326 +++++++++++++++++++++++++++++++++++- test/engine/tuple.result | 4 +- test/unit/json_path.c | 302 +++++++++++++++++++++++++++++++-- test/unit/json_path.result | 92 ++++++++-- 12 files changed, 1141 insertions(+), 98 deletions(-) -- 2.19.2 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v6 1/4] lib: make index_base support for json_lexer 2018-12-06 8:42 [PATCH v6 0/4] box: JSON tree class Kirill Shcherbatov @ 2018-12-06 8:42 ` Kirill Shcherbatov 2018-12-10 17:34 ` Vladimir Davydov 2018-12-06 8:42 ` [PATCH v6 2/4] lib: implement JSON tree class for json library Kirill Shcherbatov ` (2 subsequent siblings) 3 siblings, 1 reply; 9+ messages in thread From: Kirill Shcherbatov @ 2018-12-06 8:42 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov Introduced a new index_base field for json_lexer class - this value is a base field offset for emitted JSON_TOKEN_NUM tokens. Thus, we get rid of the need to perform manual casts using the TUPLE_INDEX_BASE constant in the majority of cases. This will also ensure that the extracted tuples are correctly inserted into the numerical level of JSON tree. Needed for #1012 --- src/box/tuple_format.c | 16 ++++------------ src/lib/json/json.c | 4 +++- src/lib/json/json.h | 11 ++++++++++- test/engine/tuple.result | 4 ++-- test/unit/json_path.c | 24 +++++++++++++++--------- test/unit/json_path.result | 21 +++++++++++---------- 6 files changed, 45 insertions(+), 35 deletions(-) diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 661cfdc94..149248144 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -491,7 +491,7 @@ box_tuple_format_unref(box_tuple_format_t *format) /** * Propagate @a field to MessagePack(field)[index]. * @param[in][out] field Field to propagate. - * @param index 1-based index to propagate to. + * @param index 0-based index to propagate to. * * @retval 0 Success, the index was found. * @retval -1 Not found. @@ -501,10 +501,6 @@ tuple_field_go_to_index(const char **field, uint64_t index) { enum mp_type type = mp_typeof(**field); if (type == MP_ARRAY) { - if (index == 0) - return -1; - /* Make index 0-based. */ - index -= TUPLE_INDEX_BASE; uint32_t count = mp_decode_array(field); if (index >= count) return -1; @@ -512,6 +508,7 @@ tuple_field_go_to_index(const char **field, uint64_t index) mp_next(field); return 0; } else if (type == MP_MAP) { + index += TUPLE_INDEX_BASE; uint64_t count = mp_decode_map(field); for (; count > 0; --count) { type = mp_typeof(**field); @@ -582,7 +579,7 @@ tuple_field_go_to_path(const char **data, const char *path, uint32_t path_len) int rc; struct json_lexer lexer; struct json_token token; - json_lexer_create(&lexer, path, path_len); + json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE); while ((rc = json_lexer_next_token(&lexer, &token)) == 0) { switch (token.type) { case JSON_TOKEN_NUM: @@ -624,18 +621,13 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, } struct json_lexer lexer; struct json_token token; - json_lexer_create(&lexer, path, path_len); + json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE); int rc = json_lexer_next_token(&lexer, &token); if (rc != 0) goto error; switch(token.type) { case JSON_TOKEN_NUM: { int index = token.num; - if (index == 0) { - *field = NULL; - return 0; - } - index -= TUPLE_INDEX_BASE; *field = tuple_field_raw(format, tuple, field_map, index); if (*field == NULL) return 0; diff --git a/src/lib/json/json.c b/src/lib/json/json.c index eb80e4bbc..81b291127 100644 --- a/src/lib/json/json.c +++ b/src/lib/json/json.c @@ -144,10 +144,12 @@ json_parse_integer(struct json_lexer *lexer, struct json_token *token) value = value * 10 + c - (int)'0'; ++len; } while (++pos < end && isdigit((c = *pos))); + if (value < lexer->index_base) + return lexer->symbol_count + 1; lexer->offset += len; lexer->symbol_count += len; token->type = JSON_TOKEN_NUM; - token->num = value; + token->num = value - lexer->index_base; return 0; } diff --git a/src/lib/json/json.h b/src/lib/json/json.h index ead446878..5c8d973e5 100644 --- a/src/lib/json/json.h +++ b/src/lib/json/json.h @@ -49,6 +49,11 @@ struct json_lexer { int offset; /** Current lexer's offset in symbols. */ int symbol_count; + /** + * Base field offset for emitted JSON_TOKEN_NUM tokens, + * e.g. 0 for C and 1 for Lua. + */ + unsigned index_base; }; enum json_token_type { @@ -82,14 +87,18 @@ struct json_token { * @param[out] lexer Lexer to create. * @param src Source string. * @param src_len Length of @a src. + * @param index_base Base field offset for emitted JSON_TOKEN_NUM + * tokens e.g. 0 for C and 1 for Lua. */ static inline void -json_lexer_create(struct json_lexer *lexer, const char *src, int src_len) +json_lexer_create(struct json_lexer *lexer, const char *src, int src_len, + unsigned index_base) { lexer->src = src; lexer->src_len = src_len; lexer->offset = 0; lexer->symbol_count = 0; + lexer->index_base = index_base; } /** diff --git a/test/engine/tuple.result b/test/engine/tuple.result index 35c700e16..7ca3985c7 100644 --- a/test/engine/tuple.result +++ b/test/engine/tuple.result @@ -823,7 +823,7 @@ t[0] ... t["[0]"] --- -- null +- error: Illegal parameters, error in path on position 2 ... t["[1000]"] --- @@ -847,7 +847,7 @@ t["[2][6].key100"] ... t["[2][0]"] -- 0-based index in array. --- -- null +- error: Illegal parameters, error in path on position 5 ... t["[4][3]"] -- Can not index string. --- diff --git a/test/unit/json_path.c b/test/unit/json_path.c index a5f90ad98..33cd0a685 100644 --- a/test/unit/json_path.c +++ b/test/unit/json_path.c @@ -3,10 +3,12 @@ #include "trivia/util.h" #include <string.h> +#define JSON_INDEX_BASE 1 + #define reset_to_new_path(value) \ path = value; \ len = strlen(value); \ - json_lexer_create(&lexer, path, len); + json_lexer_create(&lexer, path, len, JSON_INDEX_BASE); #define is_next_index(value_len, value) \ path = lexer.src + lexer.offset; \ @@ -32,18 +34,18 @@ test_basic() struct json_lexer lexer; struct json_token token; - reset_to_new_path("[0].field1.field2['field3'][5]"); + reset_to_new_path("[1].field1.field2['field3'][5]"); is_next_index(3, 0); is_next_key("field1"); is_next_key("field2"); is_next_key("field3"); - is_next_index(3, 5); + is_next_index(3, 4); reset_to_new_path("[3].field[2].field") - is_next_index(3, 3); - is_next_key("field"); is_next_index(3, 2); is_next_key("field"); + is_next_index(3, 1); + is_next_key("field"); reset_to_new_path("[\"f1\"][\"f2'3'\"]"); is_next_key("f1"); @@ -57,7 +59,7 @@ test_basic() /* Long number. */ reset_to_new_path("[1234]"); - is_next_index(6, 1234); + is_next_index(6, 1233); /* Empty path. */ reset_to_new_path(""); @@ -70,8 +72,8 @@ test_basic() /* Unicode. */ reset_to_new_path("[2][6]['привет中国world']['中国a']"); - is_next_index(3, 2); - is_next_index(3, 6); + is_next_index(3, 1); + is_next_index(3, 5); is_next_key("привет中国world"); is_next_key("中国a"); @@ -94,7 +96,7 @@ void test_errors() { header(); - plan(20); + plan(21); const char *path; int len; struct json_lexer lexer; @@ -155,6 +157,10 @@ test_errors() json_lexer_next_token(&lexer, &token); is(json_lexer_next_token(&lexer, &token), 6, "tab inside identifier"); + reset_to_new_path("[0]"); + is(json_lexer_next_token(&lexer, &token), 2, + "invalid token for index_base %d", JSON_INDEX_BASE); + check_plan(); footer(); } diff --git a/test/unit/json_path.result b/test/unit/json_path.result index a2a2f829f..ad6f07e5a 100644 --- a/test/unit/json_path.result +++ b/test/unit/json_path.result @@ -2,9 +2,9 @@ 1..2 *** test_basic *** 1..71 - ok 1 - parse <[0]> - ok 2 - <[0]> is num - ok 3 - <[0]> is 0 + ok 1 - parse <[1]> + ok 2 - <[1]> is num + ok 3 - <[1]> is 0 ok 4 - parse <field1> ok 5 - <field1> is str ok 6 - len is 6 @@ -19,17 +19,17 @@ ok 15 - str is field3 ok 16 - parse <[5]> ok 17 - <[5]> is num - ok 18 - <[5]> is 5 + ok 18 - <[5]> is 4 ok 19 - parse <[3]> ok 20 - <[3]> is num - ok 21 - <[3]> is 3 + ok 21 - <[3]> is 2 ok 22 - parse <field> ok 23 - <field> is str ok 24 - len is 5 ok 25 - str is field ok 26 - parse <[2]> ok 27 - <[2]> is num - ok 28 - <[2]> is 2 + ok 28 - <[2]> is 1 ok 29 - parse <field> ok 30 - <field> is str ok 31 - len is 5 @@ -52,7 +52,7 @@ ok 48 - str is field1 ok 49 - parse <[1234]> ok 50 - <[1234]> is num - ok 51 - <[1234]> is 1234 + ok 51 - <[1234]> is 1233 ok 52 - parse empty path ok 53 - is str ok 54 - parse <field1> @@ -61,10 +61,10 @@ ok 57 - str is field1 ok 58 - parse <[2]> ok 59 - <[2]> is num - ok 60 - <[2]> is 2 + ok 60 - <[2]> is 1 ok 61 - parse <[6]> ok 62 - <[6]> is num - ok 63 - <[6]> is 6 + ok 63 - <[6]> is 5 ok 64 - parse <привет中国world> ok 65 - <привет中国world> is str ok 66 - len is 23 @@ -76,7 +76,7 @@ ok 1 - subtests *** test_basic: done *** *** test_errors *** - 1..20 + 1..21 ok 1 - error on position 2 for <[[> ok 2 - error on position 2 for <[field]> ok 3 - error on position 1 for <'field1'.field2> @@ -97,6 +97,7 @@ ok 1 - subtests ok 18 - error in leading <.> ok 19 - space inside identifier ok 20 - tab inside identifier + ok 21 - invalid token for index_base 1 ok 2 - subtests *** test_errors: done *** *** main: done *** -- 2.19.2 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6 1/4] lib: make index_base support for json_lexer 2018-12-06 8:42 ` [PATCH v6 1/4] lib: make index_base support for json_lexer Kirill Shcherbatov @ 2018-12-10 17:34 ` Vladimir Davydov 0 siblings, 0 replies; 9+ messages in thread From: Vladimir Davydov @ 2018-12-10 17:34 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja On Thu, Dec 06, 2018 at 11:42:28AM +0300, Kirill Shcherbatov wrote: > Introduced a new index_base field for json_lexer class - this > value is a base field offset for emitted JSON_TOKEN_NUM tokens. > Thus, we get rid of the need to perform manual casts using the > TUPLE_INDEX_BASE constant in the majority of cases. This will > also ensure that the extracted tuples are correctly inserted > into the numerical level of JSON tree. > > Needed for #1012 > diff --git a/src/lib/json/json.h b/src/lib/json/json.h > index ead446878..5c8d973e5 100644 > --- a/src/lib/json/json.h > +++ b/src/lib/json/json.h > @@ -49,6 +49,11 @@ struct json_lexer { > int offset; > /** Current lexer's offset in symbols. */ > int symbol_count; > + /** > + * Base field offset for emitted JSON_TOKEN_NUM tokens, > + * e.g. 0 for C and 1 for Lua. > + */ > + unsigned index_base; I changed the type of this member to int, as we use int in tuple_update_execute(), and pushed it to 2.1. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v6 2/4] lib: implement JSON tree class for json library 2018-12-06 8:42 [PATCH v6 0/4] box: JSON tree class Kirill Shcherbatov 2018-12-06 8:42 ` [PATCH v6 1/4] lib: make index_base support for json_lexer Kirill Shcherbatov @ 2018-12-06 8:42 ` Kirill Shcherbatov 2018-12-10 17:36 ` Vladimir Davydov 2018-12-06 8:42 ` [PATCH v6 3/4] box: manage format fields with JSON tree class Kirill Shcherbatov 2018-12-06 8:42 ` [PATCH v6 4/4] lib: introduce json_path_cmp, json_path_validate Kirill Shcherbatov 3 siblings, 1 reply; 9+ messages in thread From: Kirill Shcherbatov @ 2018-12-06 8:42 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov New JSON tree class would store JSON paths for tuple fields for registered non-plain indexes. It is a hierarchical data structure that organize JSON nodes produced by parser. Class provides API to lookup node by path and iterate over the tree. JSON Indexes patch require such functionality to make lookup for tuple_fields by path, make initialization of field map and build vynyl_stmt msgpack for secondary index via JSON tree iteration. Needed for #1012 --- src/lib/json/CMakeLists.txt | 1 + src/lib/json/json.c | 257 ++++++++++++++++++++++++++++++++ src/lib/json/json.h | 287 +++++++++++++++++++++++++++++++++++- test/unit/json_path.c | 243 +++++++++++++++++++++++++++++- test/unit/json_path.result | 60 +++++++- 5 files changed, 845 insertions(+), 3 deletions(-) diff --git a/src/lib/json/CMakeLists.txt b/src/lib/json/CMakeLists.txt index 0f0739620..51a1f027a 100644 --- a/src/lib/json/CMakeLists.txt +++ b/src/lib/json/CMakeLists.txt @@ -4,3 +4,4 @@ set(lib_sources set_source_files_compile_flags(${lib_sources}) add_library(json_path STATIC ${lib_sources}) +target_link_libraries(json_path misc) diff --git a/src/lib/json/json.c b/src/lib/json/json.c index 81b291127..65169b047 100644 --- a/src/lib/json/json.c +++ b/src/lib/json/json.c @@ -30,6 +30,7 @@ */ #include "json.h" +#include "third_party/PMurHash.h" #include <ctype.h> #include <stdbool.h> #include <unicode/uchar.h> @@ -243,3 +244,259 @@ json_lexer_next_token(struct json_lexer *lexer, struct json_token *token) return json_parse_identifier(lexer, token); } } + +/** Compare JSON tokens keys. */ +static int +json_token_cmp(const struct json_token *a, const struct json_token *b) +{ + if (a->parent != b->parent) + return a->parent - b->parent; + if (a->type != b->type) + return a->type - b->type; + int ret = 0; + if (a->type == JSON_TOKEN_STR) { + if (a->len != b->len) + return a->len - b->len; + ret = memcmp(a->str, b->str, a->len); + } else if (a->type == JSON_TOKEN_NUM) { + ret = a->num - b->num; + } else { + unreachable(); + } + return ret; +} + +#define MH_SOURCE 1 +#define mh_name _json +#define mh_key_t struct json_token * +#define mh_node_t struct json_token * +#define mh_arg_t void * +#define mh_hash(a, arg) ((*(a))->hash) +#define mh_hash_key(a, arg) ((a)->hash) +#define mh_cmp(a, b, arg) (json_token_cmp(*(a), *(b))) +#define mh_cmp_key(a, b, arg) (json_token_cmp((a), *(b))) +#include "salad/mhash.h" + +static const uint32_t hash_seed = 13U; + +/** Compute the hash value of a JSON token. */ +static uint32_t +json_token_hash(struct json_token *token) +{ + uint32_t h = token->parent->hash; + uint32_t carry = 0; + const void *data; + uint32_t data_size; + if (token->type == JSON_TOKEN_STR) { + data = token->str; + data_size = token->len; + } else if (token->type == JSON_TOKEN_NUM) { + data = &token->num; + data_size = sizeof(token->num); + } else { + unreachable(); + } + PMurHash32_Process(&h, &carry, data, data_size); + return PMurHash32_Result(h, carry, data_size); +} + +int +json_tree_create(struct json_tree *tree) +{ + memset(tree, 0, sizeof(struct json_tree)); + tree->root.hash = hash_seed; + tree->root.type = JSON_TOKEN_END; + tree->hash = mh_json_new(); + return tree->hash == NULL ? -1 : 0; +} + +static void +json_token_destroy(struct json_token *token) +{ +#ifndef NDEBUG + /* Token mustn't have JSON subtree. */ + struct json_token *iter; + uint32_t nodes = 0; + json_tree_foreach_preorder(token, iter) + nodes++; + assert(nodes == 0); +#endif /* NDEBUG */ + + free(token->children); +} + +void +json_tree_destroy(struct json_tree *tree) +{ + json_token_destroy(&tree->root); + mh_json_delete(tree->hash); +} + +struct json_token * +json_tree_lookup_slowpath(struct json_tree *tree, struct json_token *parent, + const struct json_token *token) +{ + assert(token->type == JSON_TOKEN_STR); + struct json_token key; + key.type = token->type; + key.str = token->str; + key.len = token->len; + key.parent = parent; + key.hash = json_token_hash(&key); + mh_int_t id = mh_json_find(tree->hash, &key, NULL); + if (id == mh_end(tree->hash)) + return NULL; + struct json_token **entry = mh_json_node(tree->hash, id); + assert(entry != NULL); + return *entry; +} + +int +json_tree_add(struct json_tree *tree, struct json_token *parent, + struct json_token *token) +{ + int ret = 0; + token->parent = parent; + uint32_t hash = json_token_hash(token); + assert(json_tree_lookup(tree, parent, token) == NULL); + uint32_t insert_idx = token->type == JSON_TOKEN_NUM ? + (uint32_t)token->num : + parent->children_capacity; + if (insert_idx >= parent->children_capacity) { + uint32_t new_size = parent->children_capacity == 0 ? + 8 : 2 * parent->children_capacity; + while (insert_idx >= new_size) + new_size *= 2; + struct json_token **children = + realloc(parent->children, new_size*sizeof(void *)); + if (children == NULL) { + ret = -1; + goto end; + } + memset(children + parent->children_capacity, 0, + (new_size - parent->children_capacity)*sizeof(void *)); + parent->children = children; + parent->children_capacity = new_size; + } + assert(parent->children[insert_idx] == NULL); + parent->children[insert_idx] = token; + parent->max_child_idx = MAX(parent->max_child_idx, insert_idx); + token->sibling_idx = insert_idx; + token->hash = hash; + token->parent = parent; + if (token->type != JSON_TOKEN_STR) + goto end; + /* + * Add string token to json_tree hash to make lookup + * by name. + */ + mh_int_t rc = + mh_json_put(tree->hash, (const struct json_token **)&token, + NULL, NULL); + if (rc == mh_end(tree->hash)) { + parent->children[insert_idx] = NULL; + ret = -1; + goto end; + } +end: + assert(json_tree_lookup(tree, parent, token) == token); + return ret; +} + +void +json_tree_del(struct json_tree *tree, struct json_token *token) +{ + struct json_token *parent = token->parent; + assert(json_tree_lookup(tree, parent, token) == token); + struct json_token **child_slot = &parent->children[token->sibling_idx]; + assert(child_slot != NULL && *child_slot == token); + *child_slot = NULL; + /* The max_child_idx field may require update. */ + if (token->sibling_idx == parent->max_child_idx && + parent->max_child_idx > 0) { + uint32_t idx = token->sibling_idx - 1; + while (idx > 0 && parent->children[idx] == 0) + idx--; + parent->max_child_idx = idx; + } + if (token->type != JSON_TOKEN_STR) + goto end; + /* Remove string token from json_tree hash. */ + mh_int_t id = mh_json_find(tree->hash, token, NULL); + assert(id != mh_end(tree->hash)); + mh_json_del(tree->hash, id, NULL); + json_token_destroy(token); +end: + assert(json_tree_lookup(tree, parent, token) == NULL); +} + +struct json_token * +json_tree_lookup_path(struct json_tree *tree, struct json_token *parent, + const char *path, uint32_t path_len, uint32_t index_base) +{ + int rc; + struct json_lexer lexer; + struct json_token token; + struct json_token *ret = parent != NULL ? parent : &tree->root; + json_lexer_create(&lexer, path, path_len, index_base); + while ((rc = json_lexer_next_token(&lexer, &token)) == 0 && + token.type != JSON_TOKEN_END && ret != NULL) { + ret = json_tree_lookup(tree, ret, &token); + } + if (rc != 0 || token.type != JSON_TOKEN_END) + return NULL; + return ret; +} + +static struct json_token * +json_tree_child_next(struct json_token *parent, struct json_token *pos) +{ + assert(pos == NULL || pos->parent == parent); + struct json_token **arr = parent->children; + if (arr == NULL) + return NULL; + uint32_t idx = pos != NULL ? pos->sibling_idx + 1 : 0; + while (idx <= parent->max_child_idx && arr[idx] == NULL) + idx++; + return idx <= parent->max_child_idx ? arr[idx] : NULL; +} + +static struct json_token * +json_tree_leftmost(struct json_token *pos) +{ + struct json_token *last; + do { + last = pos; + pos = json_tree_child_next(pos, NULL); + } while (pos != NULL); + return last; +} + +struct json_token * +json_tree_preorder_next(struct json_token *root, struct json_token *pos) +{ + struct json_token *next = json_tree_child_next(pos, NULL); + if (next != NULL) + return next; + while (pos != root) { + next = json_tree_child_next(pos->parent, pos); + if (next != NULL) + return next; + pos = pos->parent; + } + return NULL; +} + +struct json_token * +json_tree_postorder_next(struct json_token *root, struct json_token *pos) +{ + struct json_token *next; + if (pos == NULL) + return json_tree_leftmost(root); + if (pos == root) + return NULL; + next = json_tree_child_next(pos->parent, pos); + if (next != NULL) + return json_tree_leftmost(next); + return pos->parent; +} diff --git a/src/lib/json/json.h b/src/lib/json/json.h index 5c8d973e5..4ff2b9c67 100644 --- a/src/lib/json/json.h +++ b/src/lib/json/json.h @@ -30,7 +30,7 @@ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ -#include <stdint.h> +#include "trivia/util.h" #ifdef __cplusplus extern "C" { @@ -67,6 +67,10 @@ enum json_token_type { * Element of a JSON path. It can be either string or number. * String idenfiers are in ["..."] and between dots. Numbers are * indexes in [...]. + * + * May be organized in a tree-like structure reflecting a JSON + * document structure, for more details see the comment to struct + * json_tree. */ struct json_token { enum json_token_type type; @@ -80,6 +84,111 @@ struct json_token { /** Index value. */ uint64_t num; }; + /** + * Hash value of the token. Used for lookups in a JSON tree. + * For more details, see the comment to json_tree::hash_table. + */ + uint32_t hash; + /** + * Array of child records. Indexes in this array + * match [token.num] index for JSON_TOKEN_NUM type and + * are allocated sequentially for JSON_TOKEN_STR child + * tokens. + */ + struct json_token **children; + /** Allocation size of children array. */ + uint32_t children_capacity; + /** Max occupied index in the children array. */ + uint32_t max_child_idx; + /** Index of node in parent children array. */ + uint32_t sibling_idx; + /** Pointer to parent node. */ + struct json_token *parent; +}; + +struct mh_json_t; + +/** + * This structure is used for organizing JSON tokens produced + * by a lexer in a tree-like structure reflecting a JSON document + * structure. + * + * Each intermediate node of the tree corresponds to either + * a JSON map or an array, depending on the key type used by + * its children (JSON_TOKEN_STR or JSON_TOKEN_NUM, respectively). + * Leaf nodes may represent both complex JSON structures and + * final values - it is not mandated by the JSON tree design. + * The root of the tree doesn't have a key and is preallocated + * when the tree is created. + * + * The json_token structure is intrusive by design, i.e. to store + * arbitrary information in a JSON tree, one has to incorporate it + * into a user defined structure. + * + * Example: + * + * struct data { + * ... + * struct json_token token; + * }; + * + * struct json_tree tree; + * json_tree_create(&tree); + * struct json_token *parent = &tree->root; + * + * // Add a path to the tree. + * struct data *data = data_new(); + * struct json_lexer lexer; + * json_lexer_create(&lexer, path, path_len); + * json_lexer_next_token(&lexer, &data->token); + * while (data->token.type != JSON_TOKEN_END) { + * json_tree_add(&tree, parent, &data->token); + * parent = &data->token; + * data = data_new(); + * json_lexer_next_token(&lexer, &data->token); + * } + * data_delete(data); + * + * // Look up a path in the tree. + * data = json_tree_lookup_path(&tree, &tree.root, + * path, path_len); + */ +struct json_tree { + /** + * Preallocated token corresponding to the JSON tree root. + * It doesn't have a key (set to JSON_TOKEN_END). + */ + struct json_token root; + /** + * Hash table that is used to quickly look up a token + * corresponding to a JSON map item given a key and + * a parent token. We store all tokens that have type + * JSON_TOKEN_STR in this hash table. Apparently, we + * don't need to store JSON_TOKEN_NUM tokens as we can + * quickly look them up in the children array anyway. + * + * The hash table uses pair <parent, key> as key, so + * even tokens that happen to have the same key will + * have different keys in the hash. To look up a tree + * node corresponding to a particular path, we split + * the path into tokens and look up the first token + * in the root node and each following token in the + * node returned at the previous step. + * + * We compute a hash value for a token by hashing its + * key using the hash value of its parent as seed. This + * is equivalent to computing hash for the path leading + * to the token. However, we don't need to recompute + * hash starting from the root at each step as we + * descend the tree looking for a specific path, and we + * can start descent from any node, not only from the root. + * + * As a consequence of this hashing technique, even + * though we don't need to store JSON_TOKEN_NUM tokens + * in the hash table, we still have to compute hash + * values for them. + */ + struct mh_json_t *hash; }; /** @@ -113,6 +222,182 @@ json_lexer_create(struct json_lexer *lexer, const char *src, int src_len, int json_lexer_next_token(struct json_lexer *lexer, struct json_token *token); +/** Create a JSON tree object to manage data relations. */ +int +json_tree_create(struct json_tree *tree); + +/** + * Destroy JSON tree object. This routine doesn't destroy attached + * subtree so it should be called at the end of manual destroy. + */ +void +json_tree_destroy(struct json_tree *tree); + +/** Internal function, use json_tree_lookup instead. */ +struct json_token * +json_tree_lookup_slowpath(struct json_tree *tree, struct json_token *parent, + const struct json_token *token); + +/** + * Look up a token in a tree at position specified with parent. + */ +static inline struct json_token * +json_tree_lookup(struct json_tree *tree, struct json_token *parent, + const struct json_token *token) +{ + struct json_token *ret = NULL; + if (token->type == JSON_TOKEN_NUM) { + ret = likely(token->num < parent->children_capacity) ? + parent->children[token->num] : NULL; + } else { + ret = json_tree_lookup_slowpath(tree, parent, token); + } + return ret; +} + +/** + * Append token to the given parent position in a JSON tree. The + * parent mustn't have a child with such content. + */ +int +json_tree_add(struct json_tree *tree, struct json_token *parent, + struct json_token *token); + +/** + * Delete a JSON tree token at the given parent position in JSON + * tree. Token entry shouldn't have subtree. + */ +void +json_tree_del(struct json_tree *tree, struct json_token *token); + +/** Lookup child by path in JSON tree. */ +struct json_token * +json_tree_lookup_path(struct json_tree *tree, struct json_token *parent, + const char *path, uint32_t path_len, uint32_t index_base); + +/** Do pre order traversal in JSON tree. */ +struct json_token * +json_tree_preorder_next(struct json_token *root, struct json_token *pos); + +/** Do post order traversal in JSON tree. */ +struct json_token * +json_tree_postorder_next(struct json_token *root, struct json_token *pos); + +#ifndef typeof +#define typeof __typeof__ +#endif + +/** Return container entry by json_tree_node node. */ +#define json_tree_entry(node, type, member) \ + container_of((node), type, member) +/** + * Return container entry by json_tree_node or NULL if + * node is NULL. + */ +#define json_tree_entry_safe(node, type, member) ({ \ + (node) != NULL ? json_tree_entry((node), type, member) : NULL; \ +}) + +/** Do entry pre order traversal in JSON tree */ +#define json_tree_preorder_next_entry(root, node, type, member) ({ \ + struct json_token *__next = \ + json_tree_preorder_next((root), (node)); \ + json_tree_entry_safe(__next, type, member); \ +}) + +/** + * Do entry post order traversal in JSON tree. + * This cycle doesn't visit root node. + */ +#define json_tree_postorder_next_entry(root, node, type, member) ({ \ + struct json_token *__next = \ + json_tree_postorder_next((root), (node)); \ + json_tree_entry_safe(__next, type, member); \ +}) + +/** + * Lookup in tree by path and return entry. + * This cycle doesn't visit root node. + */ +#define json_tree_lookup_path_entry(tree, parent, path, path_len, \ + index_base, type, member) ({ \ + struct json_token *__node = \ + json_tree_lookup_path((tree), (parent), path, path_len, \ + index_base); \ + json_tree_entry_safe(__node, type, member); \ +}) + +/** Lookup in tree by token and return entry. */ +#define json_tree_lookup_entry(tree, parent, token, type, member) ({ \ + struct json_token *__node = \ + json_tree_lookup((tree), (parent), token); \ + json_tree_entry_safe(__node, type, member); \ +}) + +/** + * Do pre-order traversal in JSON tree. + * This cycle doesn't visit root node. + */ +#define json_tree_foreach_preorder(root, node) \ + for ((node) = json_tree_preorder_next((root), (root)); \ + (node) != NULL; \ + (node) = json_tree_preorder_next((root), (node))) + +/** + * Do post-order traversal in JSON tree. + * This cycle doesn't visit root node. + */ +#define json_tree_foreach_postorder(root, node) \ + for ((node) = json_tree_postorder_next((root), NULL); \ + (node) != (root); \ + (node) = json_tree_postorder_next((root), (node))) + +/** + * Do safe post-order traversal in JSON tree. + * May be used for destructors. + * This cycle doesn't visit root node. + */ +#define json_tree_foreach_safe(root, node, tmp) \ + for ((node) = json_tree_postorder_next((root), NULL); \ + (node) != (root) && \ + ((tmp) = json_tree_postorder_next((root), (node))); \ + (node) = (tmp)) + +/** + * Do post-order traversal in JSON tree and return entry. + * This cycle doesn't visit root node. + */ +#define json_tree_foreach_entry_preorder(root, node, type, member) \ + for ((node) = json_tree_preorder_next_entry((root), (root), \ + type, member); \ + (node) != NULL; \ + (node) = json_tree_preorder_next_entry((root), &(node)->member, \ + type, member)) + +/** + * Do pre-order traversal in JSON tree and return entry. + * This cycle doesn't visit root node. + */ +#define json_tree_foreach_entry_postorder(root, node, type, member) \ + for ((node) = json_tree_postorder_next_entry((root), NULL, type, \ + member); \ + &(node)->member != (root); \ + (node) = json_tree_postorder_next_entry((root), &(node)->member,\ + type, member)) + +/** + * Do secure post-order traversal in JSON tree and return entry. + * This cycle doesn't visit root node. + */ +#define json_tree_foreach_entry_safe(root, node, type, member, tmp) \ + for ((node) = json_tree_postorder_next_entry((root), NULL, \ + type, member); \ + &(node)->member != (root) && \ + ((tmp) = json_tree_postorder_next_entry((root), \ + &(node)->member, \ + type, member)); \ + (node) = (tmp)) + #ifdef __cplusplus } #endif diff --git a/test/unit/json_path.c b/test/unit/json_path.c index 33cd0a685..04d36688b 100644 --- a/test/unit/json_path.c +++ b/test/unit/json_path.c @@ -2,6 +2,7 @@ #include "unit.h" #include "trivia/util.h" #include <string.h> +#include <stdbool.h> #define JSON_INDEX_BASE 1 @@ -165,14 +166,254 @@ test_errors() footer(); } +struct test_struct { + int value; + struct json_token node; +}; + +struct test_struct * +test_struct_alloc(struct test_struct *records_pool, int *pool_idx) +{ + struct test_struct *ret = &records_pool[*pool_idx]; + *pool_idx = *pool_idx + 1; + memset(&ret->node, 0, sizeof(ret->node)); + return ret; +} + +struct test_struct * +test_add_path(struct json_tree *tree, const char *path, uint32_t path_len, + struct test_struct *records_pool, int *pool_idx) +{ + int rc; + struct json_lexer lexer; + struct json_token *parent = &tree->root; + json_lexer_create(&lexer, path, path_len, JSON_INDEX_BASE); + struct test_struct *field = test_struct_alloc(records_pool, pool_idx); + while ((rc = json_lexer_next_token(&lexer, &field->node)) == 0 && + field->node.type != JSON_TOKEN_END) { + struct json_token *next = + json_tree_lookup(tree, parent, &field->node); + if (next == NULL) { + rc = json_tree_add(tree, parent, &field->node); + fail_if(rc != 0); + next = &field->node; + field = test_struct_alloc(records_pool, pool_idx); + } + parent = next; + } + fail_if(rc != 0 || field->node.type != JSON_TOKEN_END); + /* Release field. */ + *pool_idx = *pool_idx - 1; + return json_tree_entry(parent, struct test_struct, node); +} + +void +test_tree() +{ + header(); + plan(54); + + struct json_tree tree; + int rc = json_tree_create(&tree); + fail_if(rc != 0); + + struct test_struct records[7]; + for (int i = 0; i < 6; i++) + records[i].value = i; + + const char *path1 = "[1][10]"; + const char *path2 = "[1][20].file"; + const char *path3 = "[1][20].file[2]"; + const char *path4 = "[1][20].file[8]"; + const char *path4_copy = "[1][20][\"file\"][8]"; + const char *path_unregistered = "[1][3]"; + + int records_idx = 0; + struct test_struct *node, *node_tmp; + node = test_add_path(&tree, path1, strlen(path1), records, + &records_idx); + is(node, &records[1], "add path '%s'", path1); + + node = test_add_path(&tree, path2, strlen(path2), records, + &records_idx); + is(node, &records[3], "add path '%s'", path2); + + node = test_add_path(&tree, path3, strlen(path3), records, + &records_idx); + is(node, &records[4], "add path '%s'", path3); + + node = test_add_path(&tree, path4, strlen(path4), records, + &records_idx); + is(node, &records[5], "add path '%s'", path4); + + node = test_add_path(&tree, path4_copy, strlen(path4_copy), records, + &records_idx); + is(node, &records[5], "add path '%s'", path4_copy); + + node = json_tree_lookup_path_entry(&tree, NULL, path1, strlen(path1), + JSON_INDEX_BASE, struct test_struct, + node); + is(node, &records[1], "lookup path '%s'", path1); + + node = json_tree_lookup_path_entry(&tree, NULL, path2, strlen(path2), + JSON_INDEX_BASE, struct test_struct, + node); + is(node, &records[3], "lookup path '%s'", path2); + + node = json_tree_lookup_path_entry(&tree, NULL, path_unregistered, + strlen(path_unregistered), + JSON_INDEX_BASE, struct test_struct, + node); + is(node, NULL, "lookup unregistered path '%s'", path_unregistered); + + /* Test iterators. */ + struct json_token *token = NULL, *tmp; + const struct json_token *tokens_preorder[] = + {&records[0].node, &records[1].node, &records[2].node, + &records[3].node, &records[4].node, &records[5].node}; + int cnt = sizeof(tokens_preorder)/sizeof(tokens_preorder[0]); + int idx = 0; + + json_tree_foreach_preorder(&tree.root, token) { + if (idx >= cnt) + break; + struct test_struct *t1 = + json_tree_entry(token, struct test_struct, node); + struct test_struct *t2 = + json_tree_entry(tokens_preorder[idx], + struct test_struct, node); + is(token, tokens_preorder[idx], + "test foreach pre order %d: have %d expected of %d", + idx, t1->value, t2->value); + ++idx; + } + is(idx, cnt, "records iterated count %d of %d", idx, cnt); + + const struct json_token *tree_nodes_postorder[] = + {&records[1].node, &records[4].node, &records[5].node, + &records[3].node, &records[2].node, &records[0].node}; + cnt = sizeof(tree_nodes_postorder)/sizeof(tree_nodes_postorder[0]); + idx = 0; + json_tree_foreach_postorder(&tree.root, token) { + if (idx >= cnt) + break; + struct test_struct *t1 = + json_tree_entry(token, struct test_struct, node); + struct test_struct *t2 = + json_tree_entry(tree_nodes_postorder[idx], + struct test_struct, node); + is(token, tree_nodes_postorder[idx], + "test foreach post order %d: have %d expected of %d", + idx, t1->value, t2->value); + ++idx; + } + is(idx, cnt, "records iterated count %d of %d", idx, cnt); + + idx = 0; + json_tree_foreach_safe(&tree.root, token, tmp) { + if (idx >= cnt) + break; + struct test_struct *t1 = + json_tree_entry(token, struct test_struct, node); + struct test_struct *t2 = + json_tree_entry(tree_nodes_postorder[idx], + struct test_struct, node); + is(token, tree_nodes_postorder[idx], + "test foreach safe order %d: have %d expected of %d", + idx, t1->value, t2->value); + ++idx; + } + is(idx, cnt, "records iterated count %d of %d", idx, cnt); + + idx = 0; + json_tree_foreach_entry_preorder(&tree.root, node, struct test_struct, + node) { + if (idx >= cnt) + break; + struct test_struct *t = + json_tree_entry(tokens_preorder[idx], + struct test_struct, node); + is(&node->node, tokens_preorder[idx], + "test foreach entry pre order %d: have %d expected of %d", + idx, node->value, t->value); + idx++; + } + is(idx, cnt, "records iterated count %d of %d", idx, cnt); + + idx = 0; + json_tree_foreach_entry_postorder(&tree.root, node, struct test_struct, + node) { + if (idx >= cnt) + break; + struct test_struct *t = + json_tree_entry(tree_nodes_postorder[idx], + struct test_struct, node); + is(&node->node, tree_nodes_postorder[idx], + "test foreach entry post order %d: have %d expected of %d", + idx, node->value, t->value); + idx++; + } + is(idx, cnt, "records iterated count %d of %d", idx, cnt); + + /* Test record deletion. */ + is(records[3].node.max_child_idx, 7, "max_child_index %d expected of %d", + records[3].node.max_child_idx, 7); + json_tree_del(&tree, &records[5].node); + is(records[3].node.max_child_idx, 1, "max_child_index %d expected of %d", + records[3].node.max_child_idx, 1); + json_tree_del(&tree, &records[4].node); + is(records[3].node.max_child_idx, 0, "max_child_index %d expected of %d", + records[3].node.max_child_idx, 0); + node = json_tree_lookup_path_entry(&tree, NULL, path3, strlen(path3),\ + JSON_INDEX_BASE, struct test_struct, + node); + is(node, NULL, "lookup removed path '%s'", path3); + + node = json_tree_lookup_path_entry(&tree, NULL, path4, strlen(path4), + JSON_INDEX_BASE, struct test_struct, + node); + is(node, NULL, "lookup removed path '%s'", path4); + + node = json_tree_lookup_path_entry(&tree, NULL, path2, strlen(path2), + JSON_INDEX_BASE, struct test_struct, + node); + is(node, &records[3], "lookup path was not corrupted '%s'", path2); + + const struct json_token *tree_nodes_postorder_new[] = + {&records[1].node, &records[3].node, + &records[2].node, &records[0].node}; + cnt = sizeof(tree_nodes_postorder_new) / + sizeof(tree_nodes_postorder_new[0]); + idx = 0; + json_tree_foreach_entry_safe(&tree.root, node, struct test_struct, + node, node_tmp) { + if (idx >= cnt) + break; + struct test_struct *t = + json_tree_entry(tree_nodes_postorder_new[idx], + struct test_struct, node); + is(&node->node, tree_nodes_postorder_new[idx], + "test foreach entry safe order %d: have %d expected of %d", + idx, node->value, t->value); + json_tree_del(&tree, &node->node); + idx++; + } + is(idx, cnt, "records iterated count %d of %d", idx, cnt); + json_tree_destroy(&tree); + + check_plan(); + footer(); +} + int main() { header(); - plan(2); + plan(3); test_basic(); test_errors(); + test_tree(); int rc = check_plan(); footer(); diff --git a/test/unit/json_path.result b/test/unit/json_path.result index ad6f07e5a..0ee970c8c 100644 --- a/test/unit/json_path.result +++ b/test/unit/json_path.result @@ -1,5 +1,5 @@ *** main *** -1..2 +1..3 *** test_basic *** 1..71 ok 1 - parse <[1]> @@ -100,4 +100,62 @@ ok 1 - subtests ok 21 - invalid token for index_base 1 ok 2 - subtests *** test_errors: done *** + *** test_tree *** + 1..54 + ok 1 - add path '[1][10]' + ok 2 - add path '[1][20].file' + ok 3 - add path '[1][20].file[2]' + ok 4 - add path '[1][20].file[8]' + ok 5 - add path '[1][20]["file"][8]' + ok 6 - lookup path '[1][10]' + ok 7 - lookup path '[1][20].file' + ok 8 - lookup unregistered path '[1][3]' + ok 9 - test foreach pre order 0: have 0 expected of 0 + ok 10 - test foreach pre order 1: have 1 expected of 1 + ok 11 - test foreach pre order 2: have 2 expected of 2 + ok 12 - test foreach pre order 3: have 3 expected of 3 + ok 13 - test foreach pre order 4: have 4 expected of 4 + ok 14 - test foreach pre order 5: have 5 expected of 5 + ok 15 - records iterated count 6 of 6 + ok 16 - test foreach post order 0: have 1 expected of 1 + ok 17 - test foreach post order 1: have 4 expected of 4 + ok 18 - test foreach post order 2: have 5 expected of 5 + ok 19 - test foreach post order 3: have 3 expected of 3 + ok 20 - test foreach post order 4: have 2 expected of 2 + ok 21 - test foreach post order 5: have 0 expected of 0 + ok 22 - records iterated count 6 of 6 + ok 23 - test foreach safe order 0: have 1 expected of 1 + ok 24 - test foreach safe order 1: have 4 expected of 4 + ok 25 - test foreach safe order 2: have 5 expected of 5 + ok 26 - test foreach safe order 3: have 3 expected of 3 + ok 27 - test foreach safe order 4: have 2 expected of 2 + ok 28 - test foreach safe order 5: have 0 expected of 0 + ok 29 - records iterated count 6 of 6 + ok 30 - test foreach entry pre order 0: have 0 expected of 0 + ok 31 - test foreach entry pre order 1: have 1 expected of 1 + ok 32 - test foreach entry pre order 2: have 2 expected of 2 + ok 33 - test foreach entry pre order 3: have 3 expected of 3 + ok 34 - test foreach entry pre order 4: have 4 expected of 4 + ok 35 - test foreach entry pre order 5: have 5 expected of 5 + ok 36 - records iterated count 6 of 6 + ok 37 - test foreach entry post order 0: have 1 expected of 1 + ok 38 - test foreach entry post order 1: have 4 expected of 4 + ok 39 - test foreach entry post order 2: have 5 expected of 5 + ok 40 - test foreach entry post order 3: have 3 expected of 3 + ok 41 - test foreach entry post order 4: have 2 expected of 2 + ok 42 - test foreach entry post order 5: have 0 expected of 0 + ok 43 - records iterated count 6 of 6 + ok 44 - max_child_index 7 expected of 7 + ok 45 - max_child_index 1 expected of 1 + ok 46 - max_child_index 0 expected of 0 + ok 47 - lookup removed path '[1][20].file[2]' + ok 48 - lookup removed path '[1][20].file[8]' + ok 49 - lookup path was not corrupted '[1][20].file' + ok 50 - test foreach entry safe order 0: have 1 expected of 1 + ok 51 - test foreach entry safe order 1: have 3 expected of 3 + ok 52 - test foreach entry safe order 2: have 2 expected of 2 + ok 53 - test foreach entry safe order 3: have 0 expected of 0 + ok 54 - records iterated count 4 of 4 +ok 3 - subtests + *** test_tree: done *** *** main: done *** -- 2.19.2 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6 2/4] lib: implement JSON tree class for json library 2018-12-06 8:42 ` [PATCH v6 2/4] lib: implement JSON tree class for json library Kirill Shcherbatov @ 2018-12-10 17:36 ` Vladimir Davydov 0 siblings, 0 replies; 9+ messages in thread From: Vladimir Davydov @ 2018-12-10 17:36 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja On Thu, Dec 06, 2018 at 11:42:29AM +0300, Kirill Shcherbatov wrote: > New JSON tree class would store JSON paths for tuple fields > for registered non-plain indexes. It is a hierarchical data > structure that organize JSON nodes produced by parser. > Class provides API to lookup node by path and iterate over the > tree. > JSON Indexes patch require such functionality to make lookup > for tuple_fields by path, make initialization of field map and > build vynyl_stmt msgpack for secondary index via JSON tree > iteration. > > Needed for #1012 > --- > src/lib/json/CMakeLists.txt | 1 + > src/lib/json/json.c | 257 ++++++++++++++++++++++++++++++++ > src/lib/json/json.h | 287 +++++++++++++++++++++++++++++++++++- > test/unit/json_path.c | 243 +++++++++++++++++++++++++++++- > test/unit/json_path.result | 60 +++++++- > 5 files changed, 845 insertions(+), 3 deletions(-) I cleaned up the patch and pushed it to 2.1. Here's the list of my changes: - Forbid root = NULL in json_tree_lookup_path. - Make loop variable first in foreach macros argument list. - Use int instead of uint32_t unless uint32_t is specifically required. - Set max_child_idx to -1 if children array is empty to avoid ambiguity. - Fix child index allocation for JSON_TOKEN_STR. - Rework json_tree_add and json_tree_del to make them more straightforward. - Add comments and cleanup code. Here's the incremental diff: diff --git a/src/lib/json/json.c b/src/lib/json/json.c index 65169b04..6f3cccc4 100644 --- a/src/lib/json/json.c +++ b/src/lib/json/json.c @@ -30,12 +30,19 @@ */ #include "json.h" -#include "third_party/PMurHash.h" + +#include <assert.h> #include <ctype.h> #include <stdbool.h> +#include <stddef.h> +#include <stdint.h> +#include <stdlib.h> +#include <string.h> #include <unicode/uchar.h> #include <unicode/utf8.h> + #include "trivia/util.h" +#include "third_party/PMurHash.h" /** * Read a single symbol from a string starting from an offset. @@ -245,7 +252,9 @@ json_lexer_next_token(struct json_lexer *lexer, struct json_token *token) } } -/** Compare JSON tokens keys. */ +/** + * Compare JSON token keys. + */ static int json_token_cmp(const struct json_token *a, const struct json_token *b) { @@ -279,14 +288,18 @@ json_token_cmp(const struct json_token *a, const struct json_token *b) static const uint32_t hash_seed = 13U; -/** Compute the hash value of a JSON token. */ +/** + * Compute the hash value of a JSON token. + * + * See the comment to json_tree::hash for implementation details. + */ static uint32_t json_token_hash(struct json_token *token) { uint32_t h = token->parent->hash; uint32_t carry = 0; const void *data; - uint32_t data_size; + int data_size; if (token->type == JSON_TOKEN_STR) { data = token->str; data_size = token->len; @@ -306,6 +319,8 @@ json_tree_create(struct json_tree *tree) memset(tree, 0, sizeof(struct json_tree)); tree->root.hash = hash_seed; tree->root.type = JSON_TOKEN_END; + tree->root.max_child_idx = -1; + tree->root.sibling_idx = -1; tree->hash = mh_json_new(); return tree->hash == NULL ? -1 : 0; } @@ -313,16 +328,10 @@ json_tree_create(struct json_tree *tree) static void json_token_destroy(struct json_token *token) { -#ifndef NDEBUG - /* Token mustn't have JSON subtree. */ - struct json_token *iter; - uint32_t nodes = 0; - json_tree_foreach_preorder(token, iter) - nodes++; - assert(nodes == 0); -#endif /* NDEBUG */ - + assert(token->max_child_idx == -1); + assert(token->sibling_idx == -1); free(token->children); + token->children = NULL; } void @@ -347,7 +356,7 @@ json_tree_lookup_slowpath(struct json_tree *tree, struct json_token *parent, if (id == mh_end(tree->hash)) return NULL; struct json_token **entry = mh_json_node(tree->hash, id); - assert(entry != NULL); + assert(entry != NULL && *entry != NULL); return *entry; } @@ -355,89 +364,90 @@ int json_tree_add(struct json_tree *tree, struct json_token *parent, struct json_token *token) { - int ret = 0; - token->parent = parent; - uint32_t hash = json_token_hash(token); assert(json_tree_lookup(tree, parent, token) == NULL); - uint32_t insert_idx = token->type == JSON_TOKEN_NUM ? - (uint32_t)token->num : - parent->children_capacity; + token->parent = parent; + token->children = NULL; + token->children_capacity = 0; + token->max_child_idx = -1; + token->hash = json_token_hash(token); + int insert_idx = token->type == JSON_TOKEN_NUM ? + (int)token->num : parent->max_child_idx + 1; + /* + * Dynamically grow the children array if necessary. + */ if (insert_idx >= parent->children_capacity) { - uint32_t new_size = parent->children_capacity == 0 ? - 8 : 2 * parent->children_capacity; + int new_size = parent->children_capacity == 0 ? + 8 : 2 * parent->children_capacity; while (insert_idx >= new_size) new_size *= 2; - struct json_token **children = - realloc(parent->children, new_size*sizeof(void *)); - if (children == NULL) { - ret = -1; - goto end; - } + struct json_token **children = realloc(parent->children, + new_size * sizeof(void *)); + if (children == NULL) + return -1; /* out of memory */ memset(children + parent->children_capacity, 0, - (new_size - parent->children_capacity)*sizeof(void *)); + (new_size - parent->children_capacity) * sizeof(void *)); parent->children = children; parent->children_capacity = new_size; } + /* + * Insert the token into the hash (only for tokens representing + * JSON map entries, see the comment to json_tree::hash). + */ + if (token->type == JSON_TOKEN_STR) { + mh_int_t id = mh_json_put(tree->hash, + (const struct json_token **)&token, NULL, NULL); + if (id == mh_end(tree->hash)) + return -1; /* out of memory */ + } + /* + * Success, now we can insert the new token into its parent's + * children array. + */ assert(parent->children[insert_idx] == NULL); parent->children[insert_idx] = token; parent->max_child_idx = MAX(parent->max_child_idx, insert_idx); token->sibling_idx = insert_idx; - token->hash = hash; - token->parent = parent; - if (token->type != JSON_TOKEN_STR) - goto end; - /* - * Add string token to json_tree hash to make lookup - * by name. - */ - mh_int_t rc = - mh_json_put(tree->hash, (const struct json_token **)&token, - NULL, NULL); - if (rc == mh_end(tree->hash)) { - parent->children[insert_idx] = NULL; - ret = -1; - goto end; - } -end: assert(json_tree_lookup(tree, parent, token) == token); - return ret; + return 0; } void json_tree_del(struct json_tree *tree, struct json_token *token) { struct json_token *parent = token->parent; + assert(token->sibling_idx >= 0); + assert(parent->children[token->sibling_idx] == token); assert(json_tree_lookup(tree, parent, token) == token); - struct json_token **child_slot = &parent->children[token->sibling_idx]; - assert(child_slot != NULL && *child_slot == token); - *child_slot = NULL; - /* The max_child_idx field may require update. */ - if (token->sibling_idx == parent->max_child_idx && - parent->max_child_idx > 0) { - uint32_t idx = token->sibling_idx - 1; - while (idx > 0 && parent->children[idx] == 0) - idx--; - parent->max_child_idx = idx; + /* + * Clear the entry corresponding to this token in parent's + * children array and update max_child_idx if necessary. + */ + parent->children[token->sibling_idx] = NULL; + token->sibling_idx = -1; + while (parent->max_child_idx >= 0 && + parent->children[parent->max_child_idx] == NULL) + parent->max_child_idx--; + /* + * Remove the token from the hash (only for tokens representing + * JSON map entries, see the comment to json_tree::hash). + */ + if (token->type == JSON_TOKEN_STR) { + mh_int_t id = mh_json_find(tree->hash, token, NULL); + assert(id != mh_end(tree->hash)); + mh_json_del(tree->hash, id, NULL); } - if (token->type != JSON_TOKEN_STR) - goto end; - /* Remove string token from json_tree hash. */ - mh_int_t id = mh_json_find(tree->hash, token, NULL); - assert(id != mh_end(tree->hash)); - mh_json_del(tree->hash, id, NULL); json_token_destroy(token); -end: assert(json_tree_lookup(tree, parent, token) == NULL); } struct json_token * -json_tree_lookup_path(struct json_tree *tree, struct json_token *parent, - const char *path, uint32_t path_len, uint32_t index_base) +json_tree_lookup_path(struct json_tree *tree, struct json_token *root, + const char *path, int path_len, int index_base) { int rc; struct json_lexer lexer; struct json_token token; - struct json_token *ret = parent != NULL ? parent : &tree->root; + struct json_token *ret = root; json_lexer_create(&lexer, path, path_len, index_base); while ((rc = json_lexer_next_token(&lexer, &token)) == 0 && token.type != JSON_TOKEN_END && ret != NULL) { @@ -448,6 +458,11 @@ json_tree_lookup_path(struct json_tree *tree, struct json_token *parent, return ret; } +/** + * Return the child of @parent following @pos or NULL if @pos + * points to the last child in the children array. If @pos is + * NULL, this function returns the first child. + */ static struct json_token * json_tree_child_next(struct json_token *parent, struct json_token *pos) { @@ -455,12 +470,16 @@ json_tree_child_next(struct json_token *parent, struct json_token *pos) struct json_token **arr = parent->children; if (arr == NULL) return NULL; - uint32_t idx = pos != NULL ? pos->sibling_idx + 1 : 0; + int idx = pos != NULL ? pos->sibling_idx + 1 : 0; while (idx <= parent->max_child_idx && arr[idx] == NULL) idx++; return idx <= parent->max_child_idx ? arr[idx] : NULL; } +/** + * Return the leftmost descendant of the tree rooted at @pos + * or NULL if the tree is empty. + */ static struct json_token * json_tree_leftmost(struct json_token *pos) { diff --git a/src/lib/json/json.h b/src/lib/json/json.h index 4ff2b9c6..8aa2765f 100644 --- a/src/lib/json/json.h +++ b/src/lib/json/json.h @@ -30,12 +30,18 @@ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ +#include <stddef.h> +#include <stdint.h> #include "trivia/util.h" #ifdef __cplusplus extern "C" { #endif +#ifndef typeof +#define typeof __typeof__ +#endif + /** * Lexer for JSON paths: * <field>, <.field>, <[123]>, <['field']> and their combinations. @@ -84,26 +90,33 @@ struct json_token { /** Index value. */ uint64_t num; }; + /** Pointer to the parent token in a JSON tree. */ + struct json_token *parent; /** - * Hash value of the token. Used for lookups in a JSON tree. - * For more details, see the comment to json_tree::hash_table. - */ - uint32_t hash; - /** - * Array of child records. Indexes in this array - * match [token.num] index for JSON_TOKEN_NUM type and - * are allocated sequentially for JSON_TOKEN_STR child + * Array of child tokens in a JSON tree. Indexes in this + * array match [token.num] index for JSON_TOKEN_NUM type + * and are allocated sequentially for JSON_TOKEN_STR child * tokens. */ struct json_token **children; /** Allocation size of children array. */ - uint32_t children_capacity; - /** Max occupied index in the children array. */ - uint32_t max_child_idx; - /** Index of node in parent children array. */ - uint32_t sibling_idx; - /** Pointer to parent node. */ - struct json_token *parent; + int children_capacity; + /** + * Max occupied index in the children array or -1 if + * the children array is empty. + */ + int max_child_idx; + /** + * Index of this token in parent's children array or -1 + * if the token was removed from a JSON tree or represents + * a JSON tree root. + */ + int sibling_idx; + /** + * Hash value of the token. Used for lookups in a JSON tree. + * For more details, see the comment to json_tree::hash. + */ + uint32_t hash; }; struct mh_json_t; @@ -222,32 +235,43 @@ json_lexer_create(struct json_lexer *lexer, const char *src, int src_len, int json_lexer_next_token(struct json_lexer *lexer, struct json_token *token); -/** Create a JSON tree object to manage data relations. */ +/** + * Initialize a new empty JSON tree. + * + * Returns 0 on success, -1 on memory allocation error. + */ int json_tree_create(struct json_tree *tree); /** - * Destroy JSON tree object. This routine doesn't destroy attached - * subtree so it should be called at the end of manual destroy. + * Destroy a JSON tree. + * + * Note, this routine expects the tree to be empty - the caller + * is supposed to use json_tree_foreach_safe() and json_tree_del() + * to dismantle the tree before calling this function. */ void json_tree_destroy(struct json_tree *tree); -/** Internal function, use json_tree_lookup instead. */ +/** + * Internal function, use json_tree_lookup() instead. + */ struct json_token * json_tree_lookup_slowpath(struct json_tree *tree, struct json_token *parent, const struct json_token *token); /** - * Look up a token in a tree at position specified with parent. + * Look up a token in a JSON tree given a parent token and a key. + * + * Returns NULL if not found. */ static inline struct json_token * json_tree_lookup(struct json_tree *tree, struct json_token *parent, const struct json_token *token) { struct json_token *ret = NULL; - if (token->type == JSON_TOKEN_NUM) { - ret = likely(token->num < parent->children_capacity) ? + if (likely(token->type == JSON_TOKEN_NUM)) { + ret = (int)token->num < parent->children_capacity ? parent->children[token->num] : NULL; } else { ret = json_tree_lookup_slowpath(tree, parent, token); @@ -256,147 +280,171 @@ json_tree_lookup(struct json_tree *tree, struct json_token *parent, } /** - * Append token to the given parent position in a JSON tree. The - * parent mustn't have a child with such content. + * Insert a token into a JSON tree at a given position. + * + * The token key (json_token::type and num/str,len) must be set, + * e.g. by json_lexer_next_token(). The caller must ensure that + * no token with the same key is linked to the same parent, e.g. + * with json_tree_lookup(). + * + * Returns 0 on success, -1 on memory allocation error. */ int json_tree_add(struct json_tree *tree, struct json_token *parent, struct json_token *token); /** - * Delete a JSON tree token at the given parent position in JSON - * tree. Token entry shouldn't have subtree. + * Delete a token from a JSON tree. + * + * The token must be linked to the tree (see json_tree_add()) + * and must not have any children. */ void json_tree_del(struct json_tree *tree, struct json_token *token); -/** Lookup child by path in JSON tree. */ +/** + * Look up a token in a JSON tree by path. + * + * The path is relative to the given root token. In order to + * look up a token by absolute path, pass json_tree::root. + * The index_base is passed to json_lexer used for tokenizing + * the path, see json_lexer_create() for more details. + * + * Returns NULL if no token is found or the path is invalid. + */ struct json_token * -json_tree_lookup_path(struct json_tree *tree, struct json_token *parent, - const char *path, uint32_t path_len, uint32_t index_base); +json_tree_lookup_path(struct json_tree *tree, struct json_token *root, + const char *path, int path_len, int index_base); -/** Do pre order traversal in JSON tree. */ +/** + * Perform pre-order traversal in a JSON subtree rooted + * at a given node. + * + * To start a new traversal, pass NULL for @pos. + * Returns @root at the first iteration. + * Returns NULL when traversal is over. + */ struct json_token * json_tree_preorder_next(struct json_token *root, struct json_token *pos); -/** Do post order traversal in JSON tree. */ +/** + * Perform post-order traversal in a JSON subtree rooted + * at a given node. + * + * To start a new traversal, pass NULL for @pos. + * Returns @root at the last iteration. + * Returns NULL when traversal is over. + */ struct json_token * json_tree_postorder_next(struct json_token *root, struct json_token *pos); -#ifndef typeof -#define typeof __typeof__ -#endif +/** + * Perform pre-order JSON tree traversal. + * Note, this function does not visit the root node. + * See also json_tree_preorder_next(). + */ +#define json_tree_foreach_preorder(pos, root) \ + for ((pos) = json_tree_preorder_next((root), (root)); \ + (pos) != NULL; \ + (pos) = json_tree_preorder_next((root), (pos))) -/** Return container entry by json_tree_node node. */ -#define json_tree_entry(node, type, member) \ - container_of((node), type, member) /** - * Return container entry by json_tree_node or NULL if - * node is NULL. + * Perform post-order JSON tree traversal. + * Note, this function does not visit the root node. + * See also json_tree_postorder_next(). */ -#define json_tree_entry_safe(node, type, member) ({ \ - (node) != NULL ? json_tree_entry((node), type, member) : NULL; \ -}) +#define json_tree_foreach_postorder(pos, root) \ + for ((pos) = json_tree_postorder_next((root), NULL); \ + (pos) != (root); \ + (pos) = json_tree_postorder_next((root), (pos))) -/** Do entry pre order traversal in JSON tree */ -#define json_tree_preorder_next_entry(root, node, type, member) ({ \ - struct json_token *__next = \ - json_tree_preorder_next((root), (node)); \ - json_tree_entry_safe(__next, type, member); \ -}) +/** + * Perform post-order JSON tree traversal safe against node removal. + * Note, this function does not visit the root node. + * See also json_tree_postorder_next(). + */ +#define json_tree_foreach_safe(pos, root, tmp) \ + for ((pos) = json_tree_postorder_next((root), NULL); \ + (pos) != (root) && \ + ((tmp) = json_tree_postorder_next((root), (pos))) != NULL; \ + (pos) = (tmp)) /** - * Do entry post order traversal in JSON tree. - * This cycle doesn't visit root node. + * Return a container of a json_tree_token. */ -#define json_tree_postorder_next_entry(root, node, type, member) ({ \ - struct json_token *__next = \ - json_tree_postorder_next((root), (node)); \ - json_tree_entry_safe(__next, type, member); \ -}) +#define json_tree_entry(token, type, member) \ + container_of((token), type, member) /** - * Lookup in tree by path and return entry. - * This cycle doesn't visit root node. + * Return a container of a json_tree_token or NULL if @token is NULL. */ -#define json_tree_lookup_path_entry(tree, parent, path, path_len, \ - index_base, type, member) ({ \ - struct json_token *__node = \ - json_tree_lookup_path((tree), (parent), path, path_len, \ - index_base); \ - json_tree_entry_safe(__node, type, member); \ -}) +#define json_tree_entry_safe(token, type, member) \ + ((token) != NULL ? json_tree_entry((token), type, member) : NULL); \ -/** Lookup in tree by token and return entry. */ +/** + * Container-aware wrapper around json_tree_lookup(). + */ #define json_tree_lookup_entry(tree, parent, token, type, member) ({ \ - struct json_token *__node = \ - json_tree_lookup((tree), (parent), token); \ - json_tree_entry_safe(__node, type, member); \ + struct json_token *ret = json_tree_lookup((tree), (parent), (token));\ + json_tree_entry_safe(ret, type, member); \ }) /** - * Do pre-order traversal in JSON tree. - * This cycle doesn't visit root node. + * Container-aware wrapper around json_tree_lookup_path(). */ -#define json_tree_foreach_preorder(root, node) \ - for ((node) = json_tree_preorder_next((root), (root)); \ - (node) != NULL; \ - (node) = json_tree_preorder_next((root), (node))) +#define json_tree_lookup_path_entry(tree, root, path, path_len, index_base, \ + type, member) ({ \ + struct json_token *ret = json_tree_lookup_path((tree), (root), \ + (path), (path_len), (index_base)); \ + json_tree_entry_safe(ret, type, member); \ +}) /** - * Do post-order traversal in JSON tree. - * This cycle doesn't visit root node. + * Container-aware wrapper around json_tree_preorder_next(). */ -#define json_tree_foreach_postorder(root, node) \ - for ((node) = json_tree_postorder_next((root), NULL); \ - (node) != (root); \ - (node) = json_tree_postorder_next((root), (node))) +#define json_tree_preorder_next_entry(root, pos, type, member) ({ \ + struct json_token *next = json_tree_preorder_next((root), (pos)); \ + json_tree_entry_safe(next, type, member); \ +}) /** - * Do safe post-order traversal in JSON tree. - * May be used for destructors. - * This cycle doesn't visit root node. + * Container-aware wrapper around json_tree_postorder_next(). */ -#define json_tree_foreach_safe(root, node, tmp) \ - for ((node) = json_tree_postorder_next((root), NULL); \ - (node) != (root) && \ - ((tmp) = json_tree_postorder_next((root), (node))); \ - (node) = (tmp)) +#define json_tree_postorder_next_entry(root, pos, type, member) ({ \ + struct json_token *next = json_tree_postorder_next((root), (pos)); \ + json_tree_entry_safe(next, type, member); \ +}) /** - * Do post-order traversal in JSON tree and return entry. - * This cycle doesn't visit root node. + * Container-aware version of json_tree_foreach_preorder(). */ -#define json_tree_foreach_entry_preorder(root, node, type, member) \ - for ((node) = json_tree_preorder_next_entry((root), (root), \ - type, member); \ - (node) != NULL; \ - (node) = json_tree_preorder_next_entry((root), &(node)->member, \ - type, member)) +#define json_tree_foreach_entry_preorder(pos, root, type, member) \ + for ((pos) = json_tree_preorder_next_entry((root), (root), \ + type, member); \ + (pos) != NULL; \ + (pos) = json_tree_preorder_next_entry((root), &(pos)->member, \ + type, member)) /** - * Do pre-order traversal in JSON tree and return entry. - * This cycle doesn't visit root node. + * Container-aware version of json_tree_foreach_postorder(). */ -#define json_tree_foreach_entry_postorder(root, node, type, member) \ - for ((node) = json_tree_postorder_next_entry((root), NULL, type, \ - member); \ - &(node)->member != (root); \ - (node) = json_tree_postorder_next_entry((root), &(node)->member,\ +#define json_tree_foreach_entry_postorder(pos, root, type, member) \ + for ((pos) = json_tree_postorder_next_entry((root), NULL, \ + type, member); \ + &(pos)->member != (root); \ + (pos) = json_tree_postorder_next_entry((root), &(pos)->member, \ type, member)) /** - * Do secure post-order traversal in JSON tree and return entry. - * This cycle doesn't visit root node. + * Container-aware version of json_tree_foreach_safe(). */ -#define json_tree_foreach_entry_safe(root, node, type, member, tmp) \ - for ((node) = json_tree_postorder_next_entry((root), NULL, \ - type, member); \ - &(node)->member != (root) && \ - ((tmp) = json_tree_postorder_next_entry((root), \ - &(node)->member, \ - type, member)); \ - (node) = (tmp)) +#define json_tree_foreach_entry_safe(pos, root, type, member, tmp) \ + for ((pos) = json_tree_postorder_next_entry((root), NULL, \ + type, member); \ + &(pos)->member != (root) && \ + ((tmp) = json_tree_postorder_next_entry((root), &(pos)->member, \ + type, member)) != NULL; \ + (pos) = (tmp)) #ifdef __cplusplus } diff --git a/test/unit/json_path.c b/test/unit/json_path.c index 04d36688..b72865e4 100644 --- a/test/unit/json_path.c +++ b/test/unit/json_path.c @@ -250,17 +250,17 @@ test_tree() &records_idx); is(node, &records[5], "add path '%s'", path4_copy); - node = json_tree_lookup_path_entry(&tree, NULL, path1, strlen(path1), - JSON_INDEX_BASE, struct test_struct, - node); + node = json_tree_lookup_path_entry(&tree, &tree.root, path1, + strlen(path1), JSON_INDEX_BASE, + struct test_struct, node); is(node, &records[1], "lookup path '%s'", path1); - node = json_tree_lookup_path_entry(&tree, NULL, path2, strlen(path2), - JSON_INDEX_BASE, struct test_struct, - node); + node = json_tree_lookup_path_entry(&tree, &tree.root, path2, + strlen(path2), JSON_INDEX_BASE, + struct test_struct, node); is(node, &records[3], "lookup path '%s'", path2); - node = json_tree_lookup_path_entry(&tree, NULL, path_unregistered, + node = json_tree_lookup_path_entry(&tree, &tree.root, path_unregistered, strlen(path_unregistered), JSON_INDEX_BASE, struct test_struct, node); @@ -274,7 +274,7 @@ test_tree() int cnt = sizeof(tokens_preorder)/sizeof(tokens_preorder[0]); int idx = 0; - json_tree_foreach_preorder(&tree.root, token) { + json_tree_foreach_preorder(token, &tree.root) { if (idx >= cnt) break; struct test_struct *t1 = @@ -294,7 +294,7 @@ test_tree() &records[3].node, &records[2].node, &records[0].node}; cnt = sizeof(tree_nodes_postorder)/sizeof(tree_nodes_postorder[0]); idx = 0; - json_tree_foreach_postorder(&tree.root, token) { + json_tree_foreach_postorder(token, &tree.root) { if (idx >= cnt) break; struct test_struct *t1 = @@ -310,7 +310,7 @@ test_tree() is(idx, cnt, "records iterated count %d of %d", idx, cnt); idx = 0; - json_tree_foreach_safe(&tree.root, token, tmp) { + json_tree_foreach_safe(token, &tree.root, tmp) { if (idx >= cnt) break; struct test_struct *t1 = @@ -326,7 +326,7 @@ test_tree() is(idx, cnt, "records iterated count %d of %d", idx, cnt); idx = 0; - json_tree_foreach_entry_preorder(&tree.root, node, struct test_struct, + json_tree_foreach_entry_preorder(node, &tree.root, struct test_struct, node) { if (idx >= cnt) break; @@ -341,7 +341,7 @@ test_tree() is(idx, cnt, "records iterated count %d of %d", idx, cnt); idx = 0; - json_tree_foreach_entry_postorder(&tree.root, node, struct test_struct, + json_tree_foreach_entry_postorder(node, &tree.root, struct test_struct, node) { if (idx >= cnt) break; @@ -362,21 +362,21 @@ test_tree() is(records[3].node.max_child_idx, 1, "max_child_index %d expected of %d", records[3].node.max_child_idx, 1); json_tree_del(&tree, &records[4].node); - is(records[3].node.max_child_idx, 0, "max_child_index %d expected of %d", - records[3].node.max_child_idx, 0); - node = json_tree_lookup_path_entry(&tree, NULL, path3, strlen(path3),\ - JSON_INDEX_BASE, struct test_struct, - node); + is(records[3].node.max_child_idx, -1, "max_child_index %d expected of %d", + records[3].node.max_child_idx, -1); + node = json_tree_lookup_path_entry(&tree, &tree.root, path3, + strlen(path3), JSON_INDEX_BASE, + struct test_struct, node); is(node, NULL, "lookup removed path '%s'", path3); - node = json_tree_lookup_path_entry(&tree, NULL, path4, strlen(path4), - JSON_INDEX_BASE, struct test_struct, - node); + node = json_tree_lookup_path_entry(&tree, &tree.root, path4, + strlen(path4), JSON_INDEX_BASE, + struct test_struct, node); is(node, NULL, "lookup removed path '%s'", path4); - node = json_tree_lookup_path_entry(&tree, NULL, path2, strlen(path2), - JSON_INDEX_BASE, struct test_struct, - node); + node = json_tree_lookup_path_entry(&tree, &tree.root, path2, + strlen(path2), JSON_INDEX_BASE, + struct test_struct, node); is(node, &records[3], "lookup path was not corrupted '%s'", path2); const struct json_token *tree_nodes_postorder_new[] = @@ -385,7 +385,7 @@ test_tree() cnt = sizeof(tree_nodes_postorder_new) / sizeof(tree_nodes_postorder_new[0]); idx = 0; - json_tree_foreach_entry_safe(&tree.root, node, struct test_struct, + json_tree_foreach_entry_safe(node, &tree.root, struct test_struct, node, node_tmp) { if (idx >= cnt) break; diff --git a/test/unit/json_path.result b/test/unit/json_path.result index 0ee970c8..ee23e70c 100644 --- a/test/unit/json_path.result +++ b/test/unit/json_path.result @@ -147,7 +147,7 @@ ok 2 - subtests ok 43 - records iterated count 6 of 6 ok 44 - max_child_index 7 expected of 7 ok 45 - max_child_index 1 expected of 1 - ok 46 - max_child_index 0 expected of 0 + ok 46 - max_child_index -1 expected of -1 ok 47 - lookup removed path '[1][20].file[2]' ok 48 - lookup removed path '[1][20].file[8]' ok 49 - lookup path was not corrupted '[1][20].file' ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v6 3/4] box: manage format fields with JSON tree class 2018-12-06 8:42 [PATCH v6 0/4] box: JSON tree class Kirill Shcherbatov 2018-12-06 8:42 ` [PATCH v6 1/4] lib: make index_base support for json_lexer Kirill Shcherbatov 2018-12-06 8:42 ` [PATCH v6 2/4] lib: implement JSON tree class for json library Kirill Shcherbatov @ 2018-12-06 8:42 ` Kirill Shcherbatov 2018-12-11 9:52 ` Vladimir Davydov 2018-12-06 8:42 ` [PATCH v6 4/4] lib: introduce json_path_cmp, json_path_validate Kirill Shcherbatov 3 siblings, 1 reply; 9+ messages in thread From: Kirill Shcherbatov @ 2018-12-06 8:42 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov As we going to work with format fields in a unified way, we started to use the tree JSON class for working with first-level format fields. Needed for #1012 --- src/box/sql.c | 18 +++--- src/box/sql/build.c | 5 +- src/box/tuple.c | 10 ++-- src/box/tuple_format.c | 122 +++++++++++++++++++++++++++++------------ src/box/tuple_format.h | 49 ++++++++++++++--- src/box/vy_stmt.c | 4 +- 6 files changed, 148 insertions(+), 60 deletions(-) diff --git a/src/box/sql.c b/src/box/sql.c index 25b3213fd..2cb0edbff 100644 --- a/src/box/sql.c +++ b/src/box/sql.c @@ -199,8 +199,9 @@ tarantoolSqlite3TupleColumnFast(BtCursor *pCur, u32 fieldno, u32 *field_size) assert(pCur->last_tuple != NULL); struct tuple_format *format = tuple_format(pCur->last_tuple); - if (fieldno >= format->field_count || - format->fields[fieldno].offset_slot == TUPLE_OFFSET_SLOT_NIL) + if (fieldno >= tuple_format_field_count(format) || + tuple_format_field(format, fieldno)->offset_slot == + TUPLE_OFFSET_SLOT_NIL) return NULL; const char *field = tuple_field(pCur->last_tuple, fieldno); const char *end = field; @@ -895,7 +896,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor, struct key_def *key_def; const struct tuple *tuple; const char *base; - const struct tuple_format *format; + struct tuple_format *format; const uint32_t *field_map; uint32_t field_count, next_fieldno = 0; const char *p, *field0; @@ -913,7 +914,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor, base = tuple_data(tuple); format = tuple_format(tuple); field_map = tuple_field_map(tuple); - field_count = format->field_count; + field_count = tuple_format_field_count(format); field0 = base; mp_decode_array(&field0); p = field0; for (i = 0; i < n; i++) { /* @@ -931,9 +932,10 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor, uint32_t fieldno = key_def->parts[i].fieldno; if (fieldno != next_fieldno) { + struct tuple_field *field = + tuple_format_field(format, fieldno); if (fieldno >= field_count || - format->fields[fieldno].offset_slot == - TUPLE_OFFSET_SLOT_NIL) { + field->offset_slot == TUPLE_OFFSET_SLOT_NIL) { /* Outdated field_map. */ uint32_t j = 0; @@ -941,9 +943,7 @@ tarantoolSqlite3IdxKeyCompare(struct BtCursor *cursor, while (j++ != fieldno) mp_next(&p); } else { - p = base + field_map[ - format->fields[fieldno].offset_slot - ]; + p = base + field_map[field->offset_slot]; } } next_fieldno = fieldno + 1; diff --git a/src/box/sql/build.c b/src/box/sql/build.c index 52f0bde15..b5abaeeda 100644 --- a/src/box/sql/build.c +++ b/src/box/sql/build.c @@ -936,8 +936,9 @@ sql_column_collation(struct space_def *def, uint32_t column, uint32_t *coll_id) struct coll_id *collation = coll_by_id(*coll_id); return collation != NULL ? collation->coll : NULL; } - *coll_id = space->format->fields[column].coll_id; - return space->format->fields[column].coll; + struct tuple_field *field = tuple_format_field(space->format, column); + *coll_id = field->coll_id; + return field->coll; } struct ExprList * diff --git a/src/box/tuple.c b/src/box/tuple.c index ef4d16f39..aae1c3cdd 100644 --- a/src/box/tuple.c +++ b/src/box/tuple.c @@ -138,7 +138,7 @@ runtime_tuple_delete(struct tuple_format *format, struct tuple *tuple) int tuple_validate_raw(struct tuple_format *format, const char *tuple) { - if (format->field_count == 0) + if (tuple_format_field_count(format) == 0) return 0; /* Nothing to check */ /* Check to see if the tuple has a sufficient number of fields. */ @@ -158,10 +158,12 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple) } /* Check field types */ - struct tuple_field *field = &format->fields[0]; + struct tuple_field *field = tuple_format_field(format, 0); uint32_t i = 0; - uint32_t defined_field_count = MIN(field_count, format->field_count); - for (; i < defined_field_count; ++i, ++field) { + uint32_t defined_field_count = + MIN(field_count, tuple_format_field_count(format)); + for (; i < defined_field_count; ++i) { + field = tuple_format_field(format, i); if (key_mp_type_validate(field->type, mp_typeof(*tuple), ER_FIELD_TYPE, i + TUPLE_INDEX_BASE, tuple_field_is_nullable(field))) diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 149248144..eeb68f5dd 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -38,10 +38,27 @@ static intptr_t recycled_format_ids = FORMAT_ID_NIL; static uint32_t formats_size = 0, formats_capacity = 0; -static const struct tuple_field tuple_field_default = { - FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false, - ON_CONFLICT_ACTION_NONE, NULL, COLL_NONE, -}; +static struct tuple_field * +tuple_field_new(void) +{ + struct tuple_field *ret = calloc(1, sizeof(struct tuple_field)); + if (ret == NULL) { + diag_set(OutOfMemory, sizeof(struct tuple_field), "malloc", + "ret"); + return NULL; + } + ret->type = FIELD_TYPE_ANY; + ret->offset_slot = TUPLE_OFFSET_SLOT_NIL; + ret->coll_id = COLL_NONE; + ret->nullable_action = ON_CONFLICT_ACTION_NONE; + return ret; +} + +static void +tuple_field_delete(struct tuple_field *field) +{ + free(field); +} static int tuple_format_use_key_part(struct tuple_format *format, @@ -49,8 +66,8 @@ tuple_format_use_key_part(struct tuple_format *format, const struct key_part *part, bool is_sequential, int *current_slot) { - assert(part->fieldno < format->field_count); - struct tuple_field *field = &format->fields[part->fieldno]; + assert(part->fieldno < tuple_format_field_count(format)); + struct tuple_field *field = tuple_format_field(format, part->fieldno); /* * If a field is not present in the space format, * inherit nullable action of the first key part @@ -138,16 +155,15 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, format->min_field_count = tuple_format_min_field_count(keys, key_count, fields, field_count); - if (format->field_count == 0) { + if (tuple_format_field_count(format) == 0) { format->field_map_size = 0; return 0; } /* Initialize defined fields */ for (uint32_t i = 0; i < field_count; ++i) { - format->fields[i].is_key_part = false; - format->fields[i].type = fields[i].type; - format->fields[i].offset_slot = TUPLE_OFFSET_SLOT_NIL; - format->fields[i].nullable_action = fields[i].nullable_action; + struct tuple_field *field = tuple_format_field(format, i); + field->type = fields[i].type; + field->nullable_action = fields[i].nullable_action; struct coll *coll = NULL; uint32_t cid = fields[i].coll_id; if (cid != COLL_NONE) { @@ -159,12 +175,9 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, } coll = coll_id->coll; } - format->fields[i].coll = coll; - format->fields[i].coll_id = cid; + field->coll = coll; + field->coll_id = cid; } - /* Initialize remaining fields */ - for (uint32_t i = field_count; i < format->field_count; i++) - format->fields[i] = tuple_field_default; int current_slot = 0; @@ -184,7 +197,8 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, } } - assert(format->fields[0].offset_slot == TUPLE_OFFSET_SLOT_NIL); + assert(tuple_format_field(format, 0)->offset_slot == + TUPLE_OFFSET_SLOT_NIL); size_t field_map_size = -current_slot * sizeof(uint32_t); if (field_map_size > UINT16_MAX) { /** tuple->data_offset is 16 bits */ @@ -242,6 +256,19 @@ tuple_format_deregister(struct tuple_format *format) format->id = FORMAT_ID_NIL; } +/** Destroy field JSON tree and release allocated memory. */ +static inline void +tuple_format_fields_destroy(struct tuple_format *format) +{ + struct tuple_field *field, *tmp; + json_tree_foreach_entry_safe(&format->fields.root, field, + struct tuple_field, token, tmp) { + json_tree_del(&format->fields, &field->token); + tuple_field_delete(field); + } + json_tree_destroy(&format->fields); +} + static struct tuple_format * tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, uint32_t space_field_count, struct tuple_dictionary *dict) @@ -258,39 +285,57 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, } } uint32_t field_count = MAX(space_field_count, index_field_count); - uint32_t total = sizeof(struct tuple_format) + - field_count * sizeof(struct tuple_field); - struct tuple_format *format = (struct tuple_format *) malloc(total); + struct tuple_format *format = malloc(sizeof(struct tuple_format)); if (format == NULL) { diag_set(OutOfMemory, sizeof(struct tuple_format), "malloc", "tuple format"); return NULL; } + if (json_tree_create(&format->fields) != 0) { + free(format); + return NULL; + } + for (uint32_t fieldno = 0; fieldno < field_count; fieldno++) { + struct tuple_field *field = tuple_field_new(); + if (field == NULL) + goto error; + field->token.num = fieldno; + field->token.type = JSON_TOKEN_NUM; + if (json_tree_add(&format->fields, &format->fields.root, + &field->token) != 0) { + diag_set(OutOfMemory, 0, "json_tree_add", + "&format->tree"); + tuple_field_delete(field); + goto error; + } + } if (dict == NULL) { assert(space_field_count == 0); format->dict = tuple_dictionary_new(NULL, 0); - if (format->dict == NULL) { - free(format); - return NULL; - } + if (format->dict == NULL) + goto error; } else { format->dict = dict; tuple_dictionary_ref(dict); } format->refs = 0; format->id = FORMAT_ID_NIL; - format->field_count = field_count; format->index_field_count = index_field_count; format->exact_field_count = 0; format->min_field_count = 0; return format; +error:; + tuple_format_fields_destroy(format); + free(format); + return NULL; } /** Free tuple format resources, doesn't unregister. */ static inline void tuple_format_destroy(struct tuple_format *format) { + tuple_format_fields_destroy(format); tuple_dictionary_unref(format->dict); } @@ -328,18 +373,21 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys, } bool -tuple_format1_can_store_format2_tuples(const struct tuple_format *format1, - const struct tuple_format *format2) +tuple_format1_can_store_format2_tuples(struct tuple_format *format1, + struct tuple_format *format2) { if (format1->exact_field_count != format2->exact_field_count) return false; - for (uint32_t i = 0; i < format1->field_count; ++i) { - const struct tuple_field *field1 = &format1->fields[i]; + uint32_t format1_field_count = tuple_format_field_count(format1); + uint32_t format2_field_count = tuple_format_field_count(format2); + for (uint32_t i = 0; i < format1_field_count; ++i) { + const struct tuple_field *field1 = + tuple_format_field(format1, i); /* * The field has a data type in format1, but has * no data type in format2. */ - if (i >= format2->field_count) { + if (i >= format2_field_count) { /* * The field can get a name added * for it, and this doesn't require a data @@ -355,7 +403,8 @@ tuple_format1_can_store_format2_tuples(const struct tuple_format *format1, else return false; } - const struct tuple_field *field2 = &format2->fields[i]; + const struct tuple_field *field2 = + tuple_format_field(format2, i); if (! field_type1_contains_type2(field1->type, field2->type)) return false; /* @@ -374,7 +423,7 @@ int tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, const char *tuple, bool validate) { - if (format->field_count == 0) + if (tuple_format_field_count(format) == 0) return 0; /* Nothing to initialize */ const char *pos = tuple; @@ -397,17 +446,17 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, /* first field is simply accessible, so we do not store offset to it */ enum mp_type mp_type = mp_typeof(*pos); - const struct tuple_field *field = &format->fields[0]; + const struct tuple_field *field = + tuple_format_field((struct tuple_format *)format, 0); if (validate && key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE, TUPLE_INDEX_BASE, tuple_field_is_nullable(field))) return -1; mp_next(&pos); /* other fields...*/ - ++field; uint32_t i = 1; uint32_t defined_field_count = MIN(field_count, validate ? - format->field_count : + tuple_format_field_count(format) : format->index_field_count); if (field_count < format->index_field_count) { /* @@ -417,7 +466,8 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, memset((char *)field_map - format->field_map_size, 0, format->field_map_size); } - for (; i < defined_field_count; ++i, ++field) { + for (; i < defined_field_count; ++i) { + field = tuple_format_field((struct tuple_format *)format, i); mp_type = mp_typeof(*pos); if (validate && key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE, diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index 232df22b2..68e9205b9 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -34,6 +34,7 @@ #include "key_def.h" #include "field_def.h" #include "errinj.h" +#include "json/json.h" #include "tuple_dictionary.h" #if defined(__cplusplus) @@ -113,6 +114,8 @@ struct tuple_field { struct coll *coll; /** Collation identifier. */ uint32_t coll_id; + /** Link in tuple_format::fields. */ + struct json_token token; }; /** @@ -166,16 +169,46 @@ struct tuple_format { * index_field_count <= min_field_count <= field_count. */ uint32_t min_field_count; - /* Length of 'fields' array. */ - uint32_t field_count; /** * Shared names storage used by all formats of a space. */ struct tuple_dictionary *dict; - /* Formats of the fields */ - struct tuple_field fields[0]; + /** + * Fields comprising the format, organized in a tree. + * First level nodes correspond to tuple fields. + * Deeper levels define indexed JSON paths within + * tuple fields. Nodes of the tree are linked by + * tuple_field::token. + */ + struct json_tree fields; }; +/** + * Return a count of first level nodes correspond to tuple + * fields. + */ +static inline uint32_t +tuple_format_field_count(const struct tuple_format *format) +{ + const struct json_token *root = &format->fields.root; + return root->children != NULL ? root->max_child_idx + 1 : 0; +} + +/** + * Get the first level node corresponding to tuple field by its + * fieldno. + */ +static inline struct tuple_field * +tuple_format_field(struct tuple_format *format, uint32_t fieldno) +{ + assert(fieldno < tuple_format_field_count(format)); + struct json_token token; + token.type = JSON_TOKEN_NUM; + token.num = fieldno; + return json_tree_lookup_entry(&format->fields, &format->fields.root, + &token, struct tuple_field, token); +} + extern struct tuple_format **tuple_formats; static inline uint32_t @@ -238,8 +271,8 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys, * @retval True, if @a format1 can store any tuples of @a format2. */ bool -tuple_format1_can_store_format2_tuples(const struct tuple_format *format1, - const struct tuple_format *format2); +tuple_format1_can_store_format2_tuples(struct tuple_format *format1, + struct tuple_format *format2); /** * Calculate minimal field count of tuples with specified keys and @@ -333,7 +366,9 @@ tuple_field_raw(const struct tuple_format *format, const char *tuple, return tuple; } - int32_t offset_slot = format->fields[field_no].offset_slot; + int32_t offset_slot = + tuple_format_field((struct tuple_format *)format, + field_no)->offset_slot; if (offset_slot != TUPLE_OFFSET_SLOT_NIL) { if (field_map[offset_slot] != 0) return tuple + field_map[offset_slot]; diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c index d83840406..3e60fece9 100644 --- a/src/box/vy_stmt.c +++ b/src/box/vy_stmt.c @@ -411,7 +411,7 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type, uint32_t *field_map = (uint32_t *) raw; char *wpos = mp_encode_array(raw, field_count); for (uint32_t i = 0; i < field_count; ++i) { - const struct tuple_field *field = &format->fields[i]; + const struct tuple_field *field = tuple_format_field(format, i); if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) field_map[field->offset_slot] = wpos - raw; if (iov[i].iov_base == NULL) { @@ -465,7 +465,7 @@ vy_stmt_new_surrogate_delete_raw(struct tuple_format *format, } char *pos = mp_encode_array(data, field_count); for (uint32_t i = 0; i < field_count; ++i) { - const struct tuple_field *field = &format->fields[i]; + const struct tuple_field *field = tuple_format_field(format, i); if (! field->is_key_part) { /* Unindexed field - write NIL. */ assert(i < src_count); -- 2.19.2 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6 3/4] box: manage format fields with JSON tree class 2018-12-06 8:42 ` [PATCH v6 3/4] box: manage format fields with JSON tree class Kirill Shcherbatov @ 2018-12-11 9:52 ` Vladimir Davydov 0 siblings, 0 replies; 9+ messages in thread From: Vladimir Davydov @ 2018-12-11 9:52 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja On Thu, Dec 06, 2018 at 11:42:30AM +0300, Kirill Shcherbatov wrote: > As we going to work with format fields in a unified way, we > started to use the tree JSON class for working with first-level > format fields. > > Needed for #1012 > --- > src/box/sql.c | 18 +++--- > src/box/sql/build.c | 5 +- > src/box/tuple.c | 10 ++-- > src/box/tuple_format.c | 122 +++++++++++++++++++++++++++++------------ > src/box/tuple_format.h | 49 ++++++++++++++--- > src/box/vy_stmt.c | 4 +- > 6 files changed, 148 insertions(+), 60 deletions(-) Pushed to 2.1 with minor changes: diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index eeb68f5d..3af39a37 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -41,17 +41,18 @@ static uint32_t formats_size = 0, formats_capacity = 0; static struct tuple_field * tuple_field_new(void) { - struct tuple_field *ret = calloc(1, sizeof(struct tuple_field)); - if (ret == NULL) { + struct tuple_field *field = calloc(1, sizeof(struct tuple_field)); + if (field == NULL) { diag_set(OutOfMemory, sizeof(struct tuple_field), "malloc", - "ret"); + "tuple field"); return NULL; } - ret->type = FIELD_TYPE_ANY; - ret->offset_slot = TUPLE_OFFSET_SLOT_NIL; - ret->coll_id = COLL_NONE; - ret->nullable_action = ON_CONFLICT_ACTION_NONE; - return ret; + field->token.type = JSON_TOKEN_END; + field->type = FIELD_TYPE_ANY; + field->offset_slot = TUPLE_OFFSET_SLOT_NIL; + field->coll_id = COLL_NONE; + field->nullable_action = ON_CONFLICT_ACTION_NONE; + return field; } static void @@ -198,7 +199,7 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, } assert(tuple_format_field(format, 0)->offset_slot == - TUPLE_OFFSET_SLOT_NIL); + TUPLE_OFFSET_SLOT_NIL); size_t field_map_size = -current_slot * sizeof(uint32_t); if (field_map_size > UINT16_MAX) { /** tuple->data_offset is 16 bits */ @@ -256,12 +257,15 @@ tuple_format_deregister(struct tuple_format *format) format->id = FORMAT_ID_NIL; } -/** Destroy field JSON tree and release allocated memory. */ -static inline void -tuple_format_fields_destroy(struct tuple_format *format) +/* + * Dismantle the tuple field tree attached to the format and free + * memory occupied by tuple fields. + */ +static void +tuple_format_destroy_fields(struct tuple_format *format) { struct tuple_field *field, *tmp; - json_tree_foreach_entry_safe(&format->fields.root, field, + json_tree_foreach_entry_safe(field, &format->fields.root, struct tuple_field, token, tmp) { json_tree_del(&format->fields, &field->token); tuple_field_delete(field); @@ -293,6 +297,8 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, return NULL; } if (json_tree_create(&format->fields) != 0) { + diag_set(OutOfMemory, 0, "json_lexer_create", + "tuple field tree"); free(format); return NULL; } @@ -305,7 +311,7 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, if (json_tree_add(&format->fields, &format->fields.root, &field->token) != 0) { diag_set(OutOfMemory, 0, "json_tree_add", - "&format->tree"); + "tuple field tree entry"); tuple_field_delete(field); goto error; } @@ -325,8 +331,8 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, format->exact_field_count = 0; format->min_field_count = 0; return format; -error:; - tuple_format_fields_destroy(format); +error: + tuple_format_destroy_fields(format); free(format); return NULL; } @@ -335,7 +341,7 @@ error:; static inline void tuple_format_destroy(struct tuple_format *format) { - tuple_format_fields_destroy(format); + tuple_format_destroy_fields(format); tuple_dictionary_unref(format->dict); } diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index 68e9205b..94933780 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -184,8 +184,8 @@ struct tuple_format { }; /** - * Return a count of first level nodes correspond to tuple - * fields. + * Return the number of top-level tuple fields defined by + * a given format. */ static inline uint32_t tuple_format_field_count(const struct tuple_format *format) @@ -195,8 +195,8 @@ tuple_format_field_count(const struct tuple_format *format) } /** - * Get the first level node corresponding to tuple field by its - * fieldno. + * Return meta information of a top-level tuple field given + * a format and a field index. */ static inline struct tuple_field * tuple_format_field(struct tuple_format *format, uint32_t fieldno) ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v6 4/4] lib: introduce json_path_cmp, json_path_validate 2018-12-06 8:42 [PATCH v6 0/4] box: JSON tree class Kirill Shcherbatov ` (2 preceding siblings ...) 2018-12-06 8:42 ` [PATCH v6 3/4] box: manage format fields with JSON tree class Kirill Shcherbatov @ 2018-12-06 8:42 ` Kirill Shcherbatov 2018-12-10 17:38 ` Vladimir Davydov 3 siblings, 1 reply; 9+ messages in thread From: Kirill Shcherbatov @ 2018-12-06 8:42 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov Introduced json_path_validate routine to ensure user-defined JSON path is valid. This will be required to raise an error if an incorrect user-defined jason-path is detected. Introduced json_path_cmp routine to compare JSON paths that may have different representation. Note that: - in case of paths that have same token-sequence prefix, the path having more tokens is assumed to be greater - both paths to compare should be valid Needed for #1012 --- src/lib/json/json.c | 29 +++++++++++++++++++++++++++++ src/lib/json/json.h | 28 ++++++++++++++++++++++++++++ test/unit/json_path.c | 37 ++++++++++++++++++++++++++++++++++++- test/unit/json_path.result | 13 ++++++++++++- 4 files changed, 105 insertions(+), 2 deletions(-) diff --git a/src/lib/json/json.c b/src/lib/json/json.c index 65169b047..5586e59fc 100644 --- a/src/lib/json/json.c +++ b/src/lib/json/json.c @@ -500,3 +500,32 @@ json_tree_postorder_next(struct json_token *root, struct json_token *pos) return json_tree_leftmost(next); return pos->parent; } + +int +json_path_cmp(const char *a, uint32_t a_len, const char *b, uint32_t b_len, + uint32_t index_base) +{ + struct json_lexer lexer_a, lexer_b; + json_lexer_create(&lexer_a, a, a_len, index_base); + json_lexer_create(&lexer_b, b, b_len, index_base); + struct json_token token_a, token_b; + token_a.parent = NULL; + token_b.parent = NULL; + int rc_a, rc_b; + while ((rc_a = json_lexer_next_token(&lexer_a, &token_a)) == 0 && + (rc_b = json_lexer_next_token(&lexer_b, &token_b)) == 0 && + token_a.type != JSON_TOKEN_END && + token_b.type != JSON_TOKEN_END) { + int rc = json_token_cmp(&token_a, &token_b); + if (rc != 0) + return rc; + } + /* Paths a and b should be valid. */ + assert(rc_b == 0 && rc_b == 0); + /* + * The parser stopped because the end of one of the paths + * was reached. As JSON_TOKEN_END > JSON_TOKEN_{NUM, STR}, + * the path having more tokens has lower key.type value. + */ + return token_b.type - token_a.type; +} diff --git a/src/lib/json/json.h b/src/lib/json/json.h index 4ff2b9c67..aae337e29 100644 --- a/src/lib/json/json.h +++ b/src/lib/json/json.h @@ -30,6 +30,7 @@ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ +#include <stdbool.h> #include "trivia/util.h" #ifdef __cplusplus @@ -222,6 +223,33 @@ json_lexer_create(struct json_lexer *lexer, const char *src, int src_len, int json_lexer_next_token(struct json_lexer *lexer, struct json_token *token); +/** + * Compare two JSON paths using Lexer class. + * - in case of paths that have same token-sequence prefix, + * the path having more tokens is assumed to be greater + * - both paths should be valid + * (may be tested with json_path_validate). + */ +int +json_path_cmp(const char *a, uint32_t a_len, const char *b, uint32_t b_len, + uint32_t index_base); + +/** + * Check if the passed JSON path is valid. + * Return 0 for valid path and error position for invalid. + */ +static inline int +json_path_validate(const char *path, uint32_t path_len, uint32_t index_base) +{ + struct json_lexer lexer; + json_lexer_create(&lexer, path, path_len, index_base); + struct json_token token; + int rc; + while ((rc = json_lexer_next_token(&lexer, &token)) == 0 && + token.type != JSON_TOKEN_END) {}; + return rc; +} + /** Create a JSON tree object to manage data relations. */ int json_tree_create(struct json_tree *tree); diff --git a/test/unit/json_path.c b/test/unit/json_path.c index 04d36688b..f345b8125 100644 --- a/test/unit/json_path.c +++ b/test/unit/json_path.c @@ -405,15 +405,50 @@ test_tree() footer(); } +void +test_path_cmp() +{ + const char *a = "Data[1][\"FIO\"].fname"; + uint32_t a_len = strlen(a); + const struct path_and_errpos rc[] = { + {a, 0}, + {"[\"Data\"][1].FIO[\"fname\"]", 0}, + {"Data[1]", 1}, + {"Data[1][\"FIO\"].fname[1]", -1}, + {"Data[1][\"Info\"].fname[1]", -1}, + }; + header(); + plan(lengthof(rc) + 2); + for (size_t i = 0; i < lengthof(rc); ++i) { + const char *path = rc[i].path; + int errpos = rc[i].errpos; + int rc = json_path_cmp(a, a_len, path, strlen(path), + TUPLE_INDEX_BASE); + if (rc > 0) rc = 1; + if (rc < 0) rc = -1; + is(rc, errpos, "path cmp result \"%s\" with \"%s\": " + "have %d, expected %d", a, path, rc, errpos); + } + const char *invalid = "Data[[1][\"FIO\"].fname"; + int ret = json_path_validate(a, strlen(a), TUPLE_INDEX_BASE); + is(ret, 0, "path %s is valid", a); + ret = json_path_validate(invalid, strlen(invalid), TUPLE_INDEX_BASE); + is(ret, 6, "path %s error pos %d expected %d", invalid, ret, 6); + + check_plan(); + footer(); +} + int main() { header(); - plan(3); + plan(4); test_basic(); test_errors(); test_tree(); + test_path_cmp(); int rc = check_plan(); footer(); diff --git a/test/unit/json_path.result b/test/unit/json_path.result index 0ee970c8c..cf0fa51c4 100644 --- a/test/unit/json_path.result +++ b/test/unit/json_path.result @@ -1,5 +1,5 @@ *** main *** -1..3 +1..4 *** test_basic *** 1..71 ok 1 - parse <[1]> @@ -158,4 +158,15 @@ ok 2 - subtests ok 54 - records iterated count 4 of 4 ok 3 - subtests *** test_tree: done *** + *** test_path_cmp *** + 1..7 + ok 1 - path cmp result "Data[1]["FIO"].fname" with "Data[1]["FIO"].fname": have 0, expected 0 + ok 2 - path cmp result "Data[1]["FIO"].fname" with "["Data"][1].FIO["fname"]": have 0, expected 0 + ok 3 - path cmp result "Data[1]["FIO"].fname" with "Data[1]": have 1, expected 1 + ok 4 - path cmp result "Data[1]["FIO"].fname" with "Data[1]["FIO"].fname[1]": have -1, expected -1 + ok 5 - path cmp result "Data[1]["FIO"].fname" with "Data[1]["Info"].fname[1]": have -1, expected -1 + ok 6 - path Data[1]["FIO"].fname is valid + ok 7 - path Data[[1]["FIO"].fname error pos 6 expected 6 +ok 4 - subtests + *** test_path_cmp: done *** *** main: done *** -- 2.19.2 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6 4/4] lib: introduce json_path_cmp, json_path_validate 2018-12-06 8:42 ` [PATCH v6 4/4] lib: introduce json_path_cmp, json_path_validate Kirill Shcherbatov @ 2018-12-10 17:38 ` Vladimir Davydov 0 siblings, 0 replies; 9+ messages in thread From: Vladimir Davydov @ 2018-12-10 17:38 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja On Thu, Dec 06, 2018 at 11:42:31AM +0300, Kirill Shcherbatov wrote: > Introduced json_path_validate routine to ensure user-defined > JSON path is valid. This will be required to raise an error if > an incorrect user-defined jason-path is detected. > > Introduced json_path_cmp routine to compare JSON paths that may > have different representation. > Note that: > - in case of paths that have same token-sequence prefix, > the path having more tokens is assumed to be greater > - both paths to compare should be valid > > Needed for #1012 > +/** > + * Check if the passed JSON path is valid. > + * Return 0 for valid path and error position for invalid. > + */ > +static inline int > +json_path_validate(const char *path, uint32_t path_len, uint32_t index_base) > +{ > + struct json_lexer lexer; > + json_lexer_create(&lexer, path, path_len, index_base); > + struct json_token token; > + int rc; > + while ((rc = json_lexer_next_token(&lexer, &token)) == 0 && > + token.type != JSON_TOKEN_END) {}; > + return rc; > +} No need to define this function in the header. I moved it to json.c and pushed the patch to 2.1. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2018-12-11 9:52 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-12-06 8:42 [PATCH v6 0/4] box: JSON tree class Kirill Shcherbatov 2018-12-06 8:42 ` [PATCH v6 1/4] lib: make index_base support for json_lexer Kirill Shcherbatov 2018-12-10 17:34 ` Vladimir Davydov 2018-12-06 8:42 ` [PATCH v6 2/4] lib: implement JSON tree class for json library Kirill Shcherbatov 2018-12-10 17:36 ` Vladimir Davydov 2018-12-06 8:42 ` [PATCH v6 3/4] box: manage format fields with JSON tree class Kirill Shcherbatov 2018-12-11 9:52 ` Vladimir Davydov 2018-12-06 8:42 ` [PATCH v6 4/4] lib: introduce json_path_cmp, json_path_validate Kirill Shcherbatov 2018-12-10 17:38 ` Vladimir Davydov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox