From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: From: Kirill Shcherbatov Subject: Re: [tarantool-patches] Re: [PATCH v5 2/9] lib: implement JSON tree class for json library References: <02671a3d0a2236ecd6e12c0bc51b7f5e39272a2f.1543229303.git.kshcherbatov@tarantool.org> <20181129173816.kprfjhki5o7ytfbl@esperanza> Message-ID: <3c7bb503-561c-19b0-1197-f714b6f384d4@tarantool.org> Date: Tue, 4 Dec 2018 18:47:27 +0300 MIME-Version: 1.0 In-Reply-To: <20181129173816.kprfjhki5o7ytfbl@esperanza> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit To: tarantool-patches@freelists.org, Vladimir Davydov Cc: Kostya Osipov List-ID: > In general looks OK, please see a few comments below. :) > Please merge the two functions - the more functions we have for doing > roughly the same thing (token comparison in this case), the more > difficult it gets to follow the code. > > I understand that you intend to use json_token_key_cmp (without parent > comparison) to compare paths in a future patch, but IMO you'd better > nullify token->parent in json_path_cmp before token comparison instead > of having two functions. > > The resulting function should be called json_token_cmp. Ok, done. > You always pass token->parent->hash to this function so the 'seed' > argument is not really needed. Ok, done. > This is a rather pointless assertion as mh_json_delete will crash anyway > if tree->hash is NULL. Besides, how would anyone ever step on it? By > passing an uninitialized json_tree to this function? This will hardly > ever happen. > > OTOH it would be good to ensure that the tree is empty here, because > this is a subtle part of the function protocol. Ok, done. > 'token' should be marked const here. Ok, done. > I'd rather require the caller to pass tree->root explicitly. > This would save us a comparison. Ok, done. > Please don't assign a whole object when you need just a few fields:> This is a warm path after all. Ok, done. > unlikely() isn't needed here - there are plenty of function calls so it > will hardly make any difference. Besides, why would one expect that the > key is likely to be present in the tree? Ok, fixed. > May be, it's worth making this case (JSON_TOKEN_NUM) inline, to save a > function call in case of a top-level tuple field lookup? BTW likely() > would be appropriate there. Introduce json_tree_lookup_slowpath and inline wrapper json_tree_lookup > Useless unlikely(). Ok, done >> >> Ouch, this looks much more complex than it was in the previous version. >> Please revert. Quoting myself: >> >> } > + if (insert_idx >= parent->children_count) { >> } > + uint32_t new_size = insert_idx + 1; >> } >> } We usually double the size with each allocation. If this is intentional, >> } please add a comment. >> >> I didn't push you to change that. I just wanted you to add a comment >> saying that we can afford quadratic algorithmic complexity here, because >> this function is far from a hot path and we expect the tree to be rather >> small. >Come to think of it, don't bother. Doubling the array will probably >increases chances that the allocation will stay close to the node >itself, and it looks saner in a common sense. Let's please just rename >child_size to child_count_max and increase the initial allocation up to >say 8, OK? Ok, done. > Again, unlikely() is pointless here. Ok, done. > What's this loop for? Can't you just use sibling_idx here? > Indentation. You are right, it is a legacy code, reworked. > This function may be called many times from json_tree_foreach helper. > The latter is invoked at each tuple creation so we must strive to make > it as fast as we can. Now, suppose you have JSON arrays of 100 elements > with the last one indexed. You'll be iterating over 99 elements for > nothing. Is it really that bad? I'm not sure, because the iteration > should finish pretty quickly, but it looks rather ugly to me. > > A suggestion: may be, we could use rlist, as you did it before, for > iteration and the children array for lookup in an array? That is the > children array wouldn't be allocated at all for JSON_TOKEN_STR, only for > JSON_TOKEN_NUM, and it would be used solely for quick access, not for > iteration. BTW sibling_idx wouldn't be needed then. Would it make sense? As we discussed verbally, this is not hot path for our typical scenarios; tuple_format_create, tuple_format_delete, and (not so cold, but) vy_stmt_new_surrogate_from_key - may be not extremal fast. Also this memory region would be in cache and linear comparison of hypothetical hundred zeros is not really time-consuming. Moreover, we are going to "flatenning" direction where all fields would be present in JSON tree. >> /** >> * Element of a JSON path. It can be either string or number. >> * String idenfiers are in ["..."] and between dots. Numbers are >> - * indexes in [...]. >> + * indexes in [...]. May be organized in a JSON tree. > > This is such a lame comment :-( > > Can we say a few words here about the json_token lifetime: how it is > first filled by lexer, how it then can be passed to json_tree_add to > form a tree, how the tree is organized... May be, even add a code > snippet to the comment. >> + } key; > > After some consideration, I don't think that it's worth moving the token > identifier to anonymous struct 'key' - it doesn't conflict with other > members and using 'num' or 'str/len' directly is shorter than 'key.num' > and 'key.str/len'. Also, the 'type' still has type json_token_type, not > json_token_key_type, which suggests that it should be under token.type, > not under token.key.type. Ok, done > } It took me a while to realize how you calculate rolling hash - I had to > } dive deep into the implementation. > > Alas, the new comment isn't a bit better than the previous one. > >> + uint32_t rolling_hash; > > Let's call it simply 'hash', short and clear. The rolling nature of the > hash should be explained in the comment. Ok, done > typo: indexe -> index > > BTW, json array start indexing from 0, not 1 AFAIK. Starting indexing > from 1 looks weird to me. > >> + * and are allocated sequently for JSON_TOKEN_NUM child > > typo: sequently -> sequentially Ok, done. > + /** Hashtable af all tree nodes. */ > > How does it work? In a year nobody will remember, and to understand how > this thing operates we will have to dive deep into the code... /** * Hashtable af all tree nodes. * The uniqueness of the records in the table is * provided by: * 1) the value of json_token initialized by the parser * is ; * 2) pointer to parent json_token; * 3) private hash function value calculated for * json_token (1) using parent (2) hash value as a * seed. * In fact, only JSON_TOKEN_STR json_tokens are physically * added in hashtable, because JSON_TOKEN_NUM doesn't * require support structure to make lookup. * However, each json_tree token should have valid * hash value (3) as it may be required by following * subtree level. */ struct mh_json_t *hash; > It's definitely worth mentioning that these functions don't visit the > root node. Ok. > >> + >> +/** >> + * Make safe post-order traversal in JSON tree. >> + * May be used for destructors. >> + */ >> +#define json_tree_foreach_safe(node, root) \ >> +for (struct json_token *__next = json_tree_postorder_next((root), NULL); \ >> + (((node) = __next) && \ >> + (__next = json_tree_postorder_next((root), (node))), (node) != NULL);) > > IMHO it's rather difficult for understanding. Can we rewrite it so that > it uses for() loop as expected, i.e. initializes the loop variable in > the first part, checks for a loop condition in the second part, and > advances the loop variable in the third part? Something like: > > #define json_tree_foreach_safe(curr, root, next) > for (curr = json_tree_postorder_next(root, NULL), > next = curr ? json_tree_postorder_next(root, curr) : NULL; > curr != NULL; > curr = next, > next = curr ? json_tree_postorder_next(root, curr) : NULL) Ok, done. #define json_tree_foreach_safe(root, node, tmp) \ for ((node) = json_tree_postorder_next((root), NULL); \ (node) != (root) && \ ((tmp) = json_tree_postorder_next((root), (node))); \ (node) = (tmp)) > Use container_of please. Ok, done #define json_tree_entry(node, type, member) \ container_of((node), type, member) > Sometimes you wrap macro arguments in parentheses (tree, parent), > sometimes you don't (token, path, path_len). Please fix. Ok, done. > Please add more paths. I'd like to see the following cases covered as > well: > > - JSON_TOKEN_STR as an intermediate node > - JSON_TOKEN_STR as a common node for two paths > - coinciding paths > - one path is a sub-path of another const char *path1 = "[1][10]"; const char *path2 = "[1][20].file"; const char *path3 = "[1][20].file[2]"; const char *path4 = "[1][20].file[4]"; const char *path4_copy = "[1][20][\"file\"][4]"; const char *path_unregistered = "[1][3]"; ==================================================================== New JSON tree class would store JSON paths for tuple fields for registered non-plain indexes. It is a hierarchical data structure that organize JSON nodes produced by parser. Class provides API to lookup node by path and iterate over the tree. JSON Indexes patch require such functionality to make lookup for tuple_fields by path, make initialization of field map and build vynyl_stmt msgpack for secondary index via JSON tree iteration. Need for #1012 --- src/lib/json/CMakeLists.txt | 1 + src/lib/json/json.c | 268 +++++++++++++++++++++++++++++++++ src/lib/json/json.h | 293 +++++++++++++++++++++++++++++++++++- test/unit/json_path.c | 211 +++++++++++++++++++++++++- test/unit/json_path.result | 56 ++++++- 5 files changed, 826 insertions(+), 3 deletions(-) diff --git a/src/lib/json/CMakeLists.txt b/src/lib/json/CMakeLists.txt index 0f0739620..51a1f027a 100644 --- a/src/lib/json/CMakeLists.txt +++ b/src/lib/json/CMakeLists.txt @@ -4,3 +4,4 @@ set(lib_sources set_source_files_compile_flags(${lib_sources}) add_library(json_path STATIC ${lib_sources}) +target_link_libraries(json_path misc) diff --git a/src/lib/json/json.c b/src/lib/json/json.c index eb80e4bbc..58a842ef1 100644 --- a/src/lib/json/json.c +++ b/src/lib/json/json.c @@ -30,6 +30,7 @@ */ #include "json.h" +#include "third_party/PMurHash.h" #include #include #include @@ -241,3 +242,270 @@ json_lexer_next_token(struct json_lexer *lexer, struct json_token *token) return json_parse_identifier(lexer, token); } } + +/** Compare JSON tokens keys. */ +static int +json_token_cmp(const struct json_token *a, const struct json_token *b) +{ + if (a->parent != b->parent) + return a->parent - b->parent; + if (a->type != b->type) + return a->type - b->type; + int ret = 0; + if (a->type == JSON_TOKEN_STR) { + if (a->len != b->len) + return a->len - b->len; + ret = memcmp(a->str, b->str, a->len); + } else if (a->type == JSON_TOKEN_NUM) { + ret = a->num - b->num; + } else { + unreachable(); + } + return ret; +} + +#define MH_SOURCE 1 +#define mh_name _json +#define mh_key_t struct json_token ** +#define mh_node_t struct json_token * +#define mh_arg_t void * +#define mh_hash(a, arg) ((*a)->hash) +#define mh_hash_key(a, arg) ((*a)->hash) +#define mh_cmp(a, b, arg) (json_token_cmp((*a), (*b))) +#define mh_cmp_key(a, b, arg) mh_cmp(a, b, arg) +#include "salad/mhash.h" + +static const uint32_t hash_seed = 13U; + +/** Compute the hash value of a JSON token. */ +static uint32_t +json_token_hash(struct json_token *token) +{ + uint32_t h = token->parent->hash; + uint32_t carry = 0; + const void *data; + uint32_t data_size; + if (token->type == JSON_TOKEN_STR) { + data = token->str; + data_size = token->len; + } else if (token->type == JSON_TOKEN_NUM) { + data = &token->num; + data_size = sizeof(token->num); + } else { + unreachable(); + } + PMurHash32_Process(&h, &carry, data, data_size); + return PMurHash32_Result(h, carry, data_size); +} + +int +json_tree_create(struct json_tree *tree) +{ + memset(tree, 0, sizeof(struct json_tree)); + tree->root.hash = hash_seed; + tree->root.type = JSON_TOKEN_END; + tree->hash = mh_json_new(); + return tree->hash == NULL ? -1 : 0; +} + +static void +json_token_destroy(struct json_token *token) +{ + /* Token mustn't have JSON subtree. */ + #ifndef NDEBUG + struct json_token *iter; + uint32_t nodes = 0; + json_tree_foreach_preorder(token, iter) + nodes++; + assert(nodes == 0); + #endif /* NDEBUG */ + + free(token->children); +} + +void +json_tree_destroy(struct json_tree *tree) +{ + /* Tree must be empty. */ + #ifndef NDEBUG + struct json_token *iter; + uint32_t nodes = 0; + json_tree_foreach_preorder(&tree->root, iter) + nodes++; + assert(nodes == 0); + #endif /* NDEBUG */ + + json_token_destroy(&tree->root); + mh_json_delete(tree->hash); +} + +struct json_token * +json_tree_lookup_slowpath(struct json_tree *tree, struct json_token *parent, + const struct json_token *token) +{ + assert(parent != NULL); + if (likely(token->type == JSON_TOKEN_STR)) { + struct json_token key, *key_ptr; + key.type = token->type; + key.str = token->str; + key.len = token->len; + key.parent = parent; + key.hash = json_token_hash(&key); + key_ptr = &key; + mh_int_t id = mh_json_find(tree->hash, &key_ptr, NULL); + if (id == mh_end(tree->hash)) + return NULL; + struct json_token **entry = mh_json_node(tree->hash, id); + assert(entry == NULL || (*entry)->parent == parent); + return entry != NULL ? *entry : NULL; + } else if (token->type == JSON_TOKEN_NUM) { + uint32_t idx = token->num - 1; + return likely(idx < parent->child_count) ? + parent->children[idx] : NULL; + } + unreachable(); + return NULL; +} + +int +json_tree_add(struct json_tree *tree, struct json_token *parent, + struct json_token *token) +{ + assert(parent != NULL); + token->parent = parent; + uint32_t hash = json_token_hash(token); + assert(json_tree_lookup(tree, parent, token) == NULL); + uint32_t insert_idx = token->type == JSON_TOKEN_NUM ? + (uint32_t)token->num - 1 : + parent->child_count_max; + if (insert_idx >= parent->child_count_max) { + uint32_t new_size = parent->child_count_max == 0 ? + 8 : 2 * parent->child_count_max; + while (insert_idx >= new_size) + new_size *= 2; + struct json_token **children = + realloc(parent->children, new_size*sizeof(void *)); + if (children == NULL) + return -1; + memset(children + parent->child_count_max, 0, + (new_size - parent->child_count_max)*sizeof(void *)); + parent->children = children; + parent->child_count_max = new_size; + } + assert(parent->children[insert_idx] == NULL); + parent->children[insert_idx] = token; + parent->child_count = MAX(parent->child_count, insert_idx + 1); + token->sibling_idx = insert_idx; + token->hash = hash; + token->parent = parent; + if (token->type != JSON_TOKEN_STR) + goto end; + /* + * Add string token to json_tree hash to make lookup + * by name. + */ + const struct json_token **key = + (const struct json_token **)&token; + mh_int_t rc = mh_json_put(tree->hash, key, NULL, NULL); + if (rc == mh_end(tree->hash)) { + parent->children[insert_idx] = NULL; + return -1; + } +end: + assert(json_tree_lookup(tree, parent, token) == token); + return 0; +} + +void +json_tree_del(struct json_tree *tree, struct json_token *token) +{ + struct json_token *parent = token->parent; + assert(json_tree_lookup(tree, parent, token) == token); + struct json_token **child_slot = &parent->children[token->sibling_idx]; + assert(child_slot != NULL && *child_slot == token); + *child_slot = NULL; + + if (token->type != JSON_TOKEN_STR) + goto end; + /* Remove string token from json_tree hash. */ + mh_int_t id = mh_json_find(tree->hash, &token, NULL); + assert(id != mh_end(tree->hash)); + mh_json_del(tree->hash, id, NULL); + json_token_destroy(token); +end: + assert(json_tree_lookup(tree, parent, token) == NULL); +} + +struct json_token * +json_tree_lookup_path(struct json_tree *tree, struct json_token *parent, + const char *path, uint32_t path_len) +{ + int rc; + struct json_lexer lexer; + struct json_token token; + struct json_token *ret = parent != NULL ? parent : &tree->root; + json_lexer_create(&lexer, path, path_len); + while ((rc = json_lexer_next_token(&lexer, &token)) == 0 && + token.type != JSON_TOKEN_END && ret != NULL) { + ret = json_tree_lookup(tree, ret, &token); + } + if (rc != 0 || token.type != JSON_TOKEN_END) + return NULL; + return ret; +} + +static struct json_token * +json_tree_child_next(struct json_token *parent, struct json_token *pos) +{ + assert(pos == NULL || pos->parent == parent); + struct json_token **arr = parent->children; + uint32_t arr_size = parent->child_count; + if (arr == NULL) + return NULL; + uint32_t idx = pos != NULL ? pos->sibling_idx + 1 : 0; + while (idx < arr_size && arr[idx] == NULL) + idx++; + if (idx >= arr_size) + return NULL; + return arr[idx]; +} + +static struct json_token * +json_tree_leftmost(struct json_token *pos) +{ + struct json_token *last; + do { + last = pos; + pos = json_tree_child_next(pos, NULL); + } while (pos != NULL); + return last; +} + +struct json_token * +json_tree_preorder_next(struct json_token *root, struct json_token *pos) +{ + struct json_token *next = json_tree_child_next(pos, NULL); + if (next != NULL) + return next; + while (pos != root) { + next = json_tree_child_next(pos->parent, pos); + if (next != NULL) + return next; + pos = pos->parent; + } + return NULL; +} + +struct json_token * +json_tree_postorder_next(struct json_token *root, struct json_token *pos) +{ + struct json_token *next; + if (pos == NULL) + return json_tree_leftmost(root); + if (pos == root) + return NULL; + next = json_tree_child_next(pos->parent, pos); + if (next != NULL) + return json_tree_leftmost(next); + return pos->parent; +} diff --git a/src/lib/json/json.h b/src/lib/json/json.h index ead446878..948fcdb73 100644 --- a/src/lib/json/json.h +++ b/src/lib/json/json.h @@ -30,7 +30,7 @@ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ -#include +#include "trivia/util.h" #ifdef __cplusplus extern "C" { @@ -62,6 +62,10 @@ enum json_token_type { * Element of a JSON path. It can be either string or number. * String idenfiers are in ["..."] and between dots. Numbers are * indexes in [...]. + * + * May be organized in a tree-like structure reflecting a JSON + * document structure, for more details see the comment to struct + * json_tree. */ struct json_token { enum json_token_type type; @@ -75,6 +79,114 @@ struct json_token { /** Index value. */ uint64_t num; }; + /** + * Hash value of the token. Used for lookups in a JSON tree. + * For more details, see the comment to json_tree::hash_table. + */ + uint32_t hash; + /** + * Array of child records. Indexes in this array + * match array index for JSON_TOKEN_STR tokens type + * and are allocated sequentially for JSON_TOKEN_NUM child + * tokens. + */ + struct json_token **children; + /** Allocation size of children array. */ + uint32_t child_count_max; + /** + * Count of defined children array items. Equal the + * maximum index of inserted item. + */ + uint32_t child_count; + /** Index of node in parent children array. */ + uint32_t sibling_idx; + /** Pointer to parent node. */ + struct json_token *parent; +}; + +struct mh_json_t; + +/** + * This structure is used for organizing JSON tokens produced + * by a lexer in a tree-like structure reflecting a JSON document + * structure. + * + * Each intermediate node of the tree corresponds to either + * a JSON map or an array, depending on the key type used by + * its children (JSON_TOKEN_STR or JSON_TOKEN_NUM, respectively). + * Leaf nodes may represent both complex JSON structures and + * final values - it is not mandated by the JSON tree design. + * The root of the tree doesn't have a key and is preallocated + * when the tree is created. + * + * The json_token structure is intrusive by design, i.e. to store + * arbitrary information in a JSON tree, one has to incorporate it + * into a user defined structure. + * + * Example: + * + * struct data { + * ... + * struct json_token token; + * }; + * + * struct json_tree tree; + * json_tree_create(&tree); + * struct json_token *parent = &tree->root; + * + * // Add a path to the tree. + * struct data *data = data_new(); + * struct json_lexer lexer; + * json_lexer_create(&lexer, path, path_len); + * json_lexer_next_token(&lexer, &data->token); + * while (data->token.type != JSON_TOKEN_END) { + * json_tree_add(&tree, parent, &data->token); + * parent = &data->token; + * data = data_new(); + * json_lexer_next_token(&lexer, &data->token); + * } + * data_delete(data); + * + * // Look up a path in the tree. + * data = json_tree_lookup_path(&tree, &tree.root, + * path, path_len); + */ +struct json_tree { + /** + * Preallocated token corresponding to the JSON tree root. + * It doesn't have a key (set to JSON_TOKEN_END). + */ + struct json_token root; + /** + * Hash table that is used to quickly look up a token + * corresponding to a JSON map item given a key and + * a parent token. We store all tokens that have type + * JSON_TOKEN_STR in this hash table. Apparently, we + * don't need to store JSON_TOKEN_NUM tokens as we can + * quickly look them up in the children array anyway. + * + * The hash table uses pair as key, so + * even tokens that happen to have the same key will + * have different keys in the hash. To look up a tree + * node corresponding to a particular path, we split + * the path into tokens and look up the first token + * in the root node and each following token in the + * node returned at the previous step. + * + * We compute a hash value for a token by hashing its + * key using the hash value of its parent as seed. This + * is equivalent to computing hash for the path leading + * to the token. However, we don't need to recompute + * hash starting from the root at each step as we + * descend the tree looking for a specific path, and we + * can start descent from any node, not only from the root. + * + * As a consequence of this hashing technique, even + * though we don't need to store JSON_TOKEN_NUM tokens + * in the hash table, we still have to compute hash + * values for them. + */ + struct mh_json_t *hash; }; /** @@ -104,6 +216,185 @@ json_lexer_create(struct json_lexer *lexer, const char *src, int src_len) int json_lexer_next_token(struct json_lexer *lexer, struct json_token *token); +/** Create a JSON tree object to manage data relations. */ +int +json_tree_create(struct json_tree *tree); + +/** + * Destroy JSON tree object. This routine doesn't destroy attached + * subtree so it should be called at the end of manual destroy. + */ +void +json_tree_destroy(struct json_tree *tree); + +/** + * Make child lookup in JSON tree by token at position specified + * with parent. + */ +struct json_token * +json_tree_lookup_slowpath(struct json_tree *tree, struct json_token *parent, + const struct json_token *token); + +/** + * Make child lookup in JSON tree by token at position specified + * with parent without function call in the best-case. */ +static inline struct json_token * +json_tree_lookup(struct json_tree *tree, struct json_token *parent, + const struct json_token *token) +{ + struct json_token *ret = NULL; + if (token->type == JSON_TOKEN_NUM) { + uint32_t idx = token->num - 1; + ret = likely(idx < parent->child_count_max) ? + parent->children[idx] : NULL; + } else { + ret = json_tree_lookup_slowpath(tree, parent, token); + } + return ret; +} + +/** + * Append token to the given parent position in a JSON tree. The + * parent mustn't have a child with such content. + */ +int +json_tree_add(struct json_tree *tree, struct json_token *parent, + struct json_token *token); + +/** + * Delete a JSON tree token at the given parent position in JSON + * tree. Token entry shouldn't have subtree. + */ +void +json_tree_del(struct json_tree *tree, struct json_token *token); + +/** Make child lookup by path in JSON tree. */ +struct json_token * +json_tree_lookup_path(struct json_tree *tree, struct json_token *parent, + const char *path, uint32_t path_len); + +/** Make pre order traversal in JSON tree. */ +struct json_token * +json_tree_preorder_next(struct json_token *root, struct json_token *pos); + +/** Make post order traversal in JSON tree. */ +struct json_token * +json_tree_postorder_next(struct json_token *root, struct json_token *pos); + +#ifndef typeof +#define typeof __typeof__ +#endif + +/** Return container entry by json_tree_node node. */ +#define json_tree_entry(node, type, member) \ + container_of((node), type, member) +/** + * Return container entry by json_tree_node or NULL if + * node is NULL. + */ +#define json_tree_entry_safe(node, type, member) ({ \ + (node) != NULL ? json_tree_entry((node), type, member) : NULL; \ +}) + +/** Make entry pre order traversal in JSON tree */ +#define json_tree_preorder_next_entry(root, node, type, member) ({ \ + struct json_token *__next = \ + json_tree_preorder_next((root), (node)); \ + json_tree_entry_safe(__next, type, member); \ +}) + +/** + * Make entry post order traversal in JSON tree. + * This cycle doesn't visit root node. + */ +#define json_tree_postorder_next_entry(root, node, type, member) ({ \ + struct json_token *__next = \ + json_tree_postorder_next((root), (node)); \ + json_tree_entry_safe(__next, type, member); \ +}) + +/** + * Make lookup in tree by path and return entry. + * This cycle doesn't visit root node. + */ +#define json_tree_lookup_path_entry(tree, parent, path, path_len, type, \ + member) ({ \ + struct json_token *__node = \ + json_tree_lookup_path((tree), (parent), path, path_len); \ + json_tree_entry_safe(__node, type, member); \ +}) + +/** Make lookup in tree by token and return entry. */ +#define json_tree_lookup_entry(tree, parent, token, type, member) ({ \ + struct json_token *__node = \ + json_tree_lookup((tree), (parent), token); \ + json_tree_entry_safe(__node, type, member); \ +}) + +/** + * Make pre-order traversal in JSON tree. + * This cycle doesn't visit root node. + */ +#define json_tree_foreach_preorder(root, node) \ + for ((node) = json_tree_preorder_next((root), (root)); \ + (node) != NULL; \ + (node) = json_tree_preorder_next((root), (node))) + +/** + * Make post-order traversal in JSON tree. + * This cycle doesn't visit root node. + */ +#define json_tree_foreach_postorder(root, node) \ + for ((node) = json_tree_postorder_next((root), NULL); \ + (node) != (root); \ + (node) = json_tree_postorder_next((root), (node))) + +/** + * Make safe post-order traversal in JSON tree. + * May be used for destructors. + * This cycle doesn't visit root node. + */ +#define json_tree_foreach_safe(root, node, tmp) \ + for ((node) = json_tree_postorder_next((root), NULL); \ + (node) != (root) && \ + ((tmp) = json_tree_postorder_next((root), (node))); \ + (node) = (tmp)) + +/** + * Make post-order traversal in JSON tree and return entry. + * This cycle doesn't visit root node. + */ +#define json_tree_foreach_entry_preorder(root, node, type, member) \ + for ((node) = json_tree_preorder_next_entry((root), (root), \ + type, member); \ + (node) != NULL; \ + (node) = json_tree_preorder_next_entry((root), &(node)->member, \ + type, member)) + +/** + * Make pre-order traversal in JSON tree and return entry. + * This cycle doesn't visit root node. + */ +#define json_tree_foreach_entry_postorder(root, node, type, member) \ + for ((node) = json_tree_postorder_next_entry((root), NULL, type, \ + member); \ + &(node)->member != (root); \ + (node) = json_tree_postorder_next_entry((root), &(node)->member,\ + type, member)) + +/** + * Make secure post-order traversal in JSON tree and return entry. + * This cycle doesn't visit root node. + */ +#define json_tree_foreach_entry_safe(root, node, type, member, tmp) \ + for ((node) = json_tree_postorder_next_entry((root), NULL, \ + type, member); \ + &(node)->member != (root) && \ + ((tmp) = json_tree_postorder_next_entry((root), \ + &(node)->member, \ + type, member)); \ + (node) = (tmp)) + #ifdef __cplusplus } #endif diff --git a/test/unit/json_path.c b/test/unit/json_path.c index a5f90ad98..1e1011237 100644 --- a/test/unit/json_path.c +++ b/test/unit/json_path.c @@ -2,6 +2,7 @@ #include "unit.h" #include "trivia/util.h" #include +#include #define reset_to_new_path(value) \ path = value; \ @@ -159,14 +160,222 @@ test_errors() footer(); } +struct test_struct { + int value; + struct json_token node; +}; + +struct test_struct * +test_struct_alloc(struct test_struct *records_pool, int *pool_idx) +{ + struct test_struct *ret = &records_pool[*pool_idx]; + *pool_idx = *pool_idx + 1; + memset(&ret->node, 0, sizeof(ret->node)); + return ret; +} + +struct test_struct * +test_add_path(struct json_tree *tree, const char *path, uint32_t path_len, + struct test_struct *records_pool, int *pool_idx) +{ + int rc; + struct json_lexer lexer; + struct json_token *parent = &tree->root; + json_lexer_create(&lexer, path, path_len); + struct test_struct *field = test_struct_alloc(records_pool, pool_idx); + while ((rc = json_lexer_next_token(&lexer, &field->node)) == 0 && + field->node.type != JSON_TOKEN_END) { + struct json_token *next = + json_tree_lookup(tree, parent, &field->node); + if (next == NULL) { + rc = json_tree_add(tree, parent, &field->node); + fail_if(rc != 0); + next = &field->node; + field = test_struct_alloc(records_pool, pool_idx); + } + parent = next; + } + fail_if(rc != 0 || field->node.type != JSON_TOKEN_END); + *pool_idx = *pool_idx - 1; + /* release field */ + return json_tree_entry(parent, struct test_struct, node); +} + +void +test_tree() +{ + header(); + plan(50); + + struct json_tree tree; + int rc = json_tree_create(&tree); + fail_if(rc != 0); + + struct test_struct records[7]; + for (int i = 0; i < 6; i++) + records[i].value = i; + + const char *path1 = "[1][10]"; + const char *path2 = "[1][20].file"; + const char *path3 = "[1][20].file[2]"; + const char *path4 = "[1][20].file[4]"; + const char *path4_copy = "[1][20][\"file\"][4]"; + const char *path_unregistered = "[1][3]"; + + int records_idx = 0; + struct test_struct *node, *node_tmp; + node = test_add_path(&tree, path1, strlen(path1), records, + &records_idx); + is(node, &records[1], "add path '%s'", path1); + + node = test_add_path(&tree, path2, strlen(path2), records, + &records_idx); + is(node, &records[3], "add path '%s'", path2); + + node = test_add_path(&tree, path3, strlen(path3), records, + &records_idx); + is(node, &records[4], "add path '%s'", path3); + + node = test_add_path(&tree, path4, strlen(path4), records, + &records_idx); + is(node, &records[5], "add path '%s'", path4); + + node = test_add_path(&tree, path4_copy, strlen(path4_copy), records, + &records_idx); + is(node, &records[5], "add path '%s'", path4_copy); + + node = json_tree_lookup_path_entry(&tree, NULL, path1, strlen(path1), + struct test_struct, node); + is(node, &records[1], "lookup path '%s'", path1); + + node = json_tree_lookup_path_entry(&tree, NULL, path2, strlen(path2), + struct test_struct, node); + is(node, &records[3], "lookup path '%s'", path2); + + node = json_tree_lookup_path_entry(&tree, NULL, path_unregistered, + strlen(path_unregistered), + struct test_struct, node); + is(node, NULL, "lookup unregistered path '%s'", path_unregistered); + + /* Test iterators. */ + struct json_token *token = NULL, *tmp; + const struct json_token *tokens_preorder[] = + {&records[0].node, &records[1].node, &records[2].node, + &records[3].node, &records[4].node, &records[5].node}; + int cnt = sizeof(tokens_preorder)/sizeof(tokens_preorder[0]); + int idx = 0; + + json_tree_foreach_preorder(&tree.root, token) { + if (idx >= cnt) + break; + struct test_struct *t1 = + json_tree_entry(token, struct test_struct, node); + struct test_struct *t2 = + json_tree_entry(tokens_preorder[idx], + struct test_struct, node); + is(token, tokens_preorder[idx], + "test foreach pre order %d: have %d expected of %d", + idx, t1->value, t2->value); + ++idx; + } + is(idx, cnt, "records iterated count %d of %d", idx, cnt); + + const struct json_token *tree_nodes_postorder[] = + {&records[1].node, &records[4].node, &records[5].node, + &records[3].node, &records[2].node, &records[0].node}; + cnt = sizeof(tree_nodes_postorder)/sizeof(tree_nodes_postorder[0]); + idx = 0; + json_tree_foreach_postorder(&tree.root, token) { + if (idx >= cnt) + break; + struct test_struct *t1 = + json_tree_entry(token, struct test_struct, node); + struct test_struct *t2 = + json_tree_entry(tree_nodes_postorder[idx], + struct test_struct, node); + is(token, tree_nodes_postorder[idx], + "test foreach post order %d: have %d expected of %d", + idx, t1->value, t2->value); + ++idx; + } + is(idx, cnt, "records iterated count %d of %d", idx, cnt); + + idx = 0; + json_tree_foreach_safe(&tree.root, token, tmp) { + if (idx >= cnt) + break; + struct test_struct *t1 = + json_tree_entry(token, struct test_struct, node); + struct test_struct *t2 = + json_tree_entry(tree_nodes_postorder[idx], + struct test_struct, node); + is(token, tree_nodes_postorder[idx], + "test foreach safe order %d: have %d expected of %d", + idx, t1->value, t2->value); + ++idx; + } + is(idx, cnt, "records iterated count %d of %d", idx, cnt); + + idx = 0; + json_tree_foreach_entry_preorder(&tree.root, node, struct test_struct, + node) { + if (idx >= cnt) + break; + struct test_struct *t = + json_tree_entry(tokens_preorder[idx], + struct test_struct, node); + is(&node->node, tokens_preorder[idx], + "test foreach entry pre order %d: have %d expected of %d", + idx, node->value, t->value); + idx++; + } + is(idx, cnt, "records iterated count %d of %d", idx, cnt); + + idx = 0; + json_tree_foreach_entry_postorder(&tree.root, node, struct test_struct, + node) { + if (idx >= cnt) + break; + struct test_struct *t = + json_tree_entry(tree_nodes_postorder[idx], + struct test_struct, node); + is(&node->node, tree_nodes_postorder[idx], + "test foreach entry post order %d: have %d expected of %d", + idx, node->value, t->value); + idx++; + } + is(idx, cnt, "records iterated count %d of %d", idx, cnt); + + idx = 0; + json_tree_foreach_entry_safe(&tree.root, node, struct test_struct, + node, node_tmp) { + if (idx >= cnt) + break; + struct test_struct *t = + json_tree_entry(tree_nodes_postorder[idx], + struct test_struct, node); + is(&node->node, tree_nodes_postorder[idx], + "test foreach entry safe order %d: have %d expected of %d", + idx, node->value, t->value); + json_tree_del(&tree, &node->node); + idx++; + } + is(idx, cnt, "records iterated count %d of %d", idx, cnt); + json_tree_destroy(&tree); + + check_plan(); + footer(); +} + int main() { header(); - plan(2); + plan(3); test_basic(); test_errors(); + test_tree(); int rc = check_plan(); footer(); diff --git a/test/unit/json_path.result b/test/unit/json_path.result index a2a2f829f..ea5fbd317 100644 --- a/test/unit/json_path.result +++ b/test/unit/json_path.result @@ -1,5 +1,5 @@ *** main *** -1..2 +1..3 *** test_basic *** 1..71 ok 1 - parse <[0]> @@ -99,4 +99,58 @@ ok 1 - subtests ok 20 - tab inside identifier ok 2 - subtests *** test_errors: done *** + *** test_tree *** + 1..50 + ok 1 - add path '[1][10]' + ok 2 - add path '[1][20].file' + ok 3 - add path '[1][20].file[2]' + ok 4 - add path '[1][20].file[4]' + ok 5 - add path '[1][20]["file"][4]' + ok 6 - lookup path '[1][10]' + ok 7 - lookup path '[1][20].file' + ok 8 - lookup unregistered path '[1][3]' + ok 9 - test foreach pre order 0: have 0 expected of 0 + ok 10 - test foreach pre order 1: have 1 expected of 1 + ok 11 - test foreach pre order 2: have 2 expected of 2 + ok 12 - test foreach pre order 3: have 3 expected of 3 + ok 13 - test foreach pre order 4: have 4 expected of 4 + ok 14 - test foreach pre order 5: have 5 expected of 5 + ok 15 - records iterated count 6 of 6 + ok 16 - test foreach post order 0: have 1 expected of 1 + ok 17 - test foreach post order 1: have 4 expected of 4 + ok 18 - test foreach post order 2: have 5 expected of 5 + ok 19 - test foreach post order 3: have 3 expected of 3 + ok 20 - test foreach post order 4: have 2 expected of 2 + ok 21 - test foreach post order 5: have 0 expected of 0 + ok 22 - records iterated count 6 of 6 + ok 23 - test foreach safe order 0: have 1 expected of 1 + ok 24 - test foreach safe order 1: have 4 expected of 4 + ok 25 - test foreach safe order 2: have 5 expected of 5 + ok 26 - test foreach safe order 3: have 3 expected of 3 + ok 27 - test foreach safe order 4: have 2 expected of 2 + ok 28 - test foreach safe order 5: have 0 expected of 0 + ok 29 - records iterated count 6 of 6 + ok 30 - test foreach entry pre order 0: have 0 expected of 0 + ok 31 - test foreach entry pre order 1: have 1 expected of 1 + ok 32 - test foreach entry pre order 2: have 2 expected of 2 + ok 33 - test foreach entry pre order 3: have 3 expected of 3 + ok 34 - test foreach entry pre order 4: have 4 expected of 4 + ok 35 - test foreach entry pre order 5: have 5 expected of 5 + ok 36 - records iterated count 6 of 6 + ok 37 - test foreach entry post order 0: have 1 expected of 1 + ok 38 - test foreach entry post order 1: have 4 expected of 4 + ok 39 - test foreach entry post order 2: have 5 expected of 5 + ok 40 - test foreach entry post order 3: have 3 expected of 3 + ok 41 - test foreach entry post order 4: have 2 expected of 2 + ok 42 - test foreach entry post order 5: have 0 expected of 0 + ok 43 - records iterated count 6 of 6 + ok 44 - test foreach entry safe order 0: have 1 expected of 1 + ok 45 - test foreach entry safe order 1: have 4 expected of 4 + ok 46 - test foreach entry safe order 2: have 5 expected of 5 + ok 47 - test foreach entry safe order 3: have 3 expected of 3 + ok 48 - test foreach entry safe order 4: have 2 expected of 2 + ok 49 - test foreach entry safe order 5: have 0 expected of 0 + ok 50 - records iterated count 6 of 6 +ok 3 - subtests + *** test_tree: done *** *** main: done *** -- 2.19.2