From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Date: Tue, 4 Dec 2018 18:22:18 +0300 From: Vladimir Davydov Subject: Re: [tarantool-patches] [PATCH v5 2/9] lib: implement JSON tree class for json library Message-ID: <20181204152217.fuaw2bk5x7a5dlk2@esperanza> References: <02671a3d0a2236ecd6e12c0bc51b7f5e39272a2f.1543229303.git.kshcherbatov@tarantool.org> <20181129173816.kprfjhki5o7ytfbl@esperanza> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20181129173816.kprfjhki5o7ytfbl@esperanza> To: Kirill Shcherbatov Cc: tarantool-patches@freelists.org, Kostya Osipov List-ID: On Thu, Nov 29, 2018 at 08:38:16PM +0300, Vladimir Davydov wrote: > > +/** JSON tree object to manage tokens relations. */ > > +struct json_tree { > > + /** JSON tree root node. */ > > + struct json_token root; > > + /** Hashtable af all tree nodes. */ > > + struct mh_json_t *hash; > > How does it work? In a year nobody will remember, and to understand how > this thing operates we will have to dive deep into the code... /** * Element of a JSON path. It can be either string or number. * String idenfiers are in ["..."] and between dots. Numbers are * indexes in [...]. * * May be organized in a tree-like structure reflecting a JSON * document structure, for more details see the comment to struct * json_tree. */ struct json_token { ... }; /** * This structure is used for organizing JSON tokens produced * by a lexer in a tree-like structure reflecting a JSON document * structure. * * Each intermediate node of the tree corresponds to either * a JSON map or an array, depending on the key type used by * its children (JSON_TOKEN_STR or JSON_TOKEN_NUM, respectively). * Leaf nodes may represent both complex JSON structures and * final values - it is not mandated by the JSON tree design. * The root of the tree doesn't have a key and is preallocated * when the tree is created. * * The json_token structure is intrusive by design, i.e. to store * arbitrary information in a JSON tree, one has to incorporate it * into a user defined structure. * * Example: * * struct data { * ... * struct json_token token; * }; * * struct json_tree tree; * json_tree_create(&tree); * struct json_token *parent = &tree->root; * * // Add a path to the tree. * struct data *data = data_new(); * struct json_lexer lexer; * json_lexer_create(&lexer, path, path_len); * json_lexer_next_token(&lexer, &data->token); * while (data->token.type != JSON_TOKEN_END) { * json_tree_add(&tree, parent, &data->token); * parent = &data->token; * data = data_new(); * json_lexer_next_token(&lexer, &data->token); * } * data_delete(data); * * // Look up a path in the tree. * data = json_tree_lookup_path(&tree, &tree.root, path, path_len); */ struct json_tree { /** * Preallocated token corresponding to the JSON tree root. * It doesn't have a key (set to JSON_TOKEN_END). */ struct json_token root; /** * Hash table that is used to quickly look up a token * corresponding to a JSON map item given a key and * a parent token. We store all tokens that have type * JSON_TOKEN_STR in this hash table. Apparently, we * don't need to store JSON_TOKEN_NUM tokens as we can * quickly look them up in the children array anyway. * * The hash table uses pair as key, so * even tokens that happen to have the same key will * have different keys in the hash. To look up a tree * node corresponding to a particular path, we split * the path into tokens and look up the first token * in the root node and each following token in the * node returned at the previous step. * * We compute a hash value for a token by hashing its * key using the hash value of its parent as seed. This * is equivalent to computing hash for the path leading * to the token. However, we don't need to recompute * hash starting from the root at each step as we * descend the tree looking for a specific path, and we * can start descent from any node, not only from the root. * * As a consequence of this hashing technique, even * though we don't need to store JSON_TOKEN_NUM tokens * in the hash table, we still have to compute hash * values for them. */ struct mh_json_t *hash; };