[tarantool-patches] [PATCH v5 2/9] lib: implement JSON tree class for json library

Vladimir Davydov vdavydov.dev at gmail.com
Tue Dec 4 18:22:18 MSK 2018


On Thu, Nov 29, 2018 at 08:38:16PM +0300, Vladimir Davydov wrote:
> > +/** JSON tree object to manage tokens relations. */
> > +struct json_tree {
> > +	/** JSON tree root node. */
> > +	struct json_token root;
> > +	/** Hashtable af all tree nodes. */
> > +	struct mh_json_t *hash;
> 
> How does it work? In a year nobody will remember, and to understand how
> this thing operates we will have to dive deep into the code...


/**
 * Element of a JSON path. It can be either string or number.
 * String idenfiers are in ["..."] and between dots. Numbers are
 * indexes in [...].
 *
 * May be organized in a tree-like structure reflecting a JSON
 * document structure, for more details see the comment to struct
 * json_tree.
 */
struct json_token {
	...
};

/**
 * This structure is used for organizing JSON tokens produced
 * by a lexer in a tree-like structure reflecting a JSON document
 * structure.
 *
 * Each intermediate node of the tree corresponds to either
 * a JSON map or an array, depending on the key type used by
 * its children (JSON_TOKEN_STR or JSON_TOKEN_NUM, respectively).
 * Leaf nodes may represent both complex JSON structures and
 * final values - it is not mandated by the JSON tree design.
 * The root of the tree doesn't have a key and is preallocated
 * when the tree is created.
 *
 * The json_token structure is intrusive by design, i.e. to store
 * arbitrary information in a JSON tree, one has to incorporate it
 * into a user defined structure.
 *
 * Example:
 *
 *   struct data {
 *           ...
 *           struct json_token token;
 *   };
 *
 *   struct json_tree tree;
 *   json_tree_create(&tree);
 *   struct json_token *parent = &tree->root;
 *
 *   // Add a path to the tree.
 *   struct data *data = data_new();
 *   struct json_lexer lexer;
 *   json_lexer_create(&lexer, path, path_len);
 *   json_lexer_next_token(&lexer, &data->token);
 *   while (data->token.type != JSON_TOKEN_END) {
 *           json_tree_add(&tree, parent, &data->token);
 *           parent = &data->token;
 *           data = data_new();
 *           json_lexer_next_token(&lexer, &data->token);
 *   }
 *   data_delete(data);
 *
 *   // Look up a path in the tree.
 *   data = json_tree_lookup_path(&tree, &tree.root, path, path_len);
 */
struct json_tree {
	/**
	 * Preallocated token corresponding to the JSON tree root.
	 * It doesn't have a key (set to JSON_TOKEN_END).
	 */
	struct json_token root;
	/**
	 * Hash table that is used to quickly look up a token
	 * corresponding to a JSON map item given a key and
	 * a parent token. We store all tokens that have type
	 * JSON_TOKEN_STR in this hash table. Apparently, we
	 * don't need to store JSON_TOKEN_NUM tokens as we can
	 * quickly look them up in the children array anyway.
	 *
	 * The hash table uses pair <parent, key> as key, so
	 * even tokens that happen to have the same key will
	 * have different keys in the hash. To look up a tree
	 * node corresponding to a particular path, we split
	 * the path into tokens and look up the first token
	 * in the root node and each following token in the
	 * node returned at the previous step.
	 *
	 * We compute a hash value for a token by hashing its
	 * key using the hash value of its parent as seed. This
	 * is equivalent to computing hash for the path leading
	 * to the token. However, we don't need to recompute
	 * hash starting from the root at each step as we
	 * descend the tree looking for a specific path, and we
	 * can start descent from any node, not only from the root.
	 *
	 * As a consequence of this hashing technique, even
	 * though we don't need to store JSON_TOKEN_NUM tokens
	 * in the hash table, we still have to compute hash
	 * values for them.
	 */
	struct mh_json_t *hash;
};



More information about the Tarantool-patches mailing list