Tarantool development patches archive
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov.dev@gmail.com>
To: Kirill Shcherbatov <kshcherbatov@tarantool.org>
Cc: tarantool-patches@freelists.org, Kostya Osipov <kostja@tarantool.org>
Subject: Re: [tarantool-patches] [PATCH v5 2/9] lib: implement JSON tree class for json library
Date: Tue, 4 Dec 2018 18:22:18 +0300	[thread overview]
Message-ID: <20181204152217.fuaw2bk5x7a5dlk2@esperanza> (raw)
In-Reply-To: <20181129173816.kprfjhki5o7ytfbl@esperanza>

On Thu, Nov 29, 2018 at 08:38:16PM +0300, Vladimir Davydov wrote:
> > +/** JSON tree object to manage tokens relations. */
> > +struct json_tree {
> > +	/** JSON tree root node. */
> > +	struct json_token root;
> > +	/** Hashtable af all tree nodes. */
> > +	struct mh_json_t *hash;
> 
> How does it work? In a year nobody will remember, and to understand how
> this thing operates we will have to dive deep into the code...


/**
 * Element of a JSON path. It can be either string or number.
 * String idenfiers are in ["..."] and between dots. Numbers are
 * indexes in [...].
 *
 * May be organized in a tree-like structure reflecting a JSON
 * document structure, for more details see the comment to struct
 * json_tree.
 */
struct json_token {
	...
};

/**
 * This structure is used for organizing JSON tokens produced
 * by a lexer in a tree-like structure reflecting a JSON document
 * structure.
 *
 * Each intermediate node of the tree corresponds to either
 * a JSON map or an array, depending on the key type used by
 * its children (JSON_TOKEN_STR or JSON_TOKEN_NUM, respectively).
 * Leaf nodes may represent both complex JSON structures and
 * final values - it is not mandated by the JSON tree design.
 * The root of the tree doesn't have a key and is preallocated
 * when the tree is created.
 *
 * The json_token structure is intrusive by design, i.e. to store
 * arbitrary information in a JSON tree, one has to incorporate it
 * into a user defined structure.
 *
 * Example:
 *
 *   struct data {
 *           ...
 *           struct json_token token;
 *   };
 *
 *   struct json_tree tree;
 *   json_tree_create(&tree);
 *   struct json_token *parent = &tree->root;
 *
 *   // Add a path to the tree.
 *   struct data *data = data_new();
 *   struct json_lexer lexer;
 *   json_lexer_create(&lexer, path, path_len);
 *   json_lexer_next_token(&lexer, &data->token);
 *   while (data->token.type != JSON_TOKEN_END) {
 *           json_tree_add(&tree, parent, &data->token);
 *           parent = &data->token;
 *           data = data_new();
 *           json_lexer_next_token(&lexer, &data->token);
 *   }
 *   data_delete(data);
 *
 *   // Look up a path in the tree.
 *   data = json_tree_lookup_path(&tree, &tree.root, path, path_len);
 */
struct json_tree {
	/**
	 * Preallocated token corresponding to the JSON tree root.
	 * It doesn't have a key (set to JSON_TOKEN_END).
	 */
	struct json_token root;
	/**
	 * Hash table that is used to quickly look up a token
	 * corresponding to a JSON map item given a key and
	 * a parent token. We store all tokens that have type
	 * JSON_TOKEN_STR in this hash table. Apparently, we
	 * don't need to store JSON_TOKEN_NUM tokens as we can
	 * quickly look them up in the children array anyway.
	 *
	 * The hash table uses pair <parent, key> as key, so
	 * even tokens that happen to have the same key will
	 * have different keys in the hash. To look up a tree
	 * node corresponding to a particular path, we split
	 * the path into tokens and look up the first token
	 * in the root node and each following token in the
	 * node returned at the previous step.
	 *
	 * We compute a hash value for a token by hashing its
	 * key using the hash value of its parent as seed. This
	 * is equivalent to computing hash for the path leading
	 * to the token. However, we don't need to recompute
	 * hash starting from the root at each step as we
	 * descend the tree looking for a specific path, and we
	 * can start descent from any node, not only from the root.
	 *
	 * As a consequence of this hashing technique, even
	 * though we don't need to store JSON_TOKEN_NUM tokens
	 * in the hash table, we still have to compute hash
	 * values for them.
	 */
	struct mh_json_t *hash;
};

  parent reply	other threads:[~2018-12-04 15:22 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-26 10:49 [PATCH v5 0/9] box: indexes by JSON path Kirill Shcherbatov
2018-11-26 10:49 ` [PATCH v5 1/9] box: refactor json_path_parser class Kirill Shcherbatov
2018-11-26 12:53   ` [tarantool-patches] " Kirill Shcherbatov
2018-11-29 15:39     ` Vladimir Davydov
2018-11-26 10:49 ` [PATCH v5 2/9] lib: implement JSON tree class for json library Kirill Shcherbatov
2018-11-26 12:53   ` [tarantool-patches] " Kirill Shcherbatov
2018-11-29 17:38     ` Vladimir Davydov
2018-11-29 17:50       ` Vladimir Davydov
2018-12-04 15:22       ` Vladimir Davydov [this message]
2018-12-04 15:47       ` [tarantool-patches] " Kirill Shcherbatov
2018-12-04 17:54         ` Vladimir Davydov
2018-12-05  8:37           ` Kirill Shcherbatov
2018-12-05  9:07             ` Vladimir Davydov
2018-12-05  9:52               ` Vladimir Davydov
2018-12-06  7:56                 ` Kirill Shcherbatov
2018-12-06  7:56                 ` [tarantool-patches] Re: [PATCH v5 2/9] lib: make index_base support for json_lexer Kirill Shcherbatov
2018-11-26 10:49 ` [PATCH v5 3/9] box: manage format fields with JSON tree class Kirill Shcherbatov
2018-11-29 19:07   ` Vladimir Davydov
2018-12-04 15:47     ` [tarantool-patches] " Kirill Shcherbatov
2018-12-04 16:09       ` Vladimir Davydov
2018-12-04 16:32         ` Kirill Shcherbatov
2018-12-05  8:37         ` Kirill Shcherbatov
2018-12-06  7:56         ` Kirill Shcherbatov
2018-12-06  8:06           ` Vladimir Davydov
2018-11-26 10:49 ` [PATCH v5 4/9] lib: introduce json_path_cmp routine Kirill Shcherbatov
2018-11-30 10:46   ` Vladimir Davydov
2018-12-03 17:37     ` [tarantool-patches] " Konstantin Osipov
2018-12-03 18:48       ` Vladimir Davydov
2018-12-03 20:14         ` Konstantin Osipov
2018-12-06  7:56           ` [tarantool-patches] Re: [PATCH v5 4/9] lib: introduce json_path_cmp, json_path_validate Kirill Shcherbatov
2018-11-26 10:49 ` [tarantool-patches] [PATCH v5 5/9] box: introduce JSON indexes Kirill Shcherbatov
2018-11-30 21:28   ` Vladimir Davydov
2018-12-01 16:49     ` Vladimir Davydov
2018-11-26 10:49 ` [PATCH v5 6/9] box: introduce has_json_paths flag in templates Kirill Shcherbatov
2018-11-26 10:49 ` [PATCH v5 7/9] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov
2018-12-01 17:20   ` Vladimir Davydov
2018-11-26 10:49 ` [PATCH v5 8/9] box: introduce offset slot cache in key_part Kirill Shcherbatov
2018-12-03 21:04   ` Vladimir Davydov
2018-12-04 15:51     ` Vladimir Davydov
2018-11-26 10:49 ` [PATCH v5 9/9] box: specify indexes in user-friendly form Kirill Shcherbatov
2018-12-04 12:22   ` Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181204152217.fuaw2bk5x7a5dlk2@esperanza \
    --to=vdavydov.dev@gmail.com \
    --cc=kostja@tarantool.org \
    --cc=kshcherbatov@tarantool.org \
    --cc=tarantool-patches@freelists.org \
    --subject='Re: [tarantool-patches] [PATCH v5 2/9] lib: implement JSON tree class for json library' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox