[tarantool-patches] [PATCH v1 1/1] lib: implement JSON tree class for json library

Konstantin Osipov kostja at tarantool.org
Fri Oct 5 01:54:29 MSK 2018


* Kirill Shcherbatov <kshcherbatov at tarantool.org> [18/10/01 18:05]:
> +/** Hash record representing json_tree_node. */
> +struct mh_json_tree_node {
> +	struct json_path_node key;

A trivial question: does this data structure represent an entire
json path or a fragment of a json path? 
> +	uint32_t key_hash;
> +	struct json_tree_node *node;
> +};

Same here. Are we hashing an entire json path or a fragment of the
path? 

I of course could look up in the implementation. My point is that
the comment could be much more informative with just a few words
related to substance, and not formal repetition of what is already
evident. This review remark is relevant about the rest of the
comments. You did not think about the reader, all the comments are
purely formal.

> +
> +/**
> + * Compare json_tree_node hash bindings @a and @b.
> + * @param a Hash record to compare.
> + * @param b Hash record to compare.
> + * @retval 0 On equal, not 0 otherwise.
> + */
> +bool
> +mh_json_tree_node_cmp(const struct mh_json_tree_node *a,
> +		      const struct mh_json_tree_node *b)
> +{
> +	return (a->key.type != b->key.type) ||
> +		((a->key.type != JSON_PATH_STR ||
> +		a->key.len != b->key.len ||
> +		memcmp(a->key.str, b->key.str,
> +		       a->key.len) != 0) &&
> +		(a->key.type != JSON_PATH_NUM ||
> +		a->key_hash != b->key_hash));
> +}

> +
> +/** Define tree record hashtable. */
> +#define MH_SOURCE 1
> +#define mh_name _json_tree_node
> +#define mh_key_t struct mh_json_tree_node *
> +#define mh_node_t struct mh_json_tree_node
> +#define mh_arg_t void *
> +#define mh_hash(a, arg) ((a)->key_hash)
> +#define mh_hash_key(a, arg) ((a)->key_hash)
> +#define mh_cmp(a, b, arg) (mh_json_tree_node_cmp((a), (b)))
> +#define mh_cmp_key(a, b, arg) mh_cmp(a, b, arg)
> +#include "salad/mhash.h"
> +
> +static const uint32_t json_path_node_hash_seed = 13U;
> +
> +uint32_t
> +json_path_node_hash(struct json_path_node *key)
> +{
> +	uint32_t h = json_path_node_hash_seed;
> +	uint32_t carry = 0;
> +	uint8_t *data;
> +	uint32_t data_size;
> +	if (key->type == JSON_PATH_STR) {
> +		data = (uint8_t *)key->str;
> +		data_size = key->len;
> +	} else if (key->type == JSON_PATH_NUM) {
> +		data = (uint8_t *)&key->num;
> +		data_size = sizeof(key->num);
> +	} else {
> +		assert(0);
> +	}
> +	PMurHash32_Process(&h, &carry, data, data_size);
> +	return PMurHash32_Result(h, carry, data_size);

Why did you choose murmur and not luajit hash?

> +	memset(node, 0, sizeof(struct json_tree_node));
> +	node->children_ht = mh_json_tree_node_new();
> +	if (node->children_ht == NULL)
> +		return -1;
> +	if (path_node != NULL) {
> +		memcpy(&node->key, path_node, sizeof(struct json_path_node));
> +		node->key_hash = path_node_hash;
> +	} else {
> +		node->key.type = JSON_PATH_END;
> +	}
> +	rlist_create(&node->siblings);
> +	rlist_create(&node->children);
> +	return 0;

There should be a comment describing what API this data structure
provides and how it works. Why did you choose to create a nested
hash in every node?  What is the complexity of the json path
lookup? There should be a fastpath for exact match which only
evaluates a path hash once.

> +struct json_tree_node *
> +json_tree_lookup_path(struct json_tree_node *root, const char *path,
> +		      uint32_t path_len)
> +{
> +	int rc;
> +	struct json_path_parser parser;
> +	struct json_path_node path_node;
> +	json_path_parser_create(&parser, path, path_len);
> +	while ((rc = json_path_next(&parser, &path_node)) == 0 &&
> +	       path_node.type != JSON_PATH_END && root != NULL) {
> +		uint32_t path_node_hash = json_path_node_hash(&path_node);
> +		struct json_tree_node *node =
> +			json_tree_lookup(root, &path_node, path_node_hash);
> +		assert(node == NULL || node->parent == root);
> +		root = node;
> +	}
> +	if (rc != 0 || path_node.type != JSON_PATH_END)
> +		return NULL;
> +	return root;
> +}

> +/**
> + * Destroy JSON tree node.
> + * @param node JSON tree node to destruct.
> + * @retval 0 On success, -1 on error.
> + */

> +/**
> + * Lookup record with data @path_node having hash @path_node_hash
> + * in JSON tree @node childs.
> + * @param node JSON tree root node.
> + * @param path_node JSON parser node with data.
> + * @param path_node_hash Hash of @path_node.
> + * @retval not NULL pointer to JSON tree node if any.
> + */

I literally hate your comments. 

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov



More information about the Tarantool-patches mailing list