Re: [Tarantool-patches] [PATCH 3/3] tuple: JSON path update intersection at maps

Tarantool development patches archive
 help / color / mirror / Atom feed

From: Sergey Ostanevich <sergos@tarantool.org>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org
Subject: Re: [Tarantool-patches] [PATCH 3/3] tuple: JSON path update intersection at maps
Date: Fri, 27 Dec 2019 17:57:04 +0300	[thread overview]
Message-ID: <20191227145704.GB1222@tarantool.org> (raw)
In-Reply-To: <248f865c70c413e5a239a2e7bae6240e1e9b93d0.1577140688.git.v.shpilevoy@tarantool.org>

Hi!

Thanks for the patch!
Just the spaces after unaries - otherwise LGTM.

If I got it right, the limitation from the #2 patch:

|    This is not allowed yet:
|    
|        [1][2][3].a.b.c = 20
|        [1][2][3].a.e.f = 30
|    
|        First difference is 'b' vs 'e' - intersection by a map,
|        not ok.

is removed. Could you mention this in the description?

Regards,
Sergos

On 23 Dec 23:41, Vladislav Shpilevoy wrote:
> Previous commits introduced isolated JSON updates, and then
> allowed intersection at array. This one completes the puzzle,
> adding intersection at maps. Now JSON updates are fully
> available.
> 
> Closes #1261
> 
> @TarantoolBot document
> Title: JSON updates
> Tuple/space/index:update/upsert now support JSON paths. All the
> same paths as allowed in tuple["..."].
> 
> Example:
> box.cfg{}
> format = {}
> format[1] = {'field1', 'unsigned'}
> format[2] = {'field2', 'map'}
> format[3] = {'field3', 'array'}
> format[4] = {'field4', 'string', is_nullable = true}
> s = box.schema.create_space('test', {format = format})
> _ = s:create_index('pk')
> t = {
>     1,
>     {
>         key1 = 'value',
>         key2 = 10
>     },
>     {
>         2,
>         3,
>         {key3 = 20}
>     }
> }
> t = s:replace(t)
> 
> tarantool> t:update({{'=', 'field2.key1', 'new_value'}})
> ---
> - [1, {'key1': 'new_value', 'key2': 10}, [2, 3, {'key3': 20}]]
> ...
> 
> tarantool> t:update({{'+', 'field3[2]', 1}})
> ---
> - [1, {'key1': 'value', 'key2': 10}, [2, 4, {'key3': 20}]]
> ...
> 
> tarantool> s:update({1}, {{'!', 'field4', 'inserted value'}})
> ---
> - [1, {'key1': 'value', 'key2': 10}, [2, 3, {'key3': 20}], 'inserted value']
> ...
> 
> tarantool> s:update({1}, {{'#', '[2].key2', 1}, {'=', '[3][3].key4', 'value4'}})
> ---
> - [1, {'key1': 'value'}, [2, 3, {'key3': 20, 'key4': 'value4'}], 'inserted value']
> ...
> 
> tarantool> s:upsert({1, {k = 'v'}, {}}, {{'#', '[2].key1', 1}})
> ---
> ...
> 
> tarantool> s:select{}
> ---
> - - [1, {}, [2, 3, {'key3': 20, 'key4': 'value4'}], 'inserted value']
> ...
> 
> Note, that there is the same rule, as in tuple field access by
> JSON, for field names looking like JSON paths. The rule is that
> firstly the whole path is interpreted as a field name. If such a
> name does not exist, then it is treated as a path. For example,
> if there is a field name 'field.name.like.json', then this update
> 
>     <obj>:update({..., 'field.name.like.json', ...})
> 
> will update this field, instead of keys 'field' -> 'name' ->
> 'like' -> 'json'. If such a name is needed as a part of a bigger
> path, then it should be wrapped in quotes and []:
> 
>     <obj>:update({..., '["field.name.like.json"].next.fields', ...})
> 
> There are some new rules for JSON updates:
> 
> - Operation '!' can't be used to create all intermediate nodes of
>   a path. For example, {'!', 'field1[1].field3', ...} can't
>   create fields 'field1' and '[1]', they should exist.
> 
> - Operation '#', when applied to maps, can't delete more than one
>   key at once. That is, its argument should be always 1 for maps.
> 
>     {'#', 'field1.field2', 1} - this is allowed;
>     {'#', 'field1.field2', 10} - this is not.
> 
>   That limitation originates from a problem, that keys in a map
>   are not ordered anyhow, and '#' with more than 1 key would lead
>   to undefined behaviour.
> 
> - Operation '!' on maps can't create a key, if it exists already.
> 
> - If a map contains non-string keys (booleans, numbers, maps,
>   arrays - anything), then these keys can't be updated via JSON
>   paths. But it is still allowed to update string keys in such a
>   map.
> 
> Why JSON updates are good, and should be preferred when only a
> part of a tuple needs to be updated?
> 
> - They consume less space in WAL, because for an update only its
>   keys, operations, and arguments are stored. It is cheaper to
>   store update of one deep field, than the whole tuple.
> 
> - They are faster. Firstly, this is because they are implemented
>   in C, and have no problems with Lua GC and dynamic typing.
>   Secondly, some cases of JSON paths are highly optimized. For
>   example, an update with a single JSON path costs O(1) memory
>   regardless of how deep that path goes (not counting update
>   arguments).
> 
> - They are available from remote clients, as well as any other
>   DML. Before JSON updates to update one deep part of a tuple it
>   would be necessary to download that tuple, update it in memory,
>   send it back - 2 network hops. With JSON paths it can be 1 when
>   the update can be described in paths.
> ---
>  src/box/CMakeLists.txt      |   1 +
>  src/box/xrow_update_field.c |   4 +
>  src/box/xrow_update_field.h |  85 ++++++-
>  src/box/xrow_update_map.c   | 453 ++++++++++++++++++++++++++++++++++++
>  src/box/xrow_update_route.c |  47 +++-
>  test/box/update.result      | 152 ++++++++++++
>  test/box/update.test.lua    |  75 ++++++
>  7 files changed, 813 insertions(+), 4 deletions(-)
>  create mode 100644 src/box/xrow_update_map.c
> 
> diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt
> index f99efc503..2022b378b 100644
> --- a/src/box/CMakeLists.txt
> +++ b/src/box/CMakeLists.txt
> @@ -45,6 +45,7 @@ add_library(tuple STATIC
>      xrow_update_array.c
>      xrow_update_bar.c
>      xrow_update_route.c
> +    xrow_update_map.c
>      tuple_compare.cc
>      tuple_extract_key.cc
>      tuple_hash.cc
> diff --git a/src/box/xrow_update_field.c b/src/box/xrow_update_field.c
> index 3a4aa6c90..a8db31768 100644
> --- a/src/box/xrow_update_field.c
> +++ b/src/box/xrow_update_field.c
> @@ -116,6 +116,8 @@ xrow_update_field_sizeof(struct xrow_update_field *field)
>  		return xrow_update_bar_sizeof(field);
>  	case XUPDATE_ROUTE:
>  		return xrow_update_route_sizeof(field);
> +	case XUPDATE_MAP:
> +		return xrow_update_map_sizeof(field);
>  	default:
>  		unreachable();
>  	}
> @@ -145,6 +147,8 @@ xrow_update_field_store(struct xrow_update_field *field, char *out,
>  		return xrow_update_bar_store(field, out, out_end);
>  	case XUPDATE_ROUTE:
>  		return xrow_update_route_store(field, out, out_end);
> +	case XUPDATE_MAP:
> +		return xrow_update_map_store(field, out, out_end);
>  	default:
>  		unreachable();
>  	}
> diff --git a/src/box/xrow_update_field.h b/src/box/xrow_update_field.h
> index f07b5a649..6b2a0d999 100644
> --- a/src/box/xrow_update_field.h
> +++ b/src/box/xrow_update_field.h
> @@ -32,6 +32,7 @@
>   */
>  #include "trivia/util.h"
>  #include "tt_static.h"
> +#include "salad/stailq.h"
>  #include "json/json.h"
>  #include "bit/int96.h"
>  #include "mp_decimal.h"
> @@ -254,7 +255,8 @@ enum xrow_update_type {
>  	 * for example, when a rope is split in two parts: not
>  	 * changed left range of fields, and a right range with
>  	 * its first field changed. In this case the left range is
> -	 * NOP.
> +	 * NOP. And when a map is updated and split into rages,
> +	 * when only certain points are not NOP.
>  	 */
>  	XUPDATE_NOP,
>  	/**
> @@ -292,6 +294,11 @@ enum xrow_update_type {
>  	 * '[2] = 30', '[3] = true'.
>  	 */
>  	XUPDATE_ROUTE,
> +	/**
> +	 * Field is an updated map. Check item list for updates of
> +	 * individual fields.
> +	 */
> +	XUPDATE_MAP,
>  };
>  
>  /**
> @@ -383,6 +390,48 @@ struct xrow_update_field {
>  			/** Update subtree. */
>  			struct xrow_update_field *next_hop;
>  		} route;
> +		/**
> +		 * The field is an updated map. Individual fields
> +		 * are stored very similar to array update and its
> +		 * rope nodes. Each item is a key, a value, and a
> +		 * tail of unchanged key-value pairs. The items
> +		 * are stored in a list. But the list is not
> +		 * sorted anyhow by keys, because it does not
> +		 * really make any sense:
> +		 *
> +		 * 1) Keys in MessagePack are not sorted anyway,
> +		 *    and any kind of search would not be possible
> +		 *    even if they were sorted. Sort of a map
> +		 *    would require Nlog(N) time and N memory even
> +		 *    if only a few fields were updated.
> +		 *
> +		 * 2) Double scalar update of the same key is not
> +		 *    possible.
> +		 *
> +		 * Although there is something which can be and is
> +		 * optimized. When a key is updated first time,
> +		 * it is moved to the beginning of the list, and
> +		 * after all operations are done, it is stored
> +		 * in the result tuple before unchanged fields. On
> +		 * a second update of the same tuple it is found
> +		 * immediately.
> +		 */
> +		struct {
> +			/**
> +			 * List of map update items. Each item is
> +			 * a key, a value, and an unchanged tail.
> +			 */
> +			struct stailq items;
> +			/**
> +			 * Number of key-value pairs in the list.
> +			 * Note, it is not a number of items, but
> +			 * exactly number of key-value pairs. It
> +			 * is used to store MessagePack map header
> +			 * without decoding each item again just
> +			 * to learn the size.
> +			 */
> +			int size;
> +		} map;
>  	};
>  };
>  
> @@ -506,6 +555,38 @@ OP_DECL_GENERIC(array)
>  
>  /* }}} xrow_update_field.array */
>  
> +/* {{{ xrow_update_field.map */
> +
> +/**
> + * Initialize @a field as a map to update.
> + * @param[out] field Field to initialize.
> + * @param header Header of the MessagePack map @a data.
> + * @param data MessagePack data of the map to update.
> + * @param data_end End of @a data.
> + * @param field_count Key-value pair count in @data.
> + *
> + * @retval  0 Success.
> + * @retval -1 Error.
> + */
> +int
> +xrow_update_map_create(struct xrow_update_field *field, const char *header,
> +		       const char *data, const char *data_end, int field_count);
> +
> +/**
> + * Create a map update with an existing child. Motivation is
> + * exactly the same as with a similar array constructor. It allows
> + * to avoid unnecessary MessagePack decoding.
> + */
> +int
> +xrow_update_map_create_with_child(struct xrow_update_field *field,
> +				  const char *header,
> +				  const struct xrow_update_field *child,
> +				  const char *key, uint32_t key_len);
> +
> +OP_DECL_GENERIC(map)
> +
> +/* }}} xrow_update_field.map */
> +
>  /* {{{ update_field.bar */
>  
>  OP_DECL_GENERIC(bar)
> @@ -587,6 +668,8 @@ xrow_update_op_do_field_##op_type(struct xrow_update_op *op,			\
>  		return xrow_update_op_do_bar_##op_type(op, field);		\
>  	case XUPDATE_ROUTE:							\
>  		return xrow_update_op_do_route_##op_type(op, field);		\
> +	case XUPDATE_MAP:							\
> +		return xrow_update_op_do_map_##op_type(op, field);		\
>  	default:								\
>  		unreachable();							\
>  	}									\
> diff --git a/src/box/xrow_update_map.c b/src/box/xrow_update_map.c
> new file mode 100644
> index 000000000..22fabff96
> --- /dev/null
> +++ b/src/box/xrow_update_map.c
> @@ -0,0 +1,453 @@
> +/*
> + * Copyright 2010-2019, Tarantool AUTHORS, please see AUTHORS file.
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above
> + *    copyright notice, this list of conditions and the
> + *    following disclaimer.
> + *
> + * 2. Redistributions in binary form must reproduce the above
> + *    copyright notice, this list of conditions and the following
> + *    disclaimer in the documentation and/or other materials
> + *    provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
> + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
> + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#include "xrow_update_field.h"
> +#include "fiber.h"
> +#include "small/region.h"
> +#include "tuple_format.h"
> +
> +/**
> + * Descriptor of one updated key-value pair. Besides updated data
> + * it contains a tail with unchanged pairs, just so as not to
> + * create a separate object for them, and to be similar with array
> + * update items.
> + */
> +struct xrow_update_map_item {
> +	/**
> +	 * Updated key. Can be NULL. In such a case this item
> +	 * contains only an unchanged tail. A key can be nullified
> +	 * if it was removed from the map, or when a map is just
> +	 * created and has no any update yet.
> +	 */
> +	const char *key;
> +	/** Length of @a key. */
> +	uint32_t key_len;
> +	/** Updated value. */
> +	struct xrow_update_field field;
> +	/** Link in the list of updated map keys. */
> +	struct stailq_entry in_items;
> +	/**
> +	 * Size in bytes of unchanged tail data. It goes right
> +	 * after @a field.data.
> +	 */
> +	uint32_t tail_size;
> +};
> +
> +static inline struct xrow_update_map_item *
> +xrow_update_map_item_alloc(void)
> +{
> +	struct xrow_update_map_item *item = (struct xrow_update_map_item *)
> +		region_alloc(&fiber()->gc, sizeof(*item));
> +	if (item == NULL)
> +		diag_set(OutOfMemory, sizeof(*item), "region_alloc", "item");
> +	return item;
> +}
> +
> +static void
> +xrow_update_map_create_item(struct xrow_update_field *field,
> +			    struct xrow_update_map_item *item,
> +			    enum xrow_update_type type, const char *key,
> +			    uint32_t key_len, const char *data,
> +			    uint32_t data_size, uint32_t tail_size)
> +{
> +	assert(field->type == XUPDATE_MAP);
> +	item->key = key;
> +	item->key_len = key_len;
> +	item->field.type = type;
> +	item->field.data = data;
> +	item->field.size = data_size,
> +	item->tail_size = tail_size;
> +	/*
> +	 * Each time a new item it created it is stored in the
> +	 * head of update map item list. It helps in case the
> +	 * tuple is regularly updated, because on all next updates
> +	 * this key will be found from the very beginning of the
> +	 * map.
> +	 */
> +	stailq_add_entry(&field->map.items, item, in_items);
> +}
> +
> +static inline struct xrow_update_map_item *
> +xrow_update_map_new_item(struct xrow_update_field *field,
> +			 enum xrow_update_type type, const char *key,
> +			 uint32_t key_len, const char *data, uint32_t data_size,
> +			 uint32_t tail_size)
> +{
> +	struct xrow_update_map_item *item = xrow_update_map_item_alloc();
> +	if (item != NULL) {
> +		xrow_update_map_create_item(field, item, type, key, key_len,
> +					    data, data_size, tail_size);
> +	}
> +	return item;
> +}
> +
> +/**
> + * Find an update item to which @a op should be applied. The
> + * target field may do not exist, but at least its parent should.
> + */
> +static int
> +xrow_update_map_extract_opt_item(struct xrow_update_field *field,
> +				 struct xrow_update_op *op,
> +				 struct xrow_update_map_item **res)
> +{
> +	assert(field->type == XUPDATE_MAP);
> +	if (op->token_type == JSON_TOKEN_END) {
> +		if (xrow_update_op_consume_token(op) != 0)
> +			return -1;
> +		if (op->token_type != JSON_TOKEN_STR) {
> +			return xrow_update_err(op, "can't update a map by not "\
> +					       "a string key");
> +		}
> +	}
> +	struct stailq *items = &field->map.items;
> +	struct xrow_update_map_item *i, *new_item;
> +	/*
> +	 * Firstly, try to find the key among already updated
> +	 * ones. Perhaps, the needed node is just an intermediate
> +	 * key of a bigger JSON path, and there are many updates
> +	 * passing this key, so it should be here for all except
> +	 * first updates.
> +	 */
> +	stailq_foreach_entry(i, items, in_items) {
> +		if (i->key != NULL && i->key_len == op->key_len &&
> +		    memcmp(i->key, op->key, i->key_len) == 0) {
> +			*res = i;
> +			return 0;
> +		}
> +	}
> +	/*
> +	 * Slow path - the key is updated first time, need to
> +	 * decode tails.
> +	 */
> +	uint32_t key_len, i_tail_size;
> +	const char *pos, *key, *end, *tmp, *begin;
> +	stailq_foreach_entry(i, items, in_items) {
> +		begin = i->field.data + i->field.size;
> +		pos = begin;
> +		end = begin + i->tail_size;
> +		i_tail_size = 0;
> +		while(pos < end) {
> +			if (mp_typeof(*pos) != MP_STR) {
> +				mp_next(&pos);
> +				mp_next(&pos);
> +				continue;
> +			}
> +			i_tail_size = pos - begin;
> +			key = mp_decode_str(&pos, &key_len);
> +			if (key_len == op->key_len &&
> +			    memcmp(key, op->key, key_len) == 0)
> +				goto key_is_found;
> +			mp_next(&pos);
> +		}
> +	}
> +	*res = NULL;
> +	return 0;
> +
> +key_is_found:
> +	tmp = pos;
> +	mp_next(&tmp);
> +	if (i_tail_size == 0 && i->key == NULL) {
> +		/*
> +		 * Looks like the needed key was found from the
> +		 * beginning of tail of an item without a key.
> +		 * This is actually good, because this item can
> +		 * be transformed instead of a new item creation.
> +		 */
> +		i->key = op->key;
> +		i->key_len = op->key_len;
> +		i->field.data = pos;
> +		i->field.size = tmp - pos;
> +		i->tail_size = end - tmp;
> +		new_item = i;
> +	} else {
> +		new_item = xrow_update_map_new_item(field, XUPDATE_NOP, op->key,
> +						    op->key_len, pos, tmp - pos,
> +						    end - tmp);
> +		if (new_item == NULL)
> +			return -1;
> +		i->tail_size = i_tail_size;
> +	}
> +	*res = new_item;
> +	return 0;
> +}
> +
> +/**
> + * The same as optional extractor, but the field to update should
> + * exist. It is the case of all scalar operations (except '=' - it
> + * can work as insert).
> + */
> +static inline struct xrow_update_map_item *
> +xrow_update_map_extract_item(struct xrow_update_field *field,
> +			     struct xrow_update_op *op)
> +{
> +	assert(field->type == XUPDATE_MAP);
> +	struct xrow_update_map_item *res;
> +	if (xrow_update_map_extract_opt_item(field, op, &res) != 0)
> +		return NULL;
> +	if (res == NULL)
> +		xrow_update_err_no_such_field(op);
> +	return res;
> +}
> +
> +int
> +xrow_update_op_do_map_insert(struct xrow_update_op *op,
> +			     struct xrow_update_field *field)
> +{
> +	assert(field->type == XUPDATE_MAP);
> +	struct xrow_update_map_item *item;
> +	if (xrow_update_map_extract_opt_item(field, op, &item) != 0)
> +		return -1;
> +	if (! xrow_update_op_is_term(op)) {
> +		if (item == NULL)
> +			return xrow_update_err_no_such_field(op);
> +		op->token_type = JSON_TOKEN_END;
> +		return xrow_update_op_do_field_insert(op, &item->field);
> +	}
> +	if (item != NULL)
> +		return xrow_update_err_duplicate(op);
> +	++field->map.size;
> +	item = xrow_update_map_new_item(field, XUPDATE_NOP, op->key,
> +					op->key_len, op->arg.set.value,
> +					op->arg.set.length, 0);
> +	return item != NULL ? 0 : -1;
> +}
> +
> +int
> +xrow_update_op_do_map_set(struct xrow_update_op *op,
> +			  struct xrow_update_field *field)
> +{
> +	assert(field->type == XUPDATE_MAP);
> +	struct xrow_update_map_item *item;
> +	if (xrow_update_map_extract_opt_item(field, op, &item) != 0)
> +		return -1;
> +	if (! xrow_update_op_is_term(op)) {
> +		if (item == NULL)
> +			return xrow_update_err_no_such_field(op);
> +		op->token_type = JSON_TOKEN_END;
> +		return xrow_update_op_do_field_set(op, &item->field);
> +	}
> +	if (item != NULL) {
> +		op->new_field_len = op->arg.set.length;
> +		/* Ignore the previous op, if any. */
> +		item->field.type = XUPDATE_SCALAR;
> +		item->field.scalar.op = op;
> +		return 0;
> +	}
> +	++field->map.size;
> +	item = xrow_update_map_new_item(field, XUPDATE_NOP, op->key,
> +					op->key_len, op->arg.set.value,
> +					op->arg.set.length, 0);
> +	return item != NULL ? 0 : -1;
> +}
> +
> +int
> +xrow_update_op_do_map_delete(struct xrow_update_op *op,
> +			     struct xrow_update_field *field)
> +{
> +	assert(field->type == XUPDATE_MAP);
> +	struct xrow_update_map_item *item;
> +	if (xrow_update_map_extract_opt_item(field, op, &item) != 0)
> +		return -1;
> +	if (! xrow_update_op_is_term(op)) {
> +		if (item == NULL)
> +			return xrow_update_err_no_such_field(op);
> +		op->token_type = JSON_TOKEN_END;
> +		return xrow_update_op_do_field_delete(op, &item->field);
> +	}
> +	if (op->arg.del.count != 1)
> +		return xrow_update_err_delete1(op);
> +	if (item == NULL)
> +		return 0;
> +	/*
> +	 * Note, even if tail size becomes 0, this item is not
> +	 * deleted. This is because items are linked via stailq,
> +	 * elements of which can't be deleted as simple as that.
> +	 * But it is not a big deal, because '#' is a really rare
> +	 * operation.
> +	 * Why just a next key from the tail can't be decoded?
> +	 * Why key should be NULL here? This is because the JSON
> +	 * updates allow to update a map containing non-string
> +	 * keys. If the next key is not a string, it can't be
> +	 * saved as key of the item. Therefore, it is better not
> +	 * to touch unchanged tails unless a new update operation
> +	 * needs it.
> +	 */
> +	item->key = NULL;
> +	item->key_len = 0;
> +	item->field.data += item->field.size;
> +	item->field.size = 0;
> +	item->field.type = XUPDATE_NOP;
> +	--field->map.size;
> +	return 0;
> +}
> +
> +#define DO_SCALAR_OP_GENERIC(op_type)						\
> +int										\
> +xrow_update_op_do_map_##op_type(struct xrow_update_op *op,			\
> +				struct xrow_update_field *field)		\
> +{										\
> +	assert(field->type == XUPDATE_MAP);					\
> +	struct xrow_update_map_item *item =					\
> +		xrow_update_map_extract_item(field, op);			\
> +	if (item == NULL)							\
> +		return -1;							\
> +	if (! xrow_update_op_is_term(op)) {					\
> +		op->token_type = JSON_TOKEN_END;				\
> +		return xrow_update_op_do_field_##op_type(op, &item->field);	\
> +	}									\
> +	if (item->field.type != XUPDATE_NOP)					\
> +		return xrow_update_err_double(op);				\
> +	if (xrow_update_op_do_##op_type(op, item->field.data) != 0)		\
> +		return -1;							\
> +	item->field.type = XUPDATE_SCALAR;					\
> +	item->field.scalar.op = op;						\
> +	return 0;								\
> +}
> +
> +DO_SCALAR_OP_GENERIC(arith)
> +
> +DO_SCALAR_OP_GENERIC(bit)
> +
> +DO_SCALAR_OP_GENERIC(splice)
> +
> +int
> +xrow_update_map_create(struct xrow_update_field *field, const char *header,
> +		       const char *data, const char *data_end, int field_count)
> +{
> +	field->type = XUPDATE_MAP;
> +	field->data = header;
> +	field->size = data_end - header;
> +	field->map.size = field_count;
> +	stailq_create(&field->map.items);
> +	if (field_count == 0)
> +		return 0;
> +	struct xrow_update_map_item *first =
> +		xrow_update_map_new_item(field, XUPDATE_NOP, NULL, 0, data, 0,
> +					 data_end - data);
> +	return first != NULL ? 0 : -1;
> +}
> +
> +int
> +xrow_update_map_create_with_child(struct xrow_update_field *field,
> +				  const char *header,
> +				  const struct xrow_update_field *child,
> +				  const char *key, uint32_t key_len)
> +{
> +	field->type = XUPDATE_MAP;
> +	field->data = header;
> +	stailq_create(&field->map.items);
> +
> +	const char *pos = header;
> +	uint32_t i, ikey_len, field_count = mp_decode_map(&pos);
> +	const char *begin = pos;
> +	struct xrow_update_map_item *item = NULL;
> +	for (i = 0; i < field_count; ++i) {
> +		if (mp_typeof(*pos) != MP_STR) {
> +			mp_next(&pos);
> +			mp_next(&pos);
> +			continue;
> +		}
> +		const char *before_key = pos;
> +		const char *ikey = mp_decode_str(&pos, &ikey_len);
> +		if (ikey_len == key_len && memcmp(ikey, key, key_len) == 0) {
> +			item = xrow_update_map_new_item(field, XUPDATE_NOP,
> +							NULL, 0, begin, 0,
> +							before_key - begin);
> +			if (item == NULL)
> +				return -1;
> +			++i;
> +			break;
> +		}
> +		mp_next(&pos);
> +	}
> +	/*
> +	 * When a map is initialized with an existing child, it
> +	 * means, that it was already found earlier. I.e. it is
> +	 * impossible to miss it.
> +	 */
> +	assert(item != NULL);
> +	const char *data = pos;
> +	mp_next(&pos);
> +	uint32_t data_size = pos - data;
> +	for (; i < field_count; ++i) {
> +		mp_next(&pos);
> +		mp_next(&pos);
> +	}
> +	item = xrow_update_map_item_alloc();
> +	if (item == NULL)
> +		return -1;
> +	item->field = *child;
> +	xrow_update_map_create_item(field, item, child->type, key, key_len,
> +				    data, data_size, pos - data - data_size);
> +	field->map.size = field_count;
> +	field->size = pos - header;
> +	return 0;
> +}
> +
> +uint32_t
> +xrow_update_map_sizeof(struct xrow_update_field *field)
> +{
> +	assert(field->type == XUPDATE_MAP);
> +	uint32_t res = mp_sizeof_map(field->map.size);
> +	struct xrow_update_map_item *i;
> +	stailq_foreach_entry(i, &field->map.items, in_items) {
> +		res += i->tail_size;
> +		if (i->key != NULL) {
> +			res += mp_sizeof_str(i->key_len) +
> +			       xrow_update_field_sizeof(&i->field);
> +		}
> +	}
> +	return res;
> +}
> +
> +uint32_t
> +xrow_update_map_store(struct xrow_update_field *field, char *out, char *out_end)
> +{
> +	assert(field->type == XUPDATE_MAP);
> +	char *out_begin = out;
> +	out = mp_encode_map(out, field->map.size);
> +	struct xrow_update_map_item *i;
> +	/*
> +	 * This is the trick about saving updated keys before
> +	 * others. The first cycle doesn't save unchanged tails.
> +	 */
> +	stailq_foreach_entry(i, &field->map.items, in_items) {
> +		if (i->key != NULL) {
> +			out = mp_encode_str(out, i->key, i->key_len);
> +			out += xrow_update_field_store(&i->field, out, out_end);
> +		}
> +	}
> +	stailq_foreach_entry(i, &field->map.items, in_items) {
> +		memcpy(out, i->field.data + i->field.size, i->tail_size);
> +		out += i->tail_size;
> +	}
> +	assert(out <= out_end);
> +	return out - out_begin;
> +}
> diff --git a/src/box/xrow_update_route.c b/src/box/xrow_update_route.c
> index f374db837..b76ebb36b 100644
> --- a/src/box/xrow_update_route.c
> +++ b/src/box/xrow_update_route.c
> @@ -123,6 +123,38 @@ xrow_update_route_branch_array(struct xrow_update_field *next_hop,
>  	return op->meta->do_op(op, next_hop);
>  }
>  
> +/**
> + * Do the actual branch, but by a map and a key in that map. Works
> + * exactly the same as the array-counterpart.
> + */
> +static int
> +xrow_update_route_branch_map(struct xrow_update_field *next_hop,
> +			     const char *parent,
> +			     const struct xrow_update_field *child,
> +			     const char *key, int key_len)
> +{
> +	struct xrow_update_op *op = child->bar.op;
> +	if (child->type != XUPDATE_BAR || child->bar.path_len > 0 ||
> +	    (op->opcode != '!' && op->opcode != '#')) {
> +		return xrow_update_map_create_with_child(next_hop, parent,
> +							 child, key, key_len);
> +	}
> +	op->token_type = JSON_TOKEN_STR;
> +	op->key = key;
> +	op->key_len = key_len;
> +	const char *data = parent;
> +	uint32_t field_count = mp_decode_map(&data);
> +	const char *end = data;
> +	for (uint32_t i = 0; i < field_count; ++i) {
> +		mp_next(&end);
> +		mp_next(&end);
> +	}
> +	if (xrow_update_map_create(next_hop, parent, data, end,
> +				   field_count) != 0)
> +		return -1;
> +	return op->meta->do_op(op, next_hop);
> +}
> +
>  struct xrow_update_field *
>  xrow_update_route_branch(struct xrow_update_field *field,
>  			 struct xrow_update_op *new_op)
> @@ -256,9 +288,18 @@ xrow_update_route_branch(struct xrow_update_field *field,
>  						   old_token.num) != 0)
>  			return NULL;
>  	} else if (type == MP_MAP) {
> -		diag_set(ClientError, ER_UNSUPPORTED, "update",
> -			 "path intersection on map");
> -		return NULL;
> +		if (new_token.type != JSON_TOKEN_STR) {
> +			xrow_update_err(new_op, "can not update map by "\
> +					"non-string key");
> +			return NULL;
> +		}
> +		new_op->token_type = JSON_TOKEN_STR;
> +		new_op->key = new_token.str;
> +		new_op->key_len = new_token.len;
> +		if (xrow_update_route_branch_map(next_hop, parent, &child,
> +						 old_token.str,
> +						 old_token.len) != 0)
> +			return NULL;
>  	} else {
>  		xrow_update_err_no_such_field(new_op);
>  		return NULL;
> diff --git a/test/box/update.result b/test/box/update.result
> index bf76f6638..4f9349bce 100644
> --- a/test/box/update.result
> +++ b/test/box/update.result
> @@ -1471,6 +1471,158 @@ t:update({{'+', '[4][3]', 0.5}, {'+', '[4][3][2]', 0.5}})
>  ---
>  - error: Field ''[4][3][2]'' was not found in the tuple
>  ...
> +--
> +-- Intersecting map updates.
> +--
> +t:update({{'=', '[4][6].c.d', 2300}, {'=', '[4][6].c.e', 2400}})
> +---
> +- [1, {}, [], [1, 2, 3, [4, 5, 6, 7, [8, 9, [10, 11, 12], 13, 14], 15, 16, [17, 18,
> +        19]], 20, {'b': 22, 'a': 21, 'c': {'d': 2300, 'e': 2400}}, 25]]
> +...
> +-- Fill an empty map.
> +t:update({{'=', '[2].k', 100}, {'=', '[2].v', 200}})
> +---
> +- [1, {'k': 100, 'v': 200}, [], [1, 2, 3, [4, 5, 6, 7, [8, 9, [10, 11, 12], 13, 14],
> +      15, 16, [17, 18, 19]], 20, {'b': 22, 'a': 21, 'c': {'d': 23, 'e': 24}}, 25]]
> +...
> +t = {}                                  \
> +t[1] = 1                                \
> +t[2] = {                                \
> +    a = 1,                              \
> +    b = 2,                              \
> +    c = {3, 4, 5},                      \
> +    d = {                               \
> +        [0] = 0,                        \
> +        e = 6,                          \
> +        [true] = false,                 \
> +        f = 7,                          \
> +        [-1] = -1,                      \
> +    },                                  \
> +    g = {                               \
> +        {k = 8, v = 9},                 \
> +        [box.NULL] = box.NULL,          \
> +        {k = 10, v = 11},               \
> +        {k = '12str', v = '13str'},     \
> +    },                                  \
> +    h = {                               \
> +        [-1] = {{{{-1}, -1}, -1}, -1},  \
> +        i = {                           \
> +            j = 14,                     \
> +            k = 15,                     \
> +        },                              \
> +        m = 16,                         \
> +    }                                   \
> +}
> +---
> +...
> +t[3] = {}
> +---
> +...
> +t = box.tuple.new(t)
> +---
> +...
> +-- Insert and delete from the same map at once.
> +t:update({{'!', '[2].n', 17}, {'#', '[2].b', 1}})
> +---
> +- [1, {'a': 1, 'n': 17, 'd': {0: 0, -1: -1, true: false, 'f': 7, 'e': 6}, 'h': {'i': {
> +        'j': 14, 'k': 15}, -1: [[[[-1], -1], -1], -1], 'm': 16}, 'c': [3, 4, 5], 'g': {
> +      1: {'k': 8, 'v': 9}, 2: {'k': 10, 'v': 11}, 3: {'k': '12str', 'v': '13str'},
> +      null: null}}, []]
> +...
> +-- Delete a not existing key.
> +t:update({{'=', '[2].d.g', 6000}, {'#', '[2].d.old', 1}})
> +---
> +- [1, {'b': 2, 'a': 1, 'd': {0: 0, 'f': 7, 'e': 6, -1: -1, true: false, 'g': 6000},
> +    'h': {'i': {'j': 14, 'k': 15}, -1: [[[[-1], -1], -1], -1], 'm': 16}, 'c': [3,
> +      4, 5], 'g': {1: {'k': 8, 'v': 9}, 2: {'k': 10, 'v': 11}, 3: {'k': '12str', 'v': '13str'},
> +      null: null}}, []]
> +...
> +-- Scalar operations.
> +t:update({{'+', '[2].d.e', 1}, {'+', '[2].b', 1}, {'+', '[2].d.f', 1}})
> +---
> +- [1, {'b': 3, 'c': [3, 4, 5], 'd': {0: 0, true: false, 'f': 8, -1: -1, 'e': 7}, 'h': {
> +      'i': {'j': 14, 'k': 15}, -1: [[[[-1], -1], -1], -1], 'm': 16}, 'a': 1, 'g': {
> +      1: {'k': 8, 'v': 9}, 2: {'k': 10, 'v': 11}, 3: {'k': '12str', 'v': '13str'},
> +      null: null}}, []]
> +...
> +-- Errors.
> +-- This operation array creates an update tree with several full
> +-- featured maps, not bars. It is required to cover some complex
> +-- cases about fields extraction.
> +ops = {{'=', '[2].h.i.j', 14000}, {'=', '[2].h.i.k', 15000}, {'=', '[2].h.m', 16000}, {'=', '[2].b', 2000}, {}}
> +---
> +...
> +-- When a map update extracts a next token on demand, it should
> +-- reject bad json, obviously. Scalar and '!'/'#' operations are
> +-- implemented separately and tested both.
> +ops[#ops] = {'+', '[2].h.i.<badjson>', -1}
> +---
> +...
> +t:update(ops)
> +---
> +- error: 'Field ''[2].h.i.<badjson>'' UPDATE error: invalid JSON in position 9'
> +...
> +ops[#ops][1] = '!'
> +---
> +...
> +t:update(ops)
> +---
> +- error: 'Field ''[2].h.i.<badjson>'' UPDATE error: invalid JSON in position 9'
> +...
> +ops[#ops][1] = '#'
> +---
> +...
> +ops[#ops][3] = 1
> +---
> +...
> +t:update(ops)
> +---
> +- error: 'Field ''[2].h.i.<badjson>'' UPDATE error: invalid JSON in position 9'
> +...
> +-- Key extractor should reject any attempt to use non-string keys.
> +ops[#ops] = {'-', '[2].h.i[20]', -1}
> +---
> +...
> +t:update(ops)
> +---
> +- error: 'Field ''[2].h.i[20]'' UPDATE error: can''t update a map by not a string
> +    key'
> +...
> +ops[#ops][1] = '='
> +---
> +...
> +t:update(ops)
> +---
> +- error: 'Field ''[2].h.i[20]'' UPDATE error: can''t update a map by not a string
> +    key'
> +...
> +t:update({{'+', '[2].d.e', 1}, {'+', '[2].d.f', 1}, {'+', '[2].d.k', 1}})
> +---
> +- error: Field ''[2].d.k'' was not found in the tuple
> +...
> +t:update({{'=', '[2].d.e', 6000}, {'!', '[2].d.g.h', -1}})
> +---
> +- error: Field ''[2].d.g.h'' was not found in the tuple
> +...
> +t:update({{'!', '[2].d.g', 6000}, {'!', '[2].d.e', -1}})
> +---
> +- error: 'Field ''[2].d.e'' UPDATE error: the key exists already'
> +...
> +t:update({{'=', '[2].d.g', 6000}, {'#', '[2].d.old', 10}})
> +---
> +- error: 'Field ''[2].d.old'' UPDATE error: can delete only 1 field from a map in
> +    a row'
> +...
> +-- '!'/'=' can be used to create a field, but only if it is a
> +-- tail. These operations can't create the whole path.
> +t:update({{'=', '[2].d.g', 6000}, {'=', '[2].d.new.new', -1}})
> +---
> +- error: Field ''[2].d.new.new'' was not found in the tuple
> +...
> +t:update({{'=', '[2].d.g', 6000}, {'#', '[2].d.old.old', 1}})
> +---
> +- error: Field ''[2].d.old.old'' was not found in the tuple
> +...
>  s:drop()
>  ---
>  ...
> diff --git a/test/box/update.test.lua b/test/box/update.test.lua
> index d91a2e95c..6fed818f4 100644
> --- a/test/box/update.test.lua
> +++ b/test/box/update.test.lua
> @@ -515,4 +515,79 @@ t:update({{'!', '[4][3]', 2.5}, {'+', '[4][5][5][5]', 0.75}, {'+', '[4][5][3]',
>  -- during branching.
>  t:update({{'+', '[4][3]', 0.5}, {'+', '[4][3][2]', 0.5}})
>  
> +--
> +-- Intersecting map updates.
> +--
> +t:update({{'=', '[4][6].c.d', 2300}, {'=', '[4][6].c.e', 2400}})
> +-- Fill an empty map.
> +t:update({{'=', '[2].k', 100}, {'=', '[2].v', 200}})
> +
> +t = {}                                  \
> +t[1] = 1                                \
> +t[2] = {                                \
> +    a = 1,                              \
> +    b = 2,                              \
> +    c = {3, 4, 5},                      \
> +    d = {                               \
> +        [0] = 0,                        \
> +        e = 6,                          \
> +        [true] = false,                 \
> +        f = 7,                          \
> +        [-1] = -1,                      \
> +    },                                  \
> +    g = {                               \
> +        {k = 8, v = 9},                 \
> +        [box.NULL] = box.NULL,          \
> +        {k = 10, v = 11},               \
> +        {k = '12str', v = '13str'},     \
> +    },                                  \
> +    h = {                               \
> +        [-1] = {{{{-1}, -1}, -1}, -1},  \
> +        i = {                           \
> +            j = 14,                     \
> +            k = 15,                     \
> +        },                              \
> +        m = 16,                         \
> +    }                                   \
> +}
> +t[3] = {}
> +t = box.tuple.new(t)
> +-- Insert and delete from the same map at once.
> +t:update({{'!', '[2].n', 17}, {'#', '[2].b', 1}})
> +-- Delete a not existing key.
> +t:update({{'=', '[2].d.g', 6000}, {'#', '[2].d.old', 1}})
> +
> +-- Scalar operations.
> +t:update({{'+', '[2].d.e', 1}, {'+', '[2].b', 1}, {'+', '[2].d.f', 1}})
> +
> +-- Errors.
> +-- This operation array creates an update tree with several full
> +-- featured maps, not bars. It is required to cover some complex
> +-- cases about fields extraction.
> +ops = {{'=', '[2].h.i.j', 14000}, {'=', '[2].h.i.k', 15000}, {'=', '[2].h.m', 16000}, {'=', '[2].b', 2000}, {}}
> +-- When a map update extracts a next token on demand, it should
> +-- reject bad json, obviously. Scalar and '!'/'#' operations are
> +-- implemented separately and tested both.
> +ops[#ops] = {'+', '[2].h.i.<badjson>', -1}
> +t:update(ops)
> +ops[#ops][1] = '!'
> +t:update(ops)
> +ops[#ops][1] = '#'
> +ops[#ops][3] = 1
> +t:update(ops)
> +-- Key extractor should reject any attempt to use non-string keys.
> +ops[#ops] = {'-', '[2].h.i[20]', -1}
> +t:update(ops)
> +ops[#ops][1] = '='
> +t:update(ops)
> +
> +t:update({{'+', '[2].d.e', 1}, {'+', '[2].d.f', 1}, {'+', '[2].d.k', 1}})
> +t:update({{'=', '[2].d.e', 6000}, {'!', '[2].d.g.h', -1}})
> +t:update({{'!', '[2].d.g', 6000}, {'!', '[2].d.e', -1}})
> +t:update({{'=', '[2].d.g', 6000}, {'#', '[2].d.old', 10}})
> +-- '!'/'=' can be used to create a field, but only if it is a
> +-- tail. These operations can't create the whole path.
> +t:update({{'=', '[2].d.g', 6000}, {'=', '[2].d.new.new', -1}})
> +t:update({{'=', '[2].d.g', 6000}, {'#', '[2].d.old.old', 1}})
> +
>  s:drop()
> -- 
> 2.21.0 (Apple Git-122.2)
>

next prev parent reply	other threads:[~2019-12-27 14:57 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-23 22:41 [Tarantool-patches] [PATCH 0/3] JSON update Vladislav Shpilevoy
2019-12-23 22:41 ` [Tarantool-patches] [PATCH 1/3] tuple: make update operation tokens consumable Vladislav Shpilevoy
2019-12-24 16:15   ` Vladislav Shpilevoy
2019-12-26 12:07   ` Sergey Ostanevich
2019-12-27 13:00     ` Vladislav Shpilevoy
2019-12-27 14:59       ` Sergey Ostanevich
2019-12-23 22:41 ` [Tarantool-patches] [PATCH 2/3] tuple: JSON path update intersection at arrays Vladislav Shpilevoy
2019-12-27 14:13   ` Sergey Ostanevich
2019-12-27 15:52     ` Vladislav Shpilevoy
2019-12-23 22:41 ` [Tarantool-patches] [PATCH 3/3] tuple: JSON path update intersection at maps Vladislav Shpilevoy
2019-12-27 14:57   ` Sergey Ostanevich [this message]
2019-12-27 15:52     ` Vladislav Shpilevoy
2019-12-30  5:26 ` [Tarantool-patches] [PATCH 0/3] JSON update Kirill Yukhin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191227145704.GB1222@tarantool.org \
    --to=sergos@tarantool.org \
    --cc=tarantool-patches@dev.tarantool.org \
    --cc=v.shpilevoy@tarantool.org \
    --subject='Re: [Tarantool-patches] [PATCH 3/3] tuple: JSON path update intersection at maps' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox