From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <sergos@tarantool.org>
Received: from smtp48.i.mail.ru (smtp48.i.mail.ru [94.100.177.108])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by dev.tarantool.org (Postfix) with ESMTPS id 4CD3E46971A
 for <tarantool-patches@dev.tarantool.org>;
 Mon,  9 Dec 2019 23:18:43 +0300 (MSK)
Date: Mon, 9 Dec 2019 23:18:42 +0300
From: Sergey Ostanevich <sergos@tarantool.org>
Message-ID: <20191209201842.GA14366@tarantool.org>
References: <da5e5278552b062e101f0addfc415f83a9810edd.1574378275.git.v.shpilevoy@tarantool.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <da5e5278552b062e101f0addfc415f83a9810edd.1574378275.git.v.shpilevoy@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH 1/1] tuple: enable isolated JSON
 updates
List-Id: Tarantool development patches <tarantool-patches.dev.tarantool.org>
List-Unsubscribe: <https://lists.tarantool.org/mailman/options/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=unsubscribe>
List-Archive: <https://lists.tarantool.org/pipermail/tarantool-patches/>
List-Post: <mailto:tarantool-patches@dev.tarantool.org>
List-Help: <mailto:tarantool-patches-request@dev.tarantool.org?subject=help>
List-Subscribe: <https://lists.tarantool.org/mailman/listinfo/tarantool-patches>, 
 <mailto:tarantool-patches-request@dev.tarantool.org?subject=subscribe>
To: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Cc: tarantool-patches@dev.tarantool.org

Hi! 

Thanks for the patch!
Please, see 4 comments below.

Regards,
Sergos


> diff --git a/src/box/vinyl.c b/src/box/vinyl.c
> index 767e40006..15a136f81 100644
> --- a/src/box/vinyl.c
> +++ b/src/box/vinyl.c
> @@ -2004,13 +2004,25 @@ request_normalize_ops(struct request *request)
>  		ops_end = mp_encode_str(ops_end, op_name, op_name_len);
>  
>  		int field_no;
> -		if (mp_typeof(*pos) == MP_INT) {
> +		const char *field_name;
> +		switch (mp_typeof(*pos)) {
> +		case MP_INT:
>  			field_no = mp_decode_int(&pos);
>  			ops_end = mp_encode_int(ops_end, field_no);
> -		} else {
> +			break;
> +		case MP_UINT:
>  			field_no = mp_decode_uint(&pos);
>  			field_no -= request->index_base;
>  			ops_end = mp_encode_uint(ops_end, field_no);
> +			break;
> +		case MP_STR:
> +			field_name = pos;
> +			mp_next(&pos);
> +			memcpy(ops_end, field_name, pos - field_name);
> +			ops_end += pos - field_name;
> +			break;

Are you intentionally skip encode/decode here? Is it safe enough?

> diff --git a/src/box/xrow_update.c b/src/box/xrow_update.c
> index 123db081a..ecda17544 100644
> --- a/src/box/xrow_update.c
> +++ b/src/box/xrow_update.c
> @@ -412,6 +434,15 @@ xrow_upsert_squash(const char *expr1, const char *expr1_end,
>  			if (op->opcode != '+' && op->opcode != '-' &&
>  			    op->opcode != '=')
>  				return NULL;
> +			/*
> +			 * Not terminal operation means, that the
> +			 * update is not flat, and squash would
> +			 * need to build a tree of operations to
> +			 * find matches. That is too complex,
> +			 * squash is skipped.
> +			 */
> +			if (! xrow_update_op_is_term(op))
> +				return NULL;

Please, no space after unary op - and further on in the patch -
according to $3.1 of
https://www.tarantool.io/en/doc/2.2/dev_guide/c_style_guide/

> diff --git a/src/box/xrow_update_bar.c b/src/box/xrow_update_bar.c
> new file mode 100644
> index 000000000..737673e8a
> --- /dev/null
> +++ b/src/box/xrow_update_bar.c
> @@ -0,0 +1,454 @@
> +/*
> + * Copyright 2010-2019, Tarantool AUTHORS, please see AUTHORS file.
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above
> + *    copyright notice, this list of conditions and the
> + *    following disclaimer.
> + *
> + * 2. Redistributions in binary form must reproduce the above
> + *    copyright notice, this list of conditions and the following
> + *    disclaimer in the documentation and/or other materials
> + *    provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
> + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
> + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
> + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
> + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
> + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#include "xrow_update_field.h"
> +#include "tuple.h"
> +
> +/**
> + * Locate a field to update by @a op's JSON path and initialize
> + * @a field as a bar update.
> + *
> + * @param op Update operation.
> + * @param field Field to locate in.
> + * @param[out] key_len_or_index One parameter for two values,
> + *        depending on where the target point is located: in an
> + *        array or a map. In case of map it is size of a key
> + *        before the found point. It is used to find range of the
> + *        both key and value in '#' operation to drop the pair.
> + *        In case of array it is index of the array element to be
> + *        able to check how many fields are left for deletion.
> + *
> + * @retval 0 Success.
> + * @retval -1 Not found or invalid JSON.
> + */
> +static inline int
> +xrow_update_bar_locate(struct xrow_update_op *op,
> +		       struct xrow_update_field *field,
> +		       int *key_len_or_index)
> +{
> +	/*
> +	 * Bar update is not flat by definition. It always has a
> +	 * non empty path. This is why op is expected to be not
> +	 * terminal.
> +	 */
> +	assert(! xrow_update_op_is_term(op));
> +	int rc;
> +	field->type = XUPDATE_BAR;
> +	field->bar.op = op;
> +	field->bar.path = op->lexer.src + op->lexer.offset;
> +	field->bar.path_len = op->lexer.src_len - op->lexer.offset;
> +	const char *pos = field->data;
> +	struct json_token token;
> +	while ((rc = json_lexer_next_token(&op->lexer, &token)) == 0 &&
> +	       token.type != JSON_TOKEN_END) {
> +
> +		switch (token.type) {
> +		case JSON_TOKEN_NUM:
> +			field->bar.parent = pos;
> +			*key_len_or_index = token.num;
> +			rc = tuple_field_go_to_index(&pos, token.num);
> +			break;
> +		case JSON_TOKEN_STR:
> +			field->bar.parent = pos;
> +			*key_len_or_index = token.len;
> +			rc = tuple_field_go_to_key(&pos, token.str, token.len);
> +			break;
> +		default:
> +			assert(token.type == JSON_TOKEN_ANY);
> +			rc = op->lexer.symbol_count - 1;
> +			return xrow_update_err_bad_json(op, rc);
> +		}
> +		if (rc != 0)
> +			return xrow_update_err_no_such_field(op);
> +	}
> +	if (rc > 0)
> +		return xrow_update_err_bad_json(op, rc);
> +
> +	field->bar.point = pos;
> +	mp_next(&pos);
> +	field->bar.point_size = pos - field->bar.point;
> +	return 0;
> +}
> +
> +/**
> + * Locate an optional field to update by @a op's JSON path. If
> + * found or only a last path part is not found, initialize @a
> + * field as a bar update. Last path part may not exist and it is
> + * ok, for example, for '!' and '=' operations.
> + */
> +static inline int
> +xrow_update_bar_locate_opt(struct xrow_update_op *op,
> +			   struct xrow_update_field *field, bool *is_found,
> +			   int *key_len_or_index)
> +{
> +	/*
> +	 * Bar update is not flat by definition. It always has a
> +	 * non empty path. This is why op is expected to be not
> +	 * terminal.
> +	 */
> +	assert(! xrow_update_op_is_term(op));
> +	int rc;
> +	field->type = XUPDATE_BAR;
> +	field->bar.op = op;
> +	field->bar.path = op->lexer.src + op->lexer.offset;
> +	field->bar.path_len = op->lexer.src_len - op->lexer.offset;
> +	const char *pos = field->data;
> +	struct json_token token;
> +	do {
> +		rc = json_lexer_next_token(&op->lexer, &token);
> +		if (rc != 0)
> +			return xrow_update_err_bad_json(op, rc);
> +
> +		switch (token.type) {
> +		case JSON_TOKEN_END:
> +			*is_found = true;
> +			field->bar.point = pos;
> +			mp_next(&pos);
> +			field->bar.point_size = pos - field->bar.point;
> +			return 0;
> +		case JSON_TOKEN_NUM:
> +			field->bar.parent = pos;
> +			*key_len_or_index = token.num;
> +			rc = tuple_field_go_to_index(&pos, token.num);
> +			break;
> +		case JSON_TOKEN_STR:
> +			field->bar.parent = pos;
> +			*key_len_or_index = token.len;
> +			rc = tuple_field_go_to_key(&pos, token.str, token.len);
> +			break;
> +		default:
> +			assert(token.type == JSON_TOKEN_ANY);
> +			rc = op->lexer.symbol_count - 1;
> +			return xrow_update_err_bad_json(op, rc);
> +		}
> +	} while (rc == 0);
> +	assert(rc == -1);
> +	/* Ensure, that 'token' is next to last path part. */
> +	struct json_token tmp_token;
> +	rc = json_lexer_next_token(&op->lexer, &tmp_token);
> +	if (rc != 0)
> +		return xrow_update_err_bad_json(op, rc);
> +	if (tmp_token.type != JSON_TOKEN_END)
> +		return xrow_update_err_no_such_field(op);
> +
> +	*is_found = false;
> +	if (token.type == JSON_TOKEN_NUM) {
> +		const char *tmp = field->bar.parent;
> +		if (mp_typeof(*tmp) != MP_ARRAY) {
> +			return xrow_update_err(op, "can not access by index a "\
> +					       "non-array field");
> +		}
> +		uint32_t size = mp_decode_array(&tmp);
> +		if ((uint32_t) token.num > size)
> +			return xrow_update_err_no_such_field(op);

IMHO there will be more informative to mention the array and an index
that is out of bounds.

> +		/*
> +		 * The updated point is in an array, its position
> +		 * was not found, and its index is <= size. The
> +		 * only way how can that happen - the update tries
> +		 * to append a new array element. The following
> +		 * code tries to find the array's end.
> +		 */
> +		assert((uint32_t) token.num == size);
> +		if (field->bar.parent == field->data) {
> +			/*
> +			 * Optimization for the case when the path
> +			 * is short. So parent of the updated
> +			 * point is the field itself. It allows
> +			 * not to decode anything at all. It is
> +			 * worth doing, since the paths are
> +			 * usually short.
> +			 */
> +			field->bar.point = field->data + field->size;
> +		} else {
> +			field->bar.point = field->bar.parent;
> +			mp_next(&field->bar.point);
> +		}
> +		field->bar.point_size = 0;
> +	} else {
> +		assert(token.type == JSON_TOKEN_STR);
> +		field->bar.new_key = token.str;
> +		field->bar.new_key_len = token.len;
> +		if (mp_typeof(*field->bar.parent) != MP_MAP) {
> +			return xrow_update_err(op, "can not access by key a "\
> +					       "non-map field");
> +		}
> +	}
> +	return 0;
> +}
> +
> +/**
> + * Nop fields are those which are not updated. And when they
> + * receive an update via one of xrow_update_op_do_nop_* functions,
> + * it means, that there is a non terminal path digging inside this
> + * not updated field. It turns nop field into a bar field. How
> + * exactly - depends on a concrete operation.
> + */
> +
> +int
> +xrow_update_op_do_nop_insert(struct xrow_update_op *op,
> +			     struct xrow_update_field *field)
> +{
> +	assert(op->opcode == '!');
> +	assert(field->type == XUPDATE_NOP);

Hmm. Do you expect to handle all possible cases in debug mode? 
Otherwise here should be some sort of gaceful fail to make prod more
stable.