[Tarantool-patches] [PATCH 1/3] tuple: make update operation tokens consumable

Sergey Ostanevich sergos at tarantool.org
Fri Dec 27 17:59:28 MSK 2019


Hi!

> So I added a flag to xrow_update_op.

Thanks, this is something I had in mind.
LGTM.

Sergos.

On 27 Dec 16:00, Vladislav Shpilevoy wrote:
> Hi! Thanks for the review!
> 
> On 26/12/2019 13:07, Sergey Ostanevich wrote:
> > Hi!
> > 
> > Thanks for the patch! 
> > Just one comment below, otherwise LGTM.
> > 
> > Sergos
> > 
> > On 23 Dec 23:41, Vladislav Shpilevoy wrote:
> >> There is a case: [1][2][3][4] = 100. It is not a problem when it
> >> is a single operation, not intersecting with anything. It is an
> >> isolated update then, and works fine. But the next patch allows
> >> several update operations to have the same prefix, and the path
> >> [1][2][3][4] can become a tree of updated arrays. For example, a
> >> trivial tree like this:
> >>
> >>     root: [ [1] ]
> >>              |
> >>              [ [1] [2] ]
> >>                     |
> >>                     [ [1] [2] [3] ]
> >>                                |
> >>                                [ [1] [2] [3] [4] ]
> >>                                              =100
> >>
> >> When the update is applied to the root, the JSON path [1][2][3][4]
> >> is decoded one part at a time, and the operation goes down the tree
> >> until it reaches the leaf, where [4] = 100 is applied. Each time
> >> the update goes one level down, somebody should update
> >> xrow_update_op.field_no so that on the first level it would be 1,
> >> then 2, 3, 4.
> >>
> >> Does it mean that each level of the update [1][2][3][4] should
> >> prepare field_no for the next child? No, because then each level
> >> would need to check the type of the child, whether it is an array,
> >> a map, or whatever else expects a valid field_no/key in
> >> xrow_update_op, and ensure that a map child gets a key and an
> >> array child gets a field_no. That would complicate the code to a
> >> totally unreadable state and would break encapsulation between
> >> xrow_update_array/map/bar... . Each array update operation would
> >> check a child against all existing types to ensure that the next
> >> token matches it. The same would happen to map updates.
> >>
> >> This patch goes another way - let each level of the update check
> >> whether its field_no/key is already prepared by the caller, and if
> >> not, extract the next token from the operation path. So a map
> >> update will ensure that it has a valid key, and an array update
> >> will ensure that it has a valid field_no.
> >>
> >> Part of #1261
> >> ---
> >>  src/box/xrow_update_array.c | 47 +++++++++++++++++++++++++++++++++++--
> >>  src/box/xrow_update_field.c | 22 +++++++++++++++++
> >>  src/box/xrow_update_field.h | 25 ++++++++++++++++++--
> >>  3 files changed, 90 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/src/box/xrow_update_array.c b/src/box/xrow_update_array.c
> >> index 57427e39c..6a7ce09ff 100644
> >> --- a/src/box/xrow_update_array.c
> >> +++ b/src/box/xrow_update_array.c
> >> @@ -32,6 +32,31 @@
> >>  #include "msgpuck.h"
> >>  #include "fiber.h"
> >>  
> >> +/**
> >> + * Make sure @a op contains a valid field number to which the
> >> + * operation should be applied next. The field number may not be
> >> + * known if the array's parent didn't propagate the operation's
> >> + * lexer. In fact, the parent fills fieldno only in some rare
> >> + * cases like branching. Generally, an array should take care of
> >> + * fieldno by itself.
> >> + */
> >> +static inline int
> >> +xrow_update_op_prepare_num_token(struct xrow_update_op *op)
> >> +{
> >> +	/*
> >> +	 * Token type END is a special value meaning that the
> >> +	 * current token needs to be parsed.
> >> +	 */
> > 
> > I really don't like reusing entities introduced for another purpose.
> > If one reads this code, it will be hard to understand why, out of
> > nowhere, the lexer is forced to come to an end at
> > xrow_update_op_do_array_insert(), for example. More than that - if
> > the lexer is in a 'reached the end' state, it is forced to parse.
> > 
> > If we introduce more JSON functionality later, I expect we will face
> > even more logic hung on this 'special value'. Then the contexts of
> > these functionalities may intersect... oh my...
> > 
> > Can we introduce a new type with a self-explanatory name instead and
> > leave JSON_TOKEN_END on its own?
> 
> Agree. But I don't know how to justify introducing a new token type
> into the json library. I was thinking about something like
> JSON_TOKEN_INVALID, but 1) the json library reports errors via return
> values, not via token types, and 2) an 'invalid' token implies an
> error, not an already parsed token, which is what I need in my patch.
> 
> A value like JSON_TOKEN_CONSUMED/USED looks too specific to my case
> to be added to the core lib.
> 
> So I added a flag to xrow_update_op. See the diff below, and the whole
> patch at the end.
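>
> In case a condensed illustration helps: below is a toy, self-contained
> sketch of the same pattern (hypothetical toy_* names, not the real
> structures, and a trivial sscanf()-based "lexer"). The parent marks the
> token as consumed before descending, and each level re-parses the path
> only when it sees the flag.
>
>     #include <stdbool.h>
>     #include <stdio.h>
>
>     struct toy_op {
>         const char *path;       /* remaining path, e.g. "[2][3][4]" */
>         int field_no;           /* current array index */
>         bool is_token_consumed; /* true - the next level must re-parse */
>     };
>
>     /* Toy stand-in for xrow_update_op_next_token(): parse one "[N]". */
>     static int
>     toy_next_token(struct toy_op *op)
>     {
>         int num = 0, len = 0;
>         if (sscanf(op->path, "[%d]%n", &num, &len) != 1 || len == 0)
>             return -1;
>         op->path += len;
>         op->field_no = num;
>         op->is_token_consumed = false;
>         return 0;
>     }
>
>     /* Toy stand-in for xrow_update_op_do_array_*(). */
>     static int
>     toy_do_array(struct toy_op *op, int level)
>     {
>         /* Re-fill field_no only if the caller has used it up. */
>         if (op->is_token_consumed && toy_next_token(op) != 0)
>             return -1;
>         printf("level %d: field_no = %d\n", level, op->field_no);
>         if (*op->path == '\0')
>             return 0;                   /* terminal token - apply here */
>         op->is_token_consumed = true;   /* hand the lexer to the child */
>         return toy_do_array(op, level + 1);
>     }
>
>     int
>     main(void)
>     {
>         /* Mirrors [1][2][3][4] = 100: field_no 1 is already decoded. */
>         struct toy_op op = {"[2][3][4]", 1, false};
>         return toy_do_array(&op, 1);
>     }
>
> It prints field_no = 1, 2, 3, 4 for levels 1-4, which is exactly the
> walk described in the commit message.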
> 
> ================================================================================
> 
> diff --git a/src/box/xrow_update_array.c b/src/box/xrow_update_array.c
> index 2eb98e8e9..e6b9ed89f 100644
> --- a/src/box/xrow_update_array.c
> +++ b/src/box/xrow_update_array.c
> @@ -43,13 +43,8 @@
>  static inline int
>  xrow_update_op_prepare_num_token(struct xrow_update_op *op)
>  {
> -	/*
> -	 * Token type END is a special value meaning that the
> -	 * current token needs to be parsed.
> -	 */
> -	if (op->token_type == JSON_TOKEN_END &&
> -	    xrow_update_op_consume_token(op) != 0)
> -			return -1;
> +	if (op->is_token_consumed && xrow_update_op_next_token(op) != 0)
> +		return -1;
>  	if (op->token_type != JSON_TOKEN_NUM) {
>  		return xrow_update_err(op, "can't update an array by a "\
>  				       "non-numeric index");
> @@ -64,7 +59,7 @@ xrow_update_op_prepare_num_token(struct xrow_update_op *op)
>  static inline int
>  xrow_update_op_adjust_field_no(struct xrow_update_op *op, int32_t field_count)
>  {
> -	assert(op->token_type == JSON_TOKEN_NUM);
> +	assert(op->token_type == JSON_TOKEN_NUM && !op->is_token_consumed);
>  	if (op->field_no >= 0) {
>  		if (op->field_no < field_count)
>  			return 0;
> @@ -254,7 +249,7 @@ xrow_update_op_do_array_insert(struct xrow_update_op *op,
>  		item = xrow_update_array_extract_item(field, op);
>  		if (item == NULL)
>  			return -1;
> -		op->token_type = JSON_TOKEN_END;
> +		op->is_token_consumed = true;
>  		return xrow_update_op_do_field_insert(op, &item->field);
>  	}
>  
> @@ -290,7 +285,7 @@ xrow_update_op_do_array_set(struct xrow_update_op *op,
>  	if (item == NULL)
>  		return -1;
>  	if (!xrow_update_op_is_term(op)) {
> -		op->token_type = JSON_TOKEN_END;
> +		op->is_token_consumed = true;
>  		return xrow_update_op_do_field_set(op, &item->field);
>  	}
>  	op->new_field_len = op->arg.set.length;
> @@ -318,7 +313,7 @@ xrow_update_op_do_array_delete(struct xrow_update_op *op,
>  			xrow_update_array_extract_item(field, op);
>  		if (item == NULL)
>  			return -1;
> -		op->token_type = JSON_TOKEN_END;
> +		op->is_token_consumed = true;
>  		return xrow_update_op_do_field_delete(op, &item->field);
>  	}
>  
> @@ -347,7 +342,7 @@ xrow_update_op_do_array_##op_type(struct xrow_update_op *op,			\
>  	if (item == NULL)							\
>  		return -1;							\
>  	if (!xrow_update_op_is_term(op)) {					\
> -		op->token_type = JSON_TOKEN_END;				\
> +		op->is_token_consumed = true;					\
>  		return xrow_update_op_do_field_##op_type(op, &item->field);	\
>  	}									\
>  	if (item->field.type != XUPDATE_NOP)					\
> diff --git a/src/box/xrow_update_field.c b/src/box/xrow_update_field.c
> index 96fcaf747..460672b4c 100644
> --- a/src/box/xrow_update_field.c
> +++ b/src/box/xrow_update_field.c
> @@ -612,7 +612,7 @@ xrow_update_op_by(char opcode)
>  }
>  
>  int
> -xrow_update_op_consume_token(struct xrow_update_op *op)
> +xrow_update_op_next_token(struct xrow_update_op *op)
>  {
>  	struct json_token token;
>  	int rc = json_lexer_next_token(&op->lexer, &token);
> @@ -620,6 +620,7 @@ xrow_update_op_consume_token(struct xrow_update_op *op)
>  		return xrow_update_err_bad_json(op, rc);
>  	if (token.type == JSON_TOKEN_END)
>  		return xrow_update_err_no_such_field(op);
> +	op->is_token_consumed = false;
>  	op->token_type = token.type;
>  	op->key = token.str;
>  	op->key_len = token.len;
> @@ -661,6 +662,7 @@ xrow_update_op_decode(struct xrow_update_op *op, int index_base,
>  	 * dictionary.
>  	 */
>  	op->token_type = JSON_TOKEN_NUM;
> +	op->is_token_consumed = false;
>  	int32_t field_no = 0;
>  	switch(mp_typeof(**expr)) {
>  	case MP_INT:
> diff --git a/src/box/xrow_update_field.h b/src/box/xrow_update_field.h
> index fbaf45c5d..0b0f608fe 100644
> --- a/src/box/xrow_update_field.h
> +++ b/src/box/xrow_update_field.h
> @@ -179,11 +179,16 @@ struct xrow_update_op {
>  	const struct xrow_update_op_meta *meta;
>  	/** Operation arguments. */
>  	union xrow_update_arg arg;
> +	/** Current level token. */
> +	enum json_token_type token_type;
>  	/**
> -	 * Current level token. END means that it is invalid and
> -	 * a next token should be extracted from the lexer.
> +	 * The flag says whether the token is already consumed by
> +	 * the update operation while it is being forwarded down
> +	 * the update tree. When the flag is true, it means that the
> +	 * next node of the update tree will need to fetch the next
> +	 * token from the lexer.
>  	 */
> -	enum json_token_type token_type;
> +	bool is_token_consumed;
>  	union {
>  		struct {
>  			const char *key;
> @@ -212,7 +217,7 @@ struct xrow_update_op {
>   * map/array.
>   */
>  int
> -xrow_update_op_consume_token(struct xrow_update_op *op);
> +xrow_update_op_next_token(struct xrow_update_op *op);
>  
>  /**
>   * Decode an update operation from MessagePack.
> 
> ================================================================================
> 
> The whole patch:
> 
> ================================================================================
> 
>     tuple: make update operation tokens consumable
>     
>     There is a case: [1][2][3][4] = 100. It is not a problem when it
>     is a single operation, not intersecting with anything. It is an
>     isolated update then, and works fine. But the next patch allows
>     several update operations to have the same prefix, and the path
>     [1][2][3][4] can become a tree of updated arrays. For example, a
>     trivial tree like this:
>     
>         root: [ [1] ]
>                  |
>                  [ [1] [2] ]
>                         |
>                         [ [1] [2] [3] ]
>                                    |
>                                    [ [1] [2] [3] [4] ]
>                                                  =100
>     
>     When the update is applied to the root, the JSON path [1][2][3][4]
>     is decoded one part at a time, and the operation goes down the tree
>     until it reaches the leaf, where [4] = 100 is applied. Each time
>     the update goes one level down, somebody should update
>     xrow_update_op.field_no so that on the first level it would be 1,
>     then 2, 3, 4.
>     
>     Does it mean that each level of the update [1][2][3][4] should
>     prepare field_no for the next child? No, because then each level
>     would need to check the type of the child, whether it is an array,
>     a map, or whatever else expects a valid field_no/key in
>     xrow_update_op, and ensure that a map child gets a key and an
>     array child gets a field_no. That would complicate the code to a
>     totally unreadable state and would break encapsulation between
>     xrow_update_array/map/bar... . Each array update operation would
>     check a child against all existing types to ensure that the next
>     token matches it. The same would happen to map updates.
>     
>     This patch goes another way - let each level of the update check
>     whether its field_no/key is already prepared by the caller, and if
>     not, extract the next token from the operation path. So a map
>     update will ensure that it has a valid key, and an array update
>     will ensure that it has a valid field_no.
>     
>     Part of #1261
> 
> diff --git a/src/box/xrow_update_array.c b/src/box/xrow_update_array.c
> index 57427e39c..e6b9ed89f 100644
> --- a/src/box/xrow_update_array.c
> +++ b/src/box/xrow_update_array.c
> @@ -32,6 +32,26 @@
>  #include "msgpuck.h"
>  #include "fiber.h"
>  
> +/**
> + * Make sure @a op contains a valid field number to which the
> + * operation should be applied next. The field number may not be
> + * known if the array's parent didn't propagate the operation's
> + * lexer. In fact, the parent fills fieldno only in some rare
> + * cases like branching. Generally, an array should take care of
> + * fieldno by itself.
> + */
> +static inline int
> +xrow_update_op_prepare_num_token(struct xrow_update_op *op)
> +{
> +	if (op->is_token_consumed && xrow_update_op_next_token(op) != 0)
> +		return -1;
> +	if (op->token_type != JSON_TOKEN_NUM) {
> +		return xrow_update_err(op, "can't update an array by a "\
> +				       "non-numeric index");
> +	}
> +	return 0;
> +}
> +
>  /**
>   * Make field index non-negative and check for the field
>   * existence.
> @@ -39,6 +59,7 @@
>  static inline int
>  xrow_update_op_adjust_field_no(struct xrow_update_op *op, int32_t field_count)
>  {
> +	assert(op->token_type == JSON_TOKEN_NUM && !op->is_token_consumed);
>  	if (op->field_no >= 0) {
>  		if (op->field_no < field_count)
>  			return 0;
> @@ -221,10 +242,14 @@ xrow_update_op_do_array_insert(struct xrow_update_op *op,
>  {
>  	assert(field->type == XUPDATE_ARRAY);
>  	struct xrow_update_array_item *item;
> +	if (xrow_update_op_prepare_num_token(op) != 0)
> +		return -1;
> +
>  	if (!xrow_update_op_is_term(op)) {
>  		item = xrow_update_array_extract_item(field, op);
>  		if (item == NULL)
>  			return -1;
> +		op->is_token_consumed = true;
>  		return xrow_update_op_do_field_insert(op, &item->field);
>  	}
>  
> @@ -248,6 +273,9 @@ xrow_update_op_do_array_set(struct xrow_update_op *op,
>  {
>  	assert(field->type == XUPDATE_ARRAY);
>  	struct xrow_update_rope *rope = field->array.rope;
> +	if (xrow_update_op_prepare_num_token(op) != 0)
> +		return -1;
> +
>  	/* Interpret '=' for n + 1 field as insert. */
>  	if (op->field_no == (int32_t) xrow_update_rope_size(rope))
>  		return xrow_update_op_do_array_insert(op, field);
> @@ -256,8 +284,10 @@ xrow_update_op_do_array_set(struct xrow_update_op *op,
>  		xrow_update_array_extract_item(field, op);
>  	if (item == NULL)
>  		return -1;
> -	if (!xrow_update_op_is_term(op))
> +	if (!xrow_update_op_is_term(op)) {
> +		op->is_token_consumed = true;
>  		return xrow_update_op_do_field_set(op, &item->field);
> +	}
>  	op->new_field_len = op->arg.set.length;
>  	/*
>  	 * Ignore the previous op, if any. It is not correct,
> @@ -275,11 +305,15 @@ xrow_update_op_do_array_delete(struct xrow_update_op *op,
>  			       struct xrow_update_field *field)
>  {
>  	assert(field->type == XUPDATE_ARRAY);
> +	if (xrow_update_op_prepare_num_token(op) != 0)
> +		return -1;
> +
>  	if (!xrow_update_op_is_term(op)) {
>  		struct xrow_update_array_item *item =
>  			xrow_update_array_extract_item(field, op);
>  		if (item == NULL)
>  			return -1;
> +		op->is_token_consumed = true;
>  		return xrow_update_op_do_field_delete(op, &item->field);
>  	}
>  
> @@ -301,12 +335,16 @@ int										\
>  xrow_update_op_do_array_##op_type(struct xrow_update_op *op,			\
>  				  struct xrow_update_field *field)		\
>  {										\
> +	if (xrow_update_op_prepare_num_token(op) != 0)				\
> +		return -1;							\
>  	struct xrow_update_array_item *item =					\
>  		xrow_update_array_extract_item(field, op);			\
>  	if (item == NULL)							\
>  		return -1;							\
> -	if (!xrow_update_op_is_term(op))					\
> +	if (!xrow_update_op_is_term(op)) {					\
> +		op->is_token_consumed = true;					\
>  		return xrow_update_op_do_field_##op_type(op, &item->field);	\
> +	}									\
>  	if (item->field.type != XUPDATE_NOP)					\
>  		return xrow_update_err_double(op);				\
>  	if (xrow_update_op_do_##op_type(op, item->field.data) != 0)		\
> diff --git a/src/box/xrow_update_field.c b/src/box/xrow_update_field.c
> index de865a21d..460672b4c 100644
> --- a/src/box/xrow_update_field.c
> +++ b/src/box/xrow_update_field.c
> @@ -611,6 +611,23 @@ xrow_update_op_by(char opcode)
>  	}
>  }
>  
> +int
> +xrow_update_op_next_token(struct xrow_update_op *op)
> +{
> +	struct json_token token;
> +	int rc = json_lexer_next_token(&op->lexer, &token);
> +	if (rc != 0)
> +		return xrow_update_err_bad_json(op, rc);
> +	if (token.type == JSON_TOKEN_END)
> +		return xrow_update_err_no_such_field(op);
> +	op->is_token_consumed = false;
> +	op->token_type = token.type;
> +	op->key = token.str;
> +	op->key_len = token.len;
> +	op->field_no = token.num;
> +	return 0;
> +}
> +
>  int
>  xrow_update_op_decode(struct xrow_update_op *op, int index_base,
>  		      struct tuple_dictionary *dict, const char **expr)
> @@ -639,6 +656,13 @@ xrow_update_op_decode(struct xrow_update_op *op, int index_base,
>  		diag_set(ClientError, ER_UNKNOWN_UPDATE_OP);
>  		return -1;
>  	}
> +	/*
> +	 * The first token is always a num. Even if a user specifies a
> +	 * field name, it is converted to a num by the tuple
> +	 * dictionary.
> +	 */
> +	op->token_type = JSON_TOKEN_NUM;
> +	op->is_token_consumed = false;
>  	int32_t field_no = 0;
>  	switch(mp_typeof(**expr)) {
>  	case MP_INT:
> diff --git a/src/box/xrow_update_field.h b/src/box/xrow_update_field.h
> index bda9222cc..0b0f608fe 100644
> --- a/src/box/xrow_update_field.h
> +++ b/src/box/xrow_update_field.h
> @@ -179,8 +179,23 @@ struct xrow_update_op {
>  	const struct xrow_update_op_meta *meta;
>  	/** Operation arguments. */
>  	union xrow_update_arg arg;
> -	/** First level field no. */
> -	int32_t field_no;
> +	/** Current level token. */
> +	enum json_token_type token_type;
> +	/**
> +	 * The flag says whether the token is already consumed by
> +	 * the update operation while it is being forwarded down
> +	 * the update tree. When the flag is true, it means that the
> +	 * next node of the update tree will need to fetch the next
> +	 * token from the lexer.
> +	 */
> +	bool is_token_consumed;
> +	union {
> +		struct {
> +			const char *key;
> +			uint32_t key_len;
> +		};
> +		int32_t field_no;
> +	};
>  	/** Size of a new field after it is updated. */
>  	uint32_t new_field_len;
>  	/** Opcode symbol: = + - / ... */
> @@ -193,6 +208,17 @@ struct xrow_update_op {
>  	struct json_lexer lexer;
>  };
>  
> +/**
> + * Extract the next token from the operation path lexer. The result
> + * is used to decide to which child of the current map/array the
> + * operation should be forwarded. It is not just a synonym for
> + * json_lexer_next_token, because it fills some fields of @a op,
> + * and should be used only to choose the next child inside the current
> + * map/array.
> + */
> +int
> +xrow_update_op_next_token(struct xrow_update_op *op);
> +
>  /**
>   * Decode an update operation from MessagePack.
>   * @param[out] op Update operation.
> 
> ================================================================================

