[Tarantool-patches] [PATCH 2/3] tuple: JSON path update intersection at arrays

Vladislav Shpilevoy v.shpilevoy at tarantool.org
Fri Dec 27 18:52:05 MSK 2019


Thanks for the review!

>> diff --git a/src/box/xrow_update_field.h b/src/box/xrow_update_field.h
>> index fbaf45c5d..f07b5a649 100644
>> --- a/src/box/xrow_update_field.h
>> +++ b/src/box/xrow_update_field.h
>> @@ -447,6 +475,33 @@ xrow_update_array_create(struct xrow_update_field *field, const char *header,
>>  			 const char *data, const char *data_end,
>>  			 uint32_t field_count);
>>  
>> +/**
>> + * The same as mere create, but the array is created with a first
>> + * child right away. It allows to make it more efficient, because
>> + * under the hood a rope is created not as a one huge range which
>> + * is then spit in parts, but as two rope nodes from the
>               ^^^^
> Should be     split here?
> 

Yes, it is a typo.

================================================================================
  * under the hood a rope is created not as a one huge range which
- * is then spit in parts, but as two rope nodes from the
+ * is then split in parts, but as two rope nodes from the
  * beginning. On the summary: -1 rope node split, -1 decoding of
================================================================================

>> + * beginning. On the summary: -1 rope node split, -1 decoding of
>> + * fields from 0 to @a field_no.
>> + *
>> + * The function is used during branching, where there was an
>> + * existing update, but another one came with the same prefix, and
>> + * a different suffix.
>> + *
>> + * @param[out] field Field to initialize.
>> + * @param header Header of the MessagePack array.
>> + * @param child A child subtree. The child is copied by value into
>> + *        the created array.
>> + * @param field_no Field number of @a child.
>> + *
>> + * @retval  0 Success.
>> + * @retval -1 Error.
>> + */
>> +int
>> +xrow_update_array_create_with_child(struct xrow_update_field *field,
>> +				    const char *header,
>> +				    const struct xrow_update_field *child,
>> +				    int32_t field_no);
>> +
>> diff --git a/src/box/xrow_update_route.c b/src/box/xrow_update_route.c
>> new file mode 100644
>> index 000000000..f374db837
>> --- /dev/null
>> +++ b/src/box/xrow_update_route.c
>> @@ -0,0 +1,346 @@
>> +/**
>> + * Do the actual branch. This is the case when an existing
>> + * bar/route path is different from a new operation's path in an
>> + * array. The existing object needs to be split into parent-child,
>> + * and the new operation becomes a second child.
>> + *
>> + * @param[out] next_hop A field which will be initialized as an
>> + *        array, and which will be a point to apply a new
>> + *        operation.
>> + * @param parent MessagePack array to be taken by @a next_hop.
>> + * @param child Current field from which the branch happens. It
>> + *        already contains an update subtree.
>> + */
>> +static int
>> +xrow_update_route_branch_array(struct xrow_update_field *next_hop,
>> +			       const char *parent,
>> +			       const struct xrow_update_field *child,
>> +			       int32_t field_no)
>> +{
>> +	struct xrow_update_op *op = child->bar.op;
>> +	/*
>> +	 * There are rules when a subtree can be just copied as is
> 
> I would name these as limitations, not rules. 
> 

Ok, fixed:

================================================================================
 	struct xrow_update_op *op = child->bar.op;
 	/*
-	 * There are rules when a subtree can be just copied as is
-	 * from one parent to another.
+	 * There are limitations when a subtree can be just copied
+	 * as is from one parent to another.
 	 * 1) It should not be a bar update. Because if it is not
================================================================================

>> +	 * from one parent to another.
>> +	 * 1) It should not be a bar update. Because if it is not
>> +	 *    a bar, then it is either scalar or an array/map.
>> +	 *    Scalar update can be safely moved. Array/map update
>> +	 *    doesn't change their parent, and also can be moved.
>> +	 *    Otherwise see (2).
>> +	 * 2) It is a bar. Then it should not be a leaf. If it is
>> +	 *    not a leaf, then it does not change header and other
>> +	 *    fields of this particular array, and can be safely
>> +	 *    moved to somewhere else.
>> +	 *    Otherwise see (1).
>                             ^^^
>                             (3) I suppose?

Yup.

================================================================================
 	 *    moved to somewhere else.
-	 *    Otherwise see (1).
+	 *    Otherwise see (3).
 	 * 3) Ok, it is a bar, a leaf. Then it is a bar with zero
================================================================================

>> +	 * 3) Ok, it is a bar, a leaf. Then it is a bar with zero
>> +	 *    path length. It could degrade to zero path len in
>> +	 *    during branching. In this case it should be a scalar
>> +	 *    bar. The only non-scalar operations are '!' and '#'.
>> +	 *
>> +	 * Why '#' and '!' can't be moved? '!', for example, being
>> +	 * applied to a field [1], affects all fields [2-*], and
>> +	 * the array header. The same but in an even worse form
>> +	 * about '#'. Such operations should be redone. They
>> +	 * affect many fields and the parent.
>> +	 *
>> +	 * There is a tricky thing though - why not to just redo
>> +	 * all operations here, for the sake of code simplicity?
>> +	 * It would allow to remove 'create_with_child' crutch.
>> +	 * The answer is - it is not possible. If a field is
>> +	 * copyable, it is not re-applicable. And vice-versa. For
>> +	 * example, if it is not a leaf, then there may be many
>> +	 * operations, not one. A subtree just can't be
>> +	 * 're-applied'.
>> +	 *
>> +	 * If the operation is scalar and a leaf, then its result
>> +	 * has already overridden its arguments. This is because
>> +	 * scalar operations save result into the arguments, to
>> +	 * save memory. A second operation appliance would lead
>> +	 * to very surprising results.
>> +	 *
>> +	 * Another reason - performance. This path should be
>> +	 * quite hot, and to copy a struct is for sure much faster
>> +	 * than to reapply an operation using a virtual function.
>> +	 * Operations '!' and '#' are quite rare, so their
>> +	 * optimization is not a critical goal.
>> +	 */
>> +	if (/* (1) Not a bar. */
>> +	    child->type != XUPDATE_BAR ||
>> +	    /* (2) Bar, but not a leaf. */
>> +	    child->bar.path_len > 0 ||
>> +	    /* (3) Leaf, bar, but a scalar operation. */
>> +	    (op->opcode != '!' && op->opcode != '#')) {
>> +		return xrow_update_array_create_with_child(next_hop, parent,
>> +							   child, field_no);
>> +	}
>> +	/*
>> +	 * Can't move the child. The only way to branch is to
>> +	 * reapply the child's operation.
>> +	 */
>> +	op->token_type = JSON_TOKEN_NUM;
>> +	op->field_no = field_no;
>> +	const char *data = parent;
>> +	uint32_t field_count = mp_decode_array(&data);
>> +	const char *end = data;
>> +	for (uint32_t i = 0; i < field_count; ++i)
>> +		mp_next(&end);
>> +	if (xrow_update_array_create(next_hop, parent, data, end,
>> +				     field_count) != 0)
>> +		return -1;
>> +	return op->meta->do_op(op, next_hop);
>> +}
>> +
>> +struct xrow_update_field *
>> +xrow_update_route_branch(struct xrow_update_field *field,
>> +			 struct xrow_update_op *new_op)
>> +{
>> +	assert(new_op->lexer.src != NULL);
>> +	const char *old_path;
>> +	int old_path_len;
>> +	if (field->type == XUPDATE_BAR) {
>> +		old_path = field->bar.path;
>> +		old_path_len = field->bar.path_len;
>> +	} else {
>> +		assert(field->type == XUPDATE_ROUTE);
>> +		old_path = field->route.path;
>> +		old_path_len = field->route.path_len;
>> +	}
>> +	assert(old_path != NULL);
>> +	struct json_lexer old_path_lexer;
>> +	struct json_token old_token, new_token;
>> +	/*
>> +	 * Offset is going to be used as a length of the route
>> +	 * node created as a parent of the old subtree and the
>> +	 * new operation. As it is described in the route
>> +	 * definition in struct xrow_update_field - route is the
>> +	 * common prefix of all operations of the subtree. Here
>> +	 * length of that prefix is calculated.
>> +	 * In addition, it is used here to determine if the new
>> +	 * operation is different from the current subtree from
>> +	 * the very beginning. Basically, it means the offset is
>> +	 * 0, and no route is generated. Root becomes a regular
>> +	 * update field (array, map), not a route. Example:
>> +	 * @a field is a bar '[1].a.b = 20', @a new_op is
>> +	 * '[2].c.d = 30'. In this case the paths are already
>> +	 * different from the beginning, no common prefix = no
>> +	 * route. Array with children [1].a.b and [2].c.d becomes
>> +	 * a root.
>> +	 */
>> +	int saved_old_offset;
>> +	json_lexer_create(&old_path_lexer, old_path, old_path_len,
>> +			  TUPLE_INDEX_BASE);
>> +	const char *parent = field->data;
>> +	do {
>> +		saved_old_offset = old_path_lexer.offset;
>> +		int rc = json_lexer_next_token(&old_path_lexer, &old_token);
>> +		/* Old path is already validated. */
>> +		assert(rc == 0);
>> +		rc = json_lexer_next_token(&new_op->lexer, &new_token);
>> +		if (rc != 0) {
>> +			xrow_update_err_bad_json(new_op, rc);
>> +			return NULL;
>> +		}
>> +		if (json_token_cmp(&old_token, &new_token) != 0)
>> +			break;
>> +		switch(new_token.type) {
>> +		case JSON_TOKEN_NUM:
>> +			rc = tuple_field_go_to_index(&parent, new_token.num);
>> +			break;
>> +		case JSON_TOKEN_STR:
>> +			rc = tuple_field_go_to_key(&parent, new_token.str,
>> +						   new_token.len);
>> +			break;
>> +		default:
>> +			/*
>> +			 * Can't be JSON_TOKEN_ANY, because old
>> +			 * and new tokens are equal, but '*' is
>> +			 * considered invalid and the old was
>> +			 * already checked for that. So the new is
>> +			 * valid too. And can't have type ANY.
>> +			 */
>> +			assert(new_token.type == JSON_TOKEN_END);
>> +			xrow_update_err_double(new_op);
>> +			return NULL;
>> +		}
>> +		/*
>> +		 * Must always find a field, because the old
>> +		 * token already went that path.
>> +		 */
>> +		assert(rc == 0);
>> +	} while (true);
>> +	enum mp_type type = mp_typeof(*parent);
>> +	/*
>> +	 * Ched if the paths are different from the very
>            ^^^^
>            check(ed)?

Sure, typo.

================================================================================
 	/*
-	 * Ched if the paths are different from the very
+	 * Check if the paths are different from the very
 	 * beginning. It means, that the old field should be
================================================================================

>> +	 * beginning. It means, that the old field should be
>> +	 * transformed instead of creating a new route node.
>> +	 */
>> +	bool transform_root = saved_old_offset == 0;
> 
> It would be nicer if you put comparison in () up there
> 

Ok, I don't mind.

================================================================================
 	 */
-	bool transform_root = saved_old_offset == 0;
+	bool transform_root = (saved_old_offset == 0);
 	struct xrow_update_field *next_hop;
================================================================================

>> +	struct xrow_update_field *next_hop;
>> +	if (! transform_root) {
> 
> No spaces after unaries, use sed -ie "s/(! /\(\!/"
> 

True.

================================================================================
diff --git a/src/box/xrow_update_route.c b/src/box/xrow_update_route.c
index d03028c7b..7e07e695b 100644
--- a/src/box/xrow_update_route.c
+++ b/src/box/xrow_update_route.c
@@ -210,7 +210,7 @@ xrow_update_route_branch(struct xrow_update_field *field,
 	 */
 	bool transform_root = (saved_old_offset == 0);
 	struct xrow_update_field *next_hop;
-	if (! transform_root) {
+	if (!transform_root) {
 		next_hop = (struct xrow_update_field *)
 			region_alloc(&fiber()->gc, sizeof(*next_hop));
 		if (next_hop == NULL) {
@@ -266,7 +266,7 @@ xrow_update_route_branch(struct xrow_update_field *field,
 		return NULL;
 	}
 
-	if (! transform_root) {
+	if (!transform_root) {
 		field->type = XUPDATE_ROUTE;
 		field->route.path = old_path;
 		field->route.path_len = saved_old_offset;
@@ -286,7 +286,7 @@ static struct xrow_update_field *
 xrow_update_route_next(struct xrow_update_field *field, struct xrow_update_op *op)
 {
 	assert(field->type == XUPDATE_ROUTE);
-	assert(! xrow_update_op_is_term(op));
+	assert(!xrow_update_op_is_term(op));
 	const char *new_path = op->lexer.src + op->lexer.offset;
 	int new_path_len = op->lexer.src_len - op->lexer.offset;
 	if (field->route.path_len <= new_path_len &&
================================================================================


More information about the Tarantool-patches mailing list