[Tarantool-patches] [PATCH 2/3] tuple: JSON path update intersection at arrays
Vladislav Shpilevoy
v.shpilevoy at tarantool.org
Fri Dec 27 18:52:05 MSK 2019
Thanks for the review!
>> diff --git a/src/box/xrow_update_field.h b/src/box/xrow_update_field.h
>> index fbaf45c5d..f07b5a649 100644
>> --- a/src/box/xrow_update_field.h
>> +++ b/src/box/xrow_update_field.h
>> @@ -447,6 +475,33 @@ xrow_update_array_create(struct xrow_update_field *field, const char *header,
>> const char *data, const char *data_end,
>> uint32_t field_count);
>>
>> +/**
>> + * The same as mere create, but the array is created with a first
>> + * child right away. It allows to make it more efficient, because
>> + * under the hood a rope is created not as a one huge range which
>> + * is then spit in parts, but as two rope nodes from the
> ^^^^
> Should be split here?
>
Yes, it is a typo.
================================================================================
* under the hood a rope is created not as a one huge range which
- * is then spit in parts, but as two rope nodes from the
+ * is then split in parts, but as two rope nodes from the
* beginning. On the summary: -1 rope node split, -1 decoding of
================================================================================
>> + * beginning. On the summary: -1 rope node split, -1 decoding of
>> + * fields from 0 to @a field_no.
>> + *
>> + * The function is used during branching, where there was an
>> + * existing update, but another one came with the same prefix, and
>> + * a different suffix.
>> + *
>> + * @param[out] field Field to initialize.
>> + * @param header Header of the MessagePack array.
>> + * @param child A child subtree. The child is copied by value into
>> + * the created array.
>> + * @param field_no Field number of @a child.
>> + *
>> + * @retval 0 Success.
>> + * @retval -1 Error.
>> + */
>> +int
>> +xrow_update_array_create_with_child(struct xrow_update_field *field,
>> + const char *header,
>> + const struct xrow_update_field *child,
>> + int32_t field_no);
>> +
>> diff --git a/src/box/xrow_update_route.c b/src/box/xrow_update_route.c
>> new file mode 100644
>> index 000000000..f374db837
>> --- /dev/null
>> +++ b/src/box/xrow_update_route.c
>> @@ -0,0 +1,346 @@
>> +/**
>> + * Do the actual branch. This is the case when an existing
>> + * bar/route path is different from a new operation's path in an
>> + * array. The existing object needs to be split into parent-child,
>> + * and the new operation becomes a second child.
>> + *
>> + * @param[out] next_hop A field which will be initialized as an
>> + * array, and which will be a point to apply a new
>> + * operation.
>> + * @param parent MessagePack array to be taken by @a next_hop.
>> + * @param child Current field from which the branch happens. It
>> + * already contains an update subtree.
>> + */
>> +static int
>> +xrow_update_route_branch_array(struct xrow_update_field *next_hop,
>> + const char *parent,
>> + const struct xrow_update_field *child,
>> + int32_t field_no)
>> +{
>> + struct xrow_update_op *op = child->bar.op;
>> + /*
>> + * There are rules when a subtree can be just copied as is
>
> I would name these as limitations, not rules.
>
Ok, fixed:
================================================================================
struct xrow_update_op *op = child->bar.op;
/*
- * There are rules when a subtree can be just copied as is
- * from one parent to another.
+ * There are limitations when a subtree can be just copied
+ * as is from one parent to another.
* 1) It should not be a bar update. Because if it is not
================================================================================
>> + * from one parent to another.
>> + * 1) It should not be a bar update. Because if it is not
>> + * a bar, then it is either scalar or an array/map.
>> + * Scalar update can be safely moved. Array/map update
>> + * doesn't change their parent, and also can be moved.
>> + * Otherwise see (2).
>> + * 2) It is a bar. Then it should not be a leaf. If it is
>> + * not a leaf, then it does not change header and other
>> + * fields of this particular array, and can be safely
>> + * moved to somewhere else.
>> + * Otherwise see (1).
> ^^^
> (3) I suppose?
Yup.
================================================================================
* moved to somewhere else.
- * Otherwise see (1).
+ * Otherwise see (3).
* 3) Ok, it is a bar, a leaf. Then it is a bar with zero
================================================================================
>> + * 3) Ok, it is a bar, a leaf. Then it is a bar with zero
>> + * path length. It could degrade to zero path len in
>> + * during branching. In this case it should be a scalar
>> + * bar. The only non-scalar operations are '!' and '#'.
>> + *
>> + * Why '#' and '!' can't be moved? '!', for example, being
>> + * applied to a field [1], affects all fields [2-*], and
>> + * the array header. The same but in an even worse form
>> + * about '#'. Such operations should be redone. They
>> + * affect many fields and the parent.
>> + *
>> + * There is a tricky thing though - why not to just redo
>> + * all operations here, for the sake of code simplicity?
>> + * It would allow to remove 'create_with_child' crutch.
>> + * The answer is - it is not possible. If a field is
>> + * copyable, it is not re-applicable. And vice-versa. For
>> + * example, if it is not a leaf, then there may be many
>> + * operations, not one. A subtree just can't be
>> + * 're-applied'.
>> + *
>> + * If the operation is scalar and a leaf, then its result
>> + * has already overridden its arguments. This is because
>> + * scalar operations save result into the arguments, to
>> + * save memory. A second operation appliance would lead
>> + * to very surprising results.
>> + *
>> + * Another reason - performance. This path should be
>> + * quite hot, and to copy a struct is for sure much faster
>> + * than to reapply an operation using a virtual function.
>> + * Operations '!' and '#' are quite rare, so their
>> + * optimization is not a critical goal.
>> + */
>> + if (/* (1) Not a bar. */
>> + child->type != XUPDATE_BAR ||
>> + /* (2) Bar, but not a leaf. */
>> + child->bar.path_len > 0 ||
>> + /* (3) Leaf, bar, but a scalar operation. */
>> + (op->opcode != '!' && op->opcode != '#')) {
>> + return xrow_update_array_create_with_child(next_hop, parent,
>> + child, field_no);
>> + }
>> + /*
>> + * Can't move the child. The only way to branch is to
>> + * reapply the child's operation.
>> + */
>> + op->token_type = JSON_TOKEN_NUM;
>> + op->field_no = field_no;
>> + const char *data = parent;
>> + uint32_t field_count = mp_decode_array(&data);
>> + const char *end = data;
>> + for (uint32_t i = 0; i < field_count; ++i)
>> + mp_next(&end);
>> + if (xrow_update_array_create(next_hop, parent, data, end,
>> + field_count) != 0)
>> + return -1;
>> + return op->meta->do_op(op, next_hop);
>> +}
>> +
>> +struct xrow_update_field *
>> +xrow_update_route_branch(struct xrow_update_field *field,
>> + struct xrow_update_op *new_op)
>> +{
>> + assert(new_op->lexer.src != NULL);
>> + const char *old_path;
>> + int old_path_len;
>> + if (field->type == XUPDATE_BAR) {
>> + old_path = field->bar.path;
>> + old_path_len = field->bar.path_len;
>> + } else {
>> + assert(field->type == XUPDATE_ROUTE);
>> + old_path = field->route.path;
>> + old_path_len = field->route.path_len;
>> + }
>> + assert(old_path != NULL);
>> + struct json_lexer old_path_lexer;
>> + struct json_token old_token, new_token;
>> + /*
>> + * Offset is going to be used as a length of the route
>> + * node created as a parent of the old subtree and the
>> + * new operation. As it is described in the route
>> + * definition in struct xrow_update_field - route is the
>> + * common prefix of all operations of the subtree. Here
>> + * length of that prefix is calculated.
>> + * In addition, it is used here to determine if the new
>> + * operation is different from the current subtree from
>> + * the very beginning. Basically, it means the offset is
>> + * 0, and no route is generated. Root becomes a regular
>> + * update field (array, map), not a route. Example:
>> + * @a field is a bar '[1].a.b = 20', @a new_op is
>> + * '[2].c.d = 30'. In this case the paths are already
>> + * different from the beginning, no common prefix = no
>> + * route. Array with children [1].a.b and [2].c.d becomes
>> + * a root.
>> + */
>> + int saved_old_offset;
>> + json_lexer_create(&old_path_lexer, old_path, old_path_len,
>> + TUPLE_INDEX_BASE);
>> + const char *parent = field->data;
>> + do {
>> + saved_old_offset = old_path_lexer.offset;
>> + int rc = json_lexer_next_token(&old_path_lexer, &old_token);
>> + /* Old path is already validated. */
>> + assert(rc == 0);
>> + rc = json_lexer_next_token(&new_op->lexer, &new_token);
>> + if (rc != 0) {
>> + xrow_update_err_bad_json(new_op, rc);
>> + return NULL;
>> + }
>> + if (json_token_cmp(&old_token, &new_token) != 0)
>> + break;
>> + switch(new_token.type) {
>> + case JSON_TOKEN_NUM:
>> + rc = tuple_field_go_to_index(&parent, new_token.num);
>> + break;
>> + case JSON_TOKEN_STR:
>> + rc = tuple_field_go_to_key(&parent, new_token.str,
>> + new_token.len);
>> + break;
>> + default:
>> + /*
>> + * Can't be JSON_TOKEN_ANY, because old
>> + * and new tokens are equal, but '*' is
>> + * considered invalid and the old was
>> + * already checked for that. So the new is
>> + * valid too. And can't have type ANY.
>> + */
>> + assert(new_token.type == JSON_TOKEN_END);
>> + xrow_update_err_double(new_op);
>> + return NULL;
>> + }
>> + /*
>> + * Must always find a field, because the old
>> + * token already went that path.
>> + */
>> + assert(rc == 0);
>> + } while (true);
>> + enum mp_type type = mp_typeof(*parent);
>> + /*
>> + * Ched if the paths are different from the very
> ^^^^
> check(ed)?
Sure, typo.
================================================================================
/*
- * Ched if the paths are different from the very
+ * Check if the paths are different from the very
* beginning. It means, that the old field should be
================================================================================
>> + * beginning. It means, that the old field should be
>> + * transformed instead of creating a new route node.
>> + */
>> + bool transform_root = saved_old_offset == 0;
>
> It would be nicer if you put comparison in () up there
>
Ok, I don't mind.
================================================================================
*/
- bool transform_root = saved_old_offset == 0;
+ bool transform_root = (saved_old_offset == 0);
struct xrow_update_field *next_hop;
================================================================================
>> + struct xrow_update_field *next_hop;
>> + if (! transform_root) {
>
> No spaces after unaries, use sed -ie "s/(! /\(\!/"
>
True.
================================================================================
diff --git a/src/box/xrow_update_route.c b/src/box/xrow_update_route.c
index d03028c7b..7e07e695b 100644
--- a/src/box/xrow_update_route.c
+++ b/src/box/xrow_update_route.c
@@ -210,7 +210,7 @@ xrow_update_route_branch(struct xrow_update_field *field,
*/
bool transform_root = (saved_old_offset == 0);
struct xrow_update_field *next_hop;
- if (! transform_root) {
+ if (!transform_root) {
next_hop = (struct xrow_update_field *)
region_alloc(&fiber()->gc, sizeof(*next_hop));
if (next_hop == NULL) {
@@ -266,7 +266,7 @@ xrow_update_route_branch(struct xrow_update_field *field,
return NULL;
}
- if (! transform_root) {
+ if (!transform_root) {
field->type = XUPDATE_ROUTE;
field->route.path = old_path;
field->route.path_len = saved_old_offset;
@@ -286,7 +286,7 @@ static struct xrow_update_field *
xrow_update_route_next(struct xrow_update_field *field, struct xrow_update_op *op)
{
assert(field->type == XUPDATE_ROUTE);
- assert(! xrow_update_op_is_term(op));
+ assert(!xrow_update_op_is_term(op));
const char *new_path = op->lexer.src + op->lexer.offset;
int new_path_len = op->lexer.src_len - op->lexer.offset;
if (field->route.path_len <= new_path_len &&
================================================================================
More information about the Tarantool-patches
mailing list