From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp45.i.mail.ru (smtp45.i.mail.ru [94.100.177.105]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id 6359E4696C5 for ; Fri, 27 Dec 2019 18:52:06 +0300 (MSK) References: <20191227141325.GA1222@tarantool.org> From: Vladislav Shpilevoy Message-ID: <2a9a9d7b-4e6a-9236-a78e-3be6dcd19a35@tarantool.org> Date: Fri, 27 Dec 2019 18:52:05 +0300 MIME-Version: 1.0 In-Reply-To: <20191227141325.GA1222@tarantool.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [Tarantool-patches] [PATCH 2/3] tuple: JSON path update intersection at arrays List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Sergey Ostanevich Cc: tarantool-patches@dev.tarantool.org Thanks for the review! >> diff --git a/src/box/xrow_update_field.h b/src/box/xrow_update_field.h >> index fbaf45c5d..f07b5a649 100644 >> --- a/src/box/xrow_update_field.h >> +++ b/src/box/xrow_update_field.h >> @@ -447,6 +475,33 @@ xrow_update_array_create(struct xrow_update_field *field, const char *header, >> const char *data, const char *data_end, >> uint32_t field_count); >> >> +/** >> + * The same as mere create, but the array is created with a first >> + * child right away. It allows to make it more efficient, because >> + * under the hood a rope is created not as a one huge range which >> + * is then spit in parts, but as two rope nodes from the > ^^^^ > Should be split here? > Yes, it is a typo. ================================================================================ * under the hood a rope is created not as a one huge range which - * is then spit in parts, but as two rope nodes from the + * is then split in parts, but as two rope nodes from the * beginning. On the summary: -1 rope node split, -1 decoding of ================================================================================ >> + * beginning. On the summary: -1 rope node split, -1 decoding of >> + * fields from 0 to @a field_no. >> + * >> + * The function is used during branching, where there was an >> + * existing update, but another one came with the same prefix, and >> + * a different suffix. >> + * >> + * @param[out] field Field to initialize. >> + * @param header Header of the MessagePack array. >> + * @param child A child subtree. The child is copied by value into >> + * the created array. >> + * @param field_no Field number of @a child. >> + * >> + * @retval 0 Success. >> + * @retval -1 Error. >> + */ >> +int >> +xrow_update_array_create_with_child(struct xrow_update_field *field, >> + const char *header, >> + const struct xrow_update_field *child, >> + int32_t field_no); >> + >> diff --git a/src/box/xrow_update_route.c b/src/box/xrow_update_route.c >> new file mode 100644 >> index 000000000..f374db837 >> --- /dev/null >> +++ b/src/box/xrow_update_route.c >> @@ -0,0 +1,346 @@ >> +/** >> + * Do the actual branch. This is the case when an existing >> + * bar/route path is different from a new operation's path in an >> + * array. The existing object needs to be split into parent-child, >> + * and the new operation becomes a second child. >> + * >> + * @param[out] next_hop A field which will be initialized as an >> + * array, and which will be a point to apply a new >> + * operation. >> + * @param parent MessagePack array to be taken by @a next_hop. >> + * @param child Current field from which the branch happens. It >> + * already contains an update subtree. >> + */ >> +static int >> +xrow_update_route_branch_array(struct xrow_update_field *next_hop, >> + const char *parent, >> + const struct xrow_update_field *child, >> + int32_t field_no) >> +{ >> + struct xrow_update_op *op = child->bar.op; >> + /* >> + * There are rules when a subtree can be just copied as is > > I would name these as limitations, not rules. > Ok, fixed: ================================================================================ struct xrow_update_op *op = child->bar.op; /* - * There are rules when a subtree can be just copied as is - * from one parent to another. + * There are limitations when a subtree can be just copied + * as is from one parent to another. * 1) It should not be a bar update. Because if it is not ================================================================================ >> + * from one parent to another. >> + * 1) It should not be a bar update. Because if it is not >> + * a bar, then it is either scalar or an array/map. >> + * Scalar update can be safely moved. Array/map update >> + * doesn't change their parent, and also can be moved. >> + * Otherwise see (2). >> + * 2) It is a bar. Then it should not be a leaf. If it is >> + * not a leaf, then it does not change header and other >> + * fields of this particular array, and can be safely >> + * moved to somewhere else. >> + * Otherwise see (1). > ^^^ > (3) I suppose? Yup. ================================================================================ * moved to somewhere else. - * Otherwise see (1). + * Otherwise see (3). * 3) Ok, it is a bar, a leaf. Then it is a bar with zero ================================================================================ >> + * 3) Ok, it is a bar, a leaf. Then it is a bar with zero >> + * path length. It could degrade to zero path len in >> + * during branching. In this case it should be a scalar >> + * bar. The only non-scalar operations are '!' and '#'. >> + * >> + * Why '#' and '!' can't be moved? '!', for example, being >> + * applied to a field [1], affects all fields [2-*], and >> + * the array header. The same but in an even worse form >> + * about '#'. Such operations should be redone. They >> + * affect many fields and the parent. >> + * >> + * There is a tricky thing though - why not to just redo >> + * all operations here, for the sake of code simplicity? >> + * It would allow to remove 'create_with_child' crutch. >> + * The answer is - it is not possible. If a field is >> + * copyable, it is not re-applicable. And vice-versa. For >> + * example, if it is not a leaf, then there may be many >> + * operations, not one. A subtree just can't be >> + * 're-applied'. >> + * >> + * If the operation is scalar and a leaf, then its result >> + * has already overridden its arguments. This is because >> + * scalar operations save result into the arguments, to >> + * save memory. A second operation appliance would lead >> + * to very surprising results. >> + * >> + * Another reason - performance. This path should be >> + * quite hot, and to copy a struct is for sure much faster >> + * than to reapply an operation using a virtual function. >> + * Operations '!' and '#' are quite rare, so their >> + * optimization is not a critical goal. >> + */ >> + if (/* (1) Not a bar. */ >> + child->type != XUPDATE_BAR || >> + /* (2) Bar, but not a leaf. */ >> + child->bar.path_len > 0 || >> + /* (3) Leaf, bar, but a scalar operation. */ >> + (op->opcode != '!' && op->opcode != '#')) { >> + return xrow_update_array_create_with_child(next_hop, parent, >> + child, field_no); >> + } >> + /* >> + * Can't move the child. The only way to branch is to >> + * reapply the child's operation. >> + */ >> + op->token_type = JSON_TOKEN_NUM; >> + op->field_no = field_no; >> + const char *data = parent; >> + uint32_t field_count = mp_decode_array(&data); >> + const char *end = data; >> + for (uint32_t i = 0; i < field_count; ++i) >> + mp_next(&end); >> + if (xrow_update_array_create(next_hop, parent, data, end, >> + field_count) != 0) >> + return -1; >> + return op->meta->do_op(op, next_hop); >> +} >> + >> +struct xrow_update_field * >> +xrow_update_route_branch(struct xrow_update_field *field, >> + struct xrow_update_op *new_op) >> +{ >> + assert(new_op->lexer.src != NULL); >> + const char *old_path; >> + int old_path_len; >> + if (field->type == XUPDATE_BAR) { >> + old_path = field->bar.path; >> + old_path_len = field->bar.path_len; >> + } else { >> + assert(field->type == XUPDATE_ROUTE); >> + old_path = field->route.path; >> + old_path_len = field->route.path_len; >> + } >> + assert(old_path != NULL); >> + struct json_lexer old_path_lexer; >> + struct json_token old_token, new_token; >> + /* >> + * Offset is going to be used as a length of the route >> + * node created as a parent of the old subtree and the >> + * new operation. As it is described in the route >> + * definition in struct xrow_update_field - route is the >> + * common prefix of all operations of the subtree. Here >> + * length of that prefix is calculated. >> + * In addition, it is used here to determine if the new >> + * operation is different from the current subtree from >> + * the very beginning. Basically, it means the offset is >> + * 0, and no route is generated. Root becomes a regular >> + * update field (array, map), not a route. Example: >> + * @a field is a bar '[1].a.b = 20', @a new_op is >> + * '[2].c.d = 30'. In this case the paths are already >> + * different from the beginning, no common prefix = no >> + * route. Array with children [1].a.b and [2].c.d becomes >> + * a root. >> + */ >> + int saved_old_offset; >> + json_lexer_create(&old_path_lexer, old_path, old_path_len, >> + TUPLE_INDEX_BASE); >> + const char *parent = field->data; >> + do { >> + saved_old_offset = old_path_lexer.offset; >> + int rc = json_lexer_next_token(&old_path_lexer, &old_token); >> + /* Old path is already validated. */ >> + assert(rc == 0); >> + rc = json_lexer_next_token(&new_op->lexer, &new_token); >> + if (rc != 0) { >> + xrow_update_err_bad_json(new_op, rc); >> + return NULL; >> + } >> + if (json_token_cmp(&old_token, &new_token) != 0) >> + break; >> + switch(new_token.type) { >> + case JSON_TOKEN_NUM: >> + rc = tuple_field_go_to_index(&parent, new_token.num); >> + break; >> + case JSON_TOKEN_STR: >> + rc = tuple_field_go_to_key(&parent, new_token.str, >> + new_token.len); >> + break; >> + default: >> + /* >> + * Can't be JSON_TOKEN_ANY, because old >> + * and new tokens are equal, but '*' is >> + * considered invalid and the old was >> + * already checked for that. So the new is >> + * valid too. And can't have type ANY. >> + */ >> + assert(new_token.type == JSON_TOKEN_END); >> + xrow_update_err_double(new_op); >> + return NULL; >> + } >> + /* >> + * Must always find a field, because the old >> + * token already went that path. >> + */ >> + assert(rc == 0); >> + } while (true); >> + enum mp_type type = mp_typeof(*parent); >> + /* >> + * Ched if the paths are different from the very > ^^^^ > check(ed)? Sure, typo. ================================================================================ /* - * Ched if the paths are different from the very + * Check if the paths are different from the very * beginning. It means, that the old field should be ================================================================================ >> + * beginning. It means, that the old field should be >> + * transformed instead of creating a new route node. >> + */ >> + bool transform_root = saved_old_offset == 0; > > It would be nicer if you put comparison in () up there > Ok, I don't mind. ================================================================================ */ - bool transform_root = saved_old_offset == 0; + bool transform_root = (saved_old_offset == 0); struct xrow_update_field *next_hop; ================================================================================ >> + struct xrow_update_field *next_hop; >> + if (! transform_root) { > > No spaces after unaries, use sed -ie "s/(! /\(\!/" > True. ================================================================================ diff --git a/src/box/xrow_update_route.c b/src/box/xrow_update_route.c index d03028c7b..7e07e695b 100644 --- a/src/box/xrow_update_route.c +++ b/src/box/xrow_update_route.c @@ -210,7 +210,7 @@ xrow_update_route_branch(struct xrow_update_field *field, */ bool transform_root = (saved_old_offset == 0); struct xrow_update_field *next_hop; - if (! transform_root) { + if (!transform_root) { next_hop = (struct xrow_update_field *) region_alloc(&fiber()->gc, sizeof(*next_hop)); if (next_hop == NULL) { @@ -266,7 +266,7 @@ xrow_update_route_branch(struct xrow_update_field *field, return NULL; } - if (! transform_root) { + if (!transform_root) { field->type = XUPDATE_ROUTE; field->route.path = old_path; field->route.path_len = saved_old_offset; @@ -286,7 +286,7 @@ static struct xrow_update_field * xrow_update_route_next(struct xrow_update_field *field, struct xrow_update_op *op) { assert(field->type == XUPDATE_ROUTE); - assert(! xrow_update_op_is_term(op)); + assert(!xrow_update_op_is_term(op)); const char *new_path = op->lexer.src + op->lexer.offset; int new_path_len = op->lexer.src_len - op->lexer.offset; if (field->route.path_len <= new_path_len && ================================================================================