From: Vladislav Shpilevoy
Message-ID: <344b1cd0-2b08-f297-5774-ceca3ba3c985@tarantool.org>
Date: Fri, 27 Dec 2019 16:00:10 +0300
References: <679a1d9daedb82276388320f4078be590c1979df.1577140688.git.v.shpilevoy@tarantool.org>
 <20191226120751.GA603@tarantool.org>
In-Reply-To: <20191226120751.GA603@tarantool.org>
Subject: Re: [Tarantool-patches] [PATCH 1/3] tuple: make update operation tokens consumable
To: Sergey Ostanevich
Cc: tarantool-patches@dev.tarantool.org

Hi! Thanks for the review!

On 26/12/2019 13:07, Sergey Ostanevich wrote:
> Hi!
>
> Thanks for the patch!
> Just one comment below, otherwise LGTM.
>
> Sergos
>
> On 23 Dec 23:41, Vladislav Shpilevoy wrote:
>> There is a case: [1][2][3][4] = 100. It is not a problem when it
>> is a single operation, not intersecting with anything. It is an
>> isolated update then, and works ok. But the next patch allows
>> several update operations to have the same prefix, and the path
>> [1][2][3][4] can become a tree of updated arrays. For example, a
>> trivial tree like this:
>>
>> root: [ [1] ]
>>          |
>>       [ [1] [2] ]
>>          |
>>       [ [1] [2] [3] ]
>>          |
>>       [ [1] [2] [3] [4] ]
>>              =100
>>
>> When the update is applied to root, the JSON path [1][2][3][4]
>> is decoded one part at a time. And the operation goes down the
>> tree until it reaches the leaf, where [4] = 100 is applied.
>> Each time the update goes one level down, somebody should update
>> xrow_update_op.field_no so that on the first level it is 1,
>> then 2, 3, 4.
>>
>> Does it mean that each level of the update [1][2][3][4] should
>> prepare field_no for the next child? No, because then they would
>> need to check the type of the child to see if it is an array or
>> map, or whatever expects a valid field_no/key in xrow_update_op,
>> and ensure that a map-child gets a key, and an array-child gets
>> a field_no. That would complicate the code to a totally
>> unreadable state, and would break encapsulation between
>> xrow_update_array/map/bar... . Each array update operation would
>> check a child for all existing types to ensure that the next
>> token matches it. The same would happen to map updates.
>>
>> This patch goes another way - let each level of update check if
>> its field_no/key is already prepared by the caller. And if not,
>> extract the next token from the operation path. So the map
>> update will ensure that it has a valid key, and an array update
>> will ensure that it has a valid field no.
>>
>> Part of #1261
>> ---
>>  src/box/xrow_update_array.c | 47 +++++++++++++++++++++++++++++++++++--
>>  src/box/xrow_update_field.c | 22 +++++++++++++++++
>>  src/box/xrow_update_field.h | 25 ++++++++++++++++++--
>>  3 files changed, 90 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/box/xrow_update_array.c b/src/box/xrow_update_array.c
>> index 57427e39c..6a7ce09ff 100644
>> --- a/src/box/xrow_update_array.c
>> +++ b/src/box/xrow_update_array.c
>> @@ -32,6 +32,31 @@
>>  #include "msgpuck.h"
>>  #include "fiber.h"
>>
>> +/**
>> + * Make sure @a op contains a valid field number to where the
>> + * operation should be applied next. Field number may be not
>> + * known, if the array's parent didn't propagate operation's
>> + * lexer. In fact, the parent fills fieldno only in some rare
>> + * cases like branching. Generally, an array should care about
>> + * fieldno by itself.
>> + */
>> +static inline int
>> +xrow_update_op_prepare_num_token(struct xrow_update_op *op)
>> +{
>> +        /*
>> +         * Token type END is a special value meaning that the
>> +         * current token needs to be parsed.
>> +         */
>
> I really don't like reuse of entities introduced for some purpose. If
> one reads this code, it will be hard to understand why, out of
> nowhere, the lexer is forced to come to an end at
> xrow_update_op_do_array_insert(), for example. More than that - if the
> lexer is in the 'reached the end' state, it is forced to parse.
>
> If we introduce some more functionality for JSON later - I expect we
> will face even more logic hung on this 'special value'. Then the
> contexts of these functionalities may intersect... oh my...
>
> Can we introduce a new type with some self-explanatory name instead and
> leave JSON_TOKEN_END on its own?

Agree. But I don't know how to justify the introduction of a new token
type in the json library. I was thinking about something like
JSON_TOKEN_INVALID, but 1) the json library reports errors via return
values, not via token types, 2) an 'invalid' token implies an error, not
an already parsed token, like the one I need in my patch. A value like
JSON_TOKEN_CONSUMED/USED looks too specific for my case to be added to
the core lib.

So I added a flag to xrow_update_op. See the diff below, and the whole
patch at the end.

================================================================================
diff --git a/src/box/xrow_update_array.c b/src/box/xrow_update_array.c
index 2eb98e8e9..e6b9ed89f 100644
--- a/src/box/xrow_update_array.c
+++ b/src/box/xrow_update_array.c
@@ -43,13 +43,8 @@
 static inline int
 xrow_update_op_prepare_num_token(struct xrow_update_op *op)
 {
-        /*
-         * Token type END is a special value meaning that the
-         * current token needs to be parsed.
-         */
-        if (op->token_type == JSON_TOKEN_END &&
-            xrow_update_op_consume_token(op) != 0)
-                return -1;
+        if (op->is_token_consumed && xrow_update_op_next_token(op) != 0)
+                return -1;
         if (op->token_type != JSON_TOKEN_NUM) {
                 return xrow_update_err(op, "can't update an array by a "\
                                            "non-numeric index");
@@ -64,7 +59,7 @@ xrow_update_op_prepare_num_token(struct xrow_update_op *op)
 static inline int
 xrow_update_op_adjust_field_no(struct xrow_update_op *op, int32_t field_count)
 {
-        assert(op->token_type == JSON_TOKEN_NUM);
+        assert(op->token_type == JSON_TOKEN_NUM && !op->is_token_consumed);
         if (op->field_no >= 0) {
                 if (op->field_no < field_count)
                         return 0;
@@ -254,7 +249,7 @@ xrow_update_op_do_array_insert(struct xrow_update_op *op,
         item = xrow_update_array_extract_item(field, op);
         if (item == NULL)
                 return -1;
-        op->token_type = JSON_TOKEN_END;
+        op->is_token_consumed = true;
         return xrow_update_op_do_field_insert(op, &item->field);
 }
 
@@ -290,7 +285,7 @@ xrow_update_op_do_array_set(struct xrow_update_op *op,
         if (item == NULL)
                 return -1;
         if (!xrow_update_op_is_term(op)) {
-                op->token_type = JSON_TOKEN_END;
+                op->is_token_consumed = true;
                 return xrow_update_op_do_field_set(op, &item->field);
         }
         op->new_field_len = op->arg.set.length;
@@ -318,7 +313,7 @@ xrow_update_op_do_array_delete(struct xrow_update_op *op,
                 xrow_update_array_extract_item(field, op);
         if (item == NULL)
                 return -1;
-        op->token_type = JSON_TOKEN_END;
+        op->is_token_consumed = true;
         return xrow_update_op_do_field_delete(op, &item->field);
 }
 
@@ -347,7 +342,7 @@ xrow_update_op_do_array_##op_type(struct xrow_update_op *op,        \
         if (item == NULL)                                               \
                 return -1;                                              \
         if (!xrow_update_op_is_term(op)) {                              \
-                op->token_type = JSON_TOKEN_END;                        \
+                op->is_token_consumed = true;                           \
                 return xrow_update_op_do_field_##op_type(op, &item->field); \
         }                                                               \
         if (item->field.type != XUPDATE_NOP)                            \
diff --git a/src/box/xrow_update_field.c b/src/box/xrow_update_field.c
index 96fcaf747..460672b4c 100644
--- a/src/box/xrow_update_field.c
+++ b/src/box/xrow_update_field.c
@@ -612,7 +612,7 @@ xrow_update_op_by(char opcode)
 }
 
 int
-xrow_update_op_consume_token(struct xrow_update_op *op)
+xrow_update_op_next_token(struct xrow_update_op *op)
 {
         struct json_token token;
         int rc = json_lexer_next_token(&op->lexer, &token);
@@ -620,6 +620,7 @@ xrow_update_op_consume_token(struct xrow_update_op *op)
                 return xrow_update_err_bad_json(op, rc);
         if (token.type == JSON_TOKEN_END)
                 return xrow_update_err_no_such_field(op);
+        op->is_token_consumed = false;
         op->token_type = token.type;
         op->key = token.str;
         op->key_len = token.len;
@@ -661,6 +662,7 @@ xrow_update_op_decode(struct xrow_update_op *op, int index_base,
          * dictionary.
          */
         op->token_type = JSON_TOKEN_NUM;
+        op->is_token_consumed = false;
         int32_t field_no = 0;
         switch(mp_typeof(**expr)) {
         case MP_INT:
diff --git a/src/box/xrow_update_field.h b/src/box/xrow_update_field.h
index fbaf45c5d..0b0f608fe 100644
--- a/src/box/xrow_update_field.h
+++ b/src/box/xrow_update_field.h
@@ -179,11 +179,16 @@ struct xrow_update_op {
         const struct xrow_update_op_meta *meta;
         /** Operation arguments. */
         union xrow_update_arg arg;
+        /** Current level token. */
+        enum json_token_type token_type;
         /**
-         * Current level token. END means that it is invalid and
-         * a next token should be extracted from the lexer.
+         * The flag says whether the token is already consumed by
+         * the update operation during its forwarding down the
+         * update tree. When the flag is true, it means that the
+         * next node of the update tree will need to fetch a next
+         * token from the lexer.
          */
-        enum json_token_type token_type;
+        bool is_token_consumed;
         union {
                 struct {
                         const char *key;
@@ -212,7 +217,7 @@ struct xrow_update_op {
  * map/array.
  */
 int
-xrow_update_op_consume_token(struct xrow_update_op *op);
+xrow_update_op_next_token(struct xrow_update_op *op);
 
 /**
  * Decode an update operation from MessagePack.
================================================================================

The whole patch:

================================================================================
tuple: make update operation tokens consumable

There is a case: [1][2][3][4] = 100. It is not a problem when it
is a single operation, not intersecting with anything. It is an
isolated update then, and works ok. But the next patch allows
several update operations to have the same prefix, and the path
[1][2][3][4] can become a tree of updated arrays. For example, a
trivial tree like this:

root: [ [1] ]
         |
      [ [1] [2] ]
         |
      [ [1] [2] [3] ]
         |
      [ [1] [2] [3] [4] ]
             =100

When the update is applied to root, the JSON path [1][2][3][4]
is decoded one part at a time. And the operation goes down the
tree until it reaches the leaf, where [4] = 100 is applied.
Each time the update goes one level down, somebody should update
xrow_update_op.field_no so that on the first level it is 1, then
2, 3, 4.

Does it mean that each level of the update [1][2][3][4] should
prepare field_no for the next child? No, because then they would
need to check the type of the child to see if it is an array or
map, or whatever expects a valid field_no/key in xrow_update_op,
and ensure that a map-child gets a key, and an array-child gets a
field_no. That would complicate the code to a totally unreadable
state, and would break encapsulation between
xrow_update_array/map/bar... . Each array update operation would
check a child for all existing types to ensure that the next
token matches it. The same would happen to map updates.

This patch goes another way - let each level of update check if
its field_no/key is already prepared by the caller. And if not,
extract the next token from the operation path. So the map update
will ensure that it has a valid key, and an array update will
ensure that it has a valid field no.
Part of #1261

diff --git a/src/box/xrow_update_array.c b/src/box/xrow_update_array.c
index 57427e39c..e6b9ed89f 100644
--- a/src/box/xrow_update_array.c
+++ b/src/box/xrow_update_array.c
@@ -32,6 +32,26 @@
 #include "msgpuck.h"
 #include "fiber.h"
 
+/**
+ * Make sure @a op contains a valid field number to where the
+ * operation should be applied next. Field number may be not
+ * known, if the array's parent didn't propagate operation's
+ * lexer. In fact, the parent fills fieldno only in some rare
+ * cases like branching. Generally, an array should care about
+ * fieldno by itself.
+ */
+static inline int
+xrow_update_op_prepare_num_token(struct xrow_update_op *op)
+{
+        if (op->is_token_consumed && xrow_update_op_next_token(op) != 0)
+                return -1;
+        if (op->token_type != JSON_TOKEN_NUM) {
+                return xrow_update_err(op, "can't update an array by a "\
+                                           "non-numeric index");
+        }
+        return 0;
+}
+
 /**
  * Make field index non-negative and check for the field
  * existence.
@@ -39,6 +59,7 @@
 static inline int
 xrow_update_op_adjust_field_no(struct xrow_update_op *op, int32_t field_count)
 {
+        assert(op->token_type == JSON_TOKEN_NUM && !op->is_token_consumed);
         if (op->field_no >= 0) {
                 if (op->field_no < field_count)
                         return 0;
@@ -221,10 +242,14 @@ xrow_update_op_do_array_insert(struct xrow_update_op *op,
 {
         assert(field->type == XUPDATE_ARRAY);
         struct xrow_update_array_item *item;
+        if (xrow_update_op_prepare_num_token(op) != 0)
+                return -1;
+
         if (!xrow_update_op_is_term(op)) {
                 item = xrow_update_array_extract_item(field, op);
                 if (item == NULL)
                         return -1;
+                op->is_token_consumed = true;
                 return xrow_update_op_do_field_insert(op, &item->field);
         }
 
@@ -248,6 +273,9 @@ xrow_update_op_do_array_set(struct xrow_update_op *op,
 {
         assert(field->type == XUPDATE_ARRAY);
         struct xrow_update_rope *rope = field->array.rope;
+        if (xrow_update_op_prepare_num_token(op) != 0)
+                return -1;
+
         /* Interpret '=' for n + 1 field as insert. */
         if (op->field_no == (int32_t) xrow_update_rope_size(rope))
                 return xrow_update_op_do_array_insert(op, field);
@@ -256,8 +284,10 @@ xrow_update_op_do_array_set(struct xrow_update_op *op,
                 xrow_update_array_extract_item(field, op);
         if (item == NULL)
                 return -1;
-        if (!xrow_update_op_is_term(op))
+        if (!xrow_update_op_is_term(op)) {
+                op->is_token_consumed = true;
                 return xrow_update_op_do_field_set(op, &item->field);
+        }
         op->new_field_len = op->arg.set.length;
         /*
          * Ignore the previous op, if any. It is not correct,
@@ -275,11 +305,15 @@ xrow_update_op_do_array_delete(struct xrow_update_op *op,
                                struct xrow_update_field *field)
 {
         assert(field->type == XUPDATE_ARRAY);
+        if (xrow_update_op_prepare_num_token(op) != 0)
+                return -1;
+
         if (!xrow_update_op_is_term(op)) {
                 struct xrow_update_array_item *item =
                         xrow_update_array_extract_item(field, op);
                 if (item == NULL)
                         return -1;
+                op->is_token_consumed = true;
                 return xrow_update_op_do_field_delete(op, &item->field);
         }
 
@@ -301,12 +335,16 @@ int                                                     \
 xrow_update_op_do_array_##op_type(struct xrow_update_op *op,            \
                                   struct xrow_update_field *field)     \
 {                                                                       \
+        if (xrow_update_op_prepare_num_token(op) != 0)                  \
+                return -1;                                              \
         struct xrow_update_array_item *item =                           \
                 xrow_update_array_extract_item(field, op);              \
         if (item == NULL)                                               \
                 return -1;                                              \
-        if (!xrow_update_op_is_term(op))                                \
+        if (!xrow_update_op_is_term(op)) {                              \
+                op->is_token_consumed = true;                           \
                 return xrow_update_op_do_field_##op_type(op, &item->field); \
+        }                                                               \
         if (item->field.type != XUPDATE_NOP)                            \
                 return xrow_update_err_double(op);                      \
         if (xrow_update_op_do_##op_type(op, item->field.data) != 0)     \
diff --git a/src/box/xrow_update_field.c b/src/box/xrow_update_field.c
index de865a21d..460672b4c 100644
--- a/src/box/xrow_update_field.c
+++ b/src/box/xrow_update_field.c
@@ -611,6 +611,23 @@ xrow_update_op_by(char opcode)
         }
 }
 
+int
+xrow_update_op_next_token(struct xrow_update_op *op)
+{
+        struct json_token token;
+        int rc = json_lexer_next_token(&op->lexer, &token);
+        if (rc != 0)
+                return xrow_update_err_bad_json(op, rc);
+        if (token.type == JSON_TOKEN_END)
+                return xrow_update_err_no_such_field(op);
+        op->is_token_consumed = false;
+        op->token_type = token.type;
+        op->key = token.str;
+        op->key_len = token.len;
+        op->field_no = token.num;
+        return 0;
+}
+
 int
 xrow_update_op_decode(struct xrow_update_op *op, int index_base,
                       struct tuple_dictionary *dict, const char **expr)
@@ -639,6 +656,13 @@ xrow_update_op_decode(struct xrow_update_op *op, int index_base,
                 diag_set(ClientError, ER_UNKNOWN_UPDATE_OP);
                 return -1;
         }
+        /*
+         * First token is always num. Even if a user specified a
+         * field name it is converted to num by the tuple
+         * dictionary.
+         */
+        op->token_type = JSON_TOKEN_NUM;
+        op->is_token_consumed = false;
         int32_t field_no = 0;
         switch(mp_typeof(**expr)) {
         case MP_INT:
diff --git a/src/box/xrow_update_field.h b/src/box/xrow_update_field.h
index bda9222cc..0b0f608fe 100644
--- a/src/box/xrow_update_field.h
+++ b/src/box/xrow_update_field.h
@@ -179,8 +179,23 @@ struct xrow_update_op {
         const struct xrow_update_op_meta *meta;
         /** Operation arguments. */
         union xrow_update_arg arg;
-        /** First level field no. */
-        int32_t field_no;
+        /** Current level token. */
+        enum json_token_type token_type;
+        /**
+         * The flag says whether the token is already consumed by
+         * the update operation during its forwarding down the
+         * update tree. When the flag is true, it means that the
+         * next node of the update tree will need to fetch a next
+         * token from the lexer.
+         */
+        bool is_token_consumed;
+        union {
+                struct {
+                        const char *key;
+                        uint32_t key_len;
+                };
+                int32_t field_no;
+        };
         /** Size of a new field after it is updated. */
         uint32_t new_field_len;
         /** Opcode symbol: = + - / ... */
@@ -193,6 +208,17 @@ struct xrow_update_op {
         struct json_lexer lexer;
 };
 
+/**
+ * Extract a next token from the operation path lexer. The result
+ * is used to decide to which child of a current map/array the
+ * operation should be forwarded. It is not just a synonym to
+ * json_lexer_next_token, because fills some fields of @a op,
+ * and should be used only to chose a next child inside a current
+ * map/array.
+ */
+int
+xrow_update_op_next_token(struct xrow_update_op *op);
+
 /**
  * Decode an update operation from MessagePack.
  * @param[out] op Update operation.
================================================================================