From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp1.mail.ru (smtp1.mail.ru [94.100.179.111]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dev.tarantool.org (Postfix) with ESMTPS id D888C46970E for ; Fri, 27 Dec 2019 17:59:28 +0300 (MSK) Date: Fri, 27 Dec 2019 17:59:28 +0300 From: Sergey Ostanevich Message-ID: <20191227145928.GC1222@tarantool.org> References: <679a1d9daedb82276388320f4078be590c1979df.1577140688.git.v.shpilevoy@tarantool.org> <20191226120751.GA603@tarantool.org> <344b1cd0-2b08-f297-5774-ceca3ba3c985@tarantool.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <344b1cd0-2b08-f297-5774-ceca3ba3c985@tarantool.org> Subject: Re: [Tarantool-patches] [PATCH 1/3] tuple: make update operation tokens consumable List-Id: Tarantool development patches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladislav Shpilevoy Cc: tarantool-patches@dev.tarantool.org Hi! > So I added a flag to xrow_update_op. Thanks, this is something that I had in my mind. LGTM. Sergos. On 27 Dec 16:00, Vladislav Shpilevoy wrote: > Hi! Thanks for the review! > > On 26/12/2019 13:07, Sergey Ostanevich wrote: > > Hi! > > > > Thanks for the patch! > > Just one comment below, otherwise LGTM. > > > > Sergos > > > > On 23 Dec 23:41, Vladislav Shpilevoy wrote: > >> There is a case: [1][2][3][4] = 100. It is not a problem when it > >> is a single operation, not intersecting with anything. It is an > >> isolated update then, and works ok. But the next patch allows > >> several update operations have the same prefix, and the path > >> [1][2][3][4] can become a tree of updated arrays. For example, a > >> trivial tree like this: > >> > >> root: [ [1] ] > >> | > >> [ [1] [2] ] > >> | > >> [ [1] [2] [3] ] > >> | > >> [ [1] [2] [3] [4] ] > >> =100 > >> > >> When the update is applied to root, the JSON path [1][2][3][4] > >> is decoded one part by one. And the operation goes down the tree > >> until reaches the leaf, where [4] = 100 is applied. Each time when > >> the update goes one level down, somebody should update > >> xrow_update_op.field_no so as on the first level it would be 1, > >> then 2, 3, 4. > >> > >> Does it mean that each level of the update [1][2][3][4] should > >> prepare field_no for the next child? No, because then they would > >> need to check type of the child if it is an array or map, or > >> whatever expects a valid field_no/key in xrow_update_op, and > >> ensure that map-child gets a key, array-child gets a field_no. > >> That would complicate the code to a totally unreadable > >> state, and would break encapsulation between > >> xrow_update_array/map/bar... . Each array update operation would > >> check a child for all existing types to ensure that the next token > >> matches it. The same would happen to map updates. > >> > >> This patch goes another way - let each level of update check if > >> its field_no/key is already prepared by the caller. And if not, > >> extract a next token from the operation path. So the map update > >> will ensure that it has a valid key, an array update will ensure > >> that it has a valid field no. > >> > >> Part of #1261 > >> --- > >> src/box/xrow_update_array.c | 47 +++++++++++++++++++++++++++++++++++-- > >> src/box/xrow_update_field.c | 22 +++++++++++++++++ > >> src/box/xrow_update_field.h | 25 ++++++++++++++++++-- > >> 3 files changed, 90 insertions(+), 4 deletions(-) > >> > >> diff --git a/src/box/xrow_update_array.c b/src/box/xrow_update_array.c > >> index 57427e39c..6a7ce09ff 100644 > >> --- a/src/box/xrow_update_array.c > >> +++ b/src/box/xrow_update_array.c > >> @@ -32,6 +32,31 @@ > >> #include "msgpuck.h" > >> #include "fiber.h" > >> > >> +/** > >> + * Make sure @a op contains a valid field number to where the > >> + * operation should be applied next. Field number may be not > >> + * known, if the array's parent didn't propagate operation's > >> + * lexer. In fact, the parent fills fieldno only in some rare > >> + * cases like branching. Generally, an array should care about > >> + * fieldno by itself. > >> + */ > >> +static inline int > >> +xrow_update_op_prepare_num_token(struct xrow_update_op *op) > >> +{ > >> + /* > >> + * Token type END is a special value meaning that the > >> + * current token needs to be parsed. > >> + */ > > > > I really don't like reuse of entities introduced for some purpose. If > > one will read this code, then it will be hard to understand why out of > > nowhere Lexer forced to come to an end at > > xrow_update_op_do_array_insert() for example. More that that - if lexer > > in a state of 'reached the end' it forced to parse. > > > > If we introduce some more functionality for JSON later - I expect we > > will face even more logic to hang on this 'special value'. Then > > contexts of these functionalities may intersect... oh my... > > > > Can we introduce a new type with some self-explanatory name instead and > > leave the JSON_TOKEN_END on its own? > > Agree. But I don't know how to justify introduction of a new token > type to the json library. I was thinking about something like > JSON_TOKEN_INVALID, but 1) json library reports an error via return > values, not via token types, 2) 'invalid' token assumes an error, > not an already parsed token, like I need in my patch. > > A value like JSON_TOKEN_CONSUMED/USED looks too specific for my case > to be added to the core lib. > > So I added a flag to xrow_update_op. See diff below, and the whole > patch in the end. > > ================================================================================ > > diff --git a/src/box/xrow_update_array.c b/src/box/xrow_update_array.c > index 2eb98e8e9..e6b9ed89f 100644 > --- a/src/box/xrow_update_array.c > +++ b/src/box/xrow_update_array.c > @@ -43,13 +43,8 @@ > static inline int > xrow_update_op_prepare_num_token(struct xrow_update_op *op) > { > - /* > - * Token type END is a special value meaning that the > - * current token needs to be parsed. > - */ > - if (op->token_type == JSON_TOKEN_END && > - xrow_update_op_consume_token(op) != 0) > - return -1; > + if (op->is_token_consumed && xrow_update_op_next_token(op) != 0) > + return -1; > if (op->token_type != JSON_TOKEN_NUM) { > return xrow_update_err(op, "can't update an array by a "\ > "non-numeric index"); > @@ -64,7 +59,7 @@ xrow_update_op_prepare_num_token(struct xrow_update_op *op) > static inline int > xrow_update_op_adjust_field_no(struct xrow_update_op *op, int32_t field_count) > { > - assert(op->token_type == JSON_TOKEN_NUM); > + assert(op->token_type == JSON_TOKEN_NUM && !op->is_token_consumed); > if (op->field_no >= 0) { > if (op->field_no < field_count) > return 0; > @@ -254,7 +249,7 @@ xrow_update_op_do_array_insert(struct xrow_update_op *op, > item = xrow_update_array_extract_item(field, op); > if (item == NULL) > return -1; > - op->token_type = JSON_TOKEN_END; > + op->is_token_consumed = true; > return xrow_update_op_do_field_insert(op, &item->field); > } > > @@ -290,7 +285,7 @@ xrow_update_op_do_array_set(struct xrow_update_op *op, > if (item == NULL) > return -1; > if (!xrow_update_op_is_term(op)) { > - op->token_type = JSON_TOKEN_END; > + op->is_token_consumed = true; > return xrow_update_op_do_field_set(op, &item->field); > } > op->new_field_len = op->arg.set.length; > @@ -318,7 +313,7 @@ xrow_update_op_do_array_delete(struct xrow_update_op *op, > xrow_update_array_extract_item(field, op); > if (item == NULL) > return -1; > - op->token_type = JSON_TOKEN_END; > + op->is_token_consumed = true; > return xrow_update_op_do_field_delete(op, &item->field); > } > > @@ -347,7 +342,7 @@ xrow_update_op_do_array_##op_type(struct xrow_update_op *op, \ > if (item == NULL) \ > return -1; \ > if (!xrow_update_op_is_term(op)) { \ > - op->token_type = JSON_TOKEN_END; \ > + op->is_token_consumed = true; \ > return xrow_update_op_do_field_##op_type(op, &item->field); \ > } \ > if (item->field.type != XUPDATE_NOP) \ > diff --git a/src/box/xrow_update_field.c b/src/box/xrow_update_field.c > index 96fcaf747..460672b4c 100644 > --- a/src/box/xrow_update_field.c > +++ b/src/box/xrow_update_field.c > @@ -612,7 +612,7 @@ xrow_update_op_by(char opcode) > } > > int > -xrow_update_op_consume_token(struct xrow_update_op *op) > +xrow_update_op_next_token(struct xrow_update_op *op) > { > struct json_token token; > int rc = json_lexer_next_token(&op->lexer, &token); > @@ -620,6 +620,7 @@ xrow_update_op_consume_token(struct xrow_update_op *op) > return xrow_update_err_bad_json(op, rc); > if (token.type == JSON_TOKEN_END) > return xrow_update_err_no_such_field(op); > + op->is_token_consumed = false; > op->token_type = token.type; > op->key = token.str; > op->key_len = token.len; > @@ -661,6 +662,7 @@ xrow_update_op_decode(struct xrow_update_op *op, int index_base, > * dictionary. > */ > op->token_type = JSON_TOKEN_NUM; > + op->is_token_consumed = false; > int32_t field_no = 0; > switch(mp_typeof(**expr)) { > case MP_INT: > diff --git a/src/box/xrow_update_field.h b/src/box/xrow_update_field.h > index fbaf45c5d..0b0f608fe 100644 > --- a/src/box/xrow_update_field.h > +++ b/src/box/xrow_update_field.h > @@ -179,11 +179,16 @@ struct xrow_update_op { > const struct xrow_update_op_meta *meta; > /** Operation arguments. */ > union xrow_update_arg arg; > + /** Current level token. */ > + enum json_token_type token_type; > /** > - * Current level token. END means that it is invalid and > - * a next token should be extracted from the lexer. > + * The flag says whether the token is already consumed by > + * the update operation during its forwarding down the > + * update tree. When the flag is true, it means that the > + * next node of the update tree will need to fetch a next > + * token from the lexer. > */ > - enum json_token_type token_type; > + bool is_token_consumed; > union { > struct { > const char *key; > @@ -212,7 +217,7 @@ struct xrow_update_op { > * map/array. > */ > int > -xrow_update_op_consume_token(struct xrow_update_op *op); > +xrow_update_op_next_token(struct xrow_update_op *op); > > /** > * Decode an update operation from MessagePack. > > ================================================================================ > > The whole patch: > > ================================================================================ > > tuple: make update operation tokens consumable > > There is a case: [1][2][3][4] = 100. It is not a problem when it > is a single operation, not intersecting with anything. It is an > isolated update then, and works ok. But the next patch allows > several update operations have the same prefix, and the path > [1][2][3][4] can become a tree of updated arrays. For example, a > trivial tree like this: > > root: [ [1] ] > | > [ [1] [2] ] > | > [ [1] [2] [3] ] > | > [ [1] [2] [3] [4] ] > =100 > > When the update is applied to root, the JSON path [1][2][3][4] > is decoded one part by one. And the operation goes down the tree > until reaches the leaf, where [4] = 100 is applied. Each time when > the update goes one level down, somebody should update > xrow_update_op.field_no so as on the first level it would be 1, > then 2, 3, 4. > > Does it mean that each level of the update [1][2][3][4] should > prepare field_no for the next child? No, because then they would > need to check type of the child if it is an array or map, or > whatever expects a valid field_no/key in xrow_update_op, and > ensure that map-child gets a key, array-child gets a field_no. > That would complicate the code to a totally unreadable > state, and would break encapsulation between > xrow_update_array/map/bar... . Each array update operation would > check a child for all existing types to ensure that the next token > matches it. The same would happen to map updates. > > This patch goes another way - let each level of update check if > its field_no/key is already prepared by the caller. And if not, > extract a next token from the operation path. So the map update > will ensure that it has a valid key, an array update will ensure > that it has a valid field no. > > Part of #1261 > > diff --git a/src/box/xrow_update_array.c b/src/box/xrow_update_array.c > index 57427e39c..e6b9ed89f 100644 > --- a/src/box/xrow_update_array.c > +++ b/src/box/xrow_update_array.c > @@ -32,6 +32,26 @@ > #include "msgpuck.h" > #include "fiber.h" > > +/** > + * Make sure @a op contains a valid field number to where the > + * operation should be applied next. Field number may be not > + * known, if the array's parent didn't propagate operation's > + * lexer. In fact, the parent fills fieldno only in some rare > + * cases like branching. Generally, an array should care about > + * fieldno by itself. > + */ > +static inline int > +xrow_update_op_prepare_num_token(struct xrow_update_op *op) > +{ > + if (op->is_token_consumed && xrow_update_op_next_token(op) != 0) > + return -1; > + if (op->token_type != JSON_TOKEN_NUM) { > + return xrow_update_err(op, "can't update an array by a "\ > + "non-numeric index"); > + } > + return 0; > +} > + > /** > * Make field index non-negative and check for the field > * existence. > @@ -39,6 +59,7 @@ > static inline int > xrow_update_op_adjust_field_no(struct xrow_update_op *op, int32_t field_count) > { > + assert(op->token_type == JSON_TOKEN_NUM && !op->is_token_consumed); > if (op->field_no >= 0) { > if (op->field_no < field_count) > return 0; > @@ -221,10 +242,14 @@ xrow_update_op_do_array_insert(struct xrow_update_op *op, > { > assert(field->type == XUPDATE_ARRAY); > struct xrow_update_array_item *item; > + if (xrow_update_op_prepare_num_token(op) != 0) > + return -1; > + > if (!xrow_update_op_is_term(op)) { > item = xrow_update_array_extract_item(field, op); > if (item == NULL) > return -1; > + op->is_token_consumed = true; > return xrow_update_op_do_field_insert(op, &item->field); > } > > @@ -248,6 +273,9 @@ xrow_update_op_do_array_set(struct xrow_update_op *op, > { > assert(field->type == XUPDATE_ARRAY); > struct xrow_update_rope *rope = field->array.rope; > + if (xrow_update_op_prepare_num_token(op) != 0) > + return -1; > + > /* Interpret '=' for n + 1 field as insert. */ > if (op->field_no == (int32_t) xrow_update_rope_size(rope)) > return xrow_update_op_do_array_insert(op, field); > @@ -256,8 +284,10 @@ xrow_update_op_do_array_set(struct xrow_update_op *op, > xrow_update_array_extract_item(field, op); > if (item == NULL) > return -1; > - if (!xrow_update_op_is_term(op)) > + if (!xrow_update_op_is_term(op)) { > + op->is_token_consumed = true; > return xrow_update_op_do_field_set(op, &item->field); > + } > op->new_field_len = op->arg.set.length; > /* > * Ignore the previous op, if any. It is not correct, > @@ -275,11 +305,15 @@ xrow_update_op_do_array_delete(struct xrow_update_op *op, > struct xrow_update_field *field) > { > assert(field->type == XUPDATE_ARRAY); > + if (xrow_update_op_prepare_num_token(op) != 0) > + return -1; > + > if (!xrow_update_op_is_term(op)) { > struct xrow_update_array_item *item = > xrow_update_array_extract_item(field, op); > if (item == NULL) > return -1; > + op->is_token_consumed = true; > return xrow_update_op_do_field_delete(op, &item->field); > } > > @@ -301,12 +335,16 @@ int \ > xrow_update_op_do_array_##op_type(struct xrow_update_op *op, \ > struct xrow_update_field *field) \ > { \ > + if (xrow_update_op_prepare_num_token(op) != 0) \ > + return -1; \ > struct xrow_update_array_item *item = \ > xrow_update_array_extract_item(field, op); \ > if (item == NULL) \ > return -1; \ > - if (!xrow_update_op_is_term(op)) \ > + if (!xrow_update_op_is_term(op)) { \ > + op->is_token_consumed = true; \ > return xrow_update_op_do_field_##op_type(op, &item->field); \ > + } \ > if (item->field.type != XUPDATE_NOP) \ > return xrow_update_err_double(op); \ > if (xrow_update_op_do_##op_type(op, item->field.data) != 0) \ > diff --git a/src/box/xrow_update_field.c b/src/box/xrow_update_field.c > index de865a21d..460672b4c 100644 > --- a/src/box/xrow_update_field.c > +++ b/src/box/xrow_update_field.c > @@ -611,6 +611,23 @@ xrow_update_op_by(char opcode) > } > } > > +int > +xrow_update_op_next_token(struct xrow_update_op *op) > +{ > + struct json_token token; > + int rc = json_lexer_next_token(&op->lexer, &token); > + if (rc != 0) > + return xrow_update_err_bad_json(op, rc); > + if (token.type == JSON_TOKEN_END) > + return xrow_update_err_no_such_field(op); > + op->is_token_consumed = false; > + op->token_type = token.type; > + op->key = token.str; > + op->key_len = token.len; > + op->field_no = token.num; > + return 0; > +} > + > int > xrow_update_op_decode(struct xrow_update_op *op, int index_base, > struct tuple_dictionary *dict, const char **expr) > @@ -639,6 +656,13 @@ xrow_update_op_decode(struct xrow_update_op *op, int index_base, > diag_set(ClientError, ER_UNKNOWN_UPDATE_OP); > return -1; > } > + /* > + * First token is always num. Even if a user specified a > + * field name it is converted to num by the tuple > + * dictionary. > + */ > + op->token_type = JSON_TOKEN_NUM; > + op->is_token_consumed = false; > int32_t field_no = 0; > switch(mp_typeof(**expr)) { > case MP_INT: > diff --git a/src/box/xrow_update_field.h b/src/box/xrow_update_field.h > index bda9222cc..0b0f608fe 100644 > --- a/src/box/xrow_update_field.h > +++ b/src/box/xrow_update_field.h > @@ -179,8 +179,23 @@ struct xrow_update_op { > const struct xrow_update_op_meta *meta; > /** Operation arguments. */ > union xrow_update_arg arg; > - /** First level field no. */ > - int32_t field_no; > + /** Current level token. */ > + enum json_token_type token_type; > + /** > + * The flag says whether the token is already consumed by > + * the update operation during its forwarding down the > + * update tree. When the flag is true, it means that the > + * next node of the update tree will need to fetch a next > + * token from the lexer. > + */ > + bool is_token_consumed; > + union { > + struct { > + const char *key; > + uint32_t key_len; > + }; > + int32_t field_no; > + }; > /** Size of a new field after it is updated. */ > uint32_t new_field_len; > /** Opcode symbol: = + - / ... */ > @@ -193,6 +208,17 @@ struct xrow_update_op { > struct json_lexer lexer; > }; > > +/** > + * Extract a next token from the operation path lexer. The result > + * is used to decide to which child of a current map/array the > + * operation should be forwarded. It is not just a synonym to > + * json_lexer_next_token, because fills some fields of @a op, > + * and should be used only to chose a next child inside a current > + * map/array. > + */ > +int > +xrow_update_op_next_token(struct xrow_update_op *op); > + > /** > * Decode an update operation from MessagePack. > * @param[out] op Update operation. > > ================================================================================