From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id D69D82B80C for ; Wed, 4 Apr 2018 07:30:50 -0400 (EDT) Received: from turing.freelists.org ([127.0.0.1]) by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id fqPoMC9jxftm for ; Wed, 4 Apr 2018 07:30:50 -0400 (EDT) Received: from smtp50.i.mail.ru (smtp50.i.mail.ru [94.100.177.110]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTPS id 6A1C22B653 for ; Wed, 4 Apr 2018 07:30:50 -0400 (EDT) Subject: [tarantool-patches] Re: [PATCH v2 3/3] Multibyte characters support ICU References: <01a0181a-19f9-c38c-94ea-65ab549a4517@tarantool.org> From: Vladislav Shpilevoy Message-ID: <0bd63589-d3c6-389d-aba4-b1d451f20021@tarantool.org> Date: Wed, 4 Apr 2018 14:30:47 +0300 MIME-Version: 1.0 In-Reply-To: <01a0181a-19f9-c38c-94ea-65ab549a4517@tarantool.org> Content-Type: text/plain; charset="utf-8"; format="flowed" Content-Transfer-Encoding: 8bit Content-Language: en-US Sender: tarantool-patches-bounce@freelists.org Errors-to: tarantool-patches-bounce@freelists.org Reply-To: tarantool-patches@freelists.org List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: tarantool-patches List-subscribe: List-owner: List-post: List-archive: To: tarantool-patches@freelists.org, Kirill Shcherbatov Hello. See my 12 comments below. And seems, that you ignored my appeal to do not copy pase diff into the letter. Again: indentation here is completely broken - all tabs are turned into 2 spaces, it is unreadable. 04.04.2018 13:37, Kirill Shcherbatov пишет: > ICU Implementation > > From 8703e465382ba05817e7703550694c3790972e54 Mon Sep 17 00:00:00 2001 > Message-Id: > <8703e465382ba05817e7703550694c3790972e54.1522838002.git.kshcherbatov@tarantool.org> > In-Reply-To: > References: > From: Kirill Shcherbatov > Date: Wed, 4 Apr 2018 13:06:22 +0300 > Subject: [PATCH v2 3/3] ICU Unicode parsing implementation > > --- >  src/box/lua/tuple.c        |  31 +++++-- >  src/lib/json/path.c        | 211 > +++++++++++++++++++++++++++++++++------------ >  src/lib/json/path.h        |  30 ++++--- >  test/engine/tuple.result   |  43 +++++++-- >  test/engine/tuple.test.lua |  13 ++- >  test/unit/CMakeLists.txt   |   2 +- >  test/unit/json_path.c      |   4 +- >  7 files changed, 247 insertions(+), 87 deletions(-) > > diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c > index 99b9ff2..b89e1f9 100644 > --- a/src/box/lua/tuple.c > +++ b/src/box/lua/tuple.c > @@ -413,7 +413,6 @@ lbox_tuple_transform(struct lua_State *L) >  static inline int >  tuple_field_go_to_index(const char **field, uint64_t index) >  { > -    assert(index >= 0); >      enum mp_type type = mp_typeof(**field); >      if (type == MP_ARRAY) { >          if (index == 0) > @@ -497,6 +496,12 @@ tuple_field_go_to_key(const char **field, const > char *key, int len) >  static int >  lbox_tuple_field_by_path(struct lua_State *L) >  { > +    int err_pos = 0; > +    struct json_path_parser parser; > +    /* Need uninitialized structure to > +     * json_path_parser_deinit on lua_isnumber */ > +    memset(&parser, 0, sizeof(parser)); > +    const char *path = NULL; 1. Please, rename init back to create. In Tarantool we do not use init name for objects. We use create name to initialize an object. And new to allocate + initialize. 2. Remove deinit method. If you need UConverter, you could create it in a global static variable in path.c and initialize it in main() call via json_path_open() call or something. Look other libraries for examples. It is too expensive to do ucnv_open/ucnv_close on each parser creation. When update by JSON path will be implemented, the parser will be being created and destroyed very often. In any case, it is the comment on future, because here you can delete UConverter using U8_NEXT. See http://icu-project.org/apiref/icu4c/utf8_8h.html#a6d4c94e845b059fddba0c51e9bad87fd. It 1) does not use external state like converter, 2) handle errors for you. And please, consider how to remove errors raising from this function. It was written by me, when old errors handling convention was accepted, and now it must be fixed. It would be very good, if you will do it in a separate commit before introducing unicode. > const char *field; >      struct tuple *tuple = luaT_istuple(L, 1); >      /* Is checked in Lua wrapper. */ > @@ -506,6 +511,18 @@ lbox_tuple_field_by_path(struct lua_State *L) >          index -= TUPLE_INDEX_BASE; >          if (index < 0) { >  not_found: 3. Please, find a way to simplify the code. I have tried to do it for you, and the code becomes already much simpler, if these gotos are moved into the end of the function. You can go off this hit of. > + if (!path) > +                goto exit_not_found; > +            uint32_t path_len = strlen(path); > +            uint32_t path_hash = lua_hash(path, path_len); > +            field = tuple_field_by_name(tuple, path, > +                                        path_len, path_hash); > +            if (field) > +                goto push_value; > +            if (err_pos || path_len == 0) > +                luaL_error(L, "Error in path on position %d", err_pos); > +exit_not_found: > +            json_path_parser_deinit(&parser); 4. Deinit will be completely removed, when you will use U8_NEXT. > diff --git a/src/lib/json/path.c b/src/lib/json/path.c > index 4a6174e..4aadb3a 100644 > --- a/src/lib/json/path.c > +++ b/src/lib/json/path.c > @@ -31,8 +31,11 @@ > >  #include "path.h" >  #include > +#include >  #include "trivia/util.h" > > +#define REPLACEMENT_CHARACTER (0xFFFD) 5. What is replacement character? > + >  /** Same as strtoull(), but with limited length. */ >  static inline uint64_t >  strntoull(const char *src, int len) { > @@ -45,6 +48,51 @@ strntoull(const char *src, int len) { >  } > >  /** > + * Parse string and update parser's state. > + * @param[out] parser JSON path parser. Upates pos, signs_read. > + * @param[out] UChar32 to store result. > + * > + * @retval 1 Success. > + * @retval 0 End of string. > + * @retval -1 Parse error. > + */ > +static inline int > +parser_read_sign(struct json_path_parser *parser, UChar32 *out) > +{ > +    int rc; > +    UErrorCode status = U_ZERO_ERROR; > +    if (parser->pos == parser->end) > +        return 0; > +    *out = ucnv_getNextUChar(parser->utf8conv, &parser->pos, > parser->end, &status); > +    parser->invalid_sign_off += (rc = U_SUCCESS(status)); > +    return rc ? 1 : -1; > +} > + > +/** > + * Parse string and update parser's state. 6. The comment seems to be not for this function. > + * @param[out] parser JSON path parser. Upates pos, signs_read. > + * @param old parser read offset. > + * @param signs to drop. > + */ > +static inline void > +parser_reset_pos(struct json_path_parser *parser, const char > *old_pos, int signs) > +{ > +    parser->pos = old_pos; > +    parser->invalid_sign_off -= signs; 7. You can do not restore old pos - if the string is invalid somewhere, it is completely invalid and will not be parsed after. > +} > + > +static inline bool > +string_valid_sign(UChar32 c) > +{ > +    int8_t type = u_charType(c); > +    return !(c == REPLACEMENT_CHARACTER || > +             type == U_UNASSIGNED || > +             type == U_LINE_SEPARATOR || > +             type == U_CONTROL_CHAR || > +             type == U_PARAGRAPH_SEPARATOR); > +} 8. If a string is invalid, you will see it in output status output variable in parser_read_sign. I do not see a sense to check for errors here again. > @@ -81,7 +131,7 @@ json_parse_string(struct json_path_parser *parser, > struct json_path_node *node) >  /** >   * Parse digit sequence into integer until non-digit is met. >   * Parser stops right after the last digit. > - * @param parser JSON parser. > + * @param[out] parser JSON parser. Updates signs_read field. 9. The parser is not [out] parameter here. A parameters is called [out], when it is its single mission. Here it is [in][out] parameter actually, but it can be omitted. > +static inline int > +error_sign_offset(struct json_path_parser *parser) > +{ > +    return parser->invalid_sign_off; > +} 10. I see no sense in this wrapper. And looks like you too, because somewhere you continue to use invalid_sign_off with no wrapper. And on a future: when you add a new method for a struct, please, name it using prefix. > > diff --git a/src/lib/json/path.h b/src/lib/json/path.h > index 6e8db4c..0ff68c4 100644 > --- a/src/lib/json/path.h > +++ b/src/lib/json/path.h > @@ -33,6 +33,9 @@ > >  #include >  #include > +#include 11. This header is unused here. > +#include 12. Since the UConverter is used inside path.c only, you can remove this header, and predeclare a converter using the code: struct UConverter; But in any case it will be deleted, when you starting to use U8_NEXT.