* [tarantool-patches] [PATCH v2 0/3] tuple field access via a json path @ 2018-03-29 14:22 Kirill Shcherbatov 2018-03-29 14:22 ` [tarantool-patches] [PATCH v2 1/3] Introduce json_path_parser Kirill Shcherbatov ` (3 more replies) 0 siblings, 4 replies; 14+ messages in thread From: Kirill Shcherbatov @ 2018-03-29 14:22 UTC (permalink / raw) To: tarantool-patches; +Cc: Kirill Shcherbatov Branch: http://github.com/tarantool/tarantool/tree/gh-1285-tuple-field-by-json Issue: https://github.com/tarantool/tarantool/issues/1285 Kirill Shcherbatov (1): Multibyte characters support Vladislav Shpilevoy (2): Introduce json_path_parser lua: implement json path access to tuple fields src/box/CMakeLists.txt | 2 +- src/box/lua/tuple.c | 175 ++++++++++++++++++++++++++++++++---- src/box/lua/tuple.lua | 45 +++------- src/lib/CMakeLists.txt | 1 + src/lib/json/CMakeLists.txt | 6 ++ src/lib/json/path.c | 203 +++++++++++++++++++++++++++++++++++++++++ src/lib/json/path.h | 111 +++++++++++++++++++++++ test/engine/tuple.result | 214 ++++++++++++++++++++++++++++++++++++++++++++ test/engine/tuple.test.lua | 63 +++++++++++++ test/unit/CMakeLists.txt | 3 + test/unit/json_path.c | 162 +++++++++++++++++++++++++++++++++ test/unit/json_path.result | 82 +++++++++++++++++ 12 files changed, 1015 insertions(+), 52 deletions(-) create mode 100644 src/lib/json/CMakeLists.txt create mode 100644 src/lib/json/path.c create mode 100644 src/lib/json/path.h create mode 100644 test/unit/json_path.c create mode 100644 test/unit/json_path.result -- 2.7.4 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tarantool-patches] [PATCH v2 1/3] Introduce json_path_parser 2018-03-29 14:22 [tarantool-patches] [PATCH v2 0/3] tuple field access via a json path Kirill Shcherbatov @ 2018-03-29 14:22 ` Kirill Shcherbatov 2018-03-29 14:22 ` [tarantool-patches] [PATCH v2 2/3] lua: implement json path access to tuple fields Kirill Shcherbatov ` (2 subsequent siblings) 3 siblings, 0 replies; 14+ messages in thread From: Kirill Shcherbatov @ 2018-03-29 14:22 UTC (permalink / raw) To: tarantool-patches; +Cc: Vladislav Shpilevoy From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Needed for #1285 and for #1261 --- src/lib/CMakeLists.txt | 1 + src/lib/json/CMakeLists.txt | 6 ++ src/lib/json/path.c | 188 ++++++++++++++++++++++++++++++++++++++++++++ src/lib/json/path.h | 111 ++++++++++++++++++++++++++ test/unit/CMakeLists.txt | 3 + test/unit/json_path.c | 162 ++++++++++++++++++++++++++++++++++++++ test/unit/json_path.result | 82 +++++++++++++++++++ 7 files changed, 553 insertions(+) create mode 100644 src/lib/json/CMakeLists.txt create mode 100644 src/lib/json/path.c create mode 100644 src/lib/json/path.h create mode 100644 test/unit/json_path.c create mode 100644 test/unit/json_path.result diff --git a/src/lib/CMakeLists.txt b/src/lib/CMakeLists.txt index 0b274ca..98ff19b 100644 --- a/src/lib/CMakeLists.txt +++ b/src/lib/CMakeLists.txt @@ -4,6 +4,7 @@ set(SMALL_EMBEDDED ON) add_subdirectory(small) add_subdirectory(salad) add_subdirectory(csv) +add_subdirectory(json) if(ENABLE_BUNDLED_MSGPUCK) add_subdirectory(msgpuck EXCLUDE_FROM_ALL) endif() diff --git a/src/lib/json/CMakeLists.txt b/src/lib/json/CMakeLists.txt new file mode 100644 index 0000000..203fe6f --- /dev/null +++ b/src/lib/json/CMakeLists.txt @@ -0,0 +1,6 @@ +set(lib_sources + path.c +) + +set_source_files_compile_flags(${lib_sources}) +add_library(json_path STATIC ${lib_sources}) diff --git a/src/lib/json/path.c b/src/lib/json/path.c new file mode 100644 index 0000000..4a6174e --- /dev/null +++ b/src/lib/json/path.c @@ -0,0 +1,188 @@ +/* + * Copyright 2010-2016 Tarantool AUTHORS: please see AUTHORS file. + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * 1. Redistributions of source code must retain the above + * copyright notice, this list of conditions and the + * following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include "path.h" +#include <ctype.h> +#include "trivia/util.h" + +/** Same as strtoull(), but with limited length. */ +static inline uint64_t +strntoull(const char *src, int len) { + uint64_t value = 0; + for (const char *end = src + len; src < end; ++src) { + assert(isdigit(*src)); + value = value * 10 + *src - (int)'0'; + } + return value; +} + +/** + * Parse string identifier in quotes. Parser either stops right + * after the closing quote, or returns an error position. + * @param parser JSON path parser. + * @param[out] node JSON node to store result. + * + * @retval 0 Success. + * @retval not 0 1-based position of a syntax error. + */ +static inline int +json_parse_string(struct json_path_parser *parser, struct json_path_node *node) +{ + const char *end = parser->src + parser->src_len; + const char *pos = parser->pos; + assert(pos < end); + char quote_type = *pos; + assert(quote_type == '\'' || quote_type == '"'); + /* Skip first quote. */ + int len = 0; + ++pos; + const char *str = pos; + for (char c = *pos; pos < end && quote_type != c; c = *++pos) + ++len; + /* A string must be terminated with quote. */ + if (*pos != quote_type || len == 0) + return pos - parser->src + 1; + /* Skip the closing quote. */ + parser->pos = pos + 1; + node->type = JSON_PATH_STR; + node->str = str; + node->len = len; + return 0; +} + +/** + * Parse digit sequence into integer until non-digit is met. + * Parser stops right after the last digit. + * @param parser JSON parser. + * @param[out] node JSON node to store result. + * + * @retval 0 Success. + * @retval not 0 1-based position of a syntax error. + */ +static inline int +json_parse_integer(struct json_path_parser *parser, struct json_path_node *node) +{ + const char *end = parser->src + parser->src_len; + const char *pos = parser->pos; + assert(pos < end); + const char *str = pos; + int len = 0; + for (char c = *pos; pos < end && isdigit(c); c = *++pos) + ++len; + if (len == 0) + return pos - parser->src + 1; + parser->pos = pos; + node->type = JSON_PATH_NUM; + node->num = strntoull(str, len); + return 0; +} + +/** + * Parse identifier out of quotes. It can contain only alphas, + * digits and underscores. And can not contain digit at the first + * position. Parser is stoped right after the last non-digit, + * non-alpha and non-underscore symbol. + * @param parser JSON parser. + * @param[out] node JSON node to store result. + * + * @retval 0 Success. + * @retval not 0 1-based position of a syntax error. + */ +static inline int +json_parse_identifier(struct json_path_parser *parser, + struct json_path_node *node) +{ + const char *end = parser->src + parser->src_len; + const char *pos = parser->pos; + assert(pos < end); + const char *str = pos; + char c = *pos; + /* First symbol can not be digit. */ + if (!isalpha(c) && c != '_') + return pos - parser->src + 1; + int len = 1; + for (c = *++pos; pos < end && (isalpha(c) || c == '_' || isdigit(c)); + c = *++pos) + ++len; + assert(len > 0); + parser->pos = pos; + node->type = JSON_PATH_STR; + node->str = str; + node->len = len; + return 0; +} + +int +json_path_next(struct json_path_parser *parser, struct json_path_node *node) +{ + const char *end = parser->src + parser->src_len; + if (end == parser->pos) { + node->type = JSON_PATH_END; + return 0; + } + char c = *parser->pos; + int rc; + switch(c) { + case '[': + ++parser->pos; + /* Error for []. */ + if (parser->pos == end) + return parser->pos - parser->src + 1; + c = *parser->pos; + if (c == '"' || c == '\'') + rc = json_parse_string(parser, node); + else + rc = json_parse_integer(parser, node); + if (rc != 0) + return rc; + /* + * Expression, started from [ must be finished + * with ] regardless of its type. + */ + if (parser->pos == end || *parser->pos != ']') + return parser->pos - parser->src + 1; + /* Skip ]. */ + ++parser->pos; + break; + case '.': + /* Skip dot. */ + ++parser->pos; + if (parser->pos == end) + return parser->pos - parser->src + 1; + FALLTHROUGH + default: + rc = json_parse_identifier(parser, node); + if (rc != 0) + return rc; + break; + } + return 0; +} diff --git a/src/lib/json/path.h b/src/lib/json/path.h new file mode 100644 index 0000000..6e8db4c --- /dev/null +++ b/src/lib/json/path.h @@ -0,0 +1,111 @@ +#ifndef TARANTOOL_JSON_PATH_H_INCLUDED +#define TARANTOOL_JSON_PATH_H_INCLUDED +/* + * Copyright 2010-2016 Tarantool AUTHORS: please see AUTHORS file. + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * 1. Redistributions of source code must retain the above + * copyright notice, this list of conditions and the + * following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ + +#include <stdbool.h> +#include <stdint.h> + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * Parser for JSON paths: + * <field>, <.field>, <[123]>, <['field']> and their combinations. + */ +struct json_path_parser { + /** Source string. */ + const char *src; + /** Length of src. */ + int src_len; + /** Current parser's position. */ + const char *pos; +}; + +enum json_path_type { + JSON_PATH_NUM, + JSON_PATH_STR, + /** Parser reached end of path. */ + JSON_PATH_END, +}; + +/** + * Element of a JSON path. It can be either string or number. + * String idenfiers are in ["..."] and between dots. Numbers are + * indexes in [...]. + */ +struct json_path_node { + enum json_path_type type; + union { + struct { + /** String identifier. */ + const char *str; + /** Length of @a str. */ + int len; + }; + /** Index value. */ + uint64_t num; + }; +}; + +/** + * Create @a parser. + * @param[out] parser Parser to create. + * @param src Source string. + * @param src_len Length of @a src. + */ +static inline void +json_path_parser_create(struct json_path_parser *parser, const char *src, + int src_len) +{ + parser->src = src; + parser->src_len = src_len; + parser->pos = src; +} + +/** + * Get a next path node. + * @param parser Parser. + * @param[out] node Node to store parsed result. + * @retval 0 Success. For result see @a node.str, node.len, + * node.num. + * @retval > 0 Position of a syntax error. A position is 1-based + * and starts from a beginning of a source string. + */ +int +json_path_next(struct json_path_parser *parser, struct json_path_node *node); + +#ifdef __cplusplus +} +#endif + +#endif /* TARANTOOL_JSON_PATH_H_INCLUDED */ diff --git a/test/unit/CMakeLists.txt b/test/unit/CMakeLists.txt index 943788b..fe8b2d2 100644 --- a/test/unit/CMakeLists.txt +++ b/test/unit/CMakeLists.txt @@ -129,6 +129,9 @@ add_executable(reflection_cxx.test reflection_cxx.cc unit.c add_executable(csv.test csv.c) target_link_libraries(csv.test csv) +add_executable(json_path.test json_path.c) +target_link_libraries(json_path.test json_path unit) + add_executable(rmean.test rmean.cc) target_link_libraries(rmean.test stat unit) add_executable(histogram.test histogram.c) diff --git a/test/unit/json_path.c b/test/unit/json_path.c new file mode 100644 index 0000000..599658b --- /dev/null +++ b/test/unit/json_path.c @@ -0,0 +1,162 @@ +#include "json/path.h" +#include "unit.h" +#include "trivia/util.h" +#include <string.h> + +#define reset_to_new_path(value) \ + path = value; \ + len = strlen(value); \ + json_path_parser_create(&parser, path, len); + +#define is_next_index(value_len, value) \ + path = parser.pos; \ + is(json_path_next(&parser, &node), 0, "parse <%." #value_len "s>", \ + path); \ + is(node.type, JSON_PATH_NUM, "<%." #value_len "s> is num", path); \ + is(node.num, value, "<%." #value_len "s> is " #value, path); + +#define is_next_key(value) \ + len = strlen(value); \ + is(json_path_next(&parser, &node), 0, "parse <" value ">"); \ + is(node.type, JSON_PATH_STR, "<" value "> is str"); \ + is(node.len, len, "len is %d", len); \ + is(strncmp(node.str, value, len), 0, "str is " value); + +void +test_basic() +{ + header(); + plan(53); + const char *path; + int len; + struct json_path_parser parser; + struct json_path_node node; + + reset_to_new_path("[0].field1.field2['field3'][5]"); + is_next_index(3, 0); + is_next_key("field1"); + is_next_key("field2"); + is_next_key("field3"); + is_next_index(3, 5); + + reset_to_new_path("[3].field[2].field") + is_next_index(3, 3); + is_next_key("field"); + is_next_index(3, 2); + is_next_key("field"); + + reset_to_new_path("[\"f1\"][\"f2'3'\"]"); + is_next_key("f1"); + is_next_key("f2'3'"); + + /* Support both '.field1...' and 'field1...'. */ + reset_to_new_path(".field1"); + is_next_key("field1"); + + /* Long number. */ + reset_to_new_path("[1234]"); + is_next_index(6, 1234); + + /* Empty path. */ + reset_to_new_path(""); + is(json_path_next(&parser, &node), 0, "parse empty path"); + is(node.type, JSON_PATH_END, "is str"); + + /* Path with no '.' at the beginning. */ + reset_to_new_path("field1.field2"); + is_next_key("field1"); + + check_plan(); + footer(); +} + +#define check_new_path_on_error(value, errpos) \ + reset_to_new_path(value); \ + struct json_path_node node; \ + is(json_path_next(&parser, &node), errpos, "error on position %d" \ + " for <%s>", errpos, path); + +struct path_and_errpos { + const char *path; + int errpos; +}; + +void +test_errors() +{ + header(); + plan(18); + const char *path; + int len; + struct json_path_parser parser; + const struct path_and_errpos errors[] = { + /* Double [[. */ + {"[[", 2}, + /* Not string inside []. */ + {"[field]", 2}, + /* String outside of []. */ + {"'field1'.field2", 1}, + /* Empty brackets. */ + {"[]", 2}, + /* Empty string. */ + {"''", 1}, + /* Spaces between identifiers. */ + {" field1", 1}, + /* Start from digit. */ + {"1field", 1}, + {".1field", 2}, + /* Unfinished identifiers. */ + {"['field", 8}, + {"['field'", 9}, + {"[123", 5}, + {"['']", 3}, + /* + * Not trivial error: can not write + * '[]' after '.'. + */ + {".[123]", 2}, + /* Misc. */ + {"[.]", 2}, + }; + for (size_t i = 0; i < lengthof(errors); ++i) { + reset_to_new_path(errors[i].path); + int errpos = errors[i].errpos; + struct json_path_node node; + is(json_path_next(&parser, &node), errpos, + "error on position %d for <%s>", errpos, path); + } + + reset_to_new_path("f.[2]") + struct json_path_node node; + json_path_next(&parser, &node); + is(json_path_next(&parser, &node), 3, "can not write <field.[index]>") + + reset_to_new_path("f.") + json_path_next(&parser, &node); + is(json_path_next(&parser, &node), 3, "error in leading <.>"); + + reset_to_new_path("fiel d1") + json_path_next(&parser, &node); + is(json_path_next(&parser, &node), 5, "space inside identifier"); + + reset_to_new_path("field\t1") + json_path_next(&parser, &node); + is(json_path_next(&parser, &node), 6, "tab inside identifier"); + + check_plan(); + footer(); +} + +int +main() +{ + header(); + plan(2); + + test_basic(); + test_errors(); + + int rc = check_plan(); + footer(); + return rc; +} diff --git a/test/unit/json_path.result b/test/unit/json_path.result new file mode 100644 index 0000000..6d28113 --- /dev/null +++ b/test/unit/json_path.result @@ -0,0 +1,82 @@ + *** main *** +1..2 + *** test_basic *** + 1..53 + ok 1 - parse <[0]> + ok 2 - <[0]> is num + ok 3 - <[0]> is 0 + ok 4 - parse <field1> + ok 5 - <field1> is str + ok 6 - len is 6 + ok 7 - str is field1 + ok 8 - parse <field2> + ok 9 - <field2> is str + ok 10 - len is 6 + ok 11 - str is field2 + ok 12 - parse <field3> + ok 13 - <field3> is str + ok 14 - len is 6 + ok 15 - str is field3 + ok 16 - parse <[5]> + ok 17 - <[5]> is num + ok 18 - <[5]> is 5 + ok 19 - parse <[3]> + ok 20 - <[3]> is num + ok 21 - <[3]> is 3 + ok 22 - parse <field> + ok 23 - <field> is str + ok 24 - len is 5 + ok 25 - str is field + ok 26 - parse <[2]> + ok 27 - <[2]> is num + ok 28 - <[2]> is 2 + ok 29 - parse <field> + ok 30 - <field> is str + ok 31 - len is 5 + ok 32 - str is field + ok 33 - parse <f1> + ok 34 - <f1> is str + ok 35 - len is 2 + ok 36 - str is f1 + ok 37 - parse <f2'3'> + ok 38 - <f2'3'> is str + ok 39 - len is 5 + ok 40 - str is f2'3' + ok 41 - parse <field1> + ok 42 - <field1> is str + ok 43 - len is 6 + ok 44 - str is field1 + ok 45 - parse <[1234]> + ok 46 - <[1234]> is num + ok 47 - <[1234]> is 1234 + ok 48 - parse empty path + ok 49 - is str + ok 50 - parse <field1> + ok 51 - <field1> is str + ok 52 - len is 6 + ok 53 - str is field1 +ok 1 - subtests + *** test_basic: done *** + *** test_errors *** + 1..18 + ok 1 - error on position 2 for <[[> + ok 2 - error on position 2 for <[field]> + ok 3 - error on position 1 for <'field1'.field2> + ok 4 - error on position 2 for <[]> + ok 5 - error on position 1 for <''> + ok 6 - error on position 1 for < field1> + ok 7 - error on position 1 for <1field> + ok 8 - error on position 2 for <.1field> + ok 9 - error on position 8 for <['field> + ok 10 - error on position 9 for <['field'> + ok 11 - error on position 5 for <[123> + ok 12 - error on position 3 for <['']> + ok 13 - error on position 2 for <.[123]> + ok 14 - error on position 2 for <[.]> + ok 15 - can not write <field.[index]> + ok 16 - error in leading <.> + ok 17 - space inside identifier + ok 18 - tab inside identifier +ok 2 - subtests + *** test_errors: done *** + *** main: done *** -- 2.7.4 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tarantool-patches] [PATCH v2 2/3] lua: implement json path access to tuple fields 2018-03-29 14:22 [tarantool-patches] [PATCH v2 0/3] tuple field access via a json path Kirill Shcherbatov 2018-03-29 14:22 ` [tarantool-patches] [PATCH v2 1/3] Introduce json_path_parser Kirill Shcherbatov @ 2018-03-29 14:22 ` Kirill Shcherbatov 2018-03-29 14:22 ` [tarantool-patches] [PATCH v2 3/3] Multibyte characters support Kirill Shcherbatov 2018-04-04 10:37 ` [tarantool-patches] [PATCH v2 3/3] Multibyte characters support ICU Kirill Shcherbatov 3 siblings, 0 replies; 14+ messages in thread From: Kirill Shcherbatov @ 2018-03-29 14:22 UTC (permalink / raw) To: tarantool-patches; +Cc: Vladislav Shpilevoy From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> In progress ... Closes #1285 --- src/box/CMakeLists.txt | 2 +- src/box/lua/tuple.c | 176 +++++++++++++++++++++++++++++++++++----- src/box/lua/tuple.lua | 45 +++-------- test/engine/tuple.result | 198 +++++++++++++++++++++++++++++++++++++++++++++ test/engine/tuple.test.lua | 59 ++++++++++++++ 5 files changed, 428 insertions(+), 52 deletions(-) diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt index e420fe3..add0ff9 100644 --- a/src/box/CMakeLists.txt +++ b/src/box/CMakeLists.txt @@ -130,5 +130,5 @@ add_library(box STATIC ${bin_sources}) target_link_libraries(box box_error tuple stat xrow xlog vclock crc32 scramble - ${common_libraries}) + json_path ${common_libraries}) add_dependencies(box build_bundled_libs) diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c index 7ca4299..99b9ff2 100644 --- a/src/box/lua/tuple.c +++ b/src/box/lua/tuple.c @@ -41,6 +41,7 @@ #include "box/tuple_convert.h" #include "box/errcode.h" #include "box/memtx_tuple.h" +#include "json/path.h" /** {{{ box.tuple Lua library * @@ -402,36 +403,175 @@ lbox_tuple_transform(struct lua_State *L) } /** - * Find a tuple field using its name. + * Propagate @a field to MessagePack(field)[index]. + * @param[in][out] field Field to propagate. + * @param index 1-based index to propagate to. + * + * @retval 0 Success, the index was found. + * @retval -1 Not found. + */ +static inline int +tuple_field_go_to_index(const char **field, uint64_t index) +{ + assert(index >= 0); + enum mp_type type = mp_typeof(**field); + if (type == MP_ARRAY) { + if (index == 0) + return -1; + /* Make index 0-based. */ + index -= TUPLE_INDEX_BASE; + uint32_t count = mp_decode_array(field); + if (index >= count) + return -1; + for (; index > 0; --index) + mp_next(field); + return 0; + } else if (type == MP_MAP) { + uint64_t count = mp_decode_map(field); + for (; count > 0; --count) { + type = mp_typeof(**field); + if (type == MP_UINT) { + uint64_t value = mp_decode_uint(field); + if (value == index) + return 0; + } else if (type == MP_INT) { + int64_t value = mp_decode_int(field); + if (value >= 0 && (uint64_t)value == index) + return 0; + } else { + /* Skip key. */ + mp_next(field); + } + /* Skip value. */ + mp_next(field); + } + } + return -1; +} + +/** + * Propagate @a field to MessagePack(field)[key]. + * @param[in][out] field Field to propagate. + * @param key Key to propagate to. + * @param len Length of @a key. + * + * @retval 0 Success, the index was found. + * @retval -1 Not found. + */ +static inline int +tuple_field_go_to_key(const char **field, const char *key, int len) +{ + enum mp_type type = mp_typeof(**field); + if (type != MP_MAP) + return -1; + uint64_t count = mp_decode_map(field); + for (; count > 0; --count) { + type = mp_typeof(**field); + if (type == MP_STR) { + uint32_t value_len; + const char *value = mp_decode_str(field, &value_len); + if (value_len == (uint)len && + memcmp(value, key, len) == 0) + return 0; + } else { + /* Skip key. */ + mp_next(field); + } + /* Skip value. */ + mp_next(field); + } + return -1; +} + +/** + * Find a tuple field by JSON path. * @param L Lua state. - * @param tuple 1-th argument on lua stack, tuple to get field + * @param tuple 1-th argument on a lua stack, tuple to get field * from. - * @param field_name 2-th argument on lua stack, field name to - * get. + * @param path 2-th argument on lua stack. Can be field name, + * JSON path to a field or a field number. * * @retval If a field was not found, return -1 and nil to lua else * return 0 and decoded field. */ static int -lbox_tuple_field_by_name(struct lua_State *L) +lbox_tuple_field_by_path(struct lua_State *L) { + const char *field; struct tuple *tuple = luaT_istuple(L, 1); /* Is checked in Lua wrapper. */ assert(tuple != NULL); - assert(lua_isstring(L, 2)); - size_t name_len; - const char *name = lua_tolstring(L, 2, &name_len); - uint32_t name_hash = lua_hashstring(L, 2); - const char *field = - tuple_field_by_name(tuple, name, name_len, name_hash); - if (field == NULL) { - lua_pushinteger(L, -1); - lua_pushnil(L); + if (lua_isnumber(L, 2)) { + int index = lua_tointeger(L, 2); + index -= TUPLE_INDEX_BASE; + if (index < 0) { +not_found: + lua_pushinteger(L, -1); + lua_pushnil(L); + return 2; + } + field = tuple_field(tuple, index); + if (field == NULL) + goto not_found; +push_value: + lua_pushinteger(L, 0); + luamp_decode(L, luaL_msgpack_default, &field); return 2; } - lua_pushinteger(L, 0); - luamp_decode(L, luaL_msgpack_default, &field); - return 2; + assert(lua_isstring(L, 2)); + size_t path_len; + const char *path = lua_tolstring(L, 2, &path_len); + struct json_path_parser parser; + struct json_path_node node; + json_path_parser_create(&parser, path, path_len); + int rc = json_path_next(&parser, &node); + if (rc != 0 || node.type == JSON_PATH_END) + luaL_error(L, "Error in path on position %d", rc); + if (node.type == JSON_PATH_NUM) { + int index = node.num; + if (index == 0) + goto not_found; + index -= TUPLE_INDEX_BASE; + field = tuple_field(tuple, index); + if (field == NULL) + goto not_found; + } else { + assert(node.type == JSON_PATH_STR); + /* First part of a path is a field name. */ + const char *name = node.str; + uint32_t name_len = node.len; + uint32_t name_hash; + if (path_len == name_len) { + name_hash = lua_hashstring(L, 2); + } else { + /* + * If a string is "field....", then its + * precalculated juajit hash can not be + * used. A tuple dictionary hashes only + * name, not path. + */ + name_hash = lua_hash(name, name_len); + } + field = tuple_field_by_name(tuple, name, name_len, name_hash); + if (field == NULL) + goto not_found; + } + while ((rc = json_path_next(&parser, &node)) == 0 && + node.type != JSON_PATH_END) { + if (node.type == JSON_PATH_NUM) { + rc = tuple_field_go_to_index(&field, node.num); + } else { + assert(node.type == JSON_PATH_STR); + rc = tuple_field_go_to_key(&field, node.str, node.len); + } + if (rc != 0) + goto not_found; + } + if (rc == 0) + goto push_value; + luaL_error(L, "Error in path on position %d", rc); + unreachable(); + goto not_found; } static int @@ -470,8 +610,8 @@ static const struct luaL_Reg lbox_tuple_meta[] = { {"tostring", lbox_tuple_to_string}, {"slice", lbox_tuple_slice}, {"transform", lbox_tuple_transform}, - {"tuple_field_by_name", lbox_tuple_field_by_name}, {"tuple_to_map", lbox_tuple_to_map}, + {"tuple_field_by_path", lbox_tuple_field_by_path}, {NULL, NULL} }; diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua index 001971a..b51b4df 100644 --- a/src/box/lua/tuple.lua +++ b/src/box/lua/tuple.lua @@ -9,16 +9,9 @@ local internal = require('box.internal') ffi.cdef[[ /** \cond public */ -typedef struct tuple_format box_tuple_format_t; - -box_tuple_format_t * -box_tuple_format_default(void); typedef struct tuple box_tuple_t; -box_tuple_t * -box_tuple_new(box_tuple_format_t *format, const char *data, const char *end); - int box_tuple_ref(box_tuple_t *tuple); @@ -34,9 +27,6 @@ box_tuple_bsize(const box_tuple_t *tuple); ssize_t box_tuple_to_buf(const box_tuple_t *tuple, char *buf, size_t size); -box_tuple_format_t * -box_tuple_format(const box_tuple_t *tuple); - const char * box_tuple_field(const box_tuple_t *tuple, uint32_t i); @@ -278,9 +268,9 @@ end msgpackffi.on_encode(const_tuple_ref_t, tuple_to_msgpack) -local function tuple_field_by_name(tuple, name) +local function tuple_field_by_path(tuple, path) tuple_check(tuple, "tuple['field_name']"); - return internal.tuple.tuple_field_by_name(tuple, name) + return internal.tuple.tuple_field_by_path(tuple, path) end local methods = { @@ -306,33 +296,22 @@ end methods["__serialize"] = tuple_totable -- encode hook for msgpack/yaml/json -local tuple_field = function(tuple, field_n) - local field = builtin.box_tuple_field(tuple, field_n - 1) - if field == nil then - return nil - end - -- Use () to shrink stack to the first return value - return (msgpackffi.decode_unchecked(field)) -end - - ffi.metatype(tuple_t, { __len = function(tuple) return builtin.box_tuple_field_count(tuple) end; __tostring = internal.tuple.tostring; __index = function(tuple, key) - if type(key) == "number" then - return tuple_field(tuple, key) - elseif type(key) == "string" then - -- Try to get a field with a name = key. If it was not - -- found (rc ~= 0) then return a method from the - -- vtable. If a collision occurred, then fields have - -- higher priority. For example, if a tuple T has a - -- field with name 'bsize', then T.bsize returns field - -- value, not tuple_bsize function. To access hidden - -- methods use 'box.tuple.<method_name>(T, [args...])'. - local rc, field = tuple_field_by_name(tuple, key) + if type(key) == "string" or type(key) == "number" then + -- Try to get a field by json path or by [index]. If + -- it was not found (rc ~= 0) then return a method + -- from the vtable. If a collision occurred, then + -- fields have higher priority. For example, if a + -- tuple T has a field with name 'bsize', then T.bsize + -- returns field value, not tuple_bsize function. To + -- access hidden methods use + -- 'box.tuple.<method_name>(T, [args...])'. + local rc, field = tuple_field_by_path(tuple, key) if rc == 0 then return field end diff --git a/test/engine/tuple.result b/test/engine/tuple.result index b3b23b2..2d7367a 100644 --- a/test/engine/tuple.result +++ b/test/engine/tuple.result @@ -590,6 +590,204 @@ maplen(t1map), t1map[1], t1map[2], t1map[3] s:drop() --- ... +format = {} +--- +... +format[1] = {name = 'field1', type = 'unsigned'} +--- +... +format[2] = {name = 'field2', type = 'array'} +--- +... +format[3] = {name = 'field3', type = 'map'} +--- +... +format[4] = {name = 'field4', type = 'string'} +--- +... +s = box.schema.space.create('test', {format = format}) +--- +... +pk = s:create_index('pk') +--- +... +field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} +--- +... +field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} +--- +... +t = s:replace{1, field2, field3, "123456"} +--- +... +t[1] +--- +- 1 +... +t[2] +--- +- [1, 2, 3, '4', [5, 6, 7], {'key': 'key1', 'value': 'value1'}] +... +t[3] +--- +- {'k1': 100, 'k3': [{'a': 1, 'b': 2}, {'c': 3, 'd': 4}], -1: 200, 10: 100, 'k2': [ + 1, 2, 3]} +... +t[4] +--- +- '123456' +... +t[2][1] +--- +- 1 +... +t["[2][1]"] +--- +- 1 +... +t[2][5] +--- +- [5, 6, 7] +... +t["[2][5]"] +--- +- [5, 6, 7] +... +t["[2][5][1]"] +--- +- 5 +... +t["[2][5][2]"] +--- +- 6 +... +t["[2][5][3]"] +--- +- 7 +... +t["[2][6].key"] +--- +- key1 +... +t["[2][6].value"] +--- +- value1 +... +t["[2][6]['key']"] +--- +- key1 +... +t["[2][6]['value']"] +--- +- value1 +... +t["[3].k3[2].c"] +--- +- 3 +... +t["[4]"] +--- +- '123456' +... +t.field1 +--- +- 1 +... +t.field2[5] +--- +- [5, 6, 7] +... +t[".field1"] +--- +- 1 +... +t["field1"] +--- +- 1 +... +t["[3][10]"] +--- +- 100 +... +-- Not found. +t[0] +--- +- null +... +t["[0]"] +--- +- null +... +t["[1000]"] +--- +- null +... +t.field1000 +--- +- null +... +t["not_found"] +--- +- null +... +t["[2][5][10]"] +--- +- null +... +t["[2][6].key100"] +--- +- null +... +t["[2][0]"] -- 0-based index in array. +--- +- null +... +t["[4][3]"] -- Can not index string. +--- +- null +... +t["[4]['key']"] +--- +- null +... +-- Not found 'a'. Return 'null' despite of syntax error on a +-- next position. +t["a.b.c d.e.f"] +--- +- null +... +-- Sytax errors. +t[""] +--- +- error: 'builtin/box/tuple.lua:314: Error in path on position 0' +... +t["[2].[5]"] +--- +- error: 'builtin/box/tuple.lua:314: Error in path on position 5' +... +t["[-1]"] +--- +- error: 'builtin/box/tuple.lua:314: Error in path on position 2' +... +t[".."] +--- +- error: 'builtin/box/tuple.lua:314: Error in path on position 2' +... +t["[["] +--- +- error: 'builtin/box/tuple.lua:314: Error in path on position 2' +... +t["]]"] +--- +- error: 'builtin/box/tuple.lua:314: Error in path on position 1' +... +t["{"] +--- +- error: 'builtin/box/tuple.lua:314: Error in path on position 1' +... +s:drop() +--- +... engine = nil --- ... diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua index 6d7d254..ba3482d 100644 --- a/test/engine/tuple.test.lua +++ b/test/engine/tuple.test.lua @@ -200,5 +200,64 @@ t1map = t1:tomap() maplen(t1map), t1map[1], t1map[2], t1map[3] s:drop() +format = {} +format[1] = {name = 'field1', type = 'unsigned'} +format[2] = {name = 'field2', type = 'array'} +format[3] = {name = 'field3', type = 'map'} +format[4] = {name = 'field4', type = 'string'} +s = box.schema.space.create('test', {format = format}) +pk = s:create_index('pk') +field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} +field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} +t = s:replace{1, field2, field3, "123456"} +t[1] +t[2] +t[3] +t[4] +t[2][1] +t["[2][1]"] +t[2][5] +t["[2][5]"] +t["[2][5][1]"] +t["[2][5][2]"] +t["[2][5][3]"] +t["[2][6].key"] +t["[2][6].value"] +t["[2][6]['key']"] +t["[2][6]['value']"] +t["[3].k3[2].c"] +t["[4]"] +t.field1 +t.field2[5] +t[".field1"] +t["field1"] +t["[3][10]"] + +-- Not found. +t[0] +t["[0]"] +t["[1000]"] +t.field1000 +t["not_found"] +t["[2][5][10]"] +t["[2][6].key100"] +t["[2][0]"] -- 0-based index in array. +t["[4][3]"] -- Can not index string. +t["[4]['key']"] +-- Not found 'a'. Return 'null' despite of syntax error on a +-- next position. +t["a.b.c d.e.f"] + +-- Sytax errors. +t[""] +t["[2].[5]"] +t["[-1]"] +t[".."] +t["[["] +t["]]"] +t["{"] + +s:drop() + engine = nil test_run = nil -- 2.7.4 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tarantool-patches] [PATCH v2 3/3] Multibyte characters support 2018-03-29 14:22 [tarantool-patches] [PATCH v2 0/3] tuple field access via a json path Kirill Shcherbatov 2018-03-29 14:22 ` [tarantool-patches] [PATCH v2 1/3] Introduce json_path_parser Kirill Shcherbatov 2018-03-29 14:22 ` [tarantool-patches] [PATCH v2 2/3] lua: implement json path access to tuple fields Kirill Shcherbatov @ 2018-03-29 14:22 ` Kirill Shcherbatov 2018-03-29 18:04 ` [tarantool-patches] " Kirill Shcherbatov 2018-04-04 10:37 ` [tarantool-patches] [PATCH v2 3/3] Multibyte characters support ICU Kirill Shcherbatov 3 siblings, 1 reply; 14+ messages in thread From: Kirill Shcherbatov @ 2018-03-29 14:22 UTC (permalink / raw) To: tarantool-patches; +Cc: Kirill Shcherbatov --- src/box/lua/tuple.c | 1 - src/lib/json/path.c | 19 +++++++++++++++++-- test/engine/tuple.result | 20 ++++++++++++++++++-- test/engine/tuple.test.lua | 6 +++++- 4 files changed, 40 insertions(+), 6 deletions(-) diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c index 99b9ff2..c3a435b 100644 --- a/src/box/lua/tuple.c +++ b/src/box/lua/tuple.c @@ -413,7 +413,6 @@ lbox_tuple_transform(struct lua_State *L) static inline int tuple_field_go_to_index(const char **field, uint64_t index) { - assert(index >= 0); enum mp_type type = mp_typeof(**field); if (type == MP_ARRAY) { if (index == 0) diff --git a/src/lib/json/path.c b/src/lib/json/path.c index 4a6174e..3e1bb80 100644 --- a/src/lib/json/path.c +++ b/src/lib/json/path.c @@ -31,6 +31,8 @@ #include "path.h" #include <ctype.h> +#include <wchar.h> +#include <wctype.h> #include "trivia/util.h" /** Same as strtoull(), but with limited length. */ @@ -44,6 +46,19 @@ strntoull(const char *src, int len) { return value; } +static inline int +ismbaswcalpha(const char *str, size_t str_len_max) +{ + assert(str_len_max < 1024); + wchar_t buff[1024]; + mbstate_t ps; + memset(&ps, 0, sizeof(ps)); + str_len_max = mbrlen(str, str_len_max, &ps); + memset(&ps, 0, sizeof(ps)); + mbsrtowcs(buff, &str, str_len_max, &ps); + return iswalpha((wint_t)buff[0]); +} + /** * Parse string identifier in quotes. Parser either stops right * after the closing quote, or returns an error position. @@ -126,10 +141,10 @@ json_parse_identifier(struct json_path_parser *parser, const char *str = pos; char c = *pos; /* First symbol can not be digit. */ - if (!isalpha(c) && c != '_') + if (!ismbaswcalpha(pos, end - pos) && c != '_') return pos - parser->src + 1; int len = 1; - for (c = *++pos; pos < end && (isalpha(c) || c == '_' || isdigit(c)); + for (c = *++pos; pos < end && (ismbaswcalpha(pos, end - pos) || c == '_' || isdigit(c)); c = *++pos) ++len; assert(len > 0); diff --git a/test/engine/tuple.result b/test/engine/tuple.result index 2d7367a..d6eb4fa 100644 --- a/test/engine/tuple.result +++ b/test/engine/tuple.result @@ -611,7 +611,7 @@ s = box.schema.space.create('test', {format = format}) pk = s:create_index('pk') --- ... -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} +field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1", hello中国world = {中国 = 'test'}}} --- ... field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} @@ -626,7 +626,7 @@ t[1] ... t[2] --- -- [1, 2, 3, '4', [5, 6, 7], {'key': 'key1', 'value': 'value1'}] +- [1, 2, 3, '4', [5, 6, 7], {'hello中国world': {'中国': 'test'}, 'key': 'key1', 'value': 'value1'}] ... t[3] --- @@ -665,6 +665,10 @@ t["[2][5][3]"] --- - 7 ... +t["[2][5][3]['hello中国world'].中"] +--- +- null +... t["[2][6].key"] --- - key1 @@ -673,6 +677,10 @@ t["[2][6].value"] --- - value1 ... +t["[2][6].hello中国world"] +--- +- {'中国': 'test'} +... t["[2][6]['key']"] --- - key1 @@ -681,10 +689,18 @@ t["[2][6]['value']"] --- - value1 ... +t["[2][6]['hello中国world']"] +--- +- {'中国': 'test'} +... t["[3].k3[2].c"] --- - 3 ... +t["[2][6]['hello中国world'].中国"] +--- +- test +... t["[4]"] --- - '123456' diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua index ba3482d..5a3bcfa 100644 --- a/test/engine/tuple.test.lua +++ b/test/engine/tuple.test.lua @@ -207,7 +207,7 @@ format[3] = {name = 'field3', type = 'map'} format[4] = {name = 'field4', type = 'string'} s = box.schema.space.create('test', {format = format}) pk = s:create_index('pk') -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} +field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1", hello中国world = {中国 = 'test'}}} field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} t = s:replace{1, field2, field3, "123456"} t[1] @@ -221,11 +221,15 @@ t["[2][5]"] t["[2][5][1]"] t["[2][5][2]"] t["[2][5][3]"] +t["[2][5][3]['hello中国world'].中"] t["[2][6].key"] t["[2][6].value"] +t["[2][6].hello中国world"] t["[2][6]['key']"] t["[2][6]['value']"] +t["[2][6]['hello中国world']"] t["[3].k3[2].c"] +t["[2][6]['hello中国world'].中国"] t["[4]"] t.field1 t.field2[5] -- 2.7.4 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tarantool-patches] Re: [PATCH v2 3/3] Multibyte characters support 2018-03-29 14:22 ` [tarantool-patches] [PATCH v2 3/3] Multibyte characters support Kirill Shcherbatov @ 2018-03-29 18:04 ` Kirill Shcherbatov 2018-03-30 10:24 ` v.shpilevoy 0 siblings, 1 reply; 14+ messages in thread From: Kirill Shcherbatov @ 2018-03-29 18:04 UTC (permalink / raw) To: tarantool-patches; +Cc: v.shpilevoy From eb16ee6ea2d2ca5801b6b09c403c6e2f23984bd9 Mon Sep 17 00:00:00 2001 From: Kirill Shcherbatov <kshcherbatov@tarantool.org> Date: Thu, 29 Mar 2018 21:03:21 +0300 Subject: [PATCH] Multibyte characters support --- src/box/lua/tuple.c | 1 - src/lib/json/path.c | 35 ++++++++++++++++++++++++++++++----- src/lib/json/path.h | 2 ++ test/engine/tuple.result | 16 ++++++++++++++-- test/engine/tuple.test.lua | 5 ++++- 5 files changed, 50 insertions(+), 9 deletions(-) diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c index 99b9ff2..c3a435b 100644 --- a/src/box/lua/tuple.c +++ b/src/box/lua/tuple.c @@ -413,7 +413,6 @@ lbox_tuple_transform(struct lua_State *L) static inline int tuple_field_go_to_index(const char **field, uint64_t index) { - assert(index >= 0); enum mp_type type = mp_typeof(**field); if (type == MP_ARRAY) { if (index == 0) diff --git a/src/lib/json/path.c b/src/lib/json/path.c index 4a6174e..1234f6b 100644 --- a/src/lib/json/path.c +++ b/src/lib/json/path.c @@ -31,6 +31,8 @@ #include "path.h" #include <ctype.h> +#include <wchar.h> +#include <wctype.h> #include "trivia/util.h" /** Same as strtoull(), but with limited length. */ @@ -44,6 +46,26 @@ strntoull(const char *src, int len) { return value; } +static inline size_t +mbsize(const char *str, size_t str_size) +{ + mbstate_t ps; + memset(&ps, 0, sizeof(ps)); + mbrlen(NULL,0,&ps); + return mbrlen(str, str_size, &ps); +} + +static inline int +mb2wcisalpha(char *mb_char, size_t mb_char_size) +{ + assert(mb_char_size < 5); + wchar_t buff[2]; + mbstate_t ps; + memset(&ps, 0, sizeof(ps)); + mbsrtowcs(buff, (const char **)&mb_char, mb_char_size + 1, &ps); + return iswalpha((wint_t)buff[0]); +} + /** * Parse string identifier in quotes. Parser either stops right * after the closing quote, or returns an error position. @@ -126,17 +148,20 @@ json_parse_identifier(struct json_path_parser *parser, const char *str = pos; char c = *pos; /* First symbol can not be digit. */ - if (!isalpha(c) && c != '_') - return pos - parser->src + 1; + size_t mb_size = 0; + if ((mb_size = mbsize(pos, end - pos)) && !mb2wcisalpha((char *)pos, mb_size) && c != '_') + return pos - parser->src + mb_size; int len = 1; - for (c = *++pos; pos < end && (isalpha(c) || c == '_' || isdigit(c)); - c = *++pos) + for (c = *(pos += mb_size); + pos < end && (((mb_size = mbsize(pos, end - pos)) + && mb2wcisalpha((char *)pos, mb_size)) || c == '_' || isdigit(c)); + c = *(pos += mb_size)) ++len; assert(len > 0); parser->pos = pos; node->type = JSON_PATH_STR; node->str = str; - node->len = len; + node->len = (int)(pos - str); return 0; } diff --git a/src/lib/json/path.h b/src/lib/json/path.h index 6e8db4c..b1028f6 100644 --- a/src/lib/json/path.h +++ b/src/lib/json/path.h @@ -33,6 +33,8 @@ #include <stdbool.h> #include <stdint.h> +#include <string.h> +#include <malloc.h> #ifdef __cplusplus extern "C" { diff --git a/test/engine/tuple.result b/test/engine/tuple.result index 2d7367a..c4b361a 100644 --- a/test/engine/tuple.result +++ b/test/engine/tuple.result @@ -611,7 +611,7 @@ s = box.schema.space.create('test', {format = format}) pk = s:create_index('pk') --- ... -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} +field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1", hello中国world = {中国 = 'test'}}} --- ... field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} @@ -626,7 +626,7 @@ t[1] ... t[2] --- -- [1, 2, 3, '4', [5, 6, 7], {'key': 'key1', 'value': 'value1'}] +- [1, 2, 3, '4', [5, 6, 7], {'hello中国world': {'中国': 'test'}, 'key': 'key1', 'value': 'value1'}] ... t[3] --- @@ -673,6 +673,10 @@ t["[2][6].value"] --- - value1 ... +t["[2][6].hello中国world"] +--- +- {'中国': 'test'} +... t["[2][6]['key']"] --- - key1 @@ -681,10 +685,18 @@ t["[2][6]['value']"] --- - value1 ... +t["[2][6]['hello中国world']"] +--- +- {'中国': 'test'} +... t["[3].k3[2].c"] --- - 3 ... +t["[2][6]['hello中国world'].中国"] +--- +- test +... t["[4]"] --- - '123456' diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua index ba3482d..476be90 100644 --- a/test/engine/tuple.test.lua +++ b/test/engine/tuple.test.lua @@ -207,7 +207,7 @@ format[3] = {name = 'field3', type = 'map'} format[4] = {name = 'field4', type = 'string'} s = box.schema.space.create('test', {format = format}) pk = s:create_index('pk') -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} +field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1", hello中国world = {中国 = 'test'}}} field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} t = s:replace{1, field2, field3, "123456"} t[1] @@ -223,9 +223,12 @@ t["[2][5][2]"] t["[2][5][3]"] t["[2][6].key"] t["[2][6].value"] +t["[2][6].hello中国world"] t["[2][6]['key']"] t["[2][6]['value']"] +t["[2][6]['hello中国world']"] t["[3].k3[2].c"] +t["[2][6]['hello中国world'].中国"] t["[4]"] t.field1 t.field2[5] -- 2.7.4 On 29.03.2018 17:22, Kirill Shcherbatov wrote: > --- > src/box/lua/tuple.c | 1 - > src/lib/json/path.c | 19 +++++++++++++++++-- > test/engine/tuple.result | 20 ++++++++++++++++++-- > test/engine/tuple.test.lua | 6 +++++- > 4 files changed, 40 insertions(+), 6 deletions(-) > > diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c > index 99b9ff2..c3a435b 100644 > --- a/src/box/lua/tuple.c > +++ b/src/box/lua/tuple.c > @@ -413,7 +413,6 @@ lbox_tuple_transform(struct lua_State *L) > static inline int > tuple_field_go_to_index(const char **field, uint64_t index) > { > - assert(index >= 0); > enum mp_type type = mp_typeof(**field); > if (type == MP_ARRAY) { > if (index == 0) > diff --git a/src/lib/json/path.c b/src/lib/json/path.c > index 4a6174e..3e1bb80 100644 > --- a/src/lib/json/path.c > +++ b/src/lib/json/path.c > @@ -31,6 +31,8 @@ > > #include "path.h" > #include <ctype.h> > +#include <wchar.h> > +#include <wctype.h> > #include "trivia/util.h" > > /** Same as strtoull(), but with limited length. */ > @@ -44,6 +46,19 @@ strntoull(const char *src, int len) { > return value; > } > > +static inline int > +ismbaswcalpha(const char *str, size_t str_len_max) > +{ > + assert(str_len_max < 1024); > + wchar_t buff[1024]; > + mbstate_t ps; > + memset(&ps, 0, sizeof(ps)); > + str_len_max = mbrlen(str, str_len_max, &ps); > + memset(&ps, 0, sizeof(ps)); > + mbsrtowcs(buff, &str, str_len_max, &ps); > + return iswalpha((wint_t)buff[0]); > +} > + > /** > * Parse string identifier in quotes. Parser either stops right > * after the closing quote, or returns an error position. > @@ -126,10 +141,10 @@ json_parse_identifier(struct json_path_parser *parser, > const char *str = pos; > char c = *pos; > /* First symbol can not be digit. */ > - if (!isalpha(c) && c != '_') > + if (!ismbaswcalpha(pos, end - pos) && c != '_') > return pos - parser->src + 1; > int len = 1; > - for (c = *++pos; pos < end && (isalpha(c) || c == '_' || isdigit(c)); > + for (c = *++pos; pos < end && (ismbaswcalpha(pos, end - pos) || c == '_' || isdigit(c)); > c = *++pos) > ++len; > assert(len > 0); > diff --git a/test/engine/tuple.result b/test/engine/tuple.result > index 2d7367a..d6eb4fa 100644 > --- a/test/engine/tuple.result > +++ b/test/engine/tuple.result > @@ -611,7 +611,7 @@ s = box.schema.space.create('test', {format = format}) > pk = s:create_index('pk') > --- > ... > -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} > +field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1", hello中国world = {中国 = 'test'}}} > --- > ... > field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} > @@ -626,7 +626,7 @@ t[1] > ... > t[2] > --- > -- [1, 2, 3, '4', [5, 6, 7], {'key': 'key1', 'value': 'value1'}] > +- [1, 2, 3, '4', [5, 6, 7], {'hello中国world': {'中国': 'test'}, 'key': 'key1', 'value': 'value1'}] > ... > t[3] > --- > @@ -665,6 +665,10 @@ t["[2][5][3]"] > --- > - 7 > ... > +t["[2][5][3]['hello中国world'].中"] > +--- > +- null > +... > t["[2][6].key"] > --- > - key1 > @@ -673,6 +677,10 @@ t["[2][6].value"] > --- > - value1 > ... > +t["[2][6].hello中国world"] > +--- > +- {'中国': 'test'} > +... > t["[2][6]['key']"] > --- > - key1 > @@ -681,10 +689,18 @@ t["[2][6]['value']"] > --- > - value1 > ... > +t["[2][6]['hello中国world']"] > +--- > +- {'中国': 'test'} > +... > t["[3].k3[2].c"] > --- > - 3 > ... > +t["[2][6]['hello中国world'].中国"] > +--- > +- test > +... > t["[4]"] > --- > - '123456' > diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua > index ba3482d..5a3bcfa 100644 > --- a/test/engine/tuple.test.lua > +++ b/test/engine/tuple.test.lua > @@ -207,7 +207,7 @@ format[3] = {name = 'field3', type = 'map'} > format[4] = {name = 'field4', type = 'string'} > s = box.schema.space.create('test', {format = format}) > pk = s:create_index('pk') > -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} > +field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1", hello中国world = {中国 = 'test'}}} > field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} > t = s:replace{1, field2, field3, "123456"} > t[1] > @@ -221,11 +221,15 @@ t["[2][5]"] > t["[2][5][1]"] > t["[2][5][2]"] > t["[2][5][3]"] > +t["[2][5][3]['hello中国world'].中"] > t["[2][6].key"] > t["[2][6].value"] > +t["[2][6].hello中国world"] > t["[2][6]['key']"] > t["[2][6]['value']"] > +t["[2][6]['hello中国world']"] > t["[3].k3[2].c"] > +t["[2][6]['hello中国world'].中国"] > t["[4]"] > t.field1 > t.field2[5] > ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tarantool-patches] Re: [PATCH v2 3/3] Multibyte characters support 2018-03-29 18:04 ` [tarantool-patches] " Kirill Shcherbatov @ 2018-03-30 10:24 ` v.shpilevoy 2018-03-30 10:25 ` v.shpilevoy 2018-04-02 19:19 ` Kirill Shcherbatov 0 siblings, 2 replies; 14+ messages in thread From: v.shpilevoy @ 2018-03-30 10:24 UTC (permalink / raw) To: tarantool-patches; +Cc: Kirill Shcherbatov [-- Attachment #1: Type: text/plain, Size: 4395 bytes --] See below 5 comments. And see at the travis: https://travis-ci.org/tarantool/tarantool/builds/359964649?utm_source=github_status&utm_medium=notification <https://travis-ci.org/tarantool/tarantool/builds/359964649?utm_source=github_status&utm_medium=notification> - seems like the patch does not work. box/alter.test.lua with unicode and your own tests are failed. > 29 марта 2018 г., в 21:04, Kirill Shcherbatov <kshcherbatov@tarantool.org> написал(а): > > From eb16ee6ea2d2ca5801b6b09c403c6e2f23984bd9 Mon Sep 17 00:00:00 2001 > From: Kirill Shcherbatov <kshcherbatov@tarantool.org> > Date: Thu, 29 Mar 2018 21:03:21 +0300 > Subject: [PATCH] Multibyte characters support > > --- > src/box/lua/tuple.c | 1 - > src/lib/json/path.c | 35 ++++++++++++++++++++++++++++++----- > src/lib/json/path.h | 2 ++ > test/engine/tuple.result | 16 ++++++++++++++-- > test/engine/tuple.test.lua | 5 ++++- > 5 files changed, 50 insertions(+), 9 deletions(-) > > diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c > index 99b9ff2..c3a435b 100644 > --- a/src/box/lua/tuple.c > +++ b/src/box/lua/tuple.c > @@ -44,6 +46,26 @@ strntoull(const char *src, int len) { > return value; > } > 1. Please, add comments what the function does, what arguments gets, what returns. > +static inline size_t > +mbsize(const char *str, size_t str_size) > +{ > + mbstate_t ps; > + memset(&ps, 0, sizeof(ps)); > + mbrlen(NULL,0,&ps); 2. I do not like this always-zero ps. Why you can not use mblen instead of mbrlen? > + return mbrlen(str, str_size, &ps); > +} > + > +static inline int > +mb2wcisalpha(char *mb_char, size_t mb_char_size) > +{ > + assert(mb_char_size < 5); > + wchar_t buff[2]; 3. Sorry, I still can not understand, why do you need 2 wchar, if here only one is checked? Why do you use mbsrtowcs instead of mbrtowc. > + mbstate_t ps; > + memset(&ps, 0, sizeof(ps)); > + mbsrtowcs(buff, (const char **)&mb_char, mb_char_size + 1, &ps); > + return iswalpha((wint_t)buff[0]); > +} > + > /** > * Parse string identifier in quotes. Parser either stops right > * after the closing quote, or returns an error position. > @@ -126,17 +148,20 @@ json_parse_identifier(struct json_path_parser *parser, > const char *str = pos; > char c = *pos; > /* First symbol can not be digit. */ > - if (!isalpha(c) && c != '_') > - return pos - parser->src + 1; > + size_t mb_size = 0; > + if ((mb_size = mbsize(pos, end - pos)) && !mb2wcisalpha((char *)pos, mb_size) && c != '_') > + return pos - parser->src + mb_size; 4. I propose to join mbsize and mb2wcisalpha under the single function check_alpha() or something, that either accepts char ** and moves it forward, or returns parsed size. And why you do not check mbsize() for negative results? In the doc I saw, that it can return values like (size_t)-1 or (size_t)-2. And you implementation accepts it as ok. > int len = 1; > - for (c = *++pos; pos < end && (isalpha(c) || c == '_' || isdigit(c)); > - c = *++pos) > + for (c = *(pos += mb_size); > + pos < end && (((mb_size = mbsize(pos, end - pos)) > + && mb2wcisalpha((char *)pos, mb_size)) || c == '_' || isdigit(c)); > + c = *(pos += mb_size)) > ++len; > assert(len > 0); > parser->pos = pos; > node->type = JSON_PATH_STR; > node->str = str; > - node->len = len; > + node->len = (int)(pos - str); > return 0; > } > > diff --git a/src/lib/json/path.h b/src/lib/json/path.h > index 6e8db4c..b1028f6 100644 > --- a/src/lib/json/path.h > +++ b/src/lib/json/path.h > @@ -33,6 +33,8 @@ > > #include <stdbool.h> > #include <stdint.h> > +#include <string.h> > +#include <malloc.h> > > #ifdef __cplusplus > extern "C" { > diff --git a/test/engine/tuple.result b/test/engine/tuple.result > index 2d7367a..c4b361a 100644 > --- a/test/engine/tuple.result > +++ b/test/engine/tuple.result > @@ -611,7 +611,7 @@ s = box.schema.space.create('test', {format = format}) > pk = s:create_index('pk') > --- > ... > -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} > +field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1", hello中国world = {中国 = 'test'}}} 5. Please, add unit tests on unicode. See commit 49df120aaa7592e6475bf6974fe6f225db9234de (Introduce json_path_parser) for examples. > [-- Attachment #2: Type: text/html, Size: 9490 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tarantool-patches] Re: [PATCH v2 3/3] Multibyte characters support 2018-03-30 10:24 ` v.shpilevoy @ 2018-03-30 10:25 ` v.shpilevoy 2018-04-02 19:19 ` Kirill Shcherbatov 1 sibling, 0 replies; 14+ messages in thread From: v.shpilevoy @ 2018-03-30 10:25 UTC (permalink / raw) To: tarantool-patches; +Cc: Kirill Shcherbatov [-- Attachment #1: Type: text/plain, Size: 128 bytes --] You can even use mbtowc. > 30 марта 2018 г., в 13:24, v.shpilevoy@tarantool.org написал(а): > > mbrtowc [-- Attachment #2: Type: text/html, Size: 1193 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tarantool-patches] Re: [PATCH v2 3/3] Multibyte characters support 2018-03-30 10:24 ` v.shpilevoy 2018-03-30 10:25 ` v.shpilevoy @ 2018-04-02 19:19 ` Kirill Shcherbatov 2018-04-03 10:20 ` Vladislav Shpilevoy 1 sibling, 1 reply; 14+ messages in thread From: Kirill Shcherbatov @ 2018-04-02 19:19 UTC (permalink / raw) To: v.shpilevoy, tarantool-patches diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c index 99b9ff2..ec6f7cd 100644 --- a/src/box/lua/tuple.c +++ b/src/box/lua/tuple.c @@ -413,7 +413,6 @@ lbox_tuple_transform(struct lua_State *L) static inline int tuple_field_go_to_index(const char **field, uint64_t index) { - assert(index >= 0); enum mp_type type = mp_typeof(**field); if (type == MP_ARRAY) { if (index == 0) @@ -497,7 +496,9 @@ tuple_field_go_to_key(const char **field, const char *key, int len) static int lbox_tuple_field_by_path(struct lua_State *L) { + int mark = 0; const char *field; + const char *path = NULL; struct tuple *tuple = luaT_istuple(L, 1); /* Is checked in Lua wrapper. */ assert(tuple != NULL); @@ -506,6 +507,17 @@ lbox_tuple_field_by_path(struct lua_State *L) index -= TUPLE_INDEX_BASE; if (index < 0) { not_found: + if (!path) + goto exit_not_found; + uint32_t path_len = strlen(path); + uint32_t path_hash = lua_hash(path, path_len); + field = tuple_field_by_name(tuple, path, + path_len, path_hash); + if (field) + goto push_value; + if (mark || path_len == 0) + luaL_error(L, "Error in path on position %d", mark); +exit_not_found: lua_pushinteger(L, -1); lua_pushnil(L); return 2; @@ -520,13 +532,15 @@ push_value: } assert(lua_isstring(L, 2)); size_t path_len; - const char *path = lua_tolstring(L, 2, &path_len); + path = lua_tolstring(L, 2, &path_len); struct json_path_parser parser; struct json_path_node node; json_path_parser_create(&parser, path, path_len); int rc = json_path_next(&parser, &node); - if (rc != 0 || node.type == JSON_PATH_END) - luaL_error(L, "Error in path on position %d", rc); + if (rc != 0 || node.type == JSON_PATH_END) { + mark = rc; + goto not_found; + } if (node.type == JSON_PATH_NUM) { int index = node.num; if (index == 0) @@ -553,8 +567,10 @@ push_value: name_hash = lua_hash(name, name_len); } field = tuple_field_by_name(tuple, name, name_len, name_hash); - if (field == NULL) + if (field == NULL) { + rc = 1; goto not_found; + } } while ((rc = json_path_next(&parser, &node)) == 0 && node.type != JSON_PATH_END) { diff --git a/src/lib/json/path.c b/src/lib/json/path.c index 4a6174e..d9473be 100644 --- a/src/lib/json/path.c +++ b/src/lib/json/path.c @@ -31,6 +31,8 @@ #include "path.h" #include <ctype.h> +#include <wchar.h> +#include <wctype.h> #include "trivia/util.h" /** Same as strtoull(), but with limited length. */ @@ -45,6 +47,44 @@ strntoull(const char *src, int len) { } /** + * Checks is multibyte character whose first byte + * is pointed to by mb_str is alphabetic. + * NOTE: You have to clean global context + * with mbtowc(NULL, 0, 0); before first call + * @param mb_str + * @param mb_str_size + * @return multebyte character bytes count Success + * @retval 0 on non-alphabetic character or parse failure + */ +static inline size_t +mbtowc_check_alpha(const char *mb_str, size_t mb_str_size) +{ + wchar_t wc; + ssize_t ret = mbtowc(&wc, mb_str, mb_str_size); + if (ret <= 0 || !iswalpha((wint_t)wc)) + return 0; + return (size_t)ret; +} + + +/** + * Counts user-string sign count in mb_str_size bytes + * @param mb_str + * @param mb_str_size + * @return sign count + */ +static inline int +mbtowc_count(struct json_path_parser *parser, const char *mb_str, size_t mb_str_len) +{ + char src[mb_str_len+1]; + memcpy(src, mb_str, mb_str_len); + src[mb_str_len] = '\0'; + mbtowc(NULL, 0, 0); + int rc = (int)mbstowcs(NULL, (const char *)src, 0); + return (rc >= 0) ? rc + (parser->src_len < (int)mb_str_len) : 1; +} + +/** * Parse string identifier in quotes. Parser either stops right * after the closing quote, or returns an error position. * @param parser JSON path parser. @@ -58,22 +98,26 @@ json_parse_string(struct json_path_parser *parser, struct json_path_node *node) { const char *end = parser->src + parser->src_len; const char *pos = parser->pos; + const char *str = pos; assert(pos < end); char quote_type = *pos; assert(quote_type == '\'' || quote_type == '"'); - /* Skip first quote. */ - int len = 0; - ++pos; - const char *str = pos; - for (char c = *pos; pos < end && quote_type != c; c = *++pos) - ++len; + /* Skip first quote. */++pos; + mbtowc(NULL, 0, 0); /* A string must be terminated with quote. */ + for (; *pos != quote_type && pos < end;) { + int wc_sz = mbtowc(NULL, pos, end - pos); + if (wc_sz <= 0) + return pos - parser->src + 1; + pos += wc_sz; + } + int len = (int)(pos - str - 1); if (*pos != quote_type || len == 0) return pos - parser->src + 1; /* Skip the closing quote. */ parser->pos = pos + 1; node->type = JSON_PATH_STR; - node->str = str; + node->str = str + 1; node->len = len; return 0; } @@ -124,19 +168,21 @@ json_parse_identifier(struct json_path_parser *parser, const char *pos = parser->pos; assert(pos < end); const char *str = pos; - char c = *pos; - /* First symbol can not be digit. */ - if (!isalpha(c) && c != '_') + size_t mb_size = 0; + /* clean global mb to wc decode context */ + mbtowc(NULL, 0, 0); + /* First symbol also can not be digit. */ + if (!(mb_size = mbtowc_check_alpha(pos, end - pos)) && (*pos != '_')) return pos - parser->src + 1; - int len = 1; - for (c = *++pos; pos < end && (isalpha(c) || c == '_' || isdigit(c)); - c = *++pos) - ++len; - assert(len > 0); + while (mb_size) { + pos += mb_size; + mb_size = mbtowc_check_alpha(pos, end - pos); + mb_size += (!mb_size && (isdigit(*pos) || *pos == '_')); + } parser->pos = pos; node->type = JSON_PATH_STR; node->str = str; - node->len = len; + node->len = (int)(pos - str); return 0; } @@ -155,20 +201,22 @@ json_path_next(struct json_path_parser *parser, struct json_path_node *node) ++parser->pos; /* Error for []. */ if (parser->pos == end) - return parser->pos - parser->src + 1; + return mbtowc_count(parser, parser->src, + parser->pos - parser->src + 1); c = *parser->pos; if (c == '"' || c == '\'') rc = json_parse_string(parser, node); else rc = json_parse_integer(parser, node); if (rc != 0) - return rc; + return mbtowc_count(parser, parser->src, rc); /* * Expression, started from [ must be finished * with ] regardless of its type. */ if (parser->pos == end || *parser->pos != ']') - return parser->pos - parser->src + 1; + return mbtowc_count(parser, parser->src, + parser->pos - parser->src + 1); /* Skip ]. */ ++parser->pos; break; @@ -176,12 +224,13 @@ json_path_next(struct json_path_parser *parser, struct json_path_node *node) /* Skip dot. */ ++parser->pos; if (parser->pos == end) - return parser->pos - parser->src + 1; + return mbtowc_count(parser, parser->src, + parser->pos - parser->src + 1); FALLTHROUGH default: rc = json_parse_identifier(parser, node); if (rc != 0) - return rc; + return mbtowc_count(parser, parser->src, rc); break; } return 0; diff --git a/src/lib/json/path.h b/src/lib/json/path.h index 6e8db4c..b1028f6 100644 --- a/src/lib/json/path.h +++ b/src/lib/json/path.h @@ -33,6 +33,8 @@ #include <stdbool.h> #include <stdint.h> +#include <string.h> +#include <malloc.h> #ifdef __cplusplus extern "C" { diff --git a/test/engine/tuple.result b/test/engine/tuple.result index 2d7367a..c4b361a 100644 --- a/test/engine/tuple.result +++ b/test/engine/tuple.result @@ -611,7 +611,7 @@ s = box.schema.space.create('test', {format = format}) pk = s:create_index('pk') --- ... -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} +field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1", hello中国world = {中国 = 'test'}}} --- ... field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} @@ -626,7 +626,7 @@ t[1] ... t[2] --- -- [1, 2, 3, '4', [5, 6, 7], {'key': 'key1', 'value': 'value1'}] +- [1, 2, 3, '4', [5, 6, 7], {'hello中国world': {'中国': 'test'}, 'key': 'key1', 'value': 'value1'}] ... t[3] --- @@ -673,6 +673,10 @@ t["[2][6].value"] --- - value1 ... +t["[2][6].hello中国world"] +--- +- {'中国': 'test'} +... t["[2][6]['key']"] --- - key1 @@ -681,10 +685,18 @@ t["[2][6]['value']"] --- - value1 ... +t["[2][6]['hello中国world']"] +--- +- {'中国': 'test'} +... t["[3].k3[2].c"] --- - 3 ... +t["[2][6]['hello中国world'].中国"] +--- +- test +... t["[4]"] --- - '123456' diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua index ba3482d..476be90 100644 --- a/test/engine/tuple.test.lua +++ b/test/engine/tuple.test.lua @@ -207,7 +207,7 @@ format[3] = {name = 'field3', type = 'map'} format[4] = {name = 'field4', type = 'string'} s = box.schema.space.create('test', {format = format}) pk = s:create_index('pk') -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} +field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1", hello中国world = {中国 = 'test'}}} field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} t = s:replace{1, field2, field3, "123456"} t[1] @@ -223,9 +223,12 @@ t["[2][5][2]"] t["[2][5][3]"] t["[2][6].key"] t["[2][6].value"] +t["[2][6].hello中国world"] t["[2][6]['key']"] t["[2][6]['value']"] +t["[2][6]['hello中国world']"] t["[3].k3[2].c"] +t["[2][6]['hello中国world'].中国"] t["[4]"] t.field1 t.field2[5] -- 2.7.4 On 30.03.2018 13:24, v.shpilevoy@tarantool.org wrote: > See below 5 comments. And see at the travis: > https://travis-ci.org/tarantool/tarantool/builds/359964649?utm_source=github_status&utm_medium=notification - > seems like the patch does not work. box/alter.test.lua with unicode and > your own tests are failed. > >> 29 марта 2018 г., в 21:04, Kirill Shcherbatov >> <kshcherbatov@tarantool.org <mailto:kshcherbatov@tarantool.org>> >> написал(а): >> >> From eb16ee6ea2d2ca5801b6b09c403c6e2f23984bd9 Mon Sep 17 00:00:00 2001 >> From: Kirill Shcherbatov <kshcherbatov@tarantool.org >> <mailto:kshcherbatov@tarantool.org>> >> Date: Thu, 29 Mar 2018 21:03:21 +0300 >> Subject: [PATCH] Multibyte characters support >> >> --- >> src/box/lua/tuple.c | 1 - >> src/lib/json/path.c | 35 ++++++++++++++++++++++++++++++----- >> src/lib/json/path.h | 2 ++ >> test/engine/tuple.result | 16 ++++++++++++++-- >> test/engine/tuple.test.lua | 5 ++++- >> 5 files changed, 50 insertions(+), 9 deletions(-) >> >> diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c >> index 99b9ff2..c3a435b 100644 >> --- a/src/box/lua/tuple.c >> +++ b/src/box/lua/tuple.c >> @@ -44,6 +46,26 @@ strntoull(const char *src, int len) { >> return value; >> } >> > > 1. Please, add comments what the function does, what arguments gets, > what returns. > >> +static inline size_t >> +mbsize(const char *str, size_t str_size) >> +{ >> +mbstate_t ps; >> +memset(&ps, 0, sizeof(ps)); >> +mbrlen(NULL,0,&ps); > > 2. I do not like this always-zero ps. Why you can not use mblen instead > of mbrlen? > >> +return mbrlen(str, str_size, &ps); >> +} >> + >> +static inline int >> +mb2wcisalpha(char *mb_char, size_t mb_char_size) >> +{ >> +assert(mb_char_size < 5); >> +wchar_t buff[2]; > > 3. Sorry, I still can not understand, why do you need 2 wchar, if here only > one is checked? Why do you use mbsrtowcs instead of mbrtowc. > >> +mbstate_t ps; >> +memset(&ps, 0, sizeof(ps)); >> +mbsrtowcs(buff, (const char **)&mb_char, mb_char_size + 1, &ps); >> +return iswalpha((wint_t)buff[0]); >> +} >> + >> /** >> * Parse string identifier in quotes. Parser either stops right >> * after the closing quote, or returns an error position. >> @@ -126,17 +148,20 @@ json_parse_identifier(struct json_path_parser >> *parser, >> const char *str = pos; >> char c = *pos; >> /* First symbol can not be digit. */ >> -if (!isalpha(c) && c != '_') >> -return pos - parser->src + 1; >> +size_t mb_size = 0; >> +if ((mb_size = mbsize(pos, end - pos)) && !mb2wcisalpha((char *)pos, >> mb_size) && c != '_') >> +return pos - parser->src + mb_size; > > 4. I propose to join mbsize and mb2wcisalpha under the single function > check_alpha() > or something, that either accepts char ** and moves it forward, or > returns parsed size. > > And why you do not check mbsize() for negative results? In the doc I > saw, that it can > return values like (size_t)-1 or (size_t)-2. And you implementation > accepts it as ok. > >> int len = 1; >> -for (c = *++pos; pos < end && (isalpha(c) || c == '_' || isdigit(c)); >> - c = *++pos) >> +for (c = *(pos += mb_size); >> + pos < end && (((mb_size = mbsize(pos, end - pos)) >> + && mb2wcisalpha((char *)pos, mb_size)) || c == '_' >> || isdigit(c)); >> + c = *(pos += mb_size)) >> ++len; >> assert(len > 0); >> parser->pos = pos; >> node->type = JSON_PATH_STR; >> node->str = str; >> -node->len = len; >> +node->len = (int)(pos - str); >> return 0; >> } >> >> diff --git a/src/lib/json/path.h b/src/lib/json/path.h >> index 6e8db4c..b1028f6 100644 >> --- a/src/lib/json/path.h >> +++ b/src/lib/json/path.h >> @@ -33,6 +33,8 @@ >> >> #include <stdbool.h> >> #include <stdint.h> >> +#include <string.h> >> +#include <malloc.h> >> >> #ifdef __cplusplus >> extern "C" { >> diff --git a/test/engine/tuple.result b/test/engine/tuple.result >> index 2d7367a..c4b361a 100644 >> --- a/test/engine/tuple.result >> +++ b/test/engine/tuple.result >> @@ -611,7 +611,7 @@ s = box.schema.space.create('test', {format = format}) >> pk = s:create_index('pk') >> --- >> ... >> -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} >> +field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1", hello中 >> 国world = {中国 = 'test'}}} > > 5. Please, add unit tests on unicode. See commit > 49df120aaa7592e6475bf6974fe6f225db9234de (Introduce > json_path_parser) for examples. >> > ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tarantool-patches] Re: [PATCH v2 3/3] Multibyte characters support 2018-04-02 19:19 ` Kirill Shcherbatov @ 2018-04-03 10:20 ` Vladislav Shpilevoy 2018-04-05 14:09 ` [tarantool-patches] [PATCH v2 1/1] ICU Unicode support for JSON parser Kirill Shcherbatov 0 siblings, 1 reply; 14+ messages in thread From: Vladislav Shpilevoy @ 2018-04-03 10:20 UTC (permalink / raw) To: tarantool-patches, Kirill Shcherbatov Hello. Please, consider 9 comments. 1. Seems like you again sent the patch using diff copy-paste into mail client. Please, don't do that. Such "method" - destroys all tabs, converting them to spaces, which count != tab width, - skips commit message body. Now I look at the letter in commits mail list. 2. Please, write a commit message body and ref GitHub issue using 'Linked with #NNNN' or 'Part of #NNNN'. But actually your patch closes it, as I think. 3. In lbox_tuple_field_by_path you calculate path len when a field is not found - please, try use already calculated len from this place: > size_t path_len; > - const char *path = lua_tolstring(L, 2, &path_len); > + path = lua_tolstring(L, 2, &path_len); 4. I can not build the branch: > [ 15%] Building C object test/unit/CMakeFiles/heap.test.dir/heap.c.o > In file included from > /Users/v.shpilevoy/Work/Repositories/tarantool/src/lib/json/path.c:32: > /Users/v.shpilevoy/Work/Repositories/tarantool/src/lib/json/path.h:37:10: > fatal error: 'malloc.h' file not found > #include <malloc.h> > ^~~~~~~~~~ > [ 15%] Built target api > if (index < 0) { > not_found: > + if (!path) > + goto exit_not_found; > + uint32_t path_len = strlen(path); 5. The indentation looks broken. 6. Please, try to simplify the function. It looks very complex with 3 labels and a strange "mark". > /** + * Checks is multibyte character whose first byte + * is pointed > to by mb_str is alphabetic. + * NOTE: You have to clean global context > + * with mbtowc(NULL, 0, 0); before first call 7. Where you found this info? I can not find it. > +/** + * Counts user-string sign count in mb_str_size bytes + * @param > mb_str + * @param mb_str_size + * @return sign count + */ +static > inline int +mbtowc_count(struct json_path_parser *parser, const char > *mb_str, size_t mb_str_len) +{ + char src[mb_str_len+1]; 8. I very-very do not like allocation of arrays with variable length on a stack. Please, do not do that. It is bad even if you use this function for errors only. And please try to remove this function at all. I propose to calculate symbols count in struct json_path_parser. For example, you can add a member symbol_count, and increase it during each json_path_next call. 9. Looks, like you did not add a tests on complex cases, which we have discussed verbally, when a space format field has name looking like JSON paths. Please, add them. And I recommend you do not hurry. Speed of the patch pushing into master does not depend on speed of patch resending. And remember, that it will be very hot code, and it must extremely optimized. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tarantool-patches] [PATCH v2 1/1] ICU Unicode support for JSON parser. 2018-04-03 10:20 ` Vladislav Shpilevoy @ 2018-04-05 14:09 ` Kirill Shcherbatov 2018-04-05 18:00 ` [tarantool-patches] " Kirill Shcherbatov 0 siblings, 1 reply; 14+ messages in thread From: Kirill Shcherbatov @ 2018-04-05 14:09 UTC (permalink / raw) To: tarantool-patches; +Cc: v.shpilevoy, Kirill Shcherbatov --- src/box/CMakeLists.txt | 2 +- src/box/lua/tuple.c | 172 ++++++---------------------------------- src/box/lua/tuple.lua | 7 +- src/box/tuple_format.c | 155 ++++++++++++++++++++++++++++++++++++ src/box/tuple_format.h | 15 ++++ src/lib/json/path.c | 191 ++++++++++++++++++++++++++++++--------------- src/lib/json/path.h | 25 ++++-- test/engine/tuple.result | 57 ++++++++++---- test/engine/tuple.test.lua | 13 ++- test/unit/CMakeLists.txt | 2 +- test/unit/json_path.c | 2 +- 11 files changed, 398 insertions(+), 243 deletions(-) diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt index f053ae4..e5a0c21 100644 --- a/src/box/CMakeLists.txt +++ b/src/box/CMakeLists.txt @@ -45,7 +45,7 @@ add_library(tuple STATIC field_def.c opt_def.c ) -target_link_libraries(tuple box_error core ${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES} misc bit) +target_link_libraries(tuple json_path box_error core ${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES} misc bit ${common_libraries}) add_library(xlog STATIC xlog.c) target_link_libraries(xlog core box_error crc32 ${ZSTD_LIBRARIES}) diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c index 99b9ff2..3b89cdd 100644 --- a/src/box/lua/tuple.c +++ b/src/box/lua/tuple.c @@ -403,87 +403,6 @@ lbox_tuple_transform(struct lua_State *L) } /** - * Propagate @a field to MessagePack(field)[index]. - * @param[in][out] field Field to propagate. - * @param index 1-based index to propagate to. - * - * @retval 0 Success, the index was found. - * @retval -1 Not found. - */ -static inline int -tuple_field_go_to_index(const char **field, uint64_t index) -{ - assert(index >= 0); - enum mp_type type = mp_typeof(**field); - if (type == MP_ARRAY) { - if (index == 0) - return -1; - /* Make index 0-based. */ - index -= TUPLE_INDEX_BASE; - uint32_t count = mp_decode_array(field); - if (index >= count) - return -1; - for (; index > 0; --index) - mp_next(field); - return 0; - } else if (type == MP_MAP) { - uint64_t count = mp_decode_map(field); - for (; count > 0; --count) { - type = mp_typeof(**field); - if (type == MP_UINT) { - uint64_t value = mp_decode_uint(field); - if (value == index) - return 0; - } else if (type == MP_INT) { - int64_t value = mp_decode_int(field); - if (value >= 0 && (uint64_t)value == index) - return 0; - } else { - /* Skip key. */ - mp_next(field); - } - /* Skip value. */ - mp_next(field); - } - } - return -1; -} - -/** - * Propagate @a field to MessagePack(field)[key]. - * @param[in][out] field Field to propagate. - * @param key Key to propagate to. - * @param len Length of @a key. - * - * @retval 0 Success, the index was found. - * @retval -1 Not found. - */ -static inline int -tuple_field_go_to_key(const char **field, const char *key, int len) -{ - enum mp_type type = mp_typeof(**field); - if (type != MP_MAP) - return -1; - uint64_t count = mp_decode_map(field); - for (; count > 0; --count) { - type = mp_typeof(**field); - if (type == MP_STR) { - uint32_t value_len; - const char *value = mp_decode_str(field, &value_len); - if (value_len == (uint)len && - memcmp(value, key, len) == 0) - return 0; - } else { - /* Skip key. */ - mp_next(field); - } - /* Skip value. */ - mp_next(field); - } - return -1; -} - -/** * Find a tuple field by JSON path. * @param L Lua state. * @param tuple 1-th argument on a lua stack, tuple to get field @@ -497,81 +416,34 @@ tuple_field_go_to_key(const char **field, const char *key, int len) static int lbox_tuple_field_by_path(struct lua_State *L) { - const char *field; struct tuple *tuple = luaT_istuple(L, 1); /* Is checked in Lua wrapper. */ assert(tuple != NULL); - if (lua_isnumber(L, 2)) { + const char *field = NULL; + if (!lua_isnumber(L, 2)) { + /* string */ + field = tuple_field_raw_by_path(tuple_format(tuple), tuple_data(tuple), + tuple_field_map(tuple), + lua_tostring(L, 2), (int)lua_strlen(L, 2), + lua_hashstring(L, 2)); + } else { int index = lua_tointeger(L, 2); index -= TUPLE_INDEX_BASE; - if (index < 0) { -not_found: - lua_pushinteger(L, -1); - lua_pushnil(L); - return 2; - } - field = tuple_field(tuple, index); - if (field == NULL) - goto not_found; -push_value: - lua_pushinteger(L, 0); - luamp_decode(L, luaL_msgpack_default, &field); - return 2; + if (index >= 0) + field = tuple_field(tuple, index); } - assert(lua_isstring(L, 2)); - size_t path_len; - const char *path = lua_tolstring(L, 2, &path_len); - struct json_path_parser parser; - struct json_path_node node; - json_path_parser_create(&parser, path, path_len); - int rc = json_path_next(&parser, &node); - if (rc != 0 || node.type == JSON_PATH_END) - luaL_error(L, "Error in path on position %d", rc); - if (node.type == JSON_PATH_NUM) { - int index = node.num; - if (index == 0) - goto not_found; - index -= TUPLE_INDEX_BASE; - field = tuple_field(tuple, index); - if (field == NULL) - goto not_found; - } else { - assert(node.type == JSON_PATH_STR); - /* First part of a path is a field name. */ - const char *name = node.str; - uint32_t name_len = node.len; - uint32_t name_hash; - if (path_len == name_len) { - name_hash = lua_hashstring(L, 2); - } else { - /* - * If a string is "field....", then its - * precalculated juajit hash can not be - * used. A tuple dictionary hashes only - * name, not path. - */ - name_hash = lua_hash(name, name_len); - } - field = tuple_field_by_name(tuple, name, name_len, name_hash); - if (field == NULL) - goto not_found; - } - while ((rc = json_path_next(&parser, &node)) == 0 && - node.type != JSON_PATH_END) { - if (node.type == JSON_PATH_NUM) { - rc = tuple_field_go_to_index(&field, node.num); - } else { - assert(node.type == JSON_PATH_STR); - rc = tuple_field_go_to_key(&field, node.str, node.len); - } - if (rc != 0) - goto not_found; - } - if (rc == 0) - goto push_value; - luaL_error(L, "Error in path on position %d", rc); - unreachable(); - goto not_found; + /* error code or message */ + struct error *e = diag_last_error(diag_get()); + if (field == NULL && e != NULL) + lua_pushstring(L, e->errmsg); + else + lua_pushinteger(L, -1*(field == NULL)); + if (field) + luamp_decode(L, luaL_msgpack_default, &field); + else + lua_pushnil(L); + diag_clear(diag_get()); + return 2; } static int diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua index b51b4df..3785568 100644 --- a/src/box/lua/tuple.lua +++ b/src/box/lua/tuple.lua @@ -6,6 +6,7 @@ local msgpackffi = require('msgpackffi') local fun = require('fun') local buffer = require('buffer') local internal = require('box.internal') +local log = require('log') ffi.cdef[[ /** \cond public */ @@ -302,6 +303,7 @@ ffi.metatype(tuple_t, { end; __tostring = internal.tuple.tostring; __index = function(tuple, key) + local rc, field if type(key) == "string" or type(key) == "number" then -- Try to get a field by json path or by [index]. If -- it was not found (rc ~= 0) then return a method @@ -311,12 +313,13 @@ ffi.metatype(tuple_t, { -- returns field value, not tuple_bsize function. To -- access hidden methods use -- 'box.tuple.<method_name>(T, [args...])'. - local rc, field = tuple_field_by_path(tuple, key) + rc, field = tuple_field_by_path(tuple, key) if rc == 0 then return field end end - return methods[key] + local method = methods[key] + return method == box.NULL and rc ~= -1 and rc or method end; __eq = function(tuple_a, tuple_b) -- Two tuple are considered equal if they have same memory address diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index e458f49..9c6e3da 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -28,6 +28,8 @@ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ +#include <lib/json/path.h> +#include <third_party/luajit/src/lua.h> #include "tuple_format.h" /** Global table of tuple formats */ @@ -478,3 +480,156 @@ box_tuple_format_unref(box_tuple_format_t *format) { tuple_format_unref(format); } + +/** + * Propagate @a field to MessagePack(field)[index]. + * @param[in][out] field Field to propagate. + * @param index 1-based index to propagate to. + * + * @retval 0 Success, the index was found. + * @retval -1 Not found. + */ +static inline int +tuple_field_go_to_index(const char **field, uint64_t index) +{ + enum mp_type type = mp_typeof(**field); + if (type == MP_ARRAY) { + if (index == 0) + return -1; + /* Make index 0-based. */ + index -= TUPLE_INDEX_BASE; + uint32_t count = mp_decode_array(field); + if (index >= count) + return -1; + for (; index > 0; --index) + mp_next(field); + return 0; + } else if (type == MP_MAP) { + uint64_t count = mp_decode_map(field); + for (; count > 0; --count) { + type = mp_typeof(**field); + if (type == MP_UINT) { + uint64_t value = mp_decode_uint(field); + if (value == index) + return 0; + } else if (type == MP_INT) { + int64_t value = mp_decode_int(field); + if (value >= 0 && (uint64_t)value == index) + return 0; + } else { + /* Skip key. */ + mp_next(field); + } + /* Skip value. */ + mp_next(field); + } + } + return -1; +} + +/** + * Propagate @a field to MessagePack(field)[key]. + * @param[in][out] field Field to propagate. + * @param key Key to propagate to. + * @param len Length of @a key. + * + * @retval 0 Success, the index was found. + * @retval -1 Not found. + */ +static inline int +tuple_field_go_to_key(const char **field, const char *key, int len) +{ + enum mp_type type = mp_typeof(**field); + if (type != MP_MAP) + return -1; + uint64_t count = mp_decode_map(field); + for (; count > 0; --count) { + type = mp_typeof(**field); + if (type == MP_STR) { + uint32_t value_len; + const char *value = mp_decode_str(field, &value_len); + if (value_len == (uint)len && + memcmp(value, key, len) == 0) + return 0; + } else { + /* Skip key. */ + mp_next(field); + } + /* Skip value. */ + mp_next(field); + } + return -1; +} + +const char * +tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, + const uint32_t *field_map, + const char *path,uint32_t path_len, uint32_t path_hash) +{ + int rc = 0; + const char *field; + struct json_path_parser parser; + struct json_path_node node; + json_path_parser_create(&parser, path, path_len); + rc = json_path_next(&parser, &node); + if (rc != 0 || node.type == JSON_PATH_END) + goto not_found; + if (node.type == JSON_PATH_NUM) { + int index = node.num; + if (index == 0) + goto not_found; + index -= TUPLE_INDEX_BASE; + field = tuple_field_raw(format, tuple, + field_map, index); + if (field == NULL) + goto not_found; + } else { + assert(node.type == JSON_PATH_STR); + /* First part of a path is a field name. */ + const char *name = node.str; + uint32_t name_len = node.len; + uint32_t name_hash; + if (path_len == name_len) { + name_hash = path_hash; + } else { + /* + * If a string is "field....", then its + * precalculated juajit hash can not be + * used. A tuple dictionary hashes only + * name, not path. + */ + name_hash = lua_hash(name, name_len); + } + field = tuple_field_raw_by_name(format, tuple, + field_map, + name, name_len, name_hash); + if (field == NULL) + goto not_found; + } + while ((rc = json_path_next(&parser, &node)) == 0 && + node.type != JSON_PATH_END) { + if (node.type == JSON_PATH_NUM) { + rc = tuple_field_go_to_index(&field, node.num); + } else { + assert(node.type == JSON_PATH_STR); + rc = tuple_field_go_to_key(&field, node.str, node.len); + } + if (rc != 0) { + rc = 0; /* prevent error raise */ + goto not_found; + } + } + if (rc == 0) + return field; + not_found: + /* try to use the whole string as identifier */ + field = tuple_field_raw_by_name(format, tuple, + field_map, + path, path_len, path_hash); + if (field) + return field; + if (rc || path_len == 0) + diag_set(IllegalParams, "Error in path on " + "position %d", rc); + return NULL; +} diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index d35182d..ce17085 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -377,6 +377,21 @@ tuple_field_raw_by_name(struct tuple_format *format, const char *tuple, return tuple_field_raw(format, tuple, field_map, fieldno); } +/** + * Get tuple field by its path. + * @param Tuple. + * @param path Field path. + * @param path_len Length of @a path. + * @param path_hash Hash of @a path. + * + * @retval not NULL on field found. + * @retval NULL No field by @a path. + */ +const char * +tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, + const uint32_t *field_map, + const char *path,uint32_t path_len, uint32_t path_hash); + #if defined(__cplusplus) } /* extern "C" */ #endif /* defined(__cplusplus) */ diff --git a/src/lib/json/path.c b/src/lib/json/path.c index 4a6174e..f3fcc46 100644 --- a/src/lib/json/path.c +++ b/src/lib/json/path.c @@ -31,8 +31,10 @@ #include "path.h" #include <ctype.h> +#include <unicode/uchar.h> #include "trivia/util.h" + /** Same as strtoull(), but with limited length. */ static inline uint64_t strntoull(const char *src, int len) { @@ -45,6 +47,42 @@ strntoull(const char *src, int len) { } /** + * Parse string and update parser's state. + * @param parser JSON path parser. Upates pos, signs_read. + * @param[out] UChar32 to store result. + * + * @retval 1 Success. + * @retval 0 End of string. + * @retval -1 Parse error. + */ +static inline int +json_read_sign(struct json_path_parser *parser, UChar32 *out) +{ + if (unlikely(parser->offset == parser->src_len)) + return 0; + UChar32 c; + U8_NEXT_OR_FFFD(parser->src, parser->offset, parser->src_len, c) + if (c == 0xFFFD) + return -1; + *out = c; + parser->invalid_sign_off += 1; + return 1; +} + +/** + * Reset parser state to previous one. + * @param parser JSON path parser. + * @param old parser read offset. + * @param signs to drop in signs_read counter. + */ +static inline void +json_reset_pos(struct json_path_parser *parser, int old_offset, int signs) +{ + parser->offset = old_offset; + parser->invalid_sign_off -= signs; +} + +/** * Parse string identifier in quotes. Parser either stops right * after the closing quote, or returns an error position. * @param parser JSON path parser. @@ -56,24 +94,26 @@ strntoull(const char *src, int len) { static inline int json_parse_string(struct json_path_parser *parser, struct json_path_node *node) { - const char *end = parser->src + parser->src_len; - const char *pos = parser->pos; - assert(pos < end); - char quote_type = *pos; - assert(quote_type == '\'' || quote_type == '"'); - /* Skip first quote. */ - int len = 0; - ++pos; - const char *str = pos; - for (char c = *pos; pos < end && quote_type != c; c = *++pos) - ++len; - /* A string must be terminated with quote. */ - if (*pos != quote_type || len == 0) - return pos - parser->src + 1; - /* Skip the closing quote. */ - parser->pos = pos + 1; + assert(parser->offset < parser->src_len); + UChar32 quote_type; + (void) json_read_sign(parser, "e_type); + assert(quote_type == (UChar32)'\'' || quote_type == (UChar32)'"'); + int str_offset = parser->offset; + UChar32 c = 0; + int rc = 0; + + while (((rc = json_read_sign(parser, &c)) > 0) + && c != quote_type); + int len = (int)(parser->offset - str_offset - 1); + if (rc < 0 || len == 0) + return -1; + if (c != (UChar32)quote_type) { + parser->invalid_sign_off++; + return -1; + } + node->type = JSON_PATH_STR; - node->str = str; + node->str = parser->src + str_offset; node->len = len; return 0; } @@ -81,7 +121,7 @@ json_parse_string(struct json_path_parser *parser, struct json_path_node *node) /** * Parse digit sequence into integer until non-digit is met. * Parser stops right after the last digit. - * @param parser JSON parser. + * @param parser JSON parser. Updates signs_read field. * @param[out] node JSON node to store result. * * @retval 0 Success. @@ -90,27 +130,40 @@ json_parse_string(struct json_path_parser *parser, struct json_path_node *node) static inline int json_parse_integer(struct json_path_parser *parser, struct json_path_node *node) { - const char *end = parser->src + parser->src_len; - const char *pos = parser->pos; - assert(pos < end); - const char *str = pos; - int len = 0; - for (char c = *pos; pos < end && isdigit(c); c = *++pos) - ++len; - if (len == 0) - return pos - parser->src + 1; - parser->pos = pos; + assert(parser->offset < parser->src_len); + int str_offset = parser->offset; + int last_offset = str_offset; + int len = 0, rc = 0; + UChar32 c = 0; + + while (((rc = json_read_sign(parser, &c)) > 0) && u_isdigit(c)) { + last_offset = parser->offset; + len++; + } + if (rc > 0 && len > 0 && !u_isdigit(c)) + json_reset_pos(parser, last_offset, 1); + if (rc < 0 || len == 0) + return -1; + node->type = JSON_PATH_NUM; - node->num = strntoull(str, len); + node->num = strntoull(parser->src + str_offset, len); return 0; } +static inline bool +identifier_valid_sign(UChar32 c) +{ + return u_isUAlphabetic(c) + || c == (UChar32)'_' + || u_isdigit(c); +} + /** * Parse identifier out of quotes. It can contain only alphas, * digits and underscores. And can not contain digit at the first * position. Parser is stoped right after the last non-digit, * non-alpha and non-underscore symbol. - * @param parser JSON parser. + * @param parser JSON parser. Updates signs_read field. * @param[out] node JSON node to store result. * * @retval 0 Success. @@ -120,68 +173,78 @@ static inline int json_parse_identifier(struct json_path_parser *parser, struct json_path_node *node) { - const char *end = parser->src + parser->src_len; - const char *pos = parser->pos; - assert(pos < end); - const char *str = pos; - char c = *pos; + assert(parser->offset < parser->src_len); + int str_offset = parser->offset; + UChar32 c; + int rc = 0; + if (json_read_sign(parser, &c) < 0) + return -1; /* First symbol can not be digit. */ - if (!isalpha(c) && c != '_') - return pos - parser->src + 1; - int len = 1; - for (c = *++pos; pos < end && (isalpha(c) || c == '_' || isdigit(c)); - c = *++pos) - ++len; - assert(len > 0); - parser->pos = pos; + if (!u_isalpha(c) && c != (UChar32)'_') + return -1; + + int last_offset = parser->offset; + while ((rc = json_read_sign(parser, &c)) > 0 && identifier_valid_sign(c)) + last_offset = parser->offset; + if (rc > 0 && !identifier_valid_sign(c)) + json_reset_pos(parser, last_offset, 1); + if (rc < 0) + return -1; + node->type = JSON_PATH_STR; - node->str = str; - node->len = len; + node->str = parser->src + str_offset; + node->len = parser->offset - str_offset; return 0; } +static inline char +json_curr_char(struct json_path_parser *parser) +{ + return *json_path_curr_substring(parser); +} + int json_path_next(struct json_path_parser *parser, struct json_path_node *node) { - const char *end = parser->src + parser->src_len; - if (end == parser->pos) { + int end_offset = parser->src_len; + if (end_offset == parser->offset) { node->type = JSON_PATH_END; return 0; } - char c = *parser->pos; + UChar32 c = 0; + int last_offset = parser->offset; + if (json_read_sign(parser, &c) < 0) + return parser->invalid_sign_off; int rc; switch(c) { - case '[': - ++parser->pos; + case (UChar32)'[': /* Error for []. */ - if (parser->pos == end) - return parser->pos - parser->src + 1; - c = *parser->pos; + if (parser->offset == end_offset) + return parser->invalid_sign_off; + c = json_curr_char(parser); if (c == '"' || c == '\'') rc = json_parse_string(parser, node); else rc = json_parse_integer(parser, node); if (rc != 0) - return rc; + return parser->invalid_sign_off; /* * Expression, started from [ must be finished * with ] regardless of its type. */ - if (parser->pos == end || *parser->pos != ']') - return parser->pos - parser->src + 1; + if (parser->offset == end_offset || json_curr_char(parser) != ']') + return parser->invalid_sign_off + 1; /* Skip ]. */ - ++parser->pos; + (void) json_read_sign(parser, &c); break; - case '.': - /* Skip dot. */ - ++parser->pos; - if (parser->pos == end) - return parser->pos - parser->src + 1; - FALLTHROUGH default: + if (c != (UChar32)'.') + json_reset_pos(parser, last_offset, 1); + else if (parser->offset == end_offset) + return parser->invalid_sign_off + 1; rc = json_parse_identifier(parser, node); if (rc != 0) - return rc; + return parser->invalid_sign_off; break; } return 0; diff --git a/src/lib/json/path.h b/src/lib/json/path.h index 6e8db4c..15292fb 100644 --- a/src/lib/json/path.h +++ b/src/lib/json/path.h @@ -45,10 +45,11 @@ extern "C" { struct json_path_parser { /** Source string. */ const char *src; - /** Length of src. */ + /** Length of string. */ int src_len; - /** Current parser's position. */ - const char *pos; + /** Current parser's offset. */ + int offset; + int invalid_sign_off; }; enum json_path_type { @@ -78,18 +79,30 @@ struct json_path_node { }; /** - * Create @a parser. + * Init @a parser. * @param[out] parser Parser to create. * @param src Source string. * @param src_len Length of @a src. */ static inline void json_path_parser_create(struct json_path_parser *parser, const char *src, - int src_len) + int src_len) { parser->src = src; parser->src_len = src_len; - parser->pos = src; + parser->offset = 0; + parser->invalid_sign_off = 0; +} + +/** + * Get current substring of parser. + * @param parser Parser. + * @retval ptr to substring + */ +static inline const char * +json_path_curr_substring(struct json_path_parser *parser) +{ + return parser->src + parser->offset; } /** diff --git a/test/engine/tuple.result b/test/engine/tuple.result index 2d7367a..3a8e828 100644 --- a/test/engine/tuple.result +++ b/test/engine/tuple.result @@ -602,7 +602,10 @@ format[2] = {name = 'field2', type = 'array'} format[3] = {name = 'field3', type = 'map'} --- ... -format[4] = {name = 'field4', type = 'string'} +format[4] = {name = 'field4', type = 'string' } +--- +... +format[5] = {name = "[2][6]['привет中国world']['中国a']", type = 'string'} --- ... s = box.schema.space.create('test', {format = format}) @@ -611,13 +614,13 @@ s = box.schema.space.create('test', {format = format}) pk = s:create_index('pk') --- ... -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} +field2 = {1, 2, 3, "4", {5,6,7}, {привет中国world={中国="привет"}, key="value1", value="key1"}} --- ... field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} --- ... -t = s:replace{1, field2, field3, "123456"} +t = s:replace{1, field2, field3, "123456", "yes, this"} --- ... t[1] @@ -626,7 +629,7 @@ t[1] ... t[2] --- -- [1, 2, 3, '4', [5, 6, 7], {'key': 'key1', 'value': 'value1'}] +- [1, 2, 3, '4', [5, 6, 7], {'привет中国world': {'中国': 'привет'}, 'key': 'value1', 'value': 'key1'}] ... t[3] --- @@ -667,19 +670,43 @@ t["[2][5][3]"] ... t["[2][6].key"] --- -- key1 +- value1 ... t["[2][6].value"] --- -- value1 +- key1 ... t["[2][6]['key']"] --- -- key1 +- value1 ... t["[2][6]['value']"] --- -- value1 +- key1 +... +t[2][6].привет中国world.中国 +--- +- привет +... +t["[2][6].привет中国world"].中国 +--- +- привет +... +t["[2][6].привет中国world.中国"] +--- +- привет +... +t["[2][6]['привет中国world']"]["中国"] +--- +- привет +... +t["[2][6]['привет中国world']['中国']"] +--- +- привет +... +t["[2][6]['привет中国world']['中国a']"] +--- +- yes, this ... t["[3].k3[2].c"] --- @@ -759,31 +786,31 @@ t["a.b.c d.e.f"] -- Sytax errors. t[""] --- -- error: 'builtin/box/tuple.lua:314: Error in path on position 0' +- Error in path on position 0 ... t["[2].[5]"] --- -- error: 'builtin/box/tuple.lua:314: Error in path on position 5' +- Error in path on position 5 ... t["[-1]"] --- -- error: 'builtin/box/tuple.lua:314: Error in path on position 2' +- Error in path on position 2 ... t[".."] --- -- error: 'builtin/box/tuple.lua:314: Error in path on position 2' +- Error in path on position 2 ... t["[["] --- -- error: 'builtin/box/tuple.lua:314: Error in path on position 2' +- Error in path on position 2 ... t["]]"] --- -- error: 'builtin/box/tuple.lua:314: Error in path on position 1' +- Error in path on position 1 ... t["{"] --- -- error: 'builtin/box/tuple.lua:314: Error in path on position 1' +- Error in path on position 1 ... s:drop() --- diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua index ba3482d..90da8b2 100644 --- a/test/engine/tuple.test.lua +++ b/test/engine/tuple.test.lua @@ -204,12 +204,13 @@ format = {} format[1] = {name = 'field1', type = 'unsigned'} format[2] = {name = 'field2', type = 'array'} format[3] = {name = 'field3', type = 'map'} -format[4] = {name = 'field4', type = 'string'} +format[4] = {name = 'field4', type = 'string' } +format[5] = {name = "[2][6]['привет中国world']['中国a']", type = 'string'} s = box.schema.space.create('test', {format = format}) pk = s:create_index('pk') -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} +field2 = {1, 2, 3, "4", {5,6,7}, {привет中国world={中国="привет"}, key="value1", value="key1"}} field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} -t = s:replace{1, field2, field3, "123456"} +t = s:replace{1, field2, field3, "123456", "yes, this"} t[1] t[2] t[3] @@ -225,6 +226,12 @@ t["[2][6].key"] t["[2][6].value"] t["[2][6]['key']"] t["[2][6]['value']"] +t[2][6].привет中国world.中国 +t["[2][6].привет中国world"].中国 +t["[2][6].привет中国world.中国"] +t["[2][6]['привет中国world']"]["中国"] +t["[2][6]['привет中国world']['中国']"] +t["[2][6]['привет中国world']['中国a']"] t["[3].k3[2].c"] t["[4]"] t.field1 diff --git a/test/unit/CMakeLists.txt b/test/unit/CMakeLists.txt index fe8b2d2..667194c 100644 --- a/test/unit/CMakeLists.txt +++ b/test/unit/CMakeLists.txt @@ -130,7 +130,7 @@ add_executable(csv.test csv.c) target_link_libraries(csv.test csv) add_executable(json_path.test json_path.c) -target_link_libraries(json_path.test json_path unit) +target_link_libraries(json_path.test json_path unit ${ICU_LIBRARIES}) add_executable(rmean.test rmean.cc) target_link_libraries(rmean.test stat unit) diff --git a/test/unit/json_path.c b/test/unit/json_path.c index 599658b..81ef7fc 100644 --- a/test/unit/json_path.c +++ b/test/unit/json_path.c @@ -9,7 +9,7 @@ json_path_parser_create(&parser, path, len); #define is_next_index(value_len, value) \ - path = parser.pos; \ + path = json_path_curr_substring(&parser); \ is(json_path_next(&parser, &node), 0, "parse <%." #value_len "s>", \ path); \ is(node.type, JSON_PATH_NUM, "<%." #value_len "s> is num", path); \ -- 2.7.4 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tarantool-patches] Re: [PATCH v2 1/1] ICU Unicode support for JSON parser. 2018-04-05 14:09 ` [tarantool-patches] [PATCH v2 1/1] ICU Unicode support for JSON parser Kirill Shcherbatov @ 2018-04-05 18:00 ` Kirill Shcherbatov 2018-04-05 23:32 ` Vladislav Shpilevoy 0 siblings, 1 reply; 14+ messages in thread From: Kirill Shcherbatov @ 2018-04-05 18:00 UTC (permalink / raw) To: tarantool-patches; +Cc: v.shpilevoy From 8d97afd7ac7a90e5dfb1e756fd9d389bafdadd56 Mon Sep 17 00:00:00 2001 From: Kirill Shcherbatov <kshcherbatov@tarantool.org> Date: Thu, 5 Apr 2018 20:35:25 +0300 Subject: [PATCH] Accounted Vlad's comments --- src/box/CMakeLists.txt | 2 +- src/box/lua/tuple.c | 23 +++++++++++------------ src/box/lua/tuple.lua | 11 +++++------ src/box/tuple.h | 20 ++++++++++++++++++++ src/box/tuple_format.c | 13 ++++++------- src/box/tuple_format.h | 4 +++- src/lib/json/path.c | 30 ++++++++++++++---------------- src/lib/json/path.h | 1 + 8 files changed, 61 insertions(+), 43 deletions(-) >Четверг, 5 апреля 2018, 17:48 +03:00 от Vladislav Shpilevoy <v.shpilevoy@tarantool.org>: > >Hello. See 18 comments below. > >05.04.2018 17:06, Kirill Shcherbatov пишет: >> This is an automated email from the git hooks/post-receive script. >> >> kshcherbatov pushed a commit to branch gh-1285-tuple-field-by-json-icu >> in repository tarantool. >> >> commit ab52c7a18441037c19d0b2759558a7eb93aabb0d >> Author: Kirill Shcherbatov < kshcherbatov@tarantool.org > >> AuthorDate: Thu Apr 5 17:04:15 2018 +0300 >> >> ICU Unicode support for JSON parser. >> --- >> src/box/CMakeLists.txt | 2 +- >> src/box/lua/tuple.c | 172 ++++++---------------------------------- >> src/box/lua/tuple.lua | 7 +- >> src/box/tuple_format.c | 155 ++++++++++++++++++++++++++++++++++++ >> src/box/tuple_format.h | 15 ++++ >> src/lib/json/path.c | 191 ++++++++++++++++++++++++++++++--------------- >> src/lib/json/path.h | 25 ++++-- >> test/engine/tuple.result | 57 ++++++++++---- >> test/engine/tuple.test.lua | 13 ++- >> test/unit/CMakeLists.txt | 2 +- >> test/unit/json_path.c | 2 +- >> 11 files changed, 398 insertions(+), 243 deletions(-) >> >> diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt >> index add0ff9..0f5b197 100644 >> --- a/src/box/CMakeLists.txt >> +++ b/src/box/CMakeLists.txt >> @@ -44,7 +44,7 @@ add_library(tuple STATIC >> field_def.c >> opt_def.c >> ) >> -target_link_libraries(tuple box_error core ${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES} misc bit) >> +target_link_libraries(tuple json_path box_error core ${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES} misc bit ${common_libraries}) >1. Remove ${common_libraries} please, the tuple lib must not depend on >Lua. See how to do it in the next comments. diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt index e5a0c21..88c2c60 100644 --- a/src/box/CMakeLists.txt +++ b/src/box/CMakeLists.txt @@ -45,7 +45,7 @@ add_library(tuple STATIC field_def.c opt_def.c ) -target_link_libraries(tuple json_path box_error core ${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES} misc bit ${common_libraries}) +target_link_libraries(tuple json_path box_error core ${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES} misc bit) add_library(xlog STATIC xlog.c) target_link_libraries(xlog core box_error crc32 ${ZSTD_LIBRARIES}) >> @@ -497,81 +416,34 @@ tuple_field_go_to_key(const char **field, const char *key, int len) >> static int >> lbox_tuple_field_by_path(struct lua_State *L) >> { >> -const char *field; >> struct tuple *tuple = luaT_istuple(L, 1); >> /* Is checked in Lua wrapper. */ >> assert(tuple != NULL); >> -if (lua_isnumber(L, 2)) { >> +const char *field = NULL; >> +if (!lua_isnumber(L, 2)) { >> +/* string */ >> +field = tuple_field_raw_by_path(tuple_format(tuple), tuple_data(tuple), >> + tuple_field_map(tuple), >> + lua_tostring(L, 2), (int)lua_strlen(L, 2), >> + lua_hashstring(L, 2)); >2. Add a tuple_field_by_path wrapper for it, as I told you verbally. >Moreover >this code is out of 80 symbols width. >> +/* error code or message */ >> +struct error *e = diag_last_error(diag_get()); >3. Please, move all of this error checking into !lua_isnumber branch. >Tuple_field never returns >error, because sometimes absent field is ok. >> +if (field == NULL && e != NULL) >> +lua_pushstring(L, e->errmsg); >> +else >> +lua_pushinteger(L, -1*(field == NULL)); >> +if (field) >4. Use explicit != NULL, please. diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c index 3b89cdd..ab62663 100644 --- a/src/box/lua/tuple.c +++ b/src/box/lua/tuple.c @@ -422,28 +422,27 @@ lbox_tuple_field_by_path(struct lua_State *L) const char *field = NULL; if (!lua_isnumber(L, 2)) { /* string */ - field = tuple_field_raw_by_path(tuple_format(tuple), tuple_data(tuple), - tuple_field_map(tuple), - lua_tostring(L, 2), (int)lua_strlen(L, 2), - lua_hashstring(L, 2)); + field = tuple_field_by_path(tuple, lua_tostring(L, 2), + (int)lua_strlen(L, 2), + lua_hashstring(L, 2)); + struct error *e = diag_last_error(diag_get()); + if (field == NULL && e) { + lua_pushnil(L); + lua_pushstring(L, e->errmsg); + return 2; + } } else { int index = lua_tointeger(L, 2); index -= TUPLE_INDEX_BASE; if (index >= 0) field = tuple_field(tuple, index); } - /* error code or message */ - struct error *e = diag_last_error(diag_get()); - if (field == NULL && e != NULL) - lua_pushstring(L, e->errmsg); - else - lua_pushinteger(L, -1*(field == NULL)); - if (field) + if (field != NULL) luamp_decode(L, luaL_msgpack_default, &field); else lua_pushnil(L); diag_clear(diag_get()); - return 2; + return 1; } static int >> diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua >> index b51b4df..3785568 100644 >> --- a/src/box/lua/tuple.lua >> +++ b/src/box/lua/tuple.lua >> @@ -6,6 +6,7 @@ local msgpackffi = require('msgpackffi') >> local fun = require('fun') >> local buffer = require('buffer') >> local internal = require('box.internal') >> +local log = require('log') >5. Unsed module. diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua index 3785568..1c7e2a5 100644 --- a/src/box/lua/tuple.lua +++ b/src/box/lua/tuple.lua @@ -6,7 +6,6 @@ local msgpackffi = require('msgpackffi') local fun = require('fun') local buffer = require('buffer') local internal = require('box.internal') -local log = require('log') ffi.cdef[[ /** \cond public */ >> >> ffi.cdef[[ >> /** \cond public */ >> @@ -302,6 +303,7 @@ ffi.metatype(tuple_t, { >> end; >> __tostring = internal.tuple.tostring; >> __index = function(tuple, key) >> + local rc, field >> if type(key) == "string" or type(key) == "number" then >> -- Try to get a field by json path or by [index]. If >> -- it was not found (rc ~= 0) then return a method >> @@ -311,12 +313,13 @@ ffi.metatype(tuple_t, { >> -- returns field value, not tuple_bsize function. To >> -- access hidden methods use >> -- 'box.tuple.<method_name>(T, [args...])'. >> - local rc, field = tuple_field_by_path(tuple, key) >> + rc, field = tuple_field_by_path(tuple, key) >> if rc == 0 then >> return field >> end >> end >> - return methods[key] >> + local method = methods[key] >> + return method == box.NULL and rc ~= -1 and rc or method >6. You must not return -1. Error must be nil + error message/object. @@ -303,7 +302,7 @@ ffi.metatype(tuple_t, { end; __tostring = internal.tuple.tostring; __index = function(tuple, key) - local rc, field + local res, err if type(key) == "string" or type(key) == "number" then -- Try to get a field by json path or by [index]. If -- it was not found (rc ~= 0) then return a method @@ -313,13 +312,13 @@ ffi.metatype(tuple_t, { -- returns field value, not tuple_bsize function. To -- access hidden methods use -- 'box.tuple.<method_name>(T, [args...])'. - rc, field = tuple_field_by_path(tuple, key) - if rc == 0 then - return field + res, err = tuple_field_by_path(tuple, key) + if res ~= box.NULL then + return res end end local method = methods[key] - return method == box.NULL and rc ~= -1 and rc or method + return method == box.NULL and err or method end; __eq = function(tuple_a, tuple_b) -- Two tuple are considered equal if they have same memory address >> diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c >> index 3e2c8bf..f2dffaf 100644 >> --- a/src/box/tuple_format.c >> +++ b/src/box/tuple_format.c >> @@ -28,6 +28,8 @@ >> * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF >> * SUCH DAMAGE. >> */ >> +#include <lib/json/path.h> >> +#include <third_party/luajit/src/lua.h> >7. Do not include lua headers in files, that are not in lua/ dir. >Tarantool core must not >depend on lua. As I told you earlier, you must use field_name_hash >function, declared in >tuple_dictionary.h. It allows to remove lua dependency and remove >${common_libraries} >from cmake file. diff --git a/src/box/tuple.h b/src/box/tuple.h index 6ebedf5..f084c9d 100644 --- a/src/box/tuple.h +++ b/src/box/tuple.h @@ -514,6 +514,26 @@ tuple_field(const struct tuple *tuple, uint32_t fieldno) } /** + * Get tuple field by its path. + * @param tuple. + * @param path Field path. + * @param path_len Length of @a path. + * @param path_hash Hash of @a path. + * + * @retval not NULL on field found. + * @retval NULL No field by @a path. + */ +static inline const char * +tuple_field_by_path(struct tuple *tuple, const char *path, + uint32_t path_len, uint32_t path_hash) +{ + return tuple_field_raw_by_path(tuple_format(tuple), + tuple_data(tuple), + tuple_field_map(tuple), + path, path_len, path_hash); +} + +/** * Get tuple field by its name. * @param tuple Tuple to get field from. * @param name Field name. diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 9c6e3da..8ee7948 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -29,8 +29,8 @@ * SUCH DAMAGE. */ #include <lib/json/path.h> -#include <third_party/luajit/src/lua.h> #include "tuple_format.h" +#include "tuple_dictionary.h" /** Global table of tuple formats */ struct tuple_format **tuple_formats; >> +const char * >> +tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, >> + const uint32_t *field_map, >> + const char *path,uint32_t path_len, uint32_t path_hash) >8. Path argument can be placed on the previous line. > >9. After ',' put a space please. >> +{ >> +int rc = 0; >> +const char *field; >> +struct json_path_parser parser; >> +struct json_path_node node; >> +json_path_parser_create(&parser, path, path_len); >> +rc = json_path_next(&parser, &node); >> +if (rc != 0 || node.type == JSON_PATH_END) >> +goto not_found; >> +if (node.type == JSON_PATH_NUM) { >> +int index = node.num; >> +if (index == 0) >> +goto not_found; >> +index -= TUPLE_INDEX_BASE; >> +field = tuple_field_raw(format, tuple, >> + field_map, index); >> +if (field == NULL) >> +goto not_found; >> +} else { >> +assert(node.type == JSON_PATH_STR); >> +/* First part of a path is a field name. */ >> +const char *name = node.str; >> +uint32_t name_len = node.len; >> +uint32_t name_hash; >> +if (path_len == name_len) { >> +name_hash = path_hash; >> +} else { >> +/* >> + * If a string is "field....", then its >> + * precalculated juajit hash can not be >> + * used. A tuple dictionary hashes only >> + * name, not path. >> + */ >> +name_hash = lua_hash(name, name_len); >10. Good comment! One remark - use field_name_hash instead of explicit >lua_hash. >> +} >> +field = tuple_field_raw_by_name(format, tuple, >> + field_map, >> + name, name_len, name_hash); >> +if (field == NULL) >> +goto not_found; >> +} >> +while ((rc = json_path_next(&parser, &node)) == 0 && >> + node.type != JSON_PATH_END) { >> +if (node.type == JSON_PATH_NUM) { >> +rc = tuple_field_go_to_index(&field, node.num); >> +} else { >> +assert(node.type == JSON_PATH_STR); >> +rc = tuple_field_go_to_key(&field, node.str, node.len); >> +} >> +if (rc != 0) { >> +rc = 0; /* prevent error raise */ >> +goto not_found; >> +} >> +} >> +if (rc == 0) >> +return field; >> +not_found: >11. Please, put labels to the line beginning. @@ -563,8 +563,8 @@ tuple_field_go_to_key(const char **field, const char *key, int len) const char * tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, - const uint32_t *field_map, - const char *path,uint32_t path_len, uint32_t path_hash) + const uint32_t *field_map, const char *path, + uint32_t path_len, uint32_t path_hash) { int rc = 0; const char *field; @@ -598,7 +598,7 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, * used. A tuple dictionary hashes only * name, not path. */ - name_hash = lua_hash(name, name_len); + name_hash = field_name_hash(name, name_len); } field = tuple_field_raw_by_name(format, tuple, field_map, @@ -621,7 +621,7 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, } if (rc == 0) return field; - not_found: +not_found: /* try to use the whole string as identifier */ field = tuple_field_raw_by_name(format, tuple, field_map, @@ -629,7 +629,6 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, if (field) return field; if (rc || path_len == 0) - diag_set(IllegalParams, "Error in path on " - "position %d", rc); + diag_set(IllegalParams, "Error in path on position %d", rc); return NULL; } >> diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h >> index d33c77a..3e7e595 100644 >> --- a/src/box/tuple_format.h >> +++ b/src/box/tuple_format.h >> @@ -343,6 +343,21 @@ tuple_field_raw_by_name(struct tuple_format *format, const char *tuple, >> return tuple_field_raw(format, tuple, field_map, fieldno); >> } >> >> +/** >> + * Get tuple field by its path. >> + * @param Tuple. >> + * @param path Field path. >> + * @param path_len Length of @a path. >> + * @param path_hash Hash of @a path. >> + * >> + * @retval not NULL on field found. >> + * @retval NULL No field by @a path. >> + */ >> +const char * >> +tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, >> + const uint32_t *field_map, >> + const char *path,uint32_t path_len, uint32_t path_hash); >12. Please, add tuple_field_by_path as well to tuple.h, that calls this >function. It allows you >to call tuple_field_by_path in lua/tuple.c. diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index ce17085..ffa3d16 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -379,7 +379,9 @@ tuple_field_raw_by_name(struct tuple_format *format, const char *tuple, /** * Get tuple field by its path. - * @param Tuple. + * @param format Tuple format. + * @param tuple MessagePack tuple's body. + * @param field_map Tuple field map. * @param path Field path. * @param path_len Length of @a path. * @param path_hash Hash of @a path. >> + >> #if defined(__cplusplus) >> } /* extern "C" */ >> #endif /* defined(__cplusplus) */ >> diff --git a/src/lib/json/path.c b/src/lib/json/path.c >> index 4a6174e..f3fcc46 100644 >> --- a/src/lib/json/path.c >> +++ b/src/lib/json/path.c >> @@ -31,8 +31,10 @@ >> >> #include "path.h" >> #include <ctype.h> >> +#include <unicode/uchar.h> >> #include "trivia/util.h" >> >> + >13. Garbage diff. >> @@ -90,27 +130,40 @@ json_parse_string(struct json_path_parser *parser, struct json_path_node *node) >> static inline int >> json_parse_integer(struct json_path_parser *parser, struct json_path_node *node) >> { >> -const char *end = parser->src + parser->src_len; >> -const char *pos = parser->pos; >> -assert(pos < end); >> -const char *str = pos; >> -int len = 0; >> -for (char c = *pos; pos < end && isdigit(c); c = *++pos) >> -++len; >> -if (len == 0) >> -return pos - parser->src + 1; >> -parser->pos = pos; >> +assert(parser->offset < parser->src_len); >> +int str_offset = parser->offset; >> +int last_offset = str_offset; >> +int len = 0, rc = 0; >> +UChar32 c = 0; >> + >> +while (((rc = json_read_sign(parser, &c)) > 0) && u_isdigit(c)) { >14. I remember, that you told me, that a digit can be read using simple >char. Can you please implement >this cycle with no UChar? diff --git a/src/lib/json/path.c b/src/lib/json/path.c index f3fcc46..e253ec4 100644 --- a/src/lib/json/path.c +++ b/src/lib/json/path.c @@ -104,7 +104,7 @@ json_parse_string(struct json_path_parser *parser, struct json_path_node *node) while (((rc = json_read_sign(parser, &c)) > 0) && c != quote_type); - int len = (int)(parser->offset - str_offset - 1); + int len = parser->offset - str_offset - 1; if (rc < 0 || len == 0) return -1; if (c != (UChar32)quote_type) { @@ -130,23 +130,21 @@ json_parse_string(struct json_path_parser *parser, struct json_path_node *node) static inline int json_parse_integer(struct json_path_parser *parser, struct json_path_node *node) { - assert(parser->offset < parser->src_len); - int str_offset = parser->offset; - int last_offset = str_offset; - int len = 0, rc = 0; - UChar32 c = 0; - - while (((rc = json_read_sign(parser, &c)) > 0) && u_isdigit(c)) { - last_offset = parser->offset; - len++; - } - if (rc > 0 && len > 0 && !u_isdigit(c)) - json_reset_pos(parser, last_offset, 1); - if (rc < 0 || len == 0) + const char *end = parser->src + parser->src_len; + const char *pos = parser->src + parser->offset; + assert(pos < end); + const char *str = pos; + int len = 0; + for (char c = *pos; pos < end && isdigit(c); c = *++pos) + ++len; + if (len == 0) { + parser->invalid_sign_off++; return -1; - + } + parser->offset += len; + parser->invalid_sign_off += len; node->type = JSON_PATH_NUM; - node->num = strntoull(parser->src + str_offset, len); + node->num = strntoull(str, len); return 0; } >> >> +static inline char >> +json_curr_char(struct json_path_parser *parser) >> +{ >> +return *json_path_curr_substring(parser); >> +} >15. I do not like that on each char you must do string + offset. Please, >find a way to >store this position. Why did you delete parser->pos? I have no good ideas. This code is also not hot: just on each new lexem, believe nothing critical. >> diff --git a/src/lib/json/path.h b/src/lib/json/path.h >> index 6e8db4c..15292fb 100644 >> --- a/src/lib/json/path.h >> +++ b/src/lib/json/path.h >> @@ -45,10 +45,11 @@ extern "C" { >> struct json_path_parser { >> /** Source string. */ >> const char *src; >> -/** Length of src. */ >> +/** Length of string. */ >> int src_len; >> -/** Current parser's position. */ >> -const char *pos; >> +/** Current parser's offset. */ >> +int offset; >> +int invalid_sign_off; >16. Please, add a comment. diff --git a/src/lib/json/path.h b/src/lib/json/path.h index 15292fb..09f2e6f 100644 --- a/src/lib/json/path.h +++ b/src/lib/json/path.h @@ -49,6 +49,7 @@ struct json_path_parser { int src_len; /** Current parser's offset. */ int offset; + /** Successfully parsed signs count. */ int invalid_sign_off; }; -- 2.7.4 On 05.04.2018 17:09, Kirill Shcherbatov wrote: > --- > src/box/CMakeLists.txt | 2 +- > src/box/lua/tuple.c | 172 ++++++---------------------------------- > src/box/lua/tuple.lua | 7 +- > src/box/tuple_format.c | 155 ++++++++++++++++++++++++++++++++++++ > src/box/tuple_format.h | 15 ++++ > src/lib/json/path.c | 191 ++++++++++++++++++++++++++++++--------------- > src/lib/json/path.h | 25 ++++-- > test/engine/tuple.result | 57 ++++++++++---- > test/engine/tuple.test.lua | 13 ++- > test/unit/CMakeLists.txt | 2 +- > test/unit/json_path.c | 2 +- > 11 files changed, 398 insertions(+), 243 deletions(-) > > diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt > index f053ae4..e5a0c21 100644 > --- a/src/box/CMakeLists.txt > +++ b/src/box/CMakeLists.txt > @@ -45,7 +45,7 @@ add_library(tuple STATIC > field_def.c > opt_def.c > ) > -target_link_libraries(tuple box_error core ${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES} misc bit) > +target_link_libraries(tuple json_path box_error core ${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES} misc bit ${common_libraries}) > > add_library(xlog STATIC xlog.c) > target_link_libraries(xlog core box_error crc32 ${ZSTD_LIBRARIES}) > diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c > index 99b9ff2..3b89cdd 100644 > --- a/src/box/lua/tuple.c > +++ b/src/box/lua/tuple.c > @@ -403,87 +403,6 @@ lbox_tuple_transform(struct lua_State *L) > } > > /** > - * Propagate @a field to MessagePack(field)[index]. > - * @param[in][out] field Field to propagate. > - * @param index 1-based index to propagate to. > - * > - * @retval 0 Success, the index was found. > - * @retval -1 Not found. > - */ > -static inline int > -tuple_field_go_to_index(const char **field, uint64_t index) > -{ > - assert(index >= 0); > - enum mp_type type = mp_typeof(**field); > - if (type == MP_ARRAY) { > - if (index == 0) > - return -1; > - /* Make index 0-based. */ > - index -= TUPLE_INDEX_BASE; > - uint32_t count = mp_decode_array(field); > - if (index >= count) > - return -1; > - for (; index > 0; --index) > - mp_next(field); > - return 0; > - } else if (type == MP_MAP) { > - uint64_t count = mp_decode_map(field); > - for (; count > 0; --count) { > - type = mp_typeof(**field); > - if (type == MP_UINT) { > - uint64_t value = mp_decode_uint(field); > - if (value == index) > - return 0; > - } else if (type == MP_INT) { > - int64_t value = mp_decode_int(field); > - if (value >= 0 && (uint64_t)value == index) > - return 0; > - } else { > - /* Skip key. */ > - mp_next(field); > - } > - /* Skip value. */ > - mp_next(field); > - } > - } > - return -1; > -} > - > -/** > - * Propagate @a field to MessagePack(field)[key]. > - * @param[in][out] field Field to propagate. > - * @param key Key to propagate to. > - * @param len Length of @a key. > - * > - * @retval 0 Success, the index was found. > - * @retval -1 Not found. > - */ > -static inline int > -tuple_field_go_to_key(const char **field, const char *key, int len) > -{ > - enum mp_type type = mp_typeof(**field); > - if (type != MP_MAP) > - return -1; > - uint64_t count = mp_decode_map(field); > - for (; count > 0; --count) { > - type = mp_typeof(**field); > - if (type == MP_STR) { > - uint32_t value_len; > - const char *value = mp_decode_str(field, &value_len); > - if (value_len == (uint)len && > - memcmp(value, key, len) == 0) > - return 0; > - } else { > - /* Skip key. */ > - mp_next(field); > - } > - /* Skip value. */ > - mp_next(field); > - } > - return -1; > -} > - > -/** > * Find a tuple field by JSON path. > * @param L Lua state. > * @param tuple 1-th argument on a lua stack, tuple to get field > @@ -497,81 +416,34 @@ tuple_field_go_to_key(const char **field, const char *key, int len) > static int > lbox_tuple_field_by_path(struct lua_State *L) > { > - const char *field; > struct tuple *tuple = luaT_istuple(L, 1); > /* Is checked in Lua wrapper. */ > assert(tuple != NULL); > - if (lua_isnumber(L, 2)) { > + const char *field = NULL; > + if (!lua_isnumber(L, 2)) { > + /* string */ > + field = tuple_field_raw_by_path(tuple_format(tuple), tuple_data(tuple), > + tuple_field_map(tuple), > + lua_tostring(L, 2), (int)lua_strlen(L, 2), > + lua_hashstring(L, 2)); > + } else { > int index = lua_tointeger(L, 2); > index -= TUPLE_INDEX_BASE; > - if (index < 0) { > -not_found: > - lua_pushinteger(L, -1); > - lua_pushnil(L); > - return 2; > - } > - field = tuple_field(tuple, index); > - if (field == NULL) > - goto not_found; > -push_value: > - lua_pushinteger(L, 0); > - luamp_decode(L, luaL_msgpack_default, &field); > - return 2; > + if (index >= 0) > + field = tuple_field(tuple, index); > } > - assert(lua_isstring(L, 2)); > - size_t path_len; > - const char *path = lua_tolstring(L, 2, &path_len); > - struct json_path_parser parser; > - struct json_path_node node; > - json_path_parser_create(&parser, path, path_len); > - int rc = json_path_next(&parser, &node); > - if (rc != 0 || node.type == JSON_PATH_END) > - luaL_error(L, "Error in path on position %d", rc); > - if (node.type == JSON_PATH_NUM) { > - int index = node.num; > - if (index == 0) > - goto not_found; > - index -= TUPLE_INDEX_BASE; > - field = tuple_field(tuple, index); > - if (field == NULL) > - goto not_found; > - } else { > - assert(node.type == JSON_PATH_STR); > - /* First part of a path is a field name. */ > - const char *name = node.str; > - uint32_t name_len = node.len; > - uint32_t name_hash; > - if (path_len == name_len) { > - name_hash = lua_hashstring(L, 2); > - } else { > - /* > - * If a string is "field....", then its > - * precalculated juajit hash can not be > - * used. A tuple dictionary hashes only > - * name, not path. > - */ > - name_hash = lua_hash(name, name_len); > - } > - field = tuple_field_by_name(tuple, name, name_len, name_hash); > - if (field == NULL) > - goto not_found; > - } > - while ((rc = json_path_next(&parser, &node)) == 0 && > - node.type != JSON_PATH_END) { > - if (node.type == JSON_PATH_NUM) { > - rc = tuple_field_go_to_index(&field, node.num); > - } else { > - assert(node.type == JSON_PATH_STR); > - rc = tuple_field_go_to_key(&field, node.str, node.len); > - } > - if (rc != 0) > - goto not_found; > - } > - if (rc == 0) > - goto push_value; > - luaL_error(L, "Error in path on position %d", rc); > - unreachable(); > - goto not_found; > + /* error code or message */ > + struct error *e = diag_last_error(diag_get()); > + if (field == NULL && e != NULL) > + lua_pushstring(L, e->errmsg); > + else > + lua_pushinteger(L, -1*(field == NULL)); > + if (field) > + luamp_decode(L, luaL_msgpack_default, &field); > + else > + lua_pushnil(L); > + diag_clear(diag_get()); > + return 2; > } > > static int > diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua > index b51b4df..3785568 100644 > --- a/src/box/lua/tuple.lua > +++ b/src/box/lua/tuple.lua > @@ -6,6 +6,7 @@ local msgpackffi = require('msgpackffi') > local fun = require('fun') > local buffer = require('buffer') > local internal = require('box.internal') > +local log = require('log') > > ffi.cdef[[ > /** \cond public */ > @@ -302,6 +303,7 @@ ffi.metatype(tuple_t, { > end; > __tostring = internal.tuple.tostring; > __index = function(tuple, key) > + local rc, field > if type(key) == "string" or type(key) == "number" then > -- Try to get a field by json path or by [index]. If > -- it was not found (rc ~= 0) then return a method > @@ -311,12 +313,13 @@ ffi.metatype(tuple_t, { > -- returns field value, not tuple_bsize function. To > -- access hidden methods use > -- 'box.tuple.<method_name>(T, [args...])'. > - local rc, field = tuple_field_by_path(tuple, key) > + rc, field = tuple_field_by_path(tuple, key) > if rc == 0 then > return field > end > end > - return methods[key] > + local method = methods[key] > + return method == box.NULL and rc ~= -1 and rc or method > end; > __eq = function(tuple_a, tuple_b) > -- Two tuple are considered equal if they have same memory address > diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c > index e458f49..9c6e3da 100644 > --- a/src/box/tuple_format.c > +++ b/src/box/tuple_format.c > @@ -28,6 +28,8 @@ > * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF > * SUCH DAMAGE. > */ > +#include <lib/json/path.h> > +#include <third_party/luajit/src/lua.h> > #include "tuple_format.h" > > /** Global table of tuple formats */ > @@ -478,3 +480,156 @@ box_tuple_format_unref(box_tuple_format_t *format) > { > tuple_format_unref(format); > } > + > +/** > + * Propagate @a field to MessagePack(field)[index]. > + * @param[in][out] field Field to propagate. > + * @param index 1-based index to propagate to. > + * > + * @retval 0 Success, the index was found. > + * @retval -1 Not found. > + */ > +static inline int > +tuple_field_go_to_index(const char **field, uint64_t index) > +{ > + enum mp_type type = mp_typeof(**field); > + if (type == MP_ARRAY) { > + if (index == 0) > + return -1; > + /* Make index 0-based. */ > + index -= TUPLE_INDEX_BASE; > + uint32_t count = mp_decode_array(field); > + if (index >= count) > + return -1; > + for (; index > 0; --index) > + mp_next(field); > + return 0; > + } else if (type == MP_MAP) { > + uint64_t count = mp_decode_map(field); > + for (; count > 0; --count) { > + type = mp_typeof(**field); > + if (type == MP_UINT) { > + uint64_t value = mp_decode_uint(field); > + if (value == index) > + return 0; > + } else if (type == MP_INT) { > + int64_t value = mp_decode_int(field); > + if (value >= 0 && (uint64_t)value == index) > + return 0; > + } else { > + /* Skip key. */ > + mp_next(field); > + } > + /* Skip value. */ > + mp_next(field); > + } > + } > + return -1; > +} > + > +/** > + * Propagate @a field to MessagePack(field)[key]. > + * @param[in][out] field Field to propagate. > + * @param key Key to propagate to. > + * @param len Length of @a key. > + * > + * @retval 0 Success, the index was found. > + * @retval -1 Not found. > + */ > +static inline int > +tuple_field_go_to_key(const char **field, const char *key, int len) > +{ > + enum mp_type type = mp_typeof(**field); > + if (type != MP_MAP) > + return -1; > + uint64_t count = mp_decode_map(field); > + for (; count > 0; --count) { > + type = mp_typeof(**field); > + if (type == MP_STR) { > + uint32_t value_len; > + const char *value = mp_decode_str(field, &value_len); > + if (value_len == (uint)len && > + memcmp(value, key, len) == 0) > + return 0; > + } else { > + /* Skip key. */ > + mp_next(field); > + } > + /* Skip value. */ > + mp_next(field); > + } > + return -1; > +} > + > +const char * > +tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, > + const uint32_t *field_map, > + const char *path,uint32_t path_len, uint32_t path_hash) > +{ > + int rc = 0; > + const char *field; > + struct json_path_parser parser; > + struct json_path_node node; > + json_path_parser_create(&parser, path, path_len); > + rc = json_path_next(&parser, &node); > + if (rc != 0 || node.type == JSON_PATH_END) > + goto not_found; > + if (node.type == JSON_PATH_NUM) { > + int index = node.num; > + if (index == 0) > + goto not_found; > + index -= TUPLE_INDEX_BASE; > + field = tuple_field_raw(format, tuple, > + field_map, index); > + if (field == NULL) > + goto not_found; > + } else { > + assert(node.type == JSON_PATH_STR); > + /* First part of a path is a field name. */ > + const char *name = node.str; > + uint32_t name_len = node.len; > + uint32_t name_hash; > + if (path_len == name_len) { > + name_hash = path_hash; > + } else { > + /* > + * If a string is "field....", then its > + * precalculated juajit hash can not be > + * used. A tuple dictionary hashes only > + * name, not path. > + */ > + name_hash = lua_hash(name, name_len); > + } > + field = tuple_field_raw_by_name(format, tuple, > + field_map, > + name, name_len, name_hash); > + if (field == NULL) > + goto not_found; > + } > + while ((rc = json_path_next(&parser, &node)) == 0 && > + node.type != JSON_PATH_END) { > + if (node.type == JSON_PATH_NUM) { > + rc = tuple_field_go_to_index(&field, node.num); > + } else { > + assert(node.type == JSON_PATH_STR); > + rc = tuple_field_go_to_key(&field, node.str, node.len); > + } > + if (rc != 0) { > + rc = 0; /* prevent error raise */ > + goto not_found; > + } > + } > + if (rc == 0) > + return field; > + not_found: > + /* try to use the whole string as identifier */ > + field = tuple_field_raw_by_name(format, tuple, > + field_map, > + path, path_len, path_hash); > + if (field) > + return field; > + if (rc || path_len == 0) > + diag_set(IllegalParams, "Error in path on " > + "position %d", rc); > + return NULL; > +} > diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h > index d35182d..ce17085 100644 > --- a/src/box/tuple_format.h > +++ b/src/box/tuple_format.h > @@ -377,6 +377,21 @@ tuple_field_raw_by_name(struct tuple_format *format, const char *tuple, > return tuple_field_raw(format, tuple, field_map, fieldno); > } > > +/** > + * Get tuple field by its path. > + * @param Tuple. > + * @param path Field path. > + * @param path_len Length of @a path. > + * @param path_hash Hash of @a path. > + * > + * @retval not NULL on field found. > + * @retval NULL No field by @a path. > + */ > +const char * > +tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, > + const uint32_t *field_map, > + const char *path,uint32_t path_len, uint32_t path_hash); > + > #if defined(__cplusplus) > } /* extern "C" */ > #endif /* defined(__cplusplus) */ > diff --git a/src/lib/json/path.c b/src/lib/json/path.c > index 4a6174e..f3fcc46 100644 > --- a/src/lib/json/path.c > +++ b/src/lib/json/path.c > @@ -31,8 +31,10 @@ > > #include "path.h" > #include <ctype.h> > +#include <unicode/uchar.h> > #include "trivia/util.h" > > + > /** Same as strtoull(), but with limited length. */ > static inline uint64_t > strntoull(const char *src, int len) { > @@ -45,6 +47,42 @@ strntoull(const char *src, int len) { > } > > /** > + * Parse string and update parser's state. > + * @param parser JSON path parser. Upates pos, signs_read. > + * @param[out] UChar32 to store result. > + * > + * @retval 1 Success. > + * @retval 0 End of string. > + * @retval -1 Parse error. > + */ > +static inline int > +json_read_sign(struct json_path_parser *parser, UChar32 *out) > +{ > + if (unlikely(parser->offset == parser->src_len)) > + return 0; > + UChar32 c; > + U8_NEXT_OR_FFFD(parser->src, parser->offset, parser->src_len, c) > + if (c == 0xFFFD) > + return -1; > + *out = c; > + parser->invalid_sign_off += 1; > + return 1; > +} > + > +/** > + * Reset parser state to previous one. > + * @param parser JSON path parser. > + * @param old parser read offset. > + * @param signs to drop in signs_read counter. > + */ > +static inline void > +json_reset_pos(struct json_path_parser *parser, int old_offset, int signs) > +{ > + parser->offset = old_offset; > + parser->invalid_sign_off -= signs; > +} > + > +/** > * Parse string identifier in quotes. Parser either stops right > * after the closing quote, or returns an error position. > * @param parser JSON path parser. > @@ -56,24 +94,26 @@ strntoull(const char *src, int len) { > static inline int > json_parse_string(struct json_path_parser *parser, struct json_path_node *node) > { > - const char *end = parser->src + parser->src_len; > - const char *pos = parser->pos; > - assert(pos < end); > - char quote_type = *pos; > - assert(quote_type == '\'' || quote_type == '"'); > - /* Skip first quote. */ > - int len = 0; > - ++pos; > - const char *str = pos; > - for (char c = *pos; pos < end && quote_type != c; c = *++pos) > - ++len; > - /* A string must be terminated with quote. */ > - if (*pos != quote_type || len == 0) > - return pos - parser->src + 1; > - /* Skip the closing quote. */ > - parser->pos = pos + 1; > + assert(parser->offset < parser->src_len); > + UChar32 quote_type; > + (void) json_read_sign(parser, "e_type); > + assert(quote_type == (UChar32)'\'' || quote_type == (UChar32)'"'); > + int str_offset = parser->offset; > + UChar32 c = 0; > + int rc = 0; > + > + while (((rc = json_read_sign(parser, &c)) > 0) > + && c != quote_type); > + int len = (int)(parser->offset - str_offset - 1); > + if (rc < 0 || len == 0) > + return -1; > + if (c != (UChar32)quote_type) { > + parser->invalid_sign_off++; > + return -1; > + } > + > node->type = JSON_PATH_STR; > - node->str = str; > + node->str = parser->src + str_offset; > node->len = len; > return 0; > } > @@ -81,7 +121,7 @@ json_parse_string(struct json_path_parser *parser, struct json_path_node *node) > /** > * Parse digit sequence into integer until non-digit is met. > * Parser stops right after the last digit. > - * @param parser JSON parser. > + * @param parser JSON parser. Updates signs_read field. > * @param[out] node JSON node to store result. > * > * @retval 0 Success. > @@ -90,27 +130,40 @@ json_parse_string(struct json_path_parser *parser, struct json_path_node *node) > static inline int > json_parse_integer(struct json_path_parser *parser, struct json_path_node *node) > { > - const char *end = parser->src + parser->src_len; > - const char *pos = parser->pos; > - assert(pos < end); > - const char *str = pos; > - int len = 0; > - for (char c = *pos; pos < end && isdigit(c); c = *++pos) > - ++len; > - if (len == 0) > - return pos - parser->src + 1; > - parser->pos = pos; > + assert(parser->offset < parser->src_len); > + int str_offset = parser->offset; > + int last_offset = str_offset; > + int len = 0, rc = 0; > + UChar32 c = 0; > + > + while (((rc = json_read_sign(parser, &c)) > 0) && u_isdigit(c)) { > + last_offset = parser->offset; > + len++; > + } > + if (rc > 0 && len > 0 && !u_isdigit(c)) > + json_reset_pos(parser, last_offset, 1); > + if (rc < 0 || len == 0) > + return -1; > + > node->type = JSON_PATH_NUM; > - node->num = strntoull(str, len); > + node->num = strntoull(parser->src + str_offset, len); > return 0; > } > > +static inline bool > +identifier_valid_sign(UChar32 c) > +{ > + return u_isUAlphabetic(c) > + || c == (UChar32)'_' > + || u_isdigit(c); > +} > + > /** > * Parse identifier out of quotes. It can contain only alphas, > * digits and underscores. And can not contain digit at the first > * position. Parser is stoped right after the last non-digit, > * non-alpha and non-underscore symbol. > - * @param parser JSON parser. > + * @param parser JSON parser. Updates signs_read field. > * @param[out] node JSON node to store result. > * > * @retval 0 Success. > @@ -120,68 +173,78 @@ static inline int > json_parse_identifier(struct json_path_parser *parser, > struct json_path_node *node) > { > - const char *end = parser->src + parser->src_len; > - const char *pos = parser->pos; > - assert(pos < end); > - const char *str = pos; > - char c = *pos; > + assert(parser->offset < parser->src_len); > + int str_offset = parser->offset; > + UChar32 c; > + int rc = 0; > + if (json_read_sign(parser, &c) < 0) > + return -1; > /* First symbol can not be digit. */ > - if (!isalpha(c) && c != '_') > - return pos - parser->src + 1; > - int len = 1; > - for (c = *++pos; pos < end && (isalpha(c) || c == '_' || isdigit(c)); > - c = *++pos) > - ++len; > - assert(len > 0); > - parser->pos = pos; > + if (!u_isalpha(c) && c != (UChar32)'_') > + return -1; > + > + int last_offset = parser->offset; > + while ((rc = json_read_sign(parser, &c)) > 0 && identifier_valid_sign(c)) > + last_offset = parser->offset; > + if (rc > 0 && !identifier_valid_sign(c)) > + json_reset_pos(parser, last_offset, 1); > + if (rc < 0) > + return -1; > + > node->type = JSON_PATH_STR; > - node->str = str; > - node->len = len; > + node->str = parser->src + str_offset; > + node->len = parser->offset - str_offset; > return 0; > } > > +static inline char > +json_curr_char(struct json_path_parser *parser) > +{ > + return *json_path_curr_substring(parser); > +} > + > int > json_path_next(struct json_path_parser *parser, struct json_path_node *node) > { > - const char *end = parser->src + parser->src_len; > - if (end == parser->pos) { > + int end_offset = parser->src_len; > + if (end_offset == parser->offset) { > node->type = JSON_PATH_END; > return 0; > } > - char c = *parser->pos; > + UChar32 c = 0; > + int last_offset = parser->offset; > + if (json_read_sign(parser, &c) < 0) > + return parser->invalid_sign_off; > int rc; > switch(c) { > - case '[': > - ++parser->pos; > + case (UChar32)'[': > /* Error for []. */ > - if (parser->pos == end) > - return parser->pos - parser->src + 1; > - c = *parser->pos; > + if (parser->offset == end_offset) > + return parser->invalid_sign_off; > + c = json_curr_char(parser); > if (c == '"' || c == '\'') > rc = json_parse_string(parser, node); > else > rc = json_parse_integer(parser, node); > if (rc != 0) > - return rc; > + return parser->invalid_sign_off; > /* > * Expression, started from [ must be finished > * with ] regardless of its type. > */ > - if (parser->pos == end || *parser->pos != ']') > - return parser->pos - parser->src + 1; > + if (parser->offset == end_offset || json_curr_char(parser) != ']') > + return parser->invalid_sign_off + 1; > /* Skip ]. */ > - ++parser->pos; > + (void) json_read_sign(parser, &c); > break; > - case '.': > - /* Skip dot. */ > - ++parser->pos; > - if (parser->pos == end) > - return parser->pos - parser->src + 1; > - FALLTHROUGH > default: > + if (c != (UChar32)'.') > + json_reset_pos(parser, last_offset, 1); > + else if (parser->offset == end_offset) > + return parser->invalid_sign_off + 1; > rc = json_parse_identifier(parser, node); > if (rc != 0) > - return rc; > + return parser->invalid_sign_off; > break; > } > return 0; > diff --git a/src/lib/json/path.h b/src/lib/json/path.h > index 6e8db4c..15292fb 100644 > --- a/src/lib/json/path.h > +++ b/src/lib/json/path.h > @@ -45,10 +45,11 @@ extern "C" { > struct json_path_parser { > /** Source string. */ > const char *src; > - /** Length of src. */ > + /** Length of string. */ > int src_len; > - /** Current parser's position. */ > - const char *pos; > + /** Current parser's offset. */ > + int offset; > + int invalid_sign_off; > }; > > enum json_path_type { > @@ -78,18 +79,30 @@ struct json_path_node { > }; > > /** > - * Create @a parser. > + * Init @a parser. > * @param[out] parser Parser to create. > * @param src Source string. > * @param src_len Length of @a src. > */ > static inline void > json_path_parser_create(struct json_path_parser *parser, const char *src, > - int src_len) > + int src_len) > { > parser->src = src; > parser->src_len = src_len; > - parser->pos = src; > + parser->offset = 0; > + parser->invalid_sign_off = 0; > +} > + > +/** > + * Get current substring of parser. > + * @param parser Parser. > + * @retval ptr to substring > + */ > +static inline const char * > +json_path_curr_substring(struct json_path_parser *parser) > +{ > + return parser->src + parser->offset; > } > > /** > diff --git a/test/engine/tuple.result b/test/engine/tuple.result > index 2d7367a..3a8e828 100644 > --- a/test/engine/tuple.result > +++ b/test/engine/tuple.result > @@ -602,7 +602,10 @@ format[2] = {name = 'field2', type = 'array'} > format[3] = {name = 'field3', type = 'map'} > --- > ... > -format[4] = {name = 'field4', type = 'string'} > +format[4] = {name = 'field4', type = 'string' } > +--- > +... > +format[5] = {name = "[2][6]['привет中国world']['中国a']", type = 'string'} > --- > ... > s = box.schema.space.create('test', {format = format}) > @@ -611,13 +614,13 @@ s = box.schema.space.create('test', {format = format}) > pk = s:create_index('pk') > --- > ... > -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} > +field2 = {1, 2, 3, "4", {5,6,7}, {привет中国world={中国="привет"}, key="value1", value="key1"}} > --- > ... > field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} > --- > ... > -t = s:replace{1, field2, field3, "123456"} > +t = s:replace{1, field2, field3, "123456", "yes, this"} > --- > ... > t[1] > @@ -626,7 +629,7 @@ t[1] > ... > t[2] > --- > -- [1, 2, 3, '4', [5, 6, 7], {'key': 'key1', 'value': 'value1'}] > +- [1, 2, 3, '4', [5, 6, 7], {'привет中国world': {'中国': 'привет'}, 'key': 'value1', 'value': 'key1'}] > ... > t[3] > --- > @@ -667,19 +670,43 @@ t["[2][5][3]"] > ... > t["[2][6].key"] > --- > -- key1 > +- value1 > ... > t["[2][6].value"] > --- > -- value1 > +- key1 > ... > t["[2][6]['key']"] > --- > -- key1 > +- value1 > ... > t["[2][6]['value']"] > --- > -- value1 > +- key1 > +... > +t[2][6].привет中国world.中国 > +--- > +- привет > +... > +t["[2][6].привет中国world"].中国 > +--- > +- привет > +... > +t["[2][6].привет中国world.中国"] > +--- > +- привет > +... > +t["[2][6]['привет中国world']"]["中国"] > +--- > +- привет > +... > +t["[2][6]['привет中国world']['中国']"] > +--- > +- привет > +... > +t["[2][6]['привет中国world']['中国a']"] > +--- > +- yes, this > ... > t["[3].k3[2].c"] > --- > @@ -759,31 +786,31 @@ t["a.b.c d.e.f"] > -- Sytax errors. > t[""] > --- > -- error: 'builtin/box/tuple.lua:314: Error in path on position 0' > +- Error in path on position 0 > ... > t["[2].[5]"] > --- > -- error: 'builtin/box/tuple.lua:314: Error in path on position 5' > +- Error in path on position 5 > ... > t["[-1]"] > --- > -- error: 'builtin/box/tuple.lua:314: Error in path on position 2' > +- Error in path on position 2 > ... > t[".."] > --- > -- error: 'builtin/box/tuple.lua:314: Error in path on position 2' > +- Error in path on position 2 > ... > t["[["] > --- > -- error: 'builtin/box/tuple.lua:314: Error in path on position 2' > +- Error in path on position 2 > ... > t["]]"] > --- > -- error: 'builtin/box/tuple.lua:314: Error in path on position 1' > +- Error in path on position 1 > ... > t["{"] > --- > -- error: 'builtin/box/tuple.lua:314: Error in path on position 1' > +- Error in path on position 1 > ... > s:drop() > --- > diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua > index ba3482d..90da8b2 100644 > --- a/test/engine/tuple.test.lua > +++ b/test/engine/tuple.test.lua > @@ -204,12 +204,13 @@ format = {} > format[1] = {name = 'field1', type = 'unsigned'} > format[2] = {name = 'field2', type = 'array'} > format[3] = {name = 'field3', type = 'map'} > -format[4] = {name = 'field4', type = 'string'} > +format[4] = {name = 'field4', type = 'string' } > +format[5] = {name = "[2][6]['привет中国world']['中国a']", type = 'string'} > s = box.schema.space.create('test', {format = format}) > pk = s:create_index('pk') > -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} > +field2 = {1, 2, 3, "4", {5,6,7}, {привет中国world={中国="привет"}, key="value1", value="key1"}} > field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} > -t = s:replace{1, field2, field3, "123456"} > +t = s:replace{1, field2, field3, "123456", "yes, this"} > t[1] > t[2] > t[3] > @@ -225,6 +226,12 @@ t["[2][6].key"] > t["[2][6].value"] > t["[2][6]['key']"] > t["[2][6]['value']"] > +t[2][6].привет中国world.中国 > +t["[2][6].привет中国world"].中国 > +t["[2][6].привет中国world.中国"] > +t["[2][6]['привет中国world']"]["中国"] > +t["[2][6]['привет中国world']['中国']"] > +t["[2][6]['привет中国world']['中国a']"] > t["[3].k3[2].c"] > t["[4]"] > t.field1 > diff --git a/test/unit/CMakeLists.txt b/test/unit/CMakeLists.txt > index fe8b2d2..667194c 100644 > --- a/test/unit/CMakeLists.txt > +++ b/test/unit/CMakeLists.txt > @@ -130,7 +130,7 @@ add_executable(csv.test csv.c) > target_link_libraries(csv.test csv) > > add_executable(json_path.test json_path.c) > -target_link_libraries(json_path.test json_path unit) > +target_link_libraries(json_path.test json_path unit ${ICU_LIBRARIES}) > > add_executable(rmean.test rmean.cc) > target_link_libraries(rmean.test stat unit) > diff --git a/test/unit/json_path.c b/test/unit/json_path.c > index 599658b..81ef7fc 100644 > --- a/test/unit/json_path.c > +++ b/test/unit/json_path.c > @@ -9,7 +9,7 @@ > json_path_parser_create(&parser, path, len); > > #define is_next_index(value_len, value) \ > - path = parser.pos; \ > + path = json_path_curr_substring(&parser); \ > is(json_path_next(&parser, &node), 0, "parse <%." #value_len "s>", \ > path); \ > is(node.type, JSON_PATH_NUM, "<%." #value_len "s> is num", path); \ > ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tarantool-patches] Re: [PATCH v2 1/1] ICU Unicode support for JSON parser. 2018-04-05 18:00 ` [tarantool-patches] " Kirill Shcherbatov @ 2018-04-05 23:32 ` Vladislav Shpilevoy 0 siblings, 0 replies; 14+ messages in thread From: Vladislav Shpilevoy @ 2018-04-05 23:32 UTC (permalink / raw) To: tarantool-patches, Kirill Shcherbatov See my review fixes on branch in a separate commit. Please, bench the branch with my review fixes, and without. 05.04.2018 21:00, Kirill Shcherbatov пишет: > > From 8d97afd7ac7a90e5dfb1e756fd9d389bafdadd56 Mon Sep 17 00:00:00 2001 > From: Kirill Shcherbatov <kshcherbatov@tarantool.org> > Date: Thu, 5 Apr 2018 20:35:25 +0300 > Subject: [PATCH] Accounted Vlad's comments > > --- > src/box/CMakeLists.txt | 2 +- > src/box/lua/tuple.c | 23 +++++++++++------------ > src/box/lua/tuple.lua | 11 +++++------ > src/box/tuple.h | 20 ++++++++++++++++++++ > src/box/tuple_format.c | 13 ++++++------- > src/box/tuple_format.h | 4 +++- > src/lib/json/path.c | 30 ++++++++++++++---------------- > src/lib/json/path.h | 1 + > 8 files changed, 61 insertions(+), 43 deletions(-) > > >Четверг, 5 апреля 2018, 17:48 +03:00 от Vladislav Shpilevoy > <v.shpilevoy@tarantool.org>: > > > >Hello. See 18 comments below. > > > >05.04.2018 17:06, Kirill Shcherbatov пишет: > >> This is an automated email from the git hooks/post-receive script. > >> > >> kshcherbatov pushed a commit to branch gh-1285-tuple-field-by-json-icu > >> in repository tarantool. > >> > >> commit ab52c7a18441037c19d0b2759558a7eb93aabb0d > >> Author: Kirill Shcherbatov < kshcherbatov@tarantool.org > > >> AuthorDate: Thu Apr 5 17:04:15 2018 +0300 > >> > >> ICU Unicode support for JSON parser. > >> --- > >> src/box/CMakeLists.txt | 2 +- > >> src/box/lua/tuple.c | 172 ++++++---------------------------------- > >> src/box/lua/tuple.lua | 7 +- > >> src/box/tuple_format.c | 155 ++++++++++++++++++++++++++++++++++++ > >> src/box/tuple_format.h | 15 ++++ > >> src/lib/json/path.c | 191 > ++++++++++++++++++++++++++++++--------------- > >> src/lib/json/path.h | 25 ++++-- > >> test/engine/tuple.result | 57 ++++++++++---- > >> test/engine/tuple.test.lua | 13 ++- > >> test/unit/CMakeLists.txt | 2 +- > >> test/unit/json_path.c | 2 +- > >> 11 files changed, 398 insertions(+), 243 deletions(-) > >> > >> diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt > >> index add0ff9..0f5b197 100644 > >> --- a/src/box/CMakeLists.txt > >> +++ b/src/box/CMakeLists.txt > >> @@ -44,7 +44,7 @@ add_library(tuple STATIC > >> field_def.c > >> opt_def.c > >> ) > >> -target_link_libraries(tuple box_error core ${MSGPUCK_LIBRARIES} > ${ICU_LIBRARIES} misc bit) > >> +target_link_libraries(tuple json_path box_error core > ${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES} misc bit ${common_libraries}) > >1. Remove ${common_libraries} please, the tuple lib must not depend on > >Lua. See how to do it in the next comments. > > diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt > index e5a0c21..88c2c60 100644 > --- a/src/box/CMakeLists.txt > +++ b/src/box/CMakeLists.txt > @@ -45,7 +45,7 @@ add_library(tuple STATIC > field_def.c > opt_def.c > ) > -target_link_libraries(tuple json_path box_error core > ${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES} misc bit ${common_libraries}) > +target_link_libraries(tuple json_path box_error core > ${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES} misc bit) > add_library(xlog STATIC xlog.c) > target_link_libraries(xlog core box_error crc32 ${ZSTD_LIBRARIES}) > > >> @@ -497,81 +416,34 @@ tuple_field_go_to_key(const char **field, > const char *key, int len) > >> static int > >> lbox_tuple_field_by_path(struct lua_State *L) > >> { > >> -const char *field; > >> struct tuple *tuple = luaT_istuple(L, 1); > >> /* Is checked in Lua wrapper. */ > >> assert(tuple != NULL); > >> -if (lua_isnumber(L, 2)) { > >> +const char *field = NULL; > >> +if (!lua_isnumber(L, 2)) { > >> +/* string */ > >> +field = tuple_field_raw_by_path(tuple_format(tuple), > tuple_data(tuple), > >> + tuple_field_map(tuple), > >> + lua_tostring(L, 2), (int)lua_strlen(L, 2), > >> + lua_hashstring(L, 2)); > >2. Add a tuple_field_by_path wrapper for it, as I told you verbally. > >Moreover > >this code is out of 80 symbols width. > >> +/* error code or message */ > >> +struct error *e = diag_last_error(diag_get()); > >3. Please, move all of this error checking into !lua_isnumber branch. > >Tuple_field never returns > >error, because sometimes absent field is ok. > >> +if (field == NULL && e != NULL) > >> +lua_pushstring(L, e->errmsg); > >> +else > >> +lua_pushinteger(L, -1*(field == NULL)); > >> +if (field) > >4. Use explicit != NULL, please. > > diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c > index 3b89cdd..ab62663 100644 > --- a/src/box/lua/tuple.c > +++ b/src/box/lua/tuple.c > @@ -422,28 +422,27 @@ lbox_tuple_field_by_path(struct lua_State *L) > const char *field = NULL; > if (!lua_isnumber(L, 2)) { > /* string */ > - field = tuple_field_raw_by_path(tuple_format(tuple), > tuple_data(tuple), > - tuple_field_map(tuple), > - lua_tostring(L, 2), (int)lua_strlen(L, 2), > - lua_hashstring(L, 2)); > + field = tuple_field_by_path(tuple, lua_tostring(L, 2), > + (int)lua_strlen(L, 2), > + lua_hashstring(L, 2)); > + struct error *e = diag_last_error(diag_get()); > + if (field == NULL && e) { > + lua_pushnil(L); > + lua_pushstring(L, e->errmsg); > + return 2; > + } > } else { > int index = lua_tointeger(L, 2); > index -= TUPLE_INDEX_BASE; > if (index >= 0) > field = tuple_field(tuple, index); > } > - /* error code or message */ > - struct error *e = diag_last_error(diag_get()); > - if (field == NULL && e != NULL) > - lua_pushstring(L, e->errmsg); > - else > - lua_pushinteger(L, -1*(field == NULL)); > - if (field) > + if (field != NULL) > luamp_decode(L, luaL_msgpack_default, &field); > else > lua_pushnil(L); > diag_clear(diag_get()); > - return 2; > + return 1; > } > static int > > >> diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua > >> index b51b4df..3785568 100644 > >> --- a/src/box/lua/tuple.lua > >> +++ b/src/box/lua/tuple.lua > >> @@ -6,6 +6,7 @@ local msgpackffi = require('msgpackffi') > >> local fun = require('fun') > >> local buffer = require('buffer') > >> local internal = require('box.internal') > >> +local log = require('log') > >5. Unsed module. > > diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua > index 3785568..1c7e2a5 100644 > --- a/src/box/lua/tuple.lua > +++ b/src/box/lua/tuple.lua > @@ -6,7 +6,6 @@ local msgpackffi = require('msgpackffi') > local fun = require('fun') > local buffer = require('buffer') > local internal = require('box.internal') > -local log = require('log') > ffi.cdef[[ > /** \cond public */ > > >> > >> ffi.cdef[[ > >> /** \cond public */ > >> @@ -302,6 +303,7 @@ ffi.metatype(tuple_t, { > >> end; > >> __tostring = internal.tuple.tostring; > >> __index = function(tuple, key) > >> + local rc, field > >> if type(key) == "string" or type(key) == "number" then > >> -- Try to get a field by json path or by [index]. If > >> -- it was not found (rc ~= 0) then return a method > >> @@ -311,12 +313,13 @@ ffi.metatype(tuple_t, { > >> -- returns field value, not tuple_bsize function. To > >> -- access hidden methods use > >> -- 'box.tuple.<method_name>(T, [args...])'. > >> - local rc, field = tuple_field_by_path(tuple, key) > >> + rc, field = tuple_field_by_path(tuple, key) > >> if rc == 0 then > >> return field > >> end > >> end > >> - return methods[key] > >> + local method = methods[key] > >> + return method == box.NULL and rc ~= -1 and rc or method > >6. You must not return -1. Error must be nil + error message/object. > > @@ -303,7 +302,7 @@ ffi.metatype(tuple_t, { > end; > __tostring = internal.tuple.tostring; > __index = function(tuple, key) > - local rc, field > + local res, err > if type(key) == "string" or type(key) == "number" then > -- Try to get a field by json path or by [index]. If > -- it was not found (rc ~= 0) then return a method > @@ -313,13 +312,13 @@ ffi.metatype(tuple_t, { > -- returns field value, not tuple_bsize function. To > -- access hidden methods use > -- 'box.tuple.<method_name>(T, [args...])'. > - rc, field = tuple_field_by_path(tuple, key) > - if rc == 0 then > - return field > + res, err = tuple_field_by_path(tuple, key) > + if res ~= box.NULL then > + return res > end > end > local method = methods[key] > - return method == box.NULL and rc ~= -1 and rc or method > + return method == box.NULL and err or method > end; > __eq = function(tuple_a, tuple_b) > -- Two tuple are considered equal if they have same memory > address > > >> diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c > >> index 3e2c8bf..f2dffaf 100644 > >> --- a/src/box/tuple_format.c > >> +++ b/src/box/tuple_format.c > >> @@ -28,6 +28,8 @@ > >> * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF > >> * SUCH DAMAGE. > >> */ > >> +#include <lib/json/path.h> > >> +#include <third_party/luajit/src/lua.h> > >7. Do not include lua headers in files, that are not in lua/ dir. > >Tarantool core must not > >depend on lua. As I told you earlier, you must use field_name_hash > >function, declared in > >tuple_dictionary.h. It allows to remove lua dependency and remove > >${common_libraries} > >from cmake file. > > diff --git a/src/box/tuple.h b/src/box/tuple.h > index 6ebedf5..f084c9d 100644 > --- a/src/box/tuple.h > +++ b/src/box/tuple.h > @@ -514,6 +514,26 @@ tuple_field(const struct tuple *tuple, uint32_t > fieldno) > } > /** > + * Get tuple field by its path. > + * @param tuple. > + * @param path Field path. > + * @param path_len Length of @a path. > + * @param path_hash Hash of @a path. > + * > + * @retval not NULL on field found. > + * @retval NULL No field by @a path. > + */ > +static inline const char * > +tuple_field_by_path(struct tuple *tuple, const char *path, > + uint32_t path_len, uint32_t path_hash) > +{ > + return tuple_field_raw_by_path(tuple_format(tuple), > + tuple_data(tuple), > + tuple_field_map(tuple), > + path, path_len, path_hash); > +} > + > +/** > * Get tuple field by its name. > * @param tuple Tuple to get field from. > * @param name Field name. > diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c > index 9c6e3da..8ee7948 100644 > --- a/src/box/tuple_format.c > +++ b/src/box/tuple_format.c > @@ -29,8 +29,8 @@ > * SUCH DAMAGE. > */ > #include <lib/json/path.h> > -#include <third_party/luajit/src/lua.h> > #include "tuple_format.h" > +#include "tuple_dictionary.h" > /** Global table of tuple formats */ > struct tuple_format **tuple_formats; > > >> +const char * > >> +tuple_field_raw_by_path(struct tuple_format *format, const char > *tuple, > >> + const uint32_t *field_map, > >> + const char *path,uint32_t path_len, uint32_t path_hash) > >8. Path argument can be placed on the previous line. > > > >9. After ',' put a space please. > >> +{ > >> +int rc = 0; > >> +const char *field; > >> +struct json_path_parser parser; > >> +struct json_path_node node; > >> +json_path_parser_create(&parser, path, path_len); > >> +rc = json_path_next(&parser, &node); > >> +if (rc != 0 || node.type == JSON_PATH_END) > >> +goto not_found; > >> +if (node.type == JSON_PATH_NUM) { > >> +int index = node.num; > >> +if (index == 0) > >> +goto not_found; > >> +index -= TUPLE_INDEX_BASE; > >> +field = tuple_field_raw(format, tuple, > >> + field_map, index); > >> +if (field == NULL) > >> +goto not_found; > >> +} else { > >> +assert(node.type == JSON_PATH_STR); > >> +/* First part of a path is a field name. */ > >> +const char *name = node.str; > >> +uint32_t name_len = node.len; > >> +uint32_t name_hash; > >> +if (path_len == name_len) { > >> +name_hash = path_hash; > >> +} else { > >> +/* > >> + * If a string is "field....", then its > >> + * precalculated juajit hash can not be > >> + * used. A tuple dictionary hashes only > >> + * name, not path. > >> + */ > >> +name_hash = lua_hash(name, name_len); > >10. Good comment! One remark - use field_name_hash instead of explicit > >lua_hash. > >> +} > >> +field = tuple_field_raw_by_name(format, tuple, > >> + field_map, > >> + name, name_len, name_hash); > >> +if (field == NULL) > >> +goto not_found; > >> +} > >> +while ((rc = json_path_next(&parser, &node)) == 0 && > >> + node.type != JSON_PATH_END) { > >> +if (node.type == JSON_PATH_NUM) { > >> +rc = tuple_field_go_to_index(&field, node.num); > >> +} else { > >> +assert(node.type == JSON_PATH_STR); > >> +rc = tuple_field_go_to_key(&field, node.str, node.len); > >> +} > >> +if (rc != 0) { > >> +rc = 0; /* prevent error raise */ > >> +goto not_found; > >> +} > >> +} > >> +if (rc == 0) > >> +return field; > >> +not_found: > >11. Please, put labels to the line beginning. > > @@ -563,8 +563,8 @@ tuple_field_go_to_key(const char **field, const > char *key, int len) > const char * > tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, > - const uint32_t *field_map, > - const char *path,uint32_t path_len, uint32_t path_hash) > + const uint32_t *field_map, const char *path, > + uint32_t path_len, uint32_t path_hash) > { > int rc = 0; > const char *field; > @@ -598,7 +598,7 @@ tuple_field_raw_by_path(struct tuple_format > *format, const char *tuple, > * used. A tuple dictionary hashes only > * name, not path. > */ > - name_hash = lua_hash(name, name_len); > + name_hash = field_name_hash(name, name_len); > } > field = tuple_field_raw_by_name(format, tuple, > field_map, > @@ -621,7 +621,7 @@ tuple_field_raw_by_path(struct tuple_format > *format, const char *tuple, > } > if (rc == 0) > return field; > - not_found: > +not_found: > /* try to use the whole string as identifier */ > field = tuple_field_raw_by_name(format, tuple, > field_map, > @@ -629,7 +629,6 @@ tuple_field_raw_by_path(struct tuple_format > *format, const char *tuple, > if (field) > return field; > if (rc || path_len == 0) > - diag_set(IllegalParams, "Error in path on " > - "position %d", rc); > + diag_set(IllegalParams, "Error in path on position %d", rc); > return NULL; > } > > >> diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h > >> index d33c77a..3e7e595 100644 > >> --- a/src/box/tuple_format.h > >> +++ b/src/box/tuple_format.h > >> @@ -343,6 +343,21 @@ tuple_field_raw_by_name(struct tuple_format > *format, const char *tuple, > >> return tuple_field_raw(format, tuple, field_map, fieldno); > >> } > >> > >> +/** > >> + * Get tuple field by its path. > >> + * @param Tuple. > >> + * @param path Field path. > >> + * @param path_len Length of @a path. > >> + * @param path_hash Hash of @a path. > >> + * > >> + * @retval not NULL on field found. > >> + * @retval NULL No field by @a path. > >> + */ > >> +const char * > >> +tuple_field_raw_by_path(struct tuple_format *format, const char > *tuple, > >> + const uint32_t *field_map, > >> + const char *path,uint32_t path_len, uint32_t path_hash); > >12. Please, add tuple_field_by_path as well to tuple.h, that calls this > >function. It allows you > >to call tuple_field_by_path in lua/tuple.c. > > diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h > index ce17085..ffa3d16 100644 > --- a/src/box/tuple_format.h > +++ b/src/box/tuple_format.h > @@ -379,7 +379,9 @@ tuple_field_raw_by_name(struct tuple_format > *format, const char *tuple, > /** > * Get tuple field by its path. > - * @param Tuple. > + * @param format Tuple format. > + * @param tuple MessagePack tuple's body. > + * @param field_map Tuple field map. > * @param path Field path. > * @param path_len Length of @a path. > * @param path_hash Hash of @a path. > > >> + > >> #if defined(__cplusplus) > >> } /* extern "C" */ > >> #endif /* defined(__cplusplus) */ > >> diff --git a/src/lib/json/path.c b/src/lib/json/path.c > >> index 4a6174e..f3fcc46 100644 > >> --- a/src/lib/json/path.c > >> +++ b/src/lib/json/path.c > >> @@ -31,8 +31,10 @@ > >> > >> #include "path.h" > >> #include <ctype.h> > >> +#include <unicode/uchar.h> > >> #include "trivia/util.h" > >> > >> + > >13. Garbage diff. > >> @@ -90,27 +130,40 @@ json_parse_string(struct json_path_parser > *parser, struct json_path_node *node) > >> static inline int > >> json_parse_integer(struct json_path_parser *parser, struct > json_path_node *node) > >> { > >> -const char *end = parser->src + parser->src_len; > >> -const char *pos = parser->pos; > >> -assert(pos < end); > >> -const char *str = pos; > >> -int len = 0; > >> -for (char c = *pos; pos < end && isdigit(c); c = *++pos) > >> -++len; > >> -if (len == 0) > >> -return pos - parser->src + 1; > >> -parser->pos = pos; > >> +assert(parser->offset < parser->src_len); > >> +int str_offset = parser->offset; > >> +int last_offset = str_offset; > >> +int len = 0, rc = 0; > >> +UChar32 c = 0; > >> + > >> +while (((rc = json_read_sign(parser, &c)) > 0) && u_isdigit(c)) { > >14. I remember, that you told me, that a digit can be read using simple > >char. Can you please implement > >this cycle with no UChar? > > diff --git a/src/lib/json/path.c b/src/lib/json/path.c > index f3fcc46..e253ec4 100644 > --- a/src/lib/json/path.c > +++ b/src/lib/json/path.c > @@ -104,7 +104,7 @@ json_parse_string(struct json_path_parser *parser, > struct json_path_node *node) > while (((rc = json_read_sign(parser, &c)) > 0) > && c != quote_type); > - int len = (int)(parser->offset - str_offset - 1); > + int len = parser->offset - str_offset - 1; > if (rc < 0 || len == 0) > return -1; > if (c != (UChar32)quote_type) { > @@ -130,23 +130,21 @@ json_parse_string(struct json_path_parser > *parser, struct json_path_node *node) > static inline int > json_parse_integer(struct json_path_parser *parser, struct > json_path_node *node) > { > - assert(parser->offset < parser->src_len); > - int str_offset = parser->offset; > - int last_offset = str_offset; > - int len = 0, rc = 0; > - UChar32 c = 0; > - > - while (((rc = json_read_sign(parser, &c)) > 0) && u_isdigit(c)) { > - last_offset = parser->offset; > - len++; > - } > - if (rc > 0 && len > 0 && !u_isdigit(c)) > - json_reset_pos(parser, last_offset, 1); > - if (rc < 0 || len == 0) > + const char *end = parser->src + parser->src_len; > + const char *pos = parser->src + parser->offset; > + assert(pos < end); > + const char *str = pos; > + int len = 0; > + for (char c = *pos; pos < end && isdigit(c); c = *++pos) > + ++len; > + if (len == 0) { > + parser->invalid_sign_off++; > return -1; > - > + } > + parser->offset += len; > + parser->invalid_sign_off += len; > node->type = JSON_PATH_NUM; > - node->num = strntoull(parser->src + str_offset, len); > + node->num = strntoull(str, len); > return 0; > } > >> >> +static inline char > >> +json_curr_char(struct json_path_parser *parser) > >> +{ > >> +return *json_path_curr_substring(parser); > >> +} > >15. I do not like that on each char you must do string + offset. Please, > >find a way to > >store this position. Why did you delete parser->pos? > > I have no good ideas. This code is also not hot: just on each new > lexem, believe nothing critical. > > >> diff --git a/src/lib/json/path.h b/src/lib/json/path.h > >> index 6e8db4c..15292fb 100644 > >> --- a/src/lib/json/path.h > >> +++ b/src/lib/json/path.h > >> @@ -45,10 +45,11 @@ extern "C" { > >> struct json_path_parser { > >> /** Source string. */ > >> const char *src; > >> -/** Length of src. */ > >> +/** Length of string. */ > >> int src_len; > >> -/** Current parser's position. */ > >> -const char *pos; > >> +/** Current parser's offset. */ > >> +int offset; > >> +int invalid_sign_off; > >16. Please, add a comment. > > diff --git a/src/lib/json/path.h b/src/lib/json/path.h > index 15292fb..09f2e6f 100644 > --- a/src/lib/json/path.h > +++ b/src/lib/json/path.h > @@ -49,6 +49,7 @@ struct json_path_parser { > int src_len; > /** Current parser's offset. */ > int offset; > + /** Successfully parsed signs count. */ > int invalid_sign_off; > }; > -- 2.7.4 > > On 05.04.2018 17:09, Kirill Shcherbatov wrote: >> --- >> src/box/CMakeLists.txt | 2 +- >> src/box/lua/tuple.c | 172 >> ++++++---------------------------------- >> src/box/lua/tuple.lua | 7 +- >> src/box/tuple_format.c | 155 ++++++++++++++++++++++++++++++++++++ >> src/box/tuple_format.h | 15 ++++ >> src/lib/json/path.c | 191 >> ++++++++++++++++++++++++++++++--------------- >> src/lib/json/path.h | 25 ++++-- >> test/engine/tuple.result | 57 ++++++++++---- >> test/engine/tuple.test.lua | 13 ++- >> test/unit/CMakeLists.txt | 2 +- >> test/unit/json_path.c | 2 +- >> 11 files changed, 398 insertions(+), 243 deletions(-) >> >> diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt >> index f053ae4..e5a0c21 100644 >> --- a/src/box/CMakeLists.txt >> +++ b/src/box/CMakeLists.txt >> @@ -45,7 +45,7 @@ add_library(tuple STATIC >> field_def.c >> opt_def.c >> ) >> -target_link_libraries(tuple box_error core ${MSGPUCK_LIBRARIES} >> ${ICU_LIBRARIES} misc bit) >> +target_link_libraries(tuple json_path box_error core >> ${MSGPUCK_LIBRARIES} ${ICU_LIBRARIES} misc bit ${common_libraries}) >> add_library(xlog STATIC xlog.c) >> target_link_libraries(xlog core box_error crc32 ${ZSTD_LIBRARIES}) >> diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c >> index 99b9ff2..3b89cdd 100644 >> --- a/src/box/lua/tuple.c >> +++ b/src/box/lua/tuple.c >> @@ -403,87 +403,6 @@ lbox_tuple_transform(struct lua_State *L) >> } >> /** >> - * Propagate @a field to MessagePack(field)[index]. >> - * @param[in][out] field Field to propagate. >> - * @param index 1-based index to propagate to. >> - * >> - * @retval 0 Success, the index was found. >> - * @retval -1 Not found. >> - */ >> -static inline int >> -tuple_field_go_to_index(const char **field, uint64_t index) >> -{ >> - assert(index >= 0); >> - enum mp_type type = mp_typeof(**field); >> - if (type == MP_ARRAY) { >> - if (index == 0) >> - return -1; >> - /* Make index 0-based. */ >> - index -= TUPLE_INDEX_BASE; >> - uint32_t count = mp_decode_array(field); >> - if (index >= count) >> - return -1; >> - for (; index > 0; --index) >> - mp_next(field); >> - return 0; >> - } else if (type == MP_MAP) { >> - uint64_t count = mp_decode_map(field); >> - for (; count > 0; --count) { >> - type = mp_typeof(**field); >> - if (type == MP_UINT) { >> - uint64_t value = mp_decode_uint(field); >> - if (value == index) >> - return 0; >> - } else if (type == MP_INT) { >> - int64_t value = mp_decode_int(field); >> - if (value >= 0 && (uint64_t)value == index) >> - return 0; >> - } else { >> - /* Skip key. */ >> - mp_next(field); >> - } >> - /* Skip value. */ >> - mp_next(field); >> - } >> - } >> - return -1; >> -} >> - >> -/** >> - * Propagate @a field to MessagePack(field)[key]. >> - * @param[in][out] field Field to propagate. >> - * @param key Key to propagate to. >> - * @param len Length of @a key. >> - * >> - * @retval 0 Success, the index was found. >> - * @retval -1 Not found. >> - */ >> -static inline int >> -tuple_field_go_to_key(const char **field, const char *key, int len) >> -{ >> - enum mp_type type = mp_typeof(**field); >> - if (type != MP_MAP) >> - return -1; >> - uint64_t count = mp_decode_map(field); >> - for (; count > 0; --count) { >> - type = mp_typeof(**field); >> - if (type == MP_STR) { >> - uint32_t value_len; >> - const char *value = mp_decode_str(field, &value_len); >> - if (value_len == (uint)len && >> - memcmp(value, key, len) == 0) >> - return 0; >> - } else { >> - /* Skip key. */ >> - mp_next(field); >> - } >> - /* Skip value. */ >> - mp_next(field); >> - } >> - return -1; >> -} >> - >> -/** >> * Find a tuple field by JSON path. >> * @param L Lua state. >> * @param tuple 1-th argument on a lua stack, tuple to get field >> @@ -497,81 +416,34 @@ tuple_field_go_to_key(const char **field, const >> char *key, int len) >> static int >> lbox_tuple_field_by_path(struct lua_State *L) >> { >> - const char *field; >> struct tuple *tuple = luaT_istuple(L, 1); >> /* Is checked in Lua wrapper. */ >> assert(tuple != NULL); >> - if (lua_isnumber(L, 2)) { >> + const char *field = NULL; >> + if (!lua_isnumber(L, 2)) { >> + /* string */ >> + field = tuple_field_raw_by_path(tuple_format(tuple), >> tuple_data(tuple), >> + tuple_field_map(tuple), >> + lua_tostring(L, 2), >> (int)lua_strlen(L, 2), >> + lua_hashstring(L, 2)); >> + } else { >> int index = lua_tointeger(L, 2); >> index -= TUPLE_INDEX_BASE; >> - if (index < 0) { >> -not_found: >> - lua_pushinteger(L, -1); >> - lua_pushnil(L); >> - return 2; >> - } >> - field = tuple_field(tuple, index); >> - if (field == NULL) >> - goto not_found; >> -push_value: >> - lua_pushinteger(L, 0); >> - luamp_decode(L, luaL_msgpack_default, &field); >> - return 2; >> + if (index >= 0) >> + field = tuple_field(tuple, index); >> } >> - assert(lua_isstring(L, 2)); >> - size_t path_len; >> - const char *path = lua_tolstring(L, 2, &path_len); >> - struct json_path_parser parser; >> - struct json_path_node node; >> - json_path_parser_create(&parser, path, path_len); >> - int rc = json_path_next(&parser, &node); >> - if (rc != 0 || node.type == JSON_PATH_END) >> - luaL_error(L, "Error in path on position %d", rc); >> - if (node.type == JSON_PATH_NUM) { >> - int index = node.num; >> - if (index == 0) >> - goto not_found; >> - index -= TUPLE_INDEX_BASE; >> - field = tuple_field(tuple, index); >> - if (field == NULL) >> - goto not_found; >> - } else { >> - assert(node.type == JSON_PATH_STR); >> - /* First part of a path is a field name. */ >> - const char *name = node.str; >> - uint32_t name_len = node.len; >> - uint32_t name_hash; >> - if (path_len == name_len) { >> - name_hash = lua_hashstring(L, 2); >> - } else { >> - /* >> - * If a string is "field....", then its >> - * precalculated juajit hash can not be >> - * used. A tuple dictionary hashes only >> - * name, not path. >> - */ >> - name_hash = lua_hash(name, name_len); >> - } >> - field = tuple_field_by_name(tuple, name, name_len, name_hash); >> - if (field == NULL) >> - goto not_found; >> - } >> - while ((rc = json_path_next(&parser, &node)) == 0 && >> - node.type != JSON_PATH_END) { >> - if (node.type == JSON_PATH_NUM) { >> - rc = tuple_field_go_to_index(&field, node.num); >> - } else { >> - assert(node.type == JSON_PATH_STR); >> - rc = tuple_field_go_to_key(&field, node.str, node.len); >> - } >> - if (rc != 0) >> - goto not_found; >> - } >> - if (rc == 0) >> - goto push_value; >> - luaL_error(L, "Error in path on position %d", rc); >> - unreachable(); >> - goto not_found; >> + /* error code or message */ >> + struct error *e = diag_last_error(diag_get()); >> + if (field == NULL && e != NULL) >> + lua_pushstring(L, e->errmsg); >> + else >> + lua_pushinteger(L, -1*(field == NULL)); >> + if (field) >> + luamp_decode(L, luaL_msgpack_default, &field); >> + else >> + lua_pushnil(L); >> + diag_clear(diag_get()); >> + return 2; >> } >> static int >> diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua >> index b51b4df..3785568 100644 >> --- a/src/box/lua/tuple.lua >> +++ b/src/box/lua/tuple.lua >> @@ -6,6 +6,7 @@ local msgpackffi = require('msgpackffi') >> local fun = require('fun') >> local buffer = require('buffer') >> local internal = require('box.internal') >> +local log = require('log') >> ffi.cdef[[ >> /** \cond public */ >> @@ -302,6 +303,7 @@ ffi.metatype(tuple_t, { >> end; >> __tostring = internal.tuple.tostring; >> __index = function(tuple, key) >> + local rc, field >> if type(key) == "string" or type(key) == "number" then >> -- Try to get a field by json path or by [index]. If >> -- it was not found (rc ~= 0) then return a method >> @@ -311,12 +313,13 @@ ffi.metatype(tuple_t, { >> -- returns field value, not tuple_bsize function. To >> -- access hidden methods use >> -- 'box.tuple.<method_name>(T, [args...])'. >> - local rc, field = tuple_field_by_path(tuple, key) >> + rc, field = tuple_field_by_path(tuple, key) >> if rc == 0 then >> return field >> end >> end >> - return methods[key] >> + local method = methods[key] >> + return method == box.NULL and rc ~= -1 and rc or method >> end; >> __eq = function(tuple_a, tuple_b) >> -- Two tuple are considered equal if they have same memory >> address >> diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c >> index e458f49..9c6e3da 100644 >> --- a/src/box/tuple_format.c >> +++ b/src/box/tuple_format.c >> @@ -28,6 +28,8 @@ >> * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF >> * SUCH DAMAGE. >> */ >> +#include <lib/json/path.h> >> +#include <third_party/luajit/src/lua.h> >> #include "tuple_format.h" >> /** Global table of tuple formats */ >> @@ -478,3 +480,156 @@ box_tuple_format_unref(box_tuple_format_t *format) >> { >> tuple_format_unref(format); >> } >> + >> +/** >> + * Propagate @a field to MessagePack(field)[index]. >> + * @param[in][out] field Field to propagate. >> + * @param index 1-based index to propagate to. >> + * >> + * @retval 0 Success, the index was found. >> + * @retval -1 Not found. >> + */ >> +static inline int >> +tuple_field_go_to_index(const char **field, uint64_t index) >> +{ >> + enum mp_type type = mp_typeof(**field); >> + if (type == MP_ARRAY) { >> + if (index == 0) >> + return -1; >> + /* Make index 0-based. */ >> + index -= TUPLE_INDEX_BASE; >> + uint32_t count = mp_decode_array(field); >> + if (index >= count) >> + return -1; >> + for (; index > 0; --index) >> + mp_next(field); >> + return 0; >> + } else if (type == MP_MAP) { >> + uint64_t count = mp_decode_map(field); >> + for (; count > 0; --count) { >> + type = mp_typeof(**field); >> + if (type == MP_UINT) { >> + uint64_t value = mp_decode_uint(field); >> + if (value == index) >> + return 0; >> + } else if (type == MP_INT) { >> + int64_t value = mp_decode_int(field); >> + if (value >= 0 && (uint64_t)value == index) >> + return 0; >> + } else { >> + /* Skip key. */ >> + mp_next(field); >> + } >> + /* Skip value. */ >> + mp_next(field); >> + } >> + } >> + return -1; >> +} >> + >> +/** >> + * Propagate @a field to MessagePack(field)[key]. >> + * @param[in][out] field Field to propagate. >> + * @param key Key to propagate to. >> + * @param len Length of @a key. >> + * >> + * @retval 0 Success, the index was found. >> + * @retval -1 Not found. >> + */ >> +static inline int >> +tuple_field_go_to_key(const char **field, const char *key, int len) >> +{ >> + enum mp_type type = mp_typeof(**field); >> + if (type != MP_MAP) >> + return -1; >> + uint64_t count = mp_decode_map(field); >> + for (; count > 0; --count) { >> + type = mp_typeof(**field); >> + if (type == MP_STR) { >> + uint32_t value_len; >> + const char *value = mp_decode_str(field, &value_len); >> + if (value_len == (uint)len && >> + memcmp(value, key, len) == 0) >> + return 0; >> + } else { >> + /* Skip key. */ >> + mp_next(field); >> + } >> + /* Skip value. */ >> + mp_next(field); >> + } >> + return -1; >> +} >> + >> +const char * >> +tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, >> + const uint32_t *field_map, >> + const char *path,uint32_t path_len, uint32_t >> path_hash) >> +{ >> + int rc = 0; >> + const char *field; >> + struct json_path_parser parser; >> + struct json_path_node node; >> + json_path_parser_create(&parser, path, path_len); >> + rc = json_path_next(&parser, &node); >> + if (rc != 0 || node.type == JSON_PATH_END) >> + goto not_found; >> + if (node.type == JSON_PATH_NUM) { >> + int index = node.num; >> + if (index == 0) >> + goto not_found; >> + index -= TUPLE_INDEX_BASE; >> + field = tuple_field_raw(format, tuple, >> + field_map, index); >> + if (field == NULL) >> + goto not_found; >> + } else { >> + assert(node.type == JSON_PATH_STR); >> + /* First part of a path is a field name. */ >> + const char *name = node.str; >> + uint32_t name_len = node.len; >> + uint32_t name_hash; >> + if (path_len == name_len) { >> + name_hash = path_hash; >> + } else { >> + /* >> + * If a string is "field....", then its >> + * precalculated juajit hash can not be >> + * used. A tuple dictionary hashes only >> + * name, not path. >> + */ >> + name_hash = lua_hash(name, name_len); >> + } >> + field = tuple_field_raw_by_name(format, tuple, >> + field_map, >> + name, name_len, name_hash); >> + if (field == NULL) >> + goto not_found; >> + } >> + while ((rc = json_path_next(&parser, &node)) == 0 && >> + node.type != JSON_PATH_END) { >> + if (node.type == JSON_PATH_NUM) { >> + rc = tuple_field_go_to_index(&field, node.num); >> + } else { >> + assert(node.type == JSON_PATH_STR); >> + rc = tuple_field_go_to_key(&field, node.str, node.len); >> + } >> + if (rc != 0) { >> + rc = 0; /* prevent error raise */ >> + goto not_found; >> + } >> + } >> + if (rc == 0) >> + return field; >> + not_found: >> + /* try to use the whole string as identifier */ >> + field = tuple_field_raw_by_name(format, tuple, >> + field_map, >> + path, path_len, path_hash); >> + if (field) >> + return field; >> + if (rc || path_len == 0) >> + diag_set(IllegalParams, "Error in path on " >> + "position %d", rc); >> + return NULL; >> +} >> diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h >> index d35182d..ce17085 100644 >> --- a/src/box/tuple_format.h >> +++ b/src/box/tuple_format.h >> @@ -377,6 +377,21 @@ tuple_field_raw_by_name(struct tuple_format >> *format, const char *tuple, >> return tuple_field_raw(format, tuple, field_map, fieldno); >> } >> +/** >> + * Get tuple field by its path. >> + * @param Tuple. >> + * @param path Field path. >> + * @param path_len Length of @a path. >> + * @param path_hash Hash of @a path. >> + * >> + * @retval not NULL on field found. >> + * @retval NULL No field by @a path. >> + */ >> +const char * >> +tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, >> + const uint32_t *field_map, >> + const char *path,uint32_t path_len, uint32_t >> path_hash); >> + >> #if defined(__cplusplus) >> } /* extern "C" */ >> #endif /* defined(__cplusplus) */ >> diff --git a/src/lib/json/path.c b/src/lib/json/path.c >> index 4a6174e..f3fcc46 100644 >> --- a/src/lib/json/path.c >> +++ b/src/lib/json/path.c >> @@ -31,8 +31,10 @@ >> #include "path.h" >> #include <ctype.h> >> +#include <unicode/uchar.h> >> #include "trivia/util.h" >> + >> /** Same as strtoull(), but with limited length. */ >> static inline uint64_t >> strntoull(const char *src, int len) { >> @@ -45,6 +47,42 @@ strntoull(const char *src, int len) { >> } >> /** >> + * Parse string and update parser's state. >> + * @param parser JSON path parser. Upates pos, signs_read. >> + * @param[out] UChar32 to store result. >> + * >> + * @retval 1 Success. >> + * @retval 0 End of string. >> + * @retval -1 Parse error. >> + */ >> +static inline int >> +json_read_sign(struct json_path_parser *parser, UChar32 *out) >> +{ >> + if (unlikely(parser->offset == parser->src_len)) >> + return 0; >> + UChar32 c; >> + U8_NEXT_OR_FFFD(parser->src, parser->offset, parser->src_len, c) >> + if (c == 0xFFFD) >> + return -1; >> + *out = c; >> + parser->invalid_sign_off += 1; >> + return 1; >> +} >> + >> +/** >> + * Reset parser state to previous one. >> + * @param parser JSON path parser. >> + * @param old parser read offset. >> + * @param signs to drop in signs_read counter. >> + */ >> +static inline void >> +json_reset_pos(struct json_path_parser *parser, int old_offset, int >> signs) >> +{ >> + parser->offset = old_offset; >> + parser->invalid_sign_off -= signs; >> +} >> + >> +/** >> * Parse string identifier in quotes. Parser either stops right >> * after the closing quote, or returns an error position. >> * @param parser JSON path parser. >> @@ -56,24 +94,26 @@ strntoull(const char *src, int len) { >> static inline int >> json_parse_string(struct json_path_parser *parser, struct >> json_path_node *node) >> { >> - const char *end = parser->src + parser->src_len; >> - const char *pos = parser->pos; >> - assert(pos < end); >> - char quote_type = *pos; >> - assert(quote_type == '\'' || quote_type == '"'); >> - /* Skip first quote. */ >> - int len = 0; >> - ++pos; >> - const char *str = pos; >> - for (char c = *pos; pos < end && quote_type != c; c = *++pos) >> - ++len; >> - /* A string must be terminated with quote. */ >> - if (*pos != quote_type || len == 0) >> - return pos - parser->src + 1; >> - /* Skip the closing quote. */ >> - parser->pos = pos + 1; >> + assert(parser->offset < parser->src_len); >> + UChar32 quote_type; >> + (void) json_read_sign(parser, "e_type); >> + assert(quote_type == (UChar32)'\'' || quote_type == (UChar32)'"'); >> + int str_offset = parser->offset; >> + UChar32 c = 0; >> + int rc = 0; >> + >> + while (((rc = json_read_sign(parser, &c)) > 0) >> + && c != quote_type); >> + int len = (int)(parser->offset - str_offset - 1); >> + if (rc < 0 || len == 0) >> + return -1; >> + if (c != (UChar32)quote_type) { >> + parser->invalid_sign_off++; >> + return -1; >> + } >> + >> node->type = JSON_PATH_STR; >> - node->str = str; >> + node->str = parser->src + str_offset; >> node->len = len; >> return 0; >> } >> @@ -81,7 +121,7 @@ json_parse_string(struct json_path_parser *parser, >> struct json_path_node *node) >> /** >> * Parse digit sequence into integer until non-digit is met. >> * Parser stops right after the last digit. >> - * @param parser JSON parser. >> + * @param parser JSON parser. Updates signs_read field. >> * @param[out] node JSON node to store result. >> * >> * @retval 0 Success. >> @@ -90,27 +130,40 @@ json_parse_string(struct json_path_parser >> *parser, struct json_path_node *node) >> static inline int >> json_parse_integer(struct json_path_parser *parser, struct >> json_path_node *node) >> { >> - const char *end = parser->src + parser->src_len; >> - const char *pos = parser->pos; >> - assert(pos < end); >> - const char *str = pos; >> - int len = 0; >> - for (char c = *pos; pos < end && isdigit(c); c = *++pos) >> - ++len; >> - if (len == 0) >> - return pos - parser->src + 1; >> - parser->pos = pos; >> + assert(parser->offset < parser->src_len); >> + int str_offset = parser->offset; >> + int last_offset = str_offset; >> + int len = 0, rc = 0; >> + UChar32 c = 0; >> + >> + while (((rc = json_read_sign(parser, &c)) > 0) && u_isdigit(c)) { >> + last_offset = parser->offset; >> + len++; >> + } >> + if (rc > 0 && len > 0 && !u_isdigit(c)) >> + json_reset_pos(parser, last_offset, 1); >> + if (rc < 0 || len == 0) >> + return -1; >> + >> node->type = JSON_PATH_NUM; >> - node->num = strntoull(str, len); >> + node->num = strntoull(parser->src + str_offset, len); >> return 0; >> } >> +static inline bool >> +identifier_valid_sign(UChar32 c) >> +{ >> + return u_isUAlphabetic(c) >> + || c == (UChar32)'_' >> + || u_isdigit(c); >> +} >> + >> /** >> * Parse identifier out of quotes. It can contain only alphas, >> * digits and underscores. And can not contain digit at the first >> * position. Parser is stoped right after the last non-digit, >> * non-alpha and non-underscore symbol. >> - * @param parser JSON parser. >> + * @param parser JSON parser. Updates signs_read field. >> * @param[out] node JSON node to store result. >> * >> * @retval 0 Success. >> @@ -120,68 +173,78 @@ static inline int >> json_parse_identifier(struct json_path_parser *parser, >> struct json_path_node *node) >> { >> - const char *end = parser->src + parser->src_len; >> - const char *pos = parser->pos; >> - assert(pos < end); >> - const char *str = pos; >> - char c = *pos; >> + assert(parser->offset < parser->src_len); >> + int str_offset = parser->offset; >> + UChar32 c; >> + int rc = 0; >> + if (json_read_sign(parser, &c) < 0) >> + return -1; >> /* First symbol can not be digit. */ >> - if (!isalpha(c) && c != '_') >> - return pos - parser->src + 1; >> - int len = 1; >> - for (c = *++pos; pos < end && (isalpha(c) || c == '_' || >> isdigit(c)); >> - c = *++pos) >> - ++len; >> - assert(len > 0); >> - parser->pos = pos; >> + if (!u_isalpha(c) && c != (UChar32)'_') >> + return -1; >> + >> + int last_offset = parser->offset; >> + while ((rc = json_read_sign(parser, &c)) > 0 && >> identifier_valid_sign(c)) >> + last_offset = parser->offset; >> + if (rc > 0 && !identifier_valid_sign(c)) >> + json_reset_pos(parser, last_offset, 1); >> + if (rc < 0) >> + return -1; >> + >> node->type = JSON_PATH_STR; >> - node->str = str; >> - node->len = len; >> + node->str = parser->src + str_offset; >> + node->len = parser->offset - str_offset; >> return 0; >> } >> +static inline char >> +json_curr_char(struct json_path_parser *parser) >> +{ >> + return *json_path_curr_substring(parser); >> +} >> + >> int >> json_path_next(struct json_path_parser *parser, struct >> json_path_node *node) >> { >> - const char *end = parser->src + parser->src_len; >> - if (end == parser->pos) { >> + int end_offset = parser->src_len; >> + if (end_offset == parser->offset) { >> node->type = JSON_PATH_END; >> return 0; >> } >> - char c = *parser->pos; >> + UChar32 c = 0; >> + int last_offset = parser->offset; >> + if (json_read_sign(parser, &c) < 0) >> + return parser->invalid_sign_off; >> int rc; >> switch(c) { >> - case '[': >> - ++parser->pos; >> + case (UChar32)'[': >> /* Error for []. */ >> - if (parser->pos == end) >> - return parser->pos - parser->src + 1; >> - c = *parser->pos; >> + if (parser->offset == end_offset) >> + return parser->invalid_sign_off; >> + c = json_curr_char(parser); >> if (c == '"' || c == '\'') >> rc = json_parse_string(parser, node); >> else >> rc = json_parse_integer(parser, node); >> if (rc != 0) >> - return rc; >> + return parser->invalid_sign_off; >> /* >> * Expression, started from [ must be finished >> * with ] regardless of its type. >> */ >> - if (parser->pos == end || *parser->pos != ']') >> - return parser->pos - parser->src + 1; >> + if (parser->offset == end_offset || json_curr_char(parser) >> != ']') >> + return parser->invalid_sign_off + 1; >> /* Skip ]. */ >> - ++parser->pos; >> + (void) json_read_sign(parser, &c); >> break; >> - case '.': >> - /* Skip dot. */ >> - ++parser->pos; >> - if (parser->pos == end) >> - return parser->pos - parser->src + 1; >> - FALLTHROUGH >> default: >> + if (c != (UChar32)'.') >> + json_reset_pos(parser, last_offset, 1); >> + else if (parser->offset == end_offset) >> + return parser->invalid_sign_off + 1; >> rc = json_parse_identifier(parser, node); >> if (rc != 0) >> - return rc; >> + return parser->invalid_sign_off; >> break; >> } >> return 0; >> diff --git a/src/lib/json/path.h b/src/lib/json/path.h >> index 6e8db4c..15292fb 100644 >> --- a/src/lib/json/path.h >> +++ b/src/lib/json/path.h >> @@ -45,10 +45,11 @@ extern "C" { >> struct json_path_parser { >> /** Source string. */ >> const char *src; >> - /** Length of src. */ >> + /** Length of string. */ >> int src_len; >> - /** Current parser's position. */ >> - const char *pos; >> + /** Current parser's offset. */ >> + int offset; >> + int invalid_sign_off; >> }; >> enum json_path_type { >> @@ -78,18 +79,30 @@ struct json_path_node { >> }; >> /** >> - * Create @a parser. >> + * Init @a parser. >> * @param[out] parser Parser to create. >> * @param src Source string. >> * @param src_len Length of @a src. >> */ >> static inline void >> json_path_parser_create(struct json_path_parser *parser, const char >> *src, >> - int src_len) >> + int src_len) >> { >> parser->src = src; >> parser->src_len = src_len; >> - parser->pos = src; >> + parser->offset = 0; >> + parser->invalid_sign_off = 0; >> +} >> + >> +/** >> + * Get current substring of parser. >> + * @param parser Parser. >> + * @retval ptr to substring >> + */ >> +static inline const char * >> +json_path_curr_substring(struct json_path_parser *parser) >> +{ >> + return parser->src + parser->offset; >> } >> /** >> diff --git a/test/engine/tuple.result b/test/engine/tuple.result >> index 2d7367a..3a8e828 100644 >> --- a/test/engine/tuple.result >> +++ b/test/engine/tuple.result >> @@ -602,7 +602,10 @@ format[2] = {name = 'field2', type = 'array'} >> format[3] = {name = 'field3', type = 'map'} >> --- >> ... >> -format[4] = {name = 'field4', type = 'string'} >> +format[4] = {name = 'field4', type = 'string' } >> +--- >> +... >> +format[5] = {name = "[2][6]['привет中国world']['中国a']", type = >> 'string'} >> --- >> ... >> s = box.schema.space.create('test', {format = format}) >> @@ -611,13 +614,13 @@ s = box.schema.space.create('test', {format = >> format}) >> pk = s:create_index('pk') >> --- >> ... >> -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} >> +field2 = {1, 2, 3, "4", {5,6,7}, {привет中国world={中国="привет"}, >> key="value1", value="key1"}} >> --- >> ... >> field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, >> {c=3, d=4} }, [-1] = 200} >> --- >> ... >> -t = s:replace{1, field2, field3, "123456"} >> +t = s:replace{1, field2, field3, "123456", "yes, this"} >> --- >> ... >> t[1] >> @@ -626,7 +629,7 @@ t[1] >> ... >> t[2] >> --- >> -- [1, 2, 3, '4', [5, 6, 7], {'key': 'key1', 'value': 'value1'}] >> +- [1, 2, 3, '4', [5, 6, 7], {'привет中国world': {'中国': 'привет'}, >> 'key': 'value1', 'value': 'key1'}] >> ... >> t[3] >> --- >> @@ -667,19 +670,43 @@ t["[2][5][3]"] >> ... >> t["[2][6].key"] >> --- >> -- key1 >> +- value1 >> ... >> t["[2][6].value"] >> --- >> -- value1 >> +- key1 >> ... >> t["[2][6]['key']"] >> --- >> -- key1 >> +- value1 >> ... >> t["[2][6]['value']"] >> --- >> -- value1 >> +- key1 >> +... >> +t[2][6].привет中国world.中国 >> +--- >> +- привет >> +... >> +t["[2][6].привет中国world"].中国 >> +--- >> +- привет >> +... >> +t["[2][6].привет中国world.中国"] >> +--- >> +- привет >> +... >> +t["[2][6]['привет中国world']"]["中国"] >> +--- >> +- привет >> +... >> +t["[2][6]['привет中国world']['中国']"] >> +--- >> +- привет >> +... >> +t["[2][6]['привет中国world']['中国a']"] >> +--- >> +- yes, this >> ... >> t["[3].k3[2].c"] >> --- >> @@ -759,31 +786,31 @@ t["a.b.c d.e.f"] >> -- Sytax errors. >> t[""] >> --- >> -- error: 'builtin/box/tuple.lua:314: Error in path on position 0' >> +- Error in path on position 0 >> ... >> t["[2].[5]"] >> --- >> -- error: 'builtin/box/tuple.lua:314: Error in path on position 5' >> +- Error in path on position 5 >> ... >> t["[-1]"] >> --- >> -- error: 'builtin/box/tuple.lua:314: Error in path on position 2' >> +- Error in path on position 2 >> ... >> t[".."] >> --- >> -- error: 'builtin/box/tuple.lua:314: Error in path on position 2' >> +- Error in path on position 2 >> ... >> t["[["] >> --- >> -- error: 'builtin/box/tuple.lua:314: Error in path on position 2' >> +- Error in path on position 2 >> ... >> t["]]"] >> --- >> -- error: 'builtin/box/tuple.lua:314: Error in path on position 1' >> +- Error in path on position 1 >> ... >> t["{"] >> --- >> -- error: 'builtin/box/tuple.lua:314: Error in path on position 1' >> +- Error in path on position 1 >> ... >> s:drop() >> --- >> diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua >> index ba3482d..90da8b2 100644 >> --- a/test/engine/tuple.test.lua >> +++ b/test/engine/tuple.test.lua >> @@ -204,12 +204,13 @@ format = {} >> format[1] = {name = 'field1', type = 'unsigned'} >> format[2] = {name = 'field2', type = 'array'} >> format[3] = {name = 'field3', type = 'map'} >> -format[4] = {name = 'field4', type = 'string'} >> +format[4] = {name = 'field4', type = 'string' } >> +format[5] = {name = "[2][6]['привет中国world']['中国a']", type = >> 'string'} >> s = box.schema.space.create('test', {format = format}) >> pk = s:create_index('pk') >> -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} >> +field2 = {1, 2, 3, "4", {5,6,7}, {привет中国world={中国="привет"}, >> key="value1", value="key1"}} >> field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, >> {c=3, d=4} }, [-1] = 200} >> -t = s:replace{1, field2, field3, "123456"} >> +t = s:replace{1, field2, field3, "123456", "yes, this"} >> t[1] >> t[2] >> t[3] >> @@ -225,6 +226,12 @@ t["[2][6].key"] >> t["[2][6].value"] >> t["[2][6]['key']"] >> t["[2][6]['value']"] >> +t[2][6].привет中国world.中国 >> +t["[2][6].привет中国world"].中国 >> +t["[2][6].привет中国world.中国"] >> +t["[2][6]['привет中国world']"]["中国"] >> +t["[2][6]['привет中国world']['中国']"] >> +t["[2][6]['привет中国world']['中国a']"] >> t["[3].k3[2].c"] >> t["[4]"] >> t.field1 >> diff --git a/test/unit/CMakeLists.txt b/test/unit/CMakeLists.txt >> index fe8b2d2..667194c 100644 >> --- a/test/unit/CMakeLists.txt >> +++ b/test/unit/CMakeLists.txt >> @@ -130,7 +130,7 @@ add_executable(csv.test csv.c) >> target_link_libraries(csv.test csv) >> add_executable(json_path.test json_path.c) >> -target_link_libraries(json_path.test json_path unit) >> +target_link_libraries(json_path.test json_path unit ${ICU_LIBRARIES}) >> add_executable(rmean.test rmean.cc) >> target_link_libraries(rmean.test stat unit) >> diff --git a/test/unit/json_path.c b/test/unit/json_path.c >> index 599658b..81ef7fc 100644 >> --- a/test/unit/json_path.c >> +++ b/test/unit/json_path.c >> @@ -9,7 +9,7 @@ >> json_path_parser_create(&parser, path, len); >> #define is_next_index(value_len, value) \ >> - path = parser.pos; \ >> + path = json_path_curr_substring(&parser); \ >> is(json_path_next(&parser, &node), 0, "parse <%." #value_len >> "s>", \ >> path); \ >> is(node.type, JSON_PATH_NUM, "<%." #value_len "s> is num", >> path); \ >> > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tarantool-patches] [PATCH v2 3/3] Multibyte characters support ICU 2018-03-29 14:22 [tarantool-patches] [PATCH v2 0/3] tuple field access via a json path Kirill Shcherbatov ` (2 preceding siblings ...) 2018-03-29 14:22 ` [tarantool-patches] [PATCH v2 3/3] Multibyte characters support Kirill Shcherbatov @ 2018-04-04 10:37 ` Kirill Shcherbatov 2018-04-04 11:30 ` [tarantool-patches] " Vladislav Shpilevoy 3 siblings, 1 reply; 14+ messages in thread From: Kirill Shcherbatov @ 2018-04-04 10:37 UTC (permalink / raw) To: tarantool-patches; +Cc: v.shpilevoy ICU Implementation From 8703e465382ba05817e7703550694c3790972e54 Mon Sep 17 00:00:00 2001 Message-Id: <8703e465382ba05817e7703550694c3790972e54.1522838002.git.kshcherbatov@tarantool.org> In-Reply-To: <cover.1522838002.git.kshcherbatov@tarantool.org> References: <cover.1522838002.git.kshcherbatov@tarantool.org> From: Kirill Shcherbatov <kshcherbatov@tarantool.org> Date: Wed, 4 Apr 2018 13:06:22 +0300 Subject: [PATCH v2 3/3] ICU Unicode parsing implementation --- src/box/lua/tuple.c | 31 +++++-- src/lib/json/path.c | 211 +++++++++++++++++++++++++++++++++------------ src/lib/json/path.h | 30 ++++--- test/engine/tuple.result | 43 +++++++-- test/engine/tuple.test.lua | 13 ++- test/unit/CMakeLists.txt | 2 +- test/unit/json_path.c | 4 +- 7 files changed, 247 insertions(+), 87 deletions(-) diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c index 99b9ff2..b89e1f9 100644 --- a/src/box/lua/tuple.c +++ b/src/box/lua/tuple.c @@ -413,7 +413,6 @@ lbox_tuple_transform(struct lua_State *L) static inline int tuple_field_go_to_index(const char **field, uint64_t index) { - assert(index >= 0); enum mp_type type = mp_typeof(**field); if (type == MP_ARRAY) { if (index == 0) @@ -497,6 +496,12 @@ tuple_field_go_to_key(const char **field, const char *key, int len) static int lbox_tuple_field_by_path(struct lua_State *L) { + int err_pos = 0; + struct json_path_parser parser; + /* Need uninitialized structure to + * json_path_parser_deinit on lua_isnumber */ + memset(&parser, 0, sizeof(parser)); + const char *path = NULL; const char *field; struct tuple *tuple = luaT_istuple(L, 1); /* Is checked in Lua wrapper. */ @@ -506,6 +511,18 @@ lbox_tuple_field_by_path(struct lua_State *L) index -= TUPLE_INDEX_BASE; if (index < 0) { not_found: + if (!path) + goto exit_not_found; + uint32_t path_len = strlen(path); + uint32_t path_hash = lua_hash(path, path_len); + field = tuple_field_by_name(tuple, path, + path_len, path_hash); + if (field) + goto push_value; + if (err_pos || path_len == 0) + luaL_error(L, "Error in path on position %d", err_pos); +exit_not_found: + json_path_parser_deinit(&parser); lua_pushinteger(L, -1); lua_pushnil(L); return 2; @@ -514,19 +531,21 @@ not_found: if (field == NULL) goto not_found; push_value: + json_path_parser_deinit(&parser); lua_pushinteger(L, 0); luamp_decode(L, luaL_msgpack_default, &field); return 2; } assert(lua_isstring(L, 2)); size_t path_len; - const char *path = lua_tolstring(L, 2, &path_len); - struct json_path_parser parser; + path = lua_tolstring(L, 2, &path_len); struct json_path_node node; - json_path_parser_create(&parser, path, path_len); + json_path_parser_init(&parser, path, path_len); int rc = json_path_next(&parser, &node); - if (rc != 0 || node.type == JSON_PATH_END) - luaL_error(L, "Error in path on position %d", rc); + if (rc != 0 || node.type == JSON_PATH_END) { + err_pos = rc; + goto not_found; + } if (node.type == JSON_PATH_NUM) { int index = node.num; if (index == 0) diff --git a/src/lib/json/path.c b/src/lib/json/path.c index 4a6174e..4aadb3a 100644 --- a/src/lib/json/path.c +++ b/src/lib/json/path.c @@ -31,8 +31,11 @@ #include "path.h" #include <ctype.h> +#include <unicode/uchar.h> #include "trivia/util.h" +#define REPLACEMENT_CHARACTER (0xFFFD) + /** Same as strtoull(), but with limited length. */ static inline uint64_t strntoull(const char *src, int len) { @@ -45,6 +48,51 @@ strntoull(const char *src, int len) { } /** + * Parse string and update parser's state. + * @param[out] parser JSON path parser. Upates pos, signs_read. + * @param[out] UChar32 to store result. + * + * @retval 1 Success. + * @retval 0 End of string. + * @retval -1 Parse error. + */ +static inline int +parser_read_sign(struct json_path_parser *parser, UChar32 *out) +{ + int rc; + UErrorCode status = U_ZERO_ERROR; + if (parser->pos == parser->end) + return 0; + *out = ucnv_getNextUChar(parser->utf8conv, &parser->pos, parser->end, &status); + parser->invalid_sign_off += (rc = U_SUCCESS(status)); + return rc ? 1 : -1; +} + +/** + * Parse string and update parser's state. + * @param[out] parser JSON path parser. Upates pos, signs_read. + * @param old parser read offset. + * @param signs to drop. + */ +static inline void +parser_reset_pos(struct json_path_parser *parser, const char *old_pos, int signs) +{ + parser->pos = old_pos; + parser->invalid_sign_off -= signs; +} + +static inline bool +string_valid_sign(UChar32 c) +{ + int8_t type = u_charType(c); + return !(c == REPLACEMENT_CHARACTER || + type == U_UNASSIGNED || + type == U_LINE_SEPARATOR || + type == U_CONTROL_CHAR || + type == U_PARAGRAPH_SEPARATOR); +} + +/** * Parse string identifier in quotes. Parser either stops right * after the closing quote, or returns an error position. * @param parser JSON path parser. @@ -56,22 +104,24 @@ strntoull(const char *src, int len) { static inline int json_parse_string(struct json_path_parser *parser, struct json_path_node *node) { - const char *end = parser->src + parser->src_len; - const char *pos = parser->pos; - assert(pos < end); - char quote_type = *pos; - assert(quote_type == '\'' || quote_type == '"'); - /* Skip first quote. */ - int len = 0; - ++pos; - const char *str = pos; - for (char c = *pos; pos < end && quote_type != c; c = *++pos) - ++len; - /* A string must be terminated with quote. */ - if (*pos != quote_type || len == 0) - return pos - parser->src + 1; - /* Skip the closing quote. */ - parser->pos = pos + 1; + assert(parser->pos < parser->end); + UChar32 quote_type; + (void)parser_read_sign(parser, "e_type); + assert(quote_type == (UChar32)'\'' || quote_type == (UChar32)'"'); + const char *str = parser->pos; + UChar32 c = 0; + int rc = 0; + + while (((rc = parser_read_sign(parser, &c)) > 0) + && string_valid_sign(c) && c != quote_type); + int len = (int)(parser->pos - str - 1); + if (rc < 0 || len == 0) + return -1; + if (c != (UChar32)quote_type) { + parser->invalid_sign_off++; + return -1; + } + node->type = JSON_PATH_STR; node->str = str; node->len = len; @@ -81,7 +131,7 @@ json_parse_string(struct json_path_parser *parser, struct json_path_node *node) /** * Parse digit sequence into integer until non-digit is met. * Parser stops right after the last digit. - * @param parser JSON parser. + * @param[out] parser JSON parser. Updates signs_read field. * @param[out] node JSON node to store result. * * @retval 0 Success. @@ -90,27 +140,40 @@ json_parse_string(struct json_path_parser *parser, struct json_path_node *node) static inline int json_parse_integer(struct json_path_parser *parser, struct json_path_node *node) { - const char *end = parser->src + parser->src_len; - const char *pos = parser->pos; - assert(pos < end); - const char *str = pos; - int len = 0; - for (char c = *pos; pos < end && isdigit(c); c = *++pos) - ++len; - if (len == 0) - return pos - parser->src + 1; - parser->pos = pos; + assert(parser->pos < parser->end); + const char *str = parser->pos; + const char *last_pos = parser->pos; + int len = 0, rc = 0; + UChar32 c = 0; + + while (((rc = parser_read_sign(parser, &c)) > 0) && u_isdigit(c)) { + last_pos = parser->pos; + len++; + } + if (rc > 0 && len > 0 && !u_isdigit(c)) + parser_reset_pos(parser, last_pos, 1); + if (rc < 0 || len == 0) + return -1; + node->type = JSON_PATH_NUM; node->num = strntoull(str, len); return 0; } +static inline bool +identifier_valid_sign(UChar32 c) +{ + return u_isUAlphabetic(c) + || c == (UChar32)'_' + || u_isdigit(c); +} + /** * Parse identifier out of quotes. It can contain only alphas, * digits and underscores. And can not contain digit at the first * position. Parser is stoped right after the last non-digit, * non-alpha and non-underscore symbol. - * @param parser JSON parser. + * @param[out] parser JSON parser. Updates signs_read field. * @param[out] node JSON node to store result. * * @retval 0 Success. @@ -120,68 +183,102 @@ static inline int json_parse_identifier(struct json_path_parser *parser, struct json_path_node *node) { - const char *end = parser->src + parser->src_len; - const char *pos = parser->pos; - assert(pos < end); - const char *str = pos; - char c = *pos; + assert(parser->pos < parser->end); + const char *str = parser->pos; + UChar32 c; + int rc = 0; + if (parser_read_sign(parser, &c) < 0) + return -1; /* First symbol can not be digit. */ - if (!isalpha(c) && c != '_') - return pos - parser->src + 1; - int len = 1; - for (c = *++pos; pos < end && (isalpha(c) || c == '_' || isdigit(c)); - c = *++pos) - ++len; - assert(len > 0); - parser->pos = pos; + if (!u_isalpha(c) && c != (UChar32)'_') + return -1; + + const char *last_pos = parser->pos; + while ((rc = parser_read_sign(parser, &c)) > 0 && identifier_valid_sign(c)) + last_pos = parser->pos; + if (rc > 0 && !identifier_valid_sign(c)) + parser_reset_pos(parser, last_pos, 1); + if (rc < 0) + return -1; + node->type = JSON_PATH_STR; node->str = str; - node->len = len; + node->len = (int)(parser->pos - str); return 0; } int +json_path_parser_init(struct json_path_parser *parser, const char *src, + int src_len) +{ + UErrorCode status = U_ZERO_ERROR ; + parser->utf8conv = ucnv_open("utf8", &status); + if (U_FAILURE(status)) + return -1; + assert(parser->utf8conv); + parser->src = src; + parser->end = src + src_len; + parser->pos = src; + parser->invalid_sign_off = 0; + return 0; +} + +void +json_path_parser_deinit(struct json_path_parser *parser) +{ + if (parser->utf8conv) + ucnv_close(parser->utf8conv); +} + +static inline int +error_sign_offset(struct json_path_parser *parser) +{ + return parser->invalid_sign_off; +} + +int json_path_next(struct json_path_parser *parser, struct json_path_node *node) { - const char *end = parser->src + parser->src_len; + assert(parser->utf8conv); + const char *end = parser->end; if (end == parser->pos) { node->type = JSON_PATH_END; return 0; } - char c = *parser->pos; + UChar32 c = 0; + const char *last_pos = parser->pos; + if (parser_read_sign(parser, &c) < 0) + return error_sign_offset(parser); int rc; switch(c) { - case '[': - ++parser->pos; + case (UChar32)'[': /* Error for []. */ if (parser->pos == end) - return parser->pos - parser->src + 1; + return parser->invalid_sign_off; c = *parser->pos; if (c == '"' || c == '\'') rc = json_parse_string(parser, node); else rc = json_parse_integer(parser, node); if (rc != 0) - return rc; + return parser->invalid_sign_off; /* * Expression, started from [ must be finished * with ] regardless of its type. */ if (parser->pos == end || *parser->pos != ']') - return parser->pos - parser->src + 1; + return parser->invalid_sign_off + 1; /* Skip ]. */ - ++parser->pos; + (void)parser_read_sign(parser, &c); break; - case '.': - /* Skip dot. */ - ++parser->pos; - if (parser->pos == end) - return parser->pos - parser->src + 1; - FALLTHROUGH default: + if (c != (UChar32)'.') + parser_reset_pos(parser, last_pos, 1); + else if (parser->pos == end) + return parser->invalid_sign_off + 1; rc = json_parse_identifier(parser, node); if (rc != 0) - return rc; + return parser->invalid_sign_off; break; } return 0; diff --git a/src/lib/json/path.h b/src/lib/json/path.h index 6e8db4c..0ff68c4 100644 --- a/src/lib/json/path.h +++ b/src/lib/json/path.h @@ -33,6 +33,9 @@ #include <stdbool.h> #include <stdint.h> +#include <unicode/ucnv_err.h> +#include <unicode/ucnv.h> +#include <assert.h> #ifdef __cplusplus extern "C" { @@ -45,10 +48,12 @@ extern "C" { struct json_path_parser { /** Source string. */ const char *src; - /** Length of src. */ - int src_len; + /** End of string. */ + const char *end; /** Current parser's position. */ const char *pos; + int invalid_sign_off; + UConverter* utf8conv; }; enum json_path_type { @@ -78,19 +83,22 @@ struct json_path_node { }; /** - * Create @a parser. + * Init @a parser. * @param[out] parser Parser to create. * @param src Source string. * @param src_len Length of @a src. + * @retval 0 Success. + * @retval -1 Init error. */ -static inline void -json_path_parser_create(struct json_path_parser *parser, const char *src, - int src_len) -{ - parser->src = src; - parser->src_len = src_len; - parser->pos = src; -} +int +json_path_parser_init(struct json_path_parser *parser, const char *src, + int src_len); +/** + * Deinit @a parser. + * @param[out] parser instance to deinit. + */ +void +json_path_parser_deinit(struct json_path_parser *parser); /** * Get a next path node. diff --git a/test/engine/tuple.result b/test/engine/tuple.result index 2d7367a..6b597d6 100644 --- a/test/engine/tuple.result +++ b/test/engine/tuple.result @@ -602,7 +602,10 @@ format[2] = {name = 'field2', type = 'array'} format[3] = {name = 'field3', type = 'map'} --- ... -format[4] = {name = 'field4', type = 'string'} +format[4] = {name = 'field4', type = 'string' } +--- +... +format[5] = {name = "[2][6]['привет中国world']['中国a']", type = 'string'} --- ... s = box.schema.space.create('test', {format = format}) @@ -611,13 +614,13 @@ s = box.schema.space.create('test', {format = format}) pk = s:create_index('pk') --- ... -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} +field2 = {1, 2, 3, "4", {5,6,7}, {привет中国world={中国="привет"}, key="value1", value="key1"}} --- ... field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} --- ... -t = s:replace{1, field2, field3, "123456"} +t = s:replace{1, field2, field3, "123456", "yes, this"} --- ... t[1] @@ -626,7 +629,7 @@ t[1] ... t[2] --- -- [1, 2, 3, '4', [5, 6, 7], {'key': 'key1', 'value': 'value1'}] +- [1, 2, 3, '4', [5, 6, 7], {'привет中国world': {'中国': 'привет'}, 'key': 'value1', 'value': 'key1'}] ... t[3] --- @@ -667,19 +670,43 @@ t["[2][5][3]"] ... t["[2][6].key"] --- -- key1 +- value1 ... t["[2][6].value"] --- -- value1 +- key1 ... t["[2][6]['key']"] --- -- key1 +- value1 ... t["[2][6]['value']"] --- -- value1 +- key1 +... +t[2][6].привет中国world.中国 +--- +- привет +... +t["[2][6].привет中国world"].中国 +--- +- привет +... +t["[2][6].привет中国world.中国"] +--- +- привет +... +t["[2][6]['привет中国world']"]["中国"] +--- +- привет +... +t["[2][6]['привет中国world']['中国']"] +--- +- привет +... +t["[2][6]['привет中国world']['中国a']"] +--- +- yes, this ... t["[3].k3[2].c"] --- diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua index ba3482d..90da8b2 100644 --- a/test/engine/tuple.test.lua +++ b/test/engine/tuple.test.lua @@ -204,12 +204,13 @@ format = {} format[1] = {name = 'field1', type = 'unsigned'} format[2] = {name = 'field2', type = 'array'} format[3] = {name = 'field3', type = 'map'} -format[4] = {name = 'field4', type = 'string'} +format[4] = {name = 'field4', type = 'string' } +format[5] = {name = "[2][6]['привет中国world']['中国a']", type = 'string'} s = box.schema.space.create('test', {format = format}) pk = s:create_index('pk') -field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} +field2 = {1, 2, 3, "4", {5,6,7}, {привет中国world={中国="привет"}, key="value1", value="key1"}} field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} -t = s:replace{1, field2, field3, "123456"} +t = s:replace{1, field2, field3, "123456", "yes, this"} t[1] t[2] t[3] @@ -225,6 +226,12 @@ t["[2][6].key"] t["[2][6].value"] t["[2][6]['key']"] t["[2][6]['value']"] +t[2][6].привет中国world.中国 +t["[2][6].привет中国world"].中国 +t["[2][6].привет中国world.中国"] +t["[2][6]['привет中国world']"]["中国"] +t["[2][6]['привет中国world']['中国']"] +t["[2][6]['привет中国world']['中国a']"] t["[3].k3[2].c"] t["[4]"] t.field1 diff --git a/test/unit/CMakeLists.txt b/test/unit/CMakeLists.txt index fe8b2d2..667194c 100644 --- a/test/unit/CMakeLists.txt +++ b/test/unit/CMakeLists.txt @@ -130,7 +130,7 @@ add_executable(csv.test csv.c) target_link_libraries(csv.test csv) add_executable(json_path.test json_path.c) -target_link_libraries(json_path.test json_path unit) +target_link_libraries(json_path.test json_path unit ${ICU_LIBRARIES}) add_executable(rmean.test rmean.cc) target_link_libraries(rmean.test stat unit) diff --git a/test/unit/json_path.c b/test/unit/json_path.c index 599658b..b62afd2 100644 --- a/test/unit/json_path.c +++ b/test/unit/json_path.c @@ -6,7 +6,7 @@ #define reset_to_new_path(value) \ path = value; \ len = strlen(value); \ - json_path_parser_create(&parser, path, len); + (void)json_path_parser_init(&parser, path, len); #define is_next_index(value_len, value) \ path = parser.pos; \ @@ -30,6 +30,7 @@ test_basic() const char *path; int len; struct json_path_parser parser; + memset(&parser, 0, sizeof(parser)); struct json_path_node node; reset_to_new_path("[0].field1.field2['field3'][5]"); @@ -89,6 +90,7 @@ test_errors() const char *path; int len; struct json_path_parser parser; + memset(&parser, 0, sizeof(parser)); const struct path_and_errpos errors[] = { /* Double [[. */ {"[[", 2}, -- 2.7.4 On 29.03.2018 17:22, Kirill Shcherbatov wrote: > From: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> > > In progress ... > > Closes #1285 > --- > src/box/CMakeLists.txt | 2 +- > src/box/lua/tuple.c | 176 +++++++++++++++++++++++++++++++++++----- > src/box/lua/tuple.lua | 45 +++-------- > test/engine/tuple.result | 198 +++++++++++++++++++++++++++++++++++++++++++++ > test/engine/tuple.test.lua | 59 ++++++++++++++ > 5 files changed, 428 insertions(+), 52 deletions(-) > > diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt > index e420fe3..add0ff9 100644 > --- a/src/box/CMakeLists.txt > +++ b/src/box/CMakeLists.txt > @@ -130,5 +130,5 @@ add_library(box STATIC > ${bin_sources}) > > target_link_libraries(box box_error tuple stat xrow xlog vclock crc32 scramble > - ${common_libraries}) > + json_path ${common_libraries}) > add_dependencies(box build_bundled_libs) > diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c > index 7ca4299..99b9ff2 100644 > --- a/src/box/lua/tuple.c > +++ b/src/box/lua/tuple.c > @@ -41,6 +41,7 @@ > #include "box/tuple_convert.h" > #include "box/errcode.h" > #include "box/memtx_tuple.h" > +#include "json/path.h" > > /** {{{ box.tuple Lua library > * > @@ -402,36 +403,175 @@ lbox_tuple_transform(struct lua_State *L) > } > > /** > - * Find a tuple field using its name. > + * Propagate @a field to MessagePack(field)[index]. > + * @param[in][out] field Field to propagate. > + * @param index 1-based index to propagate to. > + * > + * @retval 0 Success, the index was found. > + * @retval -1 Not found. > + */ > +static inline int > +tuple_field_go_to_index(const char **field, uint64_t index) > +{ > + assert(index >= 0); > + enum mp_type type = mp_typeof(**field); > + if (type == MP_ARRAY) { > + if (index == 0) > + return -1; > + /* Make index 0-based. */ > + index -= TUPLE_INDEX_BASE; > + uint32_t count = mp_decode_array(field); > + if (index >= count) > + return -1; > + for (; index > 0; --index) > + mp_next(field); > + return 0; > + } else if (type == MP_MAP) { > + uint64_t count = mp_decode_map(field); > + for (; count > 0; --count) { > + type = mp_typeof(**field); > + if (type == MP_UINT) { > + uint64_t value = mp_decode_uint(field); > + if (value == index) > + return 0; > + } else if (type == MP_INT) { > + int64_t value = mp_decode_int(field); > + if (value >= 0 && (uint64_t)value == index) > + return 0; > + } else { > + /* Skip key. */ > + mp_next(field); > + } > + /* Skip value. */ > + mp_next(field); > + } > + } > + return -1; > +} > + > +/** > + * Propagate @a field to MessagePack(field)[key]. > + * @param[in][out] field Field to propagate. > + * @param key Key to propagate to. > + * @param len Length of @a key. > + * > + * @retval 0 Success, the index was found. > + * @retval -1 Not found. > + */ > +static inline int > +tuple_field_go_to_key(const char **field, const char *key, int len) > +{ > + enum mp_type type = mp_typeof(**field); > + if (type != MP_MAP) > + return -1; > + uint64_t count = mp_decode_map(field); > + for (; count > 0; --count) { > + type = mp_typeof(**field); > + if (type == MP_STR) { > + uint32_t value_len; > + const char *value = mp_decode_str(field, &value_len); > + if (value_len == (uint)len && > + memcmp(value, key, len) == 0) > + return 0; > + } else { > + /* Skip key. */ > + mp_next(field); > + } > + /* Skip value. */ > + mp_next(field); > + } > + return -1; > +} > + > +/** > + * Find a tuple field by JSON path. > * @param L Lua state. > - * @param tuple 1-th argument on lua stack, tuple to get field > + * @param tuple 1-th argument on a lua stack, tuple to get field > * from. > - * @param field_name 2-th argument on lua stack, field name to > - * get. > + * @param path 2-th argument on lua stack. Can be field name, > + * JSON path to a field or a field number. > * > * @retval If a field was not found, return -1 and nil to lua else > * return 0 and decoded field. > */ > static int > -lbox_tuple_field_by_name(struct lua_State *L) > +lbox_tuple_field_by_path(struct lua_State *L) > { > + const char *field; > struct tuple *tuple = luaT_istuple(L, 1); > /* Is checked in Lua wrapper. */ > assert(tuple != NULL); > - assert(lua_isstring(L, 2)); > - size_t name_len; > - const char *name = lua_tolstring(L, 2, &name_len); > - uint32_t name_hash = lua_hashstring(L, 2); > - const char *field = > - tuple_field_by_name(tuple, name, name_len, name_hash); > - if (field == NULL) { > - lua_pushinteger(L, -1); > - lua_pushnil(L); > + if (lua_isnumber(L, 2)) { > + int index = lua_tointeger(L, 2); > + index -= TUPLE_INDEX_BASE; > + if (index < 0) { > +not_found: > + lua_pushinteger(L, -1); > + lua_pushnil(L); > + return 2; > + } > + field = tuple_field(tuple, index); > + if (field == NULL) > + goto not_found; > +push_value: > + lua_pushinteger(L, 0); > + luamp_decode(L, luaL_msgpack_default, &field); > return 2; > } > - lua_pushinteger(L, 0); > - luamp_decode(L, luaL_msgpack_default, &field); > - return 2; > + assert(lua_isstring(L, 2)); > + size_t path_len; > + const char *path = lua_tolstring(L, 2, &path_len); > + struct json_path_parser parser; > + struct json_path_node node; > + json_path_parser_create(&parser, path, path_len); > + int rc = json_path_next(&parser, &node); > + if (rc != 0 || node.type == JSON_PATH_END) > + luaL_error(L, "Error in path on position %d", rc); > + if (node.type == JSON_PATH_NUM) { > + int index = node.num; > + if (index == 0) > + goto not_found; > + index -= TUPLE_INDEX_BASE; > + field = tuple_field(tuple, index); > + if (field == NULL) > + goto not_found; > + } else { > + assert(node.type == JSON_PATH_STR); > + /* First part of a path is a field name. */ > + const char *name = node.str; > + uint32_t name_len = node.len; > + uint32_t name_hash; > + if (path_len == name_len) { > + name_hash = lua_hashstring(L, 2); > + } else { > + /* > + * If a string is "field....", then its > + * precalculated juajit hash can not be > + * used. A tuple dictionary hashes only > + * name, not path. > + */ > + name_hash = lua_hash(name, name_len); > + } > + field = tuple_field_by_name(tuple, name, name_len, name_hash); > + if (field == NULL) > + goto not_found; > + } > + while ((rc = json_path_next(&parser, &node)) == 0 && > + node.type != JSON_PATH_END) { > + if (node.type == JSON_PATH_NUM) { > + rc = tuple_field_go_to_index(&field, node.num); > + } else { > + assert(node.type == JSON_PATH_STR); > + rc = tuple_field_go_to_key(&field, node.str, node.len); > + } > + if (rc != 0) > + goto not_found; > + } > + if (rc == 0) > + goto push_value; > + luaL_error(L, "Error in path on position %d", rc); > + unreachable(); > + goto not_found; > } > > static int > @@ -470,8 +610,8 @@ static const struct luaL_Reg lbox_tuple_meta[] = { > {"tostring", lbox_tuple_to_string}, > {"slice", lbox_tuple_slice}, > {"transform", lbox_tuple_transform}, > - {"tuple_field_by_name", lbox_tuple_field_by_name}, > {"tuple_to_map", lbox_tuple_to_map}, > + {"tuple_field_by_path", lbox_tuple_field_by_path}, > {NULL, NULL} > }; > > diff --git a/src/box/lua/tuple.lua b/src/box/lua/tuple.lua > index 001971a..b51b4df 100644 > --- a/src/box/lua/tuple.lua > +++ b/src/box/lua/tuple.lua > @@ -9,16 +9,9 @@ local internal = require('box.internal') > > ffi.cdef[[ > /** \cond public */ > -typedef struct tuple_format box_tuple_format_t; > - > -box_tuple_format_t * > -box_tuple_format_default(void); > > typedef struct tuple box_tuple_t; > > -box_tuple_t * > -box_tuple_new(box_tuple_format_t *format, const char *data, const char *end); > - > int > box_tuple_ref(box_tuple_t *tuple); > > @@ -34,9 +27,6 @@ box_tuple_bsize(const box_tuple_t *tuple); > ssize_t > box_tuple_to_buf(const box_tuple_t *tuple, char *buf, size_t size); > > -box_tuple_format_t * > -box_tuple_format(const box_tuple_t *tuple); > - > const char * > box_tuple_field(const box_tuple_t *tuple, uint32_t i); > > @@ -278,9 +268,9 @@ end > > msgpackffi.on_encode(const_tuple_ref_t, tuple_to_msgpack) > > -local function tuple_field_by_name(tuple, name) > +local function tuple_field_by_path(tuple, path) > tuple_check(tuple, "tuple['field_name']"); > - return internal.tuple.tuple_field_by_name(tuple, name) > + return internal.tuple.tuple_field_by_path(tuple, path) > end > > local methods = { > @@ -306,33 +296,22 @@ end > > methods["__serialize"] = tuple_totable -- encode hook for msgpack/yaml/json > > -local tuple_field = function(tuple, field_n) > - local field = builtin.box_tuple_field(tuple, field_n - 1) > - if field == nil then > - return nil > - end > - -- Use () to shrink stack to the first return value > - return (msgpackffi.decode_unchecked(field)) > -end > - > - > ffi.metatype(tuple_t, { > __len = function(tuple) > return builtin.box_tuple_field_count(tuple) > end; > __tostring = internal.tuple.tostring; > __index = function(tuple, key) > - if type(key) == "number" then > - return tuple_field(tuple, key) > - elseif type(key) == "string" then > - -- Try to get a field with a name = key. If it was not > - -- found (rc ~= 0) then return a method from the > - -- vtable. If a collision occurred, then fields have > - -- higher priority. For example, if a tuple T has a > - -- field with name 'bsize', then T.bsize returns field > - -- value, not tuple_bsize function. To access hidden > - -- methods use 'box.tuple.<method_name>(T, [args...])'. > - local rc, field = tuple_field_by_name(tuple, key) > + if type(key) == "string" or type(key) == "number" then > + -- Try to get a field by json path or by [index]. If > + -- it was not found (rc ~= 0) then return a method > + -- from the vtable. If a collision occurred, then > + -- fields have higher priority. For example, if a > + -- tuple T has a field with name 'bsize', then T.bsize > + -- returns field value, not tuple_bsize function. To > + -- access hidden methods use > + -- 'box.tuple.<method_name>(T, [args...])'. > + local rc, field = tuple_field_by_path(tuple, key) > if rc == 0 then > return field > end > diff --git a/test/engine/tuple.result b/test/engine/tuple.result > index b3b23b2..2d7367a 100644 > --- a/test/engine/tuple.result > +++ b/test/engine/tuple.result > @@ -590,6 +590,204 @@ maplen(t1map), t1map[1], t1map[2], t1map[3] > s:drop() > --- > ... > +format = {} > +--- > +... > +format[1] = {name = 'field1', type = 'unsigned'} > +--- > +... > +format[2] = {name = 'field2', type = 'array'} > +--- > +... > +format[3] = {name = 'field3', type = 'map'} > +--- > +... > +format[4] = {name = 'field4', type = 'string'} > +--- > +... > +s = box.schema.space.create('test', {format = format}) > +--- > +... > +pk = s:create_index('pk') > +--- > +... > +field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} > +--- > +... > +field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} > +--- > +... > +t = s:replace{1, field2, field3, "123456"} > +--- > +... > +t[1] > +--- > +- 1 > +... > +t[2] > +--- > +- [1, 2, 3, '4', [5, 6, 7], {'key': 'key1', 'value': 'value1'}] > +... > +t[3] > +--- > +- {'k1': 100, 'k3': [{'a': 1, 'b': 2}, {'c': 3, 'd': 4}], -1: 200, 10: 100, 'k2': [ > + 1, 2, 3]} > +... > +t[4] > +--- > +- '123456' > +... > +t[2][1] > +--- > +- 1 > +... > +t["[2][1]"] > +--- > +- 1 > +... > +t[2][5] > +--- > +- [5, 6, 7] > +... > +t["[2][5]"] > +--- > +- [5, 6, 7] > +... > +t["[2][5][1]"] > +--- > +- 5 > +... > +t["[2][5][2]"] > +--- > +- 6 > +... > +t["[2][5][3]"] > +--- > +- 7 > +... > +t["[2][6].key"] > +--- > +- key1 > +... > +t["[2][6].value"] > +--- > +- value1 > +... > +t["[2][6]['key']"] > +--- > +- key1 > +... > +t["[2][6]['value']"] > +--- > +- value1 > +... > +t["[3].k3[2].c"] > +--- > +- 3 > +... > +t["[4]"] > +--- > +- '123456' > +... > +t.field1 > +--- > +- 1 > +... > +t.field2[5] > +--- > +- [5, 6, 7] > +... > +t[".field1"] > +--- > +- 1 > +... > +t["field1"] > +--- > +- 1 > +... > +t["[3][10]"] > +--- > +- 100 > +... > +-- Not found. > +t[0] > +--- > +- null > +... > +t["[0]"] > +--- > +- null > +... > +t["[1000]"] > +--- > +- null > +... > +t.field1000 > +--- > +- null > +... > +t["not_found"] > +--- > +- null > +... > +t["[2][5][10]"] > +--- > +- null > +... > +t["[2][6].key100"] > +--- > +- null > +... > +t["[2][0]"] -- 0-based index in array. > +--- > +- null > +... > +t["[4][3]"] -- Can not index string. > +--- > +- null > +... > +t["[4]['key']"] > +--- > +- null > +... > +-- Not found 'a'. Return 'null' despite of syntax error on a > +-- next position. > +t["a.b.c d.e.f"] > +--- > +- null > +... > +-- Sytax errors. > +t[""] > +--- > +- error: 'builtin/box/tuple.lua:314: Error in path on position 0' > +... > +t["[2].[5]"] > +--- > +- error: 'builtin/box/tuple.lua:314: Error in path on position 5' > +... > +t["[-1]"] > +--- > +- error: 'builtin/box/tuple.lua:314: Error in path on position 2' > +... > +t[".."] > +--- > +- error: 'builtin/box/tuple.lua:314: Error in path on position 2' > +... > +t["[["] > +--- > +- error: 'builtin/box/tuple.lua:314: Error in path on position 2' > +... > +t["]]"] > +--- > +- error: 'builtin/box/tuple.lua:314: Error in path on position 1' > +... > +t["{"] > +--- > +- error: 'builtin/box/tuple.lua:314: Error in path on position 1' > +... > +s:drop() > +--- > +... > engine = nil > --- > ... > diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua > index 6d7d254..ba3482d 100644 > --- a/test/engine/tuple.test.lua > +++ b/test/engine/tuple.test.lua > @@ -200,5 +200,64 @@ t1map = t1:tomap() > maplen(t1map), t1map[1], t1map[2], t1map[3] > s:drop() > > +format = {} > +format[1] = {name = 'field1', type = 'unsigned'} > +format[2] = {name = 'field2', type = 'array'} > +format[3] = {name = 'field3', type = 'map'} > +format[4] = {name = 'field4', type = 'string'} > +s = box.schema.space.create('test', {format = format}) > +pk = s:create_index('pk') > +field2 = {1, 2, 3, "4", {5,6,7}, {key="key1", value="value1"}} > +field3 = {[10] = 100, k1 = 100, k2 = {1,2,3}, k3 = { {a=1, b=2}, {c=3, d=4} }, [-1] = 200} > +t = s:replace{1, field2, field3, "123456"} > +t[1] > +t[2] > +t[3] > +t[4] > +t[2][1] > +t["[2][1]"] > +t[2][5] > +t["[2][5]"] > +t["[2][5][1]"] > +t["[2][5][2]"] > +t["[2][5][3]"] > +t["[2][6].key"] > +t["[2][6].value"] > +t["[2][6]['key']"] > +t["[2][6]['value']"] > +t["[3].k3[2].c"] > +t["[4]"] > +t.field1 > +t.field2[5] > +t[".field1"] > +t["field1"] > +t["[3][10]"] > + > +-- Not found. > +t[0] > +t["[0]"] > +t["[1000]"] > +t.field1000 > +t["not_found"] > +t["[2][5][10]"] > +t["[2][6].key100"] > +t["[2][0]"] -- 0-based index in array. > +t["[4][3]"] -- Can not index string. > +t["[4]['key']"] > +-- Not found 'a'. Return 'null' despite of syntax error on a > +-- next position. > +t["a.b.c d.e.f"] > + > +-- Sytax errors. > +t[""] > +t["[2].[5]"] > +t["[-1]"] > +t[".."] > +t["[["] > +t["]]"] > +t["{"] > + > +s:drop() > + > engine = nil > test_run = nil > ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tarantool-patches] Re: [PATCH v2 3/3] Multibyte characters support ICU 2018-04-04 10:37 ` [tarantool-patches] [PATCH v2 3/3] Multibyte characters support ICU Kirill Shcherbatov @ 2018-04-04 11:30 ` Vladislav Shpilevoy 0 siblings, 0 replies; 14+ messages in thread From: Vladislav Shpilevoy @ 2018-04-04 11:30 UTC (permalink / raw) To: tarantool-patches, Kirill Shcherbatov Hello. See my 12 comments below. And seems, that you ignored my appeal to do not copy pase diff into the letter. Again: indentation here is completely broken - all tabs are turned into 2 spaces, it is unreadable. 04.04.2018 13:37, Kirill Shcherbatov пишет: > ICU Implementation > > From 8703e465382ba05817e7703550694c3790972e54 Mon Sep 17 00:00:00 2001 > Message-Id: > <8703e465382ba05817e7703550694c3790972e54.1522838002.git.kshcherbatov@tarantool.org> > In-Reply-To: <cover.1522838002.git.kshcherbatov@tarantool.org> > References: <cover.1522838002.git.kshcherbatov@tarantool.org> > From: Kirill Shcherbatov <kshcherbatov@tarantool.org> > Date: Wed, 4 Apr 2018 13:06:22 +0300 > Subject: [PATCH v2 3/3] ICU Unicode parsing implementation > > --- > src/box/lua/tuple.c | 31 +++++-- > src/lib/json/path.c | 211 > +++++++++++++++++++++++++++++++++------------ > src/lib/json/path.h | 30 ++++--- > test/engine/tuple.result | 43 +++++++-- > test/engine/tuple.test.lua | 13 ++- > test/unit/CMakeLists.txt | 2 +- > test/unit/json_path.c | 4 +- > 7 files changed, 247 insertions(+), 87 deletions(-) > > diff --git a/src/box/lua/tuple.c b/src/box/lua/tuple.c > index 99b9ff2..b89e1f9 100644 > --- a/src/box/lua/tuple.c > +++ b/src/box/lua/tuple.c > @@ -413,7 +413,6 @@ lbox_tuple_transform(struct lua_State *L) > static inline int > tuple_field_go_to_index(const char **field, uint64_t index) > { > - assert(index >= 0); > enum mp_type type = mp_typeof(**field); > if (type == MP_ARRAY) { > if (index == 0) > @@ -497,6 +496,12 @@ tuple_field_go_to_key(const char **field, const > char *key, int len) > static int > lbox_tuple_field_by_path(struct lua_State *L) > { > + int err_pos = 0; > + struct json_path_parser parser; > + /* Need uninitialized structure to > + * json_path_parser_deinit on lua_isnumber */ > + memset(&parser, 0, sizeof(parser)); > + const char *path = NULL; 1. Please, rename init back to create. In Tarantool we do not use init name for objects. We use create name to initialize an object. And new to allocate + initialize. 2. Remove deinit method. If you need UConverter, you could create it in a global static variable in path.c and initialize it in main() call via json_path_open() call or something. Look other libraries for examples. It is too expensive to do ucnv_open/ucnv_close on each parser creation. When update by JSON path will be implemented, the parser will be being created and destroyed very often. In any case, it is the comment on future, because here you can delete UConverter using U8_NEXT. See http://icu-project.org/apiref/icu4c/utf8_8h.html#a6d4c94e845b059fddba0c51e9bad87fd. It 1) does not use external state like converter, 2) handle errors for you. And please, consider how to remove errors raising from this function. It was written by me, when old errors handling convention was accepted, and now it must be fixed. It would be very good, if you will do it in a separate commit before introducing unicode. > const char *field; > struct tuple *tuple = luaT_istuple(L, 1); > /* Is checked in Lua wrapper. */ > @@ -506,6 +511,18 @@ lbox_tuple_field_by_path(struct lua_State *L) > index -= TUPLE_INDEX_BASE; > if (index < 0) { > not_found: 3. Please, find a way to simplify the code. I have tried to do it for you, and the code becomes already much simpler, if these gotos are moved into the end of the function. You can go off this hit of. > + if (!path) > + goto exit_not_found; > + uint32_t path_len = strlen(path); > + uint32_t path_hash = lua_hash(path, path_len); > + field = tuple_field_by_name(tuple, path, > + path_len, path_hash); > + if (field) > + goto push_value; > + if (err_pos || path_len == 0) > + luaL_error(L, "Error in path on position %d", err_pos); > +exit_not_found: > + json_path_parser_deinit(&parser); 4. Deinit will be completely removed, when you will use U8_NEXT. > diff --git a/src/lib/json/path.c b/src/lib/json/path.c > index 4a6174e..4aadb3a 100644 > --- a/src/lib/json/path.c > +++ b/src/lib/json/path.c > @@ -31,8 +31,11 @@ > > #include "path.h" > #include <ctype.h> > +#include <unicode/uchar.h> > #include "trivia/util.h" > > +#define REPLACEMENT_CHARACTER (0xFFFD) 5. What is replacement character? > + > /** Same as strtoull(), but with limited length. */ > static inline uint64_t > strntoull(const char *src, int len) { > @@ -45,6 +48,51 @@ strntoull(const char *src, int len) { > } > > /** > + * Parse string and update parser's state. > + * @param[out] parser JSON path parser. Upates pos, signs_read. > + * @param[out] UChar32 to store result. > + * > + * @retval 1 Success. > + * @retval 0 End of string. > + * @retval -1 Parse error. > + */ > +static inline int > +parser_read_sign(struct json_path_parser *parser, UChar32 *out) > +{ > + int rc; > + UErrorCode status = U_ZERO_ERROR; > + if (parser->pos == parser->end) > + return 0; > + *out = ucnv_getNextUChar(parser->utf8conv, &parser->pos, > parser->end, &status); > + parser->invalid_sign_off += (rc = U_SUCCESS(status)); > + return rc ? 1 : -1; > +} > + > +/** > + * Parse string and update parser's state. 6. The comment seems to be not for this function. > + * @param[out] parser JSON path parser. Upates pos, signs_read. > + * @param old parser read offset. > + * @param signs to drop. > + */ > +static inline void > +parser_reset_pos(struct json_path_parser *parser, const char > *old_pos, int signs) > +{ > + parser->pos = old_pos; > + parser->invalid_sign_off -= signs; 7. You can do not restore old pos - if the string is invalid somewhere, it is completely invalid and will not be parsed after. > +} > + > +static inline bool > +string_valid_sign(UChar32 c) > +{ > + int8_t type = u_charType(c); > + return !(c == REPLACEMENT_CHARACTER || > + type == U_UNASSIGNED || > + type == U_LINE_SEPARATOR || > + type == U_CONTROL_CHAR || > + type == U_PARAGRAPH_SEPARATOR); > +} 8. If a string is invalid, you will see it in output status output variable in parser_read_sign. I do not see a sense to check for errors here again. > @@ -81,7 +131,7 @@ json_parse_string(struct json_path_parser *parser, > struct json_path_node *node) > /** > * Parse digit sequence into integer until non-digit is met. > * Parser stops right after the last digit. > - * @param parser JSON parser. > + * @param[out] parser JSON parser. Updates signs_read field. 9. The parser is not [out] parameter here. A parameters is called [out], when it is its single mission. Here it is [in][out] parameter actually, but it can be omitted. > +static inline int > +error_sign_offset(struct json_path_parser *parser) > +{ > + return parser->invalid_sign_off; > +} 10. I see no sense in this wrapper. And looks like you too, because somewhere you continue to use invalid_sign_off with no wrapper. And on a future: when you add a new method for a struct, please, name it using <struct_name> prefix. > > diff --git a/src/lib/json/path.h b/src/lib/json/path.h > index 6e8db4c..0ff68c4 100644 > --- a/src/lib/json/path.h > +++ b/src/lib/json/path.h > @@ -33,6 +33,9 @@ > > #include <stdbool.h> > #include <stdint.h> > +#include <unicode/ucnv_err.h> 11. This header is unused here. > +#include <unicode/ucnv.h> 12. Since the UConverter is used inside path.c only, you can remove this header, and predeclare a converter using the code: struct UConverter; But in any case it will be deleted, when you starting to use U8_NEXT. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2018-04-05 23:32 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-03-29 14:22 [tarantool-patches] [PATCH v2 0/3] tuple field access via a json path Kirill Shcherbatov 2018-03-29 14:22 ` [tarantool-patches] [PATCH v2 1/3] Introduce json_path_parser Kirill Shcherbatov 2018-03-29 14:22 ` [tarantool-patches] [PATCH v2 2/3] lua: implement json path access to tuple fields Kirill Shcherbatov 2018-03-29 14:22 ` [tarantool-patches] [PATCH v2 3/3] Multibyte characters support Kirill Shcherbatov 2018-03-29 18:04 ` [tarantool-patches] " Kirill Shcherbatov 2018-03-30 10:24 ` v.shpilevoy 2018-03-30 10:25 ` v.shpilevoy 2018-04-02 19:19 ` Kirill Shcherbatov 2018-04-03 10:20 ` Vladislav Shpilevoy 2018-04-05 14:09 ` [tarantool-patches] [PATCH v2 1/1] ICU Unicode support for JSON parser Kirill Shcherbatov 2018-04-05 18:00 ` [tarantool-patches] " Kirill Shcherbatov 2018-04-05 23:32 ` Vladislav Shpilevoy 2018-04-04 10:37 ` [tarantool-patches] [PATCH v2 3/3] Multibyte characters support ICU Kirill Shcherbatov 2018-04-04 11:30 ` [tarantool-patches] " Vladislav Shpilevoy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox