From: Vladimir Davydov <vdavydov.dev@gmail.com> To: Kirill Shcherbatov <kshcherbatov@tarantool.org> Cc: tarantool-patches@freelists.org Subject: Re: [PATCH v4 09/14] lib: introduce json_path_normalize routine Date: Tue, 16 Oct 2018 11:39:06 +0300 [thread overview] Message-ID: <20181016083906.vcapdiopyg2k6oqk@esperanza> (raw) In-Reply-To: <d2d1f9d989f3c5931e8131daff5ff1efdb0ef0e0.1539244271.git.kshcherbatov@tarantool.org> On Thu, Oct 11, 2018 at 10:58:54AM +0300, Kirill Shcherbatov wrote: > Introduced a new routine json_path_normalize that makes a > conversion of JSON path to the 'canonical' form: > - all maps keys are specified with operator ["key"] form > - all array indexes are specified with operator [i] form. > This notation is preferable because in the general case it can > be uniquely parsed. > We need such API in JSON indexes patch to store all paths in > 'canonical' form to commit the path uniqueness checks and > to tune access with JSON path hashtable. > > Need for #1012. Nit: Needed for #1012 > --- > src/lib/json/path.c | 25 +++++++++++++++++++++++++ > src/lib/json/path.h | 18 ++++++++++++++++++ > test/unit/json_path.c | 41 ++++++++++++++++++++++++++++++++++++++++- > test/unit/json_path.result | 14 +++++++++++++- > 4 files changed, 96 insertions(+), 2 deletions(-) > > diff --git a/src/lib/json/path.c b/src/lib/json/path.c > index 2e72930..0eb5d49 100644 > --- a/src/lib/json/path.c > +++ b/src/lib/json/path.c > @@ -242,3 +242,28 @@ json_path_next(struct json_path_parser *parser, struct json_path_node *node) > return json_parse_identifier(parser, node); > } > } > + > +int > +json_path_normalize(const char *path, uint32_t path_len, char *out) > +{ > + struct json_path_parser parser; > + struct json_path_node node; > + json_path_parser_create(&parser, path, path_len); > + int rc; > + while ((rc = json_path_next(&parser, &node)) == 0 && > + node.type != JSON_PATH_END) { > + if (node.type == JSON_PATH_NUM) { > + out += sprintf(out, "[%llu]", > + (unsigned long long)node.num); > + } else if (node.type == JSON_PATH_STR) { > + out += sprintf(out, "[\"%.*s\"]", node.len, node.str); Using sprintf here is unsafe, because it can write beyond the buffer boundaries. The API should be reworked so that the output buffer length is passed explicitly and the function returns the number of printed characters, in an snprintf-like manner. You might want to use SNPRINT helper macro. BTW, it may be worth renaming json_path_node to json_path_token and json_path_parser to json_path_tokenizer so as not to confuse path tokens with tree nodes. > + } else { > + unreachable(); > + } > + }; > + if (rc != 0) > + return rc; > + *out = '\0'; > + assert(node.type == JSON_PATH_END); > + return 0; > +} > diff --git a/src/lib/json/path.h b/src/lib/json/path.h > index c3c381a..f6b2ee2 100644 > --- a/src/lib/json/path.h > +++ b/src/lib/json/path.h > @@ -105,6 +105,24 @@ json_path_parser_create(struct json_path_parser *parser, const char *src, > int > json_path_next(struct json_path_parser *parser, struct json_path_node *node); > > +/** > + * Convert path to the 'canonical' form: > + * - all maps keys are specified with operator ["key"] form > + * - all array indexes are specified with operator [i] form. > + * This notation is preferable because in the general case it can > + * be uniquely parsed. > + * @param path Source path string to be converted. > + * @param path_len The length of the @path. > + * @param[out] out Memory to store normalized string. > + * The worst-case scenario require > + * 2.5 * path_len + 1 buffer. > + * @retval 0 On success. > + * @retval > 0 Position of a syntax error. A position is 1-based > + * and starts from a beginning of a source string. > + */ > +int > +json_path_normalize(const char *path, uint32_t path_len, char *out); > + > #ifdef __cplusplus > } > #endif > diff --git a/test/unit/json_path.c b/test/unit/json_path.c > index 9a1de06..775b0b1 100644 > --- a/test/unit/json_path.c > +++ b/test/unit/json_path.c > @@ -360,15 +360,54 @@ test_tree() > footer(); > } > > +void > +test_normalize_path() > +{ > + header(); > + plan(8); > + > + const char *path_normalized = "[\"FIO\"][3][\"fname\"]"; > + const char *path1 = "FIO[3].fname"; > + const char *path2 = "[\"FIO\"][3].fname"; > + const char *path3 = "FIO[3][\"fname\"]"; > + char buff[strlen(path_normalized) + 1]; > + int rc; > + > + rc = json_path_normalize(path_normalized, strlen(path_normalized), > + buff); > + is(rc, 0, "normalize '%s' path status", path_normalized); > + is(strcmp(buff, path_normalized), 0, "normalize '%s' path compare", > + path_normalized); > + > + rc = json_path_normalize(path1, strlen(path1), buff); > + is(rc, 0, "normalize '%s' path status", path1); > + is(strcmp(buff, path_normalized), 0, "normalize '%s' path compare", > + path1); > + > + rc = json_path_normalize(path2, strlen(path2), buff); > + is(rc, 0, "normalize '%s' path status", path2); > + is(strcmp(buff, path_normalized), 0, "normalize '%s' path compare", > + path2); > + > + rc = json_path_normalize(path3, strlen(path3), buff); > + is(rc, 0, "normalize '%s' path status", path3); > + is(strcmp(buff, path_normalized), 0, "normalize '%s' path compare", > + path3); I guess, you should test invalid paths as well so as to cover error paths.
prev parent reply other threads:[~2018-10-16 8:39 UTC|newest] Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-10-11 7:58 [PATCH v4 00/14] box: indexes by JSON path Kirill Shcherbatov 2018-10-11 7:58 ` [PATCH v4 01/14] box: refactor key_def_find routine Kirill Shcherbatov 2018-10-15 17:27 ` Vladimir Davydov 2018-10-11 7:58 ` [PATCH v4 10/14] box: introduce JSON indexes Kirill Shcherbatov 2018-10-16 9:33 ` Vladimir Davydov 2018-10-11 7:58 ` [PATCH v4 11/14] box: introduce has_json_paths flag in templates Kirill Shcherbatov 2018-10-11 7:58 ` [PATCH v4 12/14] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov 2018-10-11 7:58 ` [PATCH v4 13/14] box: introduce offset slot cache in key_part Kirill Shcherbatov 2018-10-11 7:58 ` [PATCH v4 14/14] box: specify indexes in user-friendly form Kirill Shcherbatov 2018-10-11 7:58 ` [PATCH v4 02/14] box: introduce key_def_parts_are_sequential Kirill Shcherbatov 2018-10-15 17:29 ` Vladimir Davydov 2018-10-11 7:58 ` [PATCH v4 03/14] box: introduce tuple_field_by_relative_path Kirill Shcherbatov 2018-10-15 17:46 ` Vladimir Davydov 2018-10-11 7:58 ` [PATCH v4 04/14] box: introduce tuple_format_add_key_part Kirill Shcherbatov 2018-10-15 19:39 ` Vladimir Davydov 2018-10-11 7:58 ` [tarantool-patches] [PATCH v4 05/14] box: introduce tuple_format_sizeof routine Kirill Shcherbatov 2018-10-15 17:52 ` Vladimir Davydov 2018-10-11 7:58 ` [PATCH v4 06/14] box: move tuple_field_go_to_{index,key} definition Kirill Shcherbatov 2018-10-16 8:15 ` Vladimir Davydov 2018-10-11 7:58 ` [PATCH v4 07/14] box: drop format const qualifier in *init_field_map Kirill Shcherbatov 2018-10-11 7:58 ` [PATCH v4 08/14] lib: implement JSON tree class for json library Kirill Shcherbatov 2018-10-16 8:26 ` Vladimir Davydov 2018-10-11 7:58 ` [PATCH v4 09/14] lib: introduce json_path_normalize routine Kirill Shcherbatov 2018-10-16 8:39 ` Vladimir Davydov [this message]
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20181016083906.vcapdiopyg2k6oqk@esperanza \ --to=vdavydov.dev@gmail.com \ --cc=kshcherbatov@tarantool.org \ --cc=tarantool-patches@freelists.org \ --subject='Re: [PATCH v4 09/14] lib: introduce json_path_normalize routine' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox