* [PATCH v3 0/7] box: introduce multikey indexes in memtx
@ 2019-04-02 15:49 Kirill Shcherbatov
  2019-04-02 15:49 ` [PATCH v3 1/7] box: cleanup key_def virtual extract_key setter Kirill Shcherbatov
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Kirill Shcherbatov @ 2019-04-02 15:49 UTC (permalink / raw)
  To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov

Multikey indexes allow you to automatically index a set of documents
by JSON paths that contain the array index placeholder "[*]".
A multikey index cannot be primary, as it cannot be unique (by definition).
Multikey index parts must be compatible: only one "[*]" placeholder is
allowed in the same position (across all JSON paths in the index parts).

Changes in version 3:
 - introduced tuple_parse_iterator class to encapsulate tuple parse details
 - refactored json lib helpers
 - introduced mp_stack_top helper in the msgpuck library
 - stubs for unused functions
 - refactored tuple_extract set
 - fixed unique multikey index
 - new descriptive comments

Changes in version 2:
 - introduced field_map_builder class to perform field_map preparation
 - reworked code: new helper json_path_is_multikey and a new
   json_path_cmp flag (is_weak_cmp = true)
 - rebased onto the actual tuple hints head
 - new tests

v2: https://www.freelists.org/post/tarantool-patches/PATCH-v5-44-box-introduce-multikey-indexes
http://github.com/tarantool/tarantool/tree/kshch/gh-1257-multikey-indexes
https://github.com/tarantool/tarantool/issues/1257

Kirill Shcherbatov (7):
  box: cleanup key_def virtual extract_key setter
  lib: introduce a new json_path_multikey_offset helper
  box: move offset_slot init to tuple_format_add_field
  lib: update msgpuck library
  box: introduce tuple_parse_iterator class
  box: introduce field_map_builder class
  box: introduce multikey indexes in memtx

 src/box/CMakeLists.txt        |   1 +
 src/box/errcode.h             |   1 +
 src/box/field_map.c           | 125 +++++++++++
 src/box/field_map.h           | 250 +++++++++++++++++++++
 src/box/index_def.c           |   5 +
 src/box/key_def.c             | 112 ++++++---
 src/box/key_def.h             |  34 +++
 src/box/memtx_engine.c        |   8 +-
 src/box/memtx_space.c         |  18 ++
 src/box/memtx_tree.c          | 180 ++++++++++++---
 src/box/sql.c                 |   6 +-
 src/box/tuple.c               |  42 +++-
 src/box/tuple.h               |  85 +++++--
 src/box/tuple_compare.cc      | 150 ++++++++++---
 src/box/tuple_extract_key.cc  | 129 ++++++-----
 src/box/tuple_format.c        | 411 +++++++++++++++++++++-------------
 src/box/tuple_format.h        | 109 ++++++++-
 src/box/vinyl.c               |   5 +
 src/box/vy_stmt.c             | 102 +++------
 src/lib/json/json.c           |  18 ++
 src/lib/json/json.h           |  10 +
 src/lib/msgpuck               |   2 +-
 test/box/misc.result          |   1 +
 test/engine/json.result       |  13 --
 test/engine/json.test.lua     |   7 -
 test/engine/multikey.result   | 298 ++++++++++++++++++++++++
 test/engine/multikey.test.lua |  88 ++++++++
 test/unit/json.c              |  32 ++-
 test/unit/json.result         |  12 +-
 29 files changed, 1842 insertions(+), 412 deletions(-)
 create mode 100644 src/box/field_map.c
 create mode 100644 src/box/field_map.h
 create mode 100644 test/engine/multikey.result
 create mode 100644 test/engine/multikey.test.lua

-- 
2.21.0

^ permalink raw reply	[flat|nested] 29+ messages in thread
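To make the "[*]" placeholder more concrete, below is a rough usage sketch
in the spirit of the test cases added in test/engine/multikey.test.lua;
the space layout, index and field names are illustrative assumptions, not
excerpts from the patch:

    -- Hypothetical schema: the third field holds an array of
    -- phone records (all names here are made up for the example).
    s = box.schema.space.create('clients')
    s:create_index('pk', {parts = {{field = 1, type = 'unsigned'}}})
    -- The "[*]" placeholder makes the index multikey: one tuple
    -- produces one index entry per element matched by the path.
    s:create_index('phone', {unique = false, parts = {
        {field = 3, type = 'string', path = 'data[*].number'}}})
    s:insert{1, 'Ivan', {data = {{number = '+7-111'}, {number = '+7-222'}}}}
    -- Either extracted key finds the same tuple.
    s.index.phone:select('+7-222')

Here unique = false reflects the restriction stated above that a multikey
index cannot be unique or primary.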
* [PATCH v3 1/7] box: cleanup key_def virtual extract_key setter 2019-04-02 15:49 [PATCH v3 0/7] box: introduce multikey indexes in memtx Kirill Shcherbatov @ 2019-04-02 15:49 ` Kirill Shcherbatov 2019-04-03 12:42 ` Vladimir Davydov 2019-04-02 15:49 ` [PATCH v3 2/7] lib: introduce a new json_path_multikey_offset helper Kirill Shcherbatov ` (5 subsequent siblings) 6 siblings, 1 reply; 29+ messages in thread From: Kirill Shcherbatov @ 2019-04-02 15:49 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov This patch inspired by 082cffca4dba attempts to simplify setting appropriate tuple_extract_key pointer for plain and json indexes in key_def_set_extract_func routine. Being split to plain and json blocks this code becomes easier to understand and extend. In further patches we need to introduce is_multikey branch and without this refactoring required amendments turn the key_def_set_extract_func code into a mess. Needed for #1257 --- src/box/tuple_extract_key.cc | 98 ++++++++++++++++++------------------ 1 file changed, 49 insertions(+), 49 deletions(-) diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc index 0a8337102..836d4e565 100644 --- a/src/box/tuple_extract_key.cc +++ b/src/box/tuple_extract_key.cc @@ -341,62 +341,62 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, return key; } -static const tuple_extract_key_t extract_key_slowpath_funcs[] = { - tuple_extract_key_slowpath<false, false, false>, - tuple_extract_key_slowpath<true, false, false>, - tuple_extract_key_slowpath<false, true, false>, - tuple_extract_key_slowpath<true, true, false>, - tuple_extract_key_slowpath<false, false, true>, - tuple_extract_key_slowpath<true, false, true>, - tuple_extract_key_slowpath<false, true, true>, - tuple_extract_key_slowpath<true, true, true> -}; - /** * Initialize tuple_extract_key() and tuple_extract_key_raw() */ -void -key_def_set_extract_func(struct key_def *key_def) +template<bool contains_sequential_parts, bool has_optional_parts> +static void +key_def_set_extract_func_plain(struct key_def *def) { - if (key_def_is_sequential(key_def)) { - if (key_def->has_optional_parts) { - assert(key_def->is_nullable); - key_def->tuple_extract_key = - tuple_extract_key_sequential<true>; - key_def->tuple_extract_key_raw = - tuple_extract_key_sequential_raw<true>; - } else { - key_def->tuple_extract_key = - tuple_extract_key_sequential<false>; - key_def->tuple_extract_key_raw = - tuple_extract_key_sequential_raw<false>; - } + assert(!def->has_json_paths); + if (key_def_is_sequential(def)) { + assert(contains_sequential_parts || def->part_count == 1); + def->tuple_extract_key = tuple_extract_key_sequential + <has_optional_parts>; + def->tuple_extract_key_raw = tuple_extract_key_sequential_raw + <has_optional_parts>; } else { - int func_idx = - (key_def_contains_sequential_parts(key_def) ? 1 : 0) + - 2 * (key_def->has_optional_parts ? 1 : 0) + - 4 * (key_def->has_json_paths ? 
1 : 0); - key_def->tuple_extract_key = - extract_key_slowpath_funcs[func_idx]; - assert(!key_def->has_optional_parts || key_def->is_nullable); + def->tuple_extract_key = tuple_extract_key_slowpath + <contains_sequential_parts, + has_optional_parts, false>; + def->tuple_extract_key_raw = tuple_extract_key_slowpath_raw + <has_optional_parts, false>; } - if (key_def->has_optional_parts) { - assert(key_def->is_nullable); - if (key_def->has_json_paths) { - key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<true, true>; - } else { - key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<true, false>; - } +} + +template<bool contains_sequential_parts, bool has_optional_parts> +static void +key_def_set_extract_func_json(struct key_def *def) +{ + assert(def->has_json_paths); + def->tuple_extract_key = tuple_extract_key_slowpath + <contains_sequential_parts, + has_optional_parts, true>; + def->tuple_extract_key_raw = tuple_extract_key_slowpath_raw + <has_optional_parts, true>; +} + +void +key_def_set_extract_func(struct key_def *key_def) +{ + int func_idx = (key_def_contains_sequential_parts(key_def) ? 1 : 0) + + 2 * (key_def->has_optional_parts ? 1 : 0); + if (!key_def->has_json_paths) { + void (*set_extract_func[])(struct key_def *) = { + key_def_set_extract_func_plain<false, false>, + key_def_set_extract_func_plain<true, false>, + key_def_set_extract_func_plain<false, true>, + key_def_set_extract_func_plain<true, true>, + }; + set_extract_func[func_idx](key_def); } else { - if (key_def->has_json_paths) { - key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<false, true>; - } else { - key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<false, false>; - } + void (*set_extract_func[])(struct key_def *) = { + key_def_set_extract_func_json<false, false>, + key_def_set_extract_func_json<true, false>, + key_def_set_extract_func_json<false, true>, + key_def_set_extract_func_json<true, true>, + }; + set_extract_func[func_idx](key_def); } } -- 2.21.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v3 1/7] box: cleanup key_def virtual extract_key setter 2019-04-02 15:49 ` [PATCH v3 1/7] box: cleanup key_def virtual extract_key setter Kirill Shcherbatov @ 2019-04-03 12:42 ` Vladimir Davydov 2019-04-03 16:22 ` [tarantool-patches] " Kirill Shcherbatov 0 siblings, 1 reply; 29+ messages in thread From: Vladimir Davydov @ 2019-04-03 12:42 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches On Tue, Apr 02, 2019 at 06:49:32PM +0300, Kirill Shcherbatov wrote: > This patch inspired by 082cffca4dba attempts to simplify setting > appropriate tuple_extract_key pointer for plain and json indexes > in key_def_set_extract_func routine. > Being split to plain and json blocks this code becomes easier > to understand and extend. > > In further patches we need to introduce is_multikey branch and > without this refactoring required amendments turn the > key_def_set_extract_func code into a mess. > > Needed for #1257 > --- > src/box/tuple_extract_key.cc | 98 ++++++++++++++++++------------------ > 1 file changed, 49 insertions(+), 49 deletions(-) > > diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc > index 0a8337102..836d4e565 100644 > --- a/src/box/tuple_extract_key.cc > +++ b/src/box/tuple_extract_key.cc > @@ -341,62 +341,62 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, > return key; > } > > -static const tuple_extract_key_t extract_key_slowpath_funcs[] = { > - tuple_extract_key_slowpath<false, false, false>, > - tuple_extract_key_slowpath<true, false, false>, > - tuple_extract_key_slowpath<false, true, false>, > - tuple_extract_key_slowpath<true, true, false>, > - tuple_extract_key_slowpath<false, false, true>, > - tuple_extract_key_slowpath<true, false, true>, > - tuple_extract_key_slowpath<false, true, true>, > - tuple_extract_key_slowpath<true, true, true> > -}; > - > /** > * Initialize tuple_extract_key() and tuple_extract_key_raw() > */ > -void > -key_def_set_extract_func(struct key_def *key_def) > +template<bool contains_sequential_parts, bool has_optional_parts> > +static void > +key_def_set_extract_func_plain(struct key_def *def) > { > - if (key_def_is_sequential(key_def)) { > - if (key_def->has_optional_parts) { > - assert(key_def->is_nullable); > - key_def->tuple_extract_key = > - tuple_extract_key_sequential<true>; > - key_def->tuple_extract_key_raw = > - tuple_extract_key_sequential_raw<true>; > - } else { > - key_def->tuple_extract_key = > - tuple_extract_key_sequential<false>; > - key_def->tuple_extract_key_raw = > - tuple_extract_key_sequential_raw<false>; > - } > + assert(!def->has_json_paths); > + if (key_def_is_sequential(def)) { > + assert(contains_sequential_parts || def->part_count == 1); > + def->tuple_extract_key = tuple_extract_key_sequential > + <has_optional_parts>; > + def->tuple_extract_key_raw = tuple_extract_key_sequential_raw > + <has_optional_parts>; > } else { > - int func_idx = > - (key_def_contains_sequential_parts(key_def) ? 1 : 0) + > - 2 * (key_def->has_optional_parts ? 1 : 0) + > - 4 * (key_def->has_json_paths ? 
1 : 0); > - key_def->tuple_extract_key = > - extract_key_slowpath_funcs[func_idx]; > - assert(!key_def->has_optional_parts || key_def->is_nullable); > + def->tuple_extract_key = tuple_extract_key_slowpath > + <contains_sequential_parts, > + has_optional_parts, false>; > + def->tuple_extract_key_raw = tuple_extract_key_slowpath_raw > + <has_optional_parts, false>; > } > - if (key_def->has_optional_parts) { > - assert(key_def->is_nullable); > - if (key_def->has_json_paths) { > - key_def->tuple_extract_key_raw = > - tuple_extract_key_slowpath_raw<true, true>; > - } else { > - key_def->tuple_extract_key_raw = > - tuple_extract_key_slowpath_raw<true, false>; > - } > +} > + > +template<bool contains_sequential_parts, bool has_optional_parts> > +static void > +key_def_set_extract_func_json(struct key_def *def) > +{ > + assert(def->has_json_paths); > + def->tuple_extract_key = tuple_extract_key_slowpath > + <contains_sequential_parts, > + has_optional_parts, true>; > + def->tuple_extract_key_raw = tuple_extract_key_slowpath_raw > + <has_optional_parts, true>; > +} > + > +void > +key_def_set_extract_func(struct key_def *key_def) > +{ > + int func_idx = (key_def_contains_sequential_parts(key_def) ? 1 : 0) + > + 2 * (key_def->has_optional_parts ? 1 : 0); > + if (!key_def->has_json_paths) { > + void (*set_extract_func[])(struct key_def *) = { > + key_def_set_extract_func_plain<false, false>, > + key_def_set_extract_func_plain<true, false>, > + key_def_set_extract_func_plain<false, true>, > + key_def_set_extract_func_plain<true, true>, > + }; > + set_extract_func[func_idx](key_def); > } else { > - if (key_def->has_json_paths) { > - key_def->tuple_extract_key_raw = > - tuple_extract_key_slowpath_raw<false, true>; > - } else { > - key_def->tuple_extract_key_raw = > - tuple_extract_key_slowpath_raw<false, false>; > - } > + void (*set_extract_func[])(struct key_def *) = { > + key_def_set_extract_func_json<false, false>, > + key_def_set_extract_func_json<true, false>, > + key_def_set_extract_func_json<false, true>, > + key_def_set_extract_func_json<true, true>, > + }; > + set_extract_func[func_idx](key_def); > } > } Yeah, this looks better. However, I'd also replace set_extract_func with plain if-else-if - it would take only a couple lines longer, but look more straightforward IMO. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [tarantool-patches] Re: [PATCH v3 1/7] box: cleanup key_def virtual extract_key setter 2019-04-03 12:42 ` Vladimir Davydov @ 2019-04-03 16:22 ` Kirill Shcherbatov 2019-04-03 18:01 ` Vladimir Davydov 0 siblings, 1 reply; 29+ messages in thread From: Kirill Shcherbatov @ 2019-04-03 16:22 UTC (permalink / raw) To: tarantool-patches, Vladimir Davydov > Yeah, this looks better. However, I'd also replace set_extract_func with > plain if-else-if - it would take only a couple lines longer, but look > more straightforward IMO. I don't mind. Updated version is on the branch. ====================================================== This patch inspired by 082cffca4dba attempts to simplify setting appropriate tuple_extract_key pointer for plain and json indexes in key_def_set_extract_func routine. Being split to plain and json blocks this code becomes easier to understand and extend. In further patches we need to introduce is_multikey branch and without this refactoring required amendments turn the key_def_set_extract_func code into a mess. Needed for #1257 --- src/box/tuple_extract_key.cc | 97 +++++++++++++++++++----------------- 1 file changed, 52 insertions(+), 45 deletions(-) diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc index 0a8337102..28ca80cf8 100644 --- a/src/box/tuple_extract_key.cc +++ b/src/box/tuple_extract_key.cc @@ -341,61 +341,68 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, return key; } -static const tuple_extract_key_t extract_key_slowpath_funcs[] = { - tuple_extract_key_slowpath<false, false, false>, - tuple_extract_key_slowpath<true, false, false>, - tuple_extract_key_slowpath<false, true, false>, - tuple_extract_key_slowpath<true, true, false>, - tuple_extract_key_slowpath<false, false, true>, - tuple_extract_key_slowpath<true, false, true>, - tuple_extract_key_slowpath<false, true, true>, - tuple_extract_key_slowpath<true, true, true> -}; - /** * Initialize tuple_extract_key() and tuple_extract_key_raw() */ -void -key_def_set_extract_func(struct key_def *key_def) +template<bool contains_sequential_parts, bool has_optional_parts> +static void +key_def_set_extract_func_plain(struct key_def *def) { - if (key_def_is_sequential(key_def)) { - if (key_def->has_optional_parts) { - assert(key_def->is_nullable); - key_def->tuple_extract_key = - tuple_extract_key_sequential<true>; - key_def->tuple_extract_key_raw = - tuple_extract_key_sequential_raw<true>; - } else { - key_def->tuple_extract_key = - tuple_extract_key_sequential<false>; - key_def->tuple_extract_key_raw = - tuple_extract_key_sequential_raw<false>; - } + assert(!def->has_json_paths); + if (key_def_is_sequential(def)) { + assert(contains_sequential_parts || def->part_count == 1); + def->tuple_extract_key = tuple_extract_key_sequential + <has_optional_parts>; + def->tuple_extract_key_raw = tuple_extract_key_sequential_raw + <has_optional_parts>; } else { - int func_idx = - (key_def_contains_sequential_parts(key_def) ? 1 : 0) + - 2 * (key_def->has_optional_parts ? 1 : 0) + - 4 * (key_def->has_json_paths ? 
1 : 0); - key_def->tuple_extract_key = - extract_key_slowpath_funcs[func_idx]; - assert(!key_def->has_optional_parts || key_def->is_nullable); + def->tuple_extract_key = tuple_extract_key_slowpath + <contains_sequential_parts, + has_optional_parts, false>; + def->tuple_extract_key_raw = tuple_extract_key_slowpath_raw + <has_optional_parts, false>; } - if (key_def->has_optional_parts) { - assert(key_def->is_nullable); - if (key_def->has_json_paths) { - key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<true, true>; +} + +template<bool contains_sequential_parts, bool has_optional_parts> +static void +key_def_set_extract_func_json(struct key_def *def) +{ + assert(def->has_json_paths); + def->tuple_extract_key = tuple_extract_key_slowpath + <contains_sequential_parts, + has_optional_parts, true>; + def->tuple_extract_key_raw = tuple_extract_key_slowpath_raw + <has_optional_parts, true>; +} + +void +key_def_set_extract_func(struct key_def *key_def) +{ + bool contains_sequential_parts = + key_def_contains_sequential_parts(key_def); + bool has_optional_parts = key_def->has_optional_parts; + if (!key_def->has_json_paths) { + if (!contains_sequential_parts && !has_optional_parts) { + key_def_set_extract_func_plain<false, false>(key_def); + } else if (!contains_sequential_parts && has_optional_parts) { + key_def_set_extract_func_plain<false, true>(key_def); + } else if (contains_sequential_parts && !has_optional_parts) { + key_def_set_extract_func_plain<true, false>(key_def); } else { - key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<true, false>; + assert(contains_sequential_parts && has_optional_parts); + key_def_set_extract_func_plain<true, true>(key_def); } } else { - if (key_def->has_json_paths) { - key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<false, true>; + if (!contains_sequential_parts && !has_optional_parts) { + key_def_set_extract_func_json<false, false>(key_def); + } else if (!contains_sequential_parts && has_optional_parts) { + key_def_set_extract_func_json<false, true>(key_def); + } else if (contains_sequential_parts && !has_optional_parts) { + key_def_set_extract_func_json<true, false>(key_def); } else { - key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<false, false>; + assert(contains_sequential_parts && has_optional_parts); + key_def_set_extract_func_json<true, true>(key_def); } } } -- 2.21.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [tarantool-patches] Re: [PATCH v3 1/7] box: cleanup key_def virtual extract_key setter 2019-04-03 16:22 ` [tarantool-patches] " Kirill Shcherbatov @ 2019-04-03 18:01 ` Vladimir Davydov 0 siblings, 0 replies; 29+ messages in thread From: Vladimir Davydov @ 2019-04-03 18:01 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches On Wed, Apr 03, 2019 at 07:22:56PM +0300, Kirill Shcherbatov wrote: > > Yeah, this looks better. However, I'd also replace set_extract_func with > > plain if-else-if - it would take only a couple lines longer, but look > > more straightforward IMO. > > I don't mind. Updated version is on the branch. > > ====================================================== > > This patch inspired by 082cffca4dba attempts to simplify setting > appropriate tuple_extract_key pointer for plain and json indexes > in key_def_set_extract_func routine. > Being split to plain and json blocks this code becomes easier > to understand and extend. > > In further patches we need to introduce is_multikey branch and > without this refactoring required amendments turn the > key_def_set_extract_func code into a mess. > > Needed for #1257 > --- > src/box/tuple_extract_key.cc | 97 +++++++++++++++++++----------------- > 1 file changed, 52 insertions(+), 45 deletions(-) Pushed to master. ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v3 2/7] lib: introduce a new json_path_multikey_offset helper 2019-04-02 15:49 [PATCH v3 0/7] box: introduce multikey indexes in memtx Kirill Shcherbatov 2019-04-02 15:49 ` [PATCH v3 1/7] box: cleanup key_def virtual extract_key setter Kirill Shcherbatov @ 2019-04-02 15:49 ` Kirill Shcherbatov 2019-04-03 12:56 ` Vladimir Davydov 2019-04-02 15:49 ` [PATCH v3 3/7] box: move offset_slot init to tuple_format_add_field Kirill Shcherbatov ` (4 subsequent siblings) 6 siblings, 1 reply; 29+ messages in thread From: Kirill Shcherbatov @ 2019-04-02 15:49 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov Introduced a new procedure json_path_multikey_offset. This helper scans the JSON path string and returns the offset of the first character [*] (the array index placeholder). We need this function in the scope of the multikey index patchset to extract the number of keys to be inserted into the index using JSON subpath that has json_path_multikey_offset() length. Needed for #1257 --- src/lib/json/json.c | 18 ++++++++++++++++++ src/lib/json/json.h | 10 ++++++++++ test/unit/json.c | 32 +++++++++++++++++++++++++++++++- test/unit/json.result | 12 +++++++++++- 4 files changed, 70 insertions(+), 2 deletions(-) diff --git a/src/lib/json/json.c b/src/lib/json/json.c index 854158f63..2d3c35f4b 100644 --- a/src/lib/json/json.c +++ b/src/lib/json/json.c @@ -321,6 +321,24 @@ json_path_validate(const char *path, int path_len, int index_base) return rc; } +int +json_path_multikey_offset(const char *path, int path_len, int index_base) +{ + struct json_lexer lexer; + json_lexer_create(&lexer, path, path_len, index_base); + struct json_token token; + int rc, last_lexer_offset = 0; + while ((rc = json_lexer_next_token(&lexer, &token)) == 0) { + if (token.type == JSON_TOKEN_ANY) + return last_lexer_offset; + else if (token.type == JSON_TOKEN_END) + break; + last_lexer_offset = lexer.offset; + } + assert(rc == 0); + return -1; +} + /** * An snprint-style helper to print an individual token key. */ diff --git a/src/lib/json/json.h b/src/lib/json/json.h index c1de3e579..2cc5f7b84 100644 --- a/src/lib/json/json.h +++ b/src/lib/json/json.h @@ -259,6 +259,16 @@ json_path_cmp(const char *a, int a_len, const char *b, int b_len, int json_path_validate(const char *path, int path_len, int index_base); +/** + * Scan the JSON path string and return the offset of the first + * character [*] (the array index placeholder). + * - if [*] is not found, -1 is returned. + * - specified JSON path must be valid + * (may be tested with json_path_validate). + */ +int +json_path_multikey_offset(const char *path, int path_len, int index_base); + /** * Test if a given JSON token is a JSON tree leaf, i.e. * has no child nodes. 
diff --git a/test/unit/json.c b/test/unit/json.c index fd320c9eb..194e1664b 100644 --- a/test/unit/json.c +++ b/test/unit/json.c @@ -555,17 +555,47 @@ test_path_snprint() footer(); } +void +test_path_multikey() +{ + static struct { + const char *str; + int rc; + } test_cases[] = { + {"", -1}, + {"[1]Data[1]extra[1]", -1}, + {"[*]Data[1]extra[1]", 0}, + {"[*]Data[*]extra[1]", 0}, + {"[1]Data[*]extra[1]", 7}, + {"[1]Data[1]extra[*]", 15}, + }; + + header(); + plan(lengthof(test_cases)); + for (unsigned i = 0; i < lengthof(test_cases); i++) { + int rc = json_path_multikey_offset(test_cases[i].str, + strlen(test_cases[i].str), + INDEX_BASE); + is(rc, test_cases[i].rc, "Test json_path_multikey_offset with " + "%s: have %d expected %d", test_cases[i].str, rc, + test_cases[i].rc); + } + check_plan(); + footer(); +} + int main() { header(); - plan(5); + plan(6); test_basic(); test_errors(); test_tree(); test_path_cmp(); test_path_snprint(); + test_path_multikey(); int rc = check_plan(); footer(); diff --git a/test/unit/json.result b/test/unit/json.result index 3a15e84bb..8f5fda6b1 100644 --- a/test/unit/json.result +++ b/test/unit/json.result @@ -1,5 +1,5 @@ *** main *** -1..5 +1..6 *** test_basic *** 1..71 ok 1 - parse <[1]> @@ -194,4 +194,14 @@ ok 4 - subtests ok 9 - 0-byte buffer - retval ok 5 - subtests *** test_path_snprint: done *** + *** test_path_multikey *** + 1..6 + ok 1 - Test json_path_multikey_offset with : have -1 expected -1 + ok 2 - Test json_path_multikey_offset with [1]Data[1]extra[1]: have -1 expected -1 + ok 3 - Test json_path_multikey_offset with [*]Data[1]extra[1]: have 0 expected 0 + ok 4 - Test json_path_multikey_offset with [*]Data[*]extra[1]: have 0 expected 0 + ok 5 - Test json_path_multikey_offset with [1]Data[*]extra[1]: have 7 expected 7 + ok 6 - Test json_path_multikey_offset with [1]Data[1]extra[*]: have 15 expected 15 +ok 6 - subtests + *** test_path_multikey: done *** *** main: done *** -- 2.21.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v3 2/7] lib: introduce a new json_path_multikey_offset helper 2019-04-02 15:49 ` [PATCH v3 2/7] lib: introduce a new json_path_multikey_offset helper Kirill Shcherbatov @ 2019-04-03 12:56 ` Vladimir Davydov 2019-04-03 16:22 ` [tarantool-patches] " Kirill Shcherbatov 0 siblings, 1 reply; 29+ messages in thread From: Vladimir Davydov @ 2019-04-03 12:56 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches On Tue, Apr 02, 2019 at 06:49:33PM +0300, Kirill Shcherbatov wrote: > Introduced a new procedure json_path_multikey_offset. This helper > scans the JSON path string and returns the offset of the first character > [*] (the array index placeholder). > > We need this function in the scope of the multikey index patchset to > extract the number of keys to be inserted into the index > using JSON subpath that has json_path_multikey_offset() length. > > Needed for #1257 > --- > src/lib/json/json.c | 18 ++++++++++++++++++ > src/lib/json/json.h | 10 ++++++++++ > test/unit/json.c | 32 +++++++++++++++++++++++++++++++- > test/unit/json.result | 12 +++++++++++- > 4 files changed, 70 insertions(+), 2 deletions(-) > > diff --git a/src/lib/json/json.c b/src/lib/json/json.c > index 854158f63..2d3c35f4b 100644 > --- a/src/lib/json/json.c > +++ b/src/lib/json/json.c > @@ -321,6 +321,24 @@ json_path_validate(const char *path, int path_len, int index_base) > return rc; > } > > +int > +json_path_multikey_offset(const char *path, int path_len, int index_base) > +{ > + struct json_lexer lexer; > + json_lexer_create(&lexer, path, path_len, index_base); > + struct json_token token; > + int rc, last_lexer_offset = 0; > + while ((rc = json_lexer_next_token(&lexer, &token)) == 0) { > + if (token.type == JSON_TOKEN_ANY) > + return last_lexer_offset; > + else if (token.type == JSON_TOKEN_END) > + break; > + last_lexer_offset = lexer.offset; > + } > + assert(rc == 0); > + return -1; > +} IMO returning path_len if no multikey token is found would be more common (think of STL methods, which return end() iterator if not found). Doesn't really matter though. Other than that looks good to me. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [tarantool-patches] Re: [PATCH v3 2/7] lib: introduce a new json_path_multikey_offset helper 2019-04-03 12:56 ` Vladimir Davydov @ 2019-04-03 16:22 ` Kirill Shcherbatov 2019-04-03 18:02 ` Vladimir Davydov 2019-04-04 6:17 ` Konstantin Osipov 0 siblings, 2 replies; 29+ messages in thread From: Kirill Shcherbatov @ 2019-04-03 16:22 UTC (permalink / raw) To: tarantool-patches, Vladimir Davydov > IMO returning path_len if no multikey token is found would be more > common (think of STL methods, which return end() iterator if not found). > Doesn't really matter though. > > Other than that looks good to me. I don't mind. Updated version is on the branch. ================================================== Introduced a new procedure json_path_multikey_offset. This helper scans the JSON path string and returns the offset of the first character [*] (the array index placeholder). We need this function in the scope of the multikey index patchset to extract the number of keys to be inserted into the index using JSON subpath that has json_path_multikey_offset() length. Needed for #1257 --- src/lib/json/json.c | 18 ++++++++++++++++++ src/lib/json/json.h | 10 ++++++++++ test/unit/json.c | 32 +++++++++++++++++++++++++++++++- test/unit/json.result | 12 +++++++++++- 4 files changed, 70 insertions(+), 2 deletions(-) diff --git a/src/lib/json/json.c b/src/lib/json/json.c index 854158f63..7ef511d4e 100644 --- a/src/lib/json/json.c +++ b/src/lib/json/json.c @@ -321,6 +321,24 @@ json_path_validate(const char *path, int path_len, int index_base) return rc; } +int +json_path_multikey_offset(const char *path, int path_len, int index_base) +{ + struct json_lexer lexer; + json_lexer_create(&lexer, path, path_len, index_base); + struct json_token token; + int rc, last_lexer_offset = 0; + while ((rc = json_lexer_next_token(&lexer, &token)) == 0) { + if (token.type == JSON_TOKEN_ANY) + return last_lexer_offset; + else if (token.type == JSON_TOKEN_END) + break; + last_lexer_offset = lexer.offset; + } + assert(rc == 0); + return path_len; +} + /** * An snprint-style helper to print an individual token key. */ diff --git a/src/lib/json/json.h b/src/lib/json/json.h index c1de3e579..d66a9c7a4 100644 --- a/src/lib/json/json.h +++ b/src/lib/json/json.h @@ -259,6 +259,16 @@ json_path_cmp(const char *a, int a_len, const char *b, int b_len, int json_path_validate(const char *path, int path_len, int index_base); +/** + * Scan the JSON path string and return the offset of the first + * character [*] (the array index placeholder). + * - if [*] is not found, path_len is returned. + * - specified JSON path must be valid + * (may be tested with json_path_validate). + */ +int +json_path_multikey_offset(const char *path, int path_len, int index_base); + /** * Test if a given JSON token is a JSON tree leaf, i.e. * has no child nodes. 
diff --git a/test/unit/json.c b/test/unit/json.c index fd320c9eb..2b7236ea8 100644 --- a/test/unit/json.c +++ b/test/unit/json.c @@ -555,17 +555,47 @@ test_path_snprint() footer(); } +void +test_path_multikey() +{ + static struct { + const char *str; + int rc; + } test_cases[] = { + {"", 0}, + {"[1]Data[1]extra[1]", 18}, + {"[*]Data[1]extra[1]", 0}, + {"[*]Data[*]extra[1]", 0}, + {"[1]Data[*]extra[1]", 7}, + {"[1]Data[1]extra[*]", 15}, + }; + + header(); + plan(lengthof(test_cases)); + for (unsigned i = 0; i < lengthof(test_cases); i++) { + int rc = json_path_multikey_offset(test_cases[i].str, + strlen(test_cases[i].str), + INDEX_BASE); + is(rc, test_cases[i].rc, "Test json_path_multikey_offset with " + "%s: have %d expected %d", test_cases[i].str, rc, + test_cases[i].rc); + } + check_plan(); + footer(); +} + int main() { header(); - plan(5); + plan(6); test_basic(); test_errors(); test_tree(); test_path_cmp(); test_path_snprint(); + test_path_multikey(); int rc = check_plan(); footer(); diff --git a/test/unit/json.result b/test/unit/json.result index 3a15e84bb..1c3e7f7d2 100644 --- a/test/unit/json.result +++ b/test/unit/json.result @@ -1,5 +1,5 @@ *** main *** -1..5 +1..6 *** test_basic *** 1..71 ok 1 - parse <[1]> @@ -194,4 +194,14 @@ ok 4 - subtests ok 9 - 0-byte buffer - retval ok 5 - subtests *** test_path_snprint: done *** + *** test_path_multikey *** + 1..6 + ok 1 - Test json_path_multikey_offset with : have 0 expected 0 + ok 2 - Test json_path_multikey_offset with [1]Data[1]extra[1]: have 18 expected 18 + ok 3 - Test json_path_multikey_offset with [*]Data[1]extra[1]: have 0 expected 0 + ok 4 - Test json_path_multikey_offset with [*]Data[*]extra[1]: have 0 expected 0 + ok 5 - Test json_path_multikey_offset with [1]Data[*]extra[1]: have 7 expected 7 + ok 6 - Test json_path_multikey_offset with [1]Data[1]extra[*]: have 15 expected 15 +ok 6 - subtests + *** test_path_multikey: done *** *** main: done *** -- 2.21.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [tarantool-patches] Re: [PATCH v3 2/7] lib: introduce a new json_path_multikey_offset helper 2019-04-03 16:22 ` [tarantool-patches] " Kirill Shcherbatov @ 2019-04-03 18:02 ` Vladimir Davydov 2019-04-04 6:17 ` Konstantin Osipov 1 sibling, 0 replies; 29+ messages in thread From: Vladimir Davydov @ 2019-04-03 18:02 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches On Wed, Apr 03, 2019 at 07:22:57PM +0300, Kirill Shcherbatov wrote: > > IMO returning path_len if no multikey token is found would be more > > common (think of STL methods, which return end() iterator if not found). > > Doesn't really matter though. > > > > Other than that looks good to me. > > I don't mind. Updated version is on the branch. > > ================================================== > > Introduced a new procedure json_path_multikey_offset. This helper > scans the JSON path string and returns the offset of the first character > [*] (the array index placeholder). > > We need this function in the scope of the multikey index patchset to > extract the number of keys to be inserted into the index > using JSON subpath that has json_path_multikey_offset() length. > > Needed for #1257 > --- > src/lib/json/json.c | 18 ++++++++++++++++++ > src/lib/json/json.h | 10 ++++++++++ > test/unit/json.c | 32 +++++++++++++++++++++++++++++++- > test/unit/json.result | 12 +++++++++++- > 4 files changed, 70 insertions(+), 2 deletions(-) Pushed to master. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [tarantool-patches] Re: [PATCH v3 2/7] lib: introduce a new json_path_multikey_offset helper
  2019-04-03 16:22     ` [tarantool-patches] " Kirill Shcherbatov
  2019-04-03 18:02       ` Vladimir Davydov
@ 2019-04-04  6:17       ` Konstantin Osipov
  1 sibling, 0 replies; 29+ messages in thread
From: Konstantin Osipov @ 2019-04-04 6:17 UTC (permalink / raw)
  To: tarantool-patches; +Cc: Vladimir Davydov

* Kirill Shcherbatov <kshcherbatov@tarantool.org> [19/04/03 21:37]:
> > IMO returning path_len if no multikey token is found would be more
> > common (think of STL methods, which return end() iterator if not found).
> > Doesn't really matter though.
> >
> > Other than that looks good to me.
> I don't mind. Updated version is on the branch.

I agree: since "not found" is not an error, it is a search result.

-- 
Konstantin Osipov, Moscow, Russia, +7 903 626 22 32
http://tarantool.io - www.twitter.com/kostja_osipov

^ permalink raw reply	[flat|nested] 29+ messages in thread
* [PATCH v3 3/7] box: move offset_slot init to tuple_format_add_field 2019-04-02 15:49 [PATCH v3 0/7] box: introduce multikey indexes in memtx Kirill Shcherbatov 2019-04-02 15:49 ` [PATCH v3 1/7] box: cleanup key_def virtual extract_key setter Kirill Shcherbatov 2019-04-02 15:49 ` [PATCH v3 2/7] lib: introduce a new json_path_multikey_offset helper Kirill Shcherbatov @ 2019-04-02 15:49 ` Kirill Shcherbatov 2019-04-03 12:57 ` Vladimir Davydov ` (2 more replies) 2019-04-02 15:49 ` [PATCH v3 4/7] lib: update msgpuck library Kirill Shcherbatov ` (3 subsequent siblings) 6 siblings, 3 replies; 29+ messages in thread From: Kirill Shcherbatov @ 2019-04-02 15:49 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov Due to the fact that the allocation of offset_slot in the case of multikey indexes will become more complicated and will be necessary for intermediate nodes of the tuple_field tree, we must move this logic to the tuple_format_add_field that performs an intermediate nodes allocation for a JSON path. Needed for #1257 --- src/box/tuple_format.c | 34 ++++++++++++++++++---------------- 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 22a84af86..1043707ad 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -205,13 +205,14 @@ tuple_format_field_by_id(struct tuple_format *format, uint32_t id) */ static struct tuple_field * tuple_format_add_field(struct tuple_format *format, uint32_t fieldno, - const char *path, uint32_t path_len, char **path_pool) + const char *path, uint32_t path_len, bool is_sequential, + int *current_slot, char **path_pool) { struct tuple_field *field = NULL; struct tuple_field *parent = tuple_format_field(format, fieldno); assert(parent != NULL); if (path == NULL) - return parent; + goto end; field = tuple_field_new(); if (field == NULL) goto fail; @@ -279,12 +280,23 @@ tuple_format_add_field(struct tuple_format *format, uint32_t fieldno, assert(parent != NULL); /* Update tree depth information. */ format->fields_depth = MAX(format->fields_depth, token_count + 1); -end: +cleanup: tuple_field_delete(field); +end: + /* + * In the tuple, store only offsets necessary to access + * fields of non-sequential keys. First field is always + * simply accessible, so we don't store an offset for it. + */ + if (parent != NULL && parent->offset_slot == TUPLE_OFFSET_SLOT_NIL && + is_sequential == false && (fieldno > 0 || path != NULL)) { + *current_slot = *current_slot - 1; + parent->offset_slot = *current_slot; + } return parent; fail: parent = NULL; - goto end; + goto cleanup; } static int @@ -295,7 +307,8 @@ tuple_format_use_key_part(struct tuple_format *format, uint32_t field_count, assert(part->fieldno < tuple_format_field_count(format)); struct tuple_field *field = tuple_format_add_field(format, part->fieldno, part->path, - part->path_len, path_pool); + part->path_len, is_sequential, + current_slot, path_pool); if (field == NULL) return -1; /* @@ -350,17 +363,6 @@ tuple_format_use_key_part(struct tuple_format *format, uint32_t field_count, return -1; } field->is_key_part = true; - /* - * In the tuple, store only offsets necessary to access - * fields of non-sequential keys. First field is always - * simply accessible, so we don't store an offset for it. 
- */ - if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL && - is_sequential == false && - (part->fieldno > 0 || part->path != NULL)) { - *current_slot = *current_slot - 1; - field->offset_slot = *current_slot; - } return 0; } -- 2.21.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v3 3/7] box: move offset_slot init to tuple_format_add_field 2019-04-02 15:49 ` [PATCH v3 3/7] box: move offset_slot init to tuple_format_add_field Kirill Shcherbatov @ 2019-04-03 12:57 ` Vladimir Davydov 2019-04-03 18:02 ` Vladimir Davydov 2019-04-04 6:19 ` [tarantool-patches] " Konstantin Osipov 2 siblings, 0 replies; 29+ messages in thread From: Vladimir Davydov @ 2019-04-03 12:57 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches On Tue, Apr 02, 2019 at 06:49:34PM +0300, Kirill Shcherbatov wrote: > Due to the fact that the allocation of offset_slot in the case of > multikey indexes will become more complicated and will be > necessary for intermediate nodes of the tuple_field tree, we must > move this logic to the tuple_format_add_field that performs > an intermediate nodes allocation for a JSON path. > > Needed for #1257 > --- > src/box/tuple_format.c | 34 ++++++++++++++++++---------------- > 1 file changed, 18 insertions(+), 16 deletions(-) Trivial. Looks okay to me. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v3 3/7] box: move offset_slot init to tuple_format_add_field 2019-04-02 15:49 ` [PATCH v3 3/7] box: move offset_slot init to tuple_format_add_field Kirill Shcherbatov 2019-04-03 12:57 ` Vladimir Davydov @ 2019-04-03 18:02 ` Vladimir Davydov 2019-04-04 6:19 ` [tarantool-patches] " Konstantin Osipov 2 siblings, 0 replies; 29+ messages in thread From: Vladimir Davydov @ 2019-04-03 18:02 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches On Tue, Apr 02, 2019 at 06:49:34PM +0300, Kirill Shcherbatov wrote: > Due to the fact that the allocation of offset_slot in the case of > multikey indexes will become more complicated and will be > necessary for intermediate nodes of the tuple_field tree, we must > move this logic to the tuple_format_add_field that performs > an intermediate nodes allocation for a JSON path. > > Needed for #1257 > --- > src/box/tuple_format.c | 34 ++++++++++++++++++---------------- > 1 file changed, 18 insertions(+), 16 deletions(-) Pushed to master. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [tarantool-patches] [PATCH v3 3/7] box: move offset_slot init to tuple_format_add_field 2019-04-02 15:49 ` [PATCH v3 3/7] box: move offset_slot init to tuple_format_add_field Kirill Shcherbatov 2019-04-03 12:57 ` Vladimir Davydov 2019-04-03 18:02 ` Vladimir Davydov @ 2019-04-04 6:19 ` Konstantin Osipov 2019-04-05 17:17 ` [tarantool-patches] " Kirill Shcherbatov 2 siblings, 1 reply; 29+ messages in thread From: Konstantin Osipov @ 2019-04-04 6:19 UTC (permalink / raw) To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov * Kirill Shcherbatov <kshcherbatov@tarantool.org> [19/04/02 19:27]: > Due to the fact that the allocation of offset_slot in the case of > multikey indexes will become more complicated and will be > necessary for intermediate nodes of the tuple_field tree, we must > move this logic to the tuple_format_add_field that performs > an intermediate nodes allocation for a JSON path. I actually had a nit about this patch, I think 3 goto labels in such a simple function is a bit of an overkill - the code is harder to follow than necessary. I would ditch the 'cleanup' label, this would make the code simpler. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [tarantool-patches] Re: [PATCH v3 3/7] box: move offset_slot init to tuple_format_add_field
  2019-04-04  6:19   ` [tarantool-patches] " Konstantin Osipov
@ 2019-04-05 17:17     ` Kirill Shcherbatov
  0 siblings, 0 replies; 29+ messages in thread
From: Kirill Shcherbatov @ 2019-04-05 17:17 UTC (permalink / raw)
  To: tarantool-patches, Konstantin Osipov; +Cc: vdavydov.dev

> I actually had a nit about this patch, I think 3 goto labels in
> such a simple function is a bit of an overkill - the code is
> harder to follow than necessary. I would ditch the 'cleanup'
> label, this would make the code simpler

Hi! Ok! I'll try to fix it together with the central patch, which touches
this code region anyway. I will deal with the fixes to the last patch one
week later, after my vacation. After the investigation of the causes of the
SQL test degradation conducted by Alexander, I think we do not need to worry
about initializing the field map and can concentrate on the new
tuple_format_iterator.

^ permalink raw reply	[flat|nested] 29+ messages in thread
* [PATCH v3 4/7] lib: update msgpuck library 2019-04-02 15:49 [PATCH v3 0/7] box: introduce multikey indexes in memtx Kirill Shcherbatov ` (2 preceding siblings ...) 2019-04-02 15:49 ` [PATCH v3 3/7] box: move offset_slot init to tuple_format_add_field Kirill Shcherbatov @ 2019-04-02 15:49 ` Kirill Shcherbatov 2019-04-03 17:49 ` [tarantool-patches] " Kirill Shcherbatov 2019-04-02 15:49 ` [PATCH v3 5/7] box: introduce tuple_parse_iterator class Kirill Shcherbatov ` (2 subsequent siblings) 6 siblings, 1 reply; 29+ messages in thread From: Kirill Shcherbatov @ 2019-04-02 15:49 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov The msgpack dependency has been updated because the new version introduces the new method mp_stack_top for the mp_stack class which we will use to store a pointer for a multikey frame to initialize a field_map in case of multikey index. Needed for #1012 --- src/lib/msgpuck | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/lib/msgpuck b/src/lib/msgpuck index 222b71aa6..316cea905 160000 --- a/src/lib/msgpuck +++ b/src/lib/msgpuck @@ -1 +1 @@ -Subproject commit 222b71aa63e8de6d0015588442d828460560d9be +Subproject commit 316cea905dca2ac8db8b1adb9d0040a9c338dc5c -- 2.21.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [tarantool-patches] [PATCH v3 4/7] lib: update msgpuck library 2019-04-02 15:49 ` [PATCH v3 4/7] lib: update msgpuck library Kirill Shcherbatov @ 2019-04-03 17:49 ` Kirill Shcherbatov 2019-04-04 15:54 ` Vladimir Davydov 0 siblings, 1 reply; 29+ messages in thread From: Kirill Shcherbatov @ 2019-04-03 17:49 UTC (permalink / raw) To: tarantool-patches, Vladimir Davydov Due to changes in the mp_stack API class, we must also fix tuple_format.c In order not to change this code twice, the patch was moved to the position subsequent iterator introduction. Now this is [6/7] patch in patchset. On the branch. ================================================ The msgpack dependency has been updated because the new version introduces the new method mp_stack_top for the mp_stack class which we will use to store a pointer for a multikey frame to initialize a field_map in case of multikey index. As the library API has changed, the code has been modified correspondingly. Needed for #1012 --- src/box/tuple_format.c | 9 +++++---- src/lib/msgpuck | 2 +- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 9a643b700..7cfdd73bc 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -974,8 +974,8 @@ tuple_parse_iterator_advice(struct tuple_parse_iterator *it, struct tuple_field **field, const char **data, const char **data_end) { - int idx, rc = 0; - while ((idx = mp_stack_advance(&it->stack)) == -1) { + int rc = 0; + while (!mp_frame_advance(mp_stack_top(&it->stack))) { /* * If the elements of the current frame * are over, pop this frame out of stack @@ -993,12 +993,13 @@ tuple_parse_iterator_advice(struct tuple_parse_iterator *it, * current data offset to prepare the JSON token * for the subsequent format::fields lookup. */ + struct mp_frame *frame = mp_stack_top(&it->stack); struct json_token token; - switch (mp_stack_type(&it->stack)) { + switch (frame->type) { case MP_ARRAY: rc = 1; token.type = JSON_TOKEN_NUM; - token.num = idx; + token.num = frame->idx; break; case MP_MAP: rc = 2; diff --git a/src/lib/msgpuck b/src/lib/msgpuck index 222b71aa6..4166d0f77 160000 --- a/src/lib/msgpuck +++ b/src/lib/msgpuck @@ -1 +1 @@ -Subproject commit 222b71aa63e8de6d0015588442d828460560d9be +Subproject commit 4166d0f77496040bcd595d5b02bbf17e7fac5d28 -- 2.21.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [tarantool-patches] [PATCH v3 4/7] lib: update msgpuck library 2019-04-03 17:49 ` [tarantool-patches] " Kirill Shcherbatov @ 2019-04-04 15:54 ` Vladimir Davydov 2019-04-05 17:17 ` [tarantool-patches] " Kirill Shcherbatov 0 siblings, 1 reply; 29+ messages in thread From: Vladimir Davydov @ 2019-04-04 15:54 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches On Wed, Apr 03, 2019 at 08:49:21PM +0300, Kirill Shcherbatov wrote: > diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c > index 9a643b700..7cfdd73bc 100644 > --- a/src/box/tuple_format.c > +++ b/src/box/tuple_format.c > @@ -974,8 +974,8 @@ tuple_parse_iterator_advice(struct tuple_parse_iterator *it, > struct tuple_field **field, const char **data, > const char **data_end) Once you figure out why msgpuck build failed on Travis (see my other email), please rebase this patch on top of the master branch so that we can push it right away. > { > - int idx, rc = 0; > - while ((idx = mp_stack_advance(&it->stack)) == -1) { > + int rc = 0; > + while (!mp_frame_advance(mp_stack_top(&it->stack))) { > /* > * If the elements of the current frame > * are over, pop this frame out of stack > @@ -993,12 +993,13 @@ tuple_parse_iterator_advice(struct tuple_parse_iterator *it, > * current data offset to prepare the JSON token > * for the subsequent format::fields lookup. > */ > + struct mp_frame *frame = mp_stack_top(&it->stack); Can we please refactor it a little so that we don't call mp_stack_top twice in case the frame is advanced? diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 1043707a..d3c82a7a 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -850,8 +850,8 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, struct tuple_field *field; struct json_token *parent = &format->fields.root; while (true) { - int idx; - while ((idx = mp_stack_advance(&stack)) == -1) { + struct mp_stack_frame *frame = mp_stack_top(&stack); + while (!mp_frame_advance(frame)) { /* * If the elements of the current frame * are over, pop this frame out of stack @@ -863,6 +863,7 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, mp_stack_pop(&stack); if (mp_stack_is_empty(&stack)) goto finish; + frame = mp_stack_top(&stack); parent = parent->parent; } /* ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [tarantool-patches] Re: [PATCH v3 4/7] lib: update msgpuck library 2019-04-04 15:54 ` Vladimir Davydov @ 2019-04-05 17:17 ` Kirill Shcherbatov 2019-04-07 12:22 ` Vladimir Davydov 0 siblings, 1 reply; 29+ messages in thread From: Kirill Shcherbatov @ 2019-04-05 17:17 UTC (permalink / raw) To: tarantool-patches, Vladimir Davydov > Other than that, the patch looks good, BUT for some reason it doesn't > compile on CentOS 6 on Travis, see > > https://travis-ci.org/tarantool/msgpuck/jobs/515314874 > > It fails with > > BFD: /build/BUILDROOT/msgpuck-2.0.23-1.el6.x86_64/usr/lib64/libmsgpuck.a(msgpuck.c.o): invalid relocation type 42 > > Please figure out why is that and fix if possible. > Once you figure out why msgpuck build failed on Travis (see my other > email), please rebase this patch on top of the master branch so that > we can push it right away. Unfortunately, I don't know what's wrong here. This problem is reproduced even on empty branch(master duplicate) https://travis-ci.org/tarantool/msgpuck/builds/516062909?utm_source=github_status&utm_medium=notification Seems something wrong with el6. I've googled that this connected with -fPIC. > Can we please refactor it a little so that we don't call mp_stack_top > twice in case the frame is advanced? Nice. Tnx! ============================================================ The msgpack dependency has been updated because the new version introduces the new method mp_stack_top for the mp_stack class which we will use to store a pointer for a multikey frame to initialize a field_map in case of multikey index. As the library API has changed, the code has been modified correspondingly. Needed for #1012 --- src/box/tuple_format.c | 9 +++++---- src/box/vy_stmt.c | 8 ++++---- src/lib/msgpuck | 2 +- 3 files changed, 10 insertions(+), 9 deletions(-) diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 1043707ad..093046b37 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -850,8 +850,8 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, struct tuple_field *field; struct json_token *parent = &format->fields.root; while (true) { - int idx; - while ((idx = mp_stack_advance(&stack)) == -1) { + struct mp_frame *frame = mp_stack_top(&stack); + while (!mp_frame_advance(frame)) { /* * If the elements of the current frame * are over, pop this frame out of stack @@ -863,6 +863,7 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, mp_stack_pop(&stack); if (mp_stack_is_empty(&stack)) goto finish; + frame = mp_stack_top(&stack); parent = parent->parent; } /* @@ -871,10 +872,10 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, * for the subsequent format::fields lookup. 
*/ struct json_token token; - switch (mp_stack_type(&stack)) { + switch (frame->type) { case MP_ARRAY: token.type = JSON_TOKEN_NUM; - token.num = idx; + token.num = frame->idx; break; case MP_MAP: if (mp_typeof(*pos) != MP_STR) { diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c index 7387d72be..776b1f69c 100644 --- a/src/box/vy_stmt.c +++ b/src/box/vy_stmt.c @@ -447,18 +447,18 @@ vy_stmt_new_surrogate_delete_raw(struct tuple_format *format, struct tuple_field *field; struct json_token *parent = &format->fields.root; while (true) { - int idx; - while ((idx = mp_stack_advance(&stack)) == -1) { + struct mp_frame *frame = mp_stack_top(&stack); + while (!mp_frame_advance(frame)) { mp_stack_pop(&stack); if (mp_stack_is_empty(&stack)) goto finish; parent = parent->parent; } struct json_token token; - switch (mp_stack_type(&stack)) { + switch (frame->type) { case MP_ARRAY: token.type = JSON_TOKEN_NUM; - token.num = idx; + token.num = frame->idx; break; case MP_MAP: if (mp_typeof(*src_pos) != MP_STR) { diff --git a/src/lib/msgpuck b/src/lib/msgpuck index 222b71aa6..9d04562fd 160000 --- a/src/lib/msgpuck +++ b/src/lib/msgpuck @@ -1 +1 @@ -Subproject commit 222b71aa63e8de6d0015588442d828460560d9be +Subproject commit 9d04562fd898159f68b644ee11975210e15c6871 -- 2.21.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [tarantool-patches] Re: [PATCH v3 4/7] lib: update msgpuck library 2019-04-05 17:17 ` [tarantool-patches] " Kirill Shcherbatov @ 2019-04-07 12:22 ` Vladimir Davydov 0 siblings, 0 replies; 29+ messages in thread From: Vladimir Davydov @ 2019-04-07 12:22 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches On Fri, Apr 05, 2019 at 08:17:25PM +0300, Kirill Shcherbatov wrote: > The msgpack dependency has been updated because the new version > introduces the new method mp_stack_top for the mp_stack class > which we will use to store a pointer for a multikey frame to > initialize a field_map in case of multikey index. > > As the library API has changed, the code has been modified > correspondingly. > > Needed for #1012 > --- > src/box/tuple_format.c | 9 +++++---- > src/box/vy_stmt.c | 8 ++++---- > src/lib/msgpuck | 2 +- > 3 files changed, 10 insertions(+), 9 deletions(-) > > diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c > index 1043707ad..093046b37 100644 > --- a/src/box/tuple_format.c > +++ b/src/box/tuple_format.c > @@ -850,8 +850,8 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, > struct tuple_field *field; > struct json_token *parent = &format->fields.root; > while (true) { > - int idx; > - while ((idx = mp_stack_advance(&stack)) == -1) { > + struct mp_frame *frame = mp_stack_top(&stack); > + while (!mp_frame_advance(frame)) { > /* > * If the elements of the current frame > * are over, pop this frame out of stack > @@ -863,6 +863,7 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, > mp_stack_pop(&stack); > if (mp_stack_is_empty(&stack)) > goto finish; > + frame = mp_stack_top(&stack); > parent = parent->parent; > } > /* > @@ -871,10 +872,10 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, > * for the subsequent format::fields lookup. > */ > struct json_token token; > - switch (mp_stack_type(&stack)) { > + switch (frame->type) { > case MP_ARRAY: > token.type = JSON_TOKEN_NUM; > - token.num = idx; > + token.num = frame->idx; > break; > case MP_MAP: > if (mp_typeof(*pos) != MP_STR) { > diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c > index 7387d72be..776b1f69c 100644 > --- a/src/box/vy_stmt.c > +++ b/src/box/vy_stmt.c > @@ -447,18 +447,18 @@ vy_stmt_new_surrogate_delete_raw(struct tuple_format *format, > struct tuple_field *field; > struct json_token *parent = &format->fields.root; > while (true) { > - int idx; > - while ((idx = mp_stack_advance(&stack)) == -1) { > + struct mp_frame *frame = mp_stack_top(&stack); > + while (!mp_frame_advance(frame)) { > mp_stack_pop(&stack); > if (mp_stack_is_empty(&stack)) > goto finish; > parent = parent->parent; mp_stack_top is missing! This result in assertion failure in engine/json. Looks like you didn't bother to run tests before submitting the patch. Fixed and pushed to master. > } > struct json_token token; > - switch (mp_stack_type(&stack)) { > + switch (frame->type) { > case MP_ARRAY: > token.type = JSON_TOKEN_NUM; > - token.num = idx; > + token.num = frame->idx; > break; > case MP_MAP: > if (mp_typeof(*src_pos) != MP_STR) { ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v3 5/7] box: introduce tuple_parse_iterator class 2019-04-02 15:49 [PATCH v3 0/7] box: introduce multikey indexes in memtx Kirill Shcherbatov ` (3 preceding siblings ...) 2019-04-02 15:49 ` [PATCH v3 4/7] lib: update msgpuck library Kirill Shcherbatov @ 2019-04-02 15:49 ` Kirill Shcherbatov 2019-04-03 14:04 ` Vladimir Davydov 2019-04-02 15:49 ` [PATCH v3 6/7] box: introduce field_map_builder class Kirill Shcherbatov 2019-04-02 15:49 ` [PATCH v3 7/7] box: introduce multikey indexes in memtx Kirill Shcherbatov 6 siblings, 1 reply; 29+ messages in thread From: Kirill Shcherbatov @ 2019-04-02 15:49 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov The similar code in tuple_field_map_create and vy_stmt_new_surrogate_delete_raw that performs tuple parsing in deep has been refactored as reusable tuple_parse_iterator class. Being thus encapsulated, this code will be uniformly managed and extended in the further patches in scope of multikey indexes. Needed for #1257 --- src/box/tuple_format.c | 222 +++++++++++++++++++++++------------------ src/box/tuple_format.h | 65 ++++++++++++ src/box/vy_stmt.c | 93 ++++++----------- 3 files changed, 218 insertions(+), 162 deletions(-) diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 1043707ad..070897ec2 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -831,109 +831,34 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, * which key fields are absent in tuple_field(). */ memset((char *)*field_map - *field_map_size, 0, *field_map_size); - /* - * Prepare mp stack of the size equal to the maximum depth - * of the indexed field in the format::fields tree - * (fields_depth) to carry out a simultaneous parsing of - * the tuple and tree traversal to process type - * validations and field map initialization. - */ - uint32_t frames_sz = format->fields_depth * sizeof(struct mp_frame); - struct mp_frame *frames = region_alloc(region, frames_sz); - if (frames == NULL) { - diag_set(OutOfMemory, frames_sz, "region", "frames"); + struct tuple_parse_iterator it; + if (tuple_parse_iterator_create(&it, format, pos, defined_field_count, + region) != 0) goto error; - } - struct mp_stack stack; - mp_stack_create(&stack, format->fields_depth, frames); - mp_stack_push(&stack, MP_ARRAY, defined_field_count); + const char *pos_end; struct tuple_field *field; - struct json_token *parent = &format->fields.root; - while (true) { - int idx; - while ((idx = mp_stack_advance(&stack)) == -1) { - /* - * If the elements of the current frame - * are over, pop this frame out of stack - * and climb one position in the - * format::fields tree to match the - * changed JSON path to the data in the - * tuple. - */ - mp_stack_pop(&stack); - if (mp_stack_is_empty(&stack)) - goto finish; - parent = parent->parent; - } - /* - * Use the top frame of the stack and the - * current data offset to prepare the JSON token - * for the subsequent format::fields lookup. - */ - struct json_token token; - switch (mp_stack_type(&stack)) { - case MP_ARRAY: - token.type = JSON_TOKEN_NUM; - token.num = idx; - break; - case MP_MAP: - if (mp_typeof(*pos) != MP_STR) { - /* - * JSON path support only string - * keys for map. Skip this entry. 
- */ - mp_next(&pos); - mp_next(&pos); - continue; - } - token.type = JSON_TOKEN_STR; - token.str = mp_decode_str(&pos, (uint32_t *)&token.len); - break; - default: - unreachable(); - } - /* - * Perform lookup for a field in format::fields, - * that represents the field metadata by JSON path - * corresponding to the current position in the - * tuple. - */ - enum mp_type type = mp_typeof(*pos); - assert(parent != NULL); - field = json_tree_lookup_entry(&format->fields, parent, &token, - struct tuple_field, token); - if (field != NULL) { - bool is_nullable = tuple_field_is_nullable(field); - if (validate && - !field_mp_type_is_compatible(field->type, type, - is_nullable) != 0) { - diag_set(ClientError, ER_FIELD_TYPE, - tuple_field_path(field), - field_type_strs[field->type]); - goto error; - } - if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) - (*field_map)[field->offset_slot] = pos - tuple; - if (required_fields != NULL) - bit_clear(required_fields, field->id); - } + while (tuple_parse_iterator_advice(&it, &field, &pos, &pos_end) > 0) { + if (field == NULL) + continue; /* - * If the current position of the data in tuple - * matches the container type (MP_MAP or MP_ARRAY) - * and the format::fields tree has such a record, - * prepare a new stack frame because it needs to - * be analyzed in the next iterations. + * Check if field mp_type is compatible with type + * defined in format. */ - if ((type == MP_ARRAY || type == MP_MAP) && - !mp_stack_is_full(&stack) && field != NULL) { - uint32_t size = type == MP_ARRAY ? - mp_decode_array(&pos) : - mp_decode_map(&pos); - mp_stack_push(&stack, type, size); - parent = &field->token; - } else { - mp_next(&pos); + bool is_nullable = tuple_field_is_nullable(field); + if (validate && + !field_mp_type_is_compatible(field->type, mp_typeof(*pos), + is_nullable) != 0) { + diag_set(ClientError, ER_FIELD_TYPE, + tuple_field_path(field), + field_type_strs[field->type]); + goto error; } + /* Initialize field_map with data offset. */ + if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) + (*field_map)[field->offset_slot] = pos - tuple; + /* Mark this field as present in the tuple. */ + if (required_fields != NULL) + bit_clear(required_fields, field->id); } finish: /* @@ -1029,3 +954,104 @@ box_tuple_format_unref(box_tuple_format_t *format) tuple_format_unref(format); } +int +tuple_parse_iterator_create(struct tuple_parse_iterator *it, + struct tuple_format *format, const char *data, + uint32_t field_count, struct region *region) +{ + /* + * Prepare mp stack of the size equal to the maximum depth + * of the indexed field in the format::fields tree + * (fields_depth) to carry out a simultaneous parsing of + * the tuple and tree traversal. 
+ */ + uint32_t frames_sz = format->fields_depth * sizeof(struct mp_frame); + struct mp_frame *frames = region_alloc(region, frames_sz); + if (frames == NULL) { + diag_set(OutOfMemory, frames_sz, "region", "frames"); + return -1; + } + mp_stack_create(&it->stack, format->fields_depth, frames); + mp_stack_push(&it->stack, MP_ARRAY, field_count); + it->parent = &format->fields.root; + it->format = format; + it->pos = data; + return 0; +} + +int +tuple_parse_iterator_advice(struct tuple_parse_iterator *it, + struct tuple_field **field, const char **data, + const char **data_end) +{ + int idx, rc = 0; + while ((idx = mp_stack_advance(&it->stack)) == -1) { + /* + * If the elements of the current frame + * are over, pop this frame out of stack + * and climb one position in the format::fields + * tree to match the changed JSON path to the + * data in the tuple. + */ + mp_stack_pop(&it->stack); + if (mp_stack_is_empty(&it->stack)) + return rc; + it->parent = it->parent->parent; + } + /* + * Use the top frame of the stack and the + * current data offset to prepare the JSON token + * for the subsequent format::fields lookup. + */ + struct json_token token; + switch (mp_stack_type(&it->stack)) { + case MP_ARRAY: + rc = 1; + token.type = JSON_TOKEN_NUM; + token.num = idx; + break; + case MP_MAP: + rc = 2; + if (mp_typeof(*it->pos) != MP_STR) { + mp_next(&it->pos); + mp_next(&it->pos); + *field = NULL; + return rc; + } + token.type = JSON_TOKEN_STR; + token.str = mp_decode_str(&it->pos, (uint32_t *)&token.len); + break; + default: + unreachable(); + } + /* + * Perform lookup for a field in format::fields, + * that represents the field metadata by JSON path + * corresponding to the current position in the + * tuple. + */ + assert(it->parent != NULL); + *field = json_tree_lookup_entry(&it->format->fields, it->parent, &token, + struct tuple_field, token); + *data = it->pos; + /* + * If the current position of the data in tuple + * matches the container type (MP_MAP or MP_ARRAY) + * and the format::fields tree has such a record, + * prepare a new stack frame because it needs to + * be analyzed in the next iterations. + */ + enum mp_type type = mp_typeof(*it->pos); + if ((type == MP_ARRAY || type == MP_MAP) && + !mp_stack_is_full(&it->stack) && *field != NULL) { + uint32_t size = type == MP_ARRAY ? + mp_decode_array(&it->pos) : + mp_decode_map(&it->pos); + mp_stack_push(&it->stack, type, size); + it->parent = &(*field)->token; + } else { + mp_next(&it->pos); + } + *data_end = it->pos; + return rc; +} diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index 22a0fb232..bef1d0903 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -412,6 +412,71 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, int tuple_format_init(); +/** + * A tuple msgpack iterator that parse tuple in deep an returns + * only fields that are described in the tuple_format. + */ +struct tuple_parse_iterator { + /** + * Tuple format is used to perform field lookups in + * format::fields JSON tree. + */ + struct tuple_format *format; + /** + * The pointer to the parent node in the format::fields + * JSON tree. Is required for relative lookup for the + * next field. + */ + struct json_token *parent; + /** + * Traversal stack of msgpack frames is used to determine + * when the parsing of the current composite mp structure + * (array or map) is completed to update to the parent + * pointer accordingly. + */ + struct mp_stack stack; + /** The current read position in msgpack. 
*/ + const char *pos; +}; + +/** + * Initialize tuple parse iterator with tuple format, data pointer + * and the count of top-level msgpack fields to be processed. + * + * This function assumes that the msgpack header containing the + * number of top-level msgpack fields (field_count) has already + * been parsed and the data pointer has already been shifted + * correspondingly. This allows directly limit the number of + * fields that must be parsed. + + * Function uses the region for the traversal stack allocation. + * + * Returns 0 on success. In case of memory allocation error sets + * diag message and returns -1. + */ +int +tuple_parse_iterator_create(struct tuple_parse_iterator *it, + struct tuple_format *format, const char *data, + uint32_t field_count, struct region *region); + +/** + * Parse tuple in deep and update iterator state. + * + * Returns the number of fields at the current tuple nesting + * level that have been processed (2 for map item, 1 for array + * key:value pair, 0 on stop) and initializes: + * field - the tuple_field pointer to format::fields field + * that matches to the currently processed msgpack field + * (when exists), + * data - the pointer to the currently processed msgpack field, + * data_end - the pointer to the end of currently processed + * msgpack field. + */ +int +tuple_parse_iterator_advice(struct tuple_parse_iterator *it, + struct tuple_field **field, const char **data, + const char **data_end); + #if defined(__cplusplus) } /* extern "C" */ #endif /* defined(__cplusplus) */ diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c index add86622b..1e8bb7825 100644 --- a/src/box/vy_stmt.c +++ b/src/box/vy_stmt.c @@ -431,81 +431,46 @@ vy_stmt_new_surrogate_delete_raw(struct tuple_format *format, /* * Perform simultaneous parsing of the tuple and * format::fields tree traversal to copy indexed field - * data and initialize field map. In many details the code - * above works like tuple_field_map_create, read it's - * comments for more details. + * data and initialize field map. 
*/ - uint32_t frames_sz = format->fields_depth * sizeof(struct mp_frame); - struct mp_frame *frames = region_alloc(region, frames_sz); - if (frames == NULL) { - diag_set(OutOfMemory, frames_sz, "region", "frames"); + struct tuple_parse_iterator it; + if (tuple_parse_iterator_create(&it, format, src_pos, field_count, + region) != 0) goto out; - } - struct mp_stack stack; - mp_stack_create(&stack, format->fields_depth, frames); - mp_stack_push(&stack, MP_ARRAY, field_count); + int rc; + const char *src_pos_end; struct tuple_field *field; - struct json_token *parent = &format->fields.root; - while (true) { - int idx; - while ((idx = mp_stack_advance(&stack)) == -1) { - mp_stack_pop(&stack); - if (mp_stack_is_empty(&stack)) - goto finish; - parent = parent->parent; - } - struct json_token token; - switch (mp_stack_type(&stack)) { - case MP_ARRAY: - token.type = JSON_TOKEN_NUM; - token.num = idx; - break; - case MP_MAP: - if (mp_typeof(*src_pos) != MP_STR) { - mp_next(&src_pos); - mp_next(&src_pos); - pos = mp_encode_nil(pos); - pos = mp_encode_nil(pos); - continue; - } - token.type = JSON_TOKEN_STR; - token.str = mp_decode_str(&src_pos, (uint32_t *)&token.len); - pos = mp_encode_str(pos, token.str, token.len); - break; - default: - unreachable(); - } - assert(parent != NULL); - field = json_tree_lookup_entry(&format->fields, parent, &token, - struct tuple_field, token); + while ((rc = tuple_parse_iterator_advice(&it, &field, &src_pos, + &src_pos_end)) > 0) { if (field == NULL || !field->is_key_part) { - mp_next(&src_pos); - pos = mp_encode_nil(pos); + /* + * Instead of copying useless data having + * no representation in tuple_format, + * write nil. + */ + while (rc-- > 0) + pos = mp_encode_nil(pos); continue; } + if (field->token.type == JSON_TOKEN_STR) { + assert(rc-- == 2); + pos = mp_encode_str(pos, field->token.str, + field->token.len); + } + assert(rc == 1); + /* Initialize field_map with data offset. */ if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) field_map[field->offset_slot] = pos - data; - enum mp_type type = mp_typeof(*src_pos); - if ((type == MP_ARRAY || type == MP_MAP) && - !mp_stack_is_full(&stack)) { - uint32_t size; - if (type == MP_ARRAY) { - size = mp_decode_array(&src_pos); - pos = mp_encode_array(pos, size); - } else { - size = mp_decode_map(&src_pos); - pos = mp_encode_map(pos, size); - } - mp_stack_push(&stack, type, size); - parent = &field->token; + /* Copy field data. */ + if (field->type == FIELD_TYPE_ARRAY) { + pos = mp_encode_array(pos, mp_decode_array(&src_pos)); + } else if (field->type == FIELD_TYPE_MAP) { + pos = mp_encode_map(pos, mp_decode_map(&src_pos)); } else { - const char *src_field = src_pos; - mp_next(&src_pos); - memcpy(pos, src_field, src_pos - src_field); - pos += src_pos - src_field; + memcpy(pos, src_pos, src_pos_end - src_pos); + pos += src_pos_end - src_pos; } } -finish: assert(pos <= data + src_size); uint32_t bsize = pos - data; stmt = vy_stmt_alloc(format, bsize); -- 2.21.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v3 5/7] box: introduce tuple_parse_iterator class 2019-04-02 15:49 ` [PATCH v3 5/7] box: introduce tuple_parse_iterator class Kirill Shcherbatov @ 2019-04-03 14:04 ` Vladimir Davydov 2019-04-05 17:17 ` [tarantool-patches] " Kirill Shcherbatov 0 siblings, 1 reply; 29+ messages in thread From: Vladimir Davydov @ 2019-04-03 14:04 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches On Tue, Apr 02, 2019 at 06:49:36PM +0300, Kirill Shcherbatov wrote: > diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h > index 22a0fb232..bef1d0903 100644 > --- a/src/box/tuple_format.h > +++ b/src/box/tuple_format.h > @@ -412,6 +412,71 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, > int > tuple_format_init(); > > +/** > + * A tuple msgpack iterator that parse tuple in deep an returns 'in deep' is an idiom, which isn't appropriate in your case. Please rephrase. > + * only fields that are described in the tuple_format. This isn't quite correct. Suppose I have field [1][3] indexed. Then this iterator will return [1][1] and [1][2] in addition to [1][3] even if [1][1] and [1][2] are not described in the format. > + */ > +struct tuple_parse_iterator { > + /** > + * Tuple format is used to perform field lookups in > + * format::fields JSON tree. > + */ > + struct tuple_format *format; > + /** > + * The pointer to the parent node in the format::fields > + * JSON tree. Is required for relative lookup for the > + * next field. > + */ > + struct json_token *parent; > + /** > + * Traversal stack of msgpack frames is used to determine > + * when the parsing of the current composite mp structure > + * (array or map) is completed to update to the parent > + * pointer accordingly. > + */ > + struct mp_stack stack; > + /** The current read position in msgpack. */ > + const char *pos; > +}; > + > +/** > + * Initialize tuple parse iterator with tuple format, data pointer > + * and the count of top-level msgpack fields to be processed. > + * > + * This function assumes that the msgpack header containing the > + * number of top-level msgpack fields (field_count) has already > + * been parsed and the data pointer has already been shifted > + * correspondingly. This allows directly limit the number of > + * fields that must be parsed. I would try to hide the first mp_decode_array inside the iterator, too, if possible. Otherwise we have tuple decoding split between two independent functions. > + > + * Function uses the region for the traversal stack allocation. > + * > + * Returns 0 on success. In case of memory allocation error sets > + * diag message and returns -1. > + */ > +int > +tuple_parse_iterator_create(struct tuple_parse_iterator *it, > + struct tuple_format *format, const char *data, > + uint32_t field_count, struct region *region); > + > +/** > + * Parse tuple in deep and update iterator state. > + * > + * Returns the number of fields at the current tuple nesting > + * level that have been processed (2 for map item, 1 for array > + * key:value pair, 0 on stop) and initializes: Using the function return value to inform the caller of whether a field is a map or an array in such an indirect way looks dubious. May be, use a separate argument to return the type of the currently decoded container instead (MP_MAP or MP_ARRAY)? 
> + * field - the tuple_field pointer to format::fields field > + * that matches to the currently processed msgpack field > + * (when exists), > + * data - the pointer to the currently processed msgpack field, > + * data_end - the pointer to the end of currently processed > + * msgpack field. In case of indexed array/map field data_end points not to the end of the current field, but to the end of array/map header. It works fine, because data_end isn't used by the caller. Still, it looks suspicious. > + */ > +int > +tuple_parse_iterator_advice(struct tuple_parse_iterator *it, > + struct tuple_field **field, const char **data, > + const char **data_end); > + Not sure about the name: strictly speaking, this function doesn't parse a tuple - it decodes it. Also, we usually call functions subject_verb, not subject_noun. ^ permalink raw reply [flat|nested] 29+ messages in thread
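Purely as an illustration of the remarks above (not the signature that lands in the v2 reply below), the suggested shape, with a verb-style name and the container type reported through a separate out argument, could read:

    bool
    tuple_format_iterator_advance(struct tuple_format_iterator *it,
                                  enum mp_type *container,
                                  struct tuple_field **field,
                                  const char **data, const char **data_end);

Here *container would be set to MP_ARRAY or MP_MAP for the frame the returned field belongs to, instead of encoding that fact in an integer return value; the names are placeholders.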
* Re: [tarantool-patches] Re: [PATCH v3 5/7] box: introduce tuple_parse_iterator class 2019-04-03 14:04 ` Vladimir Davydov @ 2019-04-05 17:17 ` Kirill Shcherbatov 0 siblings, 0 replies; 29+ messages in thread From: Kirill Shcherbatov @ 2019-04-05 17:17 UTC (permalink / raw) To: tarantool-patches, Vladimir Davydov Take a look for a new API. Also I've re-factored names. ============================================================ The similar code in tuple_field_map_create and vy_stmt_new_surrogate_delete_raw that performs tuple decode with tuple_format has been refactored as reusable tuple_format_iterator class. Being thus encapsulated, this code will be uniformly managed and extended in the further patches in scope of multikey indexes. Needed for #1257 --- src/box/tuple_format.c | 236 +++++++++++++++++++++++------------------ src/box/tuple_format.h | 75 +++++++++++++ src/box/vy_stmt.c | 109 +++++++------------ 3 files changed, 248 insertions(+), 172 deletions(-) diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 093046b37..a072b96e0 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -790,8 +790,13 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, const char *pos = tuple; int rc = 0; + struct tuple_format_iterator it; + if (tuple_format_iterator_create(&it, format, tuple, region) != 0) + goto error; + /* Check to see if the tuple has a sufficient number of fields. */ - uint32_t field_count = mp_decode_array(&pos); + uint32_t field_count = !mp_stack_is_empty(&it.stack) ? + mp_stack_top(&it.stack)->count : 0; if (validate && format->exact_field_count > 0 && format->exact_field_count != field_count) { diag_set(ClientError, ER_EXACT_FIELD_COUNT, @@ -826,115 +831,37 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, uint32_t defined_field_count = MIN(field_count, validate ? tuple_format_field_count(format) : format->index_field_count); + tuple_format_iterator_limit(&it, defined_field_count); + /* * Nullify field map to be able to detect by 0, * which key fields are absent in tuple_field(). */ memset((char *)*field_map - *field_map_size, 0, *field_map_size); - /* - * Prepare mp stack of the size equal to the maximum depth - * of the indexed field in the format::fields tree - * (fields_depth) to carry out a simultaneous parsing of - * the tuple and tree traversal to process type - * validations and field map initialization. - */ - uint32_t frames_sz = format->fields_depth * sizeof(struct mp_frame); - struct mp_frame *frames = region_alloc(region, frames_sz); - if (frames == NULL) { - diag_set(OutOfMemory, frames_sz, "region", "frames"); - goto error; - } - struct mp_stack stack; - mp_stack_create(&stack, format->fields_depth, frames); - mp_stack_push(&stack, MP_ARRAY, defined_field_count); + const char *pos_end; struct tuple_field *field; - struct json_token *parent = &format->fields.root; - while (true) { - struct mp_frame *frame = mp_stack_top(&stack); - while (!mp_frame_advance(frame)) { - /* - * If the elements of the current frame - * are over, pop this frame out of stack - * and climb one position in the - * format::fields tree to match the - * changed JSON path to the data in the - * tuple. - */ - mp_stack_pop(&stack); - if (mp_stack_is_empty(&stack)) - goto finish; - frame = mp_stack_top(&stack); - parent = parent->parent; - } - /* - * Use the top frame of the stack and the - * current data offset to prepare the JSON token - * for the subsequent format::fields lookup. 
- */ - struct json_token token; - switch (frame->type) { - case MP_ARRAY: - token.type = JSON_TOKEN_NUM; - token.num = frame->idx; - break; - case MP_MAP: - if (mp_typeof(*pos) != MP_STR) { - /* - * JSON path support only string - * keys for map. Skip this entry. - */ - mp_next(&pos); - mp_next(&pos); - continue; - } - token.type = JSON_TOKEN_STR; - token.str = mp_decode_str(&pos, (uint32_t *)&token.len); - break; - default: - unreachable(); - } - /* - * Perform lookup for a field in format::fields, - * that represents the field metadata by JSON path - * corresponding to the current position in the - * tuple. - */ - enum mp_type type = mp_typeof(*pos); - assert(parent != NULL); - field = json_tree_lookup_entry(&format->fields, parent, &token, - struct tuple_field, token); - if (field != NULL) { - bool is_nullable = tuple_field_is_nullable(field); - if (validate && - !field_mp_type_is_compatible(field->type, type, - is_nullable) != 0) { - diag_set(ClientError, ER_FIELD_TYPE, - tuple_field_path(field), - field_type_strs[field->type]); - goto error; - } - if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) - (*field_map)[field->offset_slot] = pos - tuple; - if (required_fields != NULL) - bit_clear(required_fields, field->id); - } + while (tuple_format_iterator_advice(&it, &field, &pos, &pos_end)) { + if (field == NULL) + continue; /* - * If the current position of the data in tuple - * matches the container type (MP_MAP or MP_ARRAY) - * and the format::fields tree has such a record, - * prepare a new stack frame because it needs to - * be analyzed in the next iterations. + * Check if field mp_type is compatible with type + * defined in format. */ - if ((type == MP_ARRAY || type == MP_MAP) && - !mp_stack_is_full(&stack) && field != NULL) { - uint32_t size = type == MP_ARRAY ? - mp_decode_array(&pos) : - mp_decode_map(&pos); - mp_stack_push(&stack, type, size); - parent = &field->token; - } else { - mp_next(&pos); + bool is_nullable = tuple_field_is_nullable(field); + if (validate && + !field_mp_type_is_compatible(field->type, mp_typeof(*pos), + is_nullable) != 0) { + diag_set(ClientError, ER_FIELD_TYPE, + tuple_field_path(field), + field_type_strs[field->type]); + goto error; } + /* Initialize field_map with data offset. */ + if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) + (*field_map)[field->offset_slot] = pos - tuple; + /* Mark this field as present in the tuple. */ + if (required_fields != NULL) + bit_clear(required_fields, field->id); } finish: /* @@ -1030,3 +957,110 @@ box_tuple_format_unref(box_tuple_format_t *format) tuple_format_unref(format); } +int +tuple_format_iterator_create(struct tuple_format_iterator *it, + struct tuple_format *format, const char *tuple, + struct region *region) +{ + assert(mp_typeof(*tuple) == MP_ARRAY); + const char *field = tuple; + uint32_t field_count = mp_decode_array(&field); + it->parent = &format->fields.root; + it->format = format; + it->pos = field; + if (field_count == 0) { + mp_stack_create(&it->stack, 0, NULL); + return 0; + } + /* + * Prepare mp stack of the size equal to the maximum depth + * of the indexed field in the format::fields tree + * (fields_depth) to carry out a simultaneous parsing of + * the tuple and tree traversal. 
+ */ + uint32_t frames_sz = format->fields_depth * sizeof(struct mp_frame); + struct mp_frame *frames = region_alloc(region, frames_sz); + if (frames == NULL) { + diag_set(OutOfMemory, frames_sz, "region", "frames"); + return -1; + } + mp_stack_create(&it->stack, format->fields_depth, frames); + mp_stack_push(&it->stack, MP_ARRAY, field_count); + return 0; +} + +bool +tuple_format_iterator_advice(struct tuple_format_iterator *it, + struct tuple_field **field, const char **data, + const char **data_end) +{ + struct mp_frame *frame = mp_stack_top(&it->stack); + while (!mp_frame_advance(frame)) { + /* + * If the elements of the current frame + * are over, pop this frame out of stack + * and climb one position in the format::fields + * tree to match the changed JSON path to the + * data in the tuple. + */ + mp_stack_pop(&it->stack); + if (mp_stack_is_empty(&it->stack)) + return false; + frame = mp_stack_top(&it->stack); + it->parent = it->parent->parent; + } + /* + * Use the top frame of the stack and the + * current data offset to prepare the JSON token + * for the subsequent format::fields lookup. + */ + struct json_token token; + switch (frame->type) { + case MP_ARRAY: + token.type = JSON_TOKEN_NUM; + token.num = frame->idx; + break; + case MP_MAP: + if (mp_typeof(*it->pos) != MP_STR) { + mp_next(&it->pos); + mp_next(&it->pos); + *field = NULL; + return true; + } + token.type = JSON_TOKEN_STR; + token.str = mp_decode_str(&it->pos, (uint32_t *)&token.len); + break; + default: + unreachable(); + } + /* + * Perform lookup for a field in format::fields, + * that represents the field metadata by JSON path + * corresponding to the current position in the + * tuple. + */ + assert(it->parent != NULL); + *field = json_tree_lookup_entry(&it->format->fields, it->parent, &token, + struct tuple_field, token); + *data = it->pos; + /* + * If the current position of the data in tuple + * matches the container type (MP_MAP or MP_ARRAY) + * and the format::fields tree has such a record, + * prepare a new stack frame because it needs to + * be analyzed in the next iterations. + */ + enum mp_type type = mp_typeof(*it->pos); + if ((type == MP_ARRAY || type == MP_MAP) && + !mp_stack_is_full(&it->stack) && *field != NULL) { + uint32_t size = type == MP_ARRAY ? + mp_decode_array(&it->pos) : + mp_decode_map(&it->pos); + mp_stack_push(&it->stack, type, size); + it->parent = &(*field)->token; + } else { + mp_next(&it->pos); + } + *data_end = it->pos; + return true; +} diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index 22a0fb232..41774eb9c 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -412,6 +412,81 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, int tuple_format_init(); +/** + * A tuple msgpack iterator that decodes the tuple and returns + * only fields that are described in the tuple_format. + */ +struct tuple_format_iterator { + /** + * Tuple format is used to perform field lookups in + * format::fields JSON tree. + */ + struct tuple_format *format; + /** + * The pointer to the parent node in the format::fields + * JSON tree. Is required for relative lookup for the + * next field. + */ + struct json_token *parent; + /** + * Traversal stack of msgpack frames is used to determine + * when the parsing of the current composite mp structure + * (array or map) is completed to update to the parent + * pointer accordingly. + */ + struct mp_stack stack; + /** The current read position in msgpack. 
*/ + const char *pos; +}; + +/** + * Initialize tuple decode iterator with tuple format and tuple + * data pointer. + * + * Function uses the region for the traversal stack allocation. + * + * Returns 0 on success. In case of memory allocation error sets + * diag message and returns -1. + */ +int +tuple_format_iterator_create(struct tuple_format_iterator *it, + struct tuple_format *format, const char *tuple, + struct region *region); + +/** + * Perform tuple decode step and update iterator state. + * + * Returns true when decode step succeeded and initialize: + * field - the tuple_field pointer to format::fields field + * that matches to the currently processed msgpack field + * (when exists), + * data - the pointer to the currently processed msgpack field, + * data_end - the pointer to the end of currently processed + * msgpack field(in case of MP_MAP or MP_ARRAY that + * is described in format this is the end of field + * header). + */ +bool +tuple_format_iterator_advice(struct tuple_format_iterator *it, + struct tuple_field **field, const char **data, + const char **data_end); + +/** + * Limit the number of fields that iterator must decode for + * the current nesting level. + * + * The field_count argument must not exceed the number of items + * available for scanning at the current nesting level. + */ +static inline void +tuple_format_iterator_limit(struct tuple_format_iterator *it, + uint32_t field_count) +{ + struct mp_frame *frame = mp_stack_top(&it->stack); + assert(field_count <= (uint32_t)frame->count); + frame->count = field_count; +} + #if defined(__cplusplus) } /* extern "C" */ #endif /* defined(__cplusplus) */ diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c index 776b1f69c..c6a1f4033 100644 --- a/src/box/vy_stmt.c +++ b/src/box/vy_stmt.c @@ -417,95 +417,62 @@ vy_stmt_new_surrogate_delete_raw(struct tuple_format *format, } char *field_map_begin = data + src_size; uint32_t *field_map = (uint32_t *) (data + total_size); - + /* + * Perform simultaneous parsing of the tuple and + * format::fields tree traversal to copy indexed field + * data and initialize field map. + */ + struct tuple_format_iterator it; const char *src_pos = src_data; - uint32_t src_count = mp_decode_array(&src_pos); - uint32_t field_count = MIN(src_count, format->index_field_count); + if (tuple_format_iterator_create(&it, format, src_pos, region) != 0) + goto out; + + uint32_t field_count = MIN((uint32_t)mp_stack_top(&it.stack)->count, + format->index_field_count); + tuple_format_iterator_limit(&it, field_count); + /* * Nullify field map to be able to detect by 0, which key * fields are absent in tuple_field(). */ memset((char *)field_map - format->field_map_size, 0, format->field_map_size); + char *pos = mp_encode_array(data, field_count); - /* - * Perform simultaneous parsing of the tuple and - * format::fields tree traversal to copy indexed field - * data and initialize field map. In many details the code - * above works like tuple_field_map_create, read it's - * comments for more details. 
- */ - uint32_t frames_sz = format->fields_depth * sizeof(struct mp_frame); - struct mp_frame *frames = region_alloc(region, frames_sz); - if (frames == NULL) { - diag_set(OutOfMemory, frames_sz, "region", "frames"); - goto out; - } - struct mp_stack stack; - mp_stack_create(&stack, format->fields_depth, frames); - mp_stack_push(&stack, MP_ARRAY, field_count); + const char *src_pos_end; struct tuple_field *field; - struct json_token *parent = &format->fields.root; - while (true) { - struct mp_frame *frame = mp_stack_top(&stack); - while (!mp_frame_advance(frame)) { - mp_stack_pop(&stack); - if (mp_stack_is_empty(&stack)) - goto finish; - parent = parent->parent; - } - struct json_token token; - switch (frame->type) { - case MP_ARRAY: - token.type = JSON_TOKEN_NUM; - token.num = frame->idx; - break; - case MP_MAP: - if (mp_typeof(*src_pos) != MP_STR) { - mp_next(&src_pos); - mp_next(&src_pos); - pos = mp_encode_nil(pos); - pos = mp_encode_nil(pos); - continue; - } - token.type = JSON_TOKEN_STR; - token.str = mp_decode_str(&src_pos, (uint32_t *)&token.len); - pos = mp_encode_str(pos, token.str, token.len); - break; - default: - unreachable(); - } - assert(parent != NULL); - field = json_tree_lookup_entry(&format->fields, parent, &token, - struct tuple_field, token); + while (tuple_format_iterator_advice(&it, &field, &src_pos, + &src_pos_end)) { + struct mp_frame *frame = mp_stack_top(&it.stack); if (field == NULL || !field->is_key_part) { - mp_next(&src_pos); + /* + * Instead of copying useless data having + * no representation in tuple_format, + * write nil. + */ pos = mp_encode_nil(pos); + if (frame->type == MP_MAP) + pos = mp_encode_nil(pos); continue; } + if (field->token.type == JSON_TOKEN_STR) { + assert(frame->type == MP_MAP); + pos = mp_encode_str(pos, field->token.str, + field->token.len); + } + /* Initialize field_map with data offset. */ if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) field_map[field->offset_slot] = pos - data; - enum mp_type type = mp_typeof(*src_pos); - if ((type == MP_ARRAY || type == MP_MAP) && - !mp_stack_is_full(&stack)) { - uint32_t size; - if (type == MP_ARRAY) { - size = mp_decode_array(&src_pos); - pos = mp_encode_array(pos, size); - } else { - size = mp_decode_map(&src_pos); - pos = mp_encode_map(pos, size); - } - mp_stack_push(&stack, type, size); - parent = &field->token; + /* Copy field data. */ + if (field->type == FIELD_TYPE_ARRAY) { + pos = mp_encode_array(pos, frame->count); + } else if (field->type == FIELD_TYPE_MAP) { + pos = mp_encode_map(pos, frame->count); } else { - const char *src_field = src_pos; - mp_next(&src_pos); - memcpy(pos, src_field, src_pos - src_field); - pos += src_pos - src_field; + memcpy(pos, src_pos, src_pos_end - src_pos); + pos += src_pos_end - src_pos; } } -finish: assert(pos <= data + src_size); uint32_t bsize = pos - data; stmt = vy_stmt_alloc(format, bsize); -- 2.21.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
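Compared to v1, the array header is now decoded inside tuple_format_iterator_create(), so the caller reads the top-level field count back from the stack and may cap it with tuple_format_iterator_limit(). A condensed view of the calling pattern, taken from the tuple_field_map_create() hunk above with checks elided:

    struct tuple_format_iterator it;
    if (tuple_format_iterator_create(&it, format, tuple, region) != 0)
            return -1;
    uint32_t field_count = !mp_stack_is_empty(&it.stack) ?
                           mp_stack_top(&it.stack)->count : 0;
    /* ... derive defined_field_count from field_count and the format ... */
    tuple_format_iterator_limit(&it, defined_field_count);
    struct tuple_field *field;
    const char *pos, *pos_end;
    while (tuple_format_iterator_advice(&it, &field, &pos, &pos_end)) {
            if (field == NULL)
                    continue;
            /* Validate the field type, set its field_map slot, clear the
             * corresponding required-fields bit. */
    }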
* [PATCH v3 6/7] box: introduce field_map_builder class 2019-04-02 15:49 [PATCH v3 0/7] box: introduce multikey indexes in memtx Kirill Shcherbatov ` (4 preceding siblings ...) 2019-04-02 15:49 ` [PATCH v3 5/7] box: introduce tuple_parse_iterator class Kirill Shcherbatov @ 2019-04-02 15:49 ` Kirill Shcherbatov 2019-04-03 14:38 ` Vladimir Davydov 2019-04-03 16:30 ` Vladimir Davydov 2019-04-02 15:49 ` [PATCH v3 7/7] box: introduce multikey indexes in memtx Kirill Shcherbatov 6 siblings, 2 replies; 29+ messages in thread From: Kirill Shcherbatov @ 2019-04-02 15:49 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov The new field_map_builder class encapsulates the logic associated with field_map allocation and initialization. In the future it will be extended to allocate field_map that has extensions. Needed for #1257 --- src/box/CMakeLists.txt | 1 + src/box/field_map.c | 61 ++++++++++++++++++ src/box/field_map.h | 139 +++++++++++++++++++++++++++++++++++++++++ src/box/memtx_engine.c | 8 +-- src/box/sql.c | 5 +- src/box/tuple.c | 16 ++--- src/box/tuple.h | 6 +- src/box/tuple_format.c | 48 ++++++-------- src/box/tuple_format.h | 12 ++-- src/box/vy_stmt.c | 9 +-- 10 files changed, 252 insertions(+), 53 deletions(-) create mode 100644 src/box/field_map.c create mode 100644 src/box/field_map.h diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt index 31600745a..11f568fca 100644 --- a/src/box/CMakeLists.txt +++ b/src/box/CMakeLists.txt @@ -35,6 +35,7 @@ target_link_libraries(xrow server core small vclock misc box_error add_library(tuple STATIC tuple.c + field_map.c tuple_format.c tuple_update.c tuple_compare.cc diff --git a/src/box/field_map.c b/src/box/field_map.c new file mode 100644 index 000000000..690aa461d --- /dev/null +++ b/src/box/field_map.c @@ -0,0 +1,61 @@ +/* + * Copyright 2010-2019, Tarantool AUTHORS, please see AUTHORS file. + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * 1. Redistributions of source code must retain the above + * copyright notice, this list of conditions and the + * following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ */ +#include "diag.h" +#include "field_map.h" +#include "small/region.h" + +int +field_map_builder_create(struct field_map_builder *builder, + uint32_t minimal_field_map_size, + struct region *region) +{ + builder->slot_count = minimal_field_map_size / sizeof(uint32_t); + if (minimal_field_map_size == 0) { + builder->slots = NULL; + return 0; + } + uint32_t sz = builder->slot_count * sizeof(builder->slots[0]); + builder->slots = region_alloc(region, sz); + if (builder->slots == NULL) { + diag_set(OutOfMemory, sz, "region_alloc", "field_map"); + return -1; + } + memset((char *)builder->slots, 0, sz); + builder->slots = builder->slots + builder->slot_count; + return 0; +} + +void +field_map_build(struct field_map_builder *builder, char *buffer) +{ + uint32_t field_map_size = field_map_build_size(builder); + memcpy(buffer, (char *)builder->slots - field_map_size, field_map_size); +} diff --git a/src/box/field_map.h b/src/box/field_map.h new file mode 100644 index 000000000..47ecbd009 --- /dev/null +++ b/src/box/field_map.h @@ -0,0 +1,139 @@ +#ifndef TARANTOOL_BOX_FIELD_MAP_H_INCLUDED +#define TARANTOOL_BOX_FIELD_MAP_H_INCLUDED +/* + * Copyright 2010-2019, Tarantool AUTHORS, please see AUTHORS file. + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * 1. Redistributions of source code must retain the above + * copyright notice, this list of conditions and the + * following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ +#include <assert.h> +#include <stdint.h> + +struct region; + +/** + * A field map is a special area is reserved before tuple's + * MessagePack data. It is a sequence of the 32-bit unsigned + * offsets of tuple's indexed fields. + * + * These slots are numbered with negative indices called + * offset_slot(s) starting with -1 (this is necessary to organize + * the inheritance of tuples). Allocation and assignment of + * offset_slot(s) is performed on tuple_format creation on index + * create or alter (see tuple_format_create()). + * + * 4b 4b MessagePack data. + * +------+----+------+---------------------------+ + * tuple: | offN | .. | off1 | header ..|key1|..|keyN|.. | + * +--+---+----+--+---+---------------------------+ + * | ... | ^ ^ + * | +-----------------+ | + * +--------------------------------------+ + * + * This field_map_builder class is used for tuple field_map + * construction. It encapsulates field_map build logic and size + * estimation implementation-specific details. 
+ * + * Each field offset is a positive number, except the case when + * a field is not in the tuple. In this case offset is 0. + */ +struct field_map_builder { + /** + * The pointer to the end of filed_map allocation. + * Its elements are accessible by negative indexes + * that coinciding with offset_slot(s). + */ + uint32_t *slots; + /** + * The count of slots in field_map_builder::slots + * allocation. + */ + uint32_t slot_count; +}; + +/** + * Get offset of the field in tuple data MessagePack using + * tuple's field_map and required field's offset_slot. + * + * When a field is not in the data tuple, its offset is 0. + */ +static inline uint32_t +field_map_get_offset(const uint32_t *field_map, int32_t offset_slot) +{ + return field_map[offset_slot]; +} + +/** + * Initialize field_map_builder. + * + * The field_map_size argument is a size of the minimal field_map + * allocation where each indexed field has own offset slot. + * + * Routine uses region to perform memory allocation for internal + * structures. + * + * Returns 0 on success. In case of memory allocation error sets + * diag message and returns -1. + */ +int +field_map_builder_create(struct field_map_builder *builder, + uint32_t minimal_field_map_size, + struct region *region); + +/** + * Set data offset for a field identified by unique offset_slot. + * + * The offset_slot argument must be negative and offset must be + * positive (by definition). + */ +static inline void +field_map_builder_set_slot(struct field_map_builder *builder, + int32_t offset_slot, uint32_t offset) +{ + assert(offset_slot < 0); + assert(offset > 0); + builder->slots[offset_slot] = offset; +} + +/** + * Calculate the size of tuple field_map to be built. + */ +static inline uint32_t +field_map_build_size(struct field_map_builder *builder) +{ + return builder->slot_count * sizeof(uint32_t); +} + +/** + * Write constructed field_map to the destination buffer field_map. + * + * The buffer must have at least field_map_build_size(builder) bytes. 
+ */ +void +field_map_build(struct field_map_builder *builder, char *buffer); + +#endif /* TARANTOOL_BOX_FIELD_MAP_H_INCLUDED */ diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c index 924f8bbc4..06da203e2 100644 --- a/src/box/memtx_engine.c +++ b/src/box/memtx_engine.c @@ -1111,10 +1111,10 @@ memtx_tuple_new(struct tuple_format *format, const char *data, const char *end) struct tuple *tuple = NULL; struct region *region = &fiber()->gc; size_t region_svp = region_used(region); - uint32_t *field_map, field_map_size; - if (tuple_field_map_create(format, data, true, &field_map, - &field_map_size) != 0) + struct field_map_builder builder; + if (tuple_field_map_create(&builder, format, data, true) != 0) goto end; + uint32_t field_map_size = field_map_build_size(&builder); size_t tuple_len = end - data; size_t total = sizeof(struct memtx_tuple) + field_map_size + tuple_len; @@ -1154,7 +1154,7 @@ memtx_tuple_new(struct tuple_format *format, const char *data, const char *end) */ tuple->data_offset = sizeof(struct tuple) + field_map_size; char *raw = (char *) tuple + tuple->data_offset; - memcpy(raw - field_map_size, field_map, field_map_size); + field_map_build(&builder, raw - field_map_size); memcpy(raw, data, tuple_len); say_debug("%s(%zu) = %p", __func__, tuple_len, memtx_tuple); end: diff --git a/src/box/sql.c b/src/box/sql.c index 855c2b741..7d5c3a8e0 100644 --- a/src/box/sql.c +++ b/src/box/sql.c @@ -786,7 +786,10 @@ tarantoolsqlIdxKeyCompare(struct BtCursor *cursor, while (j++ != fieldno) mp_next(&p); } else { - p = base + field_map[field->offset_slot]; + uint32_t field_offset = + field_map_get_offset(field_map, + field->offset_slot); + p = base + field_offset; } } next_fieldno = fieldno + 1; diff --git a/src/box/tuple.c b/src/box/tuple.c index 7f06d4053..570e4c192 100644 --- a/src/box/tuple.c +++ b/src/box/tuple.c @@ -78,10 +78,10 @@ runtime_tuple_new(struct tuple_format *format, const char *data, const char *end struct tuple *tuple = NULL; struct region *region = &fiber()->gc; size_t region_svp = region_used(region); - uint32_t *field_map, field_map_size; - if (tuple_field_map_create(format, data, true, &field_map, - &field_map_size) != 0) + struct field_map_builder builder; + if (tuple_field_map_create(&builder, format, data, true) != 0) goto end; + uint32_t field_map_size = field_map_build_size(&builder); size_t data_len = end - data; size_t total = sizeof(struct tuple) + field_map_size + data_len; @@ -98,7 +98,7 @@ runtime_tuple_new(struct tuple_format *format, const char *data, const char *end tuple_format_ref(format); tuple->data_offset = sizeof(struct tuple) + field_map_size; char *raw = (char *) tuple + tuple->data_offset; - memcpy(raw - field_map_size, field_map, field_map_size); + field_map_build(&builder, raw - field_map_size); memcpy(raw, data, data_len); say_debug("%s(%zu) = %p", __func__, data_len, tuple); end: @@ -126,11 +126,11 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple) struct region *region = &fiber()->gc; size_t region_svp = region_used(region); - uint32_t *field_map, field_map_size; - int rc = tuple_field_map_create(format, tuple, true, &field_map, - &field_map_size); - assert(rc != 0 || field_map_size == format->field_map_size); + struct field_map_builder builder; + int rc = tuple_field_map_create(&builder, format, tuple, true); region_truncate(region, region_svp); + assert(rc != 0 || + field_map_build_size(&builder) == format->field_map_size); return rc; } diff --git a/src/box/tuple.h b/src/box/tuple.h index 8b12fd5a8..da4085bcf 
100644 --- a/src/box/tuple.h +++ b/src/box/tuple.h @@ -542,6 +542,7 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, goto offset_slot_access; } if (likely(fieldno < format->index_field_count)) { + uint32_t offset; struct tuple_field *field; if (path == NULL && fieldno == 0) { mp_decode_array(&tuple); @@ -559,9 +560,10 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, *offset_slot_hint = offset_slot; offset_slot_access: /* Indexed field */ - if (field_map[offset_slot] == 0) + offset = field_map_get_offset(field_map, offset_slot); + if (offset == 0) return NULL; - tuple += field_map[offset_slot]; + tuple += offset; } else { uint32_t field_count; parse: diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 070897ec2..9a643b700 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -769,27 +769,21 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1, /** @sa declaration for details. */ int -tuple_field_map_create(struct tuple_format *format, const char *tuple, - bool validate, uint32_t **field_map, - uint32_t *field_map_size) +tuple_field_map_create(struct field_map_builder *builder, + struct tuple_format *format, const char *tuple, + bool validate) { + struct region *region = &fiber()->gc; if (tuple_format_field_count(format) == 0) { - *field_map = NULL; - *field_map_size = 0; + /* Nothing to initialize */ + if (field_map_builder_create(builder, 0, region) != 0) + unreachable(); return 0; /* Nothing to initialize */ } - struct region *region = &fiber()->gc; - *field_map_size = format->field_map_size; - *field_map = region_alloc(region, *field_map_size); - if (*field_map == NULL) { - diag_set(OutOfMemory, *field_map_size, "region_alloc", - "field_map"); + if (field_map_builder_create(builder, format->field_map_size, + region) != 0) return -1; - } - *field_map = (uint32_t *)((char *)*field_map + *field_map_size); - const char *pos = tuple; - int rc = 0; /* Check to see if the tuple has a sufficient number of fields. */ uint32_t field_count = mp_decode_array(&pos); if (validate && format->exact_field_count > 0 && @@ -797,7 +791,7 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, diag_set(ClientError, ER_EXACT_FIELD_COUNT, (unsigned) field_count, (unsigned) format->exact_field_count); - goto error; + return -1; } /* * Allocate a field bitmap that will be used for checking @@ -811,7 +805,7 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, if (required_fields == NULL) { diag_set(OutOfMemory, required_fields_sz, "region", "required field bitmap"); - goto error; + return -1; } memcpy(required_fields, format->required_fields, required_fields_sz); @@ -830,11 +824,10 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, * Nullify field map to be able to detect by 0, * which key fields are absent in tuple_field(). */ - memset((char *)*field_map - *field_map_size, 0, *field_map_size); struct tuple_parse_iterator it; if (tuple_parse_iterator_create(&it, format, pos, defined_field_count, region) != 0) - goto error; + return -1; const char *pos_end; struct tuple_field *field; while (tuple_parse_iterator_advice(&it, &field, &pos, &pos_end) > 0) { @@ -851,11 +844,13 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, diag_set(ClientError, ER_FIELD_TYPE, tuple_field_path(field), field_type_strs[field->type]); - goto error; + return -1; } /* Initialize field_map with data offset. 
*/ - if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) - (*field_map)[field->offset_slot] = pos - tuple; + if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { + field_map_builder_set_slot(builder, field->offset_slot, + pos - tuple); + } /* Mark this field as present in the tuple. */ if (required_fields != NULL) bit_clear(required_fields, field->id); @@ -875,15 +870,10 @@ finish: assert(field != NULL); diag_set(ClientError, ER_FIELD_MISSING, tuple_field_path(field)); - goto error; + return -1; } } -out: - *field_map = (uint32_t *)((char *)*field_map - *field_map_size); - return rc; -error: - rc = -1; - goto out; + return 0; } uint32_t diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index bef1d0903..88f03d5eb 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -36,6 +36,7 @@ #include "errinj.h" #include "json/json.h" #include "tuple_dictionary.h" +#include "field_map.h" #if defined(__cplusplus) extern "C" { @@ -169,8 +170,9 @@ struct tuple_format { */ bool is_ephemeral; /** - * Size of field map of tuple in bytes. - * \sa struct tuple + * Size of minimal field map of tuple where each indexed + * field has own offset slot (in bytes). + * \sa struct field_map_builder */ uint16_t field_map_size; /** @@ -401,9 +403,9 @@ box_tuple_format_unref(box_tuple_format_t *format); * tuple + off_i = indexed_field_i; */ int -tuple_field_map_create(struct tuple_format *format, const char *tuple, - bool validate, uint32_t **field_map, - uint32_t *field_map_size); +tuple_field_map_create(struct field_map_builder *builder, + struct tuple_format *format, const char *tuple, + bool validate); /** * Initialize tuple format subsystem. diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c index 1e8bb7825..3e35788ac 100644 --- a/src/box/vy_stmt.c +++ b/src/box/vy_stmt.c @@ -314,10 +314,11 @@ vy_stmt_new_with_ops(struct tuple_format *format, const char *tuple_begin, * tuples inserted into a space are validated explicitly * with tuple_validate() anyway. */ - uint32_t *field_map, field_map_size; - if (tuple_field_map_create(format, tuple_begin, false, &field_map, - &field_map_size) != 0) + struct field_map_builder builder; + if (tuple_field_map_create(&builder, format, tuple_begin, false) != 0) goto end; + uint32_t field_map_size = field_map_build_size(&builder); + assert(field_map_size == format->field_map_size); /* * Allocate stmt. Offsets: one per key part + offset of the * statement end. @@ -330,7 +331,7 @@ vy_stmt_new_with_ops(struct tuple_format *format, const char *tuple_begin, /* Copy MsgPack data */ char *raw = (char *) tuple_data(stmt); char *wpos = raw; - memcpy(wpos - field_map_size, field_map, field_map_size); + field_map_build(&builder, wpos - field_map_size); memcpy(wpos, tuple_begin, mpsize); wpos += mpsize; for (struct iovec *op = ops, *end = ops + op_count; -- 2.21.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
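The builder's life cycle, distilled from the memtx_tuple_new() hunk above (allocation details and error handling elided; the argument order shown is the v1 order, which the review that follows asks to change):

    struct field_map_builder builder;
    if (tuple_field_map_create(&builder, format, data, true) != 0)
            return NULL;
    uint32_t field_map_size = field_map_build_size(&builder);
    /* ... allocate the tuple so that field_map_size bytes sit right
     * before the msgpack data ... */
    char *raw = (char *)tuple + tuple->data_offset;
    field_map_build(&builder, raw - field_map_size);
    memcpy(raw, data, data_len);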
* Re: [PATCH v3 6/7] box: introduce field_map_builder class 2019-04-02 15:49 ` [PATCH v3 6/7] box: introduce field_map_builder class Kirill Shcherbatov @ 2019-04-03 14:38 ` Vladimir Davydov 2019-04-05 17:17 ` [tarantool-patches] " Kirill Shcherbatov 2019-04-03 16:30 ` Vladimir Davydov 1 sibling, 1 reply; 29+ messages in thread From: Vladimir Davydov @ 2019-04-03 14:38 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches On Tue, Apr 02, 2019 at 06:49:37PM +0300, Kirill Shcherbatov wrote: > diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c > index 070897ec2..9a643b700 100644 > --- a/src/box/tuple_format.c > +++ b/src/box/tuple_format.c > @@ -769,27 +769,21 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1, > > /** @sa declaration for details. */ > int > -tuple_field_map_create(struct tuple_format *format, const char *tuple, > - bool validate, uint32_t **field_map, > - uint32_t *field_map_size) > +tuple_field_map_create(struct field_map_builder *builder, > + struct tuple_format *format, const char *tuple, > + bool validate) The field map builder should be the last in the argument list as it is an out parameter. > { > + struct region *region = &fiber()->gc; > if (tuple_format_field_count(format) == 0) { > - *field_map = NULL; > - *field_map_size = 0; > + /* Nothing to initialize */ > + if (field_map_builder_create(builder, 0, region) != 0) > + unreachable(); Nit: no need in this branching. You could create a builder with format->field_map_size in both cases. > return 0; /* Nothing to initialize */ > } > - struct region *region = &fiber()->gc; > - *field_map_size = format->field_map_size; > - *field_map = region_alloc(region, *field_map_size); > - if (*field_map == NULL) { > - diag_set(OutOfMemory, *field_map_size, "region_alloc", > - "field_map"); > + if (field_map_builder_create(builder, format->field_map_size, > + region) != 0) > return -1; > - } > - *field_map = (uint32_t *)((char *)*field_map + *field_map_size); > - > const char *pos = tuple; > - int rc = 0; > /* Check to see if the tuple has a sufficient number of fields. */ > uint32_t field_count = mp_decode_array(&pos); > if (validate && format->exact_field_count > 0 && > diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h > index bef1d0903..88f03d5eb 100644 > --- a/src/box/tuple_format.h > +++ b/src/box/tuple_format.h > @@ -36,6 +36,7 @@ > #include "errinj.h" > #include "json/json.h" > #include "tuple_dictionary.h" > +#include "field_map.h" > > #if defined(__cplusplus) > extern "C" { > @@ -169,8 +170,9 @@ struct tuple_format { > */ > bool is_ephemeral; > /** > - * Size of field map of tuple in bytes. > - * \sa struct tuple > + * Size of minimal field map of tuple where each indexed > + * field has own offset slot (in bytes). > + * \sa struct field_map_builder > */ Minimal? What's that supposed to mean? Is it from the next patch? Other than that, looks okay to me. ^ permalink raw reply [flat|nested] 29+ messages in thread
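Moving the builder to the tail of the argument list, as requested, gives the prototype that the v2 call sites below actually use (reconstructed from those call sites, not quoted from the patch):

    int
    tuple_field_map_create(struct tuple_format *format, const char *tuple,
                           bool validate, struct field_map_builder *builder);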
* Re: [tarantool-patches] Re: [PATCH v3 6/7] box: introduce field_map_builder class 2019-04-03 14:38 ` Vladimir Davydov @ 2019-04-05 17:17 ` Kirill Shcherbatov 0 siblings, 0 replies; 29+ messages in thread From: Kirill Shcherbatov @ 2019-04-05 17:17 UTC (permalink / raw) To: tarantool-patches, Vladimir Davydov > Other than that, looks okay to me. Fixed all notes. ============================================================ The new field_map_builder class encapsulates the logic associated with field_map allocation and initialization. In the future it will be extended to allocate field_map that has extensions. Needed for #1257 --- src/box/CMakeLists.txt | 1 + src/box/field_map.c | 61 ++++++++++++++++++ src/box/field_map.h | 140 +++++++++++++++++++++++++++++++++++++++++ src/box/memtx_engine.c | 8 +-- src/box/sql.c | 5 +- src/box/tuple.c | 14 ++--- src/box/tuple.h | 6 +- src/box/tuple_format.c | 48 +++++--------- src/box/tuple_format.h | 12 ++-- src/box/vy_stmt.c | 9 +-- 10 files changed, 245 insertions(+), 59 deletions(-) create mode 100644 src/box/field_map.c create mode 100644 src/box/field_map.h diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt index 31600745a..11f568fca 100644 --- a/src/box/CMakeLists.txt +++ b/src/box/CMakeLists.txt @@ -35,6 +35,7 @@ target_link_libraries(xrow server core small vclock misc box_error add_library(tuple STATIC tuple.c + field_map.c tuple_format.c tuple_update.c tuple_compare.cc diff --git a/src/box/field_map.c b/src/box/field_map.c new file mode 100644 index 000000000..690aa461d --- /dev/null +++ b/src/box/field_map.c @@ -0,0 +1,61 @@ +/* + * Copyright 2010-2019, Tarantool AUTHORS, please see AUTHORS file. + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * 1. Redistributions of source code must retain the above + * copyright notice, this list of conditions and the + * following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ */ +#include "diag.h" +#include "field_map.h" +#include "small/region.h" + +int +field_map_builder_create(struct field_map_builder *builder, + uint32_t minimal_field_map_size, + struct region *region) +{ + builder->slot_count = minimal_field_map_size / sizeof(uint32_t); + if (minimal_field_map_size == 0) { + builder->slots = NULL; + return 0; + } + uint32_t sz = builder->slot_count * sizeof(builder->slots[0]); + builder->slots = region_alloc(region, sz); + if (builder->slots == NULL) { + diag_set(OutOfMemory, sz, "region_alloc", "field_map"); + return -1; + } + memset((char *)builder->slots, 0, sz); + builder->slots = builder->slots + builder->slot_count; + return 0; +} + +void +field_map_build(struct field_map_builder *builder, char *buffer) +{ + uint32_t field_map_size = field_map_build_size(builder); + memcpy(buffer, (char *)builder->slots - field_map_size, field_map_size); +} diff --git a/src/box/field_map.h b/src/box/field_map.h new file mode 100644 index 000000000..ea5417751 --- /dev/null +++ b/src/box/field_map.h @@ -0,0 +1,140 @@ +#ifndef TARANTOOL_BOX_FIELD_MAP_H_INCLUDED +#define TARANTOOL_BOX_FIELD_MAP_H_INCLUDED +/* + * Copyright 2010-2019, Tarantool AUTHORS, please see AUTHORS file. + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * 1. Redistributions of source code must retain the above + * copyright notice, this list of conditions and the + * following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ +#include <assert.h> +#include <stdint.h> + +struct region; + +/** + * A field map is a special area is reserved before tuple's + * MessagePack data. It is a sequence of the 32-bit unsigned + * offsets of tuple's indexed fields. + * + * These slots are numbered with negative indices called + * offset_slot(s) starting with -1 (this is necessary to organize + * the inheritance of tuples). Allocation and assignment of + * offset_slot(s) is performed on tuple_format creation on index + * create or alter (see tuple_format_create()). + * + * 4b 4b MessagePack data. + * +------+----+------+---------------------------+ + * tuple: | offN | .. | off1 | header ..|key1|..|keyN|.. | + * +--+---+----+--+---+---------------------------+ + * | ... | ^ ^ + * | +-----------------+ | + * +--------------------------------------+ + * + * This field_map_builder class is used for tuple field_map + * construction. It encapsulates field_map build logic and size + * estimation implementation-specific details. 
+ * + * Each field offset is a positive number, except the case when + * a field is not in the tuple. In this case offset is 0. + */ +struct field_map_builder { + /** + * The pointer to the end of filed_map allocation. + * Its elements are accessible by negative indexes + * that coinciding with offset_slot(s). + */ + uint32_t *slots; + /** + * The count of slots in field_map_builder::slots + * allocation. + */ + uint32_t slot_count; +}; + +/** + * Get offset of the field in tuple data MessagePack using + * tuple's field_map and required field's offset_slot. + * + * When a field is not in the data tuple, its offset is 0. + */ +static inline uint32_t +field_map_get_offset(const uint32_t *field_map, int32_t offset_slot) +{ + return field_map[offset_slot]; +} + +/** + * Initialize field_map_builder. + * + * The field_map_size argument is a size of the minimal field_map + * allocation where each indexed field has own offset slot. + * + * Routine uses region to perform memory allocation for internal + * structures. + * + * Returns 0 on success. In case of memory allocation error sets + * diag message and returns -1. + */ +int +field_map_builder_create(struct field_map_builder *builder, + uint32_t minimal_field_map_size, + struct region *region); + +/** + * Set data offset for a field identified by unique offset_slot. + * + * The offset_slot argument must be negative and offset must be + * positive (by definition). + */ +static inline void +field_map_builder_set_slot(struct field_map_builder *builder, + int32_t offset_slot, uint32_t offset) +{ + assert(offset_slot < 0); + assert((uint32_t)-offset_slot <= builder->slot_count); + assert(offset > 0); + builder->slots[offset_slot] = offset; +} + +/** + * Calculate the size of tuple field_map to be built. + */ +static inline uint32_t +field_map_build_size(struct field_map_builder *builder) +{ + return builder->slot_count * sizeof(uint32_t); +} + +/** + * Write constructed field_map to the destination buffer field_map. + * + * The buffer must have at least field_map_build_size(builder) bytes. 
+ */ +void +field_map_build(struct field_map_builder *builder, char *buffer); + +#endif /* TARANTOOL_BOX_FIELD_MAP_H_INCLUDED */ diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c index 924f8bbc4..cd9dabbf9 100644 --- a/src/box/memtx_engine.c +++ b/src/box/memtx_engine.c @@ -1111,10 +1111,10 @@ memtx_tuple_new(struct tuple_format *format, const char *data, const char *end) struct tuple *tuple = NULL; struct region *region = &fiber()->gc; size_t region_svp = region_used(region); - uint32_t *field_map, field_map_size; - if (tuple_field_map_create(format, data, true, &field_map, - &field_map_size) != 0) + struct field_map_builder builder; + if (tuple_field_map_create(format, data, true, &builder) != 0) goto end; + uint32_t field_map_size = field_map_build_size(&builder); size_t tuple_len = end - data; size_t total = sizeof(struct memtx_tuple) + field_map_size + tuple_len; @@ -1154,7 +1154,7 @@ memtx_tuple_new(struct tuple_format *format, const char *data, const char *end) */ tuple->data_offset = sizeof(struct tuple) + field_map_size; char *raw = (char *) tuple + tuple->data_offset; - memcpy(raw - field_map_size, field_map, field_map_size); + field_map_build(&builder, raw - field_map_size); memcpy(raw, data, tuple_len); say_debug("%s(%zu) = %p", __func__, tuple_len, memtx_tuple); end: diff --git a/src/box/sql.c b/src/box/sql.c index 1fb93e106..2310ee5e3 100644 --- a/src/box/sql.c +++ b/src/box/sql.c @@ -780,7 +780,10 @@ tarantoolsqlIdxKeyCompare(struct BtCursor *cursor, while (j++ != fieldno) mp_next(&p); } else { - p = base + field_map[field->offset_slot]; + uint32_t field_offset = + field_map_get_offset(field_map, + field->offset_slot); + p = base + field_offset; } } next_fieldno = fieldno + 1; diff --git a/src/box/tuple.c b/src/box/tuple.c index 9d1768440..c325f58c9 100644 --- a/src/box/tuple.c +++ b/src/box/tuple.c @@ -78,10 +78,10 @@ runtime_tuple_new(struct tuple_format *format, const char *data, const char *end struct tuple *tuple = NULL; struct region *region = &fiber()->gc; size_t region_svp = region_used(region); - uint32_t *field_map, field_map_size; - if (tuple_field_map_create(format, data, true, &field_map, - &field_map_size) != 0) + struct field_map_builder builder; + if (tuple_field_map_create(format, data, true, &builder) != 0) goto end; + uint32_t field_map_size = field_map_build_size(&builder); size_t data_len = end - data; size_t total = sizeof(struct tuple) + field_map_size + data_len; @@ -98,7 +98,7 @@ runtime_tuple_new(struct tuple_format *format, const char *data, const char *end tuple_format_ref(format); tuple->data_offset = sizeof(struct tuple) + field_map_size; char *raw = (char *) tuple + tuple->data_offset; - memcpy(raw - field_map_size, field_map, field_map_size); + field_map_build(&builder, raw - field_map_size); memcpy(raw, data, data_len); say_debug("%s(%zu) = %p", __func__, data_len, tuple); end: @@ -126,10 +126,8 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple) struct region *region = &fiber()->gc; size_t region_svp = region_used(region); - uint32_t *field_map, field_map_size; - int rc = tuple_field_map_create(format, tuple, true, &field_map, - &field_map_size); - assert(rc != 0 || field_map_size == format->field_map_size); + struct field_map_builder builder; + int rc = tuple_field_map_create(format, tuple, true, &builder); region_truncate(region, region_svp); return rc; } diff --git a/src/box/tuple.h b/src/box/tuple.h index eed8e1a34..d261d4cc9 100644 --- a/src/box/tuple.h +++ b/src/box/tuple.h @@ -540,6 +540,7 @@ 
tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, goto offset_slot_access; } if (likely(fieldno < format->index_field_count)) { + uint32_t offset; struct tuple_field *field; if (path == NULL && fieldno == 0) { mp_decode_array(&tuple); @@ -557,9 +558,10 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, *offset_slot_hint = offset_slot; offset_slot_access: /* Indexed field */ - if (field_map[offset_slot] == 0) + offset = field_map_get_offset(field_map, offset_slot); + if (offset == 0) return NULL; - tuple += field_map[offset_slot]; + tuple += offset; } else { uint32_t field_count; parse: diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index a072b96e0..93505aaf1 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -770,29 +770,19 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1, /** @sa declaration for details. */ int tuple_field_map_create(struct tuple_format *format, const char *tuple, - bool validate, uint32_t **field_map, - uint32_t *field_map_size) + bool validate, struct field_map_builder *builder) { - if (tuple_format_field_count(format) == 0) { - *field_map = NULL; - *field_map_size = 0; - return 0; /* Nothing to initialize */ - } struct region *region = &fiber()->gc; - *field_map_size = format->field_map_size; - *field_map = region_alloc(region, *field_map_size); - if (*field_map == NULL) { - diag_set(OutOfMemory, *field_map_size, "region_alloc", - "field_map"); + if (field_map_builder_create(builder, format->field_map_size, + region) != 0) return -1; - } - *field_map = (uint32_t *)((char *)*field_map + *field_map_size); + if (tuple_format_field_count(format) == 0) + return 0; /* Nothing to initialize */ const char *pos = tuple; - int rc = 0; struct tuple_format_iterator it; if (tuple_format_iterator_create(&it, format, tuple, region) != 0) - goto error; + return -1; /* Check to see if the tuple has a sufficient number of fields. */ uint32_t field_count = !mp_stack_is_empty(&it.stack) ? @@ -802,7 +792,7 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, diag_set(ClientError, ER_EXACT_FIELD_COUNT, (unsigned) field_count, (unsigned) format->exact_field_count); - goto error; + return -1; } /* * Allocate a field bitmap that will be used for checking @@ -816,7 +806,7 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, if (required_fields == NULL) { diag_set(OutOfMemory, required_fields_sz, "region", "required field bitmap"); - goto error; + return -1; } memcpy(required_fields, format->required_fields, required_fields_sz); @@ -833,11 +823,6 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, format->index_field_count); tuple_format_iterator_limit(&it, defined_field_count); - /* - * Nullify field map to be able to detect by 0, - * which key fields are absent in tuple_field(). - */ - memset((char *)*field_map - *field_map_size, 0, *field_map_size); const char *pos_end; struct tuple_field *field; while (tuple_format_iterator_advice(&it, &field, &pos, &pos_end)) { @@ -854,11 +839,13 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, diag_set(ClientError, ER_FIELD_TYPE, tuple_field_path(field), field_type_strs[field->type]); - goto error; + return -1; } /* Initialize field_map with data offset. 
*/ - if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) - (*field_map)[field->offset_slot] = pos - tuple; + if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { + field_map_builder_set_slot(builder, field->offset_slot, + pos - tuple); + } /* Mark this field as present in the tuple. */ if (required_fields != NULL) bit_clear(required_fields, field->id); @@ -878,15 +865,10 @@ finish: assert(field != NULL); diag_set(ClientError, ER_FIELD_MISSING, tuple_field_path(field)); - goto error; + return -1; } } -out: - *field_map = (uint32_t *)((char *)*field_map - *field_map_size); - return rc; -error: - rc = -1; - goto out; + return 0; } uint32_t diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index 41774eb9c..597d130d5 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -36,6 +36,7 @@ #include "errinj.h" #include "json/json.h" #include "tuple_dictionary.h" +#include "field_map.h" #if defined(__cplusplus) extern "C" { @@ -170,7 +171,7 @@ struct tuple_format { bool is_ephemeral; /** * Size of field map of tuple in bytes. - * \sa struct tuple + * \sa struct field_map_builder */ uint16_t field_map_size; /** @@ -386,10 +387,8 @@ box_tuple_format_unref(box_tuple_format_t *format); * @param format Tuple format. * @param tuple MessagePack array. * @param validate If set, validate the tuple against the format. - * @param field_map[out] The pointer to store field map - * allocation. - * @param field_map_size[out] The pointer to variable to store - * field map size. + * @param builder[out] The pointer to field map builder object to + * be prepared. * * @retval 0 Success. * @retval -1 Format error. @@ -402,8 +401,7 @@ box_tuple_format_unref(box_tuple_format_t *format); */ int tuple_field_map_create(struct tuple_format *format, const char *tuple, - bool validate, uint32_t **field_map, - uint32_t *field_map_size); + bool validate, struct field_map_builder *builder); /** * Initialize tuple format subsystem. diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c index c6a1f4033..f8ed0c8a5 100644 --- a/src/box/vy_stmt.c +++ b/src/box/vy_stmt.c @@ -314,10 +314,11 @@ vy_stmt_new_with_ops(struct tuple_format *format, const char *tuple_begin, * tuples inserted into a space are validated explicitly * with tuple_validate() anyway. */ - uint32_t *field_map, field_map_size; - if (tuple_field_map_create(format, tuple_begin, false, &field_map, - &field_map_size) != 0) + struct field_map_builder builder; + if (tuple_field_map_create(format, tuple_begin, false, &builder) != 0) goto end; + uint32_t field_map_size = field_map_build_size(&builder); + assert(field_map_size == format->field_map_size); /* * Allocate stmt. Offsets: one per key part + offset of the * statement end. @@ -330,7 +331,7 @@ vy_stmt_new_with_ops(struct tuple_format *format, const char *tuple_begin, /* Copy MsgPack data */ char *raw = (char *) tuple_data(stmt); char *wpos = raw; - memcpy(wpos - field_map_size, field_map, field_map_size); + field_map_build(&builder, wpos - field_map_size); memcpy(wpos, tuple_begin, mpsize); wpos += mpsize; for (struct iovec *op = ops, *end = ops + op_count; -- 2.21.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
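To make the builder interface above easier to follow, here is a minimal usage sketch of the functions declared in field_map.h (illustrative only, not part of the patch: the slot numbers, the offsets and the surrounding function are invented, and error handling is trimmed to the bare minimum):

	/* A format with two indexed fields -> two 4-byte offset slots. */
	struct field_map_builder builder;
	struct region *region = &fiber()->gc;
	size_t svp = region_used(region);
	if (field_map_builder_create(&builder, 2 * sizeof(uint32_t),
				     region) != 0)
		return -1;
	/* Slots are addressed by negative offset_slot numbers. */
	field_map_builder_set_slot(&builder, -1, 17);
	field_map_builder_set_slot(&builder, -2, 45);
	/* Dump the map into a buffer of field_map_build_size() bytes. */
	uint32_t size = field_map_build_size(&builder);
	char *buf = region_alloc(region, size);
	if (buf == NULL)
		return -1;
	field_map_build(&builder, buf);
	/*
	 * Readers address the map from its end by the same negative
	 * slots, exactly as tuple_field_raw_by_path() does above.
	 */
	const uint32_t *field_map = (const uint32_t *)(buf + size);
	assert(field_map_get_offset(field_map, -1) == 17);
	assert(field_map_get_offset(field_map, -2) == 45);
	region_truncate(region, svp);

In the allocators touched by this patch (memtx_tuple_new(), runtime_tuple_new(), vy_stmt_new_with_ops()) the same sequence is driven with format->field_map_size and a buffer placed immediately before the tuple's MessagePack data.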
* Re: [PATCH v3 6/7] box: introduce field_map_builder class 2019-04-02 15:49 ` [PATCH v3 6/7] box: introduce field_map_builder class Kirill Shcherbatov 2019-04-03 14:38 ` Vladimir Davydov @ 2019-04-03 16:30 ` Vladimir Davydov 1 sibling, 0 replies; 29+ messages in thread From: Vladimir Davydov @ 2019-04-03 16:30 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches On Tue, Apr 02, 2019 at 06:49:37PM +0300, Kirill Shcherbatov wrote: > +/** > + * Set data offset for a field identified by unique offset_slot. > + * > + * The offset_slot argument must be negative and offset must be > + * positive (by definition). > + */ > +static inline void > +field_map_builder_set_slot(struct field_map_builder *builder, > + int32_t offset_slot, uint32_t offset) > +{ > + assert(offset_slot < 0); > + assert(offset > 0); An assertion making sure that the offset is within the map would be appreciated. > + builder->slots[offset_slot] = offset; > +} > @@ -126,11 +126,11 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple) > > struct region *region = &fiber()->gc; > size_t region_svp = region_used(region); > - uint32_t *field_map, field_map_size; > - int rc = tuple_field_map_create(format, tuple, true, &field_map, > - &field_map_size); > - assert(rc != 0 || field_map_size == format->field_map_size); > + struct field_map_builder builder; > + int rc = tuple_field_map_create(&builder, format, tuple, true); > region_truncate(region, region_svp); > + assert(rc != 0 || > + field_map_build_size(&builder) == format->field_map_size); This assertion is rather pointless after the next patch, where you replace '==' with '>='. Please remove. > return rc; > } ^ permalink raw reply [flat|nested] 29+ messages in thread
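The bounds check requested in the first comment can be expressed through the slot count that the builder already tracks; the updated version of this patch posted on 2019-04-05 (earlier on this page) adds exactly such an assertion, so field_map_builder_set_slot() ends up looking like this:

	static inline void
	field_map_builder_set_slot(struct field_map_builder *builder,
				   int32_t offset_slot, uint32_t offset)
	{
		assert(offset_slot < 0);
		/* The slot must lie within the allocated field map. */
		assert((uint32_t)-offset_slot <= builder->slot_count);
		assert(offset > 0);
		builder->slots[offset_slot] = offset;
	}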
* [PATCH v3 7/7] box: introduce multikey indexes in memtx 2019-04-02 15:49 [PATCH v3 0/7] box: introduce multikey indexes in memtx Kirill Shcherbatov ` (5 preceding siblings ...) 2019-04-02 15:49 ` [PATCH v3 6/7] box: introduce field_map_builder class Kirill Shcherbatov @ 2019-04-02 15:49 ` Kirill Shcherbatov 2019-04-04 14:20 ` Vladimir Davydov 6 siblings, 1 reply; 29+ messages in thread From: Kirill Shcherbatov @ 2019-04-02 15:49 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov

- In the case of a multikey index an ambiguity arises: which of the
  keys extracted from a tuple should be used in a comparison. The
  previously introduced comparison hints act as a non-negative
  numeric index of the key to use,
- Memtx B+ tree replace and build_next methods have been patched to
  insert the same tuple multiple times, under the different logical
  indexes of the key in the array,
- field maps have been extended with service areas ("extents") that
  contain the offsets of multikey index keys addressed by the
  additional logical index.

Part of #1257

@TarantoolBot document
Title: introduce multikey indexes in memtx
Any JSON index in which at least one part contains the array index
placeholder sign "[*]" is called "multikey". Such indexes allow you
to automatically index a set of documents sharing the same document
structure.
The multikey index design has a number of restrictions that must be
taken into account:
- it cannot be primary, because of the ambiguity arising from its
  definition (a primary index requires a single unique key that
  identifies the tuple),
- if some node in the JSON tree of all defined indexes contains an
  array index placeholder [*], no other JSON path may use an explicit
  JSON index on its nested field,
- it supports "unique" semantics, but its uniqueness differs a little
  from conventional indexes: you may insert a tuple in which the same
  key occurs multiple times into a unique multikey index, but you
  cannot insert a tuple when any of its keys is already present in
  some other tuple stored in the space,
- the unique multikey index "duplicate" conflict occurs when the sets
  of extracted keys have a non-empty intersection,
- to identify the different keys by which a given tuple is indexed,
  each key is assigned a logical sequence number in the array defined
  with the array index placeholder [*] in the index definition (such
  an array is called the multikey index root),
- no index part may contain more than one array index placeholder
  sign [*] in its JSON path,
- all parts containing JSON paths with an array index placeholder [*]
  must have the same (in terms of JSON tokens) prefix before this
  placeholder sign.
Example: s = box.schema.space.create('withdata') pk = s:create_index('pk') parts = { {2, 'str', path = 'data[*].name'}, {2, 'str', path = 'data[*].extra.phone'} } idx = s:create_index('idx', {parts = parts}) s:insert({1, {data = {{name="A", extra={phone="111"}}, {name="B", extra={phone="111"}}}, garbage = 1}} idx:get({'A', '111'}) --- src/box/errcode.h | 1 + src/box/field_map.c | 68 +++++++- src/box/field_map.h | 137 ++++++++++++++-- src/box/index_def.c | 5 + src/box/key_def.c | 112 +++++++++---- src/box/key_def.h | 34 ++++ src/box/memtx_space.c | 18 ++ src/box/memtx_tree.c | 180 +++++++++++++++++--- src/box/sql.c | 3 +- src/box/tuple.c | 28 +++- src/box/tuple.h | 81 +++++++-- src/box/tuple_compare.cc | 150 ++++++++++++++--- src/box/tuple_extract_key.cc | 41 ++++- src/box/tuple_format.c | 117 +++++++++++-- src/box/tuple_format.h | 32 ++++ src/box/vinyl.c | 5 + test/box/misc.result | 1 + test/engine/json.result | 13 -- test/engine/json.test.lua | 7 - test/engine/multikey.result | 298 ++++++++++++++++++++++++++++++++++ test/engine/multikey.test.lua | 88 ++++++++++ 21 files changed, 1262 insertions(+), 157 deletions(-) create mode 100644 test/engine/multikey.result create mode 100644 test/engine/multikey.test.lua diff --git a/src/box/errcode.h b/src/box/errcode.h index 3f8cb8e0e..ea992a6b2 100644 --- a/src/box/errcode.h +++ b/src/box/errcode.h @@ -246,6 +246,7 @@ struct errcode_record { /*191 */_(ER_SQL_PARSER_LIMIT, "%s %d exceeds the limit (%d)") \ /*192 */_(ER_INDEX_DEF_UNSUPPORTED, "%s are prohibited in an index definition") \ /*193 */_(ER_CK_DEF_UNSUPPORTED, "%s are prohibited in a CHECK constraint definition") \ + /*194 */_(ER_INDEX_MULTIKEY_INVALID, "Multikey index is invalid: %s") \ /* * !IMPORTANT! Please follow instructions at start of the file diff --git a/src/box/field_map.c b/src/box/field_map.c index 690aa461d..ae6c5be5f 100644 --- a/src/box/field_map.c +++ b/src/box/field_map.c @@ -37,6 +37,8 @@ field_map_builder_create(struct field_map_builder *builder, uint32_t minimal_field_map_size, struct region *region) { + builder->region = region; + builder->extents_size = 0; builder->slot_count = minimal_field_map_size / sizeof(uint32_t); if (minimal_field_map_size == 0) { builder->slots = NULL; @@ -53,9 +55,71 @@ field_map_builder_create(struct field_map_builder *builder, return 0; } +/** + * Get size of extention (in bytes) by count of items it + * must contain. 
+ */ +static uint32_t +field_map_ext_size(uint32_t items) +{ + return sizeof(struct field_map_ext) + items * sizeof(uint32_t); +} + +struct field_map_ext * +field_map_builder_ext_get(struct field_map_builder *builder, + int32_t offset_slot, uint32_t extent_items) +{ + struct field_map_ext *extent; + if (builder->slots[offset_slot].has_extent) { + extent = builder->slots[offset_slot].extent; + assert(extent != NULL); + assert(extent->items == extent_items); + return extent; + } + uint32_t sz = field_map_ext_size(extent_items); + extent = region_alloc(builder->region, sz); + if (extent == NULL) { + diag_set(OutOfMemory, sz, "region", "extent"); + return NULL; + } + memset(extent, 0, sz); + extent->items = extent_items; + builder->slots[offset_slot].extent = extent; + builder->slots[offset_slot].has_extent = true; + builder->extents_size += sz; + return extent; +} + void field_map_build(struct field_map_builder *builder, char *buffer) { - uint32_t field_map_size = field_map_build_size(builder); - memcpy(buffer, (char *)builder->slots - field_map_size, field_map_size); + /* + * To initialize the field map and its extents, prepare + * the following memory layout with pointers: + * + * offset + * +---------------------+ + * | | + * [extentK]..[extent1][[slotN]..[slot2][slot1]] + * | |extent_wptr |field_map + * <- <- <- + * + * The buffer size is assumed to be sufficient to write + * field_map_build_size(builder) bytes there. + */ + uint32_t *field_map = + (uint32_t *)(buffer + field_map_build_size(builder)); + char *extent_wptr = buffer + builder->extents_size; + for (int32_t i = -1; i >= -(int32_t)builder->slot_count; i--) { + if (!builder->slots[i].has_extent) { + field_map[i] = builder->slots[i].offset; + continue; + } + struct field_map_ext *extent = builder->slots[i].extent; + /** Retrive memory for the extent. */ + uint32_t sz = field_map_ext_size(extent->items); + extent_wptr -= sz; + field_map[i] = (char *)field_map - (char *)extent_wptr; + memcpy(extent_wptr, extent, sz); + } } diff --git a/src/box/field_map.h b/src/box/field_map.h index 47ecbd009..af578ec6b 100644 --- a/src/box/field_map.h +++ b/src/box/field_map.h @@ -34,6 +34,7 @@ #include <stdint.h> struct region; +struct field_map_builder_slot; /** * A field map is a special area is reserved before tuple's @@ -46,13 +47,15 @@ struct region; * offset_slot(s) is performed on tuple_format creation on index * create or alter (see tuple_format_create()). * - * 4b 4b MessagePack data. - * +------+----+------+---------------------------+ - * tuple: | offN | .. | off1 | header ..|key1|..|keyN|.. | - * +--+---+----+--+---+---------------------------+ - * | ... | ^ ^ - * | +-----------------+ | - * +--------------------------------------+ + * 4b 4b 4b 4b MessagePack data. + * +-----------+------+----+------+------------------------+ + *tuple: |cnt|off1|..| offN | .. | off1 | header ..|key1|..|keyN|| + * +-----+-----+--+---+----+--+---+------------------------+ + * ext1 ^ | | ... | ^ ^ + * +-----|--------+ | | | + * indirection | +-----------------+ | + * +----------------------------------------------+ + * (offset_slot = N, extent_slot = 1) --> offset * * This field_map_builder class is used for tuple field_map * construction. It encapsulates field_map build logic and size @@ -60,19 +63,88 @@ struct region; * * Each field offset is a positive number, except the case when * a field is not in the tuple. In this case offset is 0. 
+ * + * Some slots may store an offset of the field_map_ext structure, + * which contains an additional sequence of offsets of size + * defined above(see field_map_ext layout). The caller needs to + * be aware of when the slot is an offset of the data and when + * it is the offset of the field_map_ext. + * + * Now these extents are used to organize a multikey index. + * The count of keys in the multikey index imposes the count of + * items in extent while the i-th extent's slot contains the + * offset of the i-th key field. */ struct field_map_builder { /** - * The pointer to the end of filed_map allocation. + * The pointer to the end of field_map_builder_slot(s) + * allocation. * Its elements are accessible by negative indexes * that coinciding with offset_slot(s). */ - uint32_t *slots; + struct field_map_builder_slot *slots; /** * The count of slots in field_map_builder::slots * allocation. */ uint32_t slot_count; + /** + * Total size of memory is allocated for field_map + * extentions. + */ + uint32_t extents_size; + /** + * Region to use to perform memory allocations. + */ + struct region *region; +}; + +/** + * Internal stucture representing field_map extent. + * (see field_map_builder description). + */ +struct field_map_ext { + /** Count of field_map_ext::offset[] elements. */ + uint32_t items; + /** Data offset in tuple array. */ + uint32_t offset[0]; +}; + +/** + * Internal function to get or allocate field map extent + * by offset_slot, and count of items. + */ +struct field_map_ext * +field_map_builder_ext_get(struct field_map_builder *builder, + int32_t offset_slot, uint32_t extent_items); + +/** + * Instead of using uint32_t offset slots directly the + * field_map_builder uses this structure as a storage atom. + * When there is a need to initialize an extent, the + * field_map_builder allocates a new memory chunk and sets + * field_map_builder_slot::pointer (instead of real field_map + * reallocation). + * + * On field_map_build, all of the extents are dumped to the same + * memory chunk that the regular field_map slots and corresponding + * slots represent relative field_map_ext offset instead of + * field data offset. + * + * The allocated memory is accounted for in extents_size. + */ +struct field_map_builder_slot { + /** + * True when this slot must be interpret as + * extention pointer. + */ + bool has_extent; + union { + /** Data offset in tuple. */ + uint32_t offset; + /** Pointer to field_map_ext extention. */ + struct field_map_ext *extent; + }; }; /** @@ -82,9 +154,22 @@ struct field_map_builder { * When a field is not in the data tuple, its offset is 0. 
*/ static inline uint32_t -field_map_get_offset(const uint32_t *field_map, int32_t offset_slot) +field_map_get_offset(const uint32_t *field_map, int32_t offset_slot, + int multikey_idx) { - return field_map[offset_slot]; + uint32_t offset; + if (multikey_idx >= 0) { + assert(field_map[offset_slot] != 0); + struct field_map_ext *extent = + (struct field_map_ext *)((char *)field_map - + field_map[offset_slot]); + if ((uint32_t)multikey_idx >= extent->items) + return 0; + offset = extent->offset[multikey_idx]; + } else { + offset = field_map[offset_slot]; + } + return offset; } /** @@ -116,7 +201,32 @@ field_map_builder_set_slot(struct field_map_builder *builder, { assert(offset_slot < 0); assert(offset > 0); - builder->slots[offset_slot] = offset; + builder->slots[offset_slot].offset = offset; +} + +/** + * Set data offset in field map extent (by given offset_slot, + * extent_slot and extent_items) for a field identified by + * unique offset_slot. + * + * The offset_slot argument must be negative and offset must be + * positive (by definition). + */ +static inline int +field_map_builder_set_extent_slot(struct field_map_builder *builder, + int32_t offset_slot, int32_t extent_slot, + uint32_t extent_items, uint32_t offset) +{ + assert(offset_slot < 0); + assert(offset > 0); + assert(extent_slot >= 0 && extent_items > 0); + struct field_map_ext *extent = + field_map_builder_ext_get(builder, offset_slot, extent_items); + if (extent == NULL) + return -1; + assert(extent->items == extent_items); + extent->offset[extent_slot] = offset; + return 0; } /** @@ -125,7 +235,8 @@ field_map_builder_set_slot(struct field_map_builder *builder, static inline uint32_t field_map_build_size(struct field_map_builder *builder) { - return builder->slot_count * sizeof(uint32_t); + return builder->slot_count * sizeof(uint32_t) + + builder->extents_size; } /** diff --git a/src/box/index_def.c b/src/box/index_def.c index c743d12ce..8c9a99551 100644 --- a/src/box/index_def.c +++ b/src/box/index_def.c @@ -269,6 +269,11 @@ index_def_is_valid(struct index_def *index_def, const char *space_name) space_name, "too many key parts"); return false; } + if (index_def->iid == 0 && key_def_is_multikey(index_def->key_def)) { + diag_set(ClientError, ER_MODIFY_INDEX, index_def->name, + space_name, "primary key cannot be multikey"); + return false; + } for (uint32_t i = 0; i < index_def->key_def->part_count; i++) { assert(index_def->key_def->parts[i].type < field_type_MAX); if (index_def->key_def->parts[i].fieldno > BOX_INDEX_FIELD_MAX) { diff --git a/src/box/key_def.c b/src/box/key_def.c index 55dcf1eb5..4d36560e0 100644 --- a/src/box/key_def.c +++ b/src/box/key_def.c @@ -138,7 +138,57 @@ key_def_set_func(struct key_def *def) key_def_set_extract_func(def); } -static void +static int +key_def_set_part_path(struct key_def *def, uint32_t part_no, const char *path, + uint32_t path_len, char **path_pool) +{ + struct key_part *part = &def->parts[part_no]; + if (path == NULL) { + part->path = NULL; + part->path_len = 0; + return 0; + } + assert(path_pool != NULL); + part->path = *path_pool; + *path_pool += path_len; + memcpy(part->path, path, path_len); + part->path_len = path_len; + + /* + * Test whether this key_part has array index + * placeholder [*] (i.e. is a part of multikey index + * definition). + */ + int multikey_path_len = + json_path_multikey_offset(path, path_len, TUPLE_INDEX_BASE); + if (multikey_path_len < 0) + return 0; + + /* + * In case of multikey index key_parts must have the + * same JSON prefix. 
+ */ + if (!key_def_is_multikey(def)) { + /* + * Keep the index of the first multikey key_part + * and the length of JSON path substring to the + * array index placeholder sign [*]. + */ + def->multikey_path = part->path; + def->multikey_fieldno = part->fieldno; + def->multikey_path_len = (uint32_t)multikey_path_len; + } else if (json_path_cmp(path, multikey_path_len, def->multikey_path, + def->multikey_path_len, + TUPLE_INDEX_BASE) != 0) { + diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, + part_no + TUPLE_INDEX_BASE, + "incompatable multikey index path"); + return -1; + } + return 0; +} + +static int key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno, enum field_type type, enum on_conflict_action nullable_action, struct coll *coll, uint32_t coll_id, @@ -158,17 +208,8 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno, def->parts[part_no].sort_order = sort_order; def->parts[part_no].offset_slot_cache = offset_slot; def->parts[part_no].format_epoch = format_epoch; - if (path != NULL) { - assert(path_pool != NULL); - def->parts[part_no].path = *path_pool; - *path_pool += path_len; - memcpy(def->parts[part_no].path, path, path_len); - def->parts[part_no].path_len = path_len; - } else { - def->parts[part_no].path = NULL; - def->parts[part_no].path_len = 0; - } column_mask_set_fieldno(&def->column_mask, fieldno); + return key_def_set_part_path(def, part_no, path, path_len, path_pool); } struct key_def * @@ -203,10 +244,14 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count) coll = coll_id->coll; } uint32_t path_len = part->path != NULL ? strlen(part->path) : 0; - key_def_set_part(def, i, part->fieldno, part->type, - part->nullable_action, coll, part->coll_id, - part->sort_order, part->path, path_len, - &path_pool, TUPLE_OFFSET_SLOT_NIL, 0); + if (key_def_set_part(def, i, part->fieldno, part->type, + part->nullable_action, coll, part->coll_id, + part->sort_order, part->path, path_len, + &path_pool, TUPLE_OFFSET_SLOT_NIL, + 0) != 0) { + key_def_delete(def); + return NULL; + } } assert(path_pool == (char *)def + sz); key_def_set_func(def); @@ -256,11 +301,12 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count) key_def->unique_part_count = part_count; for (uint32_t item = 0; item < part_count; ++item) { - key_def_set_part(key_def, item, fields[item], - (enum field_type)types[item], - ON_CONFLICT_ACTION_DEFAULT, - NULL, COLL_NONE, SORT_ORDER_ASC, NULL, 0, - NULL, TUPLE_OFFSET_SLOT_NIL, 0); + if (key_def_set_part(key_def, item, fields[item], + (enum field_type)types[item], + ON_CONFLICT_ACTION_DEFAULT, NULL, + COLL_NONE, SORT_ORDER_ASC, NULL, 0, NULL, + TUPLE_OFFSET_SLOT_NIL, 0) != 0) + unreachable(); } key_def_set_func(key_def); return key_def; @@ -685,11 +731,13 @@ key_def_merge(const struct key_def *first, const struct key_def *second) part = first->parts; end = part + first->part_count; for (; part != end; part++) { - key_def_set_part(new_def, pos++, part->fieldno, part->type, - part->nullable_action, part->coll, - part->coll_id, part->sort_order, part->path, - part->path_len, &path_pool, - part->offset_slot_cache, part->format_epoch); + if (key_def_set_part(new_def, pos++, part->fieldno, part->type, + part->nullable_action, part->coll, + part->coll_id, part->sort_order, + part->path, part->path_len, &path_pool, + part->offset_slot_cache, + part->format_epoch) != 0) + unreachable(); } /* Set-append second key def's part to the new key def. 
*/ @@ -698,11 +746,15 @@ key_def_merge(const struct key_def *first, const struct key_def *second) for (; part != end; part++) { if (!key_def_can_merge(first, part)) continue; - key_def_set_part(new_def, pos++, part->fieldno, part->type, - part->nullable_action, part->coll, - part->coll_id, part->sort_order, part->path, - part->path_len, &path_pool, - part->offset_slot_cache, part->format_epoch); + if (key_def_set_part(new_def, pos++, part->fieldno, part->type, + part->nullable_action, part->coll, + part->coll_id, part->sort_order, + part->path, part->path_len, &path_pool, + part->offset_slot_cache, + part->format_epoch) != 0) { + key_def_delete(new_def); + return NULL; + } } assert(path_pool == (char *)new_def + sz); key_def_set_func(new_def); diff --git a/src/box/key_def.h b/src/box/key_def.h index 288cf7270..3adf20453 100644 --- a/src/box/key_def.h +++ b/src/box/key_def.h @@ -234,6 +234,34 @@ struct key_def { bool has_optional_parts; /** Key fields mask. @sa column_mask.h for details. */ uint64_t column_mask; + /** + * In case of the multikey index, a pointer to the + * JSON path string, the path to the root node of + * multikey index that contains the array having + * index placeholder sign [*]. + * + * This pointer duplicates the JSON path of some key_part. + * This path is not 0-terminated. Moreover, it is only + * JSON path subpath so key_def::multikey_path_len must be + * directly used in all cases. + * + * This field is not NULL iff this is multikey index + * key definition. + */ + const char *multikey_path; + /** + * The length of the key_def::multikey_path. + * Valid when key_def_is_multikey(key_def) is true, + * undefined otherwise. + */ + uint32_t multikey_path_len; + /** + * The index of the root field of the multikey JSON + * path index key_def::multikey_path. + * Valid when key_def_is_multikey(key_def) is true, + * undefined otherwise. + */ + uint32_t multikey_fieldno; /** The size of the 'parts' array. */ uint32_t part_count; /** Description of parts of a multipart index. */ @@ -472,6 +500,12 @@ key_def_is_sequential(const struct key_def *key_def) return true; } +static inline bool +key_def_is_multikey(const struct key_def *key_def) +{ + return key_def->multikey_path != NULL; +} + /** * Return true if @a key_def defines has fields that requires * special collation comparison. diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c index d8529fe0e..ca5f7426d 100644 --- a/src/box/memtx_space.c +++ b/src/box/memtx_space.c @@ -638,6 +638,12 @@ memtx_space_check_index_def(struct space *space, struct index_def *index_def) "HASH index must be unique"); return -1; } + if (key_def_is_multikey(index_def->key_def)) { + diag_set(ClientError, ER_MODIFY_INDEX, + index_def->name, space_name(space), + "HASH index cannot be multikey"); + return -1; + } break; case TREE: /* TREE index has no limitations. 
*/ @@ -661,6 +667,12 @@ memtx_space_check_index_def(struct space *space, struct index_def *index_def) "RTREE index field type must be ARRAY"); return -1; } + if (key_def_is_multikey(index_def->key_def)) { + diag_set(ClientError, ER_MODIFY_INDEX, + index_def->name, space_name(space), + "RTREE index cannot be multikey"); + return -1; + } /* no furter checks of parts needed */ return 0; case BITSET: @@ -683,6 +695,12 @@ memtx_space_check_index_def(struct space *space, struct index_def *index_def) "BITSET index field type must be NUM or STR"); return -1; } + if (key_def_is_multikey(index_def->key_def)) { + diag_set(ClientError, ER_MODIFY_INDEX, + index_def->name, space_name(space), + "BITSET index cannot be multikey"); + return -1; + } /* no furter checks of parts needed */ return 0; default: diff --git a/src/box/memtx_tree.c b/src/box/memtx_tree.c index fe037c54a..fa8a04664 100644 --- a/src/box/memtx_tree.c +++ b/src/box/memtx_tree.c @@ -584,6 +584,77 @@ memtx_tree_index_replace(struct index *base, struct tuple *old_tuple, return 0; } +static int +memtx_tree_index_replace_multikey(struct index *base, struct tuple *old_tuple, + struct tuple *new_tuple, + enum dup_replace_mode mode, + struct tuple **result) +{ + struct memtx_tree_index *index = (struct memtx_tree_index *)base; + struct key_def *cmp_def = memtx_tree_cmp_def(&index->tree); + if (new_tuple != NULL) { + int errcode = 0, tree_res = 0; + struct memtx_tree_data new_data, dup_data; + new_data.tuple = new_tuple; + int multikey_idx = 0; + uint32_t items = tuple_field_multikey_items(new_tuple, cmp_def); + for (; (uint32_t)multikey_idx < items; multikey_idx++) { + new_data.hint = multikey_idx; + dup_data.tuple = NULL; + tree_res = memtx_tree_insert(&index->tree, new_data, + &dup_data); + if (tree_res != 0) { + diag_set(OutOfMemory, MEMTX_EXTENT_SIZE, + "memtx_tree_index", "replace"); + break; + } + errcode = new_tuple == dup_data.tuple ? 0 : + replace_check_dup(old_tuple, dup_data.tuple, + mode); + if (errcode != 0) { + memtx_tree_delete(&index->tree, new_data); + if (dup_data.tuple != NULL) { + /* Rollback replace. 
*/ + memtx_tree_insert(&index->tree, + dup_data, NULL); + } + struct space *sp = + space_cache_find(base->def->space_id); + if (sp != NULL) { + diag_set(ClientError, errcode, + base->def->name, + space_name(sp)); + } + break; + } + } + if (tree_res != 0 || errcode != 0) { + multikey_idx--; + for (; multikey_idx >= 0; multikey_idx--) { + new_data.hint = multikey_idx; + memtx_tree_delete(&index->tree, new_data); + } + return -1; + } + if (dup_data.tuple != NULL) { + *result = dup_data.tuple; + return 0; + } + } + if (old_tuple != NULL) { + struct memtx_tree_data old_data; + old_data.tuple = old_tuple; + uint32_t items = tuple_field_multikey_items(old_tuple, cmp_def); + for (uint32_t multikey_idx = 0; multikey_idx < items; + multikey_idx++) { + old_data.hint = multikey_idx; + memtx_tree_delete(&index->tree, old_data); + } + } + *result = old_tuple; + return 0; +} + static struct iterator * memtx_tree_index_create_iterator(struct index *base, enum iterator_type type, const char *key, uint32_t part_count) @@ -656,34 +727,42 @@ memtx_tree_index_reserve(struct index *base, uint32_t size_hint) } static int -memtx_tree_index_build_next(struct index *base, struct tuple *tuple) +memtx_tree_index_build_array_realloc(struct memtx_tree_index *index, + uint32_t items) { - struct memtx_tree_index *index = (struct memtx_tree_index *)base; - struct key_def *cmp_def = memtx_tree_cmp_def(&index->tree); + bool need_realloc = false; if (index->build_array == NULL) { - index->build_array = malloc(MEMTX_EXTENT_SIZE); - if (index->build_array == NULL) { - diag_set(OutOfMemory, MEMTX_EXTENT_SIZE, - "memtx_tree_index", "build_next"); - return -1; - } - index->build_array_alloc_size = - MEMTX_EXTENT_SIZE / sizeof(index->build_array[0]); + index->build_array_alloc_size = MEMTX_EXTENT_SIZE / + sizeof(index->build_array[0]); + need_realloc = true; } - assert(index->build_array_size <= index->build_array_alloc_size); - if (index->build_array_size == index->build_array_alloc_size) { - index->build_array_alloc_size = index->build_array_alloc_size + - index->build_array_alloc_size / 2; - struct memtx_tree_data *tmp = - realloc(index->build_array, - index->build_array_alloc_size * sizeof(*tmp)); - if (tmp == NULL) { - diag_set(OutOfMemory, index->build_array_alloc_size * - sizeof(*tmp), "memtx_tree_index", "build_next"); - return -1; - } - index->build_array = tmp; + uint32_t build_array_new_size = index->build_array_size + items; + if (build_array_new_size > index->build_array_alloc_size) { + index->build_array_alloc_size += + MAX(index->build_array_alloc_size / 2, + build_array_new_size - index->build_array_alloc_size); + need_realloc = true; + } + if (!need_realloc) + return 0; + struct memtx_tree_data *tmp = realloc(index->build_array, + index->build_array_alloc_size * sizeof(*tmp)); + if (tmp == NULL) { + diag_set(OutOfMemory, index->build_array_alloc_size * + sizeof(*tmp), "memtx_tree_index", "build_next"); + return -1; } + index->build_array = tmp; + return 0; +} + +static int +memtx_tree_index_build_next(struct index *base, struct tuple *tuple) +{ + struct memtx_tree_index *index = (struct memtx_tree_index *)base; + struct key_def *cmp_def = memtx_tree_cmp_def(&index->tree); + if (memtx_tree_index_build_array_realloc(index, 1) != 0) + return -1; struct memtx_tree_data *elem = &index->build_array[index->build_array_size++]; elem->tuple = tuple; @@ -691,6 +770,24 @@ memtx_tree_index_build_next(struct index *base, struct tuple *tuple) return 0; } +static int +memtx_tree_index_build_next_multikey(struct index *base, 
struct tuple *tuple) +{ + struct memtx_tree_index *index = (struct memtx_tree_index *)base; + struct key_def *cmp_def = memtx_tree_cmp_def(&index->tree); + uint32_t items = tuple_field_multikey_items(tuple, cmp_def); + if (memtx_tree_index_build_array_realloc(index, items) != 0) + return -1; + for (uint32_t multikey_idx = 0; multikey_idx < items; multikey_idx++) { + struct memtx_tree_data *elem = + &index->build_array[index->build_array_size++]; + assert(index->build_array_size <= index->build_array_alloc_size); + elem->tuple = tuple; + elem->hint = multikey_idx; + } + return 0; +} + static void memtx_tree_index_end_build(struct index *base) { @@ -793,6 +890,36 @@ static const struct index_vtab memtx_tree_index_vtab = { /* .end_build = */ memtx_tree_index_end_build, }; +static const struct index_vtab memtx_tree_index_multikey_vtab = { + /* .destroy = */ memtx_tree_index_destroy, + /* .commit_create = */ generic_index_commit_create, + /* .abort_create = */ generic_index_abort_create, + /* .commit_modify = */ generic_index_commit_modify, + /* .commit_drop = */ generic_index_commit_drop, + /* .update_def = */ memtx_tree_index_update_def, + /* .depends_on_pk = */ memtx_tree_index_depends_on_pk, + /* .def_change_requires_rebuild = */ + memtx_index_def_change_requires_rebuild, + /* .size = */ memtx_tree_index_size, + /* .bsize = */ memtx_tree_index_bsize, + /* .min = */ generic_index_min, + /* .max = */ generic_index_max, + /* .random = */ memtx_tree_index_random, + /* .count = */ memtx_tree_index_count, + /* .get = */ memtx_tree_index_get, + /* .replace = */ memtx_tree_index_replace_multikey, + /* .create_iterator = */ memtx_tree_index_create_iterator, + /* .create_snapshot_iterator = */ + memtx_tree_index_create_snapshot_iterator, + /* .stat = */ generic_index_stat, + /* .compact = */ generic_index_compact, + /* .reset_stat = */ generic_index_reset_stat, + /* .begin_build = */ memtx_tree_index_begin_build, + /* .reserve = */ memtx_tree_index_reserve, + /* .build_next = */ memtx_tree_index_build_next_multikey, + /* .end_build = */ memtx_tree_index_end_build, +}; + struct index * memtx_tree_index_new(struct memtx_engine *memtx, struct index_def *def) { @@ -803,8 +930,11 @@ memtx_tree_index_new(struct memtx_engine *memtx, struct index_def *def) "malloc", "struct memtx_tree_index"); return NULL; } + const struct index_vtab *vtab = key_def_is_multikey(def->key_def) ? 
+ &memtx_tree_index_multikey_vtab : + &memtx_tree_index_vtab; if (index_create(&index->base, (struct engine *)memtx, - &memtx_tree_index_vtab, def) != 0) { + vtab, def) != 0) { free(index); return NULL; } diff --git a/src/box/sql.c b/src/box/sql.c index 7d5c3a8e0..be2212ef1 100644 --- a/src/box/sql.c +++ b/src/box/sql.c @@ -788,7 +788,8 @@ tarantoolsqlIdxKeyCompare(struct BtCursor *cursor, } else { uint32_t field_offset = field_map_get_offset(field_map, - field->offset_slot); + field->offset_slot, + -1); p = base + field_offset; } } diff --git a/src/box/tuple.c b/src/box/tuple.c index 570e4c192..8c2cd7611 100644 --- a/src/box/tuple.c +++ b/src/box/tuple.c @@ -130,7 +130,7 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple) int rc = tuple_field_map_create(&builder, format, tuple, true); region_truncate(region, region_svp); assert(rc != 0 || - field_map_build_size(&builder) == format->field_map_size); + field_map_build_size(&builder) >= format->field_map_size); return rc; } @@ -455,7 +455,8 @@ tuple_field_go_to_key(const char **field, const char *key, int len) } int -tuple_go_to_path(const char **data, const char *path, uint32_t path_len) +tuple_go_to_path(const char **data, const char *path, uint32_t path_len, + int multikey_idx) { int rc; struct json_lexer lexer; @@ -463,6 +464,10 @@ tuple_go_to_path(const char **data, const char *path, uint32_t path_len) json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE); while ((rc = json_lexer_next_token(&lexer, &token)) == 0) { switch (token.type) { + case JSON_TOKEN_ANY: + assert(multikey_idx >= 0); + token.num = multikey_idx; + FALLTHROUGH; case JSON_TOKEN_NUM: rc = tuple_field_go_to_index(data, token.num); break; @@ -532,7 +537,24 @@ tuple_field_raw_by_full_path(struct tuple_format *format, const char *tuple, } return tuple_field_raw_by_path(format, tuple, field_map, fieldno, path + lexer.offset, - path_len - lexer.offset, NULL); + path_len - lexer.offset, NULL, -1); +} + +uint32_t +tuple_field_raw_multikey_items(struct tuple_format *format, const char *data, + const uint32_t *field_map, + struct key_def *key_def) +{ + assert(key_def_is_multikey(key_def)); + const char *array_raw = + tuple_field_raw_by_path(format, data, field_map, + key_def->multikey_fieldno, + key_def->multikey_path, + key_def->multikey_path_len, NULL, -1); + if (array_raw == NULL) + return 0; + assert(mp_typeof(*array_raw) == MP_ARRAY); + return mp_decode_array(&array_raw); } /* }}} tuple_field_* getters */ diff --git a/src/box/tuple.h b/src/box/tuple.h index da4085bcf..99753c8d5 100644 --- a/src/box/tuple.h +++ b/src/box/tuple.h @@ -508,14 +508,19 @@ tuple_field_count(const struct tuple *tuple) * with NULL. * @param path The path to process. * @param path_len The length of the @path. + * @param multikey_idx The multikey index hint - index of + * document to retrieve when array index + * placeholder "[*]" is met. * @retval 0 On success. * @retval -1 In case of error in JSON path. */ int -tuple_go_to_path(const char **data, const char *path, uint32_t path_len); +tuple_go_to_path(const char **data, const char *path, uint32_t path_len, + int multikey_idx); /** - * Get tuple field by field index and relative JSON path. + * Get tuple field by field index, relative JSON path and + * multikey_idx. * @param format Tuple format. * @param tuple MessagePack tuple's body. * @param field_map Tuple field map. @@ -528,12 +533,15 @@ tuple_go_to_path(const char **data, const char *path, uint32_t path_len); * access data in a single operation. 
* Else it is initialized with offset_slot * of format field by path. + * @param multikey_idx The multikey index hint - index of + * document to retrieve when array index + * placeholder "[*]" is met. */ static inline const char * tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, const uint32_t *field_map, uint32_t fieldno, const char *path, uint32_t path_len, - int32_t *offset_slot_hint) + int32_t *offset_slot_hint, int multikey_idx) { int32_t offset_slot; if (offset_slot_hint != NULL && @@ -560,7 +568,8 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, *offset_slot_hint = offset_slot; offset_slot_access: /* Indexed field */ - offset = field_map_get_offset(field_map, offset_slot); + offset = field_map_get_offset(field_map, offset_slot, + multikey_idx); if (offset == 0) return NULL; tuple += offset; @@ -574,8 +583,8 @@ parse: for (uint32_t k = 0; k < fieldno; k++) mp_next(&tuple); if (path != NULL && - unlikely(tuple_go_to_path(&tuple, path, - path_len) != 0)) + unlikely(tuple_go_to_path(&tuple, path, path_len, + multikey_idx) != 0)) return NULL; } return tuple; @@ -597,7 +606,7 @@ tuple_field_raw(struct tuple_format *format, const char *tuple, const uint32_t *field_map, uint32_t field_no) { return tuple_field_raw_by_path(format, tuple, field_map, field_no, - NULL, 0, NULL); + NULL, 0, NULL, -1); } /** @@ -636,16 +645,19 @@ tuple_field_raw_by_full_path(struct tuple_format *format, const char *tuple, uint32_t path_len, uint32_t path_hash); /** - * Get a tuple field pointed to by an index part. + * Get a tuple field pointed to by an index part and multikey + * index hint. * @param format Tuple format. - * @param tuple A pointer to MessagePack array. + * @param data A pointer to MessagePack array. * @param field_map A pointer to the LAST element of field map. * @param part Index part to use. + * @param multikey_idx A multikey index hint. * @retval Field data if the field exists or NULL. */ static inline const char * -tuple_field_raw_by_part(struct tuple_format *format, const char *data, - const uint32_t *field_map, struct key_part *part) +tuple_field_raw_by_part_multikey(struct tuple_format *format, const char *data, + const uint32_t *field_map, + struct key_part *part, int multikey_idx) { if (unlikely(part->format_epoch != format->epoch)) { assert(format->epoch != 0); @@ -658,7 +670,23 @@ tuple_field_raw_by_part(struct tuple_format *format, const char *data, } return tuple_field_raw_by_path(format, data, field_map, part->fieldno, part->path, part->path_len, - &part->offset_slot_cache); + &part->offset_slot_cache, multikey_idx); +} + +/** + * Get a tuple field pointed to by an index part. + * @param format Tuple format. + * @param data A pointer to MessagePack array. + * @param field_map A pointer to the LAST element of field map. + * @param part Index part to use. + * @retval Field data if the field exists or NULL. + */ +static inline const char * +tuple_field_raw_by_part(struct tuple_format *format, const char *data, + const uint32_t *field_map, struct key_part *part) +{ + return tuple_field_raw_by_part_multikey(format, data, field_map, + part, -1); } /** @@ -674,6 +702,35 @@ tuple_field_by_part(const struct tuple *tuple, struct key_part *part) tuple_field_map(tuple), part); } +/** + * Get count of documents in tuple by given multikey index + * definition. + * @param format Tuple format. + * @param data A pointer to MessagePack array. + * @param field_map A pointer to the LAST element of field map. + * @param key_def Index key_definition. 
+ * @retval Count of documents in the multikey index. + */ +uint32_t +tuple_field_raw_multikey_items(struct tuple_format *format, const char *data, + const uint32_t *field_map, + struct key_def *key_def); + +/** + * Get count of documents in tuple by given multikey index + * definition. + * @param tuple Tuple to get the count of documents from. + * @param key_def Index key_definition. + * @retval Count of documents in the multikey index. + */ +static inline uint32_t +tuple_field_multikey_items(struct tuple *tuple, struct key_def *key_def) +{ + return tuple_field_raw_multikey_items(tuple_format(tuple), + tuple_data(tuple), + tuple_field_map(tuple), key_def); +} + /** * @brief Tuple Interator */ diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc index 93756365b..bf3a23097 100644 --- a/src/box/tuple_compare.cc +++ b/src/box/tuple_compare.cc @@ -375,6 +375,29 @@ mp_compare_scalar_coll(const char *field_a, const char *field_b, return mp_compare_scalar_with_type(field_a, type_a, field_b, type_b); } +static inline int +tuple_compare_stub(const struct tuple *tuple_a, const struct tuple *tuple_b, + struct key_def *key_def) +{ + (void)tuple_a; + (void)tuple_b; + (void)key_def; + unreachable(); + return 0; +} + +static inline int +tuple_compare_with_key_stub(const struct tuple *tuple, const char *key, + uint32_t part_count, struct key_def *key_def) +{ + (void) tuple; + (void) key; + (void) part_count; + (void) key_def; + unreachable(); + return 0; +} + /** * @brief Compare two fields parts using a type definition * @param field_a field @@ -445,7 +468,8 @@ tuple_compare_field_with_type(const char *field_a, enum mp_type a_type, } } -template<bool is_nullable, bool has_optional_parts, bool has_json_paths> +template<bool is_nullable, bool has_optional_parts, bool has_json_paths, + bool is_multikey> static inline int tuple_compare_slowpath_hinted(const struct tuple *tuple_a, hint_t tuple_a_hint, const struct tuple *tuple_b, hint_t tuple_b_hint, @@ -455,8 +479,10 @@ tuple_compare_slowpath_hinted(const struct tuple *tuple_a, hint_t tuple_a_hint, assert(!has_optional_parts || is_nullable); assert(is_nullable == key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); - int rc = hint_cmp(tuple_a_hint, tuple_b_hint); - if (rc != 0) + assert(key_def_is_multikey(key_def) == is_multikey); + assert(!is_multikey || is_multikey == has_json_paths); + int rc = 0; + if (!is_multikey && (rc = hint_cmp(tuple_a_hint, tuple_b_hint)) != 0) return rc; struct key_part *part = key_def->parts; const char *tuple_a_raw = tuple_data(tuple_a); @@ -499,7 +525,14 @@ tuple_compare_slowpath_hinted(const struct tuple *tuple_a, hint_t tuple_a_hint, end = part + key_def->part_count; for (; part < end; part++) { - if (has_json_paths) { + if (is_multikey) { + field_a = tuple_field_raw_by_part_multikey(format_a, + tuple_a_raw, field_map_a, part, + (int)tuple_a_hint); + field_b = tuple_field_raw_by_part_multikey(format_b, + tuple_b_raw, field_map_b, part, + (int)tuple_b_hint); + } else if (has_json_paths) { field_a = tuple_field_raw_by_part(format_a, tuple_a_raw, field_map_a, part); field_b = tuple_field_raw_by_part(format_b, tuple_b_raw, @@ -556,7 +589,14 @@ tuple_compare_slowpath_hinted(const struct tuple *tuple_a, hint_t tuple_a_hint, */ end = key_def->parts + key_def->part_count; for (; part < end; ++part) { - if (has_json_paths) { + if (is_multikey) { + field_a = tuple_field_raw_by_part_multikey(format_a, + tuple_a_raw, field_map_a, part, + (int)tuple_a_hint); + field_b = 
tuple_field_raw_by_part_multikey(format_b, + tuple_b_raw, field_map_b, part, + (int)tuple_b_hint); + } else if (has_json_paths) { field_a = tuple_field_raw_by_part(format_a, tuple_a_raw, field_map_a, part); field_b = tuple_field_raw_by_part(format_b, tuple_b_raw, @@ -586,11 +626,12 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, struct key_def *key_def) { return tuple_compare_slowpath_hinted - <is_nullable, has_optional_parts, has_json_paths> + <is_nullable, has_optional_parts, has_json_paths, false> (tuple_a, HINT_NONE, tuple_b, HINT_NONE, key_def); } -template<bool is_nullable, bool has_optional_parts, bool has_json_paths> +template<bool is_nullable, bool has_optional_parts, bool has_json_paths, + bool is_multikey> static inline int tuple_compare_with_key_slowpath_hinted(const struct tuple *tuple, hint_t tuple_hint, const char *key, uint32_t part_count, @@ -602,8 +643,10 @@ tuple_compare_with_key_slowpath_hinted(const struct tuple *tuple, assert(has_optional_parts == key_def->has_optional_parts); assert(key != NULL || part_count == 0); assert(part_count <= key_def->part_count); - int rc = hint_cmp(tuple_hint, key_hint); - if (rc != 0) + assert(key_def_is_multikey(key_def) == is_multikey); + assert(!is_multikey || is_multikey == has_json_paths); + int rc = 0; + if (!is_multikey && (rc = hint_cmp(tuple_hint, key_hint)) != 0) return rc; struct key_part *part = key_def->parts; struct tuple_format *format = tuple_format(tuple); @@ -612,7 +655,11 @@ tuple_compare_with_key_slowpath_hinted(const struct tuple *tuple, enum mp_type a_type, b_type; if (likely(part_count == 1)) { const char *field; - if (has_json_paths) { + if (is_multikey) { + field = tuple_field_raw_by_part_multikey(format, + tuple_raw, field_map, part, + (int)tuple_hint); + } else if (has_json_paths) { field = tuple_field_raw_by_part(format, tuple_raw, field_map, part); } else { @@ -642,7 +689,11 @@ tuple_compare_with_key_slowpath_hinted(const struct tuple *tuple, struct key_part *end = part + part_count; for (; part < end; ++part, mp_next(&key)) { const char *field; - if (has_json_paths) { + if (is_multikey) { + field = tuple_field_raw_by_part_multikey(format, + tuple_raw, field_map, part, + (int)tuple_hint); + } else if (has_json_paths) { field = tuple_field_raw_by_part(format, tuple_raw, field_map, part); } else { @@ -684,7 +735,7 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, uint32_t part_count, struct key_def *key_def) { return tuple_compare_with_key_slowpath_hinted - <is_nullable, has_optional_parts, has_json_paths> + <is_nullable, has_optional_parts, has_json_paths, false> (tuple, HINT_NONE, key, part_count, HINT_NONE, key_def); } @@ -1314,6 +1365,16 @@ static const comparator_with_key_signature cmp_wk_arr[] = { /* {{{ tuple_hint */ /** + * Hints are now used for two purposes - passing the index of the + * key used in the case of multikey index and to speed up the + * comparators. + * + * Scenario I. Pass the multikey index of the key to comparator. + * In the case of multikey index arises an ambiguity: which key + * should be used in the comparison. Hints act as an non-negative + * numeric index of key to use. + * + * Scenario II. Speed up comparators. 
* A comparison hint is an unsigned integer number that has * the following layout: * @@ -1611,12 +1672,35 @@ tuple_hint(const struct tuple *tuple, struct key_def *key_def) return field_hint<type, is_nullable>(field, key_def->parts->coll); } +static hint_t +key_hint_stub(const char *key, uint32_t part_count, struct key_def *key_def) +{ + (void) key; + (void) part_count; + (void) key_def; + return HINT_NONE; +} + +static hint_t +tuple_hint_stub(const struct tuple *tuple, struct key_def *key_def) +{ + (void) tuple; + (void) key_def; + unreachable(); + return HINT_NONE; +} + template<enum field_type type, bool is_nullable> static void key_def_set_hint_func(struct key_def *def) { - def->key_hint = key_hint<type, is_nullable>; - def->tuple_hint = tuple_hint<type, is_nullable>; + if (key_def_is_multikey(def)) { + def->key_hint = key_hint_stub; + def->tuple_hint = tuple_hint_stub; + } else { + def->key_hint = key_hint<type, is_nullable>; + def->tuple_hint = tuple_hint<type, is_nullable>; + } } template<enum field_type type> @@ -1710,7 +1794,7 @@ key_def_set_compare_func_fast(struct key_def *def) tuple_compare_slowpath<false, false, false>; cmp_hinted = is_sequential ? tuple_compare_sequential_hinted<false, false> : - tuple_compare_slowpath_hinted<false, false, false>; + tuple_compare_slowpath_hinted<false, false, false, false>; } if (cmp_wk == NULL) { cmp_wk = is_sequential ? @@ -1718,7 +1802,8 @@ key_def_set_compare_func_fast(struct key_def *def) tuple_compare_with_key_slowpath<false, false, false>; cmp_wk_hinted = is_sequential ? tuple_compare_with_key_sequential_hinted<false, false> : - tuple_compare_with_key_slowpath_hinted<false, false, false>; + tuple_compare_with_key_slowpath_hinted<false, false, + false, false>; } def->tuple_compare = cmp; @@ -1746,12 +1831,13 @@ key_def_set_compare_func_plain(struct key_def *def) def->tuple_compare = tuple_compare_slowpath <is_nullable, has_optional_parts, false>; def->tuple_compare_hinted = tuple_compare_slowpath_hinted - <is_nullable, has_optional_parts, false>; + <is_nullable, has_optional_parts, false, false>; def->tuple_compare_with_key = tuple_compare_with_key_slowpath <is_nullable, has_optional_parts, false>; def->tuple_compare_with_key_hinted = tuple_compare_with_key_slowpath_hinted - <is_nullable, has_optional_parts, false>; + <is_nullable, has_optional_parts, + false, false>; } } @@ -1760,15 +1846,25 @@ static void key_def_set_compare_func_json(struct key_def *def) { assert(def->has_json_paths); - def->tuple_compare = tuple_compare_slowpath - <is_nullable, has_optional_parts, true>; - def->tuple_compare_hinted = tuple_compare_slowpath_hinted - <is_nullable, has_optional_parts, true>; - def->tuple_compare_with_key = tuple_compare_with_key_slowpath - <is_nullable, has_optional_parts, true>; - def->tuple_compare_with_key_hinted = - tuple_compare_with_key_slowpath_hinted - <is_nullable, has_optional_parts, true>; + if (key_def_is_multikey(def)) { + def->tuple_compare = tuple_compare_stub; + def->tuple_compare_hinted = tuple_compare_slowpath_hinted + <is_nullable, has_optional_parts, true, true>; + def->tuple_compare_with_key = tuple_compare_with_key_stub; + def->tuple_compare_with_key_hinted = + tuple_compare_with_key_slowpath_hinted + <is_nullable, has_optional_parts, true, true>; + } else { + def->tuple_compare = tuple_compare_slowpath + <is_nullable, has_optional_parts, true>; + def->tuple_compare_hinted = tuple_compare_slowpath_hinted + <is_nullable, has_optional_parts, true, false>; + def->tuple_compare_with_key = 
tuple_compare_with_key_slowpath + <is_nullable, has_optional_parts, true>; + def->tuple_compare_with_key_hinted = + tuple_compare_with_key_slowpath_hinted + <is_nullable, has_optional_parts, true, false>; + } } void diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc index 836d4e565..081677ff6 100644 --- a/src/box/tuple_extract_key.cc +++ b/src/box/tuple_extract_key.cc @@ -30,6 +30,29 @@ key_def_contains_sequential_parts(const struct key_def *def) return false; } +static inline char * +tuple_extract_key_stub(const struct tuple *tuple, struct key_def *key_def, + uint32_t *key_size) +{ + (void) tuple; + (void) key_def; + (void) key_size; + unreachable(); + return NULL; +} + +static char * +tuple_extract_key_raw_stub(const char *data, const char *data_end, + struct key_def *key_def, uint32_t *key_size) +{ + (void) data; + (void) data_end; + (void) key_def; + (void) key_size; + unreachable(); + return NULL; +} + /** * Optimized version of tuple_extract_key_raw() for sequential key defs * @copydoc tuple_extract_key_raw() @@ -310,7 +333,7 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, const char *src_end = field_end; if (has_json_paths && part->path != NULL) { if (tuple_go_to_path(&src, part->path, - part->path_len) != 0) { + part->path_len, -1) != 0) { /* * The path must be correct as * it has already been validated @@ -349,6 +372,7 @@ static void key_def_set_extract_func_plain(struct key_def *def) { assert(!def->has_json_paths); + assert(!key_def_is_multikey(def)); if (key_def_is_sequential(def)) { assert(contains_sequential_parts || def->part_count == 1); def->tuple_extract_key = tuple_extract_key_sequential @@ -369,11 +393,16 @@ static void key_def_set_extract_func_json(struct key_def *def) { assert(def->has_json_paths); - def->tuple_extract_key = tuple_extract_key_slowpath - <contains_sequential_parts, - has_optional_parts, true>; - def->tuple_extract_key_raw = tuple_extract_key_slowpath_raw - <has_optional_parts, true>; + if (!key_def_is_multikey(def)) { + def->tuple_extract_key = tuple_extract_key_slowpath + <contains_sequential_parts, + has_optional_parts, true>; + def->tuple_extract_key_raw = tuple_extract_key_slowpath_raw + <has_optional_parts, true>; + } else { + def->tuple_extract_key = tuple_extract_key_stub; + def->tuple_extract_key_raw = tuple_extract_key_raw_stub; + } } void diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 9a643b700..33d6db420 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -194,6 +194,50 @@ tuple_format_field_by_id(struct tuple_format *format, uint32_t id) return NULL; } +/** + * Check if child_field may be attached to parent_field, + * update the parent_field container type if required. + */ +static int +tuple_format_field_update_type(struct tuple_field *parent_field, + struct tuple_field *child_field) +{ + enum field_type expected_type = + child_field->token.type == JSON_TOKEN_STR ? 
+ FIELD_TYPE_MAP : FIELD_TYPE_ARRAY; + if (child_field->token.type == JSON_TOKEN_ANY && + !json_token_is_multikey(&parent_field->token) && + !json_token_is_leaf(&parent_field->token)) { + assert(expected_type == FIELD_TYPE_ARRAY); + diag_set(ClientError, ER_INDEX_MULTIKEY_INVALID, + tt_sprintf("field %s is already refer to document by " + "identifier and cannot use array index " + "placeholder [*]", + tuple_field_path(parent_field))); + return -1; + } + if (json_token_is_multikey(&parent_field->token) && + child_field->token.type != JSON_TOKEN_ANY) { + assert(expected_type == FIELD_TYPE_ARRAY); + diag_set(ClientError, ER_INDEX_MULTIKEY_INVALID, + tt_sprintf("field %s is already used with array index " + "placeholder [*] and cannot refer to " + "document by identifier", + tuple_field_path(parent_field))); + return -1; + } + if (field_type1_contains_type2(parent_field->type, expected_type)) { + parent_field->type = expected_type; + } else { + diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH, + tuple_field_path(parent_field), + field_type_strs[parent_field->type], + field_type_strs[expected_type]); + return -1; + } + return 0; +} + /** * Given a field number and a path, add the corresponding field * to the tuple format, allocating intermediate fields if @@ -228,29 +272,16 @@ tuple_format_add_field(struct tuple_format *format, uint32_t fieldno, *path_pool += path_len; int rc = 0; - uint32_t token_count = 0; + uint32_t token_count = 0, json_token_any_count = 0; struct json_tree *tree = &format->fields; struct json_lexer lexer; json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE); while ((rc = json_lexer_next_token(&lexer, &field->token)) == 0 && field->token.type != JSON_TOKEN_END) { - if (field->token.type == JSON_TOKEN_ANY) { - diag_set(ClientError, ER_UNSUPPORTED, - "Tarantool", "multikey indexes"); + if (tuple_format_field_update_type(parent, field) != 0) goto fail; - } - enum field_type expected_type = - field->token.type == JSON_TOKEN_STR ? - FIELD_TYPE_MAP : FIELD_TYPE_ARRAY; - if (field_type1_contains_type2(parent->type, expected_type)) { - parent->type = expected_type; - } else { - diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH, - tuple_field_path(parent), - field_type_strs[parent->type], - field_type_strs[expected_type]); - goto fail; - } + if (field->token.type == JSON_TOKEN_ANY) + json_token_any_count++; struct tuple_field *next = json_tree_lookup_entry(tree, &parent->token, &field->token, @@ -268,6 +299,18 @@ tuple_format_add_field(struct tuple_format *format, uint32_t fieldno, if (field == NULL) goto fail; } + if (json_token_is_multikey(&parent->token) && + parent->offset_slot == TUPLE_OFFSET_SLOT_NIL) { + /** + * Allocate offset slot for array is used + * in multikey index. This is required to + * quickly extract its size. + * @see tuple_field_multikey_items() + */ + assert(parent->type == FIELD_TYPE_ARRAY); + *current_slot = *current_slot - 1; + parent->offset_slot = *current_slot; + } parent->is_key_part = true; parent = next; token_count++; @@ -280,6 +323,13 @@ tuple_format_add_field(struct tuple_format *format, uint32_t fieldno, assert(parent != NULL); /* Update tree depth information. 
*/ format->fields_depth = MAX(format->fields_depth, token_count + 1); + if (json_token_any_count > 1) { + assert(path_len > 0); + diag_set(ClientError, ER_INDEX_MULTIKEY_INVALID, + "no more than one array index placeholder [*] is " + "allowed in JSON path"); + goto fail; + } cleanup: tuple_field_delete(field); end: @@ -847,7 +897,16 @@ tuple_field_map_create(struct field_map_builder *builder, return -1; } /* Initialize field_map with data offset. */ - if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { + int multikey_idx = tuple_parse_iterator_multikey_idx(&it); + if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL && + multikey_idx >= 0) { + int multikey_items = + tuple_parse_iterator_multikey_items(&it); + if (field_map_builder_set_extent_slot(builder, + field->offset_slot, multikey_idx, + multikey_items, pos - tuple) != 0) + return -1; + } else if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { field_map_builder_set_slot(builder, field->offset_slot, pos - tuple); } @@ -966,6 +1025,7 @@ tuple_parse_iterator_create(struct tuple_parse_iterator *it, it->parent = &format->fields.root; it->format = format; it->pos = data; + it->multikey_frame = NULL; return 0; } @@ -986,6 +1046,14 @@ tuple_parse_iterator_advice(struct tuple_parse_iterator *it, mp_stack_pop(&it->stack); if (mp_stack_is_empty(&it->stack)) return rc; + if (json_token_is_multikey(it->parent)) { + /* + * As we leave the multikey branch, + * we need to reset the pointer to + * multikey_frame. + */ + it->multikey_frame = NULL; + } it->parent = it->parent->parent; } /* @@ -1038,6 +1106,19 @@ tuple_parse_iterator_advice(struct tuple_parse_iterator *it, mp_decode_array(&it->pos) : mp_decode_map(&it->pos); mp_stack_push(&it->stack, type, size); + if (json_token_is_multikey(&(*field)->token)) { + /** + * Keep a pointer to the frame that + * describes an array with index + * placeholder [*]. The "current" item + * of this frame matches the logical + * index of document in multikey index + * and is equal to multikey index + * comparison hint. + */ + assert(type == MP_ARRAY); + it->multikey_frame = mp_stack_top(&it->stack); + } it->parent = &(*field)->token; } else { mp_next(&it->pos); diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index 88f03d5eb..d07ca91eb 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -437,6 +437,12 @@ struct tuple_parse_iterator { * pointer accordingly. */ struct mp_stack stack; + /** + * The pointer to the stack frame representing an array + * filed that has JSON_TOKEN_ANY child, i.e. the root + * of the multikey index. + */ + struct mp_frame *multikey_frame; /** The current read position in msgpack. */ const char *pos; }; @@ -479,6 +485,32 @@ tuple_parse_iterator_advice(struct tuple_parse_iterator *it, struct tuple_field **field, const char **data, const char **data_end); +/** + * Returns non-negative multikey index comparison hint when + * the iterator's position in the format::fields subtree + * corresponds to the multikey subtree and -1 otherwise. + */ +static inline int +tuple_parse_iterator_multikey_idx(struct tuple_parse_iterator *it) +{ + if (it->multikey_frame == NULL) + return -1; + return it->multikey_frame->curr; +} + +/** + * Returns positive number - a count of document's in multikey + * index for when the iterator's position in the format::fields + * subtree corresponds to the multikey subtree and -1 otherwise. 
+ */ +static inline int +tuple_parse_iterator_multikey_items(struct tuple_parse_iterator *it) +{ + if (it->multikey_frame == NULL) + return -1; + return it->multikey_frame->count; +} + #if defined(__cplusplus) } /* extern "C" */ #endif /* defined(__cplusplus) */ diff --git a/src/box/vinyl.c b/src/box/vinyl.c index da8910255..c2860098a 100644 --- a/src/box/vinyl.c +++ b/src/box/vinyl.c @@ -699,6 +699,11 @@ vinyl_space_check_index_def(struct space *space, struct index_def *index_def) return -1; } } + if (key_def_is_multikey(index_def->key_def)) { + diag_set(ClientError, ER_UNSUPPORTED, + "Vinyl", "multikey indexes"); + return -1; + } return 0; } diff --git a/test/box/misc.result b/test/box/misc.result index e2b618c9c..b89a7ed2f 100644 --- a/test/box/misc.result +++ b/test/box/misc.result @@ -522,6 +522,7 @@ t; 191: box.error.SQL_PARSER_LIMIT 192: box.error.INDEX_DEF_UNSUPPORTED 193: box.error.CK_DEF_UNSUPPORTED + 194: box.error.INDEX_MULTIKEY_INVALID ... test_run:cmd("setopt delimiter ''"); --- diff --git a/test/engine/json.result b/test/engine/json.result index 09c704963..6ccfe92b7 100644 --- a/test/engine/json.result +++ b/test/engine/json.result @@ -705,16 +705,3 @@ s:replace({4, {"d1", name='D1'}, "test"}) s:drop() --- ... --- --- gh-1260: Multikey indexes --- -s = box.schema.space.create('withdata') ---- -... -idx = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) ---- -- error: Tarantool does not support multikey indexes -... -s:drop() ---- -... diff --git a/test/engine/json.test.lua b/test/engine/json.test.lua index 5c235e1ba..05c64ea93 100644 --- a/test/engine/json.test.lua +++ b/test/engine/json.test.lua @@ -201,10 +201,3 @@ idx0 = s:create_index('idx0', {parts = {{2, 'str', path = 'name'}, {3, "str"}}}) s:insert({4, {"d", name='D'}, "test"}) s:replace({4, {"d1", name='D1'}, "test"}) s:drop() - --- --- gh-1260: Multikey indexes --- -s = box.schema.space.create('withdata') -idx = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) -s:drop() diff --git a/test/engine/multikey.result b/test/engine/multikey.result new file mode 100644 index 000000000..dd6487383 --- /dev/null +++ b/test/engine/multikey.result @@ -0,0 +1,298 @@ +test_run = require('test_run').new() +--- +... +engine = test_run:get_cfg('engine') +--- +... +-- +-- gh-1260: Multikey indexes +-- +s = box.schema.space.create('withdata', {engine = 'vinyl'}) +--- +... +pk = s:create_index('pk') +--- +... +-- Vinyl's space can't be multikey (yet). +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +--- +- error: Vinyl does not support multikey indexes +... +s:drop() +--- +... +s = box.schema.space.create('withdata', {engine = 'memtx'}) +--- +... +-- Primary index must be unique so it can't be multikey. +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}}}) +--- +- error: 'Can''t create or modify index ''idx'' in space ''withdata'': primary key + cannot be multikey' +... +pk = s:create_index('pk') +--- +... +-- Only tree index type may be mutlikey. +_ = s:create_index('idx', {type = 'hash', unique = true, parts = {{3, 'str', path = '[*].fname'}}}) +--- +- error: 'Can''t create or modify index ''idx'' in space ''withdata'': HASH index + cannot be multikey' +... 
+_ = s:create_index('idx', {type = 'bitset', unique = false, parts = {{3, 'str', path = '[*].fname'}}}) +--- +- error: 'Can''t create or modify index ''idx'' in space ''withdata'': BITSET index + cannot be multikey' +... +_ = s:create_index('idx', {type = 'rtree', unique = false, parts = {{3, 'array', path = '[*].fname'}}}) +--- +- error: 'Can''t create or modify index ''idx'' in space ''withdata'': RTREE index + cannot be multikey' +... +-- Test incompatible multikey index parts. +_ = s:create_index('idx3', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '["data"][*].sname'}}}) +--- +- error: 'Wrong index options (field 2): incompatable multikey index path' +... +_ = s:create_index('idx2', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname[*].a'}}}) +--- +- error: 'Multikey index is invalid: no more than one array index placeholder [*] + is allowed in JSON path' +... +idx0 = s:create_index('idx0', {parts = {{3, 'str', path = '[1].fname'}}}) +--- +... +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +--- +- error: 'Multikey index is invalid: field 3 is already refer to document by identifier + and cannot use array index placeholder [*]' +... +idx0:drop() +--- +... +-- Unique multikey index. +idx = s:create_index('idx', {unique = true, parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +--- +... +_ = s:create_index('idx2', {parts = {{3, 'str', path = '[1].fname'}, {3, 'str', path = '[1].sname'}}}) +--- +- error: 'Multikey index is invalid: field 3 is already used with array index placeholder + [*] and cannot refer to document by identifier' +... +s:insert({1, {1, 2, 3}, {{fname='James', sname='Bond'}, {fname='Vasya', sname='Pupkin'}}}) +--- +- [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +s:insert({2, {3, 4, 5}, {{fname='Ivan', sname='Ivanych'}}}) +--- +- [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] +... +_ = s:create_index('arr_idx', {unique = true, parts = {{2, 'unsigned', path = '[*]'}}}) +--- +- error: Duplicate key exists in unique index 'arr_idx' in space 'withdata' +... +-- Non-unique multikey index; two multikey indexes per space. +arr_idx = s:create_index('arr_idx', {unique = false, parts = {{2, 'unsigned', path = '[*]'}}}) +--- +... +arr_idx:select() +--- +- - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] +... +idx:get({'James', 'Bond'}) +--- +- [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +idx:get({'Ivan', 'Ivanych'}) +--- +- [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] +... +idx:get({'Vasya', 'Pupkin'}) +--- +- [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +idx:select() +--- +- - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... 
+s:insert({3, {1, 2}, {{fname='Vasya', sname='Pupkin'}}}) +--- +- error: Duplicate key exists in unique index 'idx' in space 'withdata' +... +s:insert({4, {1}, {{fname='James', sname='Bond'}}}) +--- +- error: Duplicate key exists in unique index 'idx' in space 'withdata' +... +idx:select() +--- +- - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +-- Duplicates in multikey parts. +s:insert({5, {1, 1, 1}, {{fname='A', sname='B'}, {fname='C', sname='D'}, {fname='A', sname='B'}}}) +--- +- [5, [1, 1, 1], [{'fname': 'A', 'sname': 'B'}, {'fname': 'C', 'sname': 'D'}, {'fname': 'A', + 'sname': 'B'}]] +... +arr_idx:select({1}) +--- +- - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [5, [1, 1, 1], [{'fname': 'A', 'sname': 'B'}, {'fname': 'C', 'sname': 'D'}, { + 'fname': 'A', 'sname': 'B'}]] +... +s:delete(5) +--- +- [5, [1, 1, 1], [{'fname': 'A', 'sname': 'B'}, {'fname': 'C', 'sname': 'D'}, {'fname': 'A', + 'sname': 'B'}]] +... +-- Check that there is no garbage in index +arr_idx:select({1}) +--- +- - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +idx:get({'A', 'B'}) +--- +... +idx:get({'C', 'D'}) +--- +... +idx:delete({'Vasya', 'Pupkin'}) +--- +- [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +s:insert({6, {1, 2}, {{fname='Vasya', sname='Pupkin'}}}) +--- +- [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +s:insert({7, {1}, {{fname='James', sname='Bond'}}}) +--- +- [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] +... +arr_idx:select({1}) +--- +- - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] +... +idx:select() +--- +- - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] + - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +-- Snapshot & recovery. +box.snapshot() +--- +- ok +... +test_run:cmd("restart server default") +s = box.space["withdata"] +--- +... +idx = s.index["idx"] +--- +... +arr_idx = s.index["arr_idx"] +--- +... +s:select() +--- +- - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] +... +idx:select() +--- +- - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] + - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +arr_idx:select() +--- +- - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] + - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] +... +s:drop() +--- +... +-- Assymetric multikey index paths. +s = box.schema.space.create('withdata') +--- +... +pk = s:create_index('pk') +--- +... +idx = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].extra.sname', is_nullable = true}}}) +--- +... 
+s:insert({1, 1, {{fname='A1', extra={sname='A2'}}, {fname='B1'}, {fname='C1', extra={sname='C2'}}}}) +--- +- [1, 1, [{'fname': 'A1', 'extra': {'sname': 'A2'}}, {'fname': 'B1'}, {'fname': 'C1', + 'extra': {'sname': 'C2'}}]] +... +s:drop() +--- +... +-- Unique multikey index peculiar properties +s = box.schema.space.create('withdata') +--- +... +pk = s:create_index('pk') +--- +... +idx0 = s:create_index('idx0', {parts = {{2, 'int', path = '[*]'}}}) +--- +... +s:insert({1, {1, 1, 1}}) +--- +- [1, [1, 1, 1]] +... +s:insert({2, {2, 2}}) +--- +- [2, [2, 2]] +... +s:insert({3, {3, 3, 2, 2, 1, 1}}) +--- +- error: Duplicate key exists in unique index 'idx0' in space 'withdata' +... +idx0:get(2) +--- +- [2, [2, 2]] +... +idx0:get(1) +--- +- [1, [1, 1, 1]] +... +idx0:get(3) +--- +... +idx0:select() +--- +- - [1, [1, 1, 1]] + - [2, [2, 2]] +... +idx0:delete(2) +--- +- [2, [2, 2]] +... +idx0:get(2) +--- +... +idx0:select() +--- +- - [1, [1, 1, 1]] +... +s:drop() +--- +... diff --git a/test/engine/multikey.test.lua b/test/engine/multikey.test.lua new file mode 100644 index 000000000..71fc82a5f --- /dev/null +++ b/test/engine/multikey.test.lua @@ -0,0 +1,88 @@ +test_run = require('test_run').new() +engine = test_run:get_cfg('engine') + +-- +-- gh-1260: Multikey indexes +-- +s = box.schema.space.create('withdata', {engine = 'vinyl'}) +pk = s:create_index('pk') +-- Vinyl's space can't be multikey (yet). +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +s:drop() + +s = box.schema.space.create('withdata', {engine = 'memtx'}) +-- Primary index must be unique so it can't be multikey. +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}}}) +pk = s:create_index('pk') +-- Only tree index type may be mutlikey. +_ = s:create_index('idx', {type = 'hash', unique = true, parts = {{3, 'str', path = '[*].fname'}}}) +_ = s:create_index('idx', {type = 'bitset', unique = false, parts = {{3, 'str', path = '[*].fname'}}}) +_ = s:create_index('idx', {type = 'rtree', unique = false, parts = {{3, 'array', path = '[*].fname'}}}) +-- Test incompatible multikey index parts. +_ = s:create_index('idx3', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '["data"][*].sname'}}}) +_ = s:create_index('idx2', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname[*].a'}}}) +idx0 = s:create_index('idx0', {parts = {{3, 'str', path = '[1].fname'}}}) +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +idx0:drop() +-- Unique multikey index. +idx = s:create_index('idx', {unique = true, parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +_ = s:create_index('idx2', {parts = {{3, 'str', path = '[1].fname'}, {3, 'str', path = '[1].sname'}}}) +s:insert({1, {1, 2, 3}, {{fname='James', sname='Bond'}, {fname='Vasya', sname='Pupkin'}}}) +s:insert({2, {3, 4, 5}, {{fname='Ivan', sname='Ivanych'}}}) +_ = s:create_index('arr_idx', {unique = true, parts = {{2, 'unsigned', path = '[*]'}}}) +-- Non-unique multikey index; two multikey indexes per space. +arr_idx = s:create_index('arr_idx', {unique = false, parts = {{2, 'unsigned', path = '[*]'}}}) +arr_idx:select() +idx:get({'James', 'Bond'}) +idx:get({'Ivan', 'Ivanych'}) +idx:get({'Vasya', 'Pupkin'}) +idx:select() +s:insert({3, {1, 2}, {{fname='Vasya', sname='Pupkin'}}}) +s:insert({4, {1}, {{fname='James', sname='Bond'}}}) +idx:select() +-- Duplicates in multikey parts. 
+s:insert({5, {1, 1, 1}, {{fname='A', sname='B'}, {fname='C', sname='D'}, {fname='A', sname='B'}}}) +arr_idx:select({1}) +s:delete(5) +-- Check that there is no garbage in index +arr_idx:select({1}) +idx:get({'A', 'B'}) +idx:get({'C', 'D'}) +idx:delete({'Vasya', 'Pupkin'}) +s:insert({6, {1, 2}, {{fname='Vasya', sname='Pupkin'}}}) +s:insert({7, {1}, {{fname='James', sname='Bond'}}}) +arr_idx:select({1}) +idx:select() +-- Snapshot & recovery. +box.snapshot() +test_run:cmd("restart server default") +s = box.space["withdata"] +idx = s.index["idx"] +arr_idx = s.index["arr_idx"] +s:select() +idx:select() +arr_idx:select() +s:drop() + +-- Assymetric multikey index paths. +s = box.schema.space.create('withdata') +pk = s:create_index('pk') +idx = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].extra.sname', is_nullable = true}}}) +s:insert({1, 1, {{fname='A1', extra={sname='A2'}}, {fname='B1'}, {fname='C1', extra={sname='C2'}}}}) +s:drop() + +-- Unique multikey index peculiar properties +s = box.schema.space.create('withdata') +pk = s:create_index('pk') +idx0 = s:create_index('idx0', {parts = {{2, 'int', path = '[*]'}}}) +s:insert({1, {1, 1, 1}}) +s:insert({2, {2, 2}}) +s:insert({3, {3, 3, 2, 2, 1, 1}}) +idx0:get(2) +idx0:get(1) +idx0:get(3) +idx0:select() +idx0:delete(2) +idx0:get(2) +idx0:select() +s:drop() -- 2.21.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v3 7/7] box: introduce multikey indexes in memtx 2019-04-02 15:49 ` [PATCH v3 7/7] box: introduce multikey indexes in memtx Kirill Shcherbatov @ 2019-04-04 14:20 ` Vladimir Davydov 0 siblings, 0 replies; 29+ messages in thread From: Vladimir Davydov @ 2019-04-04 14:20 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches On Tue, Apr 02, 2019 at 06:49:38PM +0300, Kirill Shcherbatov wrote: > diff --git a/src/box/errcode.h b/src/box/errcode.h > index 3f8cb8e0e..ea992a6b2 100644 > --- a/src/box/errcode.h > +++ b/src/box/errcode.h > @@ -246,6 +246,7 @@ struct errcode_record { > /*191 */_(ER_SQL_PARSER_LIMIT, "%s %d exceeds the limit (%d)") \ > /*192 */_(ER_INDEX_DEF_UNSUPPORTED, "%s are prohibited in an index definition") \ > /*193 */_(ER_CK_DEF_UNSUPPORTED, "%s are prohibited in a CHECK constraint definition") \ > + /*194 */_(ER_INDEX_MULTIKEY_INVALID, "Multikey index is invalid: %s") \ Other similar errors take a path to a field as a separate argument, e.g. ER_INDEX_PART_TYPE_MISMATCH. I think this one should work the same way. > > /* > * !IMPORTANT! Please follow instructions at start of the file > diff --git a/src/box/field_map.c b/src/box/field_map.c > index 690aa461d..ae6c5be5f 100644 > --- a/src/box/field_map.c > +++ b/src/box/field_map.c > @@ -37,6 +37,8 @@ field_map_builder_create(struct field_map_builder *builder, > uint32_t minimal_field_map_size, > struct region *region) > { > + builder->region = region; Region shouldn't be a property of a builder. Pass it in function arguments where appropriate instead. > + builder->extents_size = 0; > builder->slot_count = minimal_field_map_size / sizeof(uint32_t); > if (minimal_field_map_size == 0) { > builder->slots = NULL; > @@ -53,9 +55,71 @@ field_map_builder_create(struct field_map_builder *builder, > return 0; > } > > +/** > + * Get size of extention (in bytes) by count of items it > + * must contain. > + */ > +static uint32_t > +field_map_ext_size(uint32_t items) IMO helpers like this just obfuscate the code. > +{ > + return sizeof(struct field_map_ext) + items * sizeof(uint32_t); > +} > + > +struct field_map_ext * > +field_map_builder_ext_get(struct field_map_builder *builder, > + int32_t offset_slot, uint32_t extent_items) > +{ > + struct field_map_ext *extent; > + if (builder->slots[offset_slot].has_extent) { > + extent = builder->slots[offset_slot].extent; > + assert(extent != NULL); > + assert(extent->items == extent_items); > + return extent; > + } > + uint32_t sz = field_map_ext_size(extent_items); > + extent = region_alloc(builder->region, sz); > + if (extent == NULL) { > + diag_set(OutOfMemory, sz, "region", "extent"); > + return NULL; > + } > + memset(extent, 0, sz); > + extent->items = extent_items; > + builder->slots[offset_slot].extent = extent; > + builder->slots[offset_slot].has_extent = true; > + builder->extents_size += sz; > + return extent; > +} > + > void > field_map_build(struct field_map_builder *builder, char *buffer) > { > - uint32_t field_map_size = field_map_build_size(builder); > - memcpy(buffer, (char *)builder->slots - field_map_size, field_map_size); > + /* > + * To initialize the field map and its extents, prepare > + * the following memory layout with pointers: > + * > + * offset > + * +---------------------+ > + * | | > + * [extentK]..[extent1][[slotN]..[slot2][slot1]] > + * | |extent_wptr |field_map > + * <- <- <- > + * Why allocate extents starting from the end? There's no requirement for this while IMO this makes the code more complicated. Or am I missing something? 
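A rough sketch, not from the patch, of the forward-growing variant the question above hints at; it reuses the patch's field_map_ext_size() helper and keeps the stored slot values compatible with field_map_get_offset(), since they still encode the distance from field_map back to the extent:

	uint32_t *field_map =
		(uint32_t *)(buffer + field_map_build_size(builder));
	char *extent_wptr = buffer;
	for (int32_t i = -1; i >= -(int32_t)builder->slot_count; i--) {
		if (!builder->slots[i].has_extent) {
			field_map[i] = builder->slots[i].offset;
			continue;
		}
		struct field_map_ext *extent = builder->slots[i].extent;
		uint32_t sz = field_map_ext_size(extent->items);
		/* Record the backward distance before advancing. */
		field_map[i] = (char *)field_map - extent_wptr;
		memcpy(extent_wptr, extent, sz);
		extent_wptr += sz;
	}
	assert(extent_wptr == buffer + builder->extents_size);

Either direction fills the same extents_size bytes; the only difference is which end of the extent area the write pointer starts from.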
> + * The buffer size is assumed to be sufficient to write > + * field_map_build_size(builder) bytes there. > + */ > + uint32_t *field_map = > + (uint32_t *)(buffer + field_map_build_size(builder)); > + char *extent_wptr = buffer + builder->extents_size; > + for (int32_t i = -1; i >= -(int32_t)builder->slot_count; i--) { > + if (!builder->slots[i].has_extent) { > + field_map[i] = builder->slots[i].offset; > + continue; > + } > + struct field_map_ext *extent = builder->slots[i].extent; > + /** Retrive memory for the extent. */ > + uint32_t sz = field_map_ext_size(extent->items); > + extent_wptr -= sz; > + field_map[i] = (char *)field_map - (char *)extent_wptr; > + memcpy(extent_wptr, extent, sz); Please add an assertion checking the extents size. > + } > } > diff --git a/src/box/field_map.h b/src/box/field_map.h > index 47ecbd009..af578ec6b 100644 > --- a/src/box/field_map.h > +++ b/src/box/field_map.h > @@ -34,6 +34,7 @@ > #include <stdint.h> > > struct region; > +struct field_map_builder_slot; > > /** > * A field map is a special area is reserved before tuple's > @@ -46,13 +47,15 @@ struct region; > * offset_slot(s) is performed on tuple_format creation on index > * create or alter (see tuple_format_create()). > * > - * 4b 4b MessagePack data. > - * +------+----+------+---------------------------+ > - * tuple: | offN | .. | off1 | header ..|key1|..|keyN|.. | > - * +--+---+----+--+---+---------------------------+ > - * | ... | ^ ^ > - * | +-----------------+ | > - * +--------------------------------------+ > + * 4b 4b 4b 4b MessagePack data. > + * +-----------+------+----+------+------------------------+ > + *tuple: |cnt|off1|..| offN | .. | off1 | header ..|key1|..|keyN|| > + * +-----+-----+--+---+----+--+---+------------------------+ > + * ext1 ^ | | ... | ^ ^ > + * +-----|--------+ | | | > + * indirection | +-----------------+ | > + * +----------------------------------------------+ > + * (offset_slot = N, extent_slot = 1) --> offset > * > * This field_map_builder class is used for tuple field_map > * construction. It encapsulates field_map build logic and size > @@ -60,19 +63,88 @@ struct region; > * > * Each field offset is a positive number, except the case when > * a field is not in the tuple. In this case offset is 0. > + * > + * Some slots may store an offset of the field_map_ext structure, "Some slots may store" - sounds ambiguous. Let's rephrase the comment to emphasize that it's only used for multikey indexes? > + * which contains an additional sequence of offsets of size > + * defined above(see field_map_ext layout). The caller needs to > + * be aware of when the slot is an offset of the data and when > + * it is the offset of the field_map_ext. I don't like that it's impossible to say which slots are multikey and which are not by looking at a field map. It may result in funny memory corruption bugs. I think we should mark multikey slots somehow and add appropriate assertions to field_map_get_offset. E.g. write negative offsets there. Would it make sense or I'm over-engineering? > + * > + * Now these extents are used to organize a multikey index. > + * The count of keys in the multikey index imposes the count of "imposes"? > + * items in extent while the i-th extent's slot contains the > + * offset of the i-th key field. > */ > struct field_map_builder { > /** > - * The pointer to the end of filed_map allocation. Typo: filed -> field. Please fix the previous commit. > + * The pointer to the end of field_map_builder_slot(s) > + * allocation. Why change this comment at all? 
It's still correct. > * Its elements are accessible by negative indexes > * that coinciding with offset_slot(s). > */ > - uint32_t *slots; > + struct field_map_builder_slot *slots; > /** > * The count of slots in field_map_builder::slots > * allocation. > */ > uint32_t slot_count; > + /** > + * Total size of memory is allocated for field_map > + * extentions. There's no word "extention". Let's use "extent" everywhere, including field_map_ext. > + */ > + uint32_t extents_size; > + /** > + * Region to use to perform memory allocations. > + */ > + struct region *region; > +}; > + > +/** > + * Internal stucture representing field_map extent. > + * (see field_map_builder description). > + */ > +struct field_map_ext { > + /** Count of field_map_ext::offset[] elements. */ > + uint32_t items; > + /** Data offset in tuple array. */ > + uint32_t offset[0]; > +}; > + > +/** > + * Internal function to get or allocate field map extent > + * by offset_slot, and count of items. > + */ > +struct field_map_ext * > +field_map_builder_ext_get(struct field_map_builder *builder, > + int32_t offset_slot, uint32_t extent_items); > + > +/** > + * Instead of using uint32_t offset slots directly the > + * field_map_builder uses this structure as a storage atom. > + * When there is a need to initialize an extent, the > + * field_map_builder allocates a new memory chunk and sets > + * field_map_builder_slot::pointer (instead of real field_map > + * reallocation). > + * > + * On field_map_build, all of the extents are dumped to the same > + * memory chunk that the regular field_map slots and corresponding > + * slots represent relative field_map_ext offset instead of > + * field data offset. > + * > + * The allocated memory is accounted for in extents_size. > + */ > +struct field_map_builder_slot { > + /** > + * True when this slot must be interpret as > + * extention pointer. > + */ > + bool has_extent; > + union { > + /** Data offset in tuple. */ > + uint32_t offset; > + /** Pointer to field_map_ext extention. */ > + struct field_map_ext *extent; > + }; > }; > > /** > @@ -82,9 +154,22 @@ struct field_map_builder { > * When a field is not in the data tuple, its offset is 0. > */ > static inline uint32_t > -field_map_get_offset(const uint32_t *field_map, int32_t offset_slot) > +field_map_get_offset(const uint32_t *field_map, int32_t offset_slot, > + int multikey_idx) > { > - return field_map[offset_slot]; > + uint32_t offset; > + if (multikey_idx >= 0) { > + assert(field_map[offset_slot] != 0); What if a multikey field is absent (e.g. nullable)? Can it actually happen? > + struct field_map_ext *extent = > + (struct field_map_ext *)((char *)field_map - > + field_map[offset_slot]); > + if ((uint32_t)multikey_idx >= extent->items) > + return 0; I'm not sure that using field_map_ext is a good idea. See, we don't have a separate data structure for field map, still we have one for an extent. What is more confusing, we use the same data structure both in builder and here. Since this code wouldn't look more complex without field_map_ext, I'd rename field_map_ext to field_map_builder_slot_extent and use it only in the builder code while accessing the field map directly here. 
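A minimal sketch of that suggestion, with the reader treating the extent as a plain uint32_t array (a count word followed by the per-key offsets) and field_map_ext kept builder-only; this is an assumption about the intended shape, not code from the patch:

static inline uint32_t
field_map_get_offset(const uint32_t *field_map, int32_t offset_slot,
		     int multikey_idx)
{
	if (multikey_idx < 0)
		return field_map[offset_slot];
	/* The slot stores the distance back to the extent. */
	const uint32_t *extent = (const uint32_t *)
		((const char *)field_map - field_map[offset_slot]);
	/* extent[0] is the item count, extent[1..] the per-key offsets. */
	if ((uint32_t)multikey_idx >= extent[0])
		return 0;
	return extent[1 + multikey_idx];
}

The builder would keep its richer slot and extent structures; only the read path loses the extra type.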
> + offset = extent->offset[multikey_idx]; > + } else { > + offset = field_map[offset_slot]; > + } > + return offset; > } > > /** > @@ -116,7 +201,32 @@ field_map_builder_set_slot(struct field_map_builder *builder, > { > assert(offset_slot < 0); > assert(offset > 0); > - builder->slots[offset_slot] = offset; > + builder->slots[offset_slot].offset = offset; > +} > + > +/** > + * Set data offset in field map extent (by given offset_slot, > + * extent_slot and extent_items) for a field identified by > + * unique offset_slot. > + * > + * The offset_slot argument must be negative and offset must be > + * positive (by definition). > + */ > +static inline int > +field_map_builder_set_extent_slot(struct field_map_builder *builder, IMO better call it field_map_builder_set_multikey to emphasize that it only makes sense for multikey indexes. > + int32_t offset_slot, int32_t extent_slot, extent_slot is called multikey_idx everywhere else and has type int. I'd use the same name here. > + uint32_t extent_items, uint32_t offset) Don't quite like 'items' - sounds like it's an array or a list, not a number. What about multikey_count: field_map_builder_set_slot_multikey(builder, int32_t offset_slot, uint32_t offset, int multikey_idx, int multikey_count); > +{ > + assert(offset_slot < 0); > + assert(offset > 0); > + assert(extent_slot >= 0 && extent_items > 0); Would be more useful to ensure that 'multikey_idx' is less than 'multikey_count'. > + struct field_map_ext *extent = > + field_map_builder_ext_get(builder, offset_slot, extent_items); This is an unconditional function call on a hot path. I'd create a function to allocate a multikey slot instead and call it only if the slot is empty. > + if (extent == NULL) > + return -1; > + assert(extent->items == extent_items); > + extent->offset[extent_slot] = offset; > + return 0; > } > > /** > @@ -125,7 +235,8 @@ field_map_builder_set_slot(struct field_map_builder *builder, > static inline uint32_t > field_map_build_size(struct field_map_builder *builder) > { > - return builder->slot_count * sizeof(uint32_t); > + return builder->slot_count * sizeof(uint32_t) + > + builder->extents_size; > } > > /** > diff --git a/src/box/key_def.c b/src/box/key_def.c > index 55dcf1eb5..4d36560e0 100644 > --- a/src/box/key_def.c > +++ b/src/box/key_def.c > @@ -138,7 +138,57 @@ key_def_set_func(struct key_def *def) > key_def_set_extract_func(def); > } > > -static void > +static int > +key_def_set_part_path(struct key_def *def, uint32_t part_no, const char *path, > + uint32_t path_len, char **path_pool) > +{ > + struct key_part *part = &def->parts[part_no]; > + if (path == NULL) { > + part->path = NULL; > + part->path_len = 0; > + return 0; > + } > + assert(path_pool != NULL); > + part->path = *path_pool; > + *path_pool += path_len; > + memcpy(part->path, path, path_len); > + part->path_len = path_len; > + > + /* > + * Test whether this key_part has array index > + * placeholder [*] (i.e. is a part of multikey index > + * definition). > + */ > + int multikey_path_len = > + json_path_multikey_offset(path, path_len, TUPLE_INDEX_BASE); > + if (multikey_path_len < 0) > + return 0; Please move the check that no part contains two [*] tokens here so that one can't create an invalid key_def, even temporarily. The check that we don't index a multikey path with a normal index is OK to live in tuple format creation code, because it's in fact a limitation of the tuple field map format. > + > + /* > + * In case of multikey index key_parts must have the > + * same JSON prefix. 
> + */ > + if (!key_def_is_multikey(def)) { Please check def->multikey_path here directly, without the aid of this helper, since you set it directly below. > + /* > + * Keep the index of the first multikey key_part > + * and the length of JSON path substring to the > + * array index placeholder sign [*]. > + */ > + def->multikey_path = part->path; > + def->multikey_fieldno = part->fieldno; > + def->multikey_path_len = (uint32_t)multikey_path_len; > + } else if (json_path_cmp(path, multikey_path_len, def->multikey_path, > + def->multikey_path_len, > + TUPLE_INDEX_BASE) != 0) { > + diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, > + part_no + TUPLE_INDEX_BASE, > + "incompatable multikey index path"); > + return -1; > + } > + return 0; > +} > @@ -256,11 +301,12 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count) > key_def->unique_part_count = part_count; > > for (uint32_t item = 0; item < part_count; ++item) { > - key_def_set_part(key_def, item, fields[item], > - (enum field_type)types[item], > - ON_CONFLICT_ACTION_DEFAULT, > - NULL, COLL_NONE, SORT_ORDER_ASC, NULL, 0, > - NULL, TUPLE_OFFSET_SLOT_NIL, 0); > + if (key_def_set_part(key_def, item, fields[item], > + (enum field_type)types[item], > + ON_CONFLICT_ACTION_DEFAULT, NULL, > + COLL_NONE, SORT_ORDER_ASC, NULL, 0, NULL, > + TUPLE_OFFSET_SLOT_NIL, 0) != 0) > + unreachable(); Please don't use unreachable(). Handle errors instead so that if we add some more checks to key_def_set_part we don't have patch these other places. > diff --git a/src/box/memtx_tree.c b/src/box/memtx_tree.c > index fe037c54a..fa8a04664 100644 > --- a/src/box/memtx_tree.c > +++ b/src/box/memtx_tree.c > @@ -656,34 +727,42 @@ memtx_tree_index_reserve(struct index *base, uint32_t size_hint) > } > > static int > -memtx_tree_index_build_next(struct index *base, struct tuple *tuple) > +memtx_tree_index_build_array_realloc(struct memtx_tree_index *index, > + uint32_t items) Please refactor it in a slightly different manner: instead of this cumbersome realloc helper, simply add a function that sets a particular slot, reallocating the array if necessary: memtx_tree_index_build_array_append(index, tuple, hint) > diff --git a/src/box/sql.c b/src/box/sql.c > index 7d5c3a8e0..be2212ef1 100644 > --- a/src/box/sql.c > +++ b/src/box/sql.c > @@ -788,7 +788,8 @@ tarantoolsqlIdxKeyCompare(struct BtCursor *cursor, > } else { > uint32_t field_offset = > field_map_get_offset(field_map, > - field->offset_slot); > + field->offset_slot, > + -1); > p = base + field_offset; > } > } > diff --git a/src/box/tuple.c b/src/box/tuple.c > index 570e4c192..8c2cd7611 100644 > --- a/src/box/tuple.c > +++ b/src/box/tuple.c > @@ -130,7 +130,7 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple) > int rc = tuple_field_map_create(&builder, format, tuple, true); > region_truncate(region, region_svp); > assert(rc != 0 || > - field_map_build_size(&builder) == format->field_map_size); > + field_map_build_size(&builder) >= format->field_map_size); This assertion is pointless. Please remove it (in the previous patch). 
> return rc; > } > > @@ -455,7 +455,8 @@ tuple_field_go_to_key(const char **field, const char *key, int len) > } > > int > -tuple_go_to_path(const char **data, const char *path, uint32_t path_len) > +tuple_go_to_path(const char **data, const char *path, uint32_t path_len, > + int multikey_idx) > { > int rc; > struct json_lexer lexer; > @@ -463,6 +464,10 @@ tuple_go_to_path(const char **data, const char *path, uint32_t path_len) > json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE); > while ((rc = json_lexer_next_token(&lexer, &token)) == 0) { > switch (token.type) { > + case JSON_TOKEN_ANY: > + assert(multikey_idx >= 0); > + token.num = multikey_idx; > + FALLTHROUGH; The function must not crash if there's no mutlikey_idx - it should return nil: t = box.tuple.new{1, {a = 1, b = 2}, 3} t[2][*] // crash! Please fix and add a test case. > diff --git a/src/box/tuple.h b/src/box/tuple.h > index da4085bcf..99753c8d5 100644 > --- a/src/box/tuple.h > +++ b/src/box/tuple.h > @@ -674,6 +702,35 @@ tuple_field_by_part(const struct tuple *tuple, struct key_part *part) > tuple_field_map(tuple), part); > } > > +/** > + * Get count of documents in tuple by given multikey index > + * definition. > + * @param format Tuple format. > + * @param data A pointer to MessagePack array. > + * @param field_map A pointer to the LAST element of field map. > + * @param key_def Index key_definition. > + * @retval Count of documents in the multikey index. > + */ > +uint32_t > +tuple_field_raw_multikey_items(struct tuple_format *format, const char *data, > + const uint32_t *field_map, > + struct key_def *key_def); > + > +/** > + * Get count of documents in tuple by given multikey index > + * definition. > + * @param tuple Tuple to get the count of documents from. > + * @param key_def Index key_definition. > + * @retval Count of documents in the multikey index. What's a 'document'? I wouldn't introduce a new term. > + */ > +static inline uint32_t > +tuple_field_multikey_items(struct tuple *tuple, struct key_def *key_def) Don't like the name. This function doesn't return a tuple field so using tuple_field_ prefix is wrong IMO. And again 'items' sounds like it returns a set of items, not a count. What about tuple_multikey_count? Do you have better alternatives in mind? > +{ > + return tuple_field_raw_multikey_items(tuple_format(tuple), > + tuple_data(tuple), > + tuple_field_map(tuple), key_def); > +} > + > /** > * @brief Tuple Interator > */ > diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc > index 93756365b..bf3a23097 100644 > --- a/src/box/tuple_compare.cc > +++ b/src/box/tuple_compare.cc > @@ -455,8 +479,10 @@ tuple_compare_slowpath_hinted(const struct tuple *tuple_a, hint_t tuple_a_hint, > assert(!has_optional_parts || is_nullable); > assert(is_nullable == key_def->is_nullable); > assert(has_optional_parts == key_def->has_optional_parts); > - int rc = hint_cmp(tuple_a_hint, tuple_b_hint); > - if (rc != 0) > + assert(key_def_is_multikey(key_def) == is_multikey); > + assert(!is_multikey || is_multikey == has_json_paths); Please add an assertion checking that a hint cannot be HINT_NONE for a multikey index. Here and everywhere else where applicable. > @@ -1314,6 +1365,16 @@ static const comparator_with_key_signature cmp_wk_arr[] = { > /* {{{ tuple_hint */ > > /** > + * Hints are now used for two purposes - passing the index of the > + * key used in the case of multikey index and to speed up the > + * comparators. > + * > + * Scenario I. Pass the multikey index of the key to comparator. 
> + * In the case of multikey index arises an ambiguity: which key > + * should be used in the comparison. Hints act as an non-negative > + * numeric index of key to use. > + * > + * Scenario II. Speed up comparators. Please move this comment to hint_t definition. Here just say that hint comparison only makes sense for non-multikey indexes. > * A comparison hint is an unsigned integer number that has > * the following layout: > * > @@ -1611,12 +1672,35 @@ tuple_hint(const struct tuple *tuple, struct key_def *key_def) > return field_hint<type, is_nullable>(field, key_def->parts->coll); > } > > +static hint_t > +key_hint_stub(const char *key, uint32_t part_count, struct key_def *key_def) key_hint_multikey? > +{ > + (void) key; > + (void) part_count; > + (void) key_def; > + return HINT_NONE; Please add a comment explaining why this function can actually be called while tuple_hint_multikey cannot. > +} > + > +static hint_t > +tuple_hint_stub(const struct tuple *tuple, struct key_def *key_def) > +{ > + (void) tuple; > + (void) key_def; > + unreachable(); > + return HINT_NONE; > +} > + > template<enum field_type type, bool is_nullable> > static void > key_def_set_hint_func(struct key_def *def) > { > - def->key_hint = key_hint<type, is_nullable>; > - def->tuple_hint = tuple_hint<type, is_nullable>; > + if (key_def_is_multikey(def)) { > + def->key_hint = key_hint_stub; > + def->tuple_hint = tuple_hint_stub; This should be moved upper level, to key_def_set_hint_func - no need in all these branching to just set it to the stub. > + } else { > + def->key_hint = key_hint<type, is_nullable>; > + def->tuple_hint = tuple_hint<type, is_nullable>; > + } > } > > template<enum field_type type> > diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc > index 836d4e565..081677ff6 100644 > --- a/src/box/tuple_extract_key.cc > +++ b/src/box/tuple_extract_key.cc > @@ -369,11 +393,16 @@ static void > key_def_set_extract_func_json(struct key_def *def) > { > assert(def->has_json_paths); > - def->tuple_extract_key = tuple_extract_key_slowpath > - <contains_sequential_parts, > - has_optional_parts, true>; > - def->tuple_extract_key_raw = tuple_extract_key_slowpath_raw > - <has_optional_parts, true>; > + if (!key_def_is_multikey(def)) { > + def->tuple_extract_key = tuple_extract_key_slowpath > + <contains_sequential_parts, > + has_optional_parts, true>; > + def->tuple_extract_key_raw = tuple_extract_key_slowpath_raw > + <has_optional_parts, true>; > + } else { > + def->tuple_extract_key = tuple_extract_key_stub; > + def->tuple_extract_key_raw = tuple_extract_key_raw_stub; > + } Come to think of it, let's add assert(!key_def_is_multikey(key_def)); to original extractors rather than adding those stubs. > } > > void > diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c > index 9a643b700..33d6db420 100644 > --- a/src/box/tuple_format.c > +++ b/src/box/tuple_format.c > @@ -194,6 +194,50 @@ tuple_format_field_by_id(struct tuple_format *format, uint32_t id) > return NULL; > } > > +/** > + * Check if child_field may be attached to parent_field, > + * update the parent_field container type if required. > + */ > +static int > +tuple_format_field_update_type(struct tuple_field *parent_field, > + struct tuple_field *child_field) The name's kinda confusing - this function doesn't necessarily update a field type - it rather checks field compatibility. Please come up with a better name. > +{ > + enum field_type expected_type = > + child_field->token.type == JSON_TOKEN_STR ? 
> + FIELD_TYPE_MAP : FIELD_TYPE_ARRAY; > + if (child_field->token.type == JSON_TOKEN_ANY && > + !json_token_is_multikey(&parent_field->token) && > + !json_token_is_leaf(&parent_field->token)) { > + assert(expected_type == FIELD_TYPE_ARRAY); > + diag_set(ClientError, ER_INDEX_MULTIKEY_INVALID, > + tt_sprintf("field %s is already refer to document by " > + "identifier and cannot use array index " > + "placeholder [*]", > + tuple_field_path(parent_field))); > + return -1; > + } > + if (json_token_is_multikey(&parent_field->token) && > + child_field->token.type != JSON_TOKEN_ANY) { Bad indentation. > + assert(expected_type == FIELD_TYPE_ARRAY); > + diag_set(ClientError, ER_INDEX_MULTIKEY_INVALID, > + tt_sprintf("field %s is already used with array index " > + "placeholder [*] and cannot refer to " > + "document by identifier", > + tuple_field_path(parent_field))); These error messages look bad to me. And the code is difficult for understanding as well and needs a comment explaining what exactly we prohibit here and why. What about the following error message (the same for both cases)? ER_MULTIKEY_INDEX_MISMATCH Field %s is used as multikey in one index and as single key in another. Please think about it. > + return -1; > + } > + if (field_type1_contains_type2(parent_field->type, expected_type)) { > + parent_field->type = expected_type; > + } else { > + diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH, > + tuple_field_path(parent_field), > + field_type_strs[parent_field->type], > + field_type_strs[expected_type]); > + return -1; > + } > + return 0; > +} > + > /** > * Given a field number and a path, add the corresponding field > * to the tuple format, allocating intermediate fields if > @@ -280,6 +323,13 @@ tuple_format_add_field(struct tuple_format *format, uint32_t fieldno, > assert(parent != NULL); > /* Update tree depth information. */ > format->fields_depth = MAX(format->fields_depth, token_count + 1); > + if (json_token_any_count > 1) { > + assert(path_len > 0); > + diag_set(ClientError, ER_INDEX_MULTIKEY_INVALID, > + "no more than one array index placeholder [*] is " > + "allowed in JSON path"); > + goto fail; > + } As I wrote above, this check should be done in key_def_new. > diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h > index 88f03d5eb..d07ca91eb 100644 > --- a/src/box/tuple_format.h > +++ b/src/box/tuple_format.h > @@ -437,6 +437,12 @@ struct tuple_parse_iterator { > * pointer accordingly. > */ > struct mp_stack stack; > + /** > + * The pointer to the stack frame representing an array > + * filed that has JSON_TOKEN_ANY child, i.e. the root > + * of the multikey index. > + */ > + struct mp_frame *multikey_frame; > /** The current read position in msgpack. */ > const char *pos; > }; > @@ -479,6 +485,32 @@ tuple_parse_iterator_advice(struct tuple_parse_iterator *it, > struct tuple_field **field, const char **data, > const char **data_end); > > +/** > + * Returns non-negative multikey index comparison hint when > + * the iterator's position in the format::fields subtree > + * corresponds to the multikey subtree and -1 otherwise. > + */ > +static inline int > +tuple_parse_iterator_multikey_idx(struct tuple_parse_iterator *it) > +{ > + if (it->multikey_frame == NULL) > + return -1; > + return it->multikey_frame->curr; > +} I don't like that some variables are passed as iteration function arguments, some are returned by separate functions. Why not access them all directly, for instance? 
But then again, the iterator API is flawed and we need to put more thoughts in how to improve it. > diff --git a/test/engine/multikey.test.lua b/test/engine/multikey.test.lua > new file mode 100644 > index 000000000..71fc82a5f > --- /dev/null > +++ b/test/engine/multikey.test.lua > @@ -0,0 +1,88 @@ > +test_run = require('test_run').new() > +engine = test_run:get_cfg('engine') > + > +-- > +-- gh-1260: Multikey indexes > +-- > +s = box.schema.space.create('withdata', {engine = 'vinyl'}) > +pk = s:create_index('pk') > +-- Vinyl's space can't be multikey (yet). > +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) > +s:drop() > + > +s = box.schema.space.create('withdata', {engine = 'memtx'}) > +-- Primary index must be unique so it can't be multikey. > +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}}}) > +pk = s:create_index('pk') > +-- Only tree index type may be mutlikey. > +_ = s:create_index('idx', {type = 'hash', unique = true, parts = {{3, 'str', path = '[*].fname'}}}) > +_ = s:create_index('idx', {type = 'bitset', unique = false, parts = {{3, 'str', path = '[*].fname'}}}) > +_ = s:create_index('idx', {type = 'rtree', unique = false, parts = {{3, 'array', path = '[*].fname'}}}) > +-- Test incompatible multikey index parts. > +_ = s:create_index('idx3', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '["data"][*].sname'}}}) > +_ = s:create_index('idx2', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname[*].a'}}}) > +idx0 = s:create_index('idx0', {parts = {{3, 'str', path = '[1].fname'}}}) > +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) > +idx0:drop() > +-- Unique multikey index. > +idx = s:create_index('idx', {unique = true, parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) > +_ = s:create_index('idx2', {parts = {{3, 'str', path = '[1].fname'}, {3, 'str', path = '[1].sname'}}}) > +s:insert({1, {1, 2, 3}, {{fname='James', sname='Bond'}, {fname='Vasya', sname='Pupkin'}}}) > +s:insert({2, {3, 4, 5}, {{fname='Ivan', sname='Ivanych'}}}) > +_ = s:create_index('arr_idx', {unique = true, parts = {{2, 'unsigned', path = '[*]'}}}) > +-- Non-unique multikey index; two multikey indexes per space. > +arr_idx = s:create_index('arr_idx', {unique = false, parts = {{2, 'unsigned', path = '[*]'}}}) > +arr_idx:select() > +idx:get({'James', 'Bond'}) > +idx:get({'Ivan', 'Ivanych'}) > +idx:get({'Vasya', 'Pupkin'}) > +idx:select() > +s:insert({3, {1, 2}, {{fname='Vasya', sname='Pupkin'}}}) > +s:insert({4, {1}, {{fname='James', sname='Bond'}}}) > +idx:select() > +-- Duplicates in multikey parts. > +s:insert({5, {1, 1, 1}, {{fname='A', sname='B'}, {fname='C', sname='D'}, {fname='A', sname='B'}}}) > +arr_idx:select({1}) > +s:delete(5) > +-- Check that there is no garbage in index > +arr_idx:select({1}) > +idx:get({'A', 'B'}) > +idx:get({'C', 'D'}) > +idx:delete({'Vasya', 'Pupkin'}) > +s:insert({6, {1, 2}, {{fname='Vasya', sname='Pupkin'}}}) > +s:insert({7, {1}, {{fname='James', sname='Bond'}}}) > +arr_idx:select({1}) > +idx:select() > +-- Snapshot & recovery. > +box.snapshot() > +test_run:cmd("restart server default") > +s = box.space["withdata"] > +idx = s.index["idx"] > +arr_idx = s.index["arr_idx"] > +s:select() > +idx:select() > +arr_idx:select() > +s:drop() > + > +-- Assymetric multikey index paths. 
> +s = box.schema.space.create('withdata') > +pk = s:create_index('pk') > +idx = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].extra.sname', is_nullable = true}}}) > +s:insert({1, 1, {{fname='A1', extra={sname='A2'}}, {fname='B1'}, {fname='C1', extra={sname='C2'}}}}) > +s:drop() > + > +-- Unique multikey index peculiar properties > +s = box.schema.space.create('withdata') Use proper test_run.engine for creating all spaces. Simply disable vinyl in suite.ini for now. > +pk = s:create_index('pk') > +idx0 = s:create_index('idx0', {parts = {{2, 'int', path = '[*]'}}}) > +s:insert({1, {1, 1, 1}}) > +s:insert({2, {2, 2}}) > +s:insert({3, {3, 3, 2, 2, 1, 1}}) > +idx0:get(2) > +idx0:get(1) > +idx0:get(3) > +idx0:select() > +idx0:delete(2) > +idx0:get(2) > +idx0:select() > +s:drop() Add many more tests. Check that update/upsert/replace all work (currently you only check insert/delete). Check reverse iterators. Check space.format() and index.alter(). Check nullable with an absent field. Check everything else you can think of to make sure there's no bugs. Check creation of an index after some tuples have been inserted - it crashes BTW: tarantool> s = box.schema.space.create('test') --- ... tarantool> i = s:create_index('pk') --- ... tarantool> s:replace{1, {{1, 1}} > } --- - [1, [[1, 1]]] ... tarantool> s:replace{2, {{2, 3}}} --- - [2, [[2, 3]]] ... tarantool> i2 = s:create_index('sk', {parts = {{2, 'unsigned', path = '[*]'}}}) --- - error: 'Tuple field [2][*] type does not match one required by operation: expected unsigned' ... tarantool> i2 = s:create_index('sk', {parts = {{2, 'unsigned', path = '[1][*]'}}}) tarantool: /home/vlad/src/tarantool/src/box/memtx_space.c:928: memtx_space_build_index: Assertion `old_tuple == NULL' failed. Aborted ^ permalink raw reply [flat|nested] 29+ messages in thread
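For reference, the extra coverage requested above might look like the following hypothetical test lines (they assume the 'withdata' space and the 'idx' index from test/engine/multikey.test.lua; none of this is part of the patch):

-- Update and replace must rebuild all multikey entries.
s:update(2, {{'=', 3, {{fname = 'Ivan', sname = 'Petrov'}}}})
idx:get({'Ivan', 'Petrov'})
s:replace({7, {1}, {{fname = 'James', sname = 'Smith'}}})
idx:get({'James', 'Bond'}) -- expected to return nothing now
-- Reverse iteration over a multikey index.
idx:select({}, {iterator = 'LE'})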