* [PATCH v2 0/5] box: introduce multikey indexes in memtx @ 2019-03-19 12:32 Kirill Shcherbatov 2019-03-19 12:32 ` [PATCH v2 1/5] lib: introduce json_path_is_multikey helper Kirill Shcherbatov ` (5 more replies) 0 siblings, 6 replies; 13+ messages in thread From: Kirill Shcherbatov @ 2019-03-19 12:32 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov Multikey indexes allows you to automatically index set of documents by JSON paths having array index placeholder "[*]". Multikey index cannot be primary as it cannot be unique(by definition). Multikey index parts must be compatible: only one "[*]" placeholder is allowed in same position(for all JSON paths in index parts). Changes in version 2: - introduced field_map_builder class to perform field_map preparation - reworked code: new helpers json_path_is_multikey and new flag json_path_cmp(is_weak_cmp = true) - rebased to actual tuple hints head - new tests http://github.com/tarantool/tarantool/tree/kshch/gh-1257-multikey-indexes https://github.com/tarantool/tarantool/issues/1257 Kirill Shcherbatov (5): lib: introduce json_path_is_multikey helper lib: introduce is_weak_cmp flag for json_path_cmp box: move offset_slot init to tuple_format_add_field box: introduce field_map_builder for field_map init box: introduce multikey indexes in memtx src/box/CMakeLists.txt | 1 + src/box/errcode.h | 1 + src/box/index_def.c | 27 ++- src/box/key_def.c | 14 +- src/box/key_def.h | 18 ++ src/box/memtx_engine.c | 10 +- src/box/memtx_space.c | 18 ++ src/box/memtx_tree.c | 183 ++++++++++++++--- src/box/tuple.c | 40 +++- src/box/tuple.h | 111 +++++++++-- src/box/tuple_compare.cc | 87 +++++++-- src/box/tuple_extract_key.cc | 2 +- src/box/tuple_field_map.c | 80 ++++++++ src/box/tuple_field_map.h | 163 ++++++++++++++++ src/box/tuple_format.c | 315 ++++++++++++++++++------------ src/box/tuple_format.h | 7 +- src/box/vinyl.c | 8 +- src/box/vy_stmt.c | 9 +- src/lib/json/json.c | 25 ++- src/lib/json/json.h | 14 +- test/box/misc.result | 1 + test/engine/json.result | 13 -- test/engine/json.test.lua | 7 - test/engine/multikey_idx.result | 228 +++++++++++++++++++++ test/engine/multikey_idx.test.lua | 67 +++++++ test/unit/json.c | 20 +- test/unit/json.result | 7 +- 27 files changed, 1242 insertions(+), 234 deletions(-) create mode 100644 src/box/tuple_field_map.c create mode 100644 src/box/tuple_field_map.h create mode 100644 test/engine/multikey_idx.result create mode 100644 test/engine/multikey_idx.test.lua -- 2.21.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 1/5] lib: introduce json_path_is_multikey helper 2019-03-19 12:32 [PATCH v2 0/5] box: introduce multikey indexes in memtx Kirill Shcherbatov @ 2019-03-19 12:32 ` Kirill Shcherbatov 2019-03-19 15:43 ` [tarantool-patches] " Konstantin Osipov 2019-03-19 12:32 ` [PATCH v2 2/5] lib: introduce is_weak_cmp flag for json_path_cmp Kirill Shcherbatov ` (4 subsequent siblings) 5 siblings, 1 reply; 13+ messages in thread From: Kirill Shcherbatov @ 2019-03-19 12:32 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov Introduced a new json_path_is_multikey routine to test if JSON path contains array index placeholder JSON_TOKEN_ANY and return multikey_path_len if so. Needed for #1257 --- src/lib/json/json.c | 21 +++++++++++++++++++++ src/lib/json/json.h | 8 ++++++++ test/unit/json.c | 9 ++++++++- test/unit/json.result | 5 ++++- 4 files changed, 41 insertions(+), 2 deletions(-) diff --git a/src/lib/json/json.c b/src/lib/json/json.c index 854158f63..568e9c246 100644 --- a/src/lib/json/json.c +++ b/src/lib/json/json.c @@ -321,6 +321,27 @@ json_path_validate(const char *path, int path_len, int index_base) return rc; } +bool +json_path_is_multikey(const char *path, int path_len, int *multikey_path_len, + int index_base) +{ + struct json_lexer lexer; + json_lexer_create(&lexer, path, path_len, index_base); + struct json_token token; + int rc, last_lexer_offset = 0; + while ((rc = json_lexer_next_token(&lexer, &token)) == 0) { + if (token.type == JSON_TOKEN_ANY) { + *multikey_path_len = last_lexer_offset; + return true; + } else if (token.type == JSON_TOKEN_END) { + break; + } + last_lexer_offset = lexer.offset; + } + assert(rc == 0); + return false; +} + /** * An snprint-style helper to print an individual token key. */ diff --git a/src/lib/json/json.h b/src/lib/json/json.h index c1de3e579..4c1bc18bc 100644 --- a/src/lib/json/json.h +++ b/src/lib/json/json.h @@ -259,6 +259,14 @@ json_path_cmp(const char *a, int a_len, const char *b, int b_len, int json_path_validate(const char *path, int path_len, int index_base); +/** + * Test if the passed JSON path contains array index placeholder + * [*] and return length of the subpath before first "[*]" token. + */ +bool +json_path_is_multikey(const char *path, int path_len, int *multikey_path_len, + int index_base); + /** * Test if a given JSON token is a JSON tree leaf, i.e. * has no child nodes. diff --git a/test/unit/json.c b/test/unit/json.c index fd320c9eb..1db7e9364 100644 --- a/test/unit/json.c +++ b/test/unit/json.c @@ -479,7 +479,7 @@ test_path_cmp() {"Data[1][\"Info\"].fname[1]", -1}, }; header(); - plan(lengthof(rc) + 3); + plan(lengthof(rc) + 6); for (size_t i = 0; i < lengthof(rc); ++i) { const char *path = rc[i].path; int errpos = rc[i].errpos; @@ -503,6 +503,13 @@ test_path_cmp() ret = json_path_validate(invalid, strlen(invalid), INDEX_BASE); is(ret, 6, "path %s error pos %d expected %d", invalid, ret, 6); + is(json_path_is_multikey(a, a_len, &ret, INDEX_BASE), false, + "test json_path_is_multikey: non-multikey path"); + is(json_path_is_multikey(multikey_b, strlen(multikey_b), &ret, + INDEX_BASE), true, + "test json_path_is_multikey: multikey path"); + is(ret, 8, "test json_path_is_multikey: multikey path %d/%d", ret, 8); + check_plan(); footer(); } diff --git a/test/unit/json.result b/test/unit/json.result index 3a15e84bb..0da6c8c6b 100644 --- a/test/unit/json.result +++ b/test/unit/json.result @@ -170,7 +170,7 @@ ok 2 - subtests ok 3 - subtests *** test_tree: done *** *** test_path_cmp *** - 1..8 + 1..11 ok 1 - path cmp result "Data[1]["FIO"].fname" with "Data[1]["FIO"].fname": have 0, expected 0 ok 2 - path cmp result "Data[1]["FIO"].fname" with "["Data"][1].FIO["fname"]": have 0, expected 0 ok 3 - path cmp result "Data[1]["FIO"].fname" with "Data[1]": have 1, expected 1 @@ -179,6 +179,9 @@ ok 3 - subtests ok 6 - path cmp result "Data[*]["FIO"].fname[*]" with "["Data"][*].FIO["fname"][*]": have 0, expected 0 ok 7 - path Data[1]["FIO"].fname is valid ok 8 - path Data[[1]["FIO"].fname error pos 6 expected 6 + ok 9 - test json_path_is_multikey: non-multikey path + ok 10 - test json_path_is_multikey: multikey path + ok 11 - test json_path_is_multikey: multikey path 8/8 ok 4 - subtests *** test_path_cmp: done *** *** test_path_snprint *** -- 2.21.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [tarantool-patches] [PATCH v2 1/5] lib: introduce json_path_is_multikey helper 2019-03-19 12:32 ` [PATCH v2 1/5] lib: introduce json_path_is_multikey helper Kirill Shcherbatov @ 2019-03-19 15:43 ` Konstantin Osipov 2019-04-02 15:49 ` [tarantool-patches] " Kirill Shcherbatov 0 siblings, 1 reply; 13+ messages in thread From: Konstantin Osipov @ 2019-03-19 15:43 UTC (permalink / raw) To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov * Kirill Shcherbatov <kshcherbatov@tarantool.org> [19/03/19 16:15]: > Introduced a new json_path_is_multikey routine to test if > JSON path contains array index placeholder JSON_TOKEN_ANY and > return multikey_path_len if so. > + ok 9 - test json_path_is_multikey: non-multikey path > + ok 10 - test json_path_is_multikey: multikey path > + ok 11 - test json_path_is_multikey: multikey path 8/8 This coverage is insufficient. You should at least test an empty path, a path with * in the beginning, middle and end. -- Konstantin Osipov, Moscow, Russia, +7 903 626 22 32 http://tarantool.io - www.twitter.com/kostja_osipov ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [tarantool-patches] Re: [PATCH v2 1/5] lib: introduce json_path_is_multikey helper 2019-03-19 15:43 ` [tarantool-patches] " Konstantin Osipov @ 2019-04-02 15:49 ` Kirill Shcherbatov 0 siblings, 0 replies; 13+ messages in thread From: Kirill Shcherbatov @ 2019-04-02 15:49 UTC (permalink / raw) To: tarantool-patches, Konstantin Osipov; +Cc: vdavydov.dev Hi! Thank you for participating. We decided to introduce the other helper json_multikey_path_offset /** * Scan the JSON path string and return the offset of the first * character [*] (the array index placeholder). * - if [*] is not found, -1 is returned. * - specified JSON path must be valid * (may be tested with json_path_validate). */ int json_path_multikey_offset(const char *path, int path_len, int index_base); > This coverage is insufficient. You should at least test an empty > path, a path with * in the beginning, middle and end. I tried to take all the cases into account. In fact, this is not really necessary because corner cases already well covered by tests (for json_lexer_next_token). static struct { const char *str; int rc; } test_cases[] = { {"", -1}, {"[1]Data[1]extra[1]", -1}, {"[*]Data[1]extra[1]", 0}, {"[*]Data[*]extra[1]", 0}, {"[1]Data[*]extra[1]", 7}, {"[1]Data[1]extra[*]", 15}, }; I've resend the whole patchset. ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 2/5] lib: introduce is_weak_cmp flag for json_path_cmp 2019-03-19 12:32 [PATCH v2 0/5] box: introduce multikey indexes in memtx Kirill Shcherbatov 2019-03-19 12:32 ` [PATCH v2 1/5] lib: introduce json_path_is_multikey helper Kirill Shcherbatov @ 2019-03-19 12:32 ` Kirill Shcherbatov 2019-03-19 15:47 ` [tarantool-patches] " Konstantin Osipov 2019-03-19 12:32 ` [PATCH v2 3/5] box: move offset_slot init to tuple_format_add_field Kirill Shcherbatov ` (3 subsequent siblings) 5 siblings, 1 reply; 13+ messages in thread From: Kirill Shcherbatov @ 2019-03-19 12:32 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov Introduced is_weak_cmp flag for json_path_cmp routine: when this flag is set, strings are equal when whole b path is an a subpath (in terms of tokens). Needed for #1257 --- src/box/index_def.c | 2 +- src/box/key_def.c | 4 ++-- src/box/memtx_engine.c | 2 +- src/box/vinyl.c | 2 +- src/lib/json/json.c | 4 +++- src/lib/json/json.h | 6 +++++- test/unit/json.c | 13 ++++++++++--- test/unit/json.result | 4 +++- 8 files changed, 26 insertions(+), 11 deletions(-) diff --git a/src/box/index_def.c b/src/box/index_def.c index 6c37f9f1d..f0c7dc4f6 100644 --- a/src/box/index_def.c +++ b/src/box/index_def.c @@ -285,7 +285,7 @@ index_def_is_valid(struct index_def *index_def, const char *space_name) if (part_a->fieldno == part_b->fieldno && json_path_cmp(part_a->path, part_a->path_len, part_b->path, part_b->path_len, - TUPLE_INDEX_BASE) == 0) { + false, TUPLE_INDEX_BASE) == 0) { diag_set(ClientError, ER_MODIFY_INDEX, index_def->name, space_name, "same key part is indexed twice"); diff --git a/src/box/key_def.c b/src/box/key_def.c index 55dcf1eb5..f9e464402 100644 --- a/src/box/key_def.c +++ b/src/box/key_def.c @@ -311,7 +311,7 @@ key_part_cmp(const struct key_part *parts1, uint32_t part_count1, key_part_is_nullable(part2) ? -1 : 1; int rc = json_path_cmp(part1->path, part1->path_len, part2->path, part2->path_len, - TUPLE_INDEX_BASE); + false, TUPLE_INDEX_BASE); if (rc != 0) return rc; } @@ -606,7 +606,7 @@ key_def_find(const struct key_def *key_def, const struct key_part *to_find) if (part->fieldno == to_find->fieldno && json_path_cmp(part->path, part->path_len, to_find->path, to_find->path_len, - TUPLE_INDEX_BASE) == 0) + false, TUPLE_INDEX_BASE) == 0) return part; } return NULL; diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c index d468d1cd8..3e959ff2e 100644 --- a/src/box/memtx_engine.c +++ b/src/box/memtx_engine.c @@ -1317,7 +1317,7 @@ memtx_index_def_change_requires_rebuild(struct index *index, return true; if (json_path_cmp(old_part->path, old_part->path_len, new_part->path, new_part->path_len, - TUPLE_INDEX_BASE) != 0) + false, TUPLE_INDEX_BASE) != 0) return true; } return false; diff --git a/src/box/vinyl.c b/src/box/vinyl.c index c2b7eb78f..78f29c624 100644 --- a/src/box/vinyl.c +++ b/src/box/vinyl.c @@ -1013,7 +1013,7 @@ vinyl_index_def_change_requires_rebuild(struct index *index, return true; if (json_path_cmp(old_part->path, old_part->path_len, new_part->path, new_part->path_len, - TUPLE_INDEX_BASE) != 0) + false, TUPLE_INDEX_BASE) != 0) return true; } return false; diff --git a/src/lib/json/json.c b/src/lib/json/json.c index 568e9c246..4e1efb7a9 100644 --- a/src/lib/json/json.c +++ b/src/lib/json/json.c @@ -281,7 +281,7 @@ json_token_cmp(const struct json_token *a, const struct json_token *b) int json_path_cmp(const char *a, int a_len, const char *b, int b_len, - int index_base) + bool is_weak_cmp, int index_base) { struct json_lexer lexer_a, lexer_b; json_lexer_create(&lexer_a, a, a_len, index_base); @@ -301,6 +301,8 @@ json_path_cmp(const char *a, int a_len, const char *b, int b_len, } /* Paths a and b must be valid. */ assert(rc_b == 0 && rc_b == 0); + if (is_weak_cmp) + return 0; /* * The parser stopped because the end of one of the paths * was reached. As JSON_TOKEN_END > JSON_TOKEN_{NUM, STR}, diff --git a/src/lib/json/json.h b/src/lib/json/json.h index 4c1bc18bc..53a0da5f9 100644 --- a/src/lib/json/json.h +++ b/src/lib/json/json.h @@ -245,12 +245,16 @@ json_lexer_next_token(struct json_lexer *lexer, struct json_token *token); * Compare two JSON paths using Lexer class. * - in case of paths that have same token-sequence prefix, * the path having more tokens is assumed to be greater + * (except the case when is_weak_cmp flag is set). * - both paths must be valid * (may be tested with json_path_validate). + * + * When is_weak_cmp flag is set, strings are assumed to be equal + * when all b path tokens are present in a path. */ int json_path_cmp(const char *a, int a_len, const char *b, int b_len, - int index_base); + bool is_weak_cmp, int index_base); /** * Check if the passed JSON path is valid. diff --git a/test/unit/json.c b/test/unit/json.c index 1db7e9364..10704451b 100644 --- a/test/unit/json.c +++ b/test/unit/json.c @@ -479,12 +479,12 @@ test_path_cmp() {"Data[1][\"Info\"].fname[1]", -1}, }; header(); - plan(lengthof(rc) + 6); + plan(lengthof(rc) + 8); for (size_t i = 0; i < lengthof(rc); ++i) { const char *path = rc[i].path; int errpos = rc[i].errpos; int rc = json_path_cmp(a, a_len, path, strlen(path), - INDEX_BASE); + false, INDEX_BASE); if (rc > 0) rc = 1; if (rc < 0) rc = -1; is(rc, errpos, "path cmp result \"%s\" with \"%s\": " @@ -493,7 +493,7 @@ test_path_cmp() char *multikey_a = "Data[*][\"FIO\"].fname[*]"; char *multikey_b = "[\"Data\"][*].FIO[\"fname\"][*]"; int ret = json_path_cmp(multikey_a, strlen(multikey_a), multikey_b, - strlen(multikey_b), INDEX_BASE); + strlen(multikey_b), false, INDEX_BASE); is(ret, 0, "path cmp result \"%s\" with \"%s\": have %d, expected %d", multikey_a, multikey_b, ret, 0); @@ -510,6 +510,13 @@ test_path_cmp() "test json_path_is_multikey: multikey path"); is(ret, 8, "test json_path_is_multikey: multikey path %d/%d", ret, 8); + is(json_path_cmp(a, a_len, multikey_b, ret, false, INDEX_BASE) == 0, false, + "test json_path_cmp with is_weak_cmp = false: compare %s and %.*s", + a, ret, multikey_b); + is(json_path_cmp(a, a_len, multikey_b, ret, true, INDEX_BASE) == 0, true, + "test json_path_cmp with is_weak_cmp = true: compare %s and %.*s", + a, ret, multikey_b); + check_plan(); footer(); } diff --git a/test/unit/json.result b/test/unit/json.result index 0da6c8c6b..e57dc748a 100644 --- a/test/unit/json.result +++ b/test/unit/json.result @@ -170,7 +170,7 @@ ok 2 - subtests ok 3 - subtests *** test_tree: done *** *** test_path_cmp *** - 1..11 + 1..13 ok 1 - path cmp result "Data[1]["FIO"].fname" with "Data[1]["FIO"].fname": have 0, expected 0 ok 2 - path cmp result "Data[1]["FIO"].fname" with "["Data"][1].FIO["fname"]": have 0, expected 0 ok 3 - path cmp result "Data[1]["FIO"].fname" with "Data[1]": have 1, expected 1 @@ -182,6 +182,8 @@ ok 3 - subtests ok 9 - test json_path_is_multikey: non-multikey path ok 10 - test json_path_is_multikey: multikey path ok 11 - test json_path_is_multikey: multikey path 8/8 + ok 12 - test json_path_cmp with is_weak_cmp = false: compare Data[1]["FIO"].fname and ["Data"] + ok 13 - test json_path_cmp with is_weak_cmp = true: compare Data[1]["FIO"].fname and ["Data"] ok 4 - subtests *** test_path_cmp: done *** *** test_path_snprint *** -- 2.21.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [tarantool-patches] [PATCH v2 2/5] lib: introduce is_weak_cmp flag for json_path_cmp 2019-03-19 12:32 ` [PATCH v2 2/5] lib: introduce is_weak_cmp flag for json_path_cmp Kirill Shcherbatov @ 2019-03-19 15:47 ` Konstantin Osipov 0 siblings, 0 replies; 13+ messages in thread From: Konstantin Osipov @ 2019-03-19 15:47 UTC (permalink / raw) To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov * Kirill Shcherbatov <kshcherbatov@tarantool.org> [19/03/19 16:15]: > Introduced is_weak_cmp flag for json_path_cmp routine: when this > flag is set, strings are equal when whole b path is an a subpath > (in terms of tokens). Could you please provide an example in the changeset comment and function comment? Why did you choose "weak" as a description of this type of comparison? Perhaps "is_prefix"? cmp is symmetrical, is the new function symmetrical? > * Compare two JSON paths using Lexer class. > * - in case of paths that have same token-sequence prefix, > * the path having more tokens is assumed to be greater > + * (except the case when is_weak_cmp flag is set). > * - both paths must be valid > * (may be tested with json_path_validate). > + * > + * When is_weak_cmp flag is set, strings are assumed to be equal > + * when all b path tokens are present in a path. > */ Please add an example. > int Are you sure you want a flag an existing function and not an entirely separate one? > json_path_cmp(const char *a, int a_len, const char *b, int b_len, > - int index_base); > + bool is_weak_cmp, int index_base); > > /** > * Check if the passed JSON path is valid. > diff --git a/test/unit/json.c b/test/unit/json.c > index 1db7e9364..10704451b 100644 > --- a/test/unit/json.c > +++ b/test/unit/json.c > @@ -479,12 +479,12 @@ test_path_cmp() > {"Data[1][\"Info\"].fname[1]", -1}, > }; > header(); > - plan(lengthof(rc) + 6); > + plan(lengthof(rc) + 8); > for (size_t i = 0; i < lengthof(rc); ++i) { > const char *path = rc[i].path; > int errpos = rc[i].errpos; > int rc = json_path_cmp(a, a_len, path, strlen(path), > - INDEX_BASE); > + false, INDEX_BASE); > if (rc > 0) rc = 1; > if (rc < 0) rc = -1; > is(rc, errpos, "path cmp result \"%s\" with \"%s\": " > @@ -493,7 +493,7 @@ test_path_cmp() > char *multikey_a = "Data[*][\"FIO\"].fname[*]"; > char *multikey_b = "[\"Data\"][*].FIO[\"fname\"][*]"; > int ret = json_path_cmp(multikey_a, strlen(multikey_a), multikey_b, > - strlen(multikey_b), INDEX_BASE); > + strlen(multikey_b), false, INDEX_BASE); > is(ret, 0, "path cmp result \"%s\" with \"%s\": have %d, expected %d", > multikey_a, multikey_b, ret, 0); > > @@ -510,6 +510,13 @@ test_path_cmp() > "test json_path_is_multikey: multikey path"); > is(ret, 8, "test json_path_is_multikey: multikey path %d/%d", ret, 8); > > + is(json_path_cmp(a, a_len, multikey_b, ret, false, INDEX_BASE) == 0, false, > + "test json_path_cmp with is_weak_cmp = false: compare %s and %.*s", > + a, ret, multikey_b); > + is(json_path_cmp(a, a_len, multikey_b, ret, true, INDEX_BASE) == 0, true, > + "test json_path_cmp with is_weak_cmp = true: compare %s and %.*s", > + a, ret, multikey_b); The tests are vastly insufficient. Left empty path, two empty paths, right empty paths, the subpath is in the middle, front or tail, a path with wildcards, two paths with wildcards, multiple wildcards. -- Konstantin Osipov, Moscow, Russia, +7 903 626 22 32 http://tarantool.io - www.twitter.com/kostja_osipov ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 3/5] box: move offset_slot init to tuple_format_add_field 2019-03-19 12:32 [PATCH v2 0/5] box: introduce multikey indexes in memtx Kirill Shcherbatov 2019-03-19 12:32 ` [PATCH v2 1/5] lib: introduce json_path_is_multikey helper Kirill Shcherbatov 2019-03-19 12:32 ` [PATCH v2 2/5] lib: introduce is_weak_cmp flag for json_path_cmp Kirill Shcherbatov @ 2019-03-19 12:32 ` Kirill Shcherbatov 2019-03-19 12:32 ` [PATCH v2 4/5] box: introduce field_map_builder for field_map init Kirill Shcherbatov ` (2 subsequent siblings) 5 siblings, 0 replies; 13+ messages in thread From: Kirill Shcherbatov @ 2019-03-19 12:32 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov Due to the fact that the allocation of offset_slot in the case of multikey indexes will become more complicated and will be necessary for intermediate nodes of the tuple_field tree, let's move this logic to the tuple_format_add_field. Needed for #1257 --- src/box/tuple_format.c | 34 ++++++++++++++++++---------------- 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 22a84af86..1043707ad 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -205,13 +205,14 @@ tuple_format_field_by_id(struct tuple_format *format, uint32_t id) */ static struct tuple_field * tuple_format_add_field(struct tuple_format *format, uint32_t fieldno, - const char *path, uint32_t path_len, char **path_pool) + const char *path, uint32_t path_len, bool is_sequential, + int *current_slot, char **path_pool) { struct tuple_field *field = NULL; struct tuple_field *parent = tuple_format_field(format, fieldno); assert(parent != NULL); if (path == NULL) - return parent; + goto end; field = tuple_field_new(); if (field == NULL) goto fail; @@ -279,12 +280,23 @@ tuple_format_add_field(struct tuple_format *format, uint32_t fieldno, assert(parent != NULL); /* Update tree depth information. */ format->fields_depth = MAX(format->fields_depth, token_count + 1); -end: +cleanup: tuple_field_delete(field); +end: + /* + * In the tuple, store only offsets necessary to access + * fields of non-sequential keys. First field is always + * simply accessible, so we don't store an offset for it. + */ + if (parent != NULL && parent->offset_slot == TUPLE_OFFSET_SLOT_NIL && + is_sequential == false && (fieldno > 0 || path != NULL)) { + *current_slot = *current_slot - 1; + parent->offset_slot = *current_slot; + } return parent; fail: parent = NULL; - goto end; + goto cleanup; } static int @@ -295,7 +307,8 @@ tuple_format_use_key_part(struct tuple_format *format, uint32_t field_count, assert(part->fieldno < tuple_format_field_count(format)); struct tuple_field *field = tuple_format_add_field(format, part->fieldno, part->path, - part->path_len, path_pool); + part->path_len, is_sequential, + current_slot, path_pool); if (field == NULL) return -1; /* @@ -350,17 +363,6 @@ tuple_format_use_key_part(struct tuple_format *format, uint32_t field_count, return -1; } field->is_key_part = true; - /* - * In the tuple, store only offsets necessary to access - * fields of non-sequential keys. First field is always - * simply accessible, so we don't store an offset for it. - */ - if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL && - is_sequential == false && - (part->fieldno > 0 || part->path != NULL)) { - *current_slot = *current_slot - 1; - field->offset_slot = *current_slot; - } return 0; } -- 2.21.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 4/5] box: introduce field_map_builder for field_map init 2019-03-19 12:32 [PATCH v2 0/5] box: introduce multikey indexes in memtx Kirill Shcherbatov ` (2 preceding siblings ...) 2019-03-19 12:32 ` [PATCH v2 3/5] box: move offset_slot init to tuple_format_add_field Kirill Shcherbatov @ 2019-03-19 12:32 ` Kirill Shcherbatov 2019-03-19 12:32 ` [PATCH v2 5/5] box: introduce multikey indexes in memtx Kirill Shcherbatov 2019-03-28 10:21 ` [PATCH v2 0/5] " Vladimir Davydov 5 siblings, 0 replies; 13+ messages in thread From: Kirill Shcherbatov @ 2019-03-19 12:32 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov The new field_map_builder class encapsulates the logic associated with field_map allocation and initialization. In the future it will be extended to allocate field_map that has extentions. Needed for #1257 --- src/box/CMakeLists.txt | 1 + src/box/memtx_engine.c | 8 +- src/box/tuple.c | 16 ++-- src/box/tuple_field_map.c | 53 ++++++++++++ src/box/tuple_field_map.h | 86 ++++++++++++++++++ src/box/tuple_format.c | 178 +++++++++++++++++++------------------- src/box/tuple_format.h | 7 +- src/box/vy_stmt.c | 9 +- 8 files changed, 252 insertions(+), 106 deletions(-) create mode 100644 src/box/tuple_field_map.c create mode 100644 src/box/tuple_field_map.h diff --git a/src/box/CMakeLists.txt b/src/box/CMakeLists.txt index 59e91b65a..3ae47e336 100644 --- a/src/box/CMakeLists.txt +++ b/src/box/CMakeLists.txt @@ -35,6 +35,7 @@ target_link_libraries(xrow server core small vclock misc box_error add_library(tuple STATIC tuple.c + tuple_field_map.c tuple_format.c tuple_update.c tuple_compare.cc diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c index 3e959ff2e..bcd170a89 100644 --- a/src/box/memtx_engine.c +++ b/src/box/memtx_engine.c @@ -1125,10 +1125,10 @@ memtx_tuple_new(struct tuple_format *format, const char *data, const char *end) struct tuple *tuple = NULL; struct region *region = &fiber()->gc; size_t region_svp = region_used(region); - uint32_t *field_map, field_map_size; - if (tuple_field_map_create(format, data, true, &field_map, - &field_map_size) != 0) + struct field_map_builder builder; + if (tuple_field_map_create(&builder, format, data, true) != 0) goto end; + uint32_t field_map_size = field_map_builder_size(&builder); size_t tuple_len = end - data; size_t total = sizeof(struct memtx_tuple) + field_map_size + tuple_len; @@ -1168,7 +1168,7 @@ memtx_tuple_new(struct tuple_format *format, const char *data, const char *end) */ tuple->data_offset = sizeof(struct tuple) + field_map_size; char *raw = (char *) tuple + tuple->data_offset; - memcpy(raw - field_map_size, field_map, field_map_size); + field_map_builder_build(&builder, raw - field_map_size); memcpy(raw, data, tuple_len); say_debug("%s(%zu) = %p", __func__, tuple_len, memtx_tuple); end: diff --git a/src/box/tuple.c b/src/box/tuple.c index 7f06d4053..68f0670f2 100644 --- a/src/box/tuple.c +++ b/src/box/tuple.c @@ -78,10 +78,10 @@ runtime_tuple_new(struct tuple_format *format, const char *data, const char *end struct tuple *tuple = NULL; struct region *region = &fiber()->gc; size_t region_svp = region_used(region); - uint32_t *field_map, field_map_size; - if (tuple_field_map_create(format, data, true, &field_map, - &field_map_size) != 0) + struct field_map_builder builder; + if (tuple_field_map_create(&builder, format, data, true) != 0) goto end; + uint32_t field_map_size = field_map_builder_size(&builder); size_t data_len = end - data; size_t total = sizeof(struct tuple) + field_map_size + data_len; @@ -98,7 +98,7 @@ runtime_tuple_new(struct tuple_format *format, const char *data, const char *end tuple_format_ref(format); tuple->data_offset = sizeof(struct tuple) + field_map_size; char *raw = (char *) tuple + tuple->data_offset; - memcpy(raw - field_map_size, field_map, field_map_size); + field_map_builder_build(&builder, raw - field_map_size); memcpy(raw, data, data_len); say_debug("%s(%zu) = %p", __func__, data_len, tuple); end: @@ -126,11 +126,11 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple) struct region *region = &fiber()->gc; size_t region_svp = region_used(region); - uint32_t *field_map, field_map_size; - int rc = tuple_field_map_create(format, tuple, true, &field_map, - &field_map_size); - assert(rc != 0 || field_map_size == format->field_map_size); + struct field_map_builder builder; + int rc = tuple_field_map_create(&builder, format, tuple, true); region_truncate(region, region_svp); + assert(rc != 0 || + field_map_builder_size(&builder) == format->field_map_size); return rc; } diff --git a/src/box/tuple_field_map.c b/src/box/tuple_field_map.c new file mode 100644 index 000000000..ea77745aa --- /dev/null +++ b/src/box/tuple_field_map.c @@ -0,0 +1,53 @@ +/* + * Copyright 2010-2019, Tarantool AUTHORS, please see AUTHORS file. + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * 1. Redistributions of source code must retain the above + * copyright notice, this list of conditions and the + * following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ +#include "diag.h" +#include "tuple_field_map.h" +#include "small/region.h" + +int +field_map_builder_create(struct field_map_builder *builder, + uint32_t field_map_size, struct region *region) +{ + builder->item_count = field_map_size / sizeof(uint32_t); + if (field_map_size == 0) { + builder->items = NULL; + return 0; + } + uint32_t sz = builder->item_count * sizeof(field_map_builder_item); + builder->items = region_alloc(region, sz); + if (builder->items == NULL) { + diag_set(OutOfMemory, sz, "region_alloc", "field_map"); + return -1; + } + memset((char *)builder->items, 0, sz); + builder->items = (field_map_builder_item *)((char *)builder->items + sz); + return 0; +} diff --git a/src/box/tuple_field_map.h b/src/box/tuple_field_map.h new file mode 100644 index 000000000..3a9927faa --- /dev/null +++ b/src/box/tuple_field_map.h @@ -0,0 +1,86 @@ +#ifndef TARANTOOL_BOX_FIELD_MAP_H_INCLUDED +#define TARANTOOL_BOX_FIELD_MAP_H_INCLUDED +/* + * Copyright 2010-2019, Tarantool AUTHORS, please see AUTHORS file. + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * 1. Redistributions of source code must retain the above + * copyright notice, this list of conditions and the + * following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL + * <COPYRIGHT HOLDER> OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF + * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + */ +#include <stdint.h> + +struct region; + +/** Preliminary field_map atom. */ +typedef uint32_t field_map_builder_item; + +/** A class that contains a preliminary view of the field_map. */ +struct field_map_builder { + /** + * Allocation with preliminary offset slot + * representation. + */ + field_map_builder_item *items; + /** + * Number of fields in the resulting field_map; + * count of field_map_builder::items. + */ + uint32_t item_count; +}; + +/** + * Initialize field_map_builder to prepare field_map of size + * field_map_size. Use region for temporary allocations. + */ +int +field_map_builder_create(struct field_map_builder *builder, + uint32_t field_map_size, struct region *region); + +/** Initialize offset_slot with offset value. */ +static inline void +field_map_builder_slot_set(struct field_map_builder *builder, + int32_t offset_slot, uint32_t offset) +{ + assert(offset_slot < 0); + builder->items[offset_slot] = offset; +} + +/** Return built field_map size. */ +static inline uint32_t +field_map_builder_size(struct field_map_builder *builder) +{ + return builder->item_count * sizeof(uint32_t); +} + +/** Write build field_map size in the preallocated buffer wptr. */ +static inline void +field_map_builder_build(struct field_map_builder *builder, char *wptr) +{ + uint32_t field_map_size = field_map_builder_size(builder); + memcpy(wptr, (char *)builder->items - field_map_size, field_map_size); +} + +#endif /* TARANTOOL_BOX_FIELD_MAP_H_INCLUDED */ diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 1043707ad..1d765c7c7 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -767,70 +767,13 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1, return true; } -/** @sa declaration for details. */ -int -tuple_field_map_create(struct tuple_format *format, const char *tuple, - bool validate, uint32_t **field_map, - uint32_t *field_map_size) +static int +tuple_field_map_initialize(struct field_map_builder *builder, + struct tuple_format *format, const char **pos, + const char *tuple, uint32_t defined_field_count, + void *required_fields, struct region *region) { - if (tuple_format_field_count(format) == 0) { - *field_map = NULL; - *field_map_size = 0; - return 0; /* Nothing to initialize */ - } - struct region *region = &fiber()->gc; - *field_map_size = format->field_map_size; - *field_map = region_alloc(region, *field_map_size); - if (*field_map == NULL) { - diag_set(OutOfMemory, *field_map_size, "region_alloc", - "field_map"); - return -1; - } - *field_map = (uint32_t *)((char *)*field_map + *field_map_size); - - const char *pos = tuple; - int rc = 0; - /* Check to see if the tuple has a sufficient number of fields. */ - uint32_t field_count = mp_decode_array(&pos); - if (validate && format->exact_field_count > 0 && - format->exact_field_count != field_count) { - diag_set(ClientError, ER_EXACT_FIELD_COUNT, - (unsigned) field_count, - (unsigned) format->exact_field_count); - goto error; - } - /* - * Allocate a field bitmap that will be used for checking - * that all mandatory fields are present. - */ - void *required_fields = NULL; - size_t required_fields_sz = 0; - if (validate) { - required_fields_sz = bitmap_size(format->total_field_count); - required_fields = region_alloc(region, required_fields_sz); - if (required_fields == NULL) { - diag_set(OutOfMemory, required_fields_sz, - "region", "required field bitmap"); - goto error; - } - memcpy(required_fields, format->required_fields, - required_fields_sz); - } - /* - * Initialize the tuple field map and validate field types. - */ - if (field_count == 0) { - /* Empty tuple, nothing to do. */ - goto finish; - } - uint32_t defined_field_count = MIN(field_count, validate ? - tuple_format_field_count(format) : - format->index_field_count); - /* - * Nullify field map to be able to detect by 0, - * which key fields are absent in tuple_field(). - */ - memset((char *)*field_map - *field_map_size, 0, *field_map_size); + bool validate = required_fields != NULL; /* * Prepare mp stack of the size equal to the maximum depth * of the indexed field in the format::fields tree @@ -842,12 +785,11 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, struct mp_frame *frames = region_alloc(region, frames_sz); if (frames == NULL) { diag_set(OutOfMemory, frames_sz, "region", "frames"); - goto error; + return -1; } struct mp_stack stack; mp_stack_create(&stack, format->fields_depth, frames); mp_stack_push(&stack, MP_ARRAY, defined_field_count); - struct tuple_field *field; struct json_token *parent = &format->fields.root; while (true) { int idx; @@ -862,7 +804,7 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, */ mp_stack_pop(&stack); if (mp_stack_is_empty(&stack)) - goto finish; + goto end; parent = parent->parent; } /* @@ -877,17 +819,17 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, token.num = idx; break; case MP_MAP: - if (mp_typeof(*pos) != MP_STR) { + if (mp_typeof(**pos) != MP_STR) { /* * JSON path support only string * keys for map. Skip this entry. */ - mp_next(&pos); - mp_next(&pos); + mp_next(pos); + mp_next(pos); continue; } token.type = JSON_TOKEN_STR; - token.str = mp_decode_str(&pos, (uint32_t *)&token.len); + token.str = mp_decode_str(pos, (uint32_t *)&token.len); break; default: unreachable(); @@ -898,9 +840,10 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, * corresponding to the current position in the * tuple. */ - enum mp_type type = mp_typeof(*pos); + enum mp_type type = mp_typeof(**pos); assert(parent != NULL); - field = json_tree_lookup_entry(&format->fields, parent, &token, + struct tuple_field *field = + json_tree_lookup_entry(&format->fields, parent, &token, struct tuple_field, token); if (field != NULL) { bool is_nullable = tuple_field_is_nullable(field); @@ -910,11 +853,14 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, diag_set(ClientError, ER_FIELD_TYPE, tuple_field_path(field), field_type_strs[field->type]); - goto error; + return -1; } - if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) - (*field_map)[field->offset_slot] = pos - tuple; - if (required_fields != NULL) + if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { + field_map_builder_slot_set(builder, + field->offset_slot, + *pos - tuple); + } + if (validate) bit_clear(required_fields, field->id); } /* @@ -927,14 +873,76 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, if ((type == MP_ARRAY || type == MP_MAP) && !mp_stack_is_full(&stack) && field != NULL) { uint32_t size = type == MP_ARRAY ? - mp_decode_array(&pos) : - mp_decode_map(&pos); + mp_decode_array(pos) : + mp_decode_map(pos); mp_stack_push(&stack, type, size); parent = &field->token; } else { - mp_next(&pos); + mp_next(pos); } } +end: + return 0; +} + +/** @sa declaration for details. */ +int +tuple_field_map_create(struct field_map_builder *builder, + struct tuple_format *format, const char *tuple, + bool validate) +{ + struct region *region = &fiber()->gc; + if (tuple_format_field_count(format) == 0) { + /* Nothing to initialize */ + if (field_map_builder_create(builder, 0, region) != 0) + unreachable(); + return 0; + } + if (field_map_builder_create(builder, format->field_map_size, + region) != 0) + return -1; + const char *pos = tuple; + /* Check to see if the tuple has a sufficient number of fields. */ + uint32_t field_count = mp_decode_array(&pos); + if (validate && format->exact_field_count > 0 && + format->exact_field_count != field_count) { + diag_set(ClientError, ER_EXACT_FIELD_COUNT, + (unsigned) field_count, + (unsigned) format->exact_field_count); + return -1; + } + /* + * Allocate a field bitmap that will be used for checking + * that all mandatory fields are present. + */ + void *required_fields = NULL; + size_t required_fields_sz = 0; + if (validate) { + required_fields_sz = bitmap_size(format->total_field_count); + required_fields = region_alloc(region, required_fields_sz); + if (required_fields == NULL) { + diag_set(OutOfMemory, required_fields_sz, + "region", "required field bitmap"); + return -1; + } + memcpy(required_fields, format->required_fields, + required_fields_sz); + } + /* + * Initialize the tuple field map and validate field types. + */ + if (field_count == 0) { + /* Empty tuple, nothing to do. */ + goto finish; + } + uint32_t defined_field_count = MIN(field_count, validate ? + tuple_format_field_count(format) : + format->index_field_count); + assert(!validate || required_fields != NULL); + if (tuple_field_map_initialize(builder, format, &pos, tuple, + defined_field_count, required_fields, + region) != 0) + return -1; finish: /* * Check the required field bitmap for missing fields. @@ -946,19 +954,15 @@ finish: size_t id = bit_iterator_next(&it); if (id < SIZE_MAX) { /* A field is missing, report an error. */ - field = tuple_format_field_by_id(format, id); + struct tuple_field *field = + tuple_format_field_by_id(format, id); assert(field != NULL); diag_set(ClientError, ER_FIELD_MISSING, tuple_field_path(field)); - goto error; + return -1; } } -out: - *field_map = (uint32_t *)((char *)*field_map - *field_map_size); - return rc; -error: - rc = -1; - goto out; + return 0; } uint32_t diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index 22a0fb232..6874374cc 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -36,6 +36,7 @@ #include "errinj.h" #include "json/json.h" #include "tuple_dictionary.h" +#include "tuple_field_map.h" #if defined(__cplusplus) extern "C" { @@ -401,9 +402,9 @@ box_tuple_format_unref(box_tuple_format_t *format); * tuple + off_i = indexed_field_i; */ int -tuple_field_map_create(struct tuple_format *format, const char *tuple, - bool validate, uint32_t **field_map, - uint32_t *field_map_size); +tuple_field_map_create(struct field_map_builder *builder, + struct tuple_format *format, const char *tuple, + bool validate); /** * Initialize tuple format subsystem. diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c index 5d1e10f07..e355e3650 100644 --- a/src/box/vy_stmt.c +++ b/src/box/vy_stmt.c @@ -314,10 +314,11 @@ vy_stmt_new_with_ops(struct tuple_format *format, const char *tuple_begin, * tuples inserted into a space are validated explicitly * with tuple_validate() anyway. */ - uint32_t *field_map, field_map_size; - if (tuple_field_map_create(format, tuple_begin, false, &field_map, - &field_map_size) != 0) + struct field_map_builder builder; + if (tuple_field_map_create(&builder, format, tuple_begin, false) != 0) goto end; + uint32_t field_map_size = field_map_builder_size(&builder); + assert(field_map_size == format->field_map_size); /* * Allocate stmt. Offsets: one per key part + offset of the * statement end. @@ -330,7 +331,7 @@ vy_stmt_new_with_ops(struct tuple_format *format, const char *tuple_begin, /* Copy MsgPack data */ char *raw = (char *) tuple_data(stmt); char *wpos = raw; - memcpy(wpos - field_map_size, field_map, field_map_size); + field_map_builder_build(&builder, wpos - field_map_size); memcpy(wpos, tuple_begin, mpsize); wpos += mpsize; for (struct iovec *op = ops, *end = ops + op_count; -- 2.21.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 5/5] box: introduce multikey indexes in memtx 2019-03-19 12:32 [PATCH v2 0/5] box: introduce multikey indexes in memtx Kirill Shcherbatov ` (3 preceding siblings ...) 2019-03-19 12:32 ` [PATCH v2 4/5] box: introduce field_map_builder for field_map init Kirill Shcherbatov @ 2019-03-19 12:32 ` Kirill Shcherbatov 2019-03-19 16:05 ` [tarantool-patches] " Kirill Shcherbatov 2019-03-21 12:35 ` Kirill Shcherbatov 2019-03-28 10:21 ` [PATCH v2 0/5] " Vladimir Davydov 5 siblings, 2 replies; 13+ messages in thread From: Kirill Shcherbatov @ 2019-03-19 12:32 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: Kirill Shcherbatov Part of #1257 @TarantoolBot document Title: introduce multikey indexes in memtx Multikey indexes allows you to automatically index set of documents by JSON paths having array index placeholder "[*]". Multikey index cannot be primary as it cannot be unique(by definition). Multikey index parts must be compatible: only one "[*]" placeholder is allowed in same position(for all JSON paths in index parts). Example: s = box.schema.space.create('withdata') pk = s:create_index('pk') parts = { {2, 'str', path = 'data[*].name'}, {2, 'str', path = 'data[*].extra.phone'} } idx = s:create_index('idx', {parts = parts}) s:insert({1, {data = {{name="A", extra={phone="111"}}, {name="B", extra={phone="111"}}}, garbage = 1}} idx:get({'A', '111'}) --- src/box/errcode.h | 1 + src/box/index_def.c | 25 +++- src/box/key_def.c | 10 ++ src/box/key_def.h | 18 +++ src/box/memtx_space.c | 18 +++ src/box/memtx_tree.c | 183 ++++++++++++++++++++---- src/box/tuple.c | 28 +++- src/box/tuple.h | 111 +++++++++++++-- src/box/tuple_compare.cc | 87 +++++++++--- src/box/tuple_extract_key.cc | 2 +- src/box/tuple_field_map.c | 27 ++++ src/box/tuple_field_map.h | 87 +++++++++++- src/box/tuple_format.c | 105 +++++++++++--- src/box/vinyl.c | 6 + test/box/misc.result | 1 + test/engine/json.result | 13 -- test/engine/json.test.lua | 7 - test/engine/multikey_idx.result | 228 ++++++++++++++++++++++++++++++ test/engine/multikey_idx.test.lua | 67 +++++++++ 19 files changed, 915 insertions(+), 109 deletions(-) create mode 100644 test/engine/multikey_idx.result create mode 100644 test/engine/multikey_idx.test.lua diff --git a/src/box/errcode.h b/src/box/errcode.h index d234d26c8..16cec30c0 100644 --- a/src/box/errcode.h +++ b/src/box/errcode.h @@ -239,6 +239,7 @@ struct errcode_record { /*184 */_(ER_SQL_UNRECOGNIZED_SYNTAX, "Syntax error near '%.*s'") \ /*185 */_(ER_SQL_UNKNOWN_TOKEN, "Syntax error: unrecognized token: '%.*s'") \ /*186 */_(ER_SQL_PARSER_GENERIC, "%s") \ + /*187 */_(ER_INDEX_MULTIKEY_INVALID, "Multikey index is invalid: %s") \ /* * !IMPORTANT! Please follow instructions at start of the file diff --git a/src/box/index_def.c b/src/box/index_def.c index f0c7dc4f6..dfa539f86 100644 --- a/src/box/index_def.c +++ b/src/box/index_def.c @@ -268,9 +268,15 @@ index_def_is_valid(struct index_def *index_def, const char *space_name) space_name, "too many key parts"); return false; } + if (index_def->iid == 0 && key_def_is_multikey(index_def->key_def)) { + diag_set(ClientError, ER_MODIFY_INDEX, index_def->name, + space_name, "primary key cannot be multikey"); + return false; + } for (uint32_t i = 0; i < index_def->key_def->part_count; i++) { - assert(index_def->key_def->parts[i].type < field_type_MAX); - if (index_def->key_def->parts[i].fieldno > BOX_INDEX_FIELD_MAX) { + struct key_part *parts = index_def->key_def->parts; + assert(parts[i].type < field_type_MAX); + if (parts[i].fieldno > BOX_INDEX_FIELD_MAX) { diag_set(ClientError, ER_MODIFY_INDEX, index_def->name, space_name, "field no is too big"); return false; @@ -280,8 +286,8 @@ index_def_is_valid(struct index_def *index_def, const char *space_name) * Courtesy to a user who could have made * a typo. */ - struct key_part *part_a = &index_def->key_def->parts[i]; - struct key_part *part_b = &index_def->key_def->parts[j]; + struct key_part *part_a = &parts[i]; + struct key_part *part_b = &parts[j]; if (part_a->fieldno == part_b->fieldno && json_path_cmp(part_a->path, part_a->path_len, part_b->path, part_b->path_len, @@ -292,6 +298,17 @@ index_def_is_valid(struct index_def *index_def, const char *space_name) return false; } } + if (key_def_is_multikey(index_def->key_def) && + json_path_cmp(parts[i].path, parts[i].path_len, + parts[index_def->key_def-> + multikey_part_idx].path, + index_def->key_def->multikey_path_len, + true, TUPLE_INDEX_BASE) != 0) { + diag_set(ClientError, ER_MODIFY_INDEX, + index_def->name, space_name, + "multikey index parts are incompatible"); + return false; + } } return true; } diff --git a/src/box/key_def.c b/src/box/key_def.c index f9e464402..ffc1e1665 100644 --- a/src/box/key_def.c +++ b/src/box/key_def.c @@ -164,6 +164,13 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno, *path_pool += path_len; memcpy(def->parts[part_no].path, path, path_len); def->parts[part_no].path_len = path_len; + int multikey_path_len; + if (json_path_is_multikey(path, path_len, &multikey_path_len, + TUPLE_INDEX_BASE) && + def->multikey_path_len <= (uint32_t)multikey_path_len) { + def->multikey_part_idx = part_no; + def->multikey_path_len = (uint32_t)multikey_path_len; + } } else { def->parts[part_no].path = NULL; def->parts[part_no].path_len = 0; @@ -185,6 +192,7 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count) } def->part_count = part_count; + def->multikey_part_idx = part_count; def->unique_part_count = part_count; /* A pointer to the JSON paths data in the new key_def. */ @@ -253,6 +261,7 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count) } key_def->part_count = part_count; + key_def->multikey_part_idx = part_count; key_def->unique_part_count = part_count; for (uint32_t item = 0; item < part_count; ++item) { @@ -672,6 +681,7 @@ key_def_merge(const struct key_def *first, const struct key_def *second) return NULL; } new_def->part_count = new_part_count; + new_def->multikey_part_idx = new_part_count; new_def->unique_part_count = new_part_count; new_def->is_nullable = first->is_nullable || second->is_nullable; new_def->has_optional_parts = first->has_optional_parts || diff --git a/src/box/key_def.h b/src/box/key_def.h index 174790f72..690954f2d 100644 --- a/src/box/key_def.h +++ b/src/box/key_def.h @@ -247,6 +247,18 @@ struct key_def { bool has_optional_parts; /** Key fields mask. @sa column_mask.h for details. */ uint64_t column_mask; + /** + * In case of multikey index, the index of the key_part + * containing JSON path with array index placeholder "[*]". + * Otherwise multikey_part_idx == part_count. + */ + uint32_t multikey_part_idx; + /** + * In case of multikey index, the length of the + * parts[multikey_part_idx].path substring "...[*]" + * @see json_path_is_multikey(). + */ + uint32_t multikey_path_len; /** The size of the 'parts' array. */ uint32_t part_count; /** Description of parts of a multipart index. */ @@ -485,6 +497,12 @@ key_def_is_sequential(const struct key_def *key_def) return true; } +static inline bool +key_def_is_multikey(const struct key_def *key_def) +{ + return key_def->multikey_part_idx < key_def->part_count; +} + /** * Return true if @a key_def defines has fields that requires * special collation comparison. diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c index 92ec0b300..9a5132979 100644 --- a/src/box/memtx_space.c +++ b/src/box/memtx_space.c @@ -638,6 +638,12 @@ memtx_space_check_index_def(struct space *space, struct index_def *index_def) "HASH index must be unique"); return -1; } + if (key_def_is_multikey(index_def->key_def)) { + diag_set(ClientError, ER_MODIFY_INDEX, + index_def->name, space_name(space), + "HASH index cannot be multikey"); + return -1; + } break; case TREE: /* TREE index has no limitations. */ @@ -661,6 +667,12 @@ memtx_space_check_index_def(struct space *space, struct index_def *index_def) "RTREE index field type must be ARRAY"); return -1; } + if (key_def_is_multikey(index_def->key_def)) { + diag_set(ClientError, ER_MODIFY_INDEX, + index_def->name, space_name(space), + "RTREE index cannot be multikey"); + return -1; + } /* no furter checks of parts needed */ return 0; case BITSET: @@ -683,6 +695,12 @@ memtx_space_check_index_def(struct space *space, struct index_def *index_def) "BITSET index field type must be NUM or STR"); return -1; } + if (key_def_is_multikey(index_def->key_def)) { + diag_set(ClientError, ER_MODIFY_INDEX, + index_def->name, space_name(space), + "BITSET index cannot be multikey"); + return -1; + } /* no furter checks of parts needed */ return 0; default: diff --git a/src/box/memtx_tree.c b/src/box/memtx_tree.c index c2c8a19e7..38a6992ef 100644 --- a/src/box/memtx_tree.c +++ b/src/box/memtx_tree.c @@ -531,7 +531,8 @@ memtx_tree_index_get(struct index *base, const char *key, struct memtx_tree_key_data key_data; key_data.key = key; key_data.part_count = part_count; - key_data.hint = key_hint(key, cmp_def); + if (!key_def_is_multikey(cmp_def)) + key_data.hint = key_hint(key, cmp_def); struct memtx_tree_data *res = memtx_tree_find(&index->tree, &key_data); *result = res != NULL ? res->tuple : NULL; return 0; @@ -587,6 +588,79 @@ memtx_tree_index_replace(struct index *base, struct tuple *old_tuple, return 0; } +int +memtx_tree_index_replace_multikey(struct index *base, struct tuple *old_tuple, + struct tuple *new_tuple, + enum dup_replace_mode mode, + struct tuple **result) +{ + struct memtx_tree_index *index = (struct memtx_tree_index *)base; + struct key_def *key_def = memtx_tree_index_cmp_def(index); + if (new_tuple != NULL) { + int errcode = 0, tree_res = 0; + struct tuple *dup_tuple = NULL; + int multikey_idx = 0; + uint32_t items = tuple_field_multikey_items(new_tuple, key_def); + for (; (uint32_t)multikey_idx < items; multikey_idx++) { + struct memtx_tree_data new_data; + new_data.tuple = new_tuple; + new_data.hint = multikey_idx; + struct memtx_tree_data dup_data; + dup_data.tuple = NULL; + tree_res = memtx_tree_insert(&index->tree, new_data, + &dup_data); + if (tree_res != 0) { + diag_set(OutOfMemory, MEMTX_EXTENT_SIZE, + "memtx_tree_index", "replace"); + break; + } + dup_tuple = dup_data.tuple; + errcode = replace_check_dup(old_tuple, dup_tuple, mode); + if (errcode != 0) { + memtx_tree_delete(&index->tree, new_data); + if (dup_tuple != NULL) { + memtx_tree_insert(&index->tree, + dup_data, NULL); + } + struct space *sp = + space_cache_find(base->def->space_id); + if (sp != NULL) { + diag_set(ClientError, errcode, + base->def->name, + space_name(sp)); + } + break; + } + } + if (tree_res != 0 || errcode != 0) { + multikey_idx--; + for (; multikey_idx >= 0; multikey_idx--) { + struct memtx_tree_data data; + data.tuple = new_tuple; + data.hint = multikey_idx; + memtx_tree_delete(&index->tree, data); + } + return -1; + } + if (dup_tuple != NULL) { + *result = dup_tuple; + return 0; + } + } + if (old_tuple != NULL) { + uint32_t items = tuple_field_multikey_items(old_tuple, key_def); + for (uint32_t multikey_idx = 0; multikey_idx < items; + multikey_idx++) { + struct memtx_tree_data old_data; + old_data.tuple = old_tuple; + old_data.hint = multikey_idx; + memtx_tree_delete(&index->tree, old_data); + } + } + *result = old_tuple; + return 0; +} + static struct iterator * memtx_tree_index_create_iterator(struct index *base, enum iterator_type type, const char *key, uint32_t part_count) @@ -624,7 +698,8 @@ memtx_tree_index_create_iterator(struct index *base, enum iterator_type type, it->key_data.key = key; it->key_data.part_count = part_count; struct key_def *cmp_def = memtx_tree_index_cmp_def(index); - it->key_data.hint = key_hint(key, cmp_def); + if (!key_def_is_multikey(cmp_def)) + it->key_data.hint = key_hint(key, cmp_def); it->index_def = base->def; it->tree = &index->tree; it->tree_iterator = memtx_tree_invalid_iterator(); @@ -659,33 +734,42 @@ memtx_tree_index_reserve(struct index *base, uint32_t size_hint) } static int -memtx_tree_index_build_next(struct index *base, struct tuple *tuple) +memtx_tree_index_build_array_realloc(struct memtx_tree_index *index, + uint32_t items) { - struct memtx_tree_index *index = (struct memtx_tree_index *)base; + bool need_realloc = false; if (index->build_array == NULL) { - index->build_array = malloc(MEMTX_EXTENT_SIZE); - if (index->build_array == NULL) { - diag_set(OutOfMemory, MEMTX_EXTENT_SIZE, - "memtx_tree_index", "build_next"); - return -1; - } index->build_array_alloc_size = MEMTX_EXTENT_SIZE / sizeof(index->build_array[0]); + need_realloc = true; } - assert(index->build_array_size <= index->build_array_alloc_size); - if (index->build_array_size == index->build_array_alloc_size) { - index->build_array_alloc_size = index->build_array_alloc_size + - index->build_array_alloc_size / 2; - struct memtx_tree_data *tmp = - realloc(index->build_array, - index->build_array_alloc_size * sizeof(*tmp)); - if (tmp == NULL) { - diag_set(OutOfMemory, index->build_array_alloc_size * - sizeof(*tmp), "memtx_tree_index", "build_next"); - return -1; - } - index->build_array = tmp; + uint32_t build_array_new_size = index->build_array_size + items; + if (build_array_new_size > index->build_array_alloc_size) { + index->build_array_alloc_size += + MAX(index->build_array_alloc_size / 2, + build_array_new_size - index->build_array_alloc_size); + need_realloc = true; + } + if (!need_realloc) + return 0; + struct memtx_tree_data *tmp = + realloc(index->build_array, + index->build_array_alloc_size * sizeof(*tmp)); + if (tmp == NULL) { + diag_set(OutOfMemory, index->build_array_alloc_size * + sizeof(*tmp), "memtx_tree_index", "build_next"); + return -1; } + index->build_array = tmp; + return 0; +} + +static int +memtx_tree_index_build_next(struct index *base, struct tuple *tuple) +{ + struct memtx_tree_index *index = (struct memtx_tree_index *)base; + if (memtx_tree_index_build_array_realloc(index, 1) != 0) + return -1; struct memtx_tree_data *elem = &index->build_array[index->build_array_size++]; elem->tuple = tuple; @@ -694,6 +778,24 @@ memtx_tree_index_build_next(struct index *base, struct tuple *tuple) return 0; } +static int +memtx_tree_index_build_next_multikey(struct index *base, struct tuple *tuple) +{ + struct memtx_tree_index *index = (struct memtx_tree_index *)base; + struct key_def *key_def = memtx_tree_index_cmp_def(index); + uint32_t items = tuple_field_multikey_items(tuple, key_def); + if (memtx_tree_index_build_array_realloc(index, items) != 0) + return -1; + for (uint32_t multikey_idx = 0; multikey_idx < items; multikey_idx++) { + struct memtx_tree_data *elem = + &index->build_array[index->build_array_size++]; + assert(index->build_array_size <= index->build_array_alloc_size); + elem->tuple = tuple; + elem->hint = multikey_idx; + } + return 0; +} + static void memtx_tree_index_end_build(struct index *base) { @@ -796,6 +898,36 @@ static const struct index_vtab memtx_tree_index_vtab = { /* .end_build = */ memtx_tree_index_end_build, }; +static const struct index_vtab memtx_tree_index_multikey_vtab = { + /* .destroy = */ memtx_tree_index_destroy, + /* .commit_create = */ generic_index_commit_create, + /* .abort_create = */ generic_index_abort_create, + /* .commit_modify = */ generic_index_commit_modify, + /* .commit_drop = */ generic_index_commit_drop, + /* .update_def = */ memtx_tree_index_update_def, + /* .depends_on_pk = */ memtx_tree_index_depends_on_pk, + /* .def_change_requires_rebuild = */ + memtx_index_def_change_requires_rebuild, + /* .size = */ memtx_tree_index_size, + /* .bsize = */ memtx_tree_index_bsize, + /* .min = */ generic_index_min, + /* .max = */ generic_index_max, + /* .random = */ memtx_tree_index_random, + /* .count = */ memtx_tree_index_count, + /* .get = */ memtx_tree_index_get, + /* .replace = */ memtx_tree_index_replace_multikey, + /* .create_iterator = */ memtx_tree_index_create_iterator, + /* .create_snapshot_iterator = */ + memtx_tree_index_create_snapshot_iterator, + /* .stat = */ generic_index_stat, + /* .compact = */ generic_index_compact, + /* .reset_stat = */ generic_index_reset_stat, + /* .begin_build = */ memtx_tree_index_begin_build, + /* .reserve = */ memtx_tree_index_reserve, + /* .build_next = */ memtx_tree_index_build_next_multikey, + /* .end_build = */ memtx_tree_index_end_build, +}; + struct index * memtx_tree_index_new(struct memtx_engine *memtx, struct index_def *def) { @@ -806,8 +938,11 @@ memtx_tree_index_new(struct memtx_engine *memtx, struct index_def *def) "malloc", "struct memtx_tree_index"); return NULL; } + const struct index_vtab *vtab = key_def_is_multikey(def->key_def) ? + &memtx_tree_index_multikey_vtab : + &memtx_tree_index_vtab; if (index_create(&index->base, (struct engine *)memtx, - &memtx_tree_index_vtab, def) != 0) { + vtab, def) != 0) { free(index); return NULL; } diff --git a/src/box/tuple.c b/src/box/tuple.c index 68f0670f2..985117957 100644 --- a/src/box/tuple.c +++ b/src/box/tuple.c @@ -129,8 +129,6 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple) struct field_map_builder builder; int rc = tuple_field_map_create(&builder, format, tuple, true); region_truncate(region, region_svp); - assert(rc != 0 || - field_map_builder_size(&builder) == format->field_map_size); return rc; } @@ -455,7 +453,8 @@ tuple_field_go_to_key(const char **field, const char *key, int len) } int -tuple_go_to_path(const char **data, const char *path, uint32_t path_len) +tuple_go_to_path(const char **data, const char *path, uint32_t path_len, + int multikey_idx) { int rc; struct json_lexer lexer; @@ -463,6 +462,10 @@ tuple_go_to_path(const char **data, const char *path, uint32_t path_len) json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE); while ((rc = json_lexer_next_token(&lexer, &token)) == 0) { switch (token.type) { + case JSON_TOKEN_ANY: + assert(multikey_idx >= 0); + token.num = multikey_idx; + FALLTHROUGH; case JSON_TOKEN_NUM: rc = tuple_field_go_to_index(data, token.num); break; @@ -532,7 +535,24 @@ tuple_field_raw_by_full_path(struct tuple_format *format, const char *tuple, } return tuple_field_raw_by_path(format, tuple, field_map, fieldno, path + lexer.offset, - path_len - lexer.offset, NULL); + path_len - lexer.offset, NULL, -1); +} + +uint32_t +tuple_field_raw_multikey_items(struct tuple_format *format, const char *data, + const uint32_t *field_map, + struct key_def *key_def) +{ + assert(key_def_is_multikey(key_def)); + struct key_part *part = &key_def->parts[key_def->multikey_part_idx]; + const char *array_raw = + tuple_field_raw_by_path(format, data, field_map, part->fieldno, + part->path, key_def->multikey_path_len, + NULL, -1); + if (array_raw == NULL) + return 0; + assert(mp_typeof(*array_raw) == MP_ARRAY); + return mp_decode_array(&array_raw); } /* }}} tuple_field_* getters */ diff --git a/src/box/tuple.h b/src/box/tuple.h index 8b12fd5a8..060e4a076 100644 --- a/src/box/tuple.h +++ b/src/box/tuple.h @@ -508,14 +508,51 @@ tuple_field_count(const struct tuple *tuple) * with NULL. * @param path The path to process. * @param path_len The length of the @path. + * @param multikey_idx The multikey index hint - index of + * document to retrieve when array index + * placeholder "[*]" is met. * @retval 0 On success. * @retval -1 In case of error in JSON path. */ int -tuple_go_to_path(const char **data, const char *path, uint32_t path_len); +tuple_go_to_path(const char **data, const char *path, uint32_t path_len, + int multikey_idx); /** - * Get tuple field by field index and relative JSON path. + * Get tuple field by field_map, offset_slot and multikey_idx. + * @param tuple MessagePack tuple's body. + * @param field_map Tuple field map. + * @param offset_slot The field_map slot with field to get offset. + * @param multikey_idx The multikey index hint - index of + * document to retrieve when array index + * placeholder "[*]" is met. + * @retval Not NULL field pointer on success. + * @retval NULL otherwise, when field is absents. + */ +static const char * +tuple_field_raw_by_offset_slot(const char *tuple, const uint32_t *field_map, + int32_t offset_slot, int multikey_idx) +{ + assert(offset_slot != TUPLE_OFFSET_SLOT_NIL); + uint32_t offset; + if (multikey_idx >= 0) { + assert(field_map[offset_slot] != 0); + struct field_map_ext *extent = + field_map_ext_get(field_map, offset_slot); + if ((uint32_t)multikey_idx >= extent->items) + return NULL; + offset = extent->offset[multikey_idx]; + } else { + offset = field_map[offset_slot]; + } + if (offset == 0) + return NULL; + return tuple + offset; +} + +/** + * Get tuple field by field index, relative JSON path and + * multikey_idx. * @param format Tuple format. * @param tuple MessagePack tuple's body. * @param field_map Tuple field map. @@ -528,12 +565,15 @@ tuple_go_to_path(const char **data, const char *path, uint32_t path_len); * access data in a single operation. * Else it is initialized with offset_slot * of format field by path. + * @param multikey_idx The multikey index hint - index of + * document to retrieve when array index + * placeholder "[*]" is met. */ static inline const char * tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, const uint32_t *field_map, uint32_t fieldno, const char *path, uint32_t path_len, - int32_t *offset_slot_hint) + int32_t *offset_slot_hint, int multikey_idx) { int32_t offset_slot; if (offset_slot_hint != NULL && @@ -558,10 +598,8 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, if (offset_slot_hint != NULL) *offset_slot_hint = offset_slot; offset_slot_access: - /* Indexed field */ - if (field_map[offset_slot] == 0) - return NULL; - tuple += field_map[offset_slot]; + tuple = tuple_field_raw_by_offset_slot(tuple, field_map, + offset_slot, multikey_idx); } else { uint32_t field_count; parse: @@ -572,8 +610,8 @@ parse: for (uint32_t k = 0; k < fieldno; k++) mp_next(&tuple); if (path != NULL && - unlikely(tuple_go_to_path(&tuple, path, - path_len) != 0)) + unlikely(tuple_go_to_path(&tuple, path, path_len, + multikey_idx) != 0)) return NULL; } return tuple; @@ -595,7 +633,7 @@ tuple_field_raw(struct tuple_format *format, const char *tuple, const uint32_t *field_map, uint32_t field_no) { return tuple_field_raw_by_path(format, tuple, field_map, field_no, - NULL, 0, NULL); + NULL, 0, NULL, -1); } /** @@ -634,16 +672,18 @@ tuple_field_raw_by_full_path(struct tuple_format *format, const char *tuple, uint32_t path_len, uint32_t path_hash); /** - * Get a tuple field pointed to by an index part. + * Get a tuple field pointed to by an index part and multikey hint. * @param format Tuple format. * @param tuple A pointer to MessagePack array. * @param field_map A pointer to the LAST element of field map. + * @param multikey_idx A multikey hint. * @param part Index part to use. * @retval Field data if the field exists or NULL. */ static inline const char * -tuple_field_raw_by_part(struct tuple_format *format, const char *data, - const uint32_t *field_map, struct key_part *part) +tuple_field_raw_by_part_multikey(struct tuple_format *format, const char *data, + const uint32_t *field_map, + struct key_part *part, int multikey_idx) { if (unlikely(part->format_epoch != format->epoch)) { assert(format->epoch != 0); @@ -656,7 +696,23 @@ tuple_field_raw_by_part(struct tuple_format *format, const char *data, } return tuple_field_raw_by_path(format, data, field_map, part->fieldno, part->path, part->path_len, - &part->offset_slot_cache); + &part->offset_slot_cache, multikey_idx); +} + +/** + * Get a tuple field pointed to by an index part. + * @param format Tuple format. + * @param tuple A pointer to MessagePack array. + * @param field_map A pointer to the LAST element of field map. + * @param part Index part to use. + * @retval Field data if the field exists or NULL. + */ +static inline const char * +tuple_field_raw_by_part(struct tuple_format *format, const char *data, + const uint32_t *field_map, struct key_part *part) +{ + return tuple_field_raw_by_part_multikey(format, data, field_map, + part, -1); } /** @@ -672,6 +728,33 @@ tuple_field_by_part(const struct tuple *tuple, struct key_part *part) tuple_field_map(tuple), part); } +/** + * Get count of documents in the multikey index. + * @param format Tuple format. + * @param data A pointer to MessagePack array. + * @param field_map A pointer to the LAST element of field map. + * @param key_def Index key_definition. + * @retval Count of documents in the multikey index. + */ +uint32_t +tuple_field_raw_multikey_items(struct tuple_format *format, const char *data, + const uint32_t *field_map, + struct key_def *key_def); + +/** + * Get count of documents in the multikey index. + * @param tuple Tuple to get the count of documents from. + * @param key_def Index key_definition. + * @retval Count of documents in the multikey index. + */ +static inline uint32_t +tuple_field_multikey_items(struct tuple *tuple, struct key_def *key_def) +{ + return tuple_field_raw_multikey_items(tuple_format(tuple), + tuple_data(tuple), + tuple_field_map(tuple), key_def); +} + /** * @brief Tuple Interator */ diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc index 508aff04c..8896390e3 100644 --- a/src/box/tuple_compare.cc +++ b/src/box/tuple_compare.cc @@ -433,7 +433,8 @@ tuple_compare_field_with_hint(const char *field_a, enum mp_type a_type, } } -template<bool is_nullable, bool has_optional_parts, bool has_json_paths> +template<bool is_nullable, bool has_optional_parts, bool has_json_paths, + bool is_multikey> static inline int tuple_compare_slowpath_hinted(const struct tuple *tuple_a, hint_t tuple_a_hint, const struct tuple *tuple_b, @@ -443,8 +444,10 @@ tuple_compare_slowpath_hinted(const struct tuple *tuple_a, assert(!has_optional_parts || is_nullable); assert(is_nullable == key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); + assert(key_def_is_multikey(key_def) == is_multikey); + assert(!is_multikey || is_multikey == has_json_paths); int rc = 0; - if ((rc = hint_cmp(tuple_a_hint, tuple_b_hint)) != 0) + if (!is_multikey && (rc = hint_cmp(tuple_a_hint, tuple_b_hint)) != 0) return rc; struct key_part *part = key_def->parts; const char *tuple_a_raw = tuple_data(tuple_a); @@ -487,7 +490,14 @@ tuple_compare_slowpath_hinted(const struct tuple *tuple_a, end = part + key_def->part_count; for (; part < end; part++) { - if (has_json_paths) { + if (is_multikey) { + field_a = tuple_field_raw_by_part_multikey(format_a, + tuple_a_raw, field_map_a, part, + (int)tuple_a_hint); + field_b = tuple_field_raw_by_part_multikey(format_b, + tuple_b_raw, field_map_b, part, + (int)tuple_b_hint); + } else if (has_json_paths) { field_a = tuple_field_raw_by_part(format_a, tuple_a_raw, field_map_a, part); field_b = tuple_field_raw_by_part(format_b, tuple_b_raw, @@ -544,7 +554,14 @@ tuple_compare_slowpath_hinted(const struct tuple *tuple_a, */ end = key_def->parts + key_def->part_count; for (; part < end; ++part) { - if (has_json_paths) { + if (is_multikey) { + field_a = tuple_field_raw_by_part_multikey(format_a, + tuple_a_raw, field_map_a, part, + (int)tuple_a_hint); + field_b = tuple_field_raw_by_part_multikey(format_b, + tuple_b_raw, field_map_b, part, + (int)tuple_b_hint); + } else if (has_json_paths) { field_a = tuple_field_raw_by_part(format_a, tuple_a_raw, field_map_a, part); field_b = tuple_field_raw_by_part(format_b, tuple_b_raw, @@ -574,24 +591,28 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, struct key_def *key_def) { return tuple_compare_slowpath_hinted<is_nullable, has_optional_parts, - has_json_paths>(tuple_a, HINT_NONE, tuple_b, HINT_NONE, - key_def); + has_json_paths, false>(tuple_a, HINT_NONE, tuple_b, + HINT_NONE, key_def); } -template<bool is_nullable, bool has_optional_parts, bool has_json_paths> +template<bool is_nullable, bool has_optional_parts, bool has_json_paths, + bool is_multikey> static inline int tuple_compare_with_key_slowpath_hinted(const struct tuple *tuple, hint_t tuple_hint, const char *key, uint32_t part_count, hint_t key_hint, struct key_def *key_def) { + (void)key_hint; assert(has_json_paths == key_def->has_json_paths); assert(!has_optional_parts || is_nullable); assert(is_nullable == key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); assert(key != NULL || part_count == 0); assert(part_count <= key_def->part_count); + assert(key_def_is_multikey(key_def) == is_multikey); + assert(!is_multikey || is_multikey == has_json_paths); int rc = 0; - if ((rc = hint_cmp(tuple_hint, key_hint)) != 0) + if (!is_multikey && (rc = hint_cmp(tuple_hint, key_hint)) != 0) return rc; struct key_part *part = key_def->parts; struct tuple_format *format = tuple_format(tuple); @@ -600,7 +621,11 @@ tuple_compare_with_key_slowpath_hinted(const struct tuple *tuple, enum mp_type a_type, b_type; if (likely(part_count == 1)) { const char *field; - if (has_json_paths) { + if (is_multikey) { + field = tuple_field_raw_by_part_multikey(format, + tuple_raw, field_map, part, + (int)tuple_hint); + } else if (has_json_paths) { field = tuple_field_raw_by_part(format, tuple_raw, field_map, part); } else { @@ -630,7 +655,11 @@ tuple_compare_with_key_slowpath_hinted(const struct tuple *tuple, struct key_part *end = part + part_count; for (; part < end; ++part, mp_next(&key)) { const char *field; - if (has_json_paths) { + if (is_multikey) { + field = tuple_field_raw_by_part_multikey(format, + tuple_raw, field_map, part, + (int)tuple_hint); + } else if (has_json_paths) { field = tuple_field_raw_by_part(format, tuple_raw, field_map, part); } else { @@ -672,7 +701,7 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, uint32_t part_count, struct key_def *key_def) { return tuple_compare_with_key_slowpath_hinted<is_nullable, - has_optional_parts, has_json_paths>(tuple, + has_optional_parts, has_json_paths, false>(tuple, HINT_NONE, key, part_count, HINT_NONE, key_def); } @@ -1301,16 +1330,22 @@ static const comparator_with_key_signature precompiled_cmp_wk_arr[] = { #define COMPARE_SLOWPATH(...) \ {tuple_compare_slowpath<__VA_ARGS__>, \ tuple_compare_with_key_slowpath<__VA_ARGS__>, \ - tuple_compare_slowpath_hinted<__VA_ARGS__>, \ - tuple_compare_with_key_slowpath_hinted<__VA_ARGS__>, \ - __VA_ARGS__, false} + tuple_compare_slowpath_hinted<__VA_ARGS__, false>, \ + tuple_compare_with_key_slowpath_hinted<__VA_ARGS__, false>, \ + __VA_ARGS__, false, false} + +#define COMPARE_SLOWPATH_MULTIKEY(...) \ + {NULL, NULL, \ + tuple_compare_slowpath_hinted<__VA_ARGS__, true>, \ + tuple_compare_with_key_slowpath_hinted<__VA_ARGS__, true>, \ + __VA_ARGS__, false, true} #define COMPARE_SEQUENTIAL(...) \ {tuple_compare_sequential<__VA_ARGS__>, \ tuple_compare_with_key_sequential<__VA_ARGS__>, \ tuple_compare_sequential_hinted<__VA_ARGS__>, \ tuple_compare_with_key_sequential_hinted<__VA_ARGS__>, \ - __VA_ARGS__, false, true} + __VA_ARGS__, false, true, false} static struct { tuple_compare_t tuple_compare; @@ -1321,6 +1356,7 @@ static struct { bool has_optional_parts; bool has_json_paths; bool is_sequential; + bool is_multikey; } cmp_arr[] = { COMPARE_SLOWPATH(false, false, false), COMPARE_SLOWPATH(true, false, false), @@ -1334,10 +1370,19 @@ static struct { COMPARE_SEQUENTIAL(true, false), COMPARE_SEQUENTIAL(false, true), COMPARE_SEQUENTIAL(true, true), + COMPARE_SLOWPATH_MULTIKEY(false, false, false), + COMPARE_SLOWPATH_MULTIKEY(true, false, false), + COMPARE_SLOWPATH_MULTIKEY(false, true, false), + COMPARE_SLOWPATH_MULTIKEY(true, true, false), + COMPARE_SLOWPATH_MULTIKEY(false, false, true), + COMPARE_SLOWPATH_MULTIKEY(true, false, true), + COMPARE_SLOWPATH_MULTIKEY(false, true, true), + COMPARE_SLOWPATH_MULTIKEY(true, true, true), }; #undef COMPARE_SLOWPATH #undef COMPARE_SEQUENTIAL +#undef COMPARE_SLOWPATH_MULTIKEY /* }}} tuple_compare_with_key */ @@ -1571,8 +1616,10 @@ key_def_set_compare_func(struct key_def *def) def->tuple_compare_with_key_hinted = NULL; def->key_hint = NULL; def->tuple_hint = NULL; + bool is_multikey = key_def_is_multikey(def); if (!def->is_nullable && !key_def_has_collation(def) && !def->has_json_paths) { + assert(!is_multikey); /* Precalculated comparators don't use collation */ for (uint32_t k = 0; k < sizeof(precompiled_cmp_arr) / sizeof(precompiled_cmp_arr[0]); k++) { @@ -1611,7 +1658,8 @@ key_def_set_compare_func(struct key_def *def) } } } - if (def->tuple_compare == NULL || def->tuple_compare_with_key == NULL) { + if (def->tuple_compare_hinted == NULL || + def->tuple_compare_with_key_hinted == NULL) { for (uint32_t k = 0; k < sizeof(cmp_arr) / sizeof(cmp_arr[0]); k++) { if (def->is_nullable == cmp_arr[k].is_nullable && @@ -1619,7 +1667,8 @@ key_def_set_compare_func(struct key_def *def) cmp_arr[k].has_optional_parts && def->has_json_paths == cmp_arr[k].has_json_paths && key_def_is_sequential(def) == - cmp_arr[k].is_sequential) { + cmp_arr[k].is_sequential && + is_multikey == cmp_arr[k].is_multikey) { if (def->tuple_compare == NULL) { def->tuple_compare = cmp_arr[k].tuple_compare; @@ -1638,8 +1687,8 @@ key_def_set_compare_func(struct key_def *def) } } } - assert(def->tuple_compare != NULL && - def->tuple_compare_with_key != NULL); + assert(def->tuple_compare_hinted != NULL && + def->tuple_compare_with_key_hinted != NULL); for (uint32_t k = 0; k < sizeof(hint_arr) / sizeof(hint_arr[0]); k++) { if (def->parts->type == hint_arr[k].type && key_part_is_nullable(def->parts) == diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc index 0a8337102..00bcfacdd 100644 --- a/src/box/tuple_extract_key.cc +++ b/src/box/tuple_extract_key.cc @@ -310,7 +310,7 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, const char *src_end = field_end; if (has_json_paths && part->path != NULL) { if (tuple_go_to_path(&src, part->path, - part->path_len) != 0) { + part->path_len, -1) != 0) { /* * The path must be correct as * it has already been validated diff --git a/src/box/tuple_field_map.c b/src/box/tuple_field_map.c index ea77745aa..e76068619 100644 --- a/src/box/tuple_field_map.c +++ b/src/box/tuple_field_map.c @@ -36,6 +36,8 @@ int field_map_builder_create(struct field_map_builder *builder, uint32_t field_map_size, struct region *region) { + builder->region = region; + builder->extents_size = 0; builder->item_count = field_map_size / sizeof(uint32_t); if (field_map_size == 0) { builder->items = NULL; @@ -51,3 +53,28 @@ field_map_builder_create(struct field_map_builder *builder, builder->items = (field_map_builder_item *)((char *)builder->items + sz); return 0; } + +struct field_map_ext * +field_map_builder_ext_get(struct field_map_builder *builder, int32_t offset_slot, + uint32_t extent_items) +{ + struct field_map_ext *extent; + if (builder->items[offset_slot].has_extent) { + extent = builder->items[offset_slot].extent; + assert(extent != NULL); + assert(extent->items == extent_items); + return extent; + } + uint32_t sz = field_map_ext_size(extent_items); + extent = region_alloc(builder->region, sz); + if (extent == NULL) { + diag_set(OutOfMemory, sz, "region", "extent"); + return NULL; + } + memset(extent, 0, sz); + extent->items = extent_items; + builder->items[offset_slot].extent = extent; + builder->items[offset_slot].has_extent = true; + builder->extents_size += sz; + return extent; +} diff --git a/src/box/tuple_field_map.h b/src/box/tuple_field_map.h index 3a9927faa..042209a80 100644 --- a/src/box/tuple_field_map.h +++ b/src/box/tuple_field_map.h @@ -31,11 +31,48 @@ * SUCH DAMAGE. */ #include <stdint.h> +#include <stdlib.h> +#include <stdbool.h> struct region; +/** The field_map extent. */ +struct field_map_ext { + /** Count of field_map_ext::offset[] items. */ + uint32_t items; + /** Data offset in tuple array. */ + uint32_t offset[0]; +}; + +/** Return field_map extention allocation size. */ +static inline uint32_t +field_map_ext_size(uint32_t items) +{ + return sizeof(struct field_map_ext) + items * sizeof(uint32_t); +} + +/** Return field_map extention for field_map and offset_slot. */ +static inline struct field_map_ext * +field_map_ext_get(const uint32_t *field_map, int32_t offset_slot) +{ + return (struct field_map_ext *)((char *)field_map - + field_map[offset_slot]); +} + /** Preliminary field_map atom. */ -typedef uint32_t field_map_builder_item; +typedef struct { + /** + * True when this slot must be interpret as + * extention pointer. + */ + bool has_extent; + union { + /** Data offset in tuple. */ + uint32_t offset; + /** Pointer to field_map_ext extention. */ + struct field_map_ext *extent; + }; +} field_map_builder_item; /** A class that contains a preliminary view of the field_map. */ struct field_map_builder { @@ -49,8 +86,20 @@ struct field_map_builder { * count of field_map_builder::items. */ uint32_t item_count; + /** + * Total size of memory is allocated for field_map + * extentions. + */ + uint32_t extents_size; + /** Region to use for allocations. */ + struct region *region; }; +/** Get or allocate field_map_ext for offset_slot. */ +struct field_map_ext * +field_map_builder_ext_get(struct field_map_builder *builder, + int32_t offset_slot, uint32_t extent_items); + /** * Initialize field_map_builder to prepare field_map of size * field_map_size. Use region for temporary allocations. @@ -65,22 +114,50 @@ field_map_builder_slot_set(struct field_map_builder *builder, int32_t offset_slot, uint32_t offset) { assert(offset_slot < 0); - builder->items[offset_slot] = offset; + builder->items[offset_slot].offset = offset; +} + +/** Initialize corresponding field_map_ext with offset value. */ +static inline int +field_map_builder_ext_slot_set(struct field_map_builder *builder, + int32_t offset_slot, int32_t extent_slot, + uint32_t extent_items, uint32_t offset) +{ + assert(offset_slot < 0 && extent_slot >= 0 && extent_items > 0); + struct field_map_ext *extent = + field_map_builder_ext_get(builder, offset_slot, extent_items); + if (extent == NULL) + return -1; + assert(extent->items == extent_items); + extent->offset[extent_slot] = offset; + return 0; } /** Return built field_map size. */ static inline uint32_t field_map_builder_size(struct field_map_builder *builder) { - return builder->item_count * sizeof(uint32_t); + return builder->item_count * sizeof(uint32_t) + builder->extents_size; } /** Write build field_map size in the preallocated buffer wptr. */ static inline void field_map_builder_build(struct field_map_builder *builder, char *wptr) { - uint32_t field_map_size = field_map_builder_size(builder); - memcpy(wptr, (char *)builder->items - field_map_size, field_map_size); + uint32_t *field_map = + (uint32_t *)(wptr + field_map_builder_size(builder)); + char *extent_wptr = wptr + builder->extents_size; + for (int32_t i = -1; i >= -(int32_t)builder->item_count; i--) { + if (!builder->items[i].has_extent) { + field_map[i] = builder->items[i].offset; + continue; + } + struct field_map_ext *extent = builder->items[i].extent; + uint32_t sz = field_map_ext_size(extent->items); + extent_wptr -= sz; + field_map[i] = (char *)field_map - (char *)extent_wptr; + memcpy(extent_wptr, extent, sz); + } } #endif /* TARANTOOL_BOX_FIELD_MAP_H_INCLUDED */ diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 1d765c7c7..60cca677b 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -30,6 +30,7 @@ */ #include "bit/bit.h" #include "fiber.h" +#include "tuple.h" #include "json/json.h" #include "tuple_format.h" #include "coll_id_cache.h" @@ -194,6 +195,50 @@ tuple_format_field_by_id(struct tuple_format *format, uint32_t id) return NULL; } +/** + * Check if child_field may be attached to parent_field, + * update the parent_field container type if required. + */ +static int +tuple_format_field_update_type(struct tuple_field *parent_field, + struct tuple_field *child_field) +{ + enum field_type expected_type = + child_field->token.type == JSON_TOKEN_STR ? + FIELD_TYPE_MAP : FIELD_TYPE_ARRAY; + if (child_field->token.type == JSON_TOKEN_ANY && + !json_token_is_multikey(&parent_field->token) && + !json_token_is_leaf(&parent_field->token)) { + assert(expected_type == FIELD_TYPE_ARRAY); + diag_set(ClientError, ER_INDEX_MULTIKEY_INVALID, + tt_sprintf("field %s is already refer to document by " + "identifier and cannot use array index " + "placeholder [*]", + tuple_field_path(parent_field))); + return -1; + } + if (json_token_is_multikey(&parent_field->token) && + child_field->token.type != JSON_TOKEN_ANY) { + assert(expected_type == FIELD_TYPE_ARRAY); + diag_set(ClientError, ER_INDEX_MULTIKEY_INVALID, + tt_sprintf("field %s is already used with array index " + "placeholder [*] and cannot refer to " + "document by identifier", + tuple_field_path(parent_field))); + return -1; + } + if (field_type1_contains_type2(parent_field->type, expected_type)) { + parent_field->type = expected_type; + } else { + diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH, + tuple_field_path(parent_field), + field_type_strs[parent_field->type], + field_type_strs[expected_type]); + return -1; + } + return 0; +} + /** * Given a field number and a path, add the corresponding field * to the tuple format, allocating intermediate fields if @@ -228,29 +273,16 @@ tuple_format_add_field(struct tuple_format *format, uint32_t fieldno, *path_pool += path_len; int rc = 0; - uint32_t token_count = 0; + uint32_t token_count = 0, multikey_token_count = 0; struct json_tree *tree = &format->fields; struct json_lexer lexer; json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE); while ((rc = json_lexer_next_token(&lexer, &field->token)) == 0 && field->token.type != JSON_TOKEN_END) { - if (field->token.type == JSON_TOKEN_ANY) { - diag_set(ClientError, ER_UNSUPPORTED, - "Tarantool", "multikey indexes"); - goto fail; - } - enum field_type expected_type = - field->token.type == JSON_TOKEN_STR ? - FIELD_TYPE_MAP : FIELD_TYPE_ARRAY; - if (field_type1_contains_type2(parent->type, expected_type)) { - parent->type = expected_type; - } else { - diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH, - tuple_field_path(parent), - field_type_strs[parent->type], - field_type_strs[expected_type]); + if (tuple_format_field_update_type(parent, field) != 0) goto fail; - } + if (field->token.type == JSON_TOKEN_ANY) + multikey_token_count++; struct tuple_field *next = json_tree_lookup_entry(tree, &parent->token, &field->token, @@ -268,6 +300,18 @@ tuple_format_add_field(struct tuple_format *format, uint32_t fieldno, if (field == NULL) goto fail; } + if (json_token_is_multikey(&parent->token) && + parent->offset_slot == TUPLE_OFFSET_SLOT_NIL) { + /** + * Allocate offset slot for array is used + * in multikey index. This is required to + * quickly extract its size. + * @see tuple_field_multikey_items() + */ + assert(parent->type == FIELD_TYPE_ARRAY); + *current_slot = *current_slot - 1; + parent->offset_slot = *current_slot; + } parent->is_key_part = true; parent = next; token_count++; @@ -280,6 +324,13 @@ tuple_format_add_field(struct tuple_format *format, uint32_t fieldno, assert(parent != NULL); /* Update tree depth information. */ format->fields_depth = MAX(format->fields_depth, token_count + 1); + if (multikey_token_count > 1) { + assert(path_len > 0); + diag_set(ClientError, ER_INDEX_MULTIKEY_INVALID, + tt_sprintf("no more than one array index placeholder " + "[*] is allowed in JSON path")); + goto fail; + } cleanup: tuple_field_delete(field); end: @@ -782,6 +833,7 @@ tuple_field_map_initialize(struct field_map_builder *builder, * validations and field map initialization. */ uint32_t frames_sz = format->fields_depth * sizeof(struct mp_frame); + struct mp_frame *mk_parent_frame = NULL; struct mp_frame *frames = region_alloc(region, frames_sz); if (frames == NULL) { diag_set(OutOfMemory, frames_sz, "region", "frames"); @@ -805,6 +857,10 @@ tuple_field_map_initialize(struct field_map_builder *builder, mp_stack_pop(&stack); if (mp_stack_is_empty(&stack)) goto end; + if (json_token_is_multikey(parent)) { + /* Leave multikey index branch. */ + mk_parent_frame = NULL; + } parent = parent->parent; } /* @@ -855,7 +911,16 @@ tuple_field_map_initialize(struct field_map_builder *builder, field_type_strs[field->type]); return -1; } - if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { + if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL && + mk_parent_frame != NULL) { + int multikey_idx = mk_parent_frame->curr; + uint32_t multikey_items = mk_parent_frame->count; + if (field_map_builder_ext_slot_set(builder, + field->offset_slot, multikey_idx, + multikey_items, + *pos - tuple) != 0) + return -1; + } else if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { field_map_builder_slot_set(builder, field->offset_slot, *pos - tuple); @@ -876,6 +941,10 @@ tuple_field_map_initialize(struct field_map_builder *builder, mp_decode_array(pos) : mp_decode_map(pos); mp_stack_push(&stack, type, size); + if (json_token_is_multikey(&field->token)) { + assert(type == MP_ARRAY); + mk_parent_frame = &frames[stack.used - 1]; + } parent = &field->token; } else { mp_next(pos); diff --git a/src/box/vinyl.c b/src/box/vinyl.c index 78f29c624..39ba76394 100644 --- a/src/box/vinyl.c +++ b/src/box/vinyl.c @@ -699,6 +699,12 @@ vinyl_space_check_index_def(struct space *space, struct index_def *index_def) return -1; } } + if (key_def_is_multikey(index_def->key_def)) { + diag_set(ClientError, ER_MODIFY_INDEX, + index_def->name, space_name(space), + "vinyl space index cannot be multikey"); + return -1; + } return 0; } diff --git a/test/box/misc.result b/test/box/misc.result index 9f0b2c73a..c0afc7e64 100644 --- a/test/box/misc.result +++ b/test/box/misc.result @@ -515,6 +515,7 @@ t; 184: box.error.SQL_UNRECOGNIZED_SYNTAX 185: box.error.SQL_UNKNOWN_TOKEN 186: box.error.SQL_PARSER_GENERIC + 187: box.error.INDEX_MULTIKEY_INVALID ... test_run:cmd("setopt delimiter ''"); --- diff --git a/test/engine/json.result b/test/engine/json.result index a790cdfbc..1bac85edd 100644 --- a/test/engine/json.result +++ b/test/engine/json.result @@ -683,16 +683,3 @@ town:select() s:drop() --- ... --- --- gh-1260: Multikey indexes --- -s = box.schema.space.create('withdata') ---- -... -idx = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) ---- -- error: Tarantool does not support multikey indexes -... -s:drop() ---- -... diff --git a/test/engine/json.test.lua b/test/engine/json.test.lua index f9a7180b1..9afa3daa2 100644 --- a/test/engine/json.test.lua +++ b/test/engine/json.test.lua @@ -192,10 +192,3 @@ town:select() name:drop() town:select() s:drop() - --- --- gh-1260: Multikey indexes --- -s = box.schema.space.create('withdata') -idx = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) -s:drop() diff --git a/test/engine/multikey_idx.result b/test/engine/multikey_idx.result new file mode 100644 index 000000000..f41ddafa8 --- /dev/null +++ b/test/engine/multikey_idx.result @@ -0,0 +1,228 @@ +test_run = require('test_run').new() +--- +... +engine = test_run:get_cfg('engine') +--- +... +-- +-- gh-1260: Multikey indexes +-- +s = box.schema.space.create('withdata', {engine = 'vinyl'}) +--- +... +-- Vinyl's space can't be multikey (yet). +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +--- +- error: 'Can''t create or modify index ''idx'' in space ''withdata'': primary key + cannot be multikey' +... +s:drop() +--- +... +s = box.schema.space.create('withdata', {engine = 'memtx'}) +--- +... +-- Primary index must be unique so it can't be multikey. +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}}}) +--- +- error: 'Can''t create or modify index ''idx'' in space ''withdata'': primary key + cannot be multikey' +... +pk = s:create_index('pk') +--- +... +-- Only tree index type may be mutlikey. +_ = s:create_index('idx', {type = 'hash', unique = true, parts = {{3, 'str', path = '[*].fname'}}}) +--- +- error: 'Can''t create or modify index ''idx'' in space ''withdata'': HASH index + cannot be multikey' +... +_ = s:create_index('idx', {type = 'bitset', unique = false, parts = {{3, 'str', path = '[*].fname'}}}) +--- +- error: 'Can''t create or modify index ''idx'' in space ''withdata'': BITSET index + cannot be multikey' +... +_ = s:create_index('idx', {type = 'rtree', unique = false, parts = {{3, 'array', path = '[*].fname'}}}) +--- +- error: 'Can''t create or modify index ''idx'' in space ''withdata'': RTREE index + cannot be multikey' +... +-- Test incompatible multikey index parts. +_ = s:create_index('idx3', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '["data"][*].sname'}}}) +--- +- error: 'Can''t create or modify index ''idx3'' in space ''withdata'': multikey index + parts are incompatible' +... +_ = s:create_index('idx2', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname[*].a'}}}) +--- +- error: 'Multikey index is invalid: no more than one array index placeholder [*] + is allowed in JSON path' +... +idx0 = s:create_index('idx0', {parts = {{3, 'str', path = '[1].fname'}}}) +--- +... +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +--- +- error: 'Multikey index is invalid: field 3 is already refer to document by identifier + and cannot use array index placeholder [*]' +... +idx0:drop() +--- +... +-- Unique multikey index. +idx = s:create_index('idx', {unique = true, parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +--- +... +_ = s:create_index('idx2', {parts = {{3, 'str', path = '[1].fname'}, {3, 'str', path = '[1].sname'}}}) +--- +- error: 'Multikey index is invalid: field 3 is already used with array index placeholder + [*] and cannot refer to document by identifier' +... +s:insert({1, {1, 2, 3}, {{fname='James', sname='Bond'}, {fname='Vasya', sname='Pupkin'}}}) +--- +- [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +s:insert({2, {3, 4, 5}, {{fname='Ivan', sname='Ivanych'}}}) +--- +- [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] +... +_ = s:create_index('arr_idx', {unique = true, parts = {{2, 'unsigned', path = '[*]'}}}) +--- +- error: Duplicate key exists in unique index 'arr_idx' in space 'withdata' +... +-- Non-unique multikey index; two multikey indexes per space. +arr_idx = s:create_index('arr_idx', {unique = false, parts = {{2, 'unsigned', path = '[*]'}}}) +--- +... +arr_idx:select() +--- +- - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] +... +idx:get({'James', 'Bond'}) +--- +- [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +idx:get({'Ivan', 'Ivanych'}) +--- +- [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] +... +idx:get({'Vasya', 'Pupkin'}) +--- +- [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +idx:select() +--- +- - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +s:insert({3, {1, 2}, {{fname='Vasya', sname='Pupkin'}}}) +--- +- error: Duplicate key exists in unique index 'idx' in space 'withdata' +... +s:insert({4, {1}, {{fname='James', sname='Bond'}}}) +--- +- error: Duplicate key exists in unique index 'idx' in space 'withdata' +... +idx:select() +--- +- - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +-- Duplicates in multikey parts. +s:insert({5, {1, 1, 1}, {{fname='A', sname='B'}, {fname='C', sname='D'}, {fname='A', sname='B'}}}) +--- +- error: Duplicate key exists in unique index 'idx' in space 'withdata' +... +idx:get({'A', 'B'}) +--- +... +idx:get({'C', 'D'}) +--- +... +idx:delete({'Vasya', 'Pupkin'}) +--- +- [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +s:insert({6, {1, 2}, {{fname='Vasya', sname='Pupkin'}}}) +--- +- [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +s:insert({7, {1}, {{fname='James', sname='Bond'}}}) +--- +- [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] +... +arr_idx:select({1}) +--- +- - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] +... +idx:select() +--- +- - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] + - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +-- Snapshot & recovery. +box.snapshot() +--- +- ok +... +test_run:cmd("restart server default") +s = box.space["withdata"] +--- +... +idx = s.index["idx"] +--- +... +arr_idx = s.index["arr_idx"] +--- +... +s:select() +--- +- - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] +... +idx:select() +--- +- - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] + - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +arr_idx:select() +--- +- - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] + - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] +... +s:drop() +--- +... +-- Assymetric multikey index paths. +s = box.schema.space.create('withdata') +--- +... +pk = s:create_index('pk') +--- +... +idx = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].extra.sname', is_nullable = true}}}) +--- +... +s:insert({1, 1, {{fname='A1', extra={sname='A2'}}, {fname='B1'}, {fname='C1', extra={sname='C2'}}}}) +--- +- [1, 1, [{'fname': 'A1', 'extra': {'sname': 'A2'}}, {'fname': 'B1'}, {'fname': 'C1', + 'extra': {'sname': 'C2'}}]] +... +s:drop() +--- +... diff --git a/test/engine/multikey_idx.test.lua b/test/engine/multikey_idx.test.lua new file mode 100644 index 000000000..2b25063cf --- /dev/null +++ b/test/engine/multikey_idx.test.lua @@ -0,0 +1,67 @@ +test_run = require('test_run').new() +engine = test_run:get_cfg('engine') + +-- +-- gh-1260: Multikey indexes +-- +s = box.schema.space.create('withdata', {engine = 'vinyl'}) +-- Vinyl's space can't be multikey (yet). +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +s:drop() + +s = box.schema.space.create('withdata', {engine = 'memtx'}) +-- Primary index must be unique so it can't be multikey. +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}}}) +pk = s:create_index('pk') +-- Only tree index type may be mutlikey. +_ = s:create_index('idx', {type = 'hash', unique = true, parts = {{3, 'str', path = '[*].fname'}}}) +_ = s:create_index('idx', {type = 'bitset', unique = false, parts = {{3, 'str', path = '[*].fname'}}}) +_ = s:create_index('idx', {type = 'rtree', unique = false, parts = {{3, 'array', path = '[*].fname'}}}) +-- Test incompatible multikey index parts. +_ = s:create_index('idx3', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '["data"][*].sname'}}}) +_ = s:create_index('idx2', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname[*].a'}}}) +idx0 = s:create_index('idx0', {parts = {{3, 'str', path = '[1].fname'}}}) +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +idx0:drop() +-- Unique multikey index. +idx = s:create_index('idx', {unique = true, parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +_ = s:create_index('idx2', {parts = {{3, 'str', path = '[1].fname'}, {3, 'str', path = '[1].sname'}}}) +s:insert({1, {1, 2, 3}, {{fname='James', sname='Bond'}, {fname='Vasya', sname='Pupkin'}}}) +s:insert({2, {3, 4, 5}, {{fname='Ivan', sname='Ivanych'}}}) +_ = s:create_index('arr_idx', {unique = true, parts = {{2, 'unsigned', path = '[*]'}}}) +-- Non-unique multikey index; two multikey indexes per space. +arr_idx = s:create_index('arr_idx', {unique = false, parts = {{2, 'unsigned', path = '[*]'}}}) +arr_idx:select() +idx:get({'James', 'Bond'}) +idx:get({'Ivan', 'Ivanych'}) +idx:get({'Vasya', 'Pupkin'}) +idx:select() +s:insert({3, {1, 2}, {{fname='Vasya', sname='Pupkin'}}}) +s:insert({4, {1}, {{fname='James', sname='Bond'}}}) +idx:select() +-- Duplicates in multikey parts. +s:insert({5, {1, 1, 1}, {{fname='A', sname='B'}, {fname='C', sname='D'}, {fname='A', sname='B'}}}) +idx:get({'A', 'B'}) +idx:get({'C', 'D'}) +idx:delete({'Vasya', 'Pupkin'}) +s:insert({6, {1, 2}, {{fname='Vasya', sname='Pupkin'}}}) +s:insert({7, {1}, {{fname='James', sname='Bond'}}}) +arr_idx:select({1}) +idx:select() +-- Snapshot & recovery. +box.snapshot() +test_run:cmd("restart server default") +s = box.space["withdata"] +idx = s.index["idx"] +arr_idx = s.index["arr_idx"] +s:select() +idx:select() +arr_idx:select() +s:drop() + +-- Assymetric multikey index paths. +s = box.schema.space.create('withdata') +pk = s:create_index('pk') +idx = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].extra.sname', is_nullable = true}}}) +s:insert({1, 1, {{fname='A1', extra={sname='A2'}}, {fname='B1'}, {fname='C1', extra={sname='C2'}}}}) +s:drop() -- 2.21.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [tarantool-patches] [PATCH v2 5/5] box: introduce multikey indexes in memtx 2019-03-19 12:32 ` [PATCH v2 5/5] box: introduce multikey indexes in memtx Kirill Shcherbatov @ 2019-03-19 16:05 ` Kirill Shcherbatov 2019-03-21 12:35 ` Kirill Shcherbatov 1 sibling, 0 replies; 13+ messages in thread From: Kirill Shcherbatov @ 2019-03-19 16:05 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev > #endif /* TARANTOOL_BOX_FIELD_MAP_H_INCLUDED */ > diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c > index 1d765c7c7..60cca677b 100644 > --- a/src/box/tuple_format.c > +++ b/src/box/tuple_format.c > @@ -30,6 +30,7 @@ > */ > #include "bit/bit.h" > #include "fiber.h" > +#include "tuple.h" This header is redundant here. Dropped on branch. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [tarantool-patches] [PATCH v2 5/5] box: introduce multikey indexes in memtx 2019-03-19 12:32 ` [PATCH v2 5/5] box: introduce multikey indexes in memtx Kirill Shcherbatov 2019-03-19 16:05 ` [tarantool-patches] " Kirill Shcherbatov @ 2019-03-21 12:35 ` Kirill Shcherbatov 1 sibling, 0 replies; 13+ messages in thread From: Kirill Shcherbatov @ 2019-03-21 12:35 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev I've rebased patchset on master branch with new hints. ================================================= Part of #1257 @TarantoolBot document Title: introduce multikey indexes in memtx Multikey indexes allows you to automatically index set of documents by JSON paths having array index placeholder "[*]". Multikey index cannot be primary as it cannot be unique(by definition). Multikey index parts must be compatible: only one "[*]" placeholder is allowed in same position(for all JSON paths in index parts). Example: s = box.schema.space.create('withdata') pk = s:create_index('pk') parts = { {2, 'str', path = 'data[*].name'}, {2, 'str', path = 'data[*].extra.phone'} } idx = s:create_index('idx', {parts = parts}) s:insert({1, {data = {{name="A", extra={phone="111"}}, {name="B", extra={phone="111"}}}, garbage = 1}} idx:get({'A', '111'}) --- src/box/errcode.h | 1 + src/box/index_def.c | 25 +++- src/box/key_def.c | 10 ++ src/box/key_def.h | 18 +++ src/box/memtx_space.c | 18 +++ src/box/memtx_tree.c | 185 ++++++++++++++++++++---- src/box/tuple.c | 28 +++- src/box/tuple.h | 111 +++++++++++++-- src/box/tuple_compare.cc | 84 ++++++++--- src/box/tuple_extract_key.cc | 2 +- src/box/tuple_field_map.c | 27 ++++ src/box/tuple_field_map.h | 87 +++++++++++- src/box/tuple_format.c | 104 +++++++++++--- src/box/vinyl.c | 6 + test/box/misc.result | 1 + test/engine/json.result | 13 -- test/engine/json.test.lua | 7 - test/engine/multikey_idx.result | 228 ++++++++++++++++++++++++++++++ test/engine/multikey_idx.test.lua | 67 +++++++++ 19 files changed, 909 insertions(+), 113 deletions(-) create mode 100644 test/engine/multikey_idx.result create mode 100644 test/engine/multikey_idx.test.lua diff --git a/src/box/errcode.h b/src/box/errcode.h index 7764aa352..c6f42463d 100644 --- a/src/box/errcode.h +++ b/src/box/errcode.h @@ -240,6 +240,7 @@ struct errcode_record { /*185 */_(ER_SQL_UNKNOWN_TOKEN, "Syntax error: unrecognized token: '%.*s'") \ /*186 */_(ER_SQL_PARSER_GENERIC, "%s") \ /*187 */_(ER_SQL_ANALYZE_ARGUMENT, "ANALYZE statement argument %s is not a base table") \ + /*188 */_(ER_INDEX_MULTIKEY_INVALID, "Multikey index is invalid: %s") \ /* * !IMPORTANT! Please follow instructions at start of the file diff --git a/src/box/index_def.c b/src/box/index_def.c index f0c7dc4f6..dfa539f86 100644 --- a/src/box/index_def.c +++ b/src/box/index_def.c @@ -268,9 +268,15 @@ index_def_is_valid(struct index_def *index_def, const char *space_name) space_name, "too many key parts"); return false; } + if (index_def->iid == 0 && key_def_is_multikey(index_def->key_def)) { + diag_set(ClientError, ER_MODIFY_INDEX, index_def->name, + space_name, "primary key cannot be multikey"); + return false; + } for (uint32_t i = 0; i < index_def->key_def->part_count; i++) { - assert(index_def->key_def->parts[i].type < field_type_MAX); - if (index_def->key_def->parts[i].fieldno > BOX_INDEX_FIELD_MAX) { + struct key_part *parts = index_def->key_def->parts; + assert(parts[i].type < field_type_MAX); + if (parts[i].fieldno > BOX_INDEX_FIELD_MAX) { diag_set(ClientError, ER_MODIFY_INDEX, index_def->name, space_name, "field no is too big"); return false; @@ -280,8 +286,8 @@ index_def_is_valid(struct index_def *index_def, const char *space_name) * Courtesy to a user who could have made * a typo. */ - struct key_part *part_a = &index_def->key_def->parts[i]; - struct key_part *part_b = &index_def->key_def->parts[j]; + struct key_part *part_a = &parts[i]; + struct key_part *part_b = &parts[j]; if (part_a->fieldno == part_b->fieldno && json_path_cmp(part_a->path, part_a->path_len, part_b->path, part_b->path_len, @@ -292,6 +298,17 @@ index_def_is_valid(struct index_def *index_def, const char *space_name) return false; } } + if (key_def_is_multikey(index_def->key_def) && + json_path_cmp(parts[i].path, parts[i].path_len, + parts[index_def->key_def-> + multikey_part_idx].path, + index_def->key_def->multikey_path_len, + true, TUPLE_INDEX_BASE) != 0) { + diag_set(ClientError, ER_MODIFY_INDEX, + index_def->name, space_name, + "multikey index parts are incompatible"); + return false; + } } return true; } diff --git a/src/box/key_def.c b/src/box/key_def.c index f9e464402..ffc1e1665 100644 --- a/src/box/key_def.c +++ b/src/box/key_def.c @@ -164,6 +164,13 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno, *path_pool += path_len; memcpy(def->parts[part_no].path, path, path_len); def->parts[part_no].path_len = path_len; + int multikey_path_len; + if (json_path_is_multikey(path, path_len, &multikey_path_len, + TUPLE_INDEX_BASE) && + def->multikey_path_len <= (uint32_t)multikey_path_len) { + def->multikey_part_idx = part_no; + def->multikey_path_len = (uint32_t)multikey_path_len; + } } else { def->parts[part_no].path = NULL; def->parts[part_no].path_len = 0; @@ -185,6 +192,7 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count) } def->part_count = part_count; + def->multikey_part_idx = part_count; def->unique_part_count = part_count; /* A pointer to the JSON paths data in the new key_def. */ @@ -253,6 +261,7 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count) } key_def->part_count = part_count; + key_def->multikey_part_idx = part_count; key_def->unique_part_count = part_count; for (uint32_t item = 0; item < part_count; ++item) { @@ -672,6 +681,7 @@ key_def_merge(const struct key_def *first, const struct key_def *second) return NULL; } new_def->part_count = new_part_count; + new_def->multikey_part_idx = new_part_count; new_def->unique_part_count = new_part_count; new_def->is_nullable = first->is_nullable || second->is_nullable; new_def->has_optional_parts = first->has_optional_parts || diff --git a/src/box/key_def.h b/src/box/key_def.h index 288cf7270..004794969 100644 --- a/src/box/key_def.h +++ b/src/box/key_def.h @@ -234,6 +234,18 @@ struct key_def { bool has_optional_parts; /** Key fields mask. @sa column_mask.h for details. */ uint64_t column_mask; + /** + * In case of multikey index, the index of the key_part + * containing JSON path with array index placeholder "[*]". + * Otherwise multikey_part_idx == part_count. + */ + uint32_t multikey_part_idx; + /** + * In case of multikey index, the length of the + * parts[multikey_part_idx].path substring "...[*]" + * @see json_path_is_multikey(). + */ + uint32_t multikey_path_len; /** The size of the 'parts' array. */ uint32_t part_count; /** Description of parts of a multipart index. */ @@ -472,6 +484,12 @@ key_def_is_sequential(const struct key_def *key_def) return true; } +static inline bool +key_def_is_multikey(const struct key_def *key_def) +{ + return key_def->multikey_part_idx < key_def->part_count; +} + /** * Return true if @a key_def defines has fields that requires * special collation comparison. diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c index 92ec0b300..9a5132979 100644 --- a/src/box/memtx_space.c +++ b/src/box/memtx_space.c @@ -638,6 +638,12 @@ memtx_space_check_index_def(struct space *space, struct index_def *index_def) "HASH index must be unique"); return -1; } + if (key_def_is_multikey(index_def->key_def)) { + diag_set(ClientError, ER_MODIFY_INDEX, + index_def->name, space_name(space), + "HASH index cannot be multikey"); + return -1; + } break; case TREE: /* TREE index has no limitations. */ @@ -661,6 +667,12 @@ memtx_space_check_index_def(struct space *space, struct index_def *index_def) "RTREE index field type must be ARRAY"); return -1; } + if (key_def_is_multikey(index_def->key_def)) { + diag_set(ClientError, ER_MODIFY_INDEX, + index_def->name, space_name(space), + "RTREE index cannot be multikey"); + return -1; + } /* no furter checks of parts needed */ return 0; case BITSET: @@ -683,6 +695,12 @@ memtx_space_check_index_def(struct space *space, struct index_def *index_def) "BITSET index field type must be NUM or STR"); return -1; } + if (key_def_is_multikey(index_def->key_def)) { + diag_set(ClientError, ER_MODIFY_INDEX, + index_def->name, space_name(space), + "BITSET index cannot be multikey"); + return -1; + } /* no furter checks of parts needed */ return 0; default: diff --git a/src/box/memtx_tree.c b/src/box/memtx_tree.c index fe037c54a..c9d3b2130 100644 --- a/src/box/memtx_tree.c +++ b/src/box/memtx_tree.c @@ -528,7 +528,8 @@ memtx_tree_index_get(struct index *base, const char *key, struct memtx_tree_key_data key_data; key_data.key = key; key_data.part_count = part_count; - key_data.hint = key_hint(key, part_count, cmp_def); + if (!key_def_is_multikey(cmp_def)) + key_data.hint = key_hint(key, part_count, cmp_def); struct memtx_tree_data *res = memtx_tree_find(&index->tree, &key_data); *result = res != NULL ? res->tuple : NULL; return 0; @@ -584,6 +585,79 @@ memtx_tree_index_replace(struct index *base, struct tuple *old_tuple, return 0; } +int +memtx_tree_index_replace_multikey(struct index *base, struct tuple *old_tuple, + struct tuple *new_tuple, + enum dup_replace_mode mode, + struct tuple **result) +{ + struct memtx_tree_index *index = (struct memtx_tree_index *)base; + struct key_def *cmp_def = memtx_tree_cmp_def(&index->tree); + if (new_tuple != NULL) { + int errcode = 0, tree_res = 0; + struct tuple *dup_tuple = NULL; + int multikey_idx = 0; + uint32_t items = tuple_field_multikey_items(new_tuple, cmp_def); + for (; (uint32_t)multikey_idx < items; multikey_idx++) { + struct memtx_tree_data new_data; + new_data.tuple = new_tuple; + new_data.hint = multikey_idx; + struct memtx_tree_data dup_data; + dup_data.tuple = NULL; + tree_res = memtx_tree_insert(&index->tree, new_data, + &dup_data); + if (tree_res != 0) { + diag_set(OutOfMemory, MEMTX_EXTENT_SIZE, + "memtx_tree_index", "replace"); + break; + } + dup_tuple = dup_data.tuple; + errcode = replace_check_dup(old_tuple, dup_tuple, mode); + if (errcode != 0) { + memtx_tree_delete(&index->tree, new_data); + if (dup_tuple != NULL) { + memtx_tree_insert(&index->tree, + dup_data, NULL); + } + struct space *sp = + space_cache_find(base->def->space_id); + if (sp != NULL) { + diag_set(ClientError, errcode, + base->def->name, + space_name(sp)); + } + break; + } + } + if (tree_res != 0 || errcode != 0) { + multikey_idx--; + for (; multikey_idx >= 0; multikey_idx--) { + struct memtx_tree_data data; + data.tuple = new_tuple; + data.hint = multikey_idx; + memtx_tree_delete(&index->tree, data); + } + return -1; + } + if (dup_tuple != NULL) { + *result = dup_tuple; + return 0; + } + } + if (old_tuple != NULL) { + uint32_t items = tuple_field_multikey_items(old_tuple, cmp_def); + for (uint32_t multikey_idx = 0; multikey_idx < items; + multikey_idx++) { + struct memtx_tree_data old_data; + old_data.tuple = old_tuple; + old_data.hint = multikey_idx; + memtx_tree_delete(&index->tree, old_data); + } + } + *result = old_tuple; + return 0; +} + static struct iterator * memtx_tree_index_create_iterator(struct index *base, enum iterator_type type, const char *key, uint32_t part_count) @@ -621,7 +695,8 @@ memtx_tree_index_create_iterator(struct index *base, enum iterator_type type, it->type = type; it->key_data.key = key; it->key_data.part_count = part_count; - it->key_data.hint = key_hint(key, part_count, cmp_def); + if (!key_def_is_multikey(cmp_def)) + it->key_data.hint = key_hint(key, part_count, cmp_def); it->index_def = base->def; it->tree = &index->tree; it->tree_iterator = memtx_tree_invalid_iterator(); @@ -656,34 +731,43 @@ memtx_tree_index_reserve(struct index *base, uint32_t size_hint) } static int -memtx_tree_index_build_next(struct index *base, struct tuple *tuple) +memtx_tree_index_build_array_realloc(struct memtx_tree_index *index, + uint32_t items) { - struct memtx_tree_index *index = (struct memtx_tree_index *)base; - struct key_def *cmp_def = memtx_tree_cmp_def(&index->tree); + bool need_realloc = false; if (index->build_array == NULL) { - index->build_array = malloc(MEMTX_EXTENT_SIZE); - if (index->build_array == NULL) { - diag_set(OutOfMemory, MEMTX_EXTENT_SIZE, - "memtx_tree_index", "build_next"); - return -1; - } index->build_array_alloc_size = MEMTX_EXTENT_SIZE / sizeof(index->build_array[0]); + need_realloc = true; } - assert(index->build_array_size <= index->build_array_alloc_size); - if (index->build_array_size == index->build_array_alloc_size) { - index->build_array_alloc_size = index->build_array_alloc_size + - index->build_array_alloc_size / 2; - struct memtx_tree_data *tmp = - realloc(index->build_array, - index->build_array_alloc_size * sizeof(*tmp)); - if (tmp == NULL) { - diag_set(OutOfMemory, index->build_array_alloc_size * - sizeof(*tmp), "memtx_tree_index", "build_next"); - return -1; - } - index->build_array = tmp; + uint32_t build_array_new_size = index->build_array_size + items; + if (build_array_new_size > index->build_array_alloc_size) { + index->build_array_alloc_size += + MAX(index->build_array_alloc_size / 2, + build_array_new_size - index->build_array_alloc_size); + need_realloc = true; + } + if (!need_realloc) + return 0; + struct memtx_tree_data *tmp = + realloc(index->build_array, + index->build_array_alloc_size * sizeof(*tmp)); + if (tmp == NULL) { + diag_set(OutOfMemory, index->build_array_alloc_size * + sizeof(*tmp), "memtx_tree_index", "build_next"); + return -1; } + index->build_array = tmp; + return 0; +} + +static int +memtx_tree_index_build_next(struct index *base, struct tuple *tuple) +{ + struct memtx_tree_index *index = (struct memtx_tree_index *)base; + struct key_def *cmp_def = memtx_tree_cmp_def(&index->tree); + if (memtx_tree_index_build_array_realloc(index, 1) != 0) + return -1; struct memtx_tree_data *elem = &index->build_array[index->build_array_size++]; elem->tuple = tuple; @@ -691,6 +775,24 @@ memtx_tree_index_build_next(struct index *base, struct tuple *tuple) return 0; } +static int +memtx_tree_index_build_next_multikey(struct index *base, struct tuple *tuple) +{ + struct memtx_tree_index *index = (struct memtx_tree_index *)base; + struct key_def *cmp_def = memtx_tree_cmp_def(&index->tree); + uint32_t items = tuple_field_multikey_items(tuple, cmp_def); + if (memtx_tree_index_build_array_realloc(index, items) != 0) + return -1; + for (uint32_t multikey_idx = 0; multikey_idx < items; multikey_idx++) { + struct memtx_tree_data *elem = + &index->build_array[index->build_array_size++]; + assert(index->build_array_size <= index->build_array_alloc_size); + elem->tuple = tuple; + elem->hint = multikey_idx; + } + return 0; +} + static void memtx_tree_index_end_build(struct index *base) { @@ -793,6 +895,36 @@ static const struct index_vtab memtx_tree_index_vtab = { /* .end_build = */ memtx_tree_index_end_build, }; +static const struct index_vtab memtx_tree_index_multikey_vtab = { + /* .destroy = */ memtx_tree_index_destroy, + /* .commit_create = */ generic_index_commit_create, + /* .abort_create = */ generic_index_abort_create, + /* .commit_modify = */ generic_index_commit_modify, + /* .commit_drop = */ generic_index_commit_drop, + /* .update_def = */ memtx_tree_index_update_def, + /* .depends_on_pk = */ memtx_tree_index_depends_on_pk, + /* .def_change_requires_rebuild = */ + memtx_index_def_change_requires_rebuild, + /* .size = */ memtx_tree_index_size, + /* .bsize = */ memtx_tree_index_bsize, + /* .min = */ generic_index_min, + /* .max = */ generic_index_max, + /* .random = */ memtx_tree_index_random, + /* .count = */ memtx_tree_index_count, + /* .get = */ memtx_tree_index_get, + /* .replace = */ memtx_tree_index_replace_multikey, + /* .create_iterator = */ memtx_tree_index_create_iterator, + /* .create_snapshot_iterator = */ + memtx_tree_index_create_snapshot_iterator, + /* .stat = */ generic_index_stat, + /* .compact = */ generic_index_compact, + /* .reset_stat = */ generic_index_reset_stat, + /* .begin_build = */ memtx_tree_index_begin_build, + /* .reserve = */ memtx_tree_index_reserve, + /* .build_next = */ memtx_tree_index_build_next_multikey, + /* .end_build = */ memtx_tree_index_end_build, +}; + struct index * memtx_tree_index_new(struct memtx_engine *memtx, struct index_def *def) { @@ -803,8 +935,11 @@ memtx_tree_index_new(struct memtx_engine *memtx, struct index_def *def) "malloc", "struct memtx_tree_index"); return NULL; } + const struct index_vtab *vtab = key_def_is_multikey(def->key_def) ? + &memtx_tree_index_multikey_vtab : + &memtx_tree_index_vtab; if (index_create(&index->base, (struct engine *)memtx, - &memtx_tree_index_vtab, def) != 0) { + vtab, def) != 0) { free(index); return NULL; } diff --git a/src/box/tuple.c b/src/box/tuple.c index 68f0670f2..985117957 100644 --- a/src/box/tuple.c +++ b/src/box/tuple.c @@ -129,8 +129,6 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple) struct field_map_builder builder; int rc = tuple_field_map_create(&builder, format, tuple, true); region_truncate(region, region_svp); - assert(rc != 0 || - field_map_builder_size(&builder) == format->field_map_size); return rc; } @@ -455,7 +453,8 @@ tuple_field_go_to_key(const char **field, const char *key, int len) } int -tuple_go_to_path(const char **data, const char *path, uint32_t path_len) +tuple_go_to_path(const char **data, const char *path, uint32_t path_len, + int multikey_idx) { int rc; struct json_lexer lexer; @@ -463,6 +462,10 @@ tuple_go_to_path(const char **data, const char *path, uint32_t path_len) json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE); while ((rc = json_lexer_next_token(&lexer, &token)) == 0) { switch (token.type) { + case JSON_TOKEN_ANY: + assert(multikey_idx >= 0); + token.num = multikey_idx; + FALLTHROUGH; case JSON_TOKEN_NUM: rc = tuple_field_go_to_index(data, token.num); break; @@ -532,7 +535,24 @@ tuple_field_raw_by_full_path(struct tuple_format *format, const char *tuple, } return tuple_field_raw_by_path(format, tuple, field_map, fieldno, path + lexer.offset, - path_len - lexer.offset, NULL); + path_len - lexer.offset, NULL, -1); +} + +uint32_t +tuple_field_raw_multikey_items(struct tuple_format *format, const char *data, + const uint32_t *field_map, + struct key_def *key_def) +{ + assert(key_def_is_multikey(key_def)); + struct key_part *part = &key_def->parts[key_def->multikey_part_idx]; + const char *array_raw = + tuple_field_raw_by_path(format, data, field_map, part->fieldno, + part->path, key_def->multikey_path_len, + NULL, -1); + if (array_raw == NULL) + return 0; + assert(mp_typeof(*array_raw) == MP_ARRAY); + return mp_decode_array(&array_raw); } /* }}} tuple_field_* getters */ diff --git a/src/box/tuple.h b/src/box/tuple.h index 8b12fd5a8..060e4a076 100644 --- a/src/box/tuple.h +++ b/src/box/tuple.h @@ -508,14 +508,51 @@ tuple_field_count(const struct tuple *tuple) * with NULL. * @param path The path to process. * @param path_len The length of the @path. + * @param multikey_idx The multikey index hint - index of + * document to retrieve when array index + * placeholder "[*]" is met. * @retval 0 On success. * @retval -1 In case of error in JSON path. */ int -tuple_go_to_path(const char **data, const char *path, uint32_t path_len); +tuple_go_to_path(const char **data, const char *path, uint32_t path_len, + int multikey_idx); /** - * Get tuple field by field index and relative JSON path. + * Get tuple field by field_map, offset_slot and multikey_idx. + * @param tuple MessagePack tuple's body. + * @param field_map Tuple field map. + * @param offset_slot The field_map slot with field to get offset. + * @param multikey_idx The multikey index hint - index of + * document to retrieve when array index + * placeholder "[*]" is met. + * @retval Not NULL field pointer on success. + * @retval NULL otherwise, when field is absents. + */ +static const char * +tuple_field_raw_by_offset_slot(const char *tuple, const uint32_t *field_map, + int32_t offset_slot, int multikey_idx) +{ + assert(offset_slot != TUPLE_OFFSET_SLOT_NIL); + uint32_t offset; + if (multikey_idx >= 0) { + assert(field_map[offset_slot] != 0); + struct field_map_ext *extent = + field_map_ext_get(field_map, offset_slot); + if ((uint32_t)multikey_idx >= extent->items) + return NULL; + offset = extent->offset[multikey_idx]; + } else { + offset = field_map[offset_slot]; + } + if (offset == 0) + return NULL; + return tuple + offset; +} + +/** + * Get tuple field by field index, relative JSON path and + * multikey_idx. * @param format Tuple format. * @param tuple MessagePack tuple's body. * @param field_map Tuple field map. @@ -528,12 +565,15 @@ tuple_go_to_path(const char **data, const char *path, uint32_t path_len); * access data in a single operation. * Else it is initialized with offset_slot * of format field by path. + * @param multikey_idx The multikey index hint - index of + * document to retrieve when array index + * placeholder "[*]" is met. */ static inline const char * tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, const uint32_t *field_map, uint32_t fieldno, const char *path, uint32_t path_len, - int32_t *offset_slot_hint) + int32_t *offset_slot_hint, int multikey_idx) { int32_t offset_slot; if (offset_slot_hint != NULL && @@ -558,10 +598,8 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, if (offset_slot_hint != NULL) *offset_slot_hint = offset_slot; offset_slot_access: - /* Indexed field */ - if (field_map[offset_slot] == 0) - return NULL; - tuple += field_map[offset_slot]; + tuple = tuple_field_raw_by_offset_slot(tuple, field_map, + offset_slot, multikey_idx); } else { uint32_t field_count; parse: @@ -572,8 +610,8 @@ parse: for (uint32_t k = 0; k < fieldno; k++) mp_next(&tuple); if (path != NULL && - unlikely(tuple_go_to_path(&tuple, path, - path_len) != 0)) + unlikely(tuple_go_to_path(&tuple, path, path_len, + multikey_idx) != 0)) return NULL; } return tuple; @@ -595,7 +633,7 @@ tuple_field_raw(struct tuple_format *format, const char *tuple, const uint32_t *field_map, uint32_t field_no) { return tuple_field_raw_by_path(format, tuple, field_map, field_no, - NULL, 0, NULL); + NULL, 0, NULL, -1); } /** @@ -634,16 +672,18 @@ tuple_field_raw_by_full_path(struct tuple_format *format, const char *tuple, uint32_t path_len, uint32_t path_hash); /** - * Get a tuple field pointed to by an index part. + * Get a tuple field pointed to by an index part and multikey hint. * @param format Tuple format. * @param tuple A pointer to MessagePack array. * @param field_map A pointer to the LAST element of field map. + * @param multikey_idx A multikey hint. * @param part Index part to use. * @retval Field data if the field exists or NULL. */ static inline const char * -tuple_field_raw_by_part(struct tuple_format *format, const char *data, - const uint32_t *field_map, struct key_part *part) +tuple_field_raw_by_part_multikey(struct tuple_format *format, const char *data, + const uint32_t *field_map, + struct key_part *part, int multikey_idx) { if (unlikely(part->format_epoch != format->epoch)) { assert(format->epoch != 0); @@ -656,7 +696,23 @@ tuple_field_raw_by_part(struct tuple_format *format, const char *data, } return tuple_field_raw_by_path(format, data, field_map, part->fieldno, part->path, part->path_len, - &part->offset_slot_cache); + &part->offset_slot_cache, multikey_idx); +} + +/** + * Get a tuple field pointed to by an index part. + * @param format Tuple format. + * @param tuple A pointer to MessagePack array. + * @param field_map A pointer to the LAST element of field map. + * @param part Index part to use. + * @retval Field data if the field exists or NULL. + */ +static inline const char * +tuple_field_raw_by_part(struct tuple_format *format, const char *data, + const uint32_t *field_map, struct key_part *part) +{ + return tuple_field_raw_by_part_multikey(format, data, field_map, + part, -1); } /** @@ -672,6 +728,33 @@ tuple_field_by_part(const struct tuple *tuple, struct key_part *part) tuple_field_map(tuple), part); } +/** + * Get count of documents in the multikey index. + * @param format Tuple format. + * @param data A pointer to MessagePack array. + * @param field_map A pointer to the LAST element of field map. + * @param key_def Index key_definition. + * @retval Count of documents in the multikey index. + */ +uint32_t +tuple_field_raw_multikey_items(struct tuple_format *format, const char *data, + const uint32_t *field_map, + struct key_def *key_def); + +/** + * Get count of documents in the multikey index. + * @param tuple Tuple to get the count of documents from. + * @param key_def Index key_definition. + * @retval Count of documents in the multikey index. + */ +static inline uint32_t +tuple_field_multikey_items(struct tuple *tuple, struct key_def *key_def) +{ + return tuple_field_raw_multikey_items(tuple_format(tuple), + tuple_data(tuple), + tuple_field_map(tuple), key_def); +} + /** * @brief Tuple Interator */ diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc index 93756365b..14c630af5 100644 --- a/src/box/tuple_compare.cc +++ b/src/box/tuple_compare.cc @@ -445,7 +445,8 @@ tuple_compare_field_with_type(const char *field_a, enum mp_type a_type, } } -template<bool is_nullable, bool has_optional_parts, bool has_json_paths> +template<bool is_nullable, bool has_optional_parts, bool has_json_paths, + bool is_multikey> static inline int tuple_compare_slowpath_hinted(const struct tuple *tuple_a, hint_t tuple_a_hint, const struct tuple *tuple_b, hint_t tuple_b_hint, @@ -455,8 +456,10 @@ tuple_compare_slowpath_hinted(const struct tuple *tuple_a, hint_t tuple_a_hint, assert(!has_optional_parts || is_nullable); assert(is_nullable == key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); - int rc = hint_cmp(tuple_a_hint, tuple_b_hint); - if (rc != 0) + assert(key_def_is_multikey(key_def) == is_multikey); + assert(!is_multikey || is_multikey == has_json_paths); + int rc = 0; + if (!is_multikey && (rc = hint_cmp(tuple_a_hint, tuple_b_hint)) != 0) return rc; struct key_part *part = key_def->parts; const char *tuple_a_raw = tuple_data(tuple_a); @@ -499,7 +502,14 @@ tuple_compare_slowpath_hinted(const struct tuple *tuple_a, hint_t tuple_a_hint, end = part + key_def->part_count; for (; part < end; part++) { - if (has_json_paths) { + if (is_multikey) { + field_a = tuple_field_raw_by_part_multikey(format_a, + tuple_a_raw, field_map_a, part, + (int)tuple_a_hint); + field_b = tuple_field_raw_by_part_multikey(format_b, + tuple_b_raw, field_map_b, part, + (int)tuple_b_hint); + } else if (has_json_paths) { field_a = tuple_field_raw_by_part(format_a, tuple_a_raw, field_map_a, part); field_b = tuple_field_raw_by_part(format_b, tuple_b_raw, @@ -556,7 +566,14 @@ tuple_compare_slowpath_hinted(const struct tuple *tuple_a, hint_t tuple_a_hint, */ end = key_def->parts + key_def->part_count; for (; part < end; ++part) { - if (has_json_paths) { + if (is_multikey) { + field_a = tuple_field_raw_by_part_multikey(format_a, + tuple_a_raw, field_map_a, part, + (int)tuple_a_hint); + field_b = tuple_field_raw_by_part_multikey(format_b, + tuple_b_raw, field_map_b, part, + (int)tuple_b_hint); + } else if (has_json_paths) { field_a = tuple_field_raw_by_part(format_a, tuple_a_raw, field_map_a, part); field_b = tuple_field_raw_by_part(format_b, tuple_b_raw, @@ -586,24 +603,28 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, struct key_def *key_def) { return tuple_compare_slowpath_hinted - <is_nullable, has_optional_parts, has_json_paths> + <is_nullable, has_optional_parts, has_json_paths, false> (tuple_a, HINT_NONE, tuple_b, HINT_NONE, key_def); } -template<bool is_nullable, bool has_optional_parts, bool has_json_paths> +template<bool is_nullable, bool has_optional_parts, bool has_json_paths, + bool is_multikey> static inline int tuple_compare_with_key_slowpath_hinted(const struct tuple *tuple, hint_t tuple_hint, const char *key, uint32_t part_count, hint_t key_hint, struct key_def *key_def) { + (void)key_hint; assert(has_json_paths == key_def->has_json_paths); assert(!has_optional_parts || is_nullable); assert(is_nullable == key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); assert(key != NULL || part_count == 0); assert(part_count <= key_def->part_count); - int rc = hint_cmp(tuple_hint, key_hint); - if (rc != 0) + assert(key_def_is_multikey(key_def) == is_multikey); + assert(!is_multikey || is_multikey == has_json_paths); + int rc = 0; + if (!is_multikey && (rc = hint_cmp(tuple_hint, key_hint)) != 0) return rc; struct key_part *part = key_def->parts; struct tuple_format *format = tuple_format(tuple); @@ -612,7 +633,11 @@ tuple_compare_with_key_slowpath_hinted(const struct tuple *tuple, enum mp_type a_type, b_type; if (likely(part_count == 1)) { const char *field; - if (has_json_paths) { + if (is_multikey) { + field = tuple_field_raw_by_part_multikey(format, + tuple_raw, field_map, part, + (int)tuple_hint); + } else if (has_json_paths) { field = tuple_field_raw_by_part(format, tuple_raw, field_map, part); } else { @@ -642,7 +667,11 @@ tuple_compare_with_key_slowpath_hinted(const struct tuple *tuple, struct key_part *end = part + part_count; for (; part < end; ++part, mp_next(&key)) { const char *field; - if (has_json_paths) { + if (is_multikey) { + field = tuple_field_raw_by_part_multikey(format, + tuple_raw, field_map, part, + (int)tuple_hint); + } else if (has_json_paths) { field = tuple_field_raw_by_part(format, tuple_raw, field_map, part); } else { @@ -684,7 +713,7 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, uint32_t part_count, struct key_def *key_def) { return tuple_compare_with_key_slowpath_hinted - <is_nullable, has_optional_parts, has_json_paths> + <is_nullable, has_optional_parts, has_json_paths, false> (tuple, HINT_NONE, key, part_count, HINT_NONE, key_def); } @@ -1710,7 +1739,7 @@ key_def_set_compare_func_fast(struct key_def *def) tuple_compare_slowpath<false, false, false>; cmp_hinted = is_sequential ? tuple_compare_sequential_hinted<false, false> : - tuple_compare_slowpath_hinted<false, false, false>; + tuple_compare_slowpath_hinted<false, false, false, false>; } if (cmp_wk == NULL) { cmp_wk = is_sequential ? @@ -1718,7 +1747,8 @@ key_def_set_compare_func_fast(struct key_def *def) tuple_compare_with_key_slowpath<false, false, false>; cmp_wk_hinted = is_sequential ? tuple_compare_with_key_sequential_hinted<false, false> : - tuple_compare_with_key_slowpath_hinted<false, false, false>; + tuple_compare_with_key_slowpath_hinted<false, false, + false, false>; } def->tuple_compare = cmp; @@ -1746,16 +1776,17 @@ key_def_set_compare_func_plain(struct key_def *def) def->tuple_compare = tuple_compare_slowpath <is_nullable, has_optional_parts, false>; def->tuple_compare_hinted = tuple_compare_slowpath_hinted - <is_nullable, has_optional_parts, false>; + <is_nullable, has_optional_parts, false, false>; def->tuple_compare_with_key = tuple_compare_with_key_slowpath <is_nullable, has_optional_parts, false>; def->tuple_compare_with_key_hinted = tuple_compare_with_key_slowpath_hinted - <is_nullable, has_optional_parts, false>; + <is_nullable, has_optional_parts, + false, false>; } } -template<bool is_nullable, bool has_optional_parts> +template<bool is_nullable, bool has_optional_parts, bool is_multikey> static void key_def_set_compare_func_json(struct key_def *def) { @@ -1763,12 +1794,12 @@ key_def_set_compare_func_json(struct key_def *def) def->tuple_compare = tuple_compare_slowpath <is_nullable, has_optional_parts, true>; def->tuple_compare_hinted = tuple_compare_slowpath_hinted - <is_nullable, has_optional_parts, true>; + <is_nullable, has_optional_parts, true, is_multikey>; def->tuple_compare_with_key = tuple_compare_with_key_slowpath <is_nullable, has_optional_parts, true>; def->tuple_compare_with_key_hinted = tuple_compare_with_key_slowpath_hinted - <is_nullable, has_optional_parts, true>; + <is_nullable, has_optional_parts, true, is_multikey>; } void @@ -1786,14 +1817,23 @@ key_def_set_compare_func(struct key_def *def) assert(!def->is_nullable && !def->has_optional_parts); key_def_set_compare_func_plain<false, false>(def); } + } else if (key_def_is_multikey(def)) { + if (def->is_nullable && def->has_optional_parts) { + key_def_set_compare_func_json<true, true, true>(def); + } else if (def->is_nullable && !def->has_optional_parts) { + key_def_set_compare_func_json<true, false, true>(def); + } else { + assert(!def->is_nullable && !def->has_optional_parts); + key_def_set_compare_func_json<false, false, true>(def); + } } else { if (def->is_nullable && def->has_optional_parts) { - key_def_set_compare_func_json<true, true>(def); + key_def_set_compare_func_json<true, true, false>(def); } else if (def->is_nullable && !def->has_optional_parts) { - key_def_set_compare_func_json<true, false>(def); + key_def_set_compare_func_json<true, false, false>(def); } else { assert(!def->is_nullable && !def->has_optional_parts); - key_def_set_compare_func_json<false, false>(def); + key_def_set_compare_func_json<false, false, false>(def); } } key_def_set_hint_func(def); diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc index 0a8337102..00bcfacdd 100644 --- a/src/box/tuple_extract_key.cc +++ b/src/box/tuple_extract_key.cc @@ -310,7 +310,7 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, const char *src_end = field_end; if (has_json_paths && part->path != NULL) { if (tuple_go_to_path(&src, part->path, - part->path_len) != 0) { + part->path_len, -1) != 0) { /* * The path must be correct as * it has already been validated diff --git a/src/box/tuple_field_map.c b/src/box/tuple_field_map.c index ea77745aa..e76068619 100644 --- a/src/box/tuple_field_map.c +++ b/src/box/tuple_field_map.c @@ -36,6 +36,8 @@ int field_map_builder_create(struct field_map_builder *builder, uint32_t field_map_size, struct region *region) { + builder->region = region; + builder->extents_size = 0; builder->item_count = field_map_size / sizeof(uint32_t); if (field_map_size == 0) { builder->items = NULL; @@ -51,3 +53,28 @@ field_map_builder_create(struct field_map_builder *builder, builder->items = (field_map_builder_item *)((char *)builder->items + sz); return 0; } + +struct field_map_ext * +field_map_builder_ext_get(struct field_map_builder *builder, int32_t offset_slot, + uint32_t extent_items) +{ + struct field_map_ext *extent; + if (builder->items[offset_slot].has_extent) { + extent = builder->items[offset_slot].extent; + assert(extent != NULL); + assert(extent->items == extent_items); + return extent; + } + uint32_t sz = field_map_ext_size(extent_items); + extent = region_alloc(builder->region, sz); + if (extent == NULL) { + diag_set(OutOfMemory, sz, "region", "extent"); + return NULL; + } + memset(extent, 0, sz); + extent->items = extent_items; + builder->items[offset_slot].extent = extent; + builder->items[offset_slot].has_extent = true; + builder->extents_size += sz; + return extent; +} diff --git a/src/box/tuple_field_map.h b/src/box/tuple_field_map.h index 3a9927faa..042209a80 100644 --- a/src/box/tuple_field_map.h +++ b/src/box/tuple_field_map.h @@ -31,11 +31,48 @@ * SUCH DAMAGE. */ #include <stdint.h> +#include <stdlib.h> +#include <stdbool.h> struct region; +/** The field_map extent. */ +struct field_map_ext { + /** Count of field_map_ext::offset[] items. */ + uint32_t items; + /** Data offset in tuple array. */ + uint32_t offset[0]; +}; + +/** Return field_map extention allocation size. */ +static inline uint32_t +field_map_ext_size(uint32_t items) +{ + return sizeof(struct field_map_ext) + items * sizeof(uint32_t); +} + +/** Return field_map extention for field_map and offset_slot. */ +static inline struct field_map_ext * +field_map_ext_get(const uint32_t *field_map, int32_t offset_slot) +{ + return (struct field_map_ext *)((char *)field_map - + field_map[offset_slot]); +} + /** Preliminary field_map atom. */ -typedef uint32_t field_map_builder_item; +typedef struct { + /** + * True when this slot must be interpret as + * extention pointer. + */ + bool has_extent; + union { + /** Data offset in tuple. */ + uint32_t offset; + /** Pointer to field_map_ext extention. */ + struct field_map_ext *extent; + }; +} field_map_builder_item; /** A class that contains a preliminary view of the field_map. */ struct field_map_builder { @@ -49,8 +86,20 @@ struct field_map_builder { * count of field_map_builder::items. */ uint32_t item_count; + /** + * Total size of memory is allocated for field_map + * extentions. + */ + uint32_t extents_size; + /** Region to use for allocations. */ + struct region *region; }; +/** Get or allocate field_map_ext for offset_slot. */ +struct field_map_ext * +field_map_builder_ext_get(struct field_map_builder *builder, + int32_t offset_slot, uint32_t extent_items); + /** * Initialize field_map_builder to prepare field_map of size * field_map_size. Use region for temporary allocations. @@ -65,22 +114,50 @@ field_map_builder_slot_set(struct field_map_builder *builder, int32_t offset_slot, uint32_t offset) { assert(offset_slot < 0); - builder->items[offset_slot] = offset; + builder->items[offset_slot].offset = offset; +} + +/** Initialize corresponding field_map_ext with offset value. */ +static inline int +field_map_builder_ext_slot_set(struct field_map_builder *builder, + int32_t offset_slot, int32_t extent_slot, + uint32_t extent_items, uint32_t offset) +{ + assert(offset_slot < 0 && extent_slot >= 0 && extent_items > 0); + struct field_map_ext *extent = + field_map_builder_ext_get(builder, offset_slot, extent_items); + if (extent == NULL) + return -1; + assert(extent->items == extent_items); + extent->offset[extent_slot] = offset; + return 0; } /** Return built field_map size. */ static inline uint32_t field_map_builder_size(struct field_map_builder *builder) { - return builder->item_count * sizeof(uint32_t); + return builder->item_count * sizeof(uint32_t) + builder->extents_size; } /** Write build field_map size in the preallocated buffer wptr. */ static inline void field_map_builder_build(struct field_map_builder *builder, char *wptr) { - uint32_t field_map_size = field_map_builder_size(builder); - memcpy(wptr, (char *)builder->items - field_map_size, field_map_size); + uint32_t *field_map = + (uint32_t *)(wptr + field_map_builder_size(builder)); + char *extent_wptr = wptr + builder->extents_size; + for (int32_t i = -1; i >= -(int32_t)builder->item_count; i--) { + if (!builder->items[i].has_extent) { + field_map[i] = builder->items[i].offset; + continue; + } + struct field_map_ext *extent = builder->items[i].extent; + uint32_t sz = field_map_ext_size(extent->items); + extent_wptr -= sz; + field_map[i] = (char *)field_map - (char *)extent_wptr; + memcpy(extent_wptr, extent, sz); + } } #endif /* TARANTOOL_BOX_FIELD_MAP_H_INCLUDED */ diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 1d765c7c7..883656bf9 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -194,6 +194,50 @@ tuple_format_field_by_id(struct tuple_format *format, uint32_t id) return NULL; } +/** + * Check if child_field may be attached to parent_field, + * update the parent_field container type if required. + */ +static int +tuple_format_field_update_type(struct tuple_field *parent_field, + struct tuple_field *child_field) +{ + enum field_type expected_type = + child_field->token.type == JSON_TOKEN_STR ? + FIELD_TYPE_MAP : FIELD_TYPE_ARRAY; + if (child_field->token.type == JSON_TOKEN_ANY && + !json_token_is_multikey(&parent_field->token) && + !json_token_is_leaf(&parent_field->token)) { + assert(expected_type == FIELD_TYPE_ARRAY); + diag_set(ClientError, ER_INDEX_MULTIKEY_INVALID, + tt_sprintf("field %s is already refer to document by " + "identifier and cannot use array index " + "placeholder [*]", + tuple_field_path(parent_field))); + return -1; + } + if (json_token_is_multikey(&parent_field->token) && + child_field->token.type != JSON_TOKEN_ANY) { + assert(expected_type == FIELD_TYPE_ARRAY); + diag_set(ClientError, ER_INDEX_MULTIKEY_INVALID, + tt_sprintf("field %s is already used with array index " + "placeholder [*] and cannot refer to " + "document by identifier", + tuple_field_path(parent_field))); + return -1; + } + if (field_type1_contains_type2(parent_field->type, expected_type)) { + parent_field->type = expected_type; + } else { + diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH, + tuple_field_path(parent_field), + field_type_strs[parent_field->type], + field_type_strs[expected_type]); + return -1; + } + return 0; +} + /** * Given a field number and a path, add the corresponding field * to the tuple format, allocating intermediate fields if @@ -228,29 +272,16 @@ tuple_format_add_field(struct tuple_format *format, uint32_t fieldno, *path_pool += path_len; int rc = 0; - uint32_t token_count = 0; + uint32_t token_count = 0, multikey_token_count = 0; struct json_tree *tree = &format->fields; struct json_lexer lexer; json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE); while ((rc = json_lexer_next_token(&lexer, &field->token)) == 0 && field->token.type != JSON_TOKEN_END) { - if (field->token.type == JSON_TOKEN_ANY) { - diag_set(ClientError, ER_UNSUPPORTED, - "Tarantool", "multikey indexes"); - goto fail; - } - enum field_type expected_type = - field->token.type == JSON_TOKEN_STR ? - FIELD_TYPE_MAP : FIELD_TYPE_ARRAY; - if (field_type1_contains_type2(parent->type, expected_type)) { - parent->type = expected_type; - } else { - diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH, - tuple_field_path(parent), - field_type_strs[parent->type], - field_type_strs[expected_type]); + if (tuple_format_field_update_type(parent, field) != 0) goto fail; - } + if (field->token.type == JSON_TOKEN_ANY) + multikey_token_count++; struct tuple_field *next = json_tree_lookup_entry(tree, &parent->token, &field->token, @@ -268,6 +299,18 @@ tuple_format_add_field(struct tuple_format *format, uint32_t fieldno, if (field == NULL) goto fail; } + if (json_token_is_multikey(&parent->token) && + parent->offset_slot == TUPLE_OFFSET_SLOT_NIL) { + /** + * Allocate offset slot for array is used + * in multikey index. This is required to + * quickly extract its size. + * @see tuple_field_multikey_items() + */ + assert(parent->type == FIELD_TYPE_ARRAY); + *current_slot = *current_slot - 1; + parent->offset_slot = *current_slot; + } parent->is_key_part = true; parent = next; token_count++; @@ -280,6 +323,13 @@ tuple_format_add_field(struct tuple_format *format, uint32_t fieldno, assert(parent != NULL); /* Update tree depth information. */ format->fields_depth = MAX(format->fields_depth, token_count + 1); + if (multikey_token_count > 1) { + assert(path_len > 0); + diag_set(ClientError, ER_INDEX_MULTIKEY_INVALID, + tt_sprintf("no more than one array index placeholder " + "[*] is allowed in JSON path")); + goto fail; + } cleanup: tuple_field_delete(field); end: @@ -782,6 +832,7 @@ tuple_field_map_initialize(struct field_map_builder *builder, * validations and field map initialization. */ uint32_t frames_sz = format->fields_depth * sizeof(struct mp_frame); + struct mp_frame *mk_parent_frame = NULL; struct mp_frame *frames = region_alloc(region, frames_sz); if (frames == NULL) { diag_set(OutOfMemory, frames_sz, "region", "frames"); @@ -805,6 +856,10 @@ tuple_field_map_initialize(struct field_map_builder *builder, mp_stack_pop(&stack); if (mp_stack_is_empty(&stack)) goto end; + if (json_token_is_multikey(parent)) { + /* Leave multikey index branch. */ + mk_parent_frame = NULL; + } parent = parent->parent; } /* @@ -855,7 +910,16 @@ tuple_field_map_initialize(struct field_map_builder *builder, field_type_strs[field->type]); return -1; } - if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { + if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL && + mk_parent_frame != NULL) { + int multikey_idx = mk_parent_frame->curr; + uint32_t multikey_items = mk_parent_frame->count; + if (field_map_builder_ext_slot_set(builder, + field->offset_slot, multikey_idx, + multikey_items, + *pos - tuple) != 0) + return -1; + } else if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { field_map_builder_slot_set(builder, field->offset_slot, *pos - tuple); @@ -876,6 +940,10 @@ tuple_field_map_initialize(struct field_map_builder *builder, mp_decode_array(pos) : mp_decode_map(pos); mp_stack_push(&stack, type, size); + if (json_token_is_multikey(&field->token)) { + assert(type == MP_ARRAY); + mk_parent_frame = &frames[stack.used - 1]; + } parent = &field->token; } else { mp_next(pos); diff --git a/src/box/vinyl.c b/src/box/vinyl.c index 78f29c624..39ba76394 100644 --- a/src/box/vinyl.c +++ b/src/box/vinyl.c @@ -699,6 +699,12 @@ vinyl_space_check_index_def(struct space *space, struct index_def *index_def) return -1; } } + if (key_def_is_multikey(index_def->key_def)) { + diag_set(ClientError, ER_MODIFY_INDEX, + index_def->name, space_name(space), + "vinyl space index cannot be multikey"); + return -1; + } return 0; } diff --git a/test/box/misc.result b/test/box/misc.result index c350bbd73..88cc64daf 100644 --- a/test/box/misc.result +++ b/test/box/misc.result @@ -516,6 +516,7 @@ t; 185: box.error.SQL_UNKNOWN_TOKEN 186: box.error.SQL_PARSER_GENERIC 187: box.error.SQL_ANALYZE_ARGUMENT + 188: box.error.INDEX_MULTIKEY_INVALID ... test_run:cmd("setopt delimiter ''"); --- diff --git a/test/engine/json.result b/test/engine/json.result index a790cdfbc..1bac85edd 100644 --- a/test/engine/json.result +++ b/test/engine/json.result @@ -683,16 +683,3 @@ town:select() s:drop() --- ... --- --- gh-1260: Multikey indexes --- -s = box.schema.space.create('withdata') ---- -... -idx = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) ---- -- error: Tarantool does not support multikey indexes -... -s:drop() ---- -... diff --git a/test/engine/json.test.lua b/test/engine/json.test.lua index f9a7180b1..9afa3daa2 100644 --- a/test/engine/json.test.lua +++ b/test/engine/json.test.lua @@ -192,10 +192,3 @@ town:select() name:drop() town:select() s:drop() - --- --- gh-1260: Multikey indexes --- -s = box.schema.space.create('withdata') -idx = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) -s:drop() diff --git a/test/engine/multikey_idx.result b/test/engine/multikey_idx.result new file mode 100644 index 000000000..f41ddafa8 --- /dev/null +++ b/test/engine/multikey_idx.result @@ -0,0 +1,228 @@ +test_run = require('test_run').new() +--- +... +engine = test_run:get_cfg('engine') +--- +... +-- +-- gh-1260: Multikey indexes +-- +s = box.schema.space.create('withdata', {engine = 'vinyl'}) +--- +... +-- Vinyl's space can't be multikey (yet). +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +--- +- error: 'Can''t create or modify index ''idx'' in space ''withdata'': primary key + cannot be multikey' +... +s:drop() +--- +... +s = box.schema.space.create('withdata', {engine = 'memtx'}) +--- +... +-- Primary index must be unique so it can't be multikey. +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}}}) +--- +- error: 'Can''t create or modify index ''idx'' in space ''withdata'': primary key + cannot be multikey' +... +pk = s:create_index('pk') +--- +... +-- Only tree index type may be mutlikey. +_ = s:create_index('idx', {type = 'hash', unique = true, parts = {{3, 'str', path = '[*].fname'}}}) +--- +- error: 'Can''t create or modify index ''idx'' in space ''withdata'': HASH index + cannot be multikey' +... +_ = s:create_index('idx', {type = 'bitset', unique = false, parts = {{3, 'str', path = '[*].fname'}}}) +--- +- error: 'Can''t create or modify index ''idx'' in space ''withdata'': BITSET index + cannot be multikey' +... +_ = s:create_index('idx', {type = 'rtree', unique = false, parts = {{3, 'array', path = '[*].fname'}}}) +--- +- error: 'Can''t create or modify index ''idx'' in space ''withdata'': RTREE index + cannot be multikey' +... +-- Test incompatible multikey index parts. +_ = s:create_index('idx3', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '["data"][*].sname'}}}) +--- +- error: 'Can''t create or modify index ''idx3'' in space ''withdata'': multikey index + parts are incompatible' +... +_ = s:create_index('idx2', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname[*].a'}}}) +--- +- error: 'Multikey index is invalid: no more than one array index placeholder [*] + is allowed in JSON path' +... +idx0 = s:create_index('idx0', {parts = {{3, 'str', path = '[1].fname'}}}) +--- +... +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +--- +- error: 'Multikey index is invalid: field 3 is already refer to document by identifier + and cannot use array index placeholder [*]' +... +idx0:drop() +--- +... +-- Unique multikey index. +idx = s:create_index('idx', {unique = true, parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +--- +... +_ = s:create_index('idx2', {parts = {{3, 'str', path = '[1].fname'}, {3, 'str', path = '[1].sname'}}}) +--- +- error: 'Multikey index is invalid: field 3 is already used with array index placeholder + [*] and cannot refer to document by identifier' +... +s:insert({1, {1, 2, 3}, {{fname='James', sname='Bond'}, {fname='Vasya', sname='Pupkin'}}}) +--- +- [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +s:insert({2, {3, 4, 5}, {{fname='Ivan', sname='Ivanych'}}}) +--- +- [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] +... +_ = s:create_index('arr_idx', {unique = true, parts = {{2, 'unsigned', path = '[*]'}}}) +--- +- error: Duplicate key exists in unique index 'arr_idx' in space 'withdata' +... +-- Non-unique multikey index; two multikey indexes per space. +arr_idx = s:create_index('arr_idx', {unique = false, parts = {{2, 'unsigned', path = '[*]'}}}) +--- +... +arr_idx:select() +--- +- - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] +... +idx:get({'James', 'Bond'}) +--- +- [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +idx:get({'Ivan', 'Ivanych'}) +--- +- [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] +... +idx:get({'Vasya', 'Pupkin'}) +--- +- [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +idx:select() +--- +- - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +s:insert({3, {1, 2}, {{fname='Vasya', sname='Pupkin'}}}) +--- +- error: Duplicate key exists in unique index 'idx' in space 'withdata' +... +s:insert({4, {1}, {{fname='James', sname='Bond'}}}) +--- +- error: Duplicate key exists in unique index 'idx' in space 'withdata' +... +idx:select() +--- +- - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +-- Duplicates in multikey parts. +s:insert({5, {1, 1, 1}, {{fname='A', sname='B'}, {fname='C', sname='D'}, {fname='A', sname='B'}}}) +--- +- error: Duplicate key exists in unique index 'idx' in space 'withdata' +... +idx:get({'A', 'B'}) +--- +... +idx:get({'C', 'D'}) +--- +... +idx:delete({'Vasya', 'Pupkin'}) +--- +- [1, [1, 2, 3], [{'fname': 'James', 'sname': 'Bond'}, {'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +s:insert({6, {1, 2}, {{fname='Vasya', sname='Pupkin'}}}) +--- +- [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +s:insert({7, {1}, {{fname='James', sname='Bond'}}}) +--- +- [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] +... +arr_idx:select({1}) +--- +- - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] +... +idx:select() +--- +- - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] + - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +-- Snapshot & recovery. +box.snapshot() +--- +- ok +... +test_run:cmd("restart server default") +s = box.space["withdata"] +--- +... +idx = s.index["idx"] +--- +... +arr_idx = s.index["arr_idx"] +--- +... +s:select() +--- +- - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] +... +idx:select() +--- +- - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] + - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] +... +arr_idx:select() +--- +- - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [7, [1], [{'fname': 'James', 'sname': 'Bond'}]] + - [6, [1, 2], [{'fname': 'Vasya', 'sname': 'Pupkin'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] + - [2, [3, 4, 5], [{'fname': 'Ivan', 'sname': 'Ivanych'}]] +... +s:drop() +--- +... +-- Assymetric multikey index paths. +s = box.schema.space.create('withdata') +--- +... +pk = s:create_index('pk') +--- +... +idx = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].extra.sname', is_nullable = true}}}) +--- +... +s:insert({1, 1, {{fname='A1', extra={sname='A2'}}, {fname='B1'}, {fname='C1', extra={sname='C2'}}}}) +--- +- [1, 1, [{'fname': 'A1', 'extra': {'sname': 'A2'}}, {'fname': 'B1'}, {'fname': 'C1', + 'extra': {'sname': 'C2'}}]] +... +s:drop() +--- +... diff --git a/test/engine/multikey_idx.test.lua b/test/engine/multikey_idx.test.lua new file mode 100644 index 000000000..2b25063cf --- /dev/null +++ b/test/engine/multikey_idx.test.lua @@ -0,0 +1,67 @@ +test_run = require('test_run').new() +engine = test_run:get_cfg('engine') + +-- +-- gh-1260: Multikey indexes +-- +s = box.schema.space.create('withdata', {engine = 'vinyl'}) +-- Vinyl's space can't be multikey (yet). +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +s:drop() + +s = box.schema.space.create('withdata', {engine = 'memtx'}) +-- Primary index must be unique so it can't be multikey. +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}}}) +pk = s:create_index('pk') +-- Only tree index type may be mutlikey. +_ = s:create_index('idx', {type = 'hash', unique = true, parts = {{3, 'str', path = '[*].fname'}}}) +_ = s:create_index('idx', {type = 'bitset', unique = false, parts = {{3, 'str', path = '[*].fname'}}}) +_ = s:create_index('idx', {type = 'rtree', unique = false, parts = {{3, 'array', path = '[*].fname'}}}) +-- Test incompatible multikey index parts. +_ = s:create_index('idx3', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '["data"][*].sname'}}}) +_ = s:create_index('idx2', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname[*].a'}}}) +idx0 = s:create_index('idx0', {parts = {{3, 'str', path = '[1].fname'}}}) +_ = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +idx0:drop() +-- Unique multikey index. +idx = s:create_index('idx', {unique = true, parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].sname'}}}) +_ = s:create_index('idx2', {parts = {{3, 'str', path = '[1].fname'}, {3, 'str', path = '[1].sname'}}}) +s:insert({1, {1, 2, 3}, {{fname='James', sname='Bond'}, {fname='Vasya', sname='Pupkin'}}}) +s:insert({2, {3, 4, 5}, {{fname='Ivan', sname='Ivanych'}}}) +_ = s:create_index('arr_idx', {unique = true, parts = {{2, 'unsigned', path = '[*]'}}}) +-- Non-unique multikey index; two multikey indexes per space. +arr_idx = s:create_index('arr_idx', {unique = false, parts = {{2, 'unsigned', path = '[*]'}}}) +arr_idx:select() +idx:get({'James', 'Bond'}) +idx:get({'Ivan', 'Ivanych'}) +idx:get({'Vasya', 'Pupkin'}) +idx:select() +s:insert({3, {1, 2}, {{fname='Vasya', sname='Pupkin'}}}) +s:insert({4, {1}, {{fname='James', sname='Bond'}}}) +idx:select() +-- Duplicates in multikey parts. +s:insert({5, {1, 1, 1}, {{fname='A', sname='B'}, {fname='C', sname='D'}, {fname='A', sname='B'}}}) +idx:get({'A', 'B'}) +idx:get({'C', 'D'}) +idx:delete({'Vasya', 'Pupkin'}) +s:insert({6, {1, 2}, {{fname='Vasya', sname='Pupkin'}}}) +s:insert({7, {1}, {{fname='James', sname='Bond'}}}) +arr_idx:select({1}) +idx:select() +-- Snapshot & recovery. +box.snapshot() +test_run:cmd("restart server default") +s = box.space["withdata"] +idx = s.index["idx"] +arr_idx = s.index["arr_idx"] +s:select() +idx:select() +arr_idx:select() +s:drop() + +-- Assymetric multikey index paths. +s = box.schema.space.create('withdata') +pk = s:create_index('pk') +idx = s:create_index('idx', {parts = {{3, 'str', path = '[*].fname'}, {3, 'str', path = '[*].extra.sname', is_nullable = true}}}) +s:insert({1, 1, {{fname='A1', extra={sname='A2'}}, {fname='B1'}, {fname='C1', extra={sname='C2'}}}}) +s:drop() -- 2.21.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 0/5] box: introduce multikey indexes in memtx 2019-03-19 12:32 [PATCH v2 0/5] box: introduce multikey indexes in memtx Kirill Shcherbatov ` (4 preceding siblings ...) 2019-03-19 12:32 ` [PATCH v2 5/5] box: introduce multikey indexes in memtx Kirill Shcherbatov @ 2019-03-28 10:21 ` Vladimir Davydov 2019-04-02 15:49 ` [tarantool-patches] " Kirill Shcherbatov 5 siblings, 1 reply; 13+ messages in thread From: Vladimir Davydov @ 2019-03-28 10:21 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches First, please address comments by Kostja - you seem to be silently ignoring them, which is impolite. Also, please stop replying to comments with a new patch. Mail threads exist to discuss, not to mindlessly fix everything that a review spotted. A reviewer may be wrong now and then, in which case you have to think yourself and argue if necessary. Once the change set is big enough, send a new series with a proper change log. Take a look at how Cyrill handles it, for example. Also, your hunger to quickly deliver a patch while sacrificing code quality and making dubious design decision is really annoying. And I dare say, I'm not alone thinking that - ask Nikita or Vlad. In a real-world open-source project (e.g. Linux kernel), your emails would soon land in junk - people out there are far not as patient as we are. You should finally start putting much more time and efforts into thinking through the design, self-reviewing and polishing your code and comments rather than mindlessly typing and thus wasting reviewer's time. I haven't reviewed the patches carefully yet (I hope you'll do it yourself before submitting a new version). Here's a few things that caught my eye after looking through the code. On Tue, Mar 19, 2019 at 03:32:05PM +0300, Kirill Shcherbatov wrote: > lib: introduce json_path_is_multikey helper The helper returns true iff multikey_len equals path hence the bool return value is useless. Return multikey offset instead. And rename the function appropriately. And as Kostja mentioned, the new function isn't tested properly. > lib: introduce is_weak_cmp flag for json_path_cmp This flag is ugly, both semantically and esthetically. As a matter fact, I don't see why you would need such a function at all. You can check if paths to the first multikey part is the same right in key_def_new. BTW I don't like that this check lives in index_def_is_valid, because it means that there may be a key_def with invalid configuration. Besides, this makes checks scattered all over the code, making it difficult to follow. > box: move offset_slot init to tuple_format_add_field > box: introduce field_map_builder for field_map init Cutting a function in two halves at a seemingly random place, as you did in case of tuple_field_map_create, is a wrong way to refactor it. What you need to do is move out semantically independent parts. E.g. required fields initialization/checking, tuple data/format traversal. There's a lot of code duplication between vy_stmt_new_surrogate_delete and tuple_create_field_map - you should try to eliminate it before extending field map. Also, field map querying should live in this new module, too, because keeping code building and using something in one place is easier for understanding. BTW, better call field_map.[hc] not tuple_field_map.[hc], because the object is called simply field_map. > +/** Preliminary field_map atom. */ > +typedef uint32_t field_map_builder_item; We don't use typedefs for things like that. The name's bad, too: 'item' is too vague. Other names are far not perfect, either. E.g. consider field_map_builder_size. Is it the size of a field_map_builder? No, it's the size of a field map it will build. Please try to come up with better names. Also, comments as they are aren't enough. You should explain in a nutshell how a builder is used, what field map looks like, why offsets are negative, everything that may raise questions. > @@ -877,17 +819,17 @@ tuple_field_map_create(struct tuple_format *format, const char *tuple, > token.num = idx; > break; > case MP_MAP: > - if (mp_typeof(*pos) != MP_STR) { > + if (mp_typeof(**pos) != MP_STR) { > /* > * JSON path support only string > * keys for map. Skip this entry. > */ > - mp_next(&pos); > - mp_next(&pos); > + mp_next(pos); > + mp_next(pos); I don't see why you'd need to change *pos to **pos. > +static inline void > +field_map_builder_build(struct field_map_builder *builder, char *wptr) The next patch makes this function quite a big one. I think we should move it to C file. > box: introduce multikey indexes in memtx > +/** The field_map extent. */ > +struct field_map_ext { The comment is useless: it simply reiterates the struct name without shedding any light on the nature of the object. It seems to me that your comments are bad not because of your poor English (it isn't much worse than of other members of our team), but because of your attitude - you simply don't believe that they are worth spending your precious time on so you hand this work over to a reviewer. Well, I'm tired of it. You ought to write good descriptive comments to every non-trivial thing you add. Comments are useful, especially when you try to understand the code in some time. Use translator/dictionary if you need to. And it's absolutely okay to spend substantially more time on writing comments than on the code itself. BTW you must update the comment to hint_t, pointing out what it's used for in case of multikey indexes. > +/** Return field_map extention allocation size. */ > +static inline uint32_t > +field_map_ext_size(uint32_t items) > +{ > + return sizeof(struct field_map_ext) + items * sizeof(uint32_t); > +} > + > +/** Return field_map extention for field_map and offset_slot. */ > +static inline struct field_map_ext * > +field_map_ext_get(const uint32_t *field_map, int32_t offset_slot) > +{ > + return (struct field_map_ext *)((char *)field_map - > + field_map[offset_slot]); > +} Neither of these functions is used outside field_map.h. Better inline them or move them to C file. A general rule is to keep the API as concise as possible provided it doesn't hurt readability. > @@ -234,6 +234,18 @@ struct key_def { > bool has_optional_parts; > /** Key fields mask. @sa column_mask.h for details. */ > uint64_t column_mask; > + /** > + * In case of multikey index, the index of the key_part > + * containing JSON path with array index placeholder "[*]". > + * Otherwise multikey_part_idx == part_count. > + */ > + uint32_t multikey_part_idx; > + /** > + * In case of multikey index, the length of the > + * parts[multikey_part_idx].path substring "...[*]" > + * @see json_path_is_multikey(). > + */ > + uint32_t multikey_path_len; I don't like mixing part_idx and path_len in one struct. Better store field id and path in here instead IMO. > @@ -876,6 +940,10 @@ tuple_field_map_initialize(struct field_map_builder *builder, > mp_decode_array(pos) : > mp_decode_map(pos); > mp_stack_push(&stack, type, size); > + if (json_token_is_multikey(&field->token)) { > + assert(type == MP_ARRAY); > + mk_parent_frame = &frames[stack.used - 1]; > + } This proves mp_stack API inadequate: clearly, mp_stack_top() is missing. > @@ -699,6 +699,12 @@ vinyl_space_check_index_def(struct space *space, struct index_def *index_def) > return -1; > } > } > + if (key_def_is_multikey(index_def->key_def)) { > + diag_set(ClientError, ER_MODIFY_INDEX, > + index_def->name, space_name(space), > + "vinyl space index cannot be multikey"); > + return -1; > + } Should be ER_UNSUPPORTED. > + if (child_field->token.type == JSON_TOKEN_ANY && > + !json_token_is_multikey(&parent_field->token) && > + !json_token_is_leaf(&parent_field->token)) { > + assert(expected_type == FIELD_TYPE_ARRAY); Bad indentation. > +int > +memtx_tree_index_replace_multikey(struct index *base, struct tuple *old_tuple, > + struct tuple *new_tuple, > + enum dup_replace_mode mode, > + struct tuple **result) The function should be static, apparently. > -template<bool is_nullable, bool has_optional_parts> > +template<bool is_nullable, bool has_optional_parts, bool is_multikey> > static void > key_def_set_compare_func_json(struct key_def *def) I split these functions by meaning: 'json' and 'plain' so that set_compare_func only checks is_nullable flag. 'multikey' case should be handled as a part of 'json', similarly to how 'sequential' is handled in 'plain'. > @@ -1763,12 +1794,12 @@ key_def_set_compare_func_json(struct key_def *def) > def->tuple_compare = tuple_compare_slowpath > <is_nullable, has_optional_parts, true>; > def->tuple_compare_hinted = tuple_compare_slowpath_hinted > - <is_nullable, has_optional_parts, true>; > + <is_nullable, has_optional_parts, true, is_multikey>; > def->tuple_compare_with_key = tuple_compare_with_key_slowpath > <is_nullable, has_optional_parts, true>; > def->tuple_compare_with_key_hinted = > tuple_compare_with_key_slowpath_hinted > - <is_nullable, has_optional_parts, true>; > + <is_nullable, has_optional_parts, true, is_multikey>; tuple_compare and tuple_compare_with_key don't make any sense for multikey indexes. You should add stubs for them. I guess the stubs should simply assert(0). The same's fair for extract_key and hints. > @@ -621,7 +695,8 @@ memtx_tree_index_create_iterator(struct index *base, enum iterator_type type, > it->type = type; > it->key_data.key = key; > it->key_data.part_count = part_count; > - it->key_data.hint = key_hint(key, part_count, cmp_def); > + if (!key_def_is_multikey(cmp_def)) > + it->key_data.hint = key_hint(key, part_count, cmp_def); Please get rid of this 'if'. Overload key_hint instead. > diff --git a/test/engine/multikey_idx.test.lua b/test/engine/multikey_idx.test.lua I'd call the test simply 'multikey.test.lua'. > +-- Duplicates in multikey parts. > +s:insert({5, {1, 1, 1}, {{fname='A', sname='B'}, {fname='C', sname='D'}, {fname='A', sname='B'}}}) > +--- > +- error: Duplicate key exists in unique index 'idx' in space 'withdata' > +... AFAIR the RFC, this shouldn't result in an error. Am I wrong? > @TarantoolBot document > Title: introduce multikey indexes in memtx > Multikey indexes allows you to automatically index set of documents > by JSON paths having array index placeholder "[*]". Multikey index > cannot be primary as it cannot be unique(by definition). But it can! I can set is_unique, can't I? > Multikey index parts must be compatible: only one "[*]" placeholder > is allowed in same position(for all JSON paths in index parts). > > Example: > s = box.schema.space.create('withdata') > pk = s:create_index('pk') > parts = { > {2, 'str', path = 'data[*].name'}, > {2, 'str', path = 'data[*].extra.phone'} > } > idx = s:create_index('idx', {parts = parts}) > s:insert({1, {data = {{name="A", extra={phone="111"}}, > {name="B", extra={phone="111"}}}, > garbage = 1}} > idx:get({'A', '111'}) The docbot request is vastly insufficient: - how unique indexes behave - how duplicates are handled - what's incompatible parts Please explain every aspect that may raise questions. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [tarantool-patches] Re: [PATCH v2 0/5] box: introduce multikey indexes in memtx 2019-03-28 10:21 ` [PATCH v2 0/5] " Vladimir Davydov @ 2019-04-02 15:49 ` Kirill Shcherbatov 0 siblings, 0 replies; 13+ messages in thread From: Kirill Shcherbatov @ 2019-04-02 15:49 UTC (permalink / raw) To: tarantool-patches, Vladimir Davydov Hi! Thank you for such a detailed answer! > The helper returns true iff multikey_len equals path hence the bool > return value is useless. Return multikey offset instead. And rename > the function appropriately. Refactored as json_path_multikey_offset helper. /** * Scan the JSON path string and return the offset of the first * character [*] (the array index placeholder). * - if [*] is not found, -1 is returned. * - specified JSON path must be valid * (may be tested with json_path_validate). */ int json_path_multikey_offset(const char *path, int path_len, int index_base); > This flag is ugly, both semantically and esthetically. As a matter fact, > I don't see why you would need such a function at all. You can check if > paths to the first multikey part is the same right in key_def_new. Now there is no such helper. > > BTW I don't like that this check lives in index_def_is_valid, because it > means that there may be a key_def with invalid configuration. Besides, > this makes checks scattered all over the code, making it difficult to > follow. Ok, I've moved it to key_def_new, in key_def_set_part. But this led to the fact that key_def_set_part should now return the error that made the code more complicated. > There's a lot of code duplication between vy_stmt_new_surrogate_delete and > tuple_create_field_map - you should try to eliminate it before extending > field map. I've introduced a new tuple_parse_iterator class that simplify this code. > tuple_compare and tuple_compare_with_key don't make any sense for > multikey indexes. You should add stubs for them. I guess the stubs > should simply assert(0). The same's fair for extract_key and hints. >> @@ -621,7 +695,8 @@ memtx_tree_index_create_iterator(struct index *base, enum iterator_type type, >> it->type = type; >> it->key_data.key = key; >> it->key_data.part_count = part_count; >> - it->key_data.hint = key_hint(key, part_count, cmp_def); >> + if (!key_def_is_multikey(cmp_def)) >> + it->key_data.hint = key_hint(key, part_count, cmp_def); > > Please get rid of this 'if'. Overload key_hint instead. In a patchset I just sent you, pay attention to key_hint's stub. It does not contain unreachable() checks, because these functions are invoked (in contrast to other stubs); It allows not to rewrite memtx_tree_index_get and memtx_tree_index_create_iterator. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2019-04-02 15:49 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-03-19 12:32 [PATCH v2 0/5] box: introduce multikey indexes in memtx Kirill Shcherbatov 2019-03-19 12:32 ` [PATCH v2 1/5] lib: introduce json_path_is_multikey helper Kirill Shcherbatov 2019-03-19 15:43 ` [tarantool-patches] " Konstantin Osipov 2019-04-02 15:49 ` [tarantool-patches] " Kirill Shcherbatov 2019-03-19 12:32 ` [PATCH v2 2/5] lib: introduce is_weak_cmp flag for json_path_cmp Kirill Shcherbatov 2019-03-19 15:47 ` [tarantool-patches] " Konstantin Osipov 2019-03-19 12:32 ` [PATCH v2 3/5] box: move offset_slot init to tuple_format_add_field Kirill Shcherbatov 2019-03-19 12:32 ` [PATCH v2 4/5] box: introduce field_map_builder for field_map init Kirill Shcherbatov 2019-03-19 12:32 ` [PATCH v2 5/5] box: introduce multikey indexes in memtx Kirill Shcherbatov 2019-03-19 16:05 ` [tarantool-patches] " Kirill Shcherbatov 2019-03-21 12:35 ` Kirill Shcherbatov 2019-03-28 10:21 ` [PATCH v2 0/5] " Vladimir Davydov 2019-04-02 15:49 ` [tarantool-patches] " Kirill Shcherbatov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox