* [tarantool-patches] [PATCH v3 0/4] box: indexes by JSON path @ 2018-08-27 7:37 Kirill Shcherbatov 2018-08-27 7:37 ` [tarantool-patches] [PATCH v3 1/4] rfc: describe a Tarantool JSON indexes Kirill Shcherbatov ` (4 more replies) 0 siblings, 5 replies; 14+ messages in thread From: Kirill Shcherbatov @ 2018-08-27 7:37 UTC (permalink / raw) To: tarantool-patches; +Cc: v.shpilevoy, Kirill Shcherbatov

Branch: http://github.com/tarantool/tarantool/tree/kshch/gh-1012-json-indexes
Issue: https://github.com/tarantool/tarantool/issues/1012

Sometimes field data has a complex document structure. When this structure is consistent across the whole space, it becomes possible to create an index by JSON path. This is achieved with auxiliary structures per tuple_format: a tree of intermediate path fields, and a hashtable that refers to leaf fields, using the path as a key. To speed up data access by a JSON index, the key_part structure is extended with an offset_slot cache that points to the field_map item containing the data offset for the current tuple. The RFC contains a detailed description of these concepts. Finally, JSON paths may be specified in a user-friendly form containing the format field name (which may be changed later).
Changes in v3:
- fixed JSON nullable fields
- multiple binary optimizations of field map initialization for fields that have JSON paths
- don't store the same JSON path of different indexes twice in a format
- optimized and simplified tuple_field_by_part_raw
- new template-based comparators/extractors
- moved the space epoch update into the ModifySpaceFormat alter op
- fixed the key_def_find routine
- more informative comments
- exported Lua path resolution to core

Kirill Shcherbatov (4):
  rfc: describe a Tarantool JSON indexes
  box: introduce slot_cache in key_part
  box: introduce JSON indexes
  box: specify indexes in user-friendly form

 doc/rfc/1012-json-indexes.md | 188 +++++++++++
 src/box/alter.cc             |  38 +++
 src/box/errcode.h            |   2 +-
 src/box/index_def.c          |  10 +-
 src/box/key_def.c            | 296 +++++++++++++++--
 src/box/key_def.h            |  42 ++-
 src/box/lua/index.c          |  74 +++++
 src/box/lua/schema.lua       |  20 +-
 src/box/lua/space.cc         |   5 +
 src/box/memtx_bitset.c       |   8 +-
 src/box/memtx_engine.c       |   5 +
 src/box/memtx_rtree.c        |   6 +-
 src/box/schema.cc            |  12 +-
 src/box/tuple.c              |  11 +-
 src/box/tuple_compare.cc     | 142 ++++++--
 src/box/tuple_extract_key.cc | 146 ++++++---
 src/box/tuple_format.c       | 765 +++++++++++++++++++++++++++++++++++++++----
 src/box/tuple_format.h       |  80 ++++-
 src/box/tuple_hash.cc        |  67 +++-
 src/box/vinyl.c              |   5 +
 src/box/vy_log.c             |   3 +-
 src/box/vy_lsm.c             |  44 +++
 src/box/vy_point_lookup.c    |   2 -
 src/box/vy_stmt.c            | 124 +++++--
 src/box/vy_stmt.h            |   7 +-
 test/box/misc.result         |  57 ++--
 test/engine/iterator.result  |   2 +-
 test/engine/tuple.result     | 353 ++++++++++++++++++++
 test/engine/tuple.test.lua   |  99 ++++++
 29 files changed, 2332 insertions(+), 281 deletions(-)
 create mode 100644 doc/rfc/1012-json-indexes.md

-- 
2.7.4

^ permalink raw reply	[flat|nested] 14+ messages in thread
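[Editor's note] The cover letter above mentions a hashtable in tuple_format that refers to leaf fields keyed by their full JSON path. As a rough illustration of that idea only — the names, the FNV-1a hash, and the fixed-size open-addressing table are invented here and are not Tarantool's actual implementation — a path-to-offset_slot lookup might look like this:

```c
#include <stdint.h>
#include <string.h>

/* Toy model of a path -> offset_slot hashtable: full JSON paths
 * (including the "[2]" field prefix) map straight to offset slots.
 * A fixed open-addressing table stands in for the real hashtable. */
#define TABLE_SIZE 16

struct path_slot_entry {
	const char *data_path;  /* e.g. "[2]name.first", NULL if empty */
	int32_t offset_slot;
};

struct path_slot_table {
	struct path_slot_entry entries[TABLE_SIZE];
};

/* FNV-1a string hash. */
static uint32_t
path_hash(const char *path)
{
	uint32_t h = 2166136261u;
	for (; *path != '\0'; path++)
		h = (h ^ (uint8_t)*path) * 16777619u;
	return h;
}

/* Insert with linear probing; assumes the table never fills up. */
static void
table_put(struct path_slot_table *t, const char *path, int32_t slot)
{
	uint32_t i = path_hash(path) % TABLE_SIZE;
	while (t->entries[i].data_path != NULL)
		i = (i + 1) % TABLE_SIZE;
	t->entries[i].data_path = path;
	t->entries[i].offset_slot = slot;
}

/* Return the slot for a path, or 0 if no slot was allocated for it. */
static int32_t
table_get(const struct path_slot_table *t, const char *path)
{
	uint32_t i = path_hash(path) % TABLE_SIZE;
	while (t->entries[i].data_path != NULL) {
		if (strcmp(t->entries[i].data_path, path) == 0)
			return t->entries[i].offset_slot;
		i = (i + 1) % TABLE_SIZE;
	}
	return 0;
}
```

The point of the design is visible even in the sketch: resolving "[2]name.first" to a field_map slot costs one hash lookup instead of re-parsing the path against the format's field tree.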
* [tarantool-patches] [PATCH v3 1/4] rfc: describe a Tarantool JSON indexes 2018-08-27 7:37 [tarantool-patches] [PATCH v3 0/4] box: indexes by JSON path Kirill Shcherbatov @ 2018-08-27 7:37 ` Kirill Shcherbatov 2018-08-27 7:37 ` [tarantool-patches] [PATCH v3 2/4] box: introduce slot_cache in key_part Kirill Shcherbatov ` (3 subsequent siblings) 4 siblings, 0 replies; 14+ messages in thread From: Kirill Shcherbatov @ 2018-08-27 7:37 UTC (permalink / raw) To: tarantool-patches; +Cc: v.shpilevoy, Kirill Shcherbatov

Part of #1012.
---
 doc/rfc/1012-json-indexes.md | 188 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 188 insertions(+)
 create mode 100644 doc/rfc/1012-json-indexes.md

diff --git a/doc/rfc/1012-json-indexes.md b/doc/rfc/1012-json-indexes.md
new file mode 100644
index 0000000..a603f9a
--- /dev/null
+++ b/doc/rfc/1012-json-indexes.md
@@ -0,0 +1,188 @@
+# Tarantool JSON Indexes
+
+* **Status**: In progress
+* **Start date**: 26-07-2018
+* **Authors**: Vladimir Davydov @locker vdavydov.dev@gmail.com, Vladislav Shpilevoy @Gerold103 v.shpilevoy@tarantool.org, Kirill Shcherbatov @kshcherbatov kshcherbatov@tarantool.org
+* **Issues**: [#1012](https://github.com/tarantool/tarantool/issues/1012)
+
+## Summary
+Tarantool *JSON Indexes* is a feature that allows indexing space tuples that have a consistent, document-like structure.
+
+## Background and motivation
+The box.index submodule provides read-only access to index definitions and index keys. Indexes are contained in the box.space.space-name.index array within each space object. They provide an API for ordered iteration over tuples. This API is a direct binding to the corresponding methods of index objects of type box.index in the storage engine.
+```
+s:create_index('primary', {parts = {{1, 'unsigned'}, {2, 'str'}}})
+```
+
+Sometimes tuple space data has a complex but consistent document structure.
+```
+s:insert{1, {town = 'NY', name = {first = 'Richard', second = 'Feynman'}}}
+s:insert{2, {town = 'NY', name = {first = 'Ayn', second = 'Rand'}}}
+```
+Building an index on document fields would be an attractive feature that allows searching and iterating over such data.
+```
+s:create_index('name', { parts = {{ "name.first", "str" }, {"name.last", "str"}}})
+```
+
+Fields having a complex document structure should have the 'map' or 'array' type in the format, if one is specified.
+
+## Detailed design
+All data in Tarantool is stored in atomic database objects called *tuples*. They consist of a payload (user data represented as *msgpack*) and extra information that describes how the database should operate on it.
+```
+ [[ Img 0.1 - Tarantool tuple structure ]]
+
+                         +----------------------+-----------------------+-------------+
+ tuple_begin, ..., raw = | extra_size           | offset N ... offset 1 | MessagePack |
+ |                       +----------------------+-----------------------+-------------+
+ |                                              ^
+ +-----------------------------------------------------------------------data_offset
+```
+Indexed information should be accessible really fast. For regular indexes on fields this is achieved with the *field_map* - data offsets stored before the user payload in each tuple. The metadata cells used to store those offsets, *offset_slot*s, are not the same across different *formats* (the aggregated information about all indexes describing a specific space). Without loss of generality this means that tuples constructed at moment 'A', when only the PK index was registered, have a different *format* than tuples constructed at moment 'B', after the new index "name" was added.
+
+The *offset_slot* concept could be used for JSON indexes too.
+
+The JSON path would definitely become a part of the *key_part* structure. As we have already noted, *offset_slots* differ between formats of different generations, so we should find a universal and fast way to obtain one by JSON path.
+As code that accesses tuple data by index is really hot, performance is our priority here.
+We already have *tuple_field_raw_by_path*, which follows a JSON path in a tuple and returns a pointer to the requested field. Sometimes fields nested in a JSON-structured document may have a coinciding path part, e.g.:
+```
+s:create_index('name', { parts = {{ "name.first", "str" }, {"name.last", "str"}}})
+s:create_index('town', { parts = {{ "town", "str" }, {"name.last", "str"}}})
+```
+We don't want to parse coinciding suffixes multiple times for the same tuples.
+
+### 1.1. JSON-path tree in tuple format
+Keeping all these challenges in mind, we propose to introduce a tree-based structure in *tuple_format* that contains JSON-path lexemes and has allocated tuple *offset_slots* as leaf nodes.
+```
+ [[ Img 1.1 - JSON-path tree in tuple format ]]
+
+ ----\
+     |-----> [name] \
+     |              |----> [first]  <slot1 = -3>
+     |              |
+     |              |----> [last]   <slot2 = -2>
+     |
+     \-----> [town]  <slot3 = -1>
+```
+
+For example, when the index part "name.first" has already been processed, we would already have the "name" node and would be able to start from it, parsing only the "last" lexeme.
+```
+s:create_index('name', { parts = {{ "name.first", "str" },
+                                  { "name.last", "str" }}})
+```
+The resulting information offset would be stored in the slot obtained from the tree leaf node.
+
+```
+ [[ Img 1.2 - Tuple data stored in database of same format ]]
+
+                                                       | Description   |
+
+    {-3}    {-2}    {-1}                               | slot index    |
+ +=======+=======+=======+======+===================+
+ | slot1 | slot2 | slot3 | town | name              |  | tuple schema  |
+ +=======+=======+=======+======+=====+======+======+
+ |   2   |   4   |   0   |  NY  | Ayn | Rand |         | tuple1 data   |
+ +-------+-------+-------+------+-----+---+--+------+
+ |   2   |   5   |   0   |  NY  | Richard | Feynman |  | tuple2 data   |
+ +-------+-------+-------+------+---------+---------+
+     ^       ^       ^
+     0       2       4                                 | tuple1 offset |
+
+     ^       ^       ^
+     0       2       5                                 | tuple2 offset |
+```
+
+### 1.2. Access data by JSON-path
+The central API to access information is the routine
+```
+static inline const char *
+tuple_field_raw(const struct tuple_format *format, const char *tuple,
+                const uint32_t *field_map, uint32_t field_no)
+```
+
+In most scenarios related to indexes it is called with *field_no* extracted from *key_part* - the descriptor of a part representing the index.
+```
+field_b = tuple_field_raw(format_b, tuple_b_raw, field_map_b, part->fieldno);
+```
+With the introduction of parts containing a JSON path, we need to rework such calls.
+
+To avoid parsing the JSON path each time, we propose to introduce a cache for the *offset_slot* in the *key_part* structure:
+```
+struct key_part {
+	/** Tuple field index for this part */
+	uint32_t fieldno;
+	...
+
+	/** Data JSON path. */
+	const char *data_path;
+	/** Epoch of tuple_format cached in offset_slot. */
+	uint32_t slot_epoch;
+	/** Cache of corresponding tuple_format offset_slot. */
+	int32_t slot_cache;
+};
+```
+And extend *tuple_format*, where such an epoch would be initialized with a new, bigger value on creation (when the source key_parts have changed).
+```
+struct tuple_format {
+	...
+
+	/** Epoch of tuple format. */
+	uint32_t epoch;
+	/** Formats of the fields. */
+	struct tuple_field fields[0];
+};
+```
+We would modify tuple_field to have a tree structure if necessary and represent all intermediate records.
+
+All index-related scenarios that use *tuple_field_raw* would be reworked to use a new function:
+```
+const char *
+tuple_field_by_part(const struct tuple *tuple, const struct key_def *def,
+                    uint32_t idx);
+```
+
+From the first argument, *tuple*, we can obtain the tuple_format that contains the *epoch* field.
+```
+ [[ Img 2.1 - Block diagram for tuple_field_by_part routine ]]
+
+        [[def->epoch == tuple_format(tuple)->epoch]]
+                          ||
+                    YES   ||   NO
+         ________________/\________________
+        |                                  |
+
+ use offset_slot from           PARSE def->parts[idx]->data_path and
+ def->parts[idx].cached_slot    observe tuple_format(tuple)->fields structure,
+                                UPDATE def->parts[idx].slot_epoch and
+                                def->parts[idx].slot_cache IF format epoch is bigger
+```
+
+PARSE means parsing the JSON path lexeme-by-lexeme and stepping down the fields tree until the leaf with the slot offset is reached.
+Then we need to UPDATE the key_part cache IF the epoch has increased. This *may* make epoch oscillations less significant.
+
+To speed up data access, we propose to use a global format hashtable with all paths:
+```
+ HASHTABLE<const char *data_path, int32_t offset_slot>
+
+                     lookup
+ "[2]name.first" ----> hash1 ------> [hash1, "name.first", offset_slot1]
+ "[2]name.last"  ----> hash2 ------> [hash2, "name.last", offset_slot2]
+```
+
+## Rationale and alternatives
+### 2.1 Reallocatable slot_cache
+Epoch-based invalidation looks rather complicated.
+It would be possible to allocate a slot_cache[] array in each key_part and store slots for all epochs (where the epoch would be an index into that array).
+```
+                 format->epoch
+key_part {            ||
+   slot_epoch[]:      \/
+      [1: s1, 2: s2, ... epoch: s3, ... max_epoch: sN]
+}
+
+I.e.:
+  tuple_map[key_part->slot_epoch[format->epoch]]
+```
+But as formats are created really often, this approach is really resource-consuming.
-- 
2.7.4

^ permalink raw reply	[flat|nested] 14+ messages in thread
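[Editor's note] The JSON-path tree from section 1.1 of the RFC above can be sketched in C roughly as follows. This is a toy model: the names, the fixed-capacity child array, and the linear child search are invented for illustration and are not the real tuple_format machinery. Each node holds one path lexeme; leaves carry the offset_slot allocated in the format's field map (0 here marks "no slot").

```c
#include <stdint.h>
#include <string.h>

#define MAX_CHILDREN 8

/* One lexeme of a JSON path ("name", "first", ...). */
struct json_path_node {
	const char *lexeme;
	int32_t offset_slot;	/* meaningful at leaves, 0 otherwise */
	struct json_path_node *children[MAX_CHILDREN];
	int child_count;
};

/*
 * Walk the tree lexeme-by-lexeme, as the PARSE branch of the block
 * diagram does, and return the leaf's offset_slot, or 0 if the path
 * is not covered by any index part.
 */
static int32_t
json_tree_lookup(const struct json_path_node *node,
		 const char **lexemes, int count)
{
	for (int i = 0; i < count; i++) {
		const struct json_path_node *next = NULL;
		for (int c = 0; c < node->child_count; c++) {
			if (strcmp(node->children[c]->lexeme,
				   lexemes[i]) == 0) {
				next = node->children[c];
				break;
			}
		}
		if (next == NULL)
			return 0;
		node = next;
	}
	return node->offset_slot;
}
```

With the tree of Img 1.1 ("name.first" -> slot -3, "name.last" -> slot -2, "town" -> slot -1), paths sharing the "name" prefix reuse the same intermediate node, which is exactly why coinciding suffixes need not be re-parsed.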
* [tarantool-patches] [PATCH v3 2/4] box: introduce slot_cache in key_part 2018-08-27 7:37 [tarantool-patches] [PATCH v3 0/4] box: indexes by JSON path Kirill Shcherbatov 2018-08-27 7:37 ` [tarantool-patches] [PATCH v3 1/4] rfc: describe a Tarantool JSON indexes Kirill Shcherbatov @ 2018-08-27 7:37 ` Kirill Shcherbatov 2018-09-03 10:35 ` [tarantool-patches] " Vladislav Shpilevoy 2018-08-27 7:37 ` [tarantool-patches] [PATCH v3 3/4] box: introduce JSON indexes Kirill Shcherbatov ` (2 subsequent siblings) 4 siblings, 1 reply; 14+ messages in thread From: Kirill Shcherbatov @ 2018-08-27 7:37 UTC (permalink / raw) To: tarantool-patches; +Cc: v.shpilevoy, Kirill Shcherbatov

The same key_part could be used in multiple formats, so a different field->offset_slot would be allocated in each of them. In most scenarios we work with series of tuples of the same format, and (in general) a format lookup for a field would be an expensive operation for JSON paths defined in a key_part. The new slot_cache field in the key_part structure, together with an epoch-based mechanism to validate its actuality, should be an effective approach to improve performance.

The new routine tuple_field_by_part uses a tuple and a key_part to access a field, which allows reworking and speeding up all scenarios of accessing tuple data by index. This also allows working with JSON-path key_parts later.

Part of #1012.
--- src/box/alter.cc | 38 ++++++++++++++++++ src/box/key_def.c | 2 + src/box/key_def.h | 11 ++++++ src/box/memtx_bitset.c | 8 +++- src/box/memtx_rtree.c | 6 ++- src/box/tuple_compare.cc | 94 +++++++++++++++++++++++++++++++++----------- src/box/tuple_extract_key.cc | 44 ++++++++++++++------- src/box/tuple_format.c | 12 ++++++ src/box/tuple_format.h | 18 +++++++++ src/box/tuple_hash.cc | 53 ++++++++++++++++++++----- src/box/vy_stmt.h | 7 +++- 11 files changed, 243 insertions(+), 50 deletions(-) diff --git a/src/box/alter.cc b/src/box/alter.cc index a6299a1..a46b886 100644 --- a/src/box/alter.cc +++ b/src/box/alter.cc @@ -1032,6 +1032,42 @@ ModifySpace::~ModifySpace() space_def_delete(new_def); } +class ModifySpaceFormat: public AlterSpaceOp +{ +public: + ModifySpaceFormat(struct alter_space *alter) : AlterSpaceOp(alter) {} + virtual void alter(struct alter_space *alter); +}; + +void +ModifySpaceFormat:: alter(struct alter_space * alter) +{ + struct tuple_format *format = alter->new_space != NULL ? + alter->new_space->format : NULL; + if (format == NULL) + return; + struct rlist *key_list = &alter->key_list; + bool is_format_epoch_changed = false; + struct index_def *index_def; + rlist_foreach_entry(index_def, key_list, link) { + struct key_part *part = index_def->key_def->parts; + struct key_part *parts_end = + part + index_def->key_def->part_count; + for (; part < parts_end; part++) { + struct tuple_field *field = + &format->fields[part->fieldno]; + if (field->offset_slot != part->offset_slot) + is_format_epoch_changed = true; + } + } + format->epoch = alter->old_space != NULL && + alter->old_space->format != NULL ? + alter->old_space->format->epoch : 0; + if (is_format_epoch_changed) + format->epoch++; +} + + /** DropIndex - remove an index from space. */ class DropIndex: public AlterSpaceOp @@ -1316,6 +1352,7 @@ RebuildIndex::prepare(struct alter_space *alter) /* Get the new index and build it. 
*/ new_index = space_index(alter->new_space, new_index_def->iid); assert(new_index != NULL); + assert(alter->new_space != NULL && alter->old_space != NULL); space_build_index_xc(alter->old_space, new_index, alter->new_space->format); } @@ -1922,6 +1959,7 @@ on_replace_dd_index(struct trigger * /* trigger */, void *event) index_def_guard.is_active = false; } } + (void) new ModifySpaceFormat(alter); /* * Create MoveIndex ops for the remaining indexes in the * old space. diff --git a/src/box/key_def.c b/src/box/key_def.c index ee09dc9..440d41e 100644 --- a/src/box/key_def.c +++ b/src/box/key_def.c @@ -258,6 +258,8 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno, def->parts[part_no].type = type; def->parts[part_no].coll = coll; def->parts[part_no].coll_id = coll_id; + def->parts[part_no].offset_slot = TUPLE_OFFSET_SLOT_NIL; + def->parts[part_no].offset_slot_epoch = 0; column_mask_set_fieldno(&def->column_mask, fieldno); /** * When all parts are set, initialize the tuple diff --git a/src/box/key_def.h b/src/box/key_def.h index aecbe03..a32c34c 100644 --- a/src/box/key_def.h +++ b/src/box/key_def.h @@ -74,6 +74,17 @@ struct key_part { struct coll *coll; /** True if a part can store NULLs. */ bool is_nullable; + /** + * Epoch of offset slot cache. Initialized with + * incremental epoch of format on caching it's field's + * offset_slot via tuple_field_by_part_raw to speed up + * access on subsequent calls with same format. + * Cache is expected to use "the eldest format is most + * relevant" strategy. + */ + uint64_t offset_slot_epoch; + /** Cache with format's field offset slot. 
*/ + int32_t offset_slot; }; struct key_def; diff --git a/src/box/memtx_bitset.c b/src/box/memtx_bitset.c index a665f1a..9529618 100644 --- a/src/box/memtx_bitset.c +++ b/src/box/memtx_bitset.c @@ -283,8 +283,12 @@ memtx_bitset_index_replace(struct index *base, struct tuple *old_tuple, } if (new_tuple != NULL) { - const char *field; - field = tuple_field(new_tuple, base->def->key_def->parts[0].fieldno); + const char *field = + tuple_field_by_part_raw(tuple_format(new_tuple), + tuple_data(new_tuple), + tuple_field_map(new_tuple), + (struct key_part *) + base->def->key_def->parts); uint32_t key_len; const void *key = make_key(field, &key_len); #ifndef OLD_GOOD_BITSET diff --git a/src/box/memtx_rtree.c b/src/box/memtx_rtree.c index 0b12cda..00aaf79 100644 --- a/src/box/memtx_rtree.c +++ b/src/box/memtx_rtree.c @@ -112,7 +112,11 @@ extract_rectangle(struct rtree_rect *rect, const struct tuple *tuple, struct index_def *index_def) { assert(index_def->key_def->part_count == 1); - const char *elems = tuple_field(tuple, index_def->key_def->parts[0].fieldno); + const char *elems = + tuple_field_by_part_raw(tuple_format(tuple), tuple_data(tuple), + tuple_field_map(tuple), + (struct key_part *) + index_def->key_def->parts); unsigned dimension = index_def->opts.dimension; uint32_t count = mp_decode_array(&elems); return mp_decode_rect(rect, dimension, elems, count, "Field"); diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc index e53afba..5d7df4d 100644 --- a/src/box/tuple_compare.cc +++ b/src/box/tuple_compare.cc @@ -431,10 +431,20 @@ tuple_common_key_parts(const struct tuple *tuple_a, const struct key_def *key_def) { uint32_t i; + struct tuple_format *tuple_a_format = tuple_format(tuple_a); + struct tuple_format *tuple_b_format = tuple_format(tuple_b); + const char *tuple_a_raw = tuple_data(tuple_a); + const char *tuple_b_raw = tuple_data(tuple_b); + const uint32_t *tuple_a_field_map = tuple_field_map(tuple_a); + const uint32_t *tuple_b_field_map = 
tuple_field_map(tuple_b); for (i = 0; i < key_def->part_count; i++) { - const struct key_part *part = &key_def->parts[i]; - const char *field_a = tuple_field(tuple_a, part->fieldno); - const char *field_b = tuple_field(tuple_b, part->fieldno); + struct key_part *part = (struct key_part *)&key_def->parts[i]; + const char *field_a = + tuple_field_by_part_raw(tuple_a_format, tuple_a_raw, + tuple_a_field_map, part); + const char *field_b = + tuple_field_by_part_raw(tuple_b_format, tuple_b_raw, + tuple_b_field_map, part); enum mp_type a_type = field_a != NULL ? mp_typeof(*field_a) : MP_NIL; enum mp_type b_type = field_b != NULL ? @@ -449,7 +459,7 @@ tuple_common_key_parts(const struct tuple *tuple_a, return i; } -template<bool is_nullable, bool has_optional_parts> +template<bool is_nullable, bool has_optional_parts, bool has_json_path> static inline int tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, const struct key_def *key_def) @@ -498,10 +508,23 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, end = part + key_def->part_count; for (; part < end; part++) { - field_a = tuple_field_raw(format_a, tuple_a_raw, field_map_a, - part->fieldno); - field_b = tuple_field_raw(format_b, tuple_b_raw, field_map_b, - part->fieldno); + if (!has_json_path) { + field_a = tuple_field_raw(format_a, tuple_a_raw, + field_map_a, + part->fieldno); + field_b = tuple_field_raw(format_b, tuple_b_raw, + field_map_b, + part->fieldno); + } else { + field_a = tuple_field_by_part_raw(format_a, tuple_a_raw, + field_map_a, + (struct key_part *) + part); + field_b = tuple_field_by_part_raw(format_b, tuple_b_raw, + field_map_b, + (struct key_part *) + part); + } assert(has_optional_parts || (field_a != NULL && field_b != NULL)); if (! 
is_nullable) { @@ -548,10 +571,23 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, */ end = key_def->parts + key_def->part_count; for (; part < end; ++part) { - field_a = tuple_field_raw(format_a, tuple_a_raw, field_map_a, - part->fieldno); - field_b = tuple_field_raw(format_b, tuple_b_raw, field_map_b, - part->fieldno); + if (!has_json_path) { + field_a = tuple_field_raw(format_a, tuple_a_raw, + field_map_a, + part->fieldno); + field_b = tuple_field_raw(format_b, tuple_b_raw, + field_map_b, + part->fieldno); + } else { + field_a = tuple_field_by_part_raw(format_a, tuple_a_raw, + field_map_a, + (struct key_part *) + part); + field_b = tuple_field_by_part_raw(format_b, tuple_b_raw, + field_map_b, + (struct key_part *) + part); + } /* * Extended parts are primary, and they can not * be absent or be NULLs. @@ -565,7 +601,7 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, return 0; } -template<bool is_nullable, bool has_optional_parts> +template<bool is_nullable, bool has_optional_parts, bool has_json_paths> static inline int tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, uint32_t part_count, @@ -583,8 +619,14 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, enum mp_type a_type, b_type; if (likely(part_count == 1)) { const char *field; - field = tuple_field_raw(format, tuple_raw, field_map, - part->fieldno); + if (!has_json_paths) { + field = tuple_field_raw(format, tuple_raw, field_map, + part->fieldno); + } else { + field = tuple_field_by_part_raw(format, tuple_raw, + field_map, + (struct key_part *)part); + } if (! 
is_nullable) { return tuple_compare_field(field, key, part->type, part->coll); @@ -609,8 +651,14 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, int rc; for (; part < end; ++part, mp_next(&key)) { const char *field; - field = tuple_field_raw(format, tuple_raw, field_map, - part->fieldno); + if (!has_json_paths) { + field = tuple_field_raw(format, tuple_raw, field_map, + part->fieldno); + } else { + field = tuple_field_by_part_raw(format, tuple_raw, + field_map, + (struct key_part *)part); + } if (! is_nullable) { rc = tuple_compare_field(field, key, part->type, part->coll); @@ -1016,9 +1064,9 @@ tuple_compare_create(const struct key_def *def) else return tuple_compare_sequential<true, false>; } else if (def->has_optional_parts) { - return tuple_compare_slowpath<true, true>; + return tuple_compare_slowpath<true, true, false>; } else { - return tuple_compare_slowpath<true, false>; + return tuple_compare_slowpath<true, false, false>; } } assert(! def->has_optional_parts); @@ -1041,7 +1089,7 @@ tuple_compare_create(const struct key_def *def) if (key_def_is_sequential(def)) return tuple_compare_sequential<false, false>; else - return tuple_compare_slowpath<false, false>; + return tuple_compare_slowpath<false, false, false>; } /* }}} tuple_compare */ @@ -1236,9 +1284,9 @@ tuple_compare_with_key_create(const struct key_def *def) false>; } } else if (def->has_optional_parts) { - return tuple_compare_with_key_slowpath<true, true>; + return tuple_compare_with_key_slowpath<true, true, false>; } else { - return tuple_compare_with_key_slowpath<true, false>; + return tuple_compare_with_key_slowpath<true, false, false>; } } assert(! 
def->has_optional_parts); @@ -1264,7 +1312,7 @@ tuple_compare_with_key_create(const struct key_def *def) if (key_def_is_sequential(def)) return tuple_compare_with_key_sequential<false, false>; else - return tuple_compare_with_key_slowpath<false, false>; + return tuple_compare_with_key_slowpath<false, false, false>; } /* }}} tuple_compare_with_key */ diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc index 880abb6..2ea405a 100644 --- a/src/box/tuple_extract_key.cc +++ b/src/box/tuple_extract_key.cc @@ -91,7 +91,8 @@ tuple_extract_key_sequential(const struct tuple *tuple, * General-purpose implementation of tuple_extract_key() * @copydoc tuple_extract_key() */ -template <bool contains_sequential_parts, bool has_optional_parts> +template <bool contains_sequential_parts, bool has_optional_parts, + bool has_json_paths> static char * tuple_extract_key_slowpath(const struct tuple *tuple, const struct key_def *key_def, uint32_t *key_size) @@ -110,9 +111,15 @@ tuple_extract_key_slowpath(const struct tuple *tuple, /* Calculate the key size. 
*/ for (uint32_t i = 0; i < part_count; ++i) { - const char *field = - tuple_field_raw(format, data, field_map, - key_def->parts[i].fieldno); + const char *field; + if (!has_json_paths) { + field = tuple_field_raw(format, data, field_map, + key_def->parts[i].fieldno); + } else { + field = tuple_field_by_part_raw(format, data, field_map, + (struct key_part *) + &key_def->parts[i]); + } if (has_optional_parts && field == NULL) { bsize += mp_sizeof_nil(); continue; @@ -152,9 +159,15 @@ tuple_extract_key_slowpath(const struct tuple *tuple, } char *key_buf = mp_encode_array(key, part_count); for (uint32_t i = 0; i < part_count; ++i) { - const char *field = - tuple_field_raw(format, data, field_map, - key_def->parts[i].fieldno); + const char *field; + if (!has_json_paths) { + field = tuple_field_raw(format, data, field_map, + key_def->parts[i].fieldno); + } else { + field = tuple_field_by_part_raw(format, data, field_map, + (struct key_part *) + &key_def->parts[i]); + } if (has_optional_parts && field == NULL) { key_buf = mp_encode_nil(key_buf); continue; @@ -201,7 +214,7 @@ tuple_extract_key_slowpath(const struct tuple *tuple, * General-purpose version of tuple_extract_key_raw() * @copydoc tuple_extract_key_raw() */ -template <bool has_optional_parts> +template <bool has_optional_parts, bool has_json_paths> static char * tuple_extract_key_slowpath_raw(const char *data, const char *data_end, const struct key_def *key_def, @@ -318,18 +331,21 @@ tuple_extract_key_set(struct key_def *key_def) assert(key_def->is_nullable); if (key_def_contains_sequential_parts(key_def)) { key_def->tuple_extract_key = - tuple_extract_key_slowpath<true, true>; + tuple_extract_key_slowpath<true, true, + false>; } else { key_def->tuple_extract_key = - tuple_extract_key_slowpath<false, true>; + tuple_extract_key_slowpath<false, true, + false>; } } else { if (key_def_contains_sequential_parts(key_def)) { key_def->tuple_extract_key = - tuple_extract_key_slowpath<true, false>; + 
tuple_extract_key_slowpath<true, false, + false>; } else { key_def->tuple_extract_key = - tuple_extract_key_slowpath<false, + tuple_extract_key_slowpath<false, false, false>; } } @@ -337,9 +353,9 @@ tuple_extract_key_set(struct key_def *key_def) if (key_def->has_optional_parts) { assert(key_def->is_nullable); key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<true>; + tuple_extract_key_slowpath_raw<true, false>; } else { key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<false>; + tuple_extract_key_slowpath_raw<false, false>; } } diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index b385c0d..2d4a85f 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -232,6 +232,11 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, format->dict = dict; tuple_dictionary_ref(dict); } + /* + * Set invalid epoch that should be changed later on + * attaching to space. + */ + format->epoch = 1; format->refs = 0; format->id = FORMAT_ID_NIL; format->field_count = field_count; @@ -541,6 +546,13 @@ tuple_field_go_to_key(const char **field, const char *key, int len) return -1; } +const char * +tuple_field_by_part_raw(const struct tuple_format *format, const char *data, + const uint32_t *field_map, struct key_part *part) +{ + return tuple_field_raw(format, data, field_map, part->fieldno); +} + int tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, const uint32_t *field_map, const char *path, diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index c7dc48f..ecbf64c 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -115,6 +115,12 @@ struct tuple_field { * Tuple format describes how tuple is stored and information about its fields */ struct tuple_format { + /** + * Counter that grows incrementally on space rebuild if + * format has other distribution of offset slots comparing + * with previous one. 
+ */ + uint64_t epoch; /** Virtual function table */ struct tuple_format_vtab vtab; /** Pointer to engine-specific data. */ @@ -324,6 +330,18 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, const char *tuple); /** + * Get a field refereed by multipart index @part in tuple. + * @param format Tuple format. + * @param tuple A pointer to MessagePack array. + * @param field_map A pointer to the LAST element of field map. + * @param part Multipart index part to use. + * @retval Field data if field exists or NULL. + */ +const char * +tuple_field_by_part_raw(const struct tuple_format *format, const char *data, + const uint32_t *field_map, struct key_part *part); + +/** * Get a field at the specific position in this MessagePack array. * Returns a pointer to MessagePack data. * @param format tuple format diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc index dee9be3..561ca55 100644 --- a/src/box/tuple_hash.cc +++ b/src/box/tuple_hash.cc @@ -157,7 +157,12 @@ struct TupleHash uint32_t h = HASH_SEED; uint32_t carry = 0; uint32_t total_size = 0; - const char *field = tuple_field(tuple, key_def->parts->fieldno); + const char *field = + tuple_field_by_part_raw(tuple_format(tuple), + tuple_data(tuple), + tuple_field_map(tuple), + (struct key_part *) + key_def->parts); TupleFieldHash<TYPE, MORE_TYPES...>:: hash(&field, &h, &carry, &total_size); return PMurHash32_Result(h, carry, total_size); @@ -169,7 +174,12 @@ struct TupleHash<FIELD_TYPE_UNSIGNED> { static uint32_t hash(const struct tuple *tuple, const struct key_def *key_def) { - const char *field = tuple_field(tuple, key_def->parts->fieldno); + const char *field = + tuple_field_by_part_raw(tuple_format(tuple), + tuple_data(tuple), + tuple_field_map(tuple), + (struct key_part *) + key_def->parts); uint64_t val = mp_decode_uint(&field); if (likely(val <= UINT32_MAX)) return val; @@ -211,7 +221,7 @@ static const hasher_signature hash_arr[] = { #undef HASHER -template <bool 
has_optional_parts> +template <bool has_optional_parts, bool has_json_paths> uint32_t tuple_hash_slowpath(const struct tuple *tuple, const struct key_def *key_def); @@ -255,9 +265,9 @@ tuple_hash_func_set(struct key_def *key_def) { slowpath: if (key_def->has_optional_parts) - key_def->tuple_hash = tuple_hash_slowpath<true>; + key_def->tuple_hash = tuple_hash_slowpath<true, false>; else - key_def->tuple_hash = tuple_hash_slowpath<false>; + key_def->tuple_hash = tuple_hash_slowpath<false, false>; key_def->key_hash = key_hash_slowpath; } @@ -312,13 +322,16 @@ tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry, const struct tuple *tuple, const struct key_part *part) { - const char *field = tuple_field(tuple, part->fieldno); + const char *field = + tuple_field_by_part_raw(tuple_format(tuple), tuple_data(tuple), + tuple_field_map(tuple), + (struct key_part *)part); if (field == NULL) return tuple_hash_null(ph1, pcarry); return tuple_hash_field(ph1, pcarry, &field, part->coll); } -template <bool has_optional_parts> +template <bool has_optional_parts, bool has_json_paths> uint32_t tuple_hash_slowpath(const struct tuple *tuple, const struct key_def *key_def) { @@ -327,7 +340,17 @@ tuple_hash_slowpath(const struct tuple *tuple, const struct key_def *key_def) uint32_t carry = 0; uint32_t total_size = 0; uint32_t prev_fieldno = key_def->parts[0].fieldno; - const char *field = tuple_field(tuple, key_def->parts[0].fieldno); + struct tuple_format *format = tuple_format(tuple); + const char *tuple_raw = tuple_data(tuple); + const uint32_t *field_map = tuple_field_map(tuple); + const char *field; + if (!has_json_paths) { + field = tuple_field(tuple, prev_fieldno); + } else { + field = tuple_field_by_part_raw(format, tuple_raw, field_map, + (struct key_part *) + &key_def->parts); + } const char *end = (char *)tuple + tuple_size(tuple); if (has_optional_parts && field == NULL) { total_size += tuple_hash_null(&h, &carry); @@ -341,7 +364,19 @@ tuple_hash_slowpath(const struct tuple 
*tuple, const struct key_def *key_def) * need of tuple_field */ if (prev_fieldno + 1 != key_def->parts[part_id].fieldno) { - field = tuple_field(tuple, key_def->parts[part_id].fieldno); + if (!has_json_paths) { + field = tuple_field(tuple, + key_def->parts[part_id]. + fieldno); + } else { + struct key_part *part = + (struct key_part *) + &key_def->parts[part_id]; + field = tuple_field_by_part_raw(format, + tuple_raw, + field_map, + part); + } } if (has_optional_parts && (field == NULL || field >= end)) { total_size += tuple_hash_null(&h, &carry); diff --git a/src/box/vy_stmt.h b/src/box/vy_stmt.h index 273d5e8..c4885c0 100644 --- a/src/box/vy_stmt.h +++ b/src/box/vy_stmt.h @@ -719,7 +719,12 @@ static inline bool vy_tuple_key_contains_null(const struct tuple *tuple, const struct key_def *def) { for (uint32_t i = 0; i < def->part_count; ++i) { - const char *field = tuple_field(tuple, def->parts[i].fieldno); + const char *field = + tuple_field_by_part_raw(tuple_format(tuple), + tuple_data(tuple), + tuple_field_map(tuple), + (struct key_part *) + &def->parts[i]); if (field == NULL || mp_typeof(*field) == MP_NIL) return true; } -- 2.7.4 ^ permalink raw reply [flat|nested] 14+ messages in thread
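The hunks above thread a second template flag, has_json_paths, through tuple_hash_slowpath, and tuple_hash_func_set() installs a fully specialized instantiation once per key_def, so neither flag is branched on per tuple. A minimal standalone sketch of this dispatch pattern (the types and hash arithmetic are illustrative stand-ins, not Tarantool's actual API):

```cpp
#include <cassert>
#include <cstdint>

/* Illustrative stand-in for struct key_def: two feature flags plus a
 * function-table slot for the chosen hash routine. */
struct key_def {
	bool has_optional_parts;
	bool has_json_paths;
	uint32_t (*tuple_hash)(const key_def *def, uint32_t seed);
};

/* Branches on the template parameters are resolved at compile time,
 * so each instantiation is branch-free at runtime. */
template <bool has_optional_parts, bool has_json_paths>
static uint32_t
tuple_hash_slowpath(const key_def *, uint32_t seed)
{
	uint32_t h = seed;
	if (has_optional_parts)	/* compile-time constant */
		h += 1;
	if (has_json_paths)	/* compile-time constant */
		h += 2;
	return h;
}

/* Analog of tuple_hash_func_set(): pick the specialization once. */
static void
tuple_hash_func_set(key_def *def)
{
	if (def->has_optional_parts)
		def->tuple_hash = def->has_json_paths ?
			tuple_hash_slowpath<true, true> :
			tuple_hash_slowpath<true, false>;
	else
		def->tuple_hash = def->has_json_paths ?
			tuple_hash_slowpath<false, true> :
			tuple_hash_slowpath<false, false>;
}
```

Note that in the hunk above only the <*, false> instantiations are ever installed; the <*, true> variants only become reachable in the later JSON-index patch of the series.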
* [tarantool-patches] Re: [PATCH v3 2/4] box: introduce slot_cache in key_part 2018-08-27 7:37 ` [tarantool-patches] [PATCH v3 2/4] box: introduce slot_cache in key_part Kirill Shcherbatov @ 2018-09-03 10:35 ` Vladislav Shpilevoy 2018-09-06 12:47 ` Kirill Shcherbatov 0 siblings, 1 reply; 14+ messages in thread From: Vladislav Shpilevoy @ 2018-09-03 10:35 UTC (permalink / raw) To: tarantool-patches, Kirill Shcherbatov Hi! Thanks for the fixes! See 11 comments below, a commit on the branch, and a diff at the bottom of this email. > diff --git a/src/box/alter.cc b/src/box/alter.cc > index a6299a1..a46b886 100644 > --- a/src/box/alter.cc > +++ b/src/box/alter.cc > @@ -1032,6 +1032,42 @@ ModifySpace::~ModifySpace() > space_def_delete(new_def); > } > > +class ModifySpaceFormat: public AlterSpaceOp > +{ > +public: > + ModifySpaceFormat(struct alter_space *alter) : AlterSpaceOp(alter) {} > + virtual void alter(struct alter_space *alter); > +}; 1. Comments. > + > +void > +ModifySpaceFormat:: alter(struct alter_space * alter) 2. Redundant white spaces. (fixed by me) > +{ > + struct tuple_format *format = alter->new_space != NULL ? > + alter->new_space->format : NULL; 3. How can new_space be NULL? alter() is called only after new_space is created. (fixed by me) > + if (format == NULL) > + return; > + struct rlist *key_list = &alter->key_list; > + bool is_format_epoch_changed = false; > + struct index_def *index_def; > + rlist_foreach_entry(index_def, key_list, link) { > + struct key_part *part = index_def->key_def->parts; > + struct key_part *parts_end = > + part + index_def->key_def->part_count; > + for (; part < parts_end; part++) { > + struct tuple_field *field = > + &format->fields[part->fieldno]; > + if (field->offset_slot != part->offset_slot) > + is_format_epoch_changed = true; 4. It makes no sense to continue the loop past this point. (fixed by me) > + } > + } > + format->epoch = alter->old_space != NULL && > + alter->old_space->format != NULL ? 
> + alter->old_space->format->epoch : 0; 5. How old_space can be NULL here? It is stored in struct alter right in its constructor. What is more, how old_space->format can be NULL, if a new space format is not NULL? And you do not need is_format_epoch_changed in such a case. (fixed by me) > + if (is_format_epoch_changed) > + format->epoch++; > +} > + > + > /** DropIndex - remove an index from space. */ > > class DropIndex: public AlterSpaceOp > @@ -1316,6 +1352,7 @@ RebuildIndex::prepare(struct alter_space *alter) > /* Get the new index and build it. */ > new_index = space_index(alter->new_space, new_index_def->iid); > assert(new_index != NULL); > + assert(alter->new_space != NULL && alter->old_space != NULL); > space_build_index_xc(alter->old_space, new_index, 6. Garbage diff. (fixed by me) > alter->new_space->format); > } > diff --git a/src/box/key_def.h b/src/box/key_def.h > index aecbe03..a32c34c 100644 > --- a/src/box/key_def.h > +++ b/src/box/key_def.h > @@ -74,6 +74,17 @@ struct key_part { > struct coll *coll; > /** True if a part can store NULLs. */ > bool is_nullable; > + /** > + * Epoch of offset slot cache. Initialized with > + * incremental epoch of format on caching it's field's > + * offset_slot via tuple_field_by_part_raw to speed up > + * access on subsequent calls with same format. > + * Cache is expected to use "the eldest format is most 7. Why 'the eldest'? As I remember, we decided to prefer newer format offset slots. > + * relevant" strategy. > + */ > + uint64_t offset_slot_epoch; > + /** Cache with format's field offset slot. 
*/ > + int32_t offset_slot; > }; > > struct key_def; > diff --git a/src/box/memtx_bitset.c b/src/box/memtx_bitset.c > index a665f1a..9529618 100644 > --- a/src/box/memtx_bitset.c > +++ b/src/box/memtx_bitset.c > @@ -283,8 +283,12 @@ memtx_bitset_index_replace(struct index *base, struct tuple *old_tuple, > } > > if (new_tuple != NULL) { > - const char *field; > - field = tuple_field(new_tuple, base->def->key_def->parts[0].fieldno); > + const char *field = > + tuple_field_by_part_raw(tuple_format(new_tuple), > + tuple_data(new_tuple), > + tuple_field_map(new_tuple), > + (struct key_part *) > + base->def->key_def->parts); 8. It looks like you ignored my comment about why you cast struct key_part * to itself. Let me fix it for you where possible. Where it is not possible, please remove the 'const struct key_def' qualifier from the function declaration and do not cast at all. Also, as you can see, when a tuple_field function is introduced, it has two versions: raw and non-raw. The non-raw functions exist exactly for such hunks, where you do not need to get multiple fields. (partially fixed by me) > uint32_t key_len; > const void *key = make_key(field, &key_len); > #ifndef OLD_GOOD_BITSET > diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc > index e53afba..5d7df4d 100644 > --- a/src/box/tuple_compare.cc > +++ b/src/box/tuple_compare.cc > @@ -449,7 +459,7 @@ tuple_common_key_parts(const struct tuple *tuple_a, > return i; > } > > -template<bool is_nullable, bool has_optional_parts> > +template<bool is_nullable, bool has_optional_parts, bool has_json_path> 9. You said that has_json_path would be substituted with is_flat, but it looks like you did not do that. Please address all the remarks of the previous review more accurately. 
> static inline int > tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, > const struct key_def *key_def) > diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c > index b385c0d..2d4a85f 100644 > --- a/src/box/tuple_format.c > +++ b/src/box/tuple_format.c > @@ -232,6 +232,11 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, > format->dict = dict; > tuple_dictionary_ref(dict); > } > + /* > + * Set invalid epoch that should be changed later on > + * attaching to space. > + */ > + format->epoch = 1; 10. Why is this epoch invalid? As I understand, a format can live with it normally but comparators will be slow. > format->refs = 0; > format->id = FORMAT_ID_NIL; > format->field_count = field_count; > diff --git a/src/box/vy_stmt.h b/src/box/vy_stmt.h > index 273d5e8..c4885c0 100644 > --- a/src/box/vy_stmt.h > +++ b/src/box/vy_stmt.h > @@ -719,7 +719,12 @@ static inline bool > vy_tuple_key_contains_null(const struct tuple *tuple, const struct key_def *def) > { > for (uint32_t i = 0; i < def->part_count; ++i) { > - const char *field = tuple_field(tuple, def->parts[i].fieldno); > + const char *field = > + tuple_field_by_part_raw(tuple_format(tuple), > + tuple_data(tuple), > + tuple_field_map(tuple), > + (struct key_part *) > + &def->parts[i]); 11. The same as in the previous review: do not call tuple_format() and other tuple functions multiple times when possible. 
(fixed by me) > if (field == NULL || mp_typeof(*field) == MP_NIL) > return true; > } > Diff of my fixes here and on the branch: commit 2bb228dafc7b8b1f69e1ed077c01861ea50e6ec9 Author: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Date: Thu Aug 30 16:32:28 2018 -0300 Review fixes diff --git a/src/box/alter.cc b/src/box/alter.cc index 0833635f6..3267b978a 100644 --- a/src/box/alter.cc +++ b/src/box/alter.cc @@ -1036,18 +1036,19 @@ class ModifySpaceFormat: public AlterSpaceOp { public: ModifySpaceFormat(struct alter_space *alter) : AlterSpaceOp(alter) {} - virtual void alter(struct alter_space *alter); + + virtual void + alter(struct alter_space *alter); }; void -ModifySpaceFormat:: alter(struct alter_space * alter) +ModifySpaceFormat::alter(struct alter_space *alter) { - struct tuple_format *format = alter->new_space != NULL ? - alter->new_space->format : NULL; + struct tuple_format *format = alter->new_space->format; if (format == NULL) return; + format->epoch = alter->old_space->format->epoch; struct rlist *key_list = &alter->key_list; - bool is_format_epoch_changed = false; struct index_def *index_def; rlist_foreach_entry(index_def, key_list, link) { struct key_part *part = index_def->key_def->parts; @@ -1056,15 +1057,12 @@ ModifySpaceFormat:: alter(struct alter_space * alter) for (; part < parts_end; part++) { struct tuple_field *field = &format->fields[part->fieldno]; - if (field->offset_slot != part->offset_slot) - is_format_epoch_changed = true; + if (field->offset_slot != part->offset_slot) { + ++format->epoch; + return; + } } } - format->epoch = alter->old_space != NULL && - alter->old_space->format != NULL ? - alter->old_space->format->epoch : 1; - if (is_format_epoch_changed) - format->epoch++; } @@ -1352,7 +1350,6 @@ RebuildIndex::prepare(struct alter_space *alter) /* Get the new index and build it. 
*/ new_index = space_index(alter->new_space, new_index_def->iid); assert(new_index != NULL); - assert(alter->new_space != NULL && alter->old_space != NULL); space_build_index_xc(alter->old_space, new_index, alter->new_space->format); } diff --git a/src/box/memtx_bitset.c b/src/box/memtx_bitset.c index 9529618d2..cd7362ee1 100644 --- a/src/box/memtx_bitset.c +++ b/src/box/memtx_bitset.c @@ -284,11 +284,8 @@ memtx_bitset_index_replace(struct index *base, struct tuple *old_tuple, if (new_tuple != NULL) { const char *field = - tuple_field_by_part_raw(tuple_format(new_tuple), - tuple_data(new_tuple), - tuple_field_map(new_tuple), - (struct key_part *) - base->def->key_def->parts); + tuple_field_by_part(new_tuple, + base->def->key_def->parts); uint32_t key_len; const void *key = make_key(field, &key_len); #ifndef OLD_GOOD_BITSET diff --git a/src/box/memtx_rtree.c b/src/box/memtx_rtree.c index 00aaf79d4..f2aa6c3e5 100644 --- a/src/box/memtx_rtree.c +++ b/src/box/memtx_rtree.c @@ -112,11 +112,8 @@ extract_rectangle(struct rtree_rect *rect, const struct tuple *tuple, struct index_def *index_def) { assert(index_def->key_def->part_count == 1); - const char *elems = - tuple_field_by_part_raw(tuple_format(tuple), tuple_data(tuple), - tuple_field_map(tuple), - (struct key_part *) - index_def->key_def->parts); + const char *elems = tuple_field_by_part(tuple, + index_def->key_def->parts); unsigned dimension = index_def->opts.dimension; uint32_t count = mp_decode_array(&elems); return mp_decode_rect(rect, dimension, elems, count, "Field"); diff --git a/src/box/tuple.h b/src/box/tuple.h index 2e84516de..b638f5086 100644 --- a/src/box/tuple.h +++ b/src/box/tuple.h @@ -43,6 +43,7 @@ extern "C" { struct slab_arena; struct quota; +struct key_part; /** * A format for standalone tuples allocated on runtime arena. @@ -521,6 +522,19 @@ tuple_field(const struct tuple *tuple, uint32_t fieldno) tuple_field_map(tuple), fieldno); } +/** + * Get a field refereed by index @part in tuple. 
+ * @param tuple Tuple to get the field from. + * @param part Index part to use. + * @retval Field data if the field exists or NULL. + */ +static inline const char * +tuple_field_by_part(const struct tuple *tuple, struct key_part *part) +{ + return tuple_field_by_part_raw(tuple_format(tuple), tuple_data(tuple), + tuple_field_map(tuple), part); +} + /** * Get tuple field by its JSON path. * @param tuple Tuple to get field from. diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index ecbf64c24..9406d5bbc 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -330,12 +330,12 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, const char *tuple); /** - * Get a field refereed by multipart index @part in tuple. + * Get a field refereed by index @part in tuple. * @param format Tuple format. * @param tuple A pointer to MessagePack array. * @param field_map A pointer to the LAST element of field map. - * @param part Multipart index part to use. - * @retval Field data if field exists or NULL. + * @param part Index part to use. + * @retval Field data if the field exists or NULL. 
*/ const char * tuple_field_by_part_raw(const struct tuple_format *format, const char *data, diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc index 561ca550a..c9a16ee31 100644 --- a/src/box/tuple_hash.cc +++ b/src/box/tuple_hash.cc @@ -158,11 +158,8 @@ struct TupleHash uint32_t carry = 0; uint32_t total_size = 0; const char *field = - tuple_field_by_part_raw(tuple_format(tuple), - tuple_data(tuple), - tuple_field_map(tuple), - (struct key_part *) - key_def->parts); + tuple_field_by_part(tuple, + (struct key_part *) key_def->parts); TupleFieldHash<TYPE, MORE_TYPES...>:: hash(&field, &h, &carry, &total_size); return PMurHash32_Result(h, carry, total_size); @@ -175,11 +172,8 @@ struct TupleHash<FIELD_TYPE_UNSIGNED> { const struct key_def *key_def) { const char *field = - tuple_field_by_part_raw(tuple_format(tuple), - tuple_data(tuple), - tuple_field_map(tuple), - (struct key_part *) - key_def->parts); + tuple_field_by_part(tuple, + (struct key_part *) key_def->parts); uint64_t val = mp_decode_uint(&field); if (likely(val <= UINT32_MAX)) return val; @@ -322,10 +316,8 @@ tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry, const struct tuple *tuple, const struct key_part *part) { - const char *field = - tuple_field_by_part_raw(tuple_format(tuple), tuple_data(tuple), - tuple_field_map(tuple), - (struct key_part *)part); + const char *field = tuple_field_by_part(tuple, + (struct key_part *) part); if (field == NULL) return tuple_hash_null(ph1, pcarry); return tuple_hash_field(ph1, pcarry, &field, part->coll); diff --git a/src/box/vy_stmt.h b/src/box/vy_stmt.h index c4885c09b..4f8cbbdd3 100644 --- a/src/box/vy_stmt.h +++ b/src/box/vy_stmt.h @@ -716,15 +716,15 @@ vy_tuple_format_new_with_colmask(struct tuple_format *mem_format); * @retval Does the key contain NULL or not? 
*/ static inline bool -vy_tuple_key_contains_null(const struct tuple *tuple, const struct key_def *def) +vy_tuple_key_contains_null(const struct tuple *tuple, struct key_def *def) { - for (uint32_t i = 0; i < def->part_count; ++i) { + struct tuple_format *format = tuple_format(tuple); + const char *data = tuple_data(tuple); + const uint32_t *field_map = tuple_field_map(tuple); + for (struct key_part *part = def->parts, *end = part + def->part_count; + part < end; ++part) { const char *field = - tuple_field_by_part_raw(tuple_format(tuple), - tuple_data(tuple), - tuple_field_map(tuple), - (struct key_part *) - &def->parts[i]); + tuple_field_by_part_raw(format, data, field_map, part); if (field == NULL || mp_typeof(*field) == MP_NIL) return true; } ^ permalink raw reply [flat|nested] 14+ messages in thread
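Comments 7 and 10 above revolve around the epoch contract: each key_part caches the offset_slot it last resolved together with the format epoch it was resolved against, and the cached slot may be reused only while those epochs match, with 0 reserved as a never-matching "invalid" epoch. A self-contained sketch of that validation logic (toy types; this is not the real tuple_field_by_part_raw implementation):

```cpp
#include <cassert>
#include <cstdint>

struct toy_field { int32_t offset_slot; };
struct toy_format {
	/* Epoch 0 is "invalid": it never matches a cached epoch. */
	uint64_t epoch;
	toy_field fields[4];
};

struct toy_key_part {
	uint32_t fieldno;
	/* Cache is valid only while offset_slot_epoch == format->epoch. */
	uint64_t offset_slot_epoch;
	int32_t offset_slot;
};

/* Resolve a part's offset slot, consulting the per-part cache. */
static int32_t
part_offset_slot(toy_key_part *part, const toy_format *format,
		 bool *cache_hit)
{
	if (format->epoch != 0 && part->offset_slot_epoch == format->epoch) {
		*cache_hit = true;	/* fast path: reuse cached slot */
		return part->offset_slot;
	}
	*cache_hit = false;	/* slow path: look up and refresh cache */
	int32_t slot = format->fields[part->fieldno].offset_slot;
	part->offset_slot = slot;
	part->offset_slot_epoch = format->epoch;
	return slot;
}
```

Bumping format->epoch, as ModifySpaceFormat::alter() does when a slot moves, invalidates every cached slot at once without touching the key_parts themselves.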
* Re: [tarantool-patches] Re: [PATCH v3 2/4] box: introduce slot_cache in key_part 2018-09-03 10:35 ` [tarantool-patches] " Vladislav Shpilevoy @ 2018-09-06 12:47 ` Kirill Shcherbatov 2018-09-17 17:08 ` Vladimir Davydov 0 siblings, 1 reply; 14+ messages in thread From: Kirill Shcherbatov @ 2018-09-06 12:47 UTC (permalink / raw) To: Vladislav Shpilevoy, Vladimir Davydov, Tarantool MailList > Hi! Thanks for the fixes! Hi. Thank you for review and fixes. > 1. Comments. /** ModifySpaceFormat - update format epoch in a new space. */ class ModifySpaceFormat: public AlterSpaceOp /* * Alter format epoch in a new space format to be greater than * previous one. */ void ModifySpaceFormat::alter(struct alter_space *alter) > 2. Redundant white spaces. > (fixed by me) > 3. How new_space can be NULL? alter() called only after new_space > is created. > (fixed by me) > 4. Makes no sense to continue the cycle from this > moment. > (fixed by me) > 5. How old_space can be NULL here? It is stored in struct alter > right in its constructor. What is more, how old_space->format > can be NULL, if a new space format is not NULL? And you do not > need is_format_epoch_changed in such a case. > (fixed by me) > 6. Garbage diff. > (fixed by me) > 7. Why 'the eldest'? As I remember, we decided to prefer newer > format offset slots. > 8. Looks like you ignored my comment about why do you > case struct key_part * to self. Let me fix it for you > where possible. Where it is not possible, please, > remove 'const struct key_def' qualifier from the function > declaration and do not cast too. > Also, as you could see, when a tuple_field function is > introduced, it has two versions: raw and not raw. Not raw > functions are used exactly for such hunks, when you do > not need to get multiple fields. > (partially fixed by me) This refactoring is really global and time-consuming. Going to do it at last. 
>> -template<bool is_nullable, bool has_optional_parts> >> +template<bool is_nullable, bool has_optional_parts, bool has_json_path> > 9. You said, that has_json_path is substituted with is_flat, but looks > like you did not. Please, fix all the remarks of the previous review > more accurate. No, I've chosen has_json_path, which I use everywhere. >> + format->epoch = 1; > 10. Why is this epoch invalid? As I understand, a format can live > with it normally but comparators will be slow. First, the invalid epoch is 0, and it is 0 on the branch; the patch in this letter has a bug here. Second, we need an invalid epoch because vinyl creates indexes during space creation, before the alter that sets up the correct epoch. > 11. The same as in the previous review: do not call tuple_format() > and other tuple functions multiple times when possible. > (fixed by me) ================================================= From 996ee351112bb070d511636bc702496bc445f047 Mon Sep 17 00:00:00 2001 Message-Id: <996ee351112bb070d511636bc702496bc445f047.1536237903.git.kshcherbatov@tarantool.org> In-Reply-To: <cover.1536237903.git.kshcherbatov@tarantool.org> References: <cover.1536237903.git.kshcherbatov@tarantool.org> From: Kirill Shcherbatov <kshcherbatov@tarantool.org> Date: Thu, 9 Aug 2018 15:02:44 +0300 Subject: [PATCH 2/4] box: introduce slot_cache in key_part The same key_part could be used in multiple formats, so a different field->offset_slot would be allocated in each. In most scenarios we work with a series of tuples of the same format, and (in general) a format lookup for a field would be an expensive operation for JSON paths defined in a key_part. The new slot_cache field in the key_part structure, together with an epoch-based mechanism to validate its actuality, should be an effective approach to improving performance. The new routine tuple_field_by_part uses a tuple and a key_part to access a field, which allows reworking and speeding up all scenarios of accessing tuple data by index. It also allows working with JSON-path key_parts later. Part of #1012. 
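The tuple_field_by_part routine mentioned above comes in raw and non-raw flavors, per the review: the raw form takes the format and data accessors explicitly so that loops can fetch them once, while the non-raw wrapper does the fetching itself for one-off lookups. A minimal sketch of that split (toy structures and an int sentinel in place of MessagePack, not the real tuple API):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

/* Toy stand-ins: a "format" maps part numbers to data offsets and a
 * "tuple" owns its data; -1 plays the role of MP_NIL. */
struct toy_format { std::vector<int32_t> offset; };
struct toy_tuple {
	toy_format *fmt;
	std::vector<int> data;
};

static toy_format *tuple_format(const toy_tuple *t) { return t->fmt; }
static const int *tuple_data(const toy_tuple *t) { return t->data.data(); }

/* Raw lookup: the caller passes accessors it has already fetched. */
static int
field_by_part_raw(const toy_format *fmt, const int *data, uint32_t part)
{
	return data[fmt->offset[part]];
}

/* Non-raw wrapper: fine for a single lookup. */
static int
field_by_part(const toy_tuple *t, uint32_t part)
{
	return field_by_part_raw(tuple_format(t), tuple_data(t), part);
}

/* In a loop the invariants are hoisted, as in the reviewed
 * vy_tuple_key_contains_null() fix. */
static bool
key_contains_null(const toy_tuple *t, uint32_t part_count)
{
	const toy_format *fmt = tuple_format(t);	/* fetched once */
	const int *data = tuple_data(t);		/* fetched once */
	for (uint32_t p = 0; p < part_count; ++p)
		if (field_by_part_raw(fmt, data, p) == -1)
			return true;
	return false;
}
```

The wrapper keeps single-lookup call sites like memtx_bitset short, while the raw form avoids re-fetching the loop invariants per part.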
--- src/box/alter.cc | 6 +- src/box/blackhole.c | 3 +- src/box/engine.h | 11 ++-- src/box/key_def.c | 16 +++-- src/box/key_def.h | 38 +++++++----- src/box/memtx_bitset.c | 5 +- src/box/memtx_engine.c | 4 +- src/box/memtx_hash.h | 4 +- src/box/memtx_rtree.c | 3 +- src/box/memtx_space.c | 3 +- src/box/memtx_space.h | 2 +- src/box/schema.cc | 4 +- src/box/space.c | 4 +- src/box/space.h | 8 ++- src/box/sysview.c | 3 +- src/box/tuple.h | 14 +++++ src/box/tuple_bloom.c | 8 +-- src/box/tuple_bloom.h | 8 +-- src/box/tuple_compare.cc | 141 ++++++++++++++++++++++++++----------------- src/box/tuple_compare.h | 5 +- src/box/tuple_extract_key.cc | 53 +++++++++------- src/box/tuple_format.c | 12 ++++ src/box/tuple_format.h | 18 ++++++ src/box/tuple_hash.cc | 63 ++++++++++++------- src/box/tuple_hash.h | 9 ++- src/box/vinyl.c | 3 +- src/box/vy_history.c | 2 +- src/box/vy_history.h | 2 +- src/box/vy_lsm.c | 2 + src/box/vy_mem.c | 8 +-- src/box/vy_mem.h | 10 +-- src/box/vy_range.c | 2 +- src/box/vy_range.h | 4 +- src/box/vy_run.c | 39 ++++++------ src/box/vy_run.h | 34 +++++------ src/box/vy_stmt.c | 15 ++--- src/box/vy_stmt.h | 39 ++++++------ src/box/vy_upsert.c | 2 +- src/box/vy_upsert.h | 2 +- src/box/vy_write_iterator.c | 8 +-- src/box/vy_write_iterator.h | 6 +- 41 files changed, 374 insertions(+), 249 deletions(-) diff --git a/src/box/alter.cc b/src/box/alter.cc index a6299a1..123db1f 100644 --- a/src/box/alter.cc +++ b/src/box/alter.cc @@ -883,7 +883,9 @@ alter_space_do(struct txn *txn, struct alter_space *alter) * Create a new (empty) space for the new definition. * Sic: the triggers are not moved over yet. */ - alter->new_space = space_new_xc(alter->space_def, &alter->key_list); + alter->new_space = space_new_xc(alter->space_def, &alter->key_list, + alter->old_space->format != NULL ? + alter->old_space->format->epoch : 0); /* * Copy the replace function, the new space is at the same recovery * phase as the old one. 
This hack is especially necessary for @@ -1604,7 +1606,7 @@ on_replace_dd_space(struct trigger * /* trigger */, void *event) auto def_guard = make_scoped_guard([=] { space_def_delete(def); }); RLIST_HEAD(empty_list); - struct space *space = space_new_xc(def, &empty_list); + struct space *space = space_new_xc(def, &empty_list, 0); /** * The new space must be inserted in the space * cache right away to achieve linearisable diff --git a/src/box/blackhole.c b/src/box/blackhole.c index f979304..160154b 100644 --- a/src/box/blackhole.c +++ b/src/box/blackhole.c @@ -135,7 +135,7 @@ blackhole_engine_shutdown(struct engine *engine) static struct space * blackhole_engine_create_space(struct engine *engine, struct space_def *def, - struct rlist *key_list) + struct rlist *key_list, uint64_t epoch) { if (!rlist_empty(key_list)) { diag_set(ClientError, ER_UNSUPPORTED, "Blackhole", "indexes"); @@ -158,6 +158,7 @@ blackhole_engine_create_space(struct engine *engine, struct space_def *def, return NULL; } format->exact_field_count = def->exact_field_count; + format->epoch = ++epoch; tuple_format_ref(format); if (space_create(space, engine, &blackhole_space_vtab, diff --git a/src/box/engine.h b/src/box/engine.h index 5b96c74..0e8c76c 100644 --- a/src/box/engine.h +++ b/src/box/engine.h @@ -72,7 +72,8 @@ struct engine_vtab { void (*shutdown)(struct engine *); /** Allocate a new space instance. */ struct space *(*create_space)(struct engine *engine, - struct space_def *def, struct rlist *key_list); + struct space_def *def, struct rlist *key_list, + uint64_t epoch); /** * Write statements stored in checkpoint @vclock to @stream. 
*/ @@ -237,9 +238,9 @@ engine_find(const char *name) static inline struct space * engine_create_space(struct engine *engine, struct space_def *def, - struct rlist *key_list) + struct rlist *key_list, uint64_t epoch) { - return engine->vtab->create_space(engine, def, key_list); + return engine->vtab->create_space(engine, def, key_list, epoch); } static inline int @@ -390,9 +391,9 @@ engine_find_xc(const char *name) static inline struct space * engine_create_space_xc(struct engine *engine, struct space_def *def, - struct rlist *key_list) + struct rlist *key_list, uint64_t epoch) { - struct space *space = engine_create_space(engine, def, key_list); + struct space *space = engine_create_space(engine, def, key_list, epoch); if (space == NULL) diag_raise(); return space; diff --git a/src/box/key_def.c b/src/box/key_def.c index ee09dc9..2ef78c1 100644 --- a/src/box/key_def.c +++ b/src/box/key_def.c @@ -208,14 +208,14 @@ box_key_def_delete(box_key_def_t *key_def) int box_tuple_compare(const box_tuple_t *tuple_a, const box_tuple_t *tuple_b, - const box_key_def_t *key_def) + box_key_def_t *key_def) { return tuple_compare(tuple_a, tuple_b, key_def); } int box_tuple_compare_with_key(const box_tuple_t *tuple_a, const char *key_b, - const box_key_def_t *key_def) + box_key_def_t *key_def) { uint32_t part_count = mp_decode_array(&key_b); return tuple_compare_with_key(tuple_a, key_b, part_count, key_def); @@ -258,6 +258,8 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno, def->parts[part_no].type = type; def->parts[part_no].coll = coll; def->parts[part_no].coll_id = coll_id; + def->parts[part_no].offset_slot = TUPLE_OFFSET_SLOT_NIL; + def->parts[part_no].offset_slot_epoch = 0; column_mask_set_fieldno(&def->column_mask, fieldno); /** * When all parts are set, initialize the tuple @@ -556,8 +558,11 @@ key_def_merge(const struct key_def *first, const struct key_def *second) part = first->parts; end = part + first->part_count; for (; part != end; part++) { - 
key_def_set_part(new_def, pos++, part->fieldno, part->type, + key_def_set_part(new_def, pos, part->fieldno, part->type, part->is_nullable, part->coll, part->coll_id); + new_def->parts[pos].offset_slot_epoch = part->offset_slot_epoch; + new_def->parts[pos].offset_slot = part->offset_slot; + pos++; } /* Set-append second key def's part to the new key def. */ @@ -566,8 +571,11 @@ key_def_merge(const struct key_def *first, const struct key_def *second) for (; part != end; part++) { if (key_def_find(first, part->fieldno)) continue; - key_def_set_part(new_def, pos++, part->fieldno, part->type, + key_def_set_part(new_def, pos, part->fieldno, part->type, part->is_nullable, part->coll, part->coll_id); + new_def->parts[pos].offset_slot_epoch = part->offset_slot_epoch; + new_def->parts[pos].offset_slot = part->offset_slot; + pos++; } return new_def; } diff --git a/src/box/key_def.h b/src/box/key_def.h index aecbe03..07997b8 100644 --- a/src/box/key_def.h +++ b/src/box/key_def.h @@ -74,6 +74,17 @@ struct key_part { struct coll *coll; /** True if a part can store NULLs. */ bool is_nullable; + /** + * Epoch of offset slot cache. Initialized with + * incremental epoch of format on caching it's field's + * offset_slot via tuple_field_by_part_raw to speed up + * access on subsequent calls with same format. + * Cache is expected to use "the newest format is most + * relevant" strategy. + */ + uint64_t offset_slot_epoch; + /** Cache with format's field offset slot. 
*/ + int32_t offset_slot; }; struct key_def; @@ -83,26 +94,26 @@ struct tuple; typedef int (*tuple_compare_with_key_t)(const struct tuple *tuple_a, const char *key, uint32_t part_count, - const struct key_def *key_def); + struct key_def *key_def); /** @copydoc tuple_compare() */ typedef int (*tuple_compare_t)(const struct tuple *tuple_a, const struct tuple *tuple_b, - const struct key_def *key_def); + struct key_def *key_def); /** @copydoc tuple_extract_key() */ typedef char *(*tuple_extract_key_t)(const struct tuple *tuple, - const struct key_def *key_def, + struct key_def *key_def, uint32_t *key_size); /** @copydoc tuple_extract_key_raw() */ typedef char *(*tuple_extract_key_raw_t)(const char *data, const char *data_end, - const struct key_def *key_def, + struct key_def *key_def, uint32_t *key_size); /** @copydoc tuple_hash() */ typedef uint32_t (*tuple_hash_t)(const struct tuple *tuple, - const struct key_def *key_def); + struct key_def *key_def); /** @copydoc key_hash() */ typedef uint32_t (*key_hash_t)(const char *key, - const struct key_def *key_def); + struct key_def *key_def); /* Definition of a multipart key. */ struct key_def { @@ -201,7 +212,7 @@ box_key_def_delete(box_key_def_t *key_def); */ int box_tuple_compare(const box_tuple_t *tuple_a, const box_tuple_t *tuple_b, - const box_key_def_t *key_def); + box_key_def_t *key_def); /** * @brief Compare tuple with key using the key definition. 
@@ -216,7 +227,7 @@ box_tuple_compare(const box_tuple_t *tuple_a, const box_tuple_t *tuple_b, int box_tuple_compare_with_key(const box_tuple_t *tuple_a, const char *key_b, - const box_key_def_t *key_def); + box_key_def_t *key_def); /** \endcond public */ @@ -443,7 +454,7 @@ key_part_cmp(const struct key_part *parts1, uint32_t part_count1, * @retval NULL Memory allocation error */ static inline char * -tuple_extract_key(const struct tuple *tuple, const struct key_def *key_def, +tuple_extract_key(const struct tuple *tuple, struct key_def *key_def, uint32_t *key_size) { return key_def->tuple_extract_key(tuple, key_def, key_size); @@ -464,7 +475,7 @@ tuple_extract_key(const struct tuple *tuple, const struct key_def *key_def, */ static inline char * tuple_extract_key_raw(const char *data, const char *data_end, - const struct key_def *key_def, uint32_t *key_size) + struct key_def *key_def, uint32_t *key_size) { return key_def->tuple_extract_key_raw(data, data_end, key_def, key_size); @@ -483,8 +494,7 @@ tuple_extract_key_raw(const char *data, const char *data_end, * @retval >0 if key_a > key_b */ int -key_compare(const char *key_a, const char *key_b, - const struct key_def *key_def); +key_compare(const char *key_a, const char *key_b, struct key_def *key_def); /** * Compare tuples using the key definition. 
@@ -497,7 +507,7 @@ key_compare(const char *key_a, const char *key_b, */ static inline int tuple_compare(const struct tuple *tuple_a, const struct tuple *tuple_b, - const struct key_def *key_def) + struct key_def *key_def) { return key_def->tuple_compare(tuple_a, tuple_b, key_def); } @@ -515,7 +525,7 @@ tuple_compare(const struct tuple *tuple_a, const struct tuple *tuple_b, */ static inline int tuple_compare_with_key(const struct tuple *tuple, const char *key, - uint32_t part_count, const struct key_def *key_def) + uint32_t part_count, struct key_def *key_def) { return key_def->tuple_compare_with_key(tuple, key, part_count, key_def); } diff --git a/src/box/memtx_bitset.c b/src/box/memtx_bitset.c index a665f1a..cd7362e 100644 --- a/src/box/memtx_bitset.c +++ b/src/box/memtx_bitset.c @@ -283,8 +283,9 @@ memtx_bitset_index_replace(struct index *base, struct tuple *old_tuple, } if (new_tuple != NULL) { - const char *field; - field = tuple_field(new_tuple, base->def->key_def->parts[0].fieldno); + const char *field = + tuple_field_by_part(new_tuple, + base->def->key_def->parts); uint32_t key_len; const void *key = make_key(field, &key_len); #ifndef OLD_GOOD_BITSET diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c index 1f80ce5..4b7d377 100644 --- a/src/box/memtx_engine.c +++ b/src/box/memtx_engine.c @@ -358,10 +358,10 @@ memtx_engine_end_recovery(struct engine *engine) static struct space * memtx_engine_create_space(struct engine *engine, struct space_def *def, - struct rlist *key_list) + struct rlist *key_list, uint64_t epoch) { struct memtx_engine *memtx = (struct memtx_engine *)engine; - return memtx_space_new(memtx, def, key_list); + return memtx_space_new(memtx, def, key_list, epoch); } static int diff --git a/src/box/memtx_hash.h b/src/box/memtx_hash.h index a3b4805..10663fc 100644 --- a/src/box/memtx_hash.h +++ b/src/box/memtx_hash.h @@ -39,14 +39,14 @@ extern "C" { static inline bool memtx_hash_equal(struct tuple *tuple_a, struct tuple *tuple_b, - 
const struct key_def *key_def) + struct key_def *key_def) { return tuple_compare(tuple_a, tuple_b, key_def) == 0; } static inline bool memtx_hash_equal_key(struct tuple *tuple, const char *key, - const struct key_def *key_def) + struct key_def *key_def) { return tuple_compare_with_key(tuple, key, key_def->part_count, key_def) == 0; diff --git a/src/box/memtx_rtree.c b/src/box/memtx_rtree.c index 0b12cda..f2aa6c3 100644 --- a/src/box/memtx_rtree.c +++ b/src/box/memtx_rtree.c @@ -112,7 +112,8 @@ extract_rectangle(struct rtree_rect *rect, const struct tuple *tuple, struct index_def *index_def) { assert(index_def->key_def->part_count == 1); - const char *elems = tuple_field(tuple, index_def->key_def->parts[0].fieldno); + const char *elems = tuple_field_by_part(tuple, + index_def->key_def->parts); unsigned dimension = index_def->opts.dimension; uint32_t count = mp_decode_array(&elems); return mp_decode_rect(rect, dimension, elems, count, "Field"); diff --git a/src/box/memtx_space.c b/src/box/memtx_space.c index 08ae0da..8b30fc2 100644 --- a/src/box/memtx_space.c +++ b/src/box/memtx_space.c @@ -884,7 +884,7 @@ static const struct space_vtab memtx_space_vtab = { struct space * memtx_space_new(struct memtx_engine *memtx, - struct space_def *def, struct rlist *key_list) + struct space_def *def, struct rlist *key_list, uint64_t epoch) { struct memtx_space *memtx_space = malloc(sizeof(*memtx_space)); if (memtx_space == NULL) { @@ -918,6 +918,7 @@ memtx_space_new(struct memtx_engine *memtx, format->engine = memtx; format->is_temporary = def->opts.is_temporary; format->exact_field_count = def->exact_field_count; + format->epoch = ++epoch; tuple_format_ref(format); if (space_create((struct space *)memtx_space, (struct engine *)memtx, diff --git a/src/box/memtx_space.h b/src/box/memtx_space.h index 7dc3410..b5bec0c 100644 --- a/src/box/memtx_space.h +++ b/src/box/memtx_space.h @@ -79,7 +79,7 @@ memtx_space_replace_all_keys(struct space *, struct tuple *, struct tuple *, struct 
space * memtx_space_new(struct memtx_engine *memtx, - struct space_def *def, struct rlist *key_list); + struct space_def *def, struct rlist *key_list, uint64_t epoch); #if defined(__cplusplus) } /* extern "C" */ diff --git a/src/box/schema.cc b/src/box/schema.cc index 7f20f36..e52e19d 100644 --- a/src/box/schema.cc +++ b/src/box/schema.cc @@ -214,7 +214,7 @@ sc_space_new(uint32_t id, const char *name, struct key_def *key_def, struct rlist key_list; rlist_create(&key_list); rlist_add_entry(&key_list, index_def, link); - struct space *space = space_new_xc(def, &key_list); + struct space *space = space_new_xc(def, &key_list, 0); (void) space_cache_replace(space); if (replace_trigger) trigger_add(&space->on_replace, replace_trigger); @@ -380,7 +380,7 @@ schema_init() space_def_delete(def); }); RLIST_HEAD(key_list); - struct space *space = space_new_xc(def, &key_list); + struct space *space = space_new_xc(def, &key_list, 0); space_cache_replace(space); init_system_space(space); trigger_run_xc(&on_alter_space, space); diff --git a/src/box/space.c b/src/box/space.c index 871cc67..2e4df74 100644 --- a/src/box/space.c +++ b/src/box/space.c @@ -181,12 +181,12 @@ fail: } struct space * -space_new(struct space_def *def, struct rlist *key_list) +space_new(struct space_def *def, struct rlist *key_list, uint64_t epoch) { struct engine *engine = engine_find(def->engine_name); if (engine == NULL) return NULL; - return engine_create_space(engine, def, key_list); + return engine_create_space(engine, def, key_list, epoch); } void diff --git a/src/box/space.h b/src/box/space.h index 8888ec8..068ea4b 100644 --- a/src/box/space.h +++ b/src/box/space.h @@ -378,10 +378,11 @@ struct field_def; * Allocate and initialize a space. * @param space_def Space definition. * @param key_list List of index_defs. + * @param epoch Last epoch to initialize format. * @retval Space object. 
*/ struct space * -space_new(struct space_def *space_def, struct rlist *key_list); +space_new(struct space_def *space_def, struct rlist *key_list, uint64_t epoch); /** Destroy and free a space. */ void @@ -416,9 +417,10 @@ int generic_space_prepare_alter(struct space *, struct space *); } /* extern "C" */ static inline struct space * -space_new_xc(struct space_def *space_def, struct rlist *key_list) +space_new_xc(struct space_def *space_def, struct rlist *key_list, + uint64_t epoch) { - struct space *space = space_new(space_def, key_list); + struct space *space = space_new(space_def, key_list, epoch); if (space == NULL) diag_raise(); return space; } diff --git a/src/box/sysview.c b/src/box/sysview.c index a636c68..d35ff71 100644 --- a/src/box/sysview.c +++ b/src/box/sysview.c @@ -504,8 +504,9 @@ sysview_engine_shutdown(struct engine *engine) static struct space * sysview_engine_create_space(struct engine *engine, struct space_def *def, - struct rlist *key_list) + struct rlist *key_list, uint64_t epoch) { + (void)epoch; struct space *space = (struct space *)calloc(1, sizeof(*space)); if (space == NULL) { diag_set(OutOfMemory, sizeof(*space), diff --git a/src/box/tuple.h b/src/box/tuple.h index 2e84516..b638f50 100644 --- a/src/box/tuple.h +++ b/src/box/tuple.h @@ -43,6 +43,7 @@ extern "C" { struct slab_arena; struct quota; +struct key_part; /** * A format for standalone tuples allocated on runtime arena. @@ -522,6 +523,19 @@ tuple_field(const struct tuple *tuple, uint32_t fieldno) } /** + * Get a field referred to by index @part in a tuple. + * @param tuple Tuple to get the field from. + * @param part Index part to use. + * @retval Field data if the field exists or NULL. + */ +static inline const char * +tuple_field_by_part(const struct tuple *tuple, struct key_part *part) +{ + return tuple_field_by_part_raw(tuple_format(tuple), tuple_data(tuple), + tuple_field_map(tuple), part); +} + +/** * Get tuple field by its JSON path. * @param tuple Tuple to get field from.
* @param path Field JSON path. diff --git a/src/box/tuple_bloom.c b/src/box/tuple_bloom.c index ffad151..dc40698 100644 --- a/src/box/tuple_bloom.c +++ b/src/box/tuple_bloom.c @@ -74,8 +74,7 @@ tuple_bloom_builder_delete(struct tuple_bloom_builder *builder) int tuple_bloom_builder_add(struct tuple_bloom_builder *builder, - const struct tuple *tuple, - const struct key_def *key_def, + const struct tuple *tuple, struct key_def *key_def, uint32_t hashed_parts) { assert(builder->part_count == key_def->part_count); @@ -168,8 +167,7 @@ tuple_bloom_delete(struct tuple_bloom *bloom) bool tuple_bloom_maybe_has(const struct tuple_bloom *bloom, - const struct tuple *tuple, - const struct key_def *key_def) + const struct tuple *tuple, struct key_def *key_def) { if (bloom->is_legacy) { return bloom_maybe_has(&bloom->parts[0], @@ -195,7 +193,7 @@ tuple_bloom_maybe_has(const struct tuple_bloom *bloom, bool tuple_bloom_maybe_has_key(const struct tuple_bloom *bloom, const char *key, uint32_t part_count, - const struct key_def *key_def) + struct key_def *key_def) { if (bloom->is_legacy) { if (part_count < key_def->part_count) diff --git a/src/box/tuple_bloom.h b/src/box/tuple_bloom.h index 505933d..b05fee1 100644 --- a/src/box/tuple_bloom.h +++ b/src/box/tuple_bloom.h @@ -117,8 +117,7 @@ tuple_bloom_builder_delete(struct tuple_bloom_builder *builder); */ int tuple_bloom_builder_add(struct tuple_bloom_builder *builder, - const struct tuple *tuple, - const struct key_def *key_def, + const struct tuple *tuple, struct key_def *key_def, uint32_t hashed_parts); /** @@ -147,8 +146,7 @@ tuple_bloom_delete(struct tuple_bloom *bloom); */ bool tuple_bloom_maybe_has(const struct tuple_bloom *bloom, - const struct tuple *tuple, - const struct key_def *key_def); + const struct tuple *tuple, struct key_def *key_def); /** * Check if a tuple matching a key was stored in a tuple bloom filter. 
@@ -162,7 +160,7 @@ tuple_bloom_maybe_has(const struct tuple_bloom *bloom, bool tuple_bloom_maybe_has_key(const struct tuple_bloom *bloom, const char *key, uint32_t part_count, - const struct key_def *key_def); + struct key_def *key_def); /** * Return the size of a tuple bloom filter when encoded. diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc index e53afba..b14ac35 100644 --- a/src/box/tuple_compare.cc +++ b/src/box/tuple_compare.cc @@ -426,15 +426,24 @@ tuple_compare_field_with_hint(const char *field_a, enum mp_type a_type, } uint32_t -tuple_common_key_parts(const struct tuple *tuple_a, - const struct tuple *tuple_b, - const struct key_def *key_def) +tuple_common_key_parts(const struct tuple *tuple_a, const struct tuple *tuple_b, + struct key_def *key_def) { uint32_t i; + struct tuple_format *tuple_a_format = tuple_format(tuple_a); + struct tuple_format *tuple_b_format = tuple_format(tuple_b); + const char *tuple_a_raw = tuple_data(tuple_a); + const char *tuple_b_raw = tuple_data(tuple_b); + const uint32_t *tuple_a_field_map = tuple_field_map(tuple_a); + const uint32_t *tuple_b_field_map = tuple_field_map(tuple_b); for (i = 0; i < key_def->part_count; i++) { - const struct key_part *part = &key_def->parts[i]; - const char *field_a = tuple_field(tuple_a, part->fieldno); - const char *field_b = tuple_field(tuple_b, part->fieldno); + struct key_part *part = (struct key_part *)&key_def->parts[i]; + const char *field_a = + tuple_field_by_part_raw(tuple_a_format, tuple_a_raw, + tuple_a_field_map, part); + const char *field_b = + tuple_field_by_part_raw(tuple_b_format, tuple_b_raw, + tuple_b_field_map, part); enum mp_type a_type = field_a != NULL ? mp_typeof(*field_a) : MP_NIL; enum mp_type b_type = field_b != NULL ? 
@@ -449,15 +458,15 @@ tuple_common_key_parts(const struct tuple *tuple_a, return i; } -template<bool is_nullable, bool has_optional_parts> +template<bool is_nullable, bool has_optional_parts, bool has_json_path> static inline int tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, - const struct key_def *key_def) + struct key_def *key_def) { assert(!has_optional_parts || is_nullable); assert(is_nullable == key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); - const struct key_part *part = key_def->parts; + struct key_part *part = key_def->parts; const char *tuple_a_raw = tuple_data(tuple_a); const char *tuple_b_raw = tuple_data(tuple_b); if (key_def->part_count == 1 && part->fieldno == 0) { @@ -488,7 +497,7 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, const struct tuple_format *format_b = tuple_format(tuple_b); const uint32_t *field_map_a = tuple_field_map(tuple_a); const uint32_t *field_map_b = tuple_field_map(tuple_b); - const struct key_part *end; + struct key_part *end; const char *field_a, *field_b; enum mp_type a_type, b_type; int rc; @@ -498,10 +507,19 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, end = part + key_def->part_count; for (; part < end; part++) { - field_a = tuple_field_raw(format_a, tuple_a_raw, field_map_a, - part->fieldno); - field_b = tuple_field_raw(format_b, tuple_b_raw, field_map_b, - part->fieldno); + if (!has_json_path) { + field_a = tuple_field_raw(format_a, tuple_a_raw, + field_map_a, + part->fieldno); + field_b = tuple_field_raw(format_b, tuple_b_raw, + field_map_b, + part->fieldno); + } else { + field_a = tuple_field_by_part_raw(format_a, tuple_a_raw, + field_map_a, part); + field_b = tuple_field_by_part_raw(format_b, tuple_b_raw, + field_map_b, part); + } assert(has_optional_parts || (field_a != NULL && field_b != NULL)); if (! 
is_nullable) { @@ -548,10 +566,19 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, */ end = key_def->parts + key_def->part_count; for (; part < end; ++part) { - field_a = tuple_field_raw(format_a, tuple_a_raw, field_map_a, - part->fieldno); - field_b = tuple_field_raw(format_b, tuple_b_raw, field_map_b, - part->fieldno); + if (!has_json_path) { + field_a = tuple_field_raw(format_a, tuple_a_raw, + field_map_a, + part->fieldno); + field_b = tuple_field_raw(format_b, tuple_b_raw, + field_map_b, + part->fieldno); + } else { + field_a = tuple_field_by_part_raw(format_a, tuple_a_raw, + field_map_a, part); + field_b = tuple_field_by_part_raw(format_b, tuple_b_raw, + field_map_b, part); + } /* * Extended parts are primary, and they can not * be absent or be NULLs. @@ -565,26 +592,30 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, return 0; } -template<bool is_nullable, bool has_optional_parts> +template<bool is_nullable, bool has_optional_parts, bool has_json_paths> static inline int tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, - uint32_t part_count, - const struct key_def *key_def) + uint32_t part_count, struct key_def *key_def) { assert(!has_optional_parts || is_nullable); assert(is_nullable == key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); assert(key != NULL || part_count == 0); assert(part_count <= key_def->part_count); - const struct key_part *part = key_def->parts; + struct key_part *part = key_def->parts; const struct tuple_format *format = tuple_format(tuple); const char *tuple_raw = tuple_data(tuple); const uint32_t *field_map = tuple_field_map(tuple); enum mp_type a_type, b_type; if (likely(part_count == 1)) { const char *field; - field = tuple_field_raw(format, tuple_raw, field_map, - part->fieldno); + if (!has_json_paths) { + field = tuple_field_raw(format, tuple_raw, field_map, + part->fieldno); + } else { + field = 
tuple_field_by_part_raw(format, tuple_raw, + field_map, part); + } if (! is_nullable) { return tuple_compare_field(field, key, part->type, part->coll); @@ -605,12 +636,17 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, } } - const struct key_part *end = part + part_count; + struct key_part *end = part + part_count; int rc; for (; part < end; ++part, mp_next(&key)) { const char *field; - field = tuple_field_raw(format, tuple_raw, field_map, - part->fieldno); + if (!has_json_paths) { + field = tuple_field_raw(format, tuple_raw, field_map, + part->fieldno); + } else { + field = tuple_field_by_part_raw(format, tuple_raw, + field_map, part); + } if (! is_nullable) { rc = tuple_compare_field(field, key, part->type, part->coll); @@ -643,11 +679,11 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, template<bool is_nullable> static inline int key_compare_parts(const char *key_a, const char *key_b, uint32_t part_count, - const struct key_def *key_def) + struct key_def *key_def) { assert(is_nullable == key_def->is_nullable); assert((key_a != NULL && key_b != NULL) || part_count == 0); - const struct key_part *part = key_def->parts; + struct key_part *part = key_def->parts; if (likely(part_count == 1)) { if (! is_nullable) { return tuple_compare_field(key_a, key_b, part->type, @@ -667,7 +703,7 @@ key_compare_parts(const char *key_a, const char *key_b, uint32_t part_count, } } - const struct key_part *end = part + part_count; + struct key_part *end = part + part_count; int rc; for (; part < end; ++part, mp_next(&key_a), mp_next(&key_b)) { if (! 
is_nullable) { @@ -699,8 +735,7 @@ key_compare_parts(const char *key_a, const char *key_b, uint32_t part_count, template<bool is_nullable, bool has_optional_parts> static inline int tuple_compare_with_key_sequential(const struct tuple *tuple, const char *key, - uint32_t part_count, - const struct key_def *key_def) + uint32_t part_count, struct key_def *key_def) { assert(!has_optional_parts || is_nullable); assert(key_def_is_sequential(key_def)); @@ -739,8 +774,7 @@ tuple_compare_with_key_sequential(const struct tuple *tuple, const char *key, } int -key_compare(const char *key_a, const char *key_b, - const struct key_def *key_def) +key_compare(const char *key_a, const char *key_b, struct key_def *key_def) { uint32_t part_count_a = mp_decode_array(&key_a); uint32_t part_count_b = mp_decode_array(&key_b); @@ -760,8 +794,7 @@ key_compare(const char *key_a, const char *key_b, template <bool is_nullable, bool has_optional_parts> static int tuple_compare_sequential(const struct tuple *tuple_a, - const struct tuple *tuple_b, - const struct key_def *key_def) + const struct tuple *tuple_b, key_def *key_def) { assert(!has_optional_parts || is_nullable); assert(has_optional_parts == key_def->has_optional_parts); @@ -778,8 +811,8 @@ tuple_compare_sequential(const struct tuple *tuple_a, key_def->part_count, key_def); } bool was_null_met = false; - const struct key_part *part = key_def->parts; - const struct key_part *end = part + key_def->unique_part_count; + struct key_part *part = key_def->parts; + struct key_part *end = part + key_def->unique_part_count; int rc; uint32_t i = 0; for (; part < end; ++part, ++i) { @@ -944,7 +977,7 @@ struct TupleCompare { static int compare(const struct tuple *tuple_a, const struct tuple *tuple_b, - const struct key_def *) + struct key_def *) { struct tuple_format *format_a = tuple_format(tuple_a); struct tuple_format *format_b = tuple_format(tuple_b); @@ -963,7 +996,7 @@ template <int TYPE, int ...MORE_TYPES> struct TupleCompare<0, TYPE, 
MORE_TYPES...> { static int compare(const struct tuple *tuple_a, const struct tuple *tuple_b, - const struct key_def *) + struct key_def *) { struct tuple_format *format_a = tuple_format(tuple_a); struct tuple_format *format_b = tuple_format(tuple_b); @@ -1016,9 +1049,9 @@ tuple_compare_create(const struct key_def *def) else return tuple_compare_sequential<true, false>; } else if (def->has_optional_parts) { - return tuple_compare_slowpath<true, true>; + return tuple_compare_slowpath<true, true, false>; } else { - return tuple_compare_slowpath<true, false>; + return tuple_compare_slowpath<true, false, false>; } } assert(! def->has_optional_parts); @@ -1041,7 +1074,7 @@ tuple_compare_create(const struct key_def *def) if (key_def_is_sequential(def)) return tuple_compare_sequential<false, false>; else - return tuple_compare_slowpath<false, false>; + return tuple_compare_slowpath<false, false, false>; } /* }}} tuple_compare */ @@ -1115,7 +1148,7 @@ struct FieldCompareWithKey<FLD_ID, IDX, TYPE, IDX2, TYPE2, MORE_TYPES...> { inline static int compare(const struct tuple *tuple, const char *key, - uint32_t part_count, const struct key_def *key_def, + uint32_t part_count, struct key_def *key_def, const struct tuple_format *format, const char *field) { int r; @@ -1141,11 +1174,11 @@ struct FieldCompareWithKey<FLD_ID, IDX, TYPE, IDX2, TYPE2, MORE_TYPES...> template <int FLD_ID, int IDX, int TYPE> struct FieldCompareWithKey<FLD_ID, IDX, TYPE> { inline static int compare(const struct tuple *, - const char *key, - uint32_t, - const struct key_def *, - const struct tuple_format *, - const char *field) + const char *key, + uint32_t, + struct key_def *, + const struct tuple_format *, + const char *field) { return field_compare_with_key<TYPE>(&field, &key); } @@ -1159,7 +1192,7 @@ struct TupleCompareWithKey { static int compare(const struct tuple *tuple, const char *key, - uint32_t part_count, const struct key_def *key_def) + uint32_t part_count, struct key_def *key_def) { /* Part 
count can be 0 in wildcard searches. */ if (part_count == 0) @@ -1180,7 +1213,7 @@ struct TupleCompareWithKey<0, 0, TYPE, MORE_TYPES...> static int compare(const struct tuple *tuple, const char *key, uint32_t part_count, - const struct key_def *key_def) + struct key_def *key_def) { /* Part count can be 0 in wildcard searches. */ if (part_count == 0) @@ -1236,9 +1269,9 @@ tuple_compare_with_key_create(const struct key_def *def) false>; } } else if (def->has_optional_parts) { - return tuple_compare_with_key_slowpath<true, true>; + return tuple_compare_with_key_slowpath<true, true, false>; } else { - return tuple_compare_with_key_slowpath<true, false>; + return tuple_compare_with_key_slowpath<true, false, false>; } } assert(! def->has_optional_parts); @@ -1264,7 +1297,7 @@ tuple_compare_with_key_create(const struct key_def *def) if (key_def_is_sequential(def)) return tuple_compare_with_key_sequential<false, false>; else - return tuple_compare_with_key_slowpath<false, false>; + return tuple_compare_with_key_slowpath<false, false, false>; } /* }}} tuple_compare_with_key */ diff --git a/src/box/tuple_compare.h b/src/box/tuple_compare.h index 2a9875a..e3a6320 100644 --- a/src/box/tuple_compare.h +++ b/src/box/tuple_compare.h @@ -49,9 +49,8 @@ extern "C" { * @return number of key parts the two tuples have in common */ uint32_t -tuple_common_key_parts(const struct tuple *tuple_a, - const struct tuple *tuple_b, - const struct key_def *key_def); +tuple_common_key_parts(const struct tuple *tuple_a, const struct tuple *tuple_b, + struct key_def *key_def); /** * Create a comparison function for the key_def diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc index 880abb6..6b771f3 100644 --- a/src/box/tuple_extract_key.cc +++ b/src/box/tuple_extract_key.cc @@ -22,8 +22,7 @@ key_def_contains_sequential_parts(const struct key_def *def) template <bool has_optional_parts> static char * tuple_extract_key_sequential_raw(const char *data, const char *data_end, - 
const struct key_def *key_def, - uint32_t *key_size) + struct key_def *key_def, uint32_t *key_size) { assert(!has_optional_parts || key_def->is_nullable); assert(key_def_is_sequential(key_def)); @@ -72,8 +71,7 @@ tuple_extract_key_sequential_raw(const char *data, const char *data_end, */ template <bool has_optional_parts> static inline char * -tuple_extract_key_sequential(const struct tuple *tuple, - const struct key_def *key_def, +tuple_extract_key_sequential(const struct tuple *tuple, struct key_def *key_def, uint32_t *key_size) { assert(key_def_is_sequential(key_def)); @@ -91,10 +89,11 @@ tuple_extract_key_sequential(const struct tuple *tuple, * General-purpose implementation of tuple_extract_key() * @copydoc tuple_extract_key() */ -template <bool contains_sequential_parts, bool has_optional_parts> +template <bool contains_sequential_parts, bool has_optional_parts, + bool has_json_paths> static char * tuple_extract_key_slowpath(const struct tuple *tuple, - const struct key_def *key_def, uint32_t *key_size) + struct key_def *key_def, uint32_t *key_size) { assert(!has_optional_parts || key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); @@ -110,9 +109,14 @@ tuple_extract_key_slowpath(const struct tuple *tuple, /* Calculate the key size. 
*/ for (uint32_t i = 0; i < part_count; ++i) { - const char *field = - tuple_field_raw(format, data, field_map, - key_def->parts[i].fieldno); + const char *field; + if (!has_json_paths) { + field = tuple_field_raw(format, data, field_map, + key_def->parts[i].fieldno); + } else { + field = tuple_field_by_part_raw(format, data, field_map, + &key_def->parts[i]); + } if (has_optional_parts && field == NULL) { bsize += mp_sizeof_nil(); continue; @@ -152,9 +156,14 @@ tuple_extract_key_slowpath(const struct tuple *tuple, } char *key_buf = mp_encode_array(key, part_count); for (uint32_t i = 0; i < part_count; ++i) { - const char *field = - tuple_field_raw(format, data, field_map, - key_def->parts[i].fieldno); + const char *field; + if (!has_json_paths) { + field = tuple_field_raw(format, data, field_map, + key_def->parts[i].fieldno); + } else { + field = tuple_field_by_part_raw(format, data, field_map, + &key_def->parts[i]); + } if (has_optional_parts && field == NULL) { key_buf = mp_encode_nil(key_buf); continue; @@ -201,11 +210,10 @@ tuple_extract_key_slowpath(const struct tuple *tuple, * General-purpose version of tuple_extract_key_raw() * @copydoc tuple_extract_key_raw() */ -template <bool has_optional_parts> +template <bool has_optional_parts, bool has_json_paths> static char * tuple_extract_key_slowpath_raw(const char *data, const char *data_end, - const struct key_def *key_def, - uint32_t *key_size) + struct key_def *key_def, uint32_t *key_size) { assert(!has_optional_parts || key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); @@ -318,18 +326,21 @@ tuple_extract_key_set(struct key_def *key_def) assert(key_def->is_nullable); if (key_def_contains_sequential_parts(key_def)) { key_def->tuple_extract_key = - tuple_extract_key_slowpath<true, true>; + tuple_extract_key_slowpath<true, true, + false>; } else { key_def->tuple_extract_key = - tuple_extract_key_slowpath<false, true>; + tuple_extract_key_slowpath<false, true, + false>; } } else { 
if (key_def_contains_sequential_parts(key_def)) { key_def->tuple_extract_key = - tuple_extract_key_slowpath<true, false>; + tuple_extract_key_slowpath<true, false, + false>; } else { key_def->tuple_extract_key = - tuple_extract_key_slowpath<false, + tuple_extract_key_slowpath<false, false, false>; } } @@ -337,9 +348,9 @@ tuple_extract_key_set(struct key_def *key_def) if (key_def->has_optional_parts) { assert(key_def->is_nullable); key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<true>; + tuple_extract_key_slowpath_raw<true, false>; } else { key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<false>; + tuple_extract_key_slowpath_raw<false, false>; } } diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index b385c0d..6ae96e2 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -232,6 +232,11 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, format->dict = dict; tuple_dictionary_ref(dict); } + /* + * Set invalid epoch that should be changed later on + * attaching to space. 
+ */ + format->epoch = 0; format->refs = 0; format->id = FORMAT_ID_NIL; format->field_count = field_count; @@ -541,6 +546,13 @@ tuple_field_go_to_key(const char **field, const char *key, int len) return -1; } +const char * +tuple_field_by_part_raw(const struct tuple_format *format, const char *data, + const uint32_t *field_map, struct key_part *part) +{ + return tuple_field_raw(format, data, field_map, part->fieldno); +} + int tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, const uint32_t *field_map, const char *path, diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index c7dc48f..9406d5b 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -115,6 +115,12 @@ struct tuple_field { * Tuple format describes how tuple is stored and information about its fields */ struct tuple_format { + /** + * Counter that is incremented on space rebuild when the + * format gets a different distribution of offset slots + * than the previous one. + */ + uint64_t epoch; /** Virtual function table */ struct tuple_format_vtab vtab; /** Pointer to engine-specific data. */ @@ -324,6 +330,18 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, const char *tuple); /** + * Get a field referred to by index @part in a tuple. + * @param format Tuple format. + * @param data A pointer to MessagePack array. + * @param field_map A pointer to the LAST element of field map. + * @param part Index part to use. + * @retval Field data if the field exists or NULL. + */ +const char * +tuple_field_by_part_raw(const struct tuple_format *format, const char *data, + const uint32_t *field_map, struct key_part *part); + +/** * Get a field at the specific position in this MessagePack array. * Returns a pointer to MessagePack data.
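[Editor's note: the `epoch` counter introduced above exists to validate the `offset_slot` cache that, per the cover letter, is stored in `key_part`. A minimal self-contained sketch of that validation scheme; the `sketch_*` struct and field names are simplified assumptions, not the patch's actual code:]

```cpp
#include <cstdint>

/*
 * Sketch of an epoch-validated offset-slot cache (assumed names).
 * A key part remembers the field-map slot it resolved last time
 * together with the format epoch the slot was valid for; while the
 * epoch matches, repeated lookups skip the slow path resolution.
 */
struct sketch_format {
	uint64_t epoch;	/* bumped on rebuild when the slot layout changes */
};

struct sketch_key_part {
	int32_t offset_slot;		/* cached field-map slot */
	uint64_t offset_slot_epoch;	/* epoch the cache is valid for */
};

/* Resolve the field-map slot, consulting the cache first. */
static int32_t
sketch_resolve_slot(const struct sketch_format *format,
		    struct sketch_key_part *part,
		    int32_t (*slow_lookup)(const struct sketch_key_part *))
{
	/* Epoch 0 is "invalid", matching tuple_format_alloc() above. */
	if (format->epoch != 0 &&
	    part->offset_slot_epoch == format->epoch)
		return part->offset_slot;	/* cache hit */
	part->offset_slot = slow_lookup(part);	/* miss: re-resolve */
	part->offset_slot_epoch = format->epoch;
	return part->offset_slot;
}
```

[With this shape, an alter that redistributes offset slots only has to give the new format a fresh epoch; stale caches fail the comparison and fall back to the slow lookup.]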
* @param format tuple format diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc index dee9be3..01a0983 100644 --- a/src/box/tuple_hash.cc +++ b/src/box/tuple_hash.cc @@ -103,7 +103,7 @@ struct KeyFieldHash<TYPE> { template <int TYPE, int ...MORE_TYPES> struct KeyHash { - static uint32_t hash(const char *key, const struct key_def *) + static uint32_t hash(const char *key, struct key_def *) { uint32_t h = HASH_SEED; uint32_t carry = 0; @@ -116,7 +116,7 @@ struct KeyHash { template <> struct KeyHash<FIELD_TYPE_UNSIGNED> { - static uint32_t hash(const char *key, const struct key_def *key_def) + static uint32_t hash(const char *key, struct key_def *key_def) { uint64_t val = mp_decode_uint(&key); (void) key_def; @@ -152,12 +152,13 @@ template <int TYPE, int ...MORE_TYPES> struct TupleHash { static uint32_t hash(const struct tuple *tuple, - const struct key_def *key_def) + struct key_def *key_def) { uint32_t h = HASH_SEED; uint32_t carry = 0; uint32_t total_size = 0; - const char *field = tuple_field(tuple, key_def->parts->fieldno); + const char *field = + tuple_field_by_part(tuple, key_def->parts); TupleFieldHash<TYPE, MORE_TYPES...>:: hash(&field, &h, &carry, &total_size); return PMurHash32_Result(h, carry, total_size); @@ -167,9 +168,10 @@ struct TupleHash template <> struct TupleHash<FIELD_TYPE_UNSIGNED> { static uint32_t hash(const struct tuple *tuple, - const struct key_def *key_def) + struct key_def *key_def) { - const char *field = tuple_field(tuple, key_def->parts->fieldno); + const char *field = + tuple_field_by_part(tuple, key_def->parts); uint64_t val = mp_decode_uint(&field); if (likely(val <= UINT32_MAX)) return val; @@ -211,12 +213,12 @@ static const hasher_signature hash_arr[] = { #undef HASHER -template <bool has_optional_parts> +template <bool has_optional_parts, bool has_json_paths> uint32_t -tuple_hash_slowpath(const struct tuple *tuple, const struct key_def *key_def); +tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def); 
uint32_t -key_hash_slowpath(const char *key, const struct key_def *key_def); +key_hash_slowpath(const char *key, struct key_def *key_def); void tuple_hash_func_set(struct key_def *key_def) { @@ -255,9 +257,9 @@ tuple_hash_func_set(struct key_def *key_def) { slowpath: if (key_def->has_optional_parts) - key_def->tuple_hash = tuple_hash_slowpath<true>; + key_def->tuple_hash = tuple_hash_slowpath<true, false>; else - key_def->tuple_hash = tuple_hash_slowpath<false>; + key_def->tuple_hash = tuple_hash_slowpath<false, false>; key_def->key_hash = key_hash_slowpath; } @@ -308,26 +310,34 @@ tuple_hash_null(uint32_t *ph1, uint32_t *pcarry) } uint32_t -tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry, - const struct tuple *tuple, - const struct key_part *part) +tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry, const struct tuple *tuple, + struct key_part *part) { - const char *field = tuple_field(tuple, part->fieldno); + const char *field = tuple_field_by_part(tuple, part); if (field == NULL) return tuple_hash_null(ph1, pcarry); return tuple_hash_field(ph1, pcarry, &field, part->coll); } -template <bool has_optional_parts> +template <bool has_optional_parts, bool has_json_paths> uint32_t -tuple_hash_slowpath(const struct tuple *tuple, const struct key_def *key_def) +tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def) { assert(has_optional_parts == key_def->has_optional_parts); uint32_t h = HASH_SEED; uint32_t carry = 0; uint32_t total_size = 0; uint32_t prev_fieldno = key_def->parts[0].fieldno; - const char *field = tuple_field(tuple, key_def->parts[0].fieldno); + struct tuple_format *format = tuple_format(tuple); + const char *tuple_raw = tuple_data(tuple); + const uint32_t *field_map = tuple_field_map(tuple); + const char *field; + if (!has_json_paths) { + field = tuple_field(tuple, prev_fieldno); + } else { + field = tuple_field_by_part_raw(format, tuple_raw, field_map, + key_def->parts); + } const char *end = (char *)tuple + tuple_size(tuple); if 
(has_optional_parts && field == NULL) { total_size += tuple_hash_null(&h, &carry); @@ -341,7 +351,18 @@ tuple_hash_slowpath(const struct tuple *tuple, const struct key_def *key_def) * need of tuple_field */ if (prev_fieldno + 1 != key_def->parts[part_id].fieldno) { - field = tuple_field(tuple, key_def->parts[part_id].fieldno); + if (!has_json_paths) { + field = tuple_field(tuple, + key_def->parts[part_id]. + fieldno); + } else { + struct key_part *part = + &key_def->parts[part_id]; + field = tuple_field_by_part_raw(format, + tuple_raw, + field_map, + part); + } } if (has_optional_parts && (field == NULL || field >= end)) { total_size += tuple_hash_null(&h, &carry); @@ -357,13 +378,13 @@ tuple_hash_slowpath(const struct tuple *tuple, const struct key_def *key_def) } uint32_t -key_hash_slowpath(const char *key, const struct key_def *key_def) +key_hash_slowpath(const char *key, struct key_def *key_def) { uint32_t h = HASH_SEED; uint32_t carry = 0; uint32_t total_size = 0; - for (const struct key_part *part = key_def->parts; + for (struct key_part *part = key_def->parts; part < key_def->parts + key_def->part_count; part++) { total_size += tuple_hash_field(&h, &carry, &key, part->coll); } diff --git a/src/box/tuple_hash.h b/src/box/tuple_hash.h index aab8f54..abc961b 100644 --- a/src/box/tuple_hash.h +++ b/src/box/tuple_hash.h @@ -70,9 +70,8 @@ tuple_hash_field(uint32_t *ph1, uint32_t *pcarry, const char **field, * This function updates @ph1 and @pcarry. 
*/ uint32_t -tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry, - const struct tuple *tuple, - const struct key_part *part); +tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry, const struct tuple *tuple, + struct key_part *part); /** * Calculates a common hash value for a tuple @@ -81,7 +80,7 @@ tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry, * @return - hash value */ static inline uint32_t -tuple_hash(const struct tuple *tuple, const struct key_def *key_def) +tuple_hash(const struct tuple *tuple, struct key_def *key_def) { return key_def->tuple_hash(tuple, key_def); } @@ -93,7 +92,7 @@ tuple_hash(const struct tuple *tuple, const struct key_def *key_def) * @return - hash value */ static inline uint32_t -key_hash(const char *key, const struct key_def *key_def) +key_hash(const char *key, struct key_def *key_def) { return key_def->key_hash(key, key_def); } diff --git a/src/box/vinyl.c b/src/box/vinyl.c index 0b33b6f..86a33ec 100644 --- a/src/box/vinyl.c +++ b/src/box/vinyl.c @@ -557,7 +557,7 @@ vinyl_engine_check_space_def(struct space_def *def) static struct space * vinyl_engine_create_space(struct engine *engine, struct space_def *def, - struct rlist *key_list) + struct rlist *key_list, uint64_t epoch) { struct space *space = malloc(sizeof(*space)); if (space == NULL) { @@ -589,6 +589,7 @@ vinyl_engine_create_space(struct engine *engine, struct space_def *def, return NULL; } format->exact_field_count = def->exact_field_count; + format->epoch = ++epoch; tuple_format_ref(format); if (space_create(space, engine, &vinyl_space_vtab, diff --git a/src/box/vy_history.c b/src/box/vy_history.c index 498da97..0f3b711 100644 --- a/src/box/vy_history.c +++ b/src/box/vy_history.c @@ -73,7 +73,7 @@ vy_history_cleanup(struct vy_history *history) } int -vy_history_apply(struct vy_history *history, const struct key_def *cmp_def, +vy_history_apply(struct vy_history *history, struct key_def *cmp_def, struct tuple_format *format, bool keep_delete, int *upserts_applied, struct 
tuple **ret) { diff --git a/src/box/vy_history.h b/src/box/vy_history.h index 1f8bb59..e3c5a19 100644 --- a/src/box/vy_history.h +++ b/src/box/vy_history.h @@ -154,7 +154,7 @@ vy_history_cleanup(struct vy_history *history); * will return NULL unless @keep_delete flag is set. */ int -vy_history_apply(struct vy_history *history, const struct key_def *cmp_def, +vy_history_apply(struct vy_history *history, struct key_def *cmp_def, struct tuple_format *format, bool keep_delete, int *upserts_applied, struct tuple **ret); diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c index cb3c436..8fa86d3 100644 --- a/src/box/vy_lsm.c +++ b/src/box/vy_lsm.c @@ -158,6 +158,7 @@ vy_lsm_new(struct vy_lsm_env *lsm_env, struct vy_cache_env *cache_env, NULL); if (lsm->disk_format == NULL) goto fail_format; + lsm->disk_format->epoch = format->epoch; } tuple_format_ref(lsm->disk_format); @@ -166,6 +167,7 @@ vy_lsm_new(struct vy_lsm_env *lsm_env, struct vy_cache_env *cache_env, vy_tuple_format_new_with_colmask(format); if (lsm->mem_format_with_colmask == NULL) goto fail_mem_format_with_colmask; + lsm->mem_format_with_colmask->epoch = format->epoch; } else { lsm->mem_format_with_colmask = pk->mem_format_with_colmask; } diff --git a/src/box/vy_mem.c b/src/box/vy_mem.c index f9be850..ccd079f 100644 --- a/src/box/vy_mem.c +++ b/src/box/vy_mem.c @@ -97,7 +97,7 @@ vy_mem_tree_extent_free(void *ctx, void *p) struct vy_mem * vy_mem_new(struct vy_mem_env *env, int64_t generation, - const struct key_def *cmp_def, struct tuple_format *format, + struct key_def *cmp_def, struct tuple_format *format, struct tuple_format *format_with_colmask, uint32_t space_cache_version) { @@ -321,7 +321,7 @@ vy_mem_iterator_find_lsn(struct vy_mem_iterator *itr, { assert(!vy_mem_tree_iterator_is_invalid(&itr->curr_pos)); assert(itr->curr_stmt == vy_mem_iterator_curr_stmt(itr)); - const struct key_def *cmp_def = itr->mem->cmp_def; + struct key_def *cmp_def = itr->mem->cmp_def; while (vy_stmt_lsn(itr->curr_stmt) > 
(**itr->read_view).vlsn || vy_stmt_flags(itr->curr_stmt) & VY_STMT_SKIP_READ) { if (vy_mem_iterator_step(itr, iterator_type) != 0 || @@ -461,7 +461,7 @@ vy_mem_iterator_next_key(struct vy_mem_iterator *itr) assert(itr->mem->version == itr->version); assert(!vy_mem_tree_iterator_is_invalid(&itr->curr_pos)); assert(itr->curr_stmt == vy_mem_iterator_curr_stmt(itr)); - const struct key_def *cmp_def = itr->mem->cmp_def; + struct key_def *cmp_def = itr->mem->cmp_def; const struct tuple *prev_stmt = itr->curr_stmt; do { @@ -493,7 +493,7 @@ vy_mem_iterator_next_lsn(struct vy_mem_iterator *itr) assert(itr->mem->version == itr->version); assert(!vy_mem_tree_iterator_is_invalid(&itr->curr_pos)); assert(itr->curr_stmt == vy_mem_iterator_curr_stmt(itr)); - const struct key_def *cmp_def = itr->mem->cmp_def; + struct key_def *cmp_def = itr->mem->cmp_def; struct vy_mem_tree_iterator next_pos = itr->curr_pos; next: diff --git a/src/box/vy_mem.h b/src/box/vy_mem.h index 29b60ac..d6afeed 100644 --- a/src/box/vy_mem.h +++ b/src/box/vy_mem.h @@ -88,7 +88,7 @@ struct tree_mem_key { */ static int vy_mem_tree_cmp(const struct tuple *a, const struct tuple *b, - const struct key_def *cmp_def) + struct key_def *cmp_def) { int res = vy_tuple_compare(a, b, cmp_def); if (res) @@ -102,7 +102,7 @@ vy_mem_tree_cmp(const struct tuple *a, const struct tuple *b, */ static int vy_mem_tree_cmp_key(const struct tuple *a, struct tree_mem_key *key, - const struct key_def *cmp_def) + struct key_def *cmp_def) { int res = vy_stmt_compare(a, key->stmt, cmp_def); if (res == 0) { @@ -123,7 +123,7 @@ vy_mem_tree_cmp_key(const struct tuple *a, struct tree_mem_key *key, #define BPS_TREE_COMPARE_KEY(a, b, cmp_def) vy_mem_tree_cmp_key(a, b, cmp_def) #define bps_tree_elem_t const struct tuple * #define bps_tree_key_t struct tree_mem_key * -#define bps_tree_arg_t const struct key_def * +#define bps_tree_arg_t struct key_def * #define BPS_TREE_NO_DEBUG #include <salad/bps_tree.h> @@ -183,7 +183,7 @@ struct vy_mem { * 
Key definition for this index, extended with primary * key parts. */ - const struct key_def *cmp_def; + struct key_def *cmp_def; /** version is initially 0 and is incremented on every write */ uint32_t version; /** Data dictionary cache version at the time of creation. */ @@ -265,7 +265,7 @@ vy_mem_wait_pinned(struct vy_mem *mem) */ struct vy_mem * vy_mem_new(struct vy_mem_env *env, int64_t generation, - const struct key_def *cmp_def, struct tuple_format *format, + struct key_def *cmp_def, struct tuple_format *format, struct tuple_format *format_with_colmask, uint32_t space_cache_version); diff --git a/src/box/vy_range.c b/src/box/vy_range.c index 6a55a01..6b6d23c 100644 --- a/src/box/vy_range.c +++ b/src/box/vy_range.c @@ -178,7 +178,7 @@ vy_range_tree_find_by_key(vy_range_tree_t *tree, struct vy_range * vy_range_new(int64_t id, struct tuple *begin, struct tuple *end, - const struct key_def *cmp_def) + struct key_def *cmp_def) { struct vy_range *range = calloc(1, sizeof(*range)); if (range == NULL) { diff --git a/src/box/vy_range.h b/src/box/vy_range.h index d7031e7..2f67840 100644 --- a/src/box/vy_range.h +++ b/src/box/vy_range.h @@ -72,7 +72,7 @@ struct vy_range { * keys, to ensure an always distinct result for * non-unique keys. */ - const struct key_def *cmp_def; + struct key_def *cmp_def; /** An estimate of the number of statements in this range. */ struct vy_disk_stmt_counter count; /** @@ -194,7 +194,7 @@ vy_range_tree_find_by_key(vy_range_tree_t *tree, */ struct vy_range * vy_range_new(int64_t id, struct tuple *begin, struct tuple *end, - const struct key_def *cmp_def); + struct key_def *cmp_def); /** * Free a range and all its slices. 
diff --git a/src/box/vy_run.c b/src/box/vy_run.c index f107e3a..7485d97 100644 --- a/src/box/vy_run.c +++ b/src/box/vy_run.c @@ -302,8 +302,8 @@ vy_run_bloom_size(struct vy_run *run) */ static uint32_t vy_page_index_find_page(struct vy_run *run, const struct tuple *key, - const struct key_def *cmp_def, - enum iterator_type itype, bool *equal_key) + struct key_def *cmp_def, enum iterator_type itype, + bool *equal_key) { if (itype == ITER_EQ) itype = ITER_GE; /* One day it'll become obsolete */ @@ -365,9 +365,8 @@ vy_page_index_find_page(struct vy_run *run, const struct tuple *key, } struct vy_slice * -vy_slice_new(int64_t id, struct vy_run *run, - struct tuple *begin, struct tuple *end, - const struct key_def *cmp_def) +vy_slice_new(int64_t id, struct vy_run *run, struct tuple *begin, + struct tuple *end, struct key_def *cmp_def) { struct vy_slice *slice = malloc(sizeof(*slice)); if (slice == NULL) { @@ -446,9 +445,8 @@ vy_slice_delete(struct vy_slice *slice) } int -vy_slice_cut(struct vy_slice *slice, int64_t id, - struct tuple *begin, struct tuple *end, - const struct key_def *cmp_def, +vy_slice_cut(struct vy_slice *slice, int64_t id, struct tuple *begin, + struct tuple *end, struct key_def *cmp_def, struct vy_slice **result) { *result = NULL; @@ -1148,7 +1146,7 @@ vy_run_iterator_find_lsn(struct vy_run_iterator *itr, const struct tuple *key, struct tuple **ret) { struct vy_slice *slice = itr->slice; - const struct key_def *cmp_def = itr->cmp_def; + struct key_def *cmp_def = itr->cmp_def; *ret = NULL; @@ -1228,7 +1226,7 @@ vy_run_iterator_do_seek(struct vy_run_iterator *itr, *ret = NULL; struct tuple_bloom *bloom = run->info.bloom; - const struct key_def *key_def = itr->key_def; + struct key_def *key_def = itr->key_def; if (iterator_type == ITER_EQ && bloom != NULL) { bool need_lookup; if (vy_stmt_type(key) == IPROTO_SELECT) { @@ -1318,7 +1316,7 @@ vy_run_iterator_seek(struct vy_run_iterator *itr, enum iterator_type iterator_type, const struct tuple *key, struct 
tuple **ret) { - const struct key_def *cmp_def = itr->cmp_def; + struct key_def *cmp_def = itr->cmp_def; struct vy_slice *slice = itr->slice; const struct tuple *check_eq_key = NULL; int cmp; @@ -1392,8 +1390,7 @@ vy_run_iterator_open(struct vy_run_iterator *itr, struct vy_run_iterator_stat *stat, struct vy_slice *slice, enum iterator_type iterator_type, const struct tuple *key, const struct vy_read_view **rv, - const struct key_def *cmp_def, - const struct key_def *key_def, + struct key_def *cmp_def, struct key_def *key_def, struct tuple_format *format, bool is_primary) { @@ -1729,7 +1726,7 @@ fail: /* dump statement to the run page buffers (stmt header and data) */ static int vy_run_dump_stmt(const struct tuple *value, struct xlog *data_xlog, - struct vy_page_info *info, const struct key_def *key_def, + struct vy_page_info *info, struct key_def *key_def, bool is_primary) { struct xrow_header xrow; @@ -2019,9 +2016,9 @@ fail: int vy_run_writer_create(struct vy_run_writer *writer, struct vy_run *run, - const char *dirpath, uint32_t space_id, uint32_t iid, - const struct key_def *cmp_def, const struct key_def *key_def, - uint64_t page_size, double bloom_fpr) + const char *dirpath, uint32_t space_id, uint32_t iid, + struct key_def *cmp_def, struct key_def *key_def, + uint64_t page_size, double bloom_fpr) { memset(writer, 0, sizeof(*writer)); writer->run = run; @@ -2285,10 +2282,8 @@ vy_run_writer_abort(struct vy_run_writer *writer) int vy_run_rebuild_index(struct vy_run *run, const char *dir, uint32_t space_id, uint32_t iid, - const struct key_def *cmp_def, - const struct key_def *key_def, - struct tuple_format *format, - const struct index_opts *opts) + struct key_def *cmp_def, struct key_def *key_def, + struct tuple_format *format, const struct index_opts *opts) { assert(run->info.bloom == NULL); assert(run->page_info == NULL); @@ -2628,7 +2623,7 @@ static const struct vy_stmt_stream_iface vy_slice_stream_iface = { void vy_slice_stream_open(struct vy_slice_stream 
*stream, struct vy_slice *slice, - const struct key_def *cmp_def, struct tuple_format *format, + struct key_def *cmp_def, struct tuple_format *format, bool is_primary) { stream->base.iface = &vy_slice_stream_iface; diff --git a/src/box/vy_run.h b/src/box/vy_run.h index 5030886..d74f216 100644 --- a/src/box/vy_run.h +++ b/src/box/vy_run.h @@ -218,9 +218,9 @@ struct vy_run_iterator { /* Members needed for memory allocation and disk access */ /** Key definition used for comparing statements on disk. */ - const struct key_def *cmp_def; + struct key_def *cmp_def; /** Key definition provided by the user. */ - const struct key_def *key_def; + struct key_def *key_def; /** * Format ot allocate REPLACE and DELETE tuples read from * pages. @@ -370,8 +370,7 @@ vy_run_recover(struct vy_run *run, const char *dir, int vy_run_rebuild_index(struct vy_run *run, const char *dir, uint32_t space_id, uint32_t iid, - const struct key_def *cmp_def, - const struct key_def *key_def, + struct key_def *cmp_def, struct key_def *key_def, struct tuple_format *format, const struct index_opts *opts); @@ -428,9 +427,8 @@ vy_run_remove_files(const char *dir, uint32_t space_id, * This function increments @run->refs. */ struct vy_slice * -vy_slice_new(int64_t id, struct vy_run *run, - struct tuple *begin, struct tuple *end, - const struct key_def *cmp_def); +vy_slice_new(int64_t id, struct vy_run *run, struct tuple *begin, + struct tuple *end, struct key_def *cmp_def); /** * Free a run slice. @@ -480,9 +478,8 @@ vy_slice_wait_pinned(struct vy_slice *slice) * with [@begin, @end), @result is set to NULL. 
*/ int -vy_slice_cut(struct vy_slice *slice, int64_t id, - struct tuple *begin, struct tuple *end, - const struct key_def *cmp_def, +vy_slice_cut(struct vy_slice *slice, int64_t id, struct tuple *begin, + struct tuple *end, struct key_def *cmp_def, struct vy_slice **result); /** @@ -496,8 +493,7 @@ vy_run_iterator_open(struct vy_run_iterator *itr, struct vy_run_iterator_stat *stat, struct vy_slice *slice, enum iterator_type iterator_type, const struct tuple *key, const struct vy_read_view **rv, - const struct key_def *cmp_def, - const struct key_def *key_def, + struct key_def *cmp_def, struct key_def *key_def, struct tuple_format *format, bool is_primary); /** @@ -547,7 +543,7 @@ struct vy_slice_stream { * Key def for comparing with slice boundaries, * includes secondary key parts. */ - const struct key_def *cmp_def; + struct key_def *cmp_def; /** Format for allocating REPLACE and DELETE tuples read from pages. */ struct tuple_format *format; /** Set if this iterator is for a primary index. */ @@ -559,7 +555,7 @@ struct vy_slice_stream { */ void vy_slice_stream_open(struct vy_slice_stream *stream, struct vy_slice *slice, - const struct key_def *cmp_def, struct tuple_format *format, + struct key_def *cmp_def, struct tuple_format *format, bool is_primary); /** @@ -580,9 +576,9 @@ struct vy_run_writer { * min key, run min/max keys, and secondary index * statements. */ - const struct key_def *cmp_def; + struct key_def *cmp_def; /** Key definition to calculate bloom. */ - const struct key_def *key_def; + struct key_def *key_def; /** * Minimal page size. When a page becames bigger, it is * dumped. @@ -610,9 +606,9 @@ struct vy_run_writer { /** Create a run writer to fill a run with statements. 
*/ int vy_run_writer_create(struct vy_run_writer *writer, struct vy_run *run, - const char *dirpath, uint32_t space_id, uint32_t iid, - const struct key_def *cmp_def, const struct key_def *key_def, - uint64_t page_size, double bloom_fpr); + const char *dirpath, uint32_t space_id, uint32_t iid, + struct key_def *cmp_def, struct key_def *key_def, + uint64_t page_size, double bloom_fpr); /** * Write a specified statement into a run. diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c index 37da282..8018dee 100644 --- a/src/box/vy_stmt.c +++ b/src/box/vy_stmt.c @@ -387,8 +387,7 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type, } struct tuple * -vy_stmt_new_surrogate_delete_from_key(const char *key, - const struct key_def *cmp_def, +vy_stmt_new_surrogate_delete_from_key(const char *key, struct key_def *cmp_def, struct tuple_format *format) { return vy_stmt_new_surrogate_from_key(key, IPROTO_DELETE, @@ -457,7 +456,7 @@ vy_stmt_new_surrogate_delete_raw(struct tuple_format *format, } struct tuple * -vy_stmt_extract_key(const struct tuple *stmt, const struct key_def *key_def, +vy_stmt_extract_key(const struct tuple *stmt, struct key_def *key_def, struct tuple_format *format) { struct region *region = &fiber()->gc; @@ -475,7 +474,7 @@ vy_stmt_extract_key(const struct tuple *stmt, const struct key_def *key_def, struct tuple * vy_stmt_extract_key_raw(const char *data, const char *data_end, - const struct key_def *key_def, + struct key_def *key_def, struct tuple_format *format) { struct region *region = &fiber()->gc; @@ -543,9 +542,8 @@ vy_stmt_meta_decode(struct request *request, struct tuple *stmt) } int -vy_stmt_encode_primary(const struct tuple *value, - const struct key_def *key_def, uint32_t space_id, - struct xrow_header *xrow) +vy_stmt_encode_primary(const struct tuple *value, struct key_def *key_def, + uint32_t space_id, struct xrow_header *xrow) { memset(xrow, 0, sizeof(*xrow)); enum iproto_type type = vy_stmt_type(value); @@ -591,8 +589,7 @@ 
vy_stmt_encode_primary(const struct tuple *value, } int -vy_stmt_encode_secondary(const struct tuple *value, - const struct key_def *cmp_def, +vy_stmt_encode_secondary(const struct tuple *value, struct key_def *cmp_def, struct xrow_header *xrow) { memset(xrow, 0, sizeof(*xrow)); diff --git a/src/box/vy_stmt.h b/src/box/vy_stmt.h index 273d5e8..b52b4e2 100644 --- a/src/box/vy_stmt.h +++ b/src/box/vy_stmt.h @@ -331,7 +331,7 @@ vy_stmt_unref_if_possible(struct tuple *stmt) */ static inline int vy_key_compare(const struct tuple *a, const struct tuple *b, - const struct key_def *cmp_def) + struct key_def *cmp_def) { assert(vy_stmt_type(a) == IPROTO_SELECT); assert(vy_stmt_type(b) == IPROTO_SELECT); @@ -352,7 +352,7 @@ vy_key_compare(const struct tuple *a, const struct tuple *b, */ static inline int vy_tuple_compare(const struct tuple *a, const struct tuple *b, - const struct key_def *cmp_def) + struct key_def *cmp_def) { enum iproto_type type; type = vy_stmt_type(a); @@ -381,7 +381,7 @@ vy_tuple_compare(const struct tuple *a, const struct tuple *b, */ static inline int vy_tuple_compare_with_raw_key(const struct tuple *tuple, const char *key, - const struct key_def *key_def) + struct key_def *key_def) { uint32_t part_count = mp_decode_array(&key); return tuple_compare_with_key(tuple, key, part_count, key_def); @@ -390,7 +390,7 @@ vy_tuple_compare_with_raw_key(const struct tuple *tuple, const char *key, /** @sa vy_tuple_compare_with_raw_key(). */ static inline int vy_tuple_compare_with_key(const struct tuple *tuple, const struct tuple *key, - const struct key_def *key_def) + struct key_def *key_def) { const char *key_mp = tuple_data(key); uint32_t part_count = mp_decode_array(&key_mp); @@ -400,7 +400,7 @@ vy_tuple_compare_with_key(const struct tuple *tuple, const struct tuple *key, /** @sa tuple_compare. 
*/ static inline int vy_stmt_compare(const struct tuple *a, const struct tuple *b, - const struct key_def *key_def) + struct key_def *key_def) { bool a_is_tuple = vy_stmt_type(a) != IPROTO_SELECT; bool b_is_tuple = vy_stmt_type(b) != IPROTO_SELECT; @@ -419,7 +419,7 @@ vy_stmt_compare(const struct tuple *a, const struct tuple *b, /** @sa tuple_compare_with_raw_key. */ static inline int vy_stmt_compare_with_raw_key(const struct tuple *stmt, const char *key, - const struct key_def *key_def) + struct key_def *key_def) { if (vy_stmt_type(stmt) != IPROTO_SELECT) return vy_tuple_compare_with_raw_key(stmt, key, key_def); @@ -429,7 +429,7 @@ vy_stmt_compare_with_raw_key(const struct tuple *stmt, const char *key, /** @sa tuple_compare_with_key. */ static inline int vy_stmt_compare_with_key(const struct tuple *stmt, const struct tuple *key, - const struct key_def *key_def) + struct key_def *key_def) { assert(vy_stmt_type(key) == IPROTO_SELECT); return vy_stmt_compare_with_raw_key(stmt, tuple_data(key), key_def); @@ -476,7 +476,7 @@ vy_key_dup(const char *key); */ struct tuple * vy_stmt_new_surrogate_delete_from_key(const char *key, - const struct key_def *cmp_def, + struct key_def *cmp_def, struct tuple_format *format); /** @@ -628,7 +628,7 @@ vy_key_from_msgpack(struct tuple_format *format, const char *key) * malloc(). 
*/ struct tuple * -vy_stmt_extract_key(const struct tuple *stmt, const struct key_def *key_def, +vy_stmt_extract_key(const struct tuple *stmt, struct key_def *key_def, struct tuple_format *format); /** @@ -638,7 +638,7 @@ vy_stmt_extract_key(const struct tuple *stmt, const struct key_def *key_def, */ struct tuple * vy_stmt_extract_key_raw(const char *data, const char *data_end, - const struct key_def *key_def, + struct key_def *key_def, struct tuple_format *format); /** @@ -654,9 +654,8 @@ vy_stmt_extract_key_raw(const char *data, const char *data_end, * @retval -1 if error */ int -vy_stmt_encode_primary(const struct tuple *value, - const struct key_def *key_def, uint32_t space_id, - struct xrow_header *xrow); +vy_stmt_encode_primary(const struct tuple *value, struct key_def *key_def, + uint32_t space_id, struct xrow_header *xrow); /** * Encode vy_stmt for a secondary key as xrow_header @@ -669,8 +668,7 @@ vy_stmt_encode_primary(const struct tuple *value, * @retval -1 if error */ int -vy_stmt_encode_secondary(const struct tuple *value, - const struct key_def *cmp_def, +vy_stmt_encode_secondary(const struct tuple *value, struct key_def *cmp_def, struct xrow_header *xrow); /** @@ -716,10 +714,15 @@ vy_tuple_format_new_with_colmask(struct tuple_format *mem_format); * @retval Does the key contain NULL or not? 
*/ static inline bool -vy_tuple_key_contains_null(const struct tuple *tuple, const struct key_def *def) +vy_tuple_key_contains_null(const struct tuple *tuple, struct key_def *def) { - for (uint32_t i = 0; i < def->part_count; ++i) { - const char *field = tuple_field(tuple, def->parts[i].fieldno); + struct tuple_format *format = tuple_format(tuple); + const char *data = tuple_data(tuple); + const uint32_t *field_map = tuple_field_map(tuple); + for (struct key_part *part = def->parts, *end = part + def->part_count; + part < end; ++part) { + const char *field = + tuple_field_by_part_raw(format, data, field_map, part); if (field == NULL || mp_typeof(*field) == MP_NIL) return true; } diff --git a/src/box/vy_upsert.c b/src/box/vy_upsert.c index 7af58d9..ebea278 100644 --- a/src/box/vy_upsert.c +++ b/src/box/vy_upsert.c @@ -88,7 +88,7 @@ vy_upsert_try_to_squash(struct tuple_format *format, struct region *region, struct tuple * vy_apply_upsert(const struct tuple *new_stmt, const struct tuple *old_stmt, - const struct key_def *cmp_def, struct tuple_format *format, + struct key_def *cmp_def, struct tuple_format *format, bool suppress_error) { /* diff --git a/src/box/vy_upsert.h b/src/box/vy_upsert.h index 7878b1b..5649961 100644 --- a/src/box/vy_upsert.h +++ b/src/box/vy_upsert.h @@ -65,7 +65,7 @@ struct tuple_format; */ struct tuple * vy_apply_upsert(const struct tuple *new_stmt, const struct tuple *old_stmt, - const struct key_def *cmp_def, struct tuple_format *format, + struct key_def *cmp_def, struct tuple_format *format, bool suppress_error); #if defined(__cplusplus) diff --git a/src/box/vy_write_iterator.c b/src/box/vy_write_iterator.c index df9c933..5252a1a 100644 --- a/src/box/vy_write_iterator.c +++ b/src/box/vy_write_iterator.c @@ -166,7 +166,7 @@ struct vy_write_iterator { /* A heap to order the sources, newest LSN at heap top. */ heap_t src_heap; /** Index key definition used to store statements on disk. 
*/ - const struct key_def *cmp_def; + struct key_def *cmp_def; /** Format to allocate new REPLACE and DELETE tuples from vy_run */ struct tuple_format *format; /* There is no LSM tree level older than the one we're writing to. */ @@ -340,9 +340,9 @@ static const struct vy_stmt_stream_iface vy_slice_stream_iface; * @return the iterator or NULL on error (diag is set). */ struct vy_stmt_stream * -vy_write_iterator_new(const struct key_def *cmp_def, - struct tuple_format *format, bool is_primary, - bool is_last_level, struct rlist *read_views, +vy_write_iterator_new(struct key_def *cmp_def, struct tuple_format *format, + bool is_primary, bool is_last_level, + struct rlist *read_views, struct vy_deferred_delete_handler *handler) { /* diff --git a/src/box/vy_write_iterator.h b/src/box/vy_write_iterator.h index 5214b60..ffdc584 100644 --- a/src/box/vy_write_iterator.h +++ b/src/box/vy_write_iterator.h @@ -269,9 +269,9 @@ struct vy_deferred_delete_handler { * @return the iterator or NULL on error (diag is set). */ struct vy_stmt_stream * -vy_write_iterator_new(const struct key_def *cmp_def, - struct tuple_format *format, bool is_primary, - bool is_last_level, struct rlist *read_views, +vy_write_iterator_new(struct key_def *cmp_def, struct tuple_format *format, + bool is_primary, bool is_last_level, + struct rlist *read_views, struct vy_deferred_delete_handler *handler); /** -- 2.7.4 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [tarantool-patches] Re: [PATCH v3 2/4] box: introduce slot_cache in key_part 2018-09-06 12:47 ` Kirill Shcherbatov @ 2018-09-17 17:08 ` Vladimir Davydov 0 siblings, 0 replies; 14+ messages in thread From: Vladimir Davydov @ 2018-09-17 17:08 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: Vladislav Shpilevoy, Tarantool MailList It looks like you plucked some random parts from the main patch just to make it look smaller. Please don't do that, because it doesn't make it any easier to review. Split a patch set so that each individual patch makes sense and can be applied independently. I see at least two separate things you're doing here that could be committed after the first review iteration: 1. Deconstantifying key_def throughout the code. 2. Introducing the tuple_field_by_part helper. Please factor out those parts into separate patches and write a brief descriptive comment for each of them (what you're doing in a nutshell and why). Until you do that, I can't review this patch without the risk of missing something critical. A few things that I did manage to notice: - You add a has_json_path template parameter to the comparators. This looks completely out of place. I think this change should go along with the patch that implements JSON path indexes. - Although offset_slot and offset_slot_epoch are added to key_part, they are never used. It looks like this should also go into another patch. Maybe it's worth factoring out the 'epoch' optimization into a separate patch applied on top of JSON indexes - after all, it's just an optimization, right? - The epoch is really easy to miss when creating a tuple format: > @@ -158,6 +158,7 @@ blackhole_engine_create_space(struct engine *engine, struct space_def *def, > return NULL; > } > format->exact_field_count = def->exact_field_count; > + format->epoch = ++epoch; I think it should be passed as an argument to tuple_format_new. 
Also, IMO it'd be more straightforward to pass the actual epoch that should be used for the new format, not the previous one. - In tuple_common_key_parts the following type cast isn't needed: > + struct key_part *part = (struct key_part *)&key_def->parts[i]; On Thu, Sep 06, 2018 at 03:47:09PM +0300, Kirill Shcherbatov wrote: > From 996ee351112bb070d511636bc702496bc445f047 Mon Sep 17 00:00:00 2001 > Message-Id: <996ee351112bb070d511636bc702496bc445f047.1536237903.git.kshcherbatov@tarantool.org> > In-Reply-To: <cover.1536237903.git.kshcherbatov@tarantool.org> > References: <cover.1536237903.git.kshcherbatov@tarantool.org> > From: Kirill Shcherbatov <kshcherbatov@tarantool.org> > Date: Thu, 9 Aug 2018 15:02:44 +0300 > Subject: [PATCH 2/4] box: introduce slot_cache in key_part > > The same key_part could be used in multiple formats, so a different > field->offset_slot would be allocated for each. In most scenarios we > work with a series of tuples of the same format, and (in general) a > format lookup for a field would be an expensive operation for > JSON paths defined in a key_part. A new slot_cache field in the > key_part structure and an epoch-based mechanism to validate its > freshness should be an effective approach to improving performance. > The new routine tuple_field_by_part uses a tuple and a key_part to > access a field, which allows reworking and speeding up all scenarios > of accessing tuple data by index. This also makes it possible to > support JSON-path key_parts later. > > Part of #1012. 
> --- > src/box/alter.cc | 6 +- > src/box/blackhole.c | 3 +- > src/box/engine.h | 11 ++-- > src/box/key_def.c | 16 +++-- > src/box/key_def.h | 38 +++++++----- > src/box/memtx_bitset.c | 5 +- > src/box/memtx_engine.c | 4 +- > src/box/memtx_hash.h | 4 +- > src/box/memtx_rtree.c | 3 +- > src/box/memtx_space.c | 3 +- > src/box/memtx_space.h | 2 +- > src/box/schema.cc | 4 +- > src/box/space.c | 4 +- > src/box/space.h | 8 ++- > src/box/sysview.c | 3 +- > src/box/tuple.h | 14 +++++ > src/box/tuple_bloom.c | 8 +-- > src/box/tuple_bloom.h | 8 +-- > src/box/tuple_compare.cc | 141 ++++++++++++++++++++++++++----------------- > src/box/tuple_compare.h | 5 +- > src/box/tuple_extract_key.cc | 53 +++++++++------- > src/box/tuple_format.c | 12 ++++ > src/box/tuple_format.h | 18 ++++++ > src/box/tuple_hash.cc | 63 ++++++++++++------- > src/box/tuple_hash.h | 9 ++- > src/box/vinyl.c | 3 +- > src/box/vy_history.c | 2 +- > src/box/vy_history.h | 2 +- > src/box/vy_lsm.c | 2 + > src/box/vy_mem.c | 8 +-- > src/box/vy_mem.h | 10 +-- > src/box/vy_range.c | 2 +- > src/box/vy_range.h | 4 +- > src/box/vy_run.c | 39 ++++++------ > src/box/vy_run.h | 34 +++++------ > src/box/vy_stmt.c | 15 ++--- > src/box/vy_stmt.h | 39 ++++++------ > src/box/vy_upsert.c | 2 +- > src/box/vy_upsert.h | 2 +- > src/box/vy_write_iterator.c | 8 +-- > src/box/vy_write_iterator.h | 6 +- > 41 files changed, 374 insertions(+), 249 deletions(-) ^ permalink raw reply [flat|nested] 14+ messages in thread
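The slot_cache/epoch mechanism discussed in the review above can be sketched as follows. This is a hypothetical illustration, not the actual Tarantool code: the names format_sketch, key_part_sketch, and part_offset_slot are invented for the example, and the real lookup replaced here by a plain array access is a format field-tree traversal.

```c
#include <assert.h>
#include <stdint.h>

/* A format generation counter: bumped whenever a space's format is
 * rebuilt, which may reassign field offset slots. */
struct format_sketch {
	uint64_t epoch;
	int32_t field_slot[8]; /* fieldno -> offset slot (slow lookup). */
};

struct key_part_sketch {
	uint32_t fieldno;
	/* Cached result of the last slot lookup and the format epoch
	 * the cached value belongs to. */
	int32_t offset_slot;
	uint64_t offset_slot_epoch;
};

/* Return the offset slot for this part, re-resolving it only when the
 * cached value was taken from an older format epoch. */
static int32_t
part_offset_slot(struct key_part_sketch *part, const struct format_sketch *f)
{
	if (part->offset_slot_epoch != f->epoch) {
		part->offset_slot = f->field_slot[part->fieldno];
		part->offset_slot_epoch = f->epoch;
	}
	return part->offset_slot;
}
```

Note that the cache lives in the key_part and is mutated on every miss, which is also why the patch drops the const qualifier from key_def throughout the comparator and iterator code.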
* [tarantool-patches] [PATCH v3 3/4] box: introduce JSON indexes 2018-08-27 7:37 [tarantool-patches] [PATCH v3 0/4] box: indexes by JSON path Kirill Shcherbatov 2018-08-27 7:37 ` [tarantool-patches] [PATCH v3 1/4] rfc: describe a Tarantool JSON indexes Kirill Shcherbatov 2018-08-27 7:37 ` [tarantool-patches] [PATCH v3 2/4] box: introduce slot_cache in key_part Kirill Shcherbatov @ 2018-08-27 7:37 ` Kirill Shcherbatov 2018-09-03 10:32 ` [tarantool-patches] " Vladislav Shpilevoy 2018-08-27 7:37 ` [tarantool-patches] [PATCH v3 4/4] box: specify indexes in user-friendly form Kirill Shcherbatov 2018-09-17 15:50 ` [tarantool-patches] [PATCH v3 0/4] box: indexes by JSON path Vladimir Davydov 4 siblings, 1 reply; 14+ messages in thread From: Kirill Shcherbatov @ 2018-08-27 7:37 UTC (permalink / raw) To: tarantool-patches; +Cc: v.shpilevoy, Kirill Shcherbatov As we need to store a user-defined JSON path in key_part and key_part_def, we have introduced path and path_len fields. The JSON path is verified and transformed to canonical form when the index msgpack is unpacked. The path string is stored as part of the key_def allocation: +-------+---------+-------+---------+-------+-------+-------+ |key_def|key_part1| ... |key_partN| path1 | pathK | pathN | +-------+---------+-------+---------+-------+-------+-------+ | ^ |-> path _________________| Because field names specified in the format can be changed, a key_part path persisted in Tarantool must always start with a first-level field access via an array index (not by name). To work with JSON-defined indexes we use the format's JSON path hashtable data_path and a tree of intermediate path fields attached to the format's root fields. <Hashtable>: format->data_path [2].FIO.fname -> field "fname" {type=str, off_slot=-1} [2].FIO.sname -> field "sname" {type=str, off_slot=-2} <Tree>: format->field[2] {type = map} | FIO {type = map} | "fname" | "sname" {type=str,off_slot=-1} ____|____ {type = str,off_slot=-2} Leaf fields used in an index have an initialized offset_slot. 
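The single-allocation layout in the diagram above can be illustrated with the following sketch. It is a simplified stand-in with invented names (key_def_sketch, part_sketch, def_new), not the actual key_def_new_with_parts from the patch: the header, the part array, and the path strings all live in one malloc() block, and each part->path points into the tail of that block.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct part_sketch {
	uint32_t fieldno;
	char *path;       /* Points into the tail of the same allocation. */
	uint32_t path_len;
};

struct key_def_sketch {
	uint32_t part_count;
	struct part_sketch parts[];
};

/* Total allocation size: header + part array + trailing path bytes. */
static size_t
def_sizeof(uint32_t part_count, size_t paths_size)
{
	return sizeof(struct key_def_sketch) +
	       part_count * sizeof(struct part_sketch) + paths_size;
}

/* Build a definition whose path strings live inside the same block,
 * mirroring the cover letter's diagram. paths[i] == NULL means the
 * part has no JSON path. */
static struct key_def_sketch *
def_new(const char **paths, uint32_t part_count)
{
	size_t paths_size = 0;
	for (uint32_t i = 0; i < part_count; i++)
		paths_size += paths[i] != NULL ? strlen(paths[i]) + 1 : 0;
	struct key_def_sketch *def =
		calloc(1, def_sizeof(part_count, paths_size));
	if (def == NULL)
		return NULL;
	def->part_count = part_count;
	/* The path area starts right after the last key_part. */
	char *data = (char *)def + def_sizeof(part_count, 0);
	for (uint32_t i = 0; i < part_count; i++) {
		if (paths[i] == NULL)
			continue;
		def->parts[i].path_len = strlen(paths[i]);
		def->parts[i].path = data;
		memcpy(data, paths[i], def->parts[i].path_len + 1);
		data += def->parts[i].path_len + 1;
	}
	return def;
}
```

Because the path pointers are interior to the allocation, a duplicate of such a definition has to rebase them by their byte offset into the new block, which is what the key_def_dup hunk in this patch does.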
On new tuple creation we traverse the fields tree and use the leaf records to initialize the tuple field_map. At the same time we use the data_path hashtable on tuple data access by index (when the cached offset_slot is invalid). All paths are stored at the end of the format allocation; both the JSON-tree fields and format->path_hash point to them. +------------+------------+-------+------------+-------+ |tuple_format|tuple_field1| ... |tuple_fieldN| pathK | +------------+------------+-------+------------+-------+ The new routine tuple_format_add_json_path is used to construct all internal structures for a JSON path on format creation and duplication. Paths are deduplicated within the format allocation. Part of #1012. --- src/box/errcode.h | 2 +- src/box/index_def.c | 10 +- src/box/key_def.c | 294 +++++++++++++++-- src/box/key_def.h | 31 +- src/box/lua/space.cc | 5 + src/box/memtx_engine.c | 5 + src/box/schema.cc | 12 +- src/box/tuple.c | 11 +- src/box/tuple_compare.cc | 56 +++- src/box/tuple_extract_key.cc | 120 ++++--- src/box/tuple_format.c | 765 ++++++++++++++++++++++++++++++++++++++----- src/box/tuple_format.h | 62 +++- src/box/tuple_hash.cc | 18 +- src/box/vinyl.c | 5 + src/box/vy_log.c | 3 +- src/box/vy_lsm.c | 44 +++ src/box/vy_point_lookup.c | 2 - src/box/vy_stmt.c | 124 +++++-- test/box/misc.result | 57 ++-- test/engine/tuple.result | 316 ++++++++++++++++++ test/engine/tuple.test.lua | 89 +++++ 21 files changed, 1791 insertions(+), 240 deletions(-) diff --git a/src/box/errcode.h b/src/box/errcode.h index 4115e6b..464f413 100644 --- a/src/box/errcode.h +++ b/src/box/errcode.h @@ -107,7 +107,7 @@ struct errcode_record { /* 52 */_(ER_FUNCTION_EXISTS, "Function '%s' already exists") \ /* 53 */_(ER_BEFORE_REPLACE_RET, "Invalid return value of space:before_replace trigger: expected tuple or nil, got %s") \ /* 54 */_(ER_FUNCTION_MAX, "A limit on the total number of functions has been reached: %u") \ - /* 55 */_(ER_UNUSED4, "") \ + /* 55 */_(ER_DATA_STRUCTURE_MISMATCH, "Tuple doesn't match document structure: %s") 
\ /* 56 */_(ER_USER_MAX, "A limit on the total number of users has been reached: %u") \ /* 57 */_(ER_NO_SUCH_ENGINE, "Space engine '%s' does not exist") \ /* 58 */_(ER_RELOAD_CFG, "Can't set option '%s' dynamically") \ diff --git a/src/box/index_def.c b/src/box/index_def.c index 9cda63c..f67b952 100644 --- a/src/box/index_def.c +++ b/src/box/index_def.c @@ -209,8 +209,14 @@ index_def_is_valid(struct index_def *index_def, const char *space_name) * Courtesy to a user who could have made * a typo. */ - if (index_def->key_def->parts[i].fieldno == - index_def->key_def->parts[j].fieldno) { + struct key_part *part_a = &index_def->key_def->parts[i]; + struct key_part *part_b = &index_def->key_def->parts[j]; + if ((part_a->fieldno == part_b->fieldno && + part_a->path == NULL && part_b->path == NULL) || + (part_a->path_len != 0 && + part_a->path_len == part_b->path_len && + memcmp(part_a->path, part_b->path, + part_a->path_len) == 0)) { diag_set(ClientError, ER_MODIFY_INDEX, index_def->name, space_name, "same key part is indexed twice"); diff --git a/src/box/key_def.c b/src/box/key_def.c index 440d41e..cf1169e 100644 --- a/src/box/key_def.c +++ b/src/box/key_def.c @@ -35,12 +35,16 @@ #include "column_mask.h" #include "schema_def.h" #include "coll_id_cache.h" +#include "fiber.h" +#include "assoc.h" +#include "json/path.h" static const struct key_part_def key_part_def_default = { 0, field_type_MAX, COLL_NONE, false, + NULL }; static int64_t @@ -53,6 +57,7 @@ part_type_by_name_wrapper(const char *str, uint32_t len) #define PART_OPT_FIELD "field" #define PART_OPT_COLLATION "collation" #define PART_OPT_NULLABILITY "is_nullable" +#define PART_OPT_PATH "path" const struct opt_def part_def_reg[] = { OPT_DEF_ENUM(PART_OPT_TYPE, field_type, struct key_part_def, type, @@ -61,6 +66,7 @@ const struct opt_def part_def_reg[] = { OPT_DEF(PART_OPT_COLLATION, OPT_UINT32, struct key_part_def, coll_id), OPT_DEF(PART_OPT_NULLABILITY, OPT_BOOL, struct key_part_def, is_nullable), + 
OPT_DEF(PART_OPT_PATH, OPT_STRPTR, struct key_part_def, path), OPT_END, }; @@ -96,13 +102,24 @@ const uint32_t key_mp_type[] = { struct key_def * key_def_dup(const struct key_def *src) { - size_t sz = key_def_sizeof(src->part_count); + const struct key_part *parts = src->parts; + const struct key_part *parts_end = parts + src->part_count; + size_t sz = 0; + for (; parts < parts_end; parts++) + sz += parts->path != NULL ? parts->path_len + 1 : 0; + sz = key_def_sizeof(src->part_count, sz); struct key_def *res = (struct key_def *)malloc(sz); if (res == NULL) { diag_set(OutOfMemory, sz, "malloc", "res"); return NULL; } memcpy(res, src, sz); + for (uint32_t i = 0; i < src->part_count; i++) { + if (src->parts[i].path == NULL) + continue; + size_t path_offset = src->parts[i].path - (char *)src; + res->parts[i].path = (char *)res + path_offset; + } return res; } @@ -110,8 +127,17 @@ void key_def_swap(struct key_def *old_def, struct key_def *new_def) { assert(old_def->part_count == new_def->part_count); - for (uint32_t i = 0; i < new_def->part_count; i++) - SWAP(old_def->parts[i], new_def->parts[i]); + for (uint32_t i = 0; i < new_def->part_count; i++) { + if (old_def->parts[i].path == NULL) { + SWAP(old_def->parts[i], new_def->parts[i]); + } else { + size_t path_offset = + old_def->parts[i].path - (char *)old_def; + SWAP(old_def->parts[i], new_def->parts[i]); + old_def->parts[i].path = (char *)old_def + path_offset; + new_def->parts[i].path = (char *)new_def + path_offset; + } + } SWAP(*old_def, *new_def); } @@ -131,9 +157,9 @@ key_def_set_cmp(struct key_def *def) } struct key_def * -key_def_new(uint32_t part_count) +key_def_new(uint32_t part_count, size_t paths_size) { - size_t sz = key_def_sizeof(part_count); + size_t sz = key_def_sizeof(part_count, paths_size); /** Use calloc() to zero comparator function pointers. 
*/ struct key_def *key_def = (struct key_def *) calloc(1, sz); if (key_def == NULL) { @@ -148,10 +174,13 @@ key_def_new(uint32_t part_count) struct key_def * key_def_new_with_parts(struct key_part_def *parts, uint32_t part_count) { - struct key_def *def = key_def_new(part_count); + size_t sz = 0; + for (uint32_t i = 0; i < part_count; i++) + sz += parts[i].path != NULL ? strlen(parts[i].path) + 1 : 0; + struct key_def *def = key_def_new(part_count, sz); if (def == NULL) return NULL; - + char *data = (char *)def + key_def_sizeof(part_count, 0); for (uint32_t i = 0; i < part_count; i++) { struct key_part_def *part = &parts[i]; struct coll *coll = NULL; @@ -165,14 +194,22 @@ key_def_new_with_parts(struct key_part_def *parts, uint32_t part_count) } coll = coll_id->coll; } + uint32_t path_len = 0; + if (part->path != NULL) { + path_len = strlen(part->path); + def->parts[i].path = data; + data += path_len + 1; + } key_def_set_part(def, i, part->fieldno, part->type, - part->is_nullable, coll, part->coll_id); + part->is_nullable, coll, part->coll_id, + part->path, path_len); } return def; } -void -key_def_dump_parts(const struct key_def *def, struct key_part_def *parts) +int +key_def_dump_parts(struct region *pool, const struct key_def *def, + struct key_part_def *parts) { for (uint32_t i = 0; i < def->part_count; i++) { const struct key_part *part = &def->parts[i]; @@ -181,13 +218,26 @@ key_def_dump_parts(const struct key_def *def, struct key_part_def *parts) part_def->type = part->type; part_def->is_nullable = part->is_nullable; part_def->coll_id = part->coll_id; + if (part->path != NULL) { + part_def->path = region_alloc(pool, part->path_len + 1); + if (part_def->path == NULL) { + diag_set(OutOfMemory, part->path_len + 1, + "region_alloc", "part_def->path"); + return -1; + } + memcpy(part_def->path, part->path, part->path_len); + part_def->path[part->path_len] = '\0'; + } else { + part_def->path = NULL; + } } + return 0; } box_key_def_t * box_key_def_new(uint32_t 
*fields, uint32_t *types, uint32_t part_count) { - struct key_def *key_def = key_def_new(part_count); + struct key_def *key_def = key_def_new(part_count, 0); if (key_def == NULL) return key_def; @@ -195,7 +245,7 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count) key_def_set_part(key_def, item, fields[item], (enum field_type)types[item], key_part_def_default.is_nullable, NULL, - COLL_NONE); + COLL_NONE, NULL, 0); } return key_def; } @@ -241,6 +291,13 @@ key_part_cmp(const struct key_part *parts1, uint32_t part_count1, if (part1->is_nullable != part2->is_nullable) return part1->is_nullable < part2->is_nullable ? -1 : 1; + /* Lexicographic strings order. */ + if (part1->path_len != part2->path_len) + return part1->path_len - part2->path_len; + int rc = 0; + if ((rc = memcmp(part1->path, part2->path, + part1->path_len)) != 0) + return rc; } return part_count1 < part_count2 ? -1 : part_count1 > part_count2; } @@ -248,11 +305,12 @@ key_part_cmp(const struct key_part *parts1, uint32_t part_count1, void key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno, enum field_type type, bool is_nullable, struct coll *coll, - uint32_t coll_id) + uint32_t coll_id, const char *path, uint32_t path_len) { assert(part_no < def->part_count); assert(type < field_type_MAX); def->is_nullable |= is_nullable; + def->has_json_paths |= path != NULL; def->parts[part_no].is_nullable = is_nullable; def->parts[part_no].fieldno = fieldno; def->parts[part_no].type = type; @@ -260,6 +318,17 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno, def->parts[part_no].coll_id = coll_id; def->parts[part_no].offset_slot = TUPLE_OFFSET_SLOT_NIL; def->parts[part_no].offset_slot_epoch = 0; + if (path != NULL) { + def->parts[part_no].path_len = path_len; + assert(def->parts[part_no].path != NULL); + memcpy(def->parts[part_no].path, path, path_len); + def->parts[part_no].path[path_len] = '\0'; + def->parts[part_no].path_hash = mh_strn_hash(path, 
path_len); + } else { + def->parts[part_no].path_len = 0; + def->parts[part_no].path = NULL; + def->parts[part_no].path_hash = 0; + } column_mask_set_fieldno(&def->column_mask, fieldno); /** * When all parts are set, initialize the tuple @@ -284,7 +353,9 @@ key_def_update_optionality(struct key_def *def, uint32_t min_field_count) for (uint32_t i = 0; i < def->part_count; ++i) { struct key_part *part = &def->parts[i]; def->has_optional_parts |= part->is_nullable && - min_field_count < part->fieldno + 1; + part->path == NULL && + (min_field_count < + part->fieldno + 1); /* * One optional part is enough to switch to new * comparators. @@ -304,8 +375,15 @@ key_def_snprint_parts(char *buf, int size, const struct key_part_def *parts, for (uint32_t i = 0; i < part_count; i++) { const struct key_part_def *part = &parts[i]; assert(part->type < field_type_MAX); - SNPRINT(total, snprintf, buf, size, "%d, '%s'", - (int)part->fieldno, field_type_strs[part->type]); + if (part->path != NULL) { + SNPRINT(total, snprintf, buf, size, "%d, '%s', '%s'", + (int) part->fieldno, part->path, + field_type_strs[part->type]); + } else { + SNPRINT(total, snprintf, buf, size, "%d, '%s'", + (int) part->fieldno, + field_type_strs[part->type]); + } if (i < part_count - 1) SNPRINT(total, snprintf, buf, size, ", "); } @@ -324,6 +402,8 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count) count++; if (part->is_nullable) count++; + if (part->path != NULL) + count++; size += mp_sizeof_map(count); size += mp_sizeof_str(strlen(PART_OPT_FIELD)); size += mp_sizeof_uint(part->fieldno); @@ -338,6 +418,10 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count) size += mp_sizeof_str(strlen(PART_OPT_NULLABILITY)); size += mp_sizeof_bool(part->is_nullable); } + if (part->path != NULL) { + size += mp_sizeof_str(strlen(PART_OPT_PATH)); + size += mp_sizeof_str(strlen(part->path)); + } } return size; } @@ -351,6 +435,8 @@ key_def_encode_parts(char *data, const struct 
key_part_def *parts, int count = 2; if (part->coll_id != COLL_NONE) count++; + if (part->path != NULL) + count++; if (part->is_nullable) count++; data = mp_encode_map(data, count); @@ -372,6 +458,12 @@ key_def_encode_parts(char *data, const struct key_part_def *parts, strlen(PART_OPT_NULLABILITY)); data = mp_encode_bool(data, part->is_nullable); } + if (part->path != NULL) { + data = mp_encode_str(data, PART_OPT_PATH, + strlen(PART_OPT_PATH)); + data = mp_encode_str(data, part->path, + strlen(part->path)); + } } return data; } @@ -432,10 +524,113 @@ key_def_decode_parts_166(struct key_part_def *parts, uint32_t part_count, fields[part->fieldno].is_nullable : key_part_def_default.is_nullable); part->coll_id = COLL_NONE; + part->path = NULL; } return 0; } +/** + * Verify key_part JSON path and convert to canonical form. + * + * @param region Region to make allocations. + * @param part Part with path to update. + * @param path_extra Extra allocated space to reuse if possible. + * @param path_extra_size The @path_extra size. + * + * @retval -1 on error. + * @retval 0 on success. + */ +static int +key_def_normalize_json_path(struct region *region, struct key_part_def *part, + char **path_extra, uint32_t *path_extra_size) +{ + const char *err_msg = NULL; + + uint32_t allocated_size = *path_extra_size; + char *path = *path_extra; + + uint32_t path_len = strlen(part->path); + struct json_path_parser parser; + struct json_path_node node; + json_path_parser_create(&parser, part->path, path_len); + /* + * A worst-case scenario is .a -> ["a"] + * i.e. 2.5 * path_len + 1 is enough. 
+ */ + uint32_t new_path_size = 2.5 * path_len + 1; + if (new_path_size >= allocated_size) { + path = region_alloc(region, new_path_size); + if (path == NULL) { + diag_set(OutOfMemory, new_path_size, + "region_alloc", "path"); + return -1; + } + allocated_size = new_path_size; + } + assert(path != NULL); + part->path = path; + int rc = json_path_next(&parser, &node); + if (rc != 0) + goto error_invalid_json; + if (node.type != JSON_PATH_NUM) { + diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, + part->fieldno, + "invalid JSON path: first part should " + "be defined as array index"); + return -1; + } + if (node.num - TUPLE_INDEX_BASE != part->fieldno) { + diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, + part->fieldno, + "invalid JSON path: first part refers " + "to invalid field"); + return -1; + } + uint32_t lexemes = 0; + do { + if (node.type == JSON_PATH_NUM) { + path += sprintf(path, "[%u]", (uint32_t) node.num); + } else if (node.type == JSON_PATH_STR) { + path += sprintf(path, "[\"%.*s\"]", node.len, node.str); + } else { + unreachable(); + } + lexemes++; + } while ((rc = json_path_next(&parser, &node)) == 0 && + node.type != JSON_PATH_END); + if (rc != 0 || node.type != JSON_PATH_END) + goto error_invalid_json; + if (lexemes == 1) { + /* JSON index is useless. */ + path = part->path; + part->path = NULL; + } else { + /* Skip terminating zero. */ + path++; + /* Account constructed string size. */ + allocated_size -= path - part->path; + } + /* Going to try to reuse extra allocation next time. */ + if (allocated_size > (uint32_t)parser.src_len) { + /* Use rest of new buffer next time. */ + *path_extra = path; + *path_extra_size = allocated_size; + } else { + /* Reuse old path buffer. 
*/ + *path_extra = (char *)parser.src; + *path_extra_size = parser.src_len; + } + return 0; + +error_invalid_json: + err_msg = tt_sprintf("invalid JSON path '%.*s': path has invalid " + "structure (error at position %d)", parser.src_len, + parser.src, rc); + diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, + part->fieldno + TUPLE_INDEX_BASE, err_msg); + return -1; +} + int key_def_decode_parts(struct key_part_def *parts, uint32_t part_count, const char **data, const struct field_def *fields, @@ -445,8 +640,11 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count, return key_def_decode_parts_166(parts, part_count, data, fields, field_count); } - for (uint32_t i = 0; i < part_count; i++) { - struct key_part_def *part = &parts[i]; + char *path = NULL; + uint32_t allocated_size = 0; + struct key_part_def *part = parts; + struct region *region = &fiber()->gc; + for (uint32_t i = 0; i < part_count; i++, part++) { if (mp_typeof(**data) != MP_MAP) { diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, i + TUPLE_INDEX_BASE, @@ -456,7 +654,7 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count, *part = key_part_def_default; if (opts_decode(part, part_def_reg, data, ER_WRONG_INDEX_OPTIONS, i + TUPLE_INDEX_BASE, - NULL) != 0) + region) != 0) return -1; if (part->type == field_type_MAX) { diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, @@ -473,6 +671,10 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count, "string and scalar parts"); return -1; } + if (part->path != NULL && + key_def_normalize_json_path(region, part, &path, + &allocated_size) != 0) + return -1; } return 0; } @@ -497,17 +699,21 @@ key_def_decode_parts_160(struct key_part_def *parts, uint32_t part_count, fields[part->fieldno].is_nullable : key_part_def_default.is_nullable); part->coll_id = COLL_NONE; + part->path = NULL; } return 0; } const struct key_part * -key_def_find(const struct key_def *key_def, uint32_t fieldno) +key_def_find(const struct key_def *key_def, 
uint32_t fieldno, const char *path, + uint32_t path_len) { const struct key_part *part = key_def->parts; const struct key_part *end = part + key_def->part_count; for (; part != end; part++) { - if (part->fieldno == fieldno) + if (part->fieldno == fieldno && part->path_len == path_len && + (part->path == NULL || + memcmp(part->path, path, path_len) == 0)) return part; } return NULL; @@ -519,7 +725,8 @@ key_def_contains(const struct key_def *first, const struct key_def *second) const struct key_part *part = second->parts; const struct key_part *end = part + second->part_count; for (; part != end; part++) { - if (key_def_find(first, part->fieldno) == NULL) + if (key_def_find(first, part->fieldno, part->path, + part->path_len) == NULL) return false; } return true; @@ -533,18 +740,26 @@ key_def_merge(const struct key_def *first, const struct key_def *second) * Find and remove part duplicates, i.e. parts counted * twice since they are present in both key defs. */ - const struct key_part *part = second->parts; - const struct key_part *end = part + second->part_count; + size_t sz = 0; + const struct key_part *part = first->parts; + const struct key_part *end = part + first->part_count; + for (; part != end; part++) { + if (part->path != NULL) + sz += part->path_len + 1; + } + part = second->parts; + end = part + second->part_count; for (; part != end; part++) { - if (key_def_find(first, part->fieldno)) + if (key_def_find(first, part->fieldno, part->path, + part->path_len) != NULL) --new_part_count; + else if (part->path != NULL) + sz += part->path_len + 1; } - - struct key_def *new_def; - new_def = (struct key_def *)calloc(1, key_def_sizeof(new_part_count)); + sz = key_def_sizeof(new_part_count, sz); + struct key_def *new_def = (struct key_def *)calloc(1, sz); if (new_def == NULL) { - diag_set(OutOfMemory, key_def_sizeof(new_part_count), "malloc", - "new_def"); + diag_set(OutOfMemory, sz, "calloc", "new_def"); return NULL; } new_def->part_count = new_part_count; @@ -552,24 
+767,37 @@ key_def_merge(const struct key_def *first, const struct key_def *second) new_def->is_nullable = first->is_nullable || second->is_nullable; new_def->has_optional_parts = first->has_optional_parts || second->has_optional_parts; + /* Path data write position in the new key_def. */ + char *data = (char *)new_def + key_def_sizeof(new_part_count, 0); /* Write position in the new key def. */ uint32_t pos = 0; /* Append first key def's parts to the new index_def. */ part = first->parts; end = part + first->part_count; for (; part != end; part++) { + if (part->path != NULL) { + new_def->parts[pos].path = data; + data += part->path_len + 1; + } key_def_set_part(new_def, pos++, part->fieldno, part->type, - part->is_nullable, part->coll, part->coll_id); + part->is_nullable, part->coll, part->coll_id, + part->path, part->path_len); } /* Set-append second key def's part to the new key def. */ part = second->parts; end = part + second->part_count; for (; part != end; part++) { - if (key_def_find(first, part->fieldno)) + if (key_def_find(first, part->fieldno, part->path, + part->path_len) != NULL) continue; + if (part->path != NULL) { + new_def->parts[pos].path = data; + data += part->path_len + 1; + } key_def_set_part(new_def, pos++, part->fieldno, part->type, - part->is_nullable, part->coll, part->coll_id); + part->is_nullable, part->coll, part->coll_id, + part->path, part->path_len); } return new_def; } diff --git a/src/box/key_def.h b/src/box/key_def.h index a32c34c..693dd5d 100644 --- a/src/box/key_def.h +++ b/src/box/key_def.h @@ -54,6 +54,8 @@ struct key_part_def { uint32_t coll_id; /** True if a key part can store NULLs. */ bool is_nullable; + /** JSON path to data. */ + char *path; }; /** @@ -85,6 +87,12 @@ struct key_part { uint64_t offset_slot_epoch; /** Cache with format's field offset slot. */ int32_t offset_slot; + /** JSON path to data in canonical form. */ + char *path; + /** JSON path length. */ + uint32_t path_len; + /** JSON path hash. 
*/ + uint32_t path_hash; }; struct key_def; @@ -144,6 +152,10 @@ struct key_def { * fields assumed to be MP_NIL. */ bool has_optional_parts; + /** + * True if some key part contains a JSON path. + */ + bool has_json_paths; /** Key fields mask. @sa column_mask.h for details. */ uint64_t column_mask; /** The size of the 'parts' array. */ @@ -232,16 +244,17 @@ box_tuple_compare_with_key(const box_tuple_t *tuple_a, const char *key_b, /** \endcond public */ static inline size_t -key_def_sizeof(uint32_t part_count) +key_def_sizeof(uint32_t part_count, size_t paths_size) { - return sizeof(struct key_def) + sizeof(struct key_part) * part_count; + return sizeof(struct key_def) + sizeof(struct key_part) * part_count + + paths_size; } /** * Allocate a new key_def with the given part count. */ struct key_def * -key_def_new(uint32_t part_count); +key_def_new(uint32_t part_count, size_t paths_size); /** * Allocate a new key_def with the given part count @@ -253,8 +266,9 @@ key_def_new_with_parts(struct key_part_def *parts, uint32_t part_count); /** * Dump part definitions of the given key def. */ -void -key_def_dump_parts(const struct key_def *def, struct key_part_def *parts); +int +key_def_dump_parts(struct region *pool, const struct key_def *def, + struct key_part_def *parts); /** * Set a single key part in a key def. @@ -263,7 +277,7 @@ key_def_dump_parts(const struct key_def *def, struct key_part_def *parts); void key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno, enum field_type type, bool is_nullable, struct coll *coll, - uint32_t coll_id); + uint32_t coll_id, const char *path, uint32_t path_len); /** * Update 'has_optional_parts' of @a key_def with correspondence @@ -330,7 +344,8 @@ key_def_decode_parts_160(struct key_part_def *parts, uint32_t part_count, * If fieldno is not in index_def->parts returns NULL.
*/ const struct key_part * -key_def_find(const struct key_def *key_def, uint32_t fieldno); +key_def_find(const struct key_def *key_def, uint32_t fieldno, const char *path, + uint32_t path_len); /** * Check if key definition @a first contains all parts of @@ -377,6 +392,8 @@ key_validate_parts(const struct key_def *key_def, const char *key, static inline bool key_def_is_sequential(const struct key_def *key_def) { + if (key_def->has_json_paths) + return false; for (uint32_t part_id = 0; part_id < key_def->part_count; part_id++) { if (key_def->parts[part_id].fieldno != part_id) return false; diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc index 25b7e36..875e51f 100644 --- a/src/box/lua/space.cc +++ b/src/box/lua/space.cc @@ -295,6 +295,11 @@ lbox_fillspace(struct lua_State *L, struct space *space, int i) lua_pushnumber(L, part->fieldno + TUPLE_INDEX_BASE); lua_setfield(L, -2, "fieldno"); + if (part->path != NULL) { + lua_pushstring(L, part->path); + lua_setfield(L, -2, "path"); + } + lua_pushboolean(L, part->is_nullable); lua_setfield(L, -2, "is_nullable"); diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c index 1f80ce5..554d6f9 100644 --- a/src/box/memtx_engine.c +++ b/src/box/memtx_engine.c @@ -1310,6 +1310,11 @@ memtx_index_def_change_requires_rebuild(struct index *index, return true; if (old_part->coll != new_part->coll) return true; + if (old_part->path_len != new_part->path_len) + return true; + if (memcmp(old_part->path, new_part->path, + old_part->path_len) != 0) + return true; } return false; } diff --git a/src/box/schema.cc b/src/box/schema.cc index 7f20f36..2411c8c 100644 --- a/src/box/schema.cc +++ b/src/box/schema.cc @@ -286,19 +286,19 @@ schema_init() * (and re-created) first. 
*/ /* _schema - key/value space with schema description */ - struct key_def *key_def = key_def_new(1); /* part count */ + struct key_def *key_def = key_def_new(1, 0); if (key_def == NULL) diag_raise(); auto key_def_guard = make_scoped_guard([&] { key_def_delete(key_def); }); key_def_set_part(key_def, 0 /* part no */, 0 /* field no */, - FIELD_TYPE_STRING, false, NULL, COLL_NONE); + FIELD_TYPE_STRING, false, NULL, COLL_NONE, NULL, 0); sc_space_new(BOX_SCHEMA_ID, "_schema", key_def, &on_replace_schema, NULL); /* _space - home for all spaces. */ key_def_set_part(key_def, 0 /* part no */, 0 /* field no */, - FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE); + FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE, NULL, 0); /* _collation - collation description. */ sc_space_new(BOX_COLLATION_ID, "_collation", key_def, @@ -341,15 +341,15 @@ schema_init() NULL); key_def_delete(key_def); - key_def = key_def_new(2); /* part count */ + key_def = key_def_new(2, 0); if (key_def == NULL) diag_raise(); /* space no */ key_def_set_part(key_def, 0 /* part no */, 0 /* field no */, - FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE); + FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE, NULL, 0); /* index no */ key_def_set_part(key_def, 1 /* part no */, 1 /* field no */, - FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE); + FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE, NULL, 0); sc_space_new(BOX_INDEX_ID, "_index", key_def, &alter_space_on_replace_index, &on_stmt_begin_index); diff --git a/src/box/tuple.c b/src/box/tuple.c index d7dbad3..70eda36 100644 --- a/src/box/tuple.c +++ b/src/box/tuple.c @@ -159,14 +159,21 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple) /* Check field types */ struct tuple_field *field = &format->fields[0]; + const char *pos = tuple; uint32_t i = 0; uint32_t defined_field_count = MIN(field_count, format->field_count); for (; i < defined_field_count; ++i, ++field) { - if (key_mp_type_validate(field->type, mp_typeof(*tuple), + if (key_mp_type_validate(field->type, 
mp_typeof(*pos), ER_FIELD_TYPE, i + TUPLE_INDEX_BASE, field->is_nullable)) return -1; - mp_next(&tuple); + /* Check all JSON paths. */ + if (field->childs != NULL && + tuple_field_bypass_and_init(field, i, tuple, &pos, + NULL) != 0) + return -1; + if (field->childs == NULL) + mp_next(&pos); } return 0; } diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc index 5d7df4d..9dded75 100644 --- a/src/box/tuple_compare.cc +++ b/src/box/tuple_compare.cc @@ -464,13 +464,15 @@ static inline int tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, const struct key_def *key_def) { + assert(has_json_path == key_def->has_json_paths); assert(!has_optional_parts || is_nullable); assert(is_nullable == key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); const struct key_part *part = key_def->parts; const char *tuple_a_raw = tuple_data(tuple_a); const char *tuple_b_raw = tuple_data(tuple_b); - if (key_def->part_count == 1 && part->fieldno == 0) { + if (key_def->part_count == 1 && part->fieldno == 0 && + part->path == NULL) { /* * First field can not be optional - empty tuples * can not exist. @@ -525,7 +527,7 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, (struct key_part *) part); } - assert(has_optional_parts || + assert(has_optional_parts || has_json_path || (field_a != NULL && field_b != NULL)); if (! is_nullable) { rc = tuple_compare_field(field_a, field_b, part->type, @@ -535,7 +537,7 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, else continue; } - if (has_optional_parts) { + if (has_optional_parts || has_json_path) { a_type = field_a != NULL ? mp_typeof(*field_a) : MP_NIL; b_type = field_b != NULL ? 
mp_typeof(*field_b) : MP_NIL; } else { @@ -607,6 +609,7 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, uint32_t part_count, const struct key_def *key_def) { + assert(has_json_paths == key_def->has_json_paths); assert(!has_optional_parts || is_nullable); assert(is_nullable == key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); @@ -631,7 +634,7 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, return tuple_compare_field(field, key, part->type, part->coll); } - if (has_optional_parts) + if (has_optional_parts || has_json_paths) a_type = field != NULL ? mp_typeof(*field) : MP_NIL; else a_type = mp_typeof(*field); @@ -667,7 +670,7 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, else continue; } - if (has_optional_parts) + if (has_optional_parts || has_json_paths) a_type = field != NULL ? mp_typeof(*field) : MP_NIL; else a_type = mp_typeof(*field); @@ -1054,6 +1057,17 @@ static const comparator_signature cmp_arr[] = { #undef COMPARATOR +static const tuple_compare_t compare_slowpath_funcs[] = { + tuple_compare_slowpath<false, false, false>, + tuple_compare_slowpath<true, false, false>, + tuple_compare_slowpath<false, true, false>, + tuple_compare_slowpath<true, true, false>, + tuple_compare_slowpath<false, false, true>, + tuple_compare_slowpath<true, false, true>, + tuple_compare_slowpath<false, true, true>, + tuple_compare_slowpath<true, true, true> +}; + tuple_compare_t tuple_compare_create(const struct key_def *def) { @@ -1063,14 +1077,14 @@ tuple_compare_create(const struct key_def *def) return tuple_compare_sequential<true, true>; else return tuple_compare_sequential<true, false>; - } else if (def->has_optional_parts) { - return tuple_compare_slowpath<true, true, false>; } else { - return tuple_compare_slowpath<true, false, false>; + int func_idx = 1 + 2 * def->has_optional_parts + + 4 * def->has_json_paths; + return 
compare_slowpath_funcs[func_idx]; } } assert(! def->has_optional_parts); - if (!key_def_has_collation(def)) { + if (!key_def_has_collation(def) && !def->has_json_paths) { /* Precalculated comparators don't use collation */ for (uint32_t k = 0; k < sizeof(cmp_arr) / sizeof(cmp_arr[0]); k++) { @@ -1088,6 +1102,8 @@ tuple_compare_create(const struct key_def *def) } if (key_def_is_sequential(def)) return tuple_compare_sequential<false, false>; + else if (def->has_json_paths) + return tuple_compare_slowpath<false, false, true>; else return tuple_compare_slowpath<false, false, false>; } @@ -1271,6 +1287,17 @@ static const comparator_with_key_signature cmp_wk_arr[] = { #undef KEY_COMPARATOR +static const tuple_compare_with_key_t compare_with_key_slowpath_funcs[] = { + tuple_compare_with_key_slowpath<false, false, false>, + tuple_compare_with_key_slowpath<true, false, false>, + tuple_compare_with_key_slowpath<false, true, false>, + tuple_compare_with_key_slowpath<true, true, false>, + tuple_compare_with_key_slowpath<false, false, true>, + tuple_compare_with_key_slowpath<true, false, true>, + tuple_compare_with_key_slowpath<false, true, true>, + tuple_compare_with_key_slowpath<true, true, true> +}; + tuple_compare_with_key_t tuple_compare_with_key_create(const struct key_def *def) { @@ -1283,14 +1310,15 @@ tuple_compare_with_key_create(const struct key_def *def) return tuple_compare_with_key_sequential<true, false>; } - } else if (def->has_optional_parts) { - return tuple_compare_with_key_slowpath<true, true, false>; } else { - return tuple_compare_with_key_slowpath<true, false, false>; + int func_idx = (def->is_nullable ? 1 : 0) + + 2 * (def->has_optional_parts ? 1 : 0) + + 4 * (def->has_json_paths ? 1 : 0); + return compare_with_key_slowpath_funcs[func_idx]; } } assert(! 
def->has_optional_parts); - if (!key_def_has_collation(def)) { + if (!key_def_has_collation(def) && !def->has_json_paths) { /* Precalculated comparators don't use collation */ for (uint32_t k = 0; k < sizeof(cmp_wk_arr) / sizeof(cmp_wk_arr[0]); @@ -1311,6 +1339,8 @@ tuple_compare_with_key_create(const struct key_def *def) } if (key_def_is_sequential(def)) return tuple_compare_with_key_sequential<false, false>; + else if (def->has_json_paths) + return tuple_compare_with_key_slowpath<false, false, true>; else return tuple_compare_with_key_slowpath<false, false, false>; } diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc index 2ea405a..2ee399f 100644 --- a/src/box/tuple_extract_key.cc +++ b/src/box/tuple_extract_key.cc @@ -1,15 +1,31 @@ #include "tuple_extract_key.h" #include "tuple.h" #include "fiber.h" +#include "json/path.h" enum { MSGPACK_NULL = 0xc0 }; +/** True if key parts i and i+1 are sequential. */ +template <bool has_json_paths> +static inline bool +key_def_parts_are_sequential(const struct key_def *def, int i) +{ + uint32_t fieldno1 = def->parts[i].fieldno + 1; + uint32_t fieldno2 = def->parts[i + 1].fieldno; + if (!has_json_paths) { + return fieldno1 == fieldno2; + } else { + return fieldno1 == fieldno2 && def->parts[i].path == NULL && + def->parts[i + 1].path == NULL; + } +} + /** True, if a key can contain two or more parts in sequence.
*/ static bool key_def_contains_sequential_parts(const struct key_def *def) { for (uint32_t i = 0; i < def->part_count - 1; ++i) { - if (def->parts[i].fieldno + 1 == def->parts[i + 1].fieldno) + if (key_def_parts_are_sequential<true>(def, i)) return true; } return false; @@ -97,6 +113,7 @@ static char * tuple_extract_key_slowpath(const struct tuple *tuple, const struct key_def *key_def, uint32_t *key_size) { + assert(has_json_paths == key_def->has_json_paths); assert(!has_optional_parts || key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); assert(contains_sequential_parts == @@ -120,7 +137,7 @@ tuple_extract_key_slowpath(const struct tuple *tuple, (struct key_part *) &key_def->parts[i]); } - if (has_optional_parts && field == NULL) { + if ((has_optional_parts || has_json_paths) && field == NULL) { bsize += mp_sizeof_nil(); continue; } @@ -132,8 +149,8 @@ tuple_extract_key_slowpath(const struct tuple *tuple, * minimize tuple_field_raw() calls. */ for (; i < part_count - 1; i++) { - if (key_def->parts[i].fieldno + 1 != - key_def->parts[i + 1].fieldno) { + if (!key_def_parts_are_sequential + <has_json_paths>(key_def, i)) { /* * End of sequential part. */ @@ -168,7 +185,7 @@ tuple_extract_key_slowpath(const struct tuple *tuple, (struct key_part *) &key_def->parts[i]); } - if (has_optional_parts && field == NULL) { + if ((has_optional_parts || has_json_paths) && field == NULL) { key_buf = mp_encode_nil(key_buf); continue; } @@ -180,8 +197,8 @@ tuple_extract_key_slowpath(const struct tuple *tuple, * minimize tuple_field_raw() calls. */ for (; i < part_count - 1; i++) { - if (key_def->parts[i].fieldno + 1 != - key_def->parts[i + 1].fieldno) { + if (!key_def_parts_are_sequential + <has_json_paths>(key_def, i)) { /* * End of sequential part. 
*/ @@ -200,7 +217,7 @@ tuple_extract_key_slowpath(const struct tuple *tuple, bsize = end - field; memcpy(key_buf, field, bsize); key_buf += bsize; - if (has_optional_parts && null_count != 0) { + if ((has_optional_parts || has_json_paths) && null_count != 0) { memset(key_buf, MSGPACK_NULL, null_count); key_buf += null_count * mp_sizeof_nil(); } @@ -220,6 +237,7 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, const struct key_def *key_def, uint32_t *key_size) { + assert(has_json_paths == key_def->has_json_paths); assert(!has_optional_parts || key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); assert(mp_sizeof_nil() == 1); @@ -247,11 +265,12 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, uint32_t fieldno = key_def->parts[i].fieldno; uint32_t null_count = 0; for (; i < key_def->part_count - 1; i++) { - if (key_def->parts[i].fieldno + 1 != - key_def->parts[i + 1].fieldno) + if (!key_def_parts_are_sequential + <has_json_paths>(key_def, i)) break; } - uint32_t end_fieldno = key_def->parts[i].fieldno; + const struct key_part *part = &key_def->parts[i]; + uint32_t end_fieldno = part->fieldno; if (fieldno < current_fieldno) { /* Rewind. */ @@ -293,20 +312,50 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, current_fieldno++; } } + const char *field_last, *field_end_last; + if (has_json_paths && part->path != NULL) { + field_last = field; + field_end_last = field_end; + struct json_path_parser parser; + struct json_path_node node; + json_path_parser_create(&parser, part->path, + part->path_len); + /* Skip fieldno. 
*/ + int rc = json_path_next(&parser, &node); + assert(rc == 0); + rc = tuple_field_dig_with_parser(&parser, &field); + field_end = field; + mp_next(&field_end); + } memcpy(key_buf, field, field_end - field); key_buf += field_end - field; - if (has_optional_parts && null_count != 0) { + if ((has_optional_parts || has_json_paths) && null_count != 0) { memset(key_buf, MSGPACK_NULL, null_count); key_buf += null_count * mp_sizeof_nil(); } else { assert(key_buf - key <= data_end - data); } + if (has_json_paths && part->path != NULL) { + field = field_last; + field_end = field_end_last; + } } if (key_size != NULL) *key_size = (uint32_t)(key_buf - key); return key; } +static const tuple_extract_key_t extract_key_slowpath_funcs[] = { + tuple_extract_key_slowpath<false, false, false>, + tuple_extract_key_slowpath<true, false, false>, + tuple_extract_key_slowpath<false, true, false>, + tuple_extract_key_slowpath<true, true, false>, + tuple_extract_key_slowpath<false, false, true>, + tuple_extract_key_slowpath<true, false, true>, + tuple_extract_key_slowpath<false, true, true>, + tuple_extract_key_slowpath<true, true, true> +}; + /** * Initialize tuple_extract_key() and tuple_extract_key_raw() */ @@ -327,35 +376,30 @@ tuple_extract_key_set(struct key_def *key_def) tuple_extract_key_sequential_raw<false>; } } else { - if (key_def->has_optional_parts) { - assert(key_def->is_nullable); - if (key_def_contains_sequential_parts(key_def)) { - key_def->tuple_extract_key = - tuple_extract_key_slowpath<true, true, - false>; - } else { - key_def->tuple_extract_key = - tuple_extract_key_slowpath<false, true, - false>; - } - } else { - if (key_def_contains_sequential_parts(key_def)) { - key_def->tuple_extract_key = - tuple_extract_key_slowpath<true, false, - false>; - } else { - key_def->tuple_extract_key = - tuple_extract_key_slowpath<false, false, - false>; - } - } + int func_idx = + (key_def_contains_sequential_parts(key_def) ? 1 : 0) + + 2 * (key_def->has_optional_parts ? 
1 : 0) + + 4 * (key_def->has_json_paths ? 1 : 0); + key_def->tuple_extract_key = + extract_key_slowpath_funcs[func_idx]; + assert(!key_def->has_optional_parts || key_def->is_nullable); } if (key_def->has_optional_parts) { assert(key_def->is_nullable); - key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<true, false>; + if (key_def->has_json_paths) { + key_def->tuple_extract_key_raw = + tuple_extract_key_slowpath_raw<true, true>; + } else { + key_def->tuple_extract_key_raw = + tuple_extract_key_slowpath_raw<true, false>; + } } else { - key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<false, false>; + if (key_def->has_json_paths) { + key_def->tuple_extract_key_raw = + tuple_extract_key_slowpath_raw<false, true>; + } else { + key_def->tuple_extract_key_raw = + tuple_extract_key_slowpath_raw<false, false>; + } } } diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 2d4a85f..363fdff 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -30,6 +30,7 @@ */ #include "json/path.h" #include "tuple_format.h" +#include "assoc.h" /** Global table of tuple formats */ struct tuple_format **tuple_formats; @@ -38,10 +39,478 @@ static intptr_t recycled_format_ids = FORMAT_ID_NIL; static uint32_t formats_size = 0, formats_capacity = 0; static const struct tuple_field tuple_field_default = { - FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false, false, + FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false, false, {{NULL, 0}} }; /** + * Propagate @a field to MessagePack(field)[key]. + * @param[in][out] field Field to propagate. + * @param key Key to propagate to. + * @param len Length of @a key. + * @param field_idx Field index in map. + * + * @retval 0 Success, the index was found. + * @retval -1 Not found. 
+ */ +static inline int +tuple_field_go_to_key(const char **field, const char *key, int len, + uint32_t *field_idx) +{ + enum mp_type type = mp_typeof(**field); + if (type != MP_MAP) + return -1; + uint32_t count = mp_decode_map(field); + for (uint32_t idx = 0; idx < count; idx++) { + type = mp_typeof(**field); + if (type == MP_STR) { + uint32_t value_len; + const char *value = mp_decode_str(field, &value_len); + if (value_len == (uint)len && + memcmp(value, key, len) == 0) { + *field_idx = idx; + return 0; + } + } else { + /* Skip key. */ + mp_next(field); + } + /* Skip value. */ + mp_next(field); + } + return -1; +} + +struct mh_strnptr_node_t * +json_path_hash_get(struct mh_strnptr_t *hashtable, const char *path, + uint32_t path_len, uint32_t path_hash) +{ + assert(hashtable != NULL); + struct mh_strnptr_key_t key = {path, path_len, path_hash}; + mh_int_t rc = mh_strnptr_find(hashtable, &key, NULL); + if (rc == mh_end(hashtable)) + return NULL; + return mh_strnptr_node(hashtable, rc); +} + +/** + * Create a new hashtable object. + * @param records Count of records to reserve. + * @retval NULL On error. + * @retval not NULL New hashtable on success. + */ +static struct mh_strnptr_t * +json_path_hash_create(uint32_t records) +{ + struct mh_strnptr_t *ret = mh_strnptr_new(); + if (ret == NULL) { + diag_set(OutOfMemory, sizeof(struct mh_strnptr_t), + "mh_strnptr_new", "hashtable"); + return NULL; + } + if (records > 0 && + mh_strnptr_reserve(ret, records, NULL) != 0) { + mh_strnptr_delete(ret); + diag_set(OutOfMemory, records, "mh_strnptr_reserve", + "hashtable"); + return NULL; + } + return ret; +} + +/** + * Delete @hashtable object. + * @param hashtable Pointer to object to delete.
+ */ +static void +json_path_hash_delete(struct mh_strnptr_t *hashtable) +{ + assert(hashtable != NULL); + while (mh_size(hashtable) != 0) { + mh_int_t n = mh_first(hashtable); + mh_strnptr_del(hashtable, n, NULL); + } + mh_strnptr_delete(hashtable); +} + +/** + * Insert a new record to hashtable. + * @param hashtable Storage to insert new record. + * @param path String with path. + * @param path_len Length of @path. + * @param field Value to store in @hashtable. + * @retval -1 On error. + * @retval 0 On success. + */ +static int +json_path_hash_insert(struct mh_strnptr_t *hashtable, const char *path, + uint32_t path_len, struct tuple_field *field) +{ + assert(hashtable != NULL); + uint32_t path_hash = mh_strn_hash(path, path_len); + struct mh_strnptr_node_t name_node = {path, path_len, path_hash, field}; + mh_int_t rc = mh_strnptr_put(hashtable, &name_node, NULL, NULL); + if (rc == mh_end(hashtable)) { + diag_set(OutOfMemory, sizeof(*hashtable), "mh_strnptr_put", + "hashtable"); + return -1; + } + return 0; +} + +/** + * Construct field tree level for JSON path part. + * + * @param[in, out] tuple_field Pointer to record to start with; + would be changed to the record that matches + @part lexeme. + * @param fieldno Number of root space field. + * @param part JSON path lexeme to represent in field tree. + * @retval -1 On error. + * @retval 0 On success. + */ +static int +json_field_tree_append(struct tuple_field **field_subtree, uint32_t fieldno, + struct json_path_node *part) +{ + enum field_type type; + struct tuple_field *field = *field_subtree; + switch (part->type) { + case JSON_PATH_NUM: { + type = FIELD_TYPE_ARRAY; + if (field->type != FIELD_TYPE_ANY && field->type != type) + goto error_type_mistmatch; + /* Create or resize field array if required.
*/ + if (field->array == NULL || part->num > field->array_size) { + struct tuple_field **array = + realloc(field->array, + part->num * sizeof(array[0])); + if (array == NULL) { + diag_set(OutOfMemory, + part->num * sizeof(array[0]), + "realloc","array"); + return -1; + } + memset(&array[field->array_size], 0, + (part->num - field->array_size) * + sizeof(array[0])); + field->array = array; + field->array_size = part->num; + field->type = type; + } else if (field->array[part->num - TUPLE_INDEX_BASE] != NULL) { + /* Record already exists. No actions required */ + *field_subtree = + field->array[part->num - TUPLE_INDEX_BASE]; + return 0; + } + break; + } + case JSON_PATH_STR: { + type = FIELD_TYPE_MAP; + if (field->type != FIELD_TYPE_ANY && field->type != type) + goto error_type_mistmatch; + if (field->map == NULL) { + field->map = json_path_hash_create(1); + if (field->map == NULL) + return -1; + field->type = type; + } else { + uint32_t str_hash = mh_strn_hash(part->str, part->len); + struct mh_strnptr_node_t *ht_record = + json_path_hash_get(field->map, part->str, + part->len, str_hash); + if (ht_record != NULL) { + assert(ht_record->val != NULL); + *field_subtree = ht_record->val; + return 0; + } + } + break; + } + default: + unreachable(); + } + + /* Construct and insert a new record. 
*/ + struct tuple_field *new_field = malloc(sizeof(struct tuple_field)); + if (new_field == NULL) { + diag_set(OutOfMemory, sizeof(struct tuple_field), "malloc", + "new_field"); + return -1; + } + *new_field = tuple_field_default; + if (field->type == FIELD_TYPE_MAP) { + if (json_path_hash_insert(field->map, part->str, part->len, + new_field) != 0) { + free(new_field); + return -1; + } + } else if (field->type == FIELD_TYPE_ARRAY) { + field->array[part->num - TUPLE_INDEX_BASE] = new_field; + } + *field_subtree = new_field; + return 0; + +error_type_mistmatch: + diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH, + tt_sprintf("%d", fieldno + TUPLE_INDEX_BASE), + field_type_strs[type], field_type_strs[field->type]); + return -1; +} + +/** + * Delete @field_subtree object. + * @param field_subtree To delete. + */ +static void +json_field_tree_delete(struct tuple_field *field_subtree) +{ + if (field_subtree->type == FIELD_TYPE_MAP && + field_subtree->map != NULL) { + mh_int_t i; + mh_foreach(field_subtree->map, i) { + struct tuple_field *field = + mh_strnptr_node(field_subtree->map, i)->val; + assert(field != NULL); + json_field_tree_delete(field); + free(field); + } + json_path_hash_delete(field_subtree->map); + } else if (field_subtree->type == FIELD_TYPE_ARRAY && + field_subtree->array != NULL) { + for (uint32_t i = 0; i < field_subtree->array_size; i++) { + struct tuple_field *field = field_subtree->array[i]; + if (field == NULL) + continue; + json_field_tree_delete(field_subtree->array[i]); + free(field_subtree->array[i]); + } + free(field_subtree->array); + } +} + +int +tuple_field_bypass_and_init(const struct tuple_field *field, uint32_t idx, + const char *tuple, const char **offset, + uint32_t *field_map) +{ + assert(offset != NULL); + int rc = 0; + const char *mp_data = *offset; + const char *valid_type_str = NULL; + const char *err = NULL; + enum mp_type type = mp_typeof(**offset); + if (field->type == FIELD_TYPE_MAP) { + if (type != MP_MAP) { + valid_type_str 
= mp_type_strs[MP_MAP]; + goto error_type_mistmatch; + } + const char *max_offset = *offset; + uint32_t max_idx = 0; + uint32_t count = mp_decode_map(&max_offset); + mh_int_t i; + mh_foreach(field->map, i) { + struct mh_strnptr_node_t *ht_record = + mh_strnptr_node(field->map, i); + struct tuple_field *leaf = ht_record->val; + assert(leaf != NULL); + + const char *raw = *offset; + uint32_t map_idx = 0; + rc = tuple_field_go_to_key(&raw, ht_record->str, + (int)ht_record->len, + &map_idx); + if (rc != 0 && !leaf->is_nullable) { + err = tt_sprintf("map doesn't contain key " + "'%.*s' defined in index", + ht_record->len,ht_record->str); + goto error_invalid_document; + } + if (rc != 0) { + if (field_map != NULL && + leaf->offset_slot != TUPLE_OFFSET_SLOT_NIL) + field_map[leaf->offset_slot] = 0; + continue; + } + if (tuple_field_bypass_and_init(leaf, idx, tuple, &raw, + field_map) != 0) + return -1; + max_idx = MAX(max_idx, map_idx + 1); + max_offset = MAX(max_offset, raw); + } + *offset = max_offset; + while (count-- > max_idx) { + mp_next(offset); + mp_next(offset); + } + return 0; + } else if (field->type == FIELD_TYPE_ARRAY) { + if (type != MP_ARRAY) { + valid_type_str = mp_type_strs[MP_ARRAY]; + goto error_type_mistmatch; + } + uint32_t count = mp_decode_array(offset); + for (uint32_t i = count; i < field->array_size; i++) { + /* + * Index fields out of document array + * must be nullable. 
+ */ + struct tuple_field *leaf = field->array[i]; + if (leaf == NULL) + continue; + if (leaf->is_nullable) { + if (field_map != NULL && + leaf->offset_slot != TUPLE_OFFSET_SLOT_NIL) + field_map[leaf->offset_slot] = 0; + continue; + } + err = tt_sprintf("array size %d is less than item %d " + "defined in index", count, i); + goto error_invalid_document; + } + uint32_t fields = MIN(field->array_size, count); + for (uint32_t i = 0; i < fields; i++) { + if (field->array[i] == NULL) { + mp_next(offset); + continue; + } + if (tuple_field_bypass_and_init(field->array[i], idx, + tuple, + offset, field_map) != 0) + return -1; + } + while (count-- > fields) + mp_next(offset); + return 0; + } + /* Tree leaf field */ + if (key_mp_type_validate(field->type, type, ER_KEY_PART_TYPE, idx, + field->is_nullable) != 0) { + valid_type_str = field_type_strs[field->type]; + goto error_type_mistmatch; + } + assert(offset != NULL); + if (field_map != NULL && + field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { + field_map[field->offset_slot] = + (uint32_t) (*offset - tuple); + } + mp_next(offset); + return 0; + +error_type_mistmatch: + err = tt_sprintf("type mismatch: have %s, expected %s", + mp_type_strs[type], valid_type_str); +error_invalid_document: + assert(err != NULL); + do { + char *data_buff = tt_static_buf(); + mp_snprint(data_buff, TT_STATIC_BUF_LEN, mp_data); + const char *err_msg = + tt_sprintf("invalid field %d document content '%s': %s", + idx + TUPLE_INDEX_BASE, data_buff, err); + diag_set(ClientError, ER_DATA_STRUCTURE_MISMATCH, err_msg); + return -1; + } while (0); +} + +/** + * Add new JSON @path to @format. + * @param format Tuple format to modify. + * @param path String to add. + * @param path_len Length of @path. + * @param path_hash Hash of @path. + * @param type Type of field by @path. + * @param is_nullable Nullability of field by @path. + * @param strings Area to store unique JSON paths (optional). + * @param[out] leaf Pointer to leaf field.
+ * @retval -1 On error. + * @retval 0 On success. + */ +static int +tuple_format_add_json_path(struct tuple_format *format, const char *path, + uint32_t path_len, uint32_t path_hash, + enum field_type type, bool is_nullable, + char **strings, struct tuple_field **leaf) +{ + assert(format->path_hash != NULL); + /* + * Get root field by index. + * Path is specified in canonical form: [i]... + */ + struct json_path_parser parser; + struct json_path_node node; + json_path_parser_create(&parser, path, path_len); + int rc = json_path_next(&parser, &node); + assert(rc == 0 && node.type == JSON_PATH_NUM); + assert(node.num - TUPLE_INDEX_BASE < format->field_count); + + /* Test if path is already registered. */ + struct mh_strnptr_node_t *ht_record = + json_path_hash_get(format->path_hash, path, path_len, path_hash); + assert(ht_record != NULL); + struct tuple_field *field = ht_record->val; + if (unlikely(field != NULL)) { + /* Path has been already registered. */ + if (field->type != type) { + const char *err = + tt_sprintf("JSON path '%.*s' has been already " + "constructed for '%s' leaf record", + path_len, path, + field_type_strs[field->type]); + diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, + node.num, err); + return -1; + } + if (field->is_nullable != is_nullable) { + const char *err = + tt_sprintf("JSON path '%.*s' has been already " + "constructed for '%snullable' leaf " + "record", path_len, path, + field->is_nullable ? "" : "not "); + diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, + node.num, err); + return -1; + } + *leaf = field; + return 0; + } else if (strings != NULL) { + /* + * Hashtable should hold memory related to format + * chunk allocation. + */ + memcpy(*strings, path, path_len); + (*strings)[path_len] = '\0'; + ht_record->str = *strings; + *strings += path_len + 1; + } + + /* + * We have to re-init parser with path string located in + * format chunk. 
+ */ + json_path_parser_create(&parser, ht_record->str + parser.offset, + path_len - parser.offset); + /* Build data path tree. */ + uint32_t root_fieldno = node.num - TUPLE_INDEX_BASE; + field = &format->fields[root_fieldno]; + while ((rc = json_path_next(&parser, &node)) == 0 && + node.type != JSON_PATH_END) { + if (json_field_tree_append(&field, root_fieldno, &node) != 0) + return -1; + } + assert(rc == 0 && node.type == JSON_PATH_END); + + /* Leaf record is a new object as JSON path unique. */ + field->type = type; + field->is_nullable = is_nullable; + *leaf = field; + ht_record->val = field; + return 0; +} + +/** * Extract all available type info from keys and field * definitions. */ @@ -63,12 +532,18 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, format->fields[i].type = fields[i].type; format->fields[i].offset_slot = TUPLE_OFFSET_SLOT_NIL; format->fields[i].is_nullable = fields[i].is_nullable; + /* Don't need to init format->fields[i].map. */ + format->fields[i].childs = NULL; + format->fields[i].array_size = 0; } /* Initialize remaining fields */ for (uint32_t i = field_count; i < format->field_count; i++) format->fields[i] = tuple_field_default; int current_slot = 0; + /* Memory allocated for JSON paths if any. 
*/ + char *data = (char *)format + sizeof(struct tuple_format) + + format->field_count * sizeof(struct tuple_field); /* extract field type info */ for (uint16_t key_no = 0; key_no < key_count; ++key_no) { @@ -91,6 +566,24 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, field->is_nullable = false; } + if (part->path != NULL) { + field->is_key_part = true; + assert(is_sequential == false); + struct tuple_field *leaf = NULL; + if (tuple_format_add_json_path(format, + part->path, + part->path_len, + part->path_hash, + part->type, + part->is_nullable, + &data, + &leaf) != 0) + return -1; + assert(leaf != NULL); + if (leaf->offset_slot == TUPLE_OFFSET_SLOT_NIL) + leaf->offset_slot = --current_slot; + continue; + } /* * Check that there are no conflicts * between index part types and space @@ -100,10 +593,12 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, * used in tuple_format. */ if (field_type1_contains_type2(field->type, - part->type)) { + part->type) && + part->path == NULL) { field->type = part->type; } else if (! field_type1_contains_type2(part->type, - field->type)) { + field->type) && + part->path == NULL) { const char *name; int fieldno = part->fieldno + TUPLE_INDEX_BASE; if (part->fieldno >= field_count) { @@ -201,19 +696,47 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, uint32_t space_field_count, struct tuple_dictionary *dict) { uint32_t index_field_count = 0; + /* JSON path hashtable. 
*/ + struct mh_strnptr_t *path_hash = json_path_hash_create(0); + if (path_hash == NULL) + return NULL; /* find max max field no */ for (uint16_t key_no = 0; key_no < key_count; ++key_no) { const struct key_def *key_def = keys[key_no]; const struct key_part *part = key_def->parts; const struct key_part *pend = part + key_def->part_count; for (; part < pend; part++) { + if (part->path != NULL && + json_path_hash_insert(path_hash, part->path, + part->path_len, NULL) != 0) { + json_path_hash_delete(path_hash); + return NULL; + } index_field_count = MAX(index_field_count, part->fieldno + 1); } } + size_t extra_size = 0; + if (mh_size(path_hash) == 0) { + /* Hashtable is useless. */ + json_path_hash_delete(path_hash); + path_hash = NULL; + } else { + /* + * Calculate unique JSON paths count. + * Path data would be copied later on + * tuple_format_create routine. + */ + mh_int_t i; + mh_foreach(path_hash, i) { + struct mh_strnptr_node_t *node = + mh_strnptr_node(path_hash, i); + extra_size += node->len + 1; + } + } uint32_t field_count = MAX(space_field_count, index_field_count); uint32_t total = sizeof(struct tuple_format) + - field_count * sizeof(struct tuple_field); + field_count * sizeof(struct tuple_field) + extra_size; struct tuple_format *format = (struct tuple_format *) malloc(total); if (format == NULL) { @@ -243,6 +766,7 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, format->index_field_count = index_field_count; format->exact_field_count = 0; format->min_field_count = 0; + format->path_hash = path_hash; return format; } @@ -250,6 +774,10 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, static inline void tuple_format_destroy(struct tuple_format *format) { + for (uint32_t i = 0; i < format->field_count; i++) + json_field_tree_delete(&format->fields[i]); + if (format->path_hash != NULL) + json_path_hash_delete(format->path_hash); tuple_dictionary_unref(format->dict); } @@ -334,21 +862,61 @@ tuple_format_dup(struct 
tuple_format *src) { uint32_t total = sizeof(struct tuple_format) + src->field_count * sizeof(struct tuple_field); + if (src->path_hash != NULL) { + mh_int_t i; + mh_foreach(src->path_hash, i) + total += mh_strnptr_node(src->path_hash, i)->len + 1; + } struct tuple_format *format = (struct tuple_format *) malloc(total); if (format == NULL) { diag_set(OutOfMemory, total, "malloc", "tuple format"); return NULL; } memcpy(format, src, total); + + /* Fill with NULLs for normal destruction on error. */ + format->path_hash = NULL; + for (uint32_t i = 0; i < format->field_count; i++) { + format->fields[i].childs = NULL; + format->fields[i].array_size = 0; + } + if (src->path_hash != NULL) { + mh_int_t i; + format->path_hash = + json_path_hash_create(mh_size(src->path_hash)); + if (format->path_hash == NULL) + goto error; + mh_foreach(src->path_hash, i) { + struct mh_strnptr_node_t *node = + mh_strnptr_node(src->path_hash, i); + /* Path data has been already copied. */ + char *path = (char *)format + (node->str - (char *)src); + if (json_path_hash_insert(format->path_hash, path, + node->len, NULL) != 0) + goto error; + /* Store source leaf field offset_slot. */ + struct tuple_field *leaf = node->val; + int32_t offset_slot = leaf->offset_slot; + uint32_t path_hash = mh_strn_hash(path, node->len); + if (tuple_format_add_json_path(format, path, node->len, + path_hash, leaf->type, + leaf->is_nullable, NULL, + &leaf) != 0) + goto error; + /* Store offset_slot in a new leaf record. */ + assert(leaf != NULL); + leaf->offset_slot = offset_slot; + } + } tuple_dictionary_ref(format->dict); format->id = FORMAT_ID_NIL; format->refs = 0; - if (tuple_format_register(format) != 0) { - tuple_format_destroy(format); - free(format); - return NULL; - } - return format; + if (tuple_format_register(format) == 0) + return format; +error: + tuple_format_destroy(format); + free(format); + return NULL; } /** @sa declaration for details. 
*/ @@ -377,18 +945,14 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, return -1; } - /* first field is simply accessible, so we do not store offset to it */ - enum mp_type mp_type = mp_typeof(*pos); + /* + * First field is simply accessible, store offset to it + * only for JSON path. + */ + uint32_t i = 0; + enum mp_type mp_type; const struct tuple_field *field = &format->fields[0]; - if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE, - TUPLE_INDEX_BASE, field->is_nullable)) - return -1; - mp_next(&pos); - /* other fields...*/ - ++field; - uint32_t i = 1; - uint32_t defined_field_count = MIN(field_count, format->field_count); - if (field_count < format->index_field_count) { + if (field_count < format->index_field_count || field->childs != NULL) { /* * Nullify field map to be able to detect by 0, * which key fields are absent in tuple_field(). @@ -396,6 +960,16 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, memset((char *)field_map - format->field_map_size, 0, format->field_map_size); } + if (field->childs == NULL) { + mp_type = mp_typeof(*pos); + if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE, + TUPLE_INDEX_BASE, field->is_nullable)) + return -1; + mp_next(&pos); + ++field; + ++i; + } + uint32_t defined_field_count = MIN(field_count, format->field_count); for (; i < defined_field_count; ++i, ++field) { mp_type = mp_typeof(*pos); if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE, @@ -405,8 +979,12 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { field_map[field->offset_slot] = (uint32_t) (pos - tuple); - } - mp_next(&pos); + } else if (field->childs != NULL && + tuple_field_bypass_and_init(field, i, tuple, &pos, + field_map) != 0) + return -1; + if (field->childs == NULL) + mp_next(&pos); } return 0; } @@ -512,55 +1090,106 @@ tuple_field_go_to_index(const char **field, uint64_t index) 
return -1; } -/** - * Propagate @a field to MessagePack(field)[key]. - * @param[in][out] field Field to propagate. - * @param key Key to propagate to. - * @param len Length of @a key. - * - * @retval 0 Success, the index was found. - * @retval -1 Not found. - */ -static inline int -tuple_field_go_to_key(const char **field, const char *key, int len) +const char * +tuple_field_by_part_raw(const struct tuple_format *format, const char *data, + const uint32_t *field_map, struct key_part *part) { - enum mp_type type = mp_typeof(**field); - if (type != MP_MAP) - return -1; - uint64_t count = mp_decode_map(field); - for (; count > 0; --count) { - type = mp_typeof(**field); - if (type == MP_STR) { - uint32_t value_len; - const char *value = mp_decode_str(field, &value_len); - if (value_len == (uint)len && - memcmp(value, key, len) == 0) - return 0; - } else { - /* Skip key. */ - mp_next(field); - } - /* Skip value. */ - mp_next(field); + if (likely(part->path == NULL)) + return tuple_field_raw(format, data, field_map, part->fieldno); + + struct mh_strnptr_node_t *ht_record = NULL; + int32_t offset_slot; + if (likely(part->offset_slot_epoch == format->epoch)) { + offset_slot = part->offset_slot; + } else if (format->path_hash != NULL && + (ht_record = json_path_hash_get(format->path_hash, part->path, + part->path_len, + part->path_hash)) != NULL) { + struct tuple_field *field = ht_record->val; + assert(field != NULL); + offset_slot = field->offset_slot; + } else { + /* + * Legacy tuple having no field map for + * JSON index. + */ + uint32_t path_hash = + field_name_hash(part->path, part->path_len); + const char *raw = NULL; + if (tuple_field_raw_by_path(format, data, field_map, + part->path, part->path_len, + path_hash, &raw) != 0) + raw = NULL; + return raw; } - return -1; + assert(offset_slot < 0); + assert(-offset_slot * sizeof(uint32_t) <= format->field_map_size); + if (unlikely(field_map[offset_slot] == 0)) + return NULL; + /* Cache offset_slot if required. 
*/ + if (part->offset_slot_epoch < format->epoch) { + part->offset_slot = offset_slot; + part->offset_slot_epoch = format->epoch; + } + return data + field_map[offset_slot]; } -const char * -tuple_field_by_part_raw(const struct tuple_format *format, const char *data, - const uint32_t *field_map, struct key_part *part) +int +tuple_field_dig_with_parser(struct json_path_parser *parser, const char **field) { - return tuple_field_raw(format, data, field_map, part->fieldno); + int rc; + struct json_path_node node; + while ((rc = json_path_next(parser, &node)) == 0) { + uint32_t dummy; + switch(node.type) { + case JSON_PATH_NUM: + rc = tuple_field_go_to_index(field, node.num); + break; + case JSON_PATH_STR: + rc = tuple_field_go_to_key(field, node.str, + node.len, &dummy); + break; + default: + assert(node.type == JSON_PATH_END); + return 0; + } + if (rc != 0) { + *field = NULL; + return 0; + } + } + return rc; } int -tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, +tuple_field_raw_by_path(const struct tuple_format *format, const char *tuple, const uint32_t *field_map, const char *path, uint32_t path_len, uint32_t path_hash, const char **field) { assert(path_len > 0); uint32_t fieldno; + if (format->path_hash != NULL) { + /* + * The path hash for format->path_hash hashtable + * may be different from path_hash specified + * as function argument. + */ + struct mh_strnptr_node_t *ht_record = + json_path_hash_get(format->path_hash, path, path_len, + mh_strn_hash(path, path_len)); + if (ht_record != NULL) { + struct tuple_field *leaf = ht_record->val; + assert(leaf != NULL); + int32_t offset_slot = leaf->offset_slot; + assert(offset_slot != TUPLE_OFFSET_SLOT_NIL); + if (field_map[offset_slot] != 0) + *field = tuple + field_map[offset_slot]; + else + *field = NULL; + return 0; + } + } /* * It is possible, that a field has a name as * well-formatted JSON.
For example 'a.b.c.d' or '[1]' can @@ -616,23 +1245,9 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, *field = NULL; return 0; } - while ((rc = json_path_next(&parser, &node)) == 0) { - switch(node.type) { - case JSON_PATH_NUM: - rc = tuple_field_go_to_index(field, node.num); - break; - case JSON_PATH_STR: - rc = tuple_field_go_to_key(field, node.str, node.len); - break; - default: - assert(node.type == JSON_PATH_END); - return 0; - } - if (rc != 0) { - *field = NULL; - return 0; - } - } + rc = tuple_field_dig_with_parser(&parser, field); + if (rc == 0) + return 0; error: assert(rc > 0); diag_set(ClientError, ER_ILLEGAL_PARAMS, diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index ecbf64c..af81a58 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -63,6 +63,7 @@ enum { TUPLE_OFFSET_SLOT_NIL = INT32_MAX }; struct tuple; struct tuple_format; +struct json_path_parser; /** Engine-specific tuple format methods. */ struct tuple_format_vtab { @@ -108,6 +109,23 @@ struct tuple_field { bool is_key_part; /** True, if a field can store NULL. */ bool is_nullable; + /** Tree child records. Must be at the end of the struct. */ + union { + /** Array of fields. */ + struct { + struct tuple_field **array; + uint32_t array_size; + }; + /** Hashtable: path -> tuple_field. */ + struct mh_strnptr_t *map; + /** Leaf argument for manual tuple construction. */ + void *arg; + /** + * Auxiliary pointer to test if field has + * JSON path subtree. + */ + void *childs; + }; }; /** @@ -167,6 +185,8 @@ struct tuple_format { * Shared names storage used by all formats of a space. */ struct tuple_dictionary *dict; + /** JSON path hash table. */ + struct mh_strnptr_t *path_hash; /* Formats of the fields */ struct tuple_field fields[0]; }; @@ -394,7 +414,7 @@ tuple_field_raw(const struct tuple_format *format, const char *tuple, * @retval NULL No field with @a name.
*/ static inline const char * -tuple_field_raw_by_name(struct tuple_format *format, const char *tuple, +tuple_field_raw_by_name(const struct tuple_format *format, const char *tuple, const uint32_t *field_map, const char *name, uint32_t name_len, uint32_t name_hash) { @@ -419,11 +439,49 @@ tuple_field_raw_by_name(struct tuple_format *format, const char *tuple, * @retval -1 Error in JSON path. */ int -tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, +tuple_field_raw_by_path(const struct tuple_format *format, const char *tuple, const uint32_t *field_map, const char *path, uint32_t path_len, uint32_t path_hash, const char **field); +/** + * Retrieve document data @field with initialized @parser. + * @param parser JSON parser. + * @param[in, out] field Tuple field to lookup. + * @retval 0 On success. + * @retval > 0 On error in the path used to initialize @parser. + */ +int +tuple_field_dig_with_parser(struct json_path_parser *parser, + const char **field); + +/** + * Get @hashtable record by key @path, @path_len. + * @param hashtable Storage to lookup. + * @param path Path string. + * @param path_len Length of @path. + * @retval NULL On nothing found. + * @retval not NULL Leaf field pointer for registered path. + */ +struct mh_strnptr_node_t * +json_path_hash_get(struct mh_strnptr_t *hashtable, const char *path, + uint32_t path_len, uint32_t path_hash); + +/** + * Observe JSON path tree in @field comparing with @tuple + * structure. Initialize field map if specified. + * @param field Field to use on initialization. + * @param idx Root field index to emit correct error. + * @param tuple Source raw data. + * @param field_map Field map to initialize (optional). + * @retval 0 On success. + * @retval -1 On error.
+ */ +int +tuple_field_bypass_and_init(const struct tuple_field *field, uint32_t idx, + const char *tuple, const char **offset, + uint32_t *field_map); + #if defined(__cplusplus) } /* extern "C" */ #endif /* defined(__cplusplus) */ diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc index 561ca55..deaeeea 100644 --- a/src/box/tuple_hash.cc +++ b/src/box/tuple_hash.cc @@ -230,7 +230,7 @@ key_hash_slowpath(const char *key, const struct key_def *key_def); void tuple_hash_func_set(struct key_def *key_def) { - if (key_def->is_nullable) + if (key_def->is_nullable || key_def->has_json_paths) goto slowpath; /* * Check that key_def defines sequential a key without holes @@ -264,10 +264,17 @@ tuple_hash_func_set(struct key_def *key_def) { } slowpath: - if (key_def->has_optional_parts) - key_def->tuple_hash = tuple_hash_slowpath<true, false>; - else - key_def->tuple_hash = tuple_hash_slowpath<false, false>; + if (key_def->has_optional_parts) { + if (key_def->has_json_paths) + key_def->tuple_hash = tuple_hash_slowpath<true, true>; + else + key_def->tuple_hash = tuple_hash_slowpath<true, false>; + } else { + if (key_def->has_json_paths) + key_def->tuple_hash = tuple_hash_slowpath<false, true>; + else + key_def->tuple_hash = tuple_hash_slowpath<false, false>; + } key_def->key_hash = key_hash_slowpath; } @@ -335,6 +342,7 @@ template <bool has_optional_parts, bool has_json_paths> uint32_t tuple_hash_slowpath(const struct tuple *tuple, const struct key_def *key_def) { + assert(has_json_paths == key_def->has_json_paths); assert(has_optional_parts == key_def->has_optional_parts); uint32_t h = HASH_SEED; uint32_t carry = 0; diff --git a/src/box/vinyl.c b/src/box/vinyl.c index 798a37f..e58d4c5 100644 --- a/src/box/vinyl.c +++ b/src/box/vinyl.c @@ -1001,6 +1001,11 @@ vinyl_index_def_change_requires_rebuild(struct index *index, return true; if (!field_type1_contains_type2(new_part->type, old_part->type)) return true; + if (old_part->path_len != new_part->path_len) + return true; + 
if (memcmp(old_part->path, new_part->path, + old_part->path_len) != 0) + return true; } return false; } diff --git a/src/box/vy_log.c b/src/box/vy_log.c index fc8ede5..f396705 100644 --- a/src/box/vy_log.c +++ b/src/box/vy_log.c @@ -711,7 +711,8 @@ vy_log_record_dup(struct region *pool, const struct vy_log_record *src) "struct key_part_def"); goto err; } - key_def_dump_parts(src->key_def, dst->key_parts); + if (key_def_dump_parts(pool, src->key_def, dst->key_parts) != 0) + goto err; dst->key_part_count = src->key_def->part_count; dst->key_def = NULL; } diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c index cb3c436..e76da15 100644 --- a/src/box/vy_lsm.c +++ b/src/box/vy_lsm.c @@ -36,6 +36,7 @@ #include <sys/stat.h> #include <sys/types.h> #include <small/mempool.h> +#include <assoc.h> #include "diag.h" #include "errcode.h" @@ -158,6 +159,49 @@ vy_lsm_new(struct vy_lsm_env *lsm_env, struct vy_cache_env *cache_env, NULL); if (lsm->disk_format == NULL) goto fail_format; + /* + * Tuple formats should be compatible to make + * epoch-based caching work. 
+ */ + int32_t min_offset_slot = 0; + struct tuple_field *dst_fields = lsm->disk_format->fields; + struct mh_strnptr_t *dst_ht = lsm->disk_format->path_hash; + struct mh_strnptr_t *src_ht = format->path_hash; + struct key_part *part = cmp_def->parts; + struct key_part *part_end = part + cmp_def->part_count; + for (; part < part_end; part++) { + struct tuple_field *dst_field = + &dst_fields[part->fieldno]; + struct tuple_field *src_field; + if (dst_field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { + src_field = &format->fields[part->fieldno]; + } else if (part->path != NULL) { + struct mh_strnptr_node_t *ht_record; + ht_record = + json_path_hash_get(dst_ht, part->path, + part->path_len, + part->path_hash); + assert(ht_record != NULL); + dst_field = ht_record->val; + assert(dst_field != NULL); + ht_record = + json_path_hash_get(src_ht, part->path, + part->path_len, + part->path_hash); + assert(ht_record != NULL); + src_field = ht_record->val; + assert(src_field != NULL); + } else { + continue; + } + if (src_field->offset_slot == TUPLE_OFFSET_SLOT_NIL) + continue; + dst_field->offset_slot = src_field->offset_slot; + min_offset_slot = + MIN(src_field->offset_slot, min_offset_slot); + } + lsm->disk_format->field_map_size = + -min_offset_slot * sizeof(uint32_t); } tuple_format_ref(lsm->disk_format); diff --git a/src/box/vy_point_lookup.c b/src/box/vy_point_lookup.c index 7b704b8..9d5e220 100644 --- a/src/box/vy_point_lookup.c +++ b/src/box/vy_point_lookup.c @@ -196,8 +196,6 @@ vy_point_lookup(struct vy_lsm *lsm, struct vy_tx *tx, const struct vy_read_view **rv, struct tuple *key, struct tuple **ret) { - assert(tuple_field_count(key) >= lsm->cmp_def->part_count); - *ret = NULL; double start_time = ev_monotonic_now(loop()); int rc = 0; diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c index 37da282..6b452cd 100644 --- a/src/box/vy_stmt.c +++ b/src/box/vy_stmt.c @@ -44,6 +44,7 @@ #include "tuple_format.h" #include "xrow.h" #include "fiber.h" +#include "assoc.h" /** * 
Statement metadata keys. @@ -330,6 +331,67 @@ vy_stmt_replace_from_upsert(const struct tuple *upsert) return replace; } +static void +vy_stmt_msgpack_build(struct tuple_field *field, char *tuple, + uint32_t *field_map, char **offset, bool write_data) +{ + if (field->type == FIELD_TYPE_ARRAY) { + if (write_data) + *offset = mp_encode_array(*offset, field->array_size); + else + *offset += mp_sizeof_array(field->array_size); + for (uint32_t i = 0; i < field->array_size; i++) { + if (field->array[i] == NULL) { + if (write_data) + *offset = mp_encode_nil(*offset); + else + *offset += mp_sizeof_nil(); + continue; + } + vy_stmt_msgpack_build(field->array[i], tuple, field_map, + offset, write_data); + } + return; + } else if (field->type == FIELD_TYPE_MAP) { + if (write_data) + *offset = mp_encode_map(*offset, mh_size(field->map)); + else + *offset += mp_sizeof_map(mh_size(field->map)); + mh_int_t i; + mh_foreach(field->map, i) { + struct mh_strnptr_node_t *node = + mh_strnptr_node(field->map, i); + assert(node); + if (write_data) { + *offset = mp_encode_str(*offset, node->str, + node->len); + } else { + *offset += mp_sizeof_str(node->len); + } + vy_stmt_msgpack_build(node->val, tuple, field_map, + offset, write_data); + } + return; + } + + struct iovec *iov = field->arg; + if (iov == NULL) { + if (write_data) + *offset = mp_encode_nil(*offset); + else + *offset += mp_sizeof_nil(); + } else { + if (write_data) { + uint32_t data_offset = *offset - tuple; + memcpy(*offset, iov->iov_base, iov->iov_len); + field->arg = NULL; + if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) + field_map[field->offset_slot] = data_offset; + } + *offset += iov->iov_len; + } +} + static struct tuple * vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type, const struct key_def *cmp_def, @@ -340,27 +402,45 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type, struct region *region = &fiber()->gc; uint32_t field_count = format->index_field_count; - struct iovec 
*iov = region_alloc(region, sizeof(*iov) * field_count); + uint32_t part_count = mp_decode_array(&key); + assert(part_count == cmp_def->part_count); + struct iovec *iov = region_alloc(region, sizeof(*iov) * part_count); if (iov == NULL) { - diag_set(OutOfMemory, sizeof(*iov) * field_count, - "region", "iov for surrogate key"); + diag_set(OutOfMemory, sizeof(*iov) * part_count, "region", + "iov for surrogate key"); return NULL; } - memset(iov, 0, sizeof(*iov) * field_count); - uint32_t part_count = mp_decode_array(&key); - assert(part_count == cmp_def->part_count); - assert(part_count <= field_count); - uint32_t nulls_count = field_count - cmp_def->part_count; - uint32_t bsize = mp_sizeof_array(field_count) + - mp_sizeof_nil() * nulls_count; - for (uint32_t i = 0; i < part_count; ++i) { - const struct key_part *part = &cmp_def->parts[i]; + uint32_t bsize = mp_sizeof_array(field_count); + uint32_t nulls_count = field_count; + memset(iov, 0, sizeof(*iov) * part_count); + const struct key_part *part = cmp_def->parts; + for (uint32_t i = 0; i < part_count; ++i, ++part) { assert(part->fieldno < field_count); const char *svp = key; - iov[part->fieldno].iov_base = (char *) key; + iov[i].iov_base = (char *) key; mp_next(&key); - iov[part->fieldno].iov_len = key - svp; - bsize += key - svp; + iov[i].iov_len = key - svp; + struct tuple_field *field; + if (part->path == NULL) { + field = &format->fields[part->fieldno]; + --nulls_count; + } else { + struct mh_strnptr_node_t *ht_record = + json_path_hash_get(format->path_hash, + part->path, part->path_len, + part->path_hash); + assert(ht_record != NULL); + field = ht_record->val; + assert(field != NULL); + } + field->arg = &iov[i]; + } + bsize += nulls_count * mp_sizeof_nil(); + for (uint32_t i = 0; i < field_count; ++i) { + char *data = NULL; + vy_stmt_msgpack_build(&format->fields[i], NULL, NULL, &data, + false); + bsize += data - (char *)NULL; } struct tuple *stmt = vy_stmt_alloc(format, bsize); @@ -371,17 +451,11 @@ 
vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type, uint32_t *field_map = (uint32_t *) raw; char *wpos = mp_encode_array(raw, field_count); for (uint32_t i = 0; i < field_count; ++i) { - const struct tuple_field *field = &format->fields[i]; - if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) - field_map[field->offset_slot] = wpos - raw; - if (iov[i].iov_base == NULL) { - wpos = mp_encode_nil(wpos); - } else { - memcpy(wpos, iov[i].iov_base, iov[i].iov_len); - wpos += iov[i].iov_len; - } + vy_stmt_msgpack_build(&format->fields[i], raw, field_map, &wpos, + true); } - assert(wpos == raw + bsize); + + assert(wpos <= raw + bsize); vy_stmt_set_type(stmt, type); return stmt; } diff --git a/test/box/misc.result b/test/box/misc.result index 6237675..6ea97e1 100644 --- a/test/box/misc.result +++ b/test/box/misc.result @@ -350,7 +350,7 @@ t; - 'box.error.CANT_CREATE_COLLATION : 150' - 'box.error.USER_EXISTS : 46' - 'box.error.WAL_IO : 40' - - 'box.error.PROC_RET : 21' + - 'box.error.RTREE_RECT : 101' - 'box.error.PRIV_GRANTED : 89' - 'box.error.CREATE_SPACE : 9' - 'box.error.GRANT : 88' @@ -361,7 +361,7 @@ t; - 'box.error.VINYL_MAX_TUPLE_SIZE : 139' - 'box.error.LOAD_FUNCTION : 99' - 'box.error.INVALID_XLOG : 74' - - 'box.error.READ_VIEW_ABORTED : 130' + - 'box.error.PRIV_NOT_GRANTED : 91' - 'box.error.TRANSACTION_CONFLICT : 97' - 'box.error.GUEST_USER_PASSWORD : 96' - 'box.error.PROC_C : 102' @@ -371,8 +371,8 @@ t; - 'box.error.DROP_FUNCTION : 71' - 'box.error.CFG : 59' - 'box.error.NO_SUCH_FIELD : 37' - - 'box.error.CONNECTION_TO_SELF : 117' - - 'box.error.FUNCTION_MAX : 54' + - 'box.error.MORE_THAN_ONE_TUPLE : 41' + - 'box.error.PROC_LUA : 32' - 'box.error.ILLEGAL_PARAMS : 1' - 'box.error.PARTIAL_KEY : 136' - 'box.error.SAVEPOINT_NO_TRANSACTION : 114' @@ -400,34 +400,35 @@ t; - 'box.error.UPDATE_ARG_TYPE : 26' - 'box.error.CROSS_ENGINE_TRANSACTION : 81' - 'box.error.FORMAT_MISMATCH_INDEX_PART : 27' - - 'box.error.FUNCTION_TX_ACTIVE : 30' - 
'box.error.injection : table: <address> - - 'box.error.ITERATOR_TYPE : 72' + - 'box.error.FUNCTION_TX_ACTIVE : 30' + - 'box.error.IDENTIFIER : 70' + - 'box.error.TRANSACTION_YIELD : 154' - 'box.error.NO_SUCH_ENGINE : 57' - 'box.error.COMMIT_IN_SUB_STMT : 122' - - 'box.error.TRANSACTION_YIELD : 154' - - 'box.error.UNSUPPORTED : 5' - - 'box.error.LAST_DROP : 15' + - 'box.error.RELOAD_CFG : 58' - 'box.error.SPACE_FIELD_IS_DUPLICATE : 149' + - 'box.error.LAST_DROP : 15' + - 'box.error.SEQUENCE_OVERFLOW : 147' - 'box.error.DECOMPRESSION : 124' - 'box.error.CREATE_SEQUENCE : 142' - 'box.error.CREATE_USER : 43' - - 'box.error.SEQUENCE_OVERFLOW : 147' + - 'box.error.FUNCTION_MAX : 54' - 'box.error.INSTANCE_UUID_MISMATCH : 66' - - 'box.error.RELOAD_CFG : 58' + - 'box.error.TUPLE_FORMAT_LIMIT : 16' - 'box.error.SYSTEM : 115' - 'box.error.KEY_PART_IS_TOO_LONG : 118' - - 'box.error.MORE_THAN_ONE_TUPLE : 41' - 'box.error.TRUNCATE_SYSTEM_SPACE : 137' - - 'box.error.NO_SUCH_SAVEPOINT : 61' - 'box.error.VY_QUOTA_TIMEOUT : 135' - - 'box.error.PRIV_NOT_GRANTED : 91' + - 'box.error.NO_SUCH_SAVEPOINT : 61' + - 'box.error.PROTOCOL : 104' + - 'box.error.READ_VIEW_ABORTED : 130' - 'box.error.WRONG_INDEX_OPTIONS : 108' - 'box.error.INVALID_VYLOG_FILE : 133' - 'box.error.INDEX_FIELD_COUNT_LIMIT : 127' - - 'box.error.BEFORE_REPLACE_RET : 53' + - 'box.error.DATA_STRUCTURE_MISMATCH : 55' - 'box.error.USER_MAX : 56' - - 'box.error.INVALID_MSGPACK : 20' + - 'box.error.BEFORE_REPLACE_RET : 53' - 'box.error.TUPLE_NOT_ARRAY : 22' - 'box.error.KEY_PART_COUNT : 31' - 'box.error.ALTER_SPACE : 12' @@ -436,47 +437,47 @@ t; - 'box.error.DROP_SEQUENCE : 144' - 'box.error.INVALID_XLOG_ORDER : 76' - 'box.error.UNKNOWN_REQUEST_TYPE : 48' - - 'box.error.PROC_LUA : 32' + - 'box.error.PROC_RET : 21' - 'box.error.SUB_STMT_MAX : 121' - 'box.error.ROLE_NOT_GRANTED : 92' - 'box.error.SPACE_EXISTS : 10' - - 'box.error.UPDATE_INTEGER_OVERFLOW : 95' + - 'box.error.UNSUPPORTED : 5' - 'box.error.MIN_FIELD_COUNT : 39' - 
'box.error.NO_SUCH_SPACE : 36' - 'box.error.WRONG_INDEX_PARTS : 107' - 'box.error.REPLICASET_UUID_MISMATCH : 63' - 'box.error.UPDATE_FIELD : 29' - 'box.error.INDEX_EXISTS : 85' - - 'box.error.SPLICE : 25' + - 'box.error.DROP_SPACE : 11' - 'box.error.COMPRESSION : 119' - 'box.error.INVALID_ORDER : 68' - - 'box.error.UNKNOWN : 0' + - 'box.error.SPLICE : 25' - 'box.error.NO_SUCH_GROUP : 155' - - 'box.error.TUPLE_FORMAT_LIMIT : 16' + - 'box.error.INVALID_MSGPACK : 20' - 'box.error.DROP_PRIMARY_KEY : 17' - 'box.error.NULLABLE_PRIMARY : 152' - 'box.error.NO_SUCH_SEQUENCE : 145' - 'box.error.INJECTION : 8' - 'box.error.INVALID_UUID : 64' - - 'box.error.IDENTIFIER : 70' + - 'box.error.NO_SUCH_ROLE : 82' - 'box.error.TIMEOUT : 78' + - 'box.error.ITERATOR_TYPE : 72' - 'box.error.REPLICA_MAX : 73' - - 'box.error.NO_SUCH_ROLE : 82' - - 'box.error.DROP_SPACE : 11' + - 'box.error.UNKNOWN : 0' - 'box.error.MISSING_REQUEST_FIELD : 69' - 'box.error.MISSING_SNAPSHOT : 93' - 'box.error.WRONG_SPACE_OPTIONS : 111' - 'box.error.READONLY : 7' - - 'box.error.RTREE_RECT : 101' + - 'box.error.UPDATE_INTEGER_OVERFLOW : 95' - 'box.error.UPSERT_UNIQUE_SECONDARY_KEY : 105' - 'box.error.NO_CONNECTION : 77' - 'box.error.UNSUPPORTED_PRIV : 98' - 'box.error.WRONG_SCHEMA_VERSION : 109' - 'box.error.ROLLBACK_IN_SUB_STMT : 123' - - 'box.error.PROTOCOL : 104' - - 'box.error.INVALID_XLOG_TYPE : 125' - - 'box.error.INDEX_PART_TYPE_MISMATCH : 24' - 'box.error.UNSUPPORTED_INDEX_FEATURE : 112' + - 'box.error.CONNECTION_TO_SELF : 117' + - 'box.error.INDEX_PART_TYPE_MISMATCH : 24' + - 'box.error.INVALID_XLOG_TYPE : 125' ... test_run:cmd("setopt delimiter ''"); --- diff --git a/test/engine/tuple.result b/test/engine/tuple.result index 7fb0916..8a0c0bc 100644 --- a/test/engine/tuple.result +++ b/test/engine/tuple.result @@ -891,6 +891,322 @@ t["{"] s:drop() --- ... +-- +-- gh-1012: Indexes for JSON-defined paths. +-- +s = box.schema.space.create('withdata', {engine = engine}) +--- +... 
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO["fname"]'}, {3, 'str', path = '[3]["FIO"].fname'}}}) +--- +- error: 'Can''t create or modify index ''test1'' in space ''withdata'': same key + part is indexed twice' +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '[3]["FIO"]["fname"]'}}}) +--- +- error: 'Wrong index options (field 2): ''path'' must be string' +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'field.FIO.fname'}}}) +--- +- error: 'Wrong index options (field 2): invalid JSON path: first part should be defined + as array index' +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = '[3].FIO'}}}) +--- +- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type + ''map'' is not supported' +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[3][1]'}}}) +--- +- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type + ''array'' is not supported' +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO'}, {3, 'str', path = '[3]["FIO"].fname'}}}) +--- +- error: Field 3 has type 'map' in one index, but type 'string' in another +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3][1].sname'}, {3, 'str', path = '[3]["FIO"].fname'}}}) +--- +- error: Field 3 has type 'map' in one index, but type 'array' in another +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[2].FIO.fname'}}}) +--- +- error: 'Wrong index options (field 2): invalid JSON path: first part refers to invalid + field' +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO....fname'}}}) +--- +- error: 'Wrong index options (field 3): invalid JSON path ''[3].FIO....fname'': path + has invalid structure (error at position 9)' +... 
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}}) +--- +... +assert(idx ~= nil) +--- +- true +... +s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5} +--- +- error: 'Tuple doesn''t math document structure: invalid field 3 document content + ''666'': type mismatch: have unsigned, expected map' +... +s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5} +--- +- error: 'Tuple doesn''t math document structure: invalid field 3 document content + ''666'': type mismatch: have unsigned, expected string' +... +s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5} +--- +- error: 'Tuple doesn''t math document structure: invalid field 3 document content + ''{"fname": "James"}'': map doesn''t contain key ''sname'' defined in index' +... +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +--- +- [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +--- +- error: Duplicate key exists in unique index 'test1' in space 'withdata' +... +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond', data = "extra"}}, 4, 5} +--- +- error: Duplicate key exists in unique index 'test1' in space 'withdata' +... +s:insert{7, 7, {town = 'Moscow', FIO = {fname = 'Max', sname = 'Isaev', data = "extra"}}, 4, 5} +--- +- [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}}, + 4, 5] +... +idx:select() +--- +- - [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] + - [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}}, + 4, 5] +... +idx:min() +--- +- [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... 
+idx:max() +--- +- [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}}, + 4, 5] +... +s:drop() +--- +... +s = box.schema.create_space('withdata', {engine = engine}) +--- +... +parts = {} +--- +... +parts[1] = {1, 'unsigned', path='[1][2]'} +--- +... +pk = s:create_index('pk', {parts = parts}) +--- +... +s:insert{{1, 2}, 3} +--- +- [[1, 2], 3] +... +s:upsert({{box.null, 2}}, {{'+', 2, 5}}) +--- +... +s:get(2) +--- +- [[1, 2], 8] +... +s:drop() +--- +... +-- Create index on space with data +s = box.schema.space.create('withdata', {engine = engine}) +--- +... +pk = s:create_index('primary', { type = 'tree' }) +--- +... +s:insert{1, 7, {town = 'London', FIO = 1234}, 4, 5} +--- +- [1, 7, {'town': 'London', 'FIO': 1234}, 4, 5] +... +s:insert{2, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +--- +- [2, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +s:insert{3, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +--- +- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}}) +--- +- error: 'Tuple doesn''t math document structure: invalid field 3 document content + ''1234'': type mismatch: have unsigned, expected map' +... +_ = s:delete(1) +--- +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}}) +--- +- error: Duplicate key exists in unique index 'test1' in space 'withdata' +... +_ = s:delete(2) +--- +... +idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}, {3, 'str', path = '[3]["FIO"]["extra"]', is_nullable = true}}}) +--- +... +assert(idx ~= nil) +--- +- true +... 
+s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '[3]["FIO"]["fname"]'}}}) +--- +- error: 'Wrong index options (field 3): JSON path ''[3]["FIO"]["fname"]'' has been + already constructed for ''string'' leaf record' +... +s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]', is_nullable = true}}}) +--- +- error: 'Wrong index options (field 3): JSON path ''[3]["FIO"]["fname"]'' has been + already constructed for ''not nullable'' leaf record' +... +idx:select() +--- +- - [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +idx:min() +--- +- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +idx:max() +--- +- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +idx:drop() +--- +... +s:drop() +--- +... +-- Test complex JSON indexes +s = box.schema.space.create('withdata', {engine = engine}) +--- +... +parts = {} +--- +... +parts[1] = {1, 'str', path='[1][3][2].a'} +--- +... +parts[2] = {1, 'unsigned', path = '[1][3][1]'} +--- +... +parts[3] = {2, 'str', path = '[2][2].d[1]'} +--- +... +pk = s:create_index('primary', { type = 'tree', parts = parts}) +--- +... +s:insert{{1, 2, {3, {a = 'str', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {1, 2, 3}} +--- +- [[1, 2, [3, {'a': 'str', 'b': 5}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [1, 2, + 3]] +... +s:insert{{1, 2, {3, {a = 'str', b = 1}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6} +--- +- error: Duplicate key exists in unique index 'primary' in space 'withdata' +... +parts = {} +--- +... +parts[1] = {4, 'unsigned', path='[4][1]', is_nullable = false} +--- +... +parts[2] = {4, 'unsigned', path='[4][2]', is_nullable = true} +--- +... +parts[3] = {4, 'unsigned', path='[4][4]', is_nullable = true} +--- +... +trap_idx = s:create_index('trap', { type = 'tree', parts = parts}) +--- +... +parts = {} +--- +... +parts[1] = {1, 'unsigned', path='[1][3][2].b' } +--- +... 
+parts[2] = {3, 'unsigned'} +--- +... +crosspart_idx = s:create_index('crosspart', { parts = parts}) +--- +... +s:insert{{1, 2, {3, {a = 'str2', b = 2}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {9, 2, 3}} +--- +- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9, + 2, 3]] +... +parts = {} +--- +... +parts[1] = {1, 'unsigned', path='[1][3][2].b'} +--- +... +num_idx = s:create_index('numeric', {parts = parts}) +--- +... +s:insert{{1, 2, {3, {a = 'str3', b = 9}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {0}} +--- +- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]] +... +num_idx:get(2) +--- +- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9, + 2, 3]] +... +num_idx:select() +--- +- - [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [ + 9, 2, 3]] + - [[1, 2, [3, {'a': 'str', 'b': 5}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [1, + 2, 3]] + - [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [ + 0]] +... +num_idx:max() +--- +- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]] +... +num_idx:min() +--- +- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9, + 2, 3]] +... +assert(crosspart_idx:max() == num_idx:max()) +--- +- true +... +assert(crosspart_idx:min() == num_idx:min()) +--- +- true +... +trap_idx:max() +--- +- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9, + 2, 3]] +... +trap_idx:min() +--- +- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]] +... +s:drop() +--- +... engine = nil --- ... diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua index 30d6f1a..3b4ce26 100644 --- a/test/engine/tuple.test.lua +++ b/test/engine/tuple.test.lua @@ -289,5 +289,94 @@ t["{"] s:drop() +-- +-- gh-1012: Indexes for JSON-defined paths. 
+-- +s = box.schema.space.create('withdata', {engine = engine}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO["fname"]'}, {3, 'str', path = '[3]["FIO"].fname'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '[3]["FIO"]["fname"]'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'field.FIO.fname'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = '[3].FIO'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[3][1]'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO'}, {3, 'str', path = '[3]["FIO"].fname'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3][1].sname'}, {3, 'str', path = '[3]["FIO"].fname'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[2].FIO.fname'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO....fname'}}}) +idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}}) +assert(idx ~= nil) +s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5} +s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5} +s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5} +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond', data = "extra"}}, 4, 5} +s:insert{7, 7, {town = 'Moscow', FIO = {fname = 'Max', sname = 'Isaev', data = "extra"}}, 4, 5} +idx:select() +idx:min() +idx:max() +s:drop() + +s = box.schema.create_space('withdata', {engine = engine}) +parts = {} +parts[1] = {1, 'unsigned', path='[1][2]'} +pk = s:create_index('pk', {parts = parts}) +s:insert{{1, 2}, 3} +s:upsert({{box.null, 2}}, {{'+', 2, 5}}) +s:get(2) +s:drop() 
+ +-- Create index on space with data +s = box.schema.space.create('withdata', {engine = engine}) +pk = s:create_index('primary', { type = 'tree' }) +s:insert{1, 7, {town = 'London', FIO = 1234}, 4, 5} +s:insert{2, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +s:insert{3, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}}) +_ = s:delete(1) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}}) +_ = s:delete(2) +idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}, {3, 'str', path = '[3]["FIO"]["extra"]', is_nullable = true}}}) +assert(idx ~= nil) +s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '[3]["FIO"]["fname"]'}}}) +s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]', is_nullable = true}}}) +idx:select() +idx:min() +idx:max() +idx:drop() +s:drop() + +-- Test complex JSON indexes +s = box.schema.space.create('withdata', {engine = engine}) +parts = {} +parts[1] = {1, 'str', path='[1][3][2].a'} +parts[2] = {1, 'unsigned', path = '[1][3][1]'} +parts[3] = {2, 'str', path = '[2][2].d[1]'} +pk = s:create_index('primary', { type = 'tree', parts = parts}) +s:insert{{1, 2, {3, {a = 'str', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {1, 2, 3}} +s:insert{{1, 2, {3, {a = 'str', b = 1}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6} +parts = {} +parts[1] = {4, 'unsigned', path='[4][1]', is_nullable = false} +parts[2] = {4, 'unsigned', path='[4][2]', is_nullable = true} +parts[3] = {4, 'unsigned', path='[4][4]', is_nullable = true} +trap_idx = s:create_index('trap', { type = 'tree', parts = parts}) +parts = {} +parts[1] = {1, 'unsigned', path='[1][3][2].b' } +parts[2] = {3, 'unsigned'} 
+crosspart_idx = s:create_index('crosspart', { parts = parts}) +s:insert{{1, 2, {3, {a = 'str2', b = 2}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {9, 2, 3}} +parts = {} +parts[1] = {1, 'unsigned', path='[1][3][2].b'} +num_idx = s:create_index('numeric', {parts = parts}) +s:insert{{1, 2, {3, {a = 'str3', b = 9}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {0}} +num_idx:get(2) +num_idx:select() +num_idx:max() +num_idx:min() +assert(crosspart_idx:max() == num_idx:max()) +assert(crosspart_idx:min() == num_idx:min()) +trap_idx:max() +trap_idx:min() +s:drop() + engine = nil test_run = nil -- 2.7.4 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tarantool-patches] Re: [PATCH v3 3/4] box: introduce JSON indexes 2018-08-27 7:37 ` [tarantool-patches] [PATCH v3 3/4] box: introduce JSON indexes Kirill Shcherbatov @ 2018-09-03 10:32 ` Vladislav Shpilevoy 2018-09-03 10:35 ` Vladislav Shpilevoy 2018-09-06 12:46 ` Kirill Shcherbatov 0 siblings, 2 replies; 14+ messages in thread From: Vladislav Shpilevoy @ 2018-09-03 10:32 UTC (permalink / raw) To: tarantool-patches, Kirill Shcherbatov Thanks for the fixes! See 28 comments below and a commit on the branch. 1. On your branch vinyl/errinj and vinyl/info are failing. I do not know whether it is because you did not cherry-pick Vladimir's patch or because of a bug in your patch, but it should not fail. This is what I get sometimes on the engine/null test: [008] Test failed! Result content mismatch: [008] --- engine/null.result Thu Aug 30 15:20:24 2018 [008] +++ engine/null.reject Sat Sep 1 12:51:29 2018 [008] @@ -630,7 +630,7 @@ [008] ... [008] box.snapshot() [008] --- [008] -- ok [008] +- error: 'Invalid VYLOG file: Bad record: failed to decode index key definition' > diff --git a/src/box/key_def.c > index 440d41e..cf1169e 100644 > --- a/src/box/key_def.c > +++ b/src/box/key_def.c> @@ -241,6 +291,13 @@ key_part_cmp(const struct key_part *parts1, uint32_t part_count1, > if (part1->is_nullable != part2->is_nullable) > return part1->is_nullable < > part2->is_nullable ? -1 : 1; > + /* Lexicographic strings order. */ > + if (part1->path_len != part2->path_len) > + return part1->path_len - part2->path_len; > + int rc = 0; > + if ((rc = memcmp(part1->path, part2->path, > + part1->path_len)) != 0) > + return rc; 2. Your question: > I've checked that this code is reachable with an assert. But I don't know how to handle it manually. > Please, help. Answer: you should write a test that extends a path in such a way that the newly indexed field values change the sort order of the index. A select from the index should return the new order. Before your fix it returned the old order.
Test it, please. > } > return part_count1 < part_count2 ? -1 : part_count1 > part_count2; > } > @@ -284,7 +353,9 @@ key_def_update_optionality(struct key_def *def, uint32_t min_field_count) > for (uint32_t i = 0; i < def->part_count; ++i) { > struct key_part *part = &def->parts[i]; > def->has_optional_parts |= part->is_nullable && > - min_field_count < part->fieldno + 1; > + part->path == NULL && > + (min_field_count < > + part->fieldno + 1); 3. Why does path != NULL guarantee that a def has no optional parts? On the contrary, it means that a part is optional if it is not flat and is nullable. Funnily enough, even now every has_optional_parts usage site still checks both is_flat and has_optional_parts. > /* > * One optional part is enough to switch to new > * comparators. > @@ -432,10 +524,113 @@ key_def_decode_parts_166(struct key_part_def *parts, uint32_t part_count, > fields[part->fieldno].is_nullable : > key_part_def_default.is_nullable); > part->coll_id = COLL_NONE; > + part->path = NULL; > } > return 0; > } > > +/** > + * Verify key_part JSON path and convert to canonical form. > + * > + * @param region Region to make allocations. > + * @param part Part with path to update. > + * @param path_extra Extra allocated space to reuse if possible. > + * @param path_extra_size The @path_extra size. > + * > + * @retval -1 on error. > + * @retval 0 on success. > + */ > +static int > +key_def_normalize_json_path(struct region *region, struct key_part_def *part, > + char **path_extra, uint32_t *path_extra_size) > +{ > + const char *err_msg = NULL; 4. You do not need to announce this variable. (fixed by me) > + > + uint32_t allocated_size = *path_extra_size; > + char *path = *path_extra; > + > + uint32_t path_len = strlen(part->path); > + struct json_path_parser parser; > + struct json_path_node node; > + json_path_parser_create(&parser, part->path, path_len); > + /* > + * A worst-case scenario is .a -> ["a"] > + * i.e. 2.5 * path_len + 1 is enough.
> + */ > + uint32_t new_path_size = 2.5 * path_len + 1; > + if (new_path_size >= allocated_size) { > + path = region_alloc(region, new_path_size); > + if (path == NULL) { > + diag_set(OutOfMemory, new_path_size, > + "region_alloc", "path"); > + return -1; > + } > + allocated_size = new_path_size; > + } > + assert(path != NULL); > + part->path = path; > + int rc = json_path_next(&parser, &node); > + if (rc != 0) > + goto error_invalid_json; > + if (node.type != JSON_PATH_NUM) { > + diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, > + part->fieldno, > + "invalid JSON path: first part should " > + "be defined as array index"); > + return -1; > + } > + if (node.num - TUPLE_INDEX_BASE != part->fieldno) { > + diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, > + part->fieldno, > + "invalid JSON path: first part refers " > + "to invalid field"); > + return -1; > + } > + uint32_t lexemes = 0; > + do { > + if (node.type == JSON_PATH_NUM) { > + path += sprintf(path, "[%u]", (uint32_t) node.num); > + } else if (node.type == JSON_PATH_STR) { > + path += sprintf(path, "[\"%.*s\"]", node.len, node.str); > + } else { > + unreachable(); > + } > + lexemes++; > + } while ((rc = json_path_next(&parser, &node)) == 0 && > + node.type != JSON_PATH_END); > + if (rc != 0 || node.type != JSON_PATH_END) > + goto error_invalid_json; > + if (lexemes == 1) { > + /* JSON index is useless. */ > + path = part->path; > + part->path = NULL; 5. As I remember, I asked you to use gcov to see which code is untested and test it. But looks like you did not, but I did and here a result is: 638: 602: if (lexemes == 1) { -: 603: /* JSON index is useless. */ #####: 604: path = part->path; #####: 605: part->path = NULL; #####: 606: } else { Either JSON path is never useless in the tests or there is a bug. Please, fix. > + } else { > + /* Skip terminating zero. */ > + path++; > + /* Account constructed string size. 
*/ > + allocated_size -= path - part->path; > + } > @@ -552,24 +767,37 @@ key_def_merge(const struct key_def *first, const struct key_def *second) > new_def->is_nullable = first->is_nullable || second->is_nullable; > new_def->has_optional_parts = first->has_optional_parts || > second->has_optional_parts; > + /* Path data write position in the new key_def. */ > + char *data = (char *)new_def + key_def_sizeof(new_part_count, 0); > /* Write position in the new key def. */ > uint32_t pos = 0; > /* Append first key def's parts to the new index_def. */ > part = first->parts; > end = part + first->part_count; > for (; part != end; part++) { > + if (part->path != NULL) { > + new_def->parts[pos].path = data; > + data += part->path_len + 1; > + } > key_def_set_part(new_def, pos++, part->fieldno, part->type, > - part->is_nullable, part->coll, part->coll_id); > + part->is_nullable, part->coll, part->coll_id, > + part->path, part->path_len); > } > > /* Set-append second key def's part to the new key def. */ > part = second->parts; > end = part + second->part_count; > for (; part != end; part++) { > - if (key_def_find(first, part->fieldno)) > + if (key_def_find(first, part->fieldno, part->path, > + part->path_len) != NULL) > continue; > + if (part->path != NULL) { > + new_def->parts[pos].path = data; > + data += part->path_len + 1; > + } > key_def_set_part(new_def, pos++, part->fieldno, part->type, > - part->is_nullable, part->coll, part->coll_id); > + part->is_nullable, part->coll, part->coll_id, > + part->path, part->path_len); 6. Here and above you should set epoch and slot offset. 
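A minimal sketch of what comment 6 asks for: when key_def_merge() copies parts into a new def, the per-part offset slot cache has to be reset explicitly, because the merged def is not bound to any tuple_format yet. The member and constant names (offset_slot, offset_slot_epoch, TUPLE_OFFSET_SLOT_NIL) follow the patch; the reduced struct itself is hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { TUPLE_OFFSET_SLOT_NIL = INT32_MAX };

/* Reduced key_part: only the members relevant to comment 6. */
struct key_part {
	uint32_t fieldno;
	const char *path;
	uint32_t path_len;
	/* Cached field map slot and the format epoch it came from. */
	int32_t offset_slot;
	uint64_t offset_slot_epoch;
};

/*
 * What key_def_set_part() should do for every part copied in
 * key_def_merge(): besides fieldno and path, invalidate the
 * offset slot cache.
 */
static void
key_part_set(struct key_part *part, uint32_t fieldno, const char *path,
	     uint32_t path_len)
{
	part->fieldno = fieldno;
	part->path = path;
	part->path_len = path_len;
	part->offset_slot = TUPLE_OFFSET_SLOT_NIL;
	part->offset_slot_epoch = 0;
}
```

Without the last two assignments the merged def inherits whatever the allocator left in those members, and the epoch-based cache in tuple_field_by_part_raw() could later hit a stale slot.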
> } > return new_def; > } > diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc > index 5d7df4d..9dded75 100644 > --- a/src/box/tuple_compare.cc > +++ b/src/box/tuple_compare.cc > @@ -1063,14 +1077,14 @@ tuple_compare_create(const struct key_def *def) > return tuple_compare_sequential<true, true>; > else > return tuple_compare_sequential<true, false>; > - } else if (def->has_optional_parts) { > - return tuple_compare_slowpath<true, true, false>; > } else { > - return tuple_compare_slowpath<true, false, false>; > + int func_idx = 1 + 2 * def->has_optional_parts + > + 4 * def->has_json_paths; > + return compare_slowpath_funcs[func_idx]; > } > } > assert(! def->has_optional_parts); > - if (!key_def_has_collation(def)) { > + if (!key_def_has_collation(def) && !def->has_json_paths) { > /* Precalculated comparators don't use collation */ > for (uint32_t k = 0; > k < sizeof(cmp_arr) / sizeof(cmp_arr[0]); k++) { > @@ -1088,6 +1102,8 @@ tuple_compare_create(const struct key_def *def) > } > if (key_def_is_sequential(def)) > return tuple_compare_sequential<false, false>; > + else if (def->has_json_paths) > + return tuple_compare_slowpath<false, false, true>; > else > return tuple_compare_slowpath<false, false, false>; 7. Comparators matrix is useless unless you use it for all tuple_compare_slowpath usage places. The same for tuple_compare_with_key. > } > @@ -1283,14 +1310,15 @@ tuple_compare_with_key_create(const struct key_def *def) > return tuple_compare_with_key_sequential<true, > false>; > } > - } else if (def->has_optional_parts) { > - return tuple_compare_with_key_slowpath<true, true, false>; > } else { > - return tuple_compare_with_key_slowpath<true, false, false>; > + int func_idx = (def->is_nullable ? 1 : 0) + > + 2 * (def->has_optional_parts ? 1 : 0) + > + 4 * (def->has_json_paths ? 1 : 0); > + return compare_with_key_slowpath_funcs[func_idx]; > } > } > assert(! 
def->has_optional_parts); > - if (!key_def_has_collation(def)) { > + if (!key_def_has_collation(def) && !def->has_json_paths) { > /* Precalculated comparators don't use collation */ > for (uint32_t k = 0; > k < sizeof(cmp_wk_arr) / sizeof(cmp_wk_arr[0]); > @@ -1311,6 +1339,8 @@ tuple_compare_with_key_create(const struct key_def *def) > } > if (key_def_is_sequential(def)) > return tuple_compare_with_key_sequential<false, false>; > + else if (def->has_json_paths) > + return tuple_compare_with_key_slowpath<false, false, true>; > else > return tuple_compare_with_key_slowpath<false, false, false>; > } > diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc > index 2ea405a..2ee399f 100644 > --- a/src/box/tuple_extract_key.cc > +++ b/src/box/tuple_extract_key.cc > @@ -1,15 +1,31 @@ > #include "tuple_extract_key.h" > #include "tuple.h" > #include "fiber.h" > +#include "json/path.h" > > enum { MSGPACK_NULL = 0xc0 }; > > +/** True if key part i and i+1 are suquential. */ 8. Typo. > diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c > index 2d4a85f..363fdff 100644 > --- a/src/box/tuple_format.c > +++ b/src/box/tuple_format.c > @@ -30,6 +30,7 @@ > */ > #include "json/path.h" > #include "tuple_format.h" > +#include "assoc.h" > > /** Global table of tuple formats */ > struct tuple_format **tuple_formats; > @@ -38,10 +39,478 @@ static intptr_t recycled_format_ids = FORMAT_ID_NIL; > static uint32_t formats_size = 0, formats_capacity = 0; > > static const struct tuple_field tuple_field_default = { > - FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false, false, > + FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false, false, {{NULL, 0}} > }; > > /** > + * Propagate @a field to MessagePack(field)[key]. > + * @param[in][out] field Field to propagate. > + * @param key Key to propagate to. > + * @param len Length of @a key. > + * @param field_idx Field index in map. > + * > + * @retval 0 Success, the index was found. > + * @retval -1 Not found. 
> + */ > +static inline int > +tuple_field_go_to_key(const char **field, const char *key, int len, > + uint32_t *field_idx) > +{ > + enum mp_type type = mp_typeof(**field); > + if (type != MP_MAP) > + return -1; > + uint32_t count = mp_decode_map(field); > + for (uint32_t idx = 0; idx < count; idx++) { > + type = mp_typeof(**field); > + if (type == MP_STR) { > + uint32_t value_len; > + const char *value = mp_decode_str(field, &value_len); > + if (value_len == (uint)len && > + memcmp(value, key, len) == 0) { > + *field_idx = idx; > + return 0; > + } > + } else { > + /* Skip key. */ > + mp_next(field); 9. Never tested: 167: 73: } else { -: 74: /* Skip key. */ #####: 75: mp_next(field); -: 76: } > + } > + /* Skip value. */ > + mp_next(field); > + } > + return -1; > +} > +/** > + * Construct field tree level for JSON path part. > + * > + * @param[in, out] tuple_field Pointer to record to start with > + * would be changed to record that math > + * @part lexeme. > + * @param fieldno Number of root space field. > + * @param part JSON path lexeme to represent in field tree. > + * @retval -1 On error. > + * @retval 0 On success. > + */ > +static int > +json_field_tree_append(struct tuple_field **field_subtree, uint32_t fieldno, > + struct json_path_node *part) > +{ > + enum field_type type; > + struct tuple_field *field = *field_subtree; > + switch (part->type) { > + case JSON_PATH_NUM: { > + type = FIELD_TYPE_ARRAY; > + if (field->type != FIELD_TYPE_ANY && field->type != type) > + goto error_type_mistmatch; 10. The same. 668: 180: type = FIELD_TYPE_ARRAY; 1020: 181: if (field->type != FIELD_TYPE_ANY && field->type != type) #####: 182: goto error_type_mistmatch; -: 183: /* Create or resize field array if required. 
*/ > +int > +tuple_field_bypass_and_init(const struct tuple_field *field, uint32_t idx, > + const char *tuple, const char **offset, > + uint32_t *field_map) > +{ > + assert(offset != NULL); > + int rc = 0; > + const char *mp_data = *offset; > + const char *valid_type_str = NULL; > + const char *err = NULL; > + enum mp_type type = mp_typeof(**offset); > + if (field->type == FIELD_TYPE_MAP) { > + if (type != MP_MAP) { > + valid_type_str = mp_type_strs[MP_MAP]; > + goto error_type_mistmatch; > + } > + const char *max_offset = *offset; > + uint32_t max_idx = 0; > + uint32_t count = mp_decode_map(&max_offset); > + mh_int_t i; > + mh_foreach(field->map, i) { > + struct mh_strnptr_node_t *ht_record = > + mh_strnptr_node(field->map, i); > + struct tuple_field *leaf = ht_record->val; > + assert(leaf != NULL); > + > + const char *raw = *offset; > + uint32_t map_idx = 0; > + rc = tuple_field_go_to_key(&raw, ht_record->str, > + (int)ht_record->len, > + &map_idx); > + if (rc != 0 && !leaf->is_nullable) { > + err = tt_sprintf("map doesn't contain key " > + "'%.*s' defined in index", > + ht_record->len,ht_record->str); > + goto error_invalid_document; > + } > + if (rc != 0) { > + if (field_map != NULL && > + leaf->offset_slot != TUPLE_OFFSET_SLOT_NIL) > + field_map[leaf->offset_slot] = 0; > + continue; > + } > + if (tuple_field_bypass_and_init(leaf, idx, tuple, &raw, > + field_map) != 0) > + return -1; > + max_idx = MAX(max_idx, map_idx + 1); > + max_offset = MAX(max_offset, raw); > + } > + *offset = max_offset; > + while (count-- > max_idx) { > + mp_next(offset); > + mp_next(offset); > + } > + return 0; > + } else if (field->type == FIELD_TYPE_ARRAY) { > + if (type != MP_ARRAY) { > + valid_type_str = mp_type_strs[MP_ARRAY]; > + goto error_type_mistmatch; > + } 11. 
132: 346: } else if (field->type == FIELD_TYPE_ARRAY) { 458: 347: if (type != MP_ARRAY) { 193: 348: valid_type_str = mp_type_strs[MP_ARRAY]; #####: 349: goto error_type_mistmatch; #####: 350: } -: 351: uint32_t count = mp_decode_array(offset); > + uint32_t count = mp_decode_array(offset); > + for (uint32_t i = count; i < field->array_size; i++) { > + /* > + * Index fields out of document array > + * must be nullable. > + */ > + struct tuple_field *leaf = field->array[i]; > + if (leaf == NULL) > + continue; > + if (leaf->is_nullable) { > + if (field_map != NULL && > + leaf->offset_slot != TUPLE_OFFSET_SLOT_NIL) > + field_map[leaf->offset_slot] = 0; > + continue; > + } > + err = tt_sprintf("array size %d is less than item %d " > + "defined in index", i, > + field->array_size); > + goto error_invalid_document; 12. 34: 365: } -: 366: err = tt_sprintf("array size %d is less than item %d " #####: 367: "defined in index", i, #####: 368: field->array_size); #####: 369: goto error_invalid_document; #####: 370: } > @@ -63,12 +532,18 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, > format->fields[i].type = fields[i].type; > format->fields[i].offset_slot = TUPLE_OFFSET_SLOT_NIL; > format->fields[i].is_nullable = fields[i].is_nullable; > + /* Don't need to init format->fields[i].map. */ > + format->fields[i].childs = NULL; > + format->fields[i].array_size = 0; 13. Auxiliary 'childs' makes no sense if you still need to init array_size. Please, come up with a better solution. Inlined memset from offsetof is not good as well. Or it could be wrapped into functions which test for childs, nullify them. > } > /* Initialize remaining fields */ > for (uint32_t i = field_count; i < format->field_count; i++) > format->fields[i] = tuple_field_default; > > int current_slot = 0; > + /* Memory allocated for JSON paths if any. 
*/ > + char *data = (char *)format + sizeof(struct tuple_format) + > + format->field_count * sizeof(struct tuple_field); > > /* extract field type info */ > for (uint16_t key_no = 0; key_no < key_count; ++key_no) { > @@ -91,6 +566,24 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, > field->is_nullable = false; > } > > + if (part->path != NULL) { > + field->is_key_part = true; > + assert(is_sequential == false); > + struct tuple_field *leaf = NULL; > + if (tuple_format_add_json_path(format, > + part->path, > + part->path_len, > + part->path_hash, > + part->type, > + part->is_nullable, 14. Out of 80. > + &data, > + &leaf) != 0) > + return -1; > + assert(leaf != NULL); > + if (leaf->offset_slot == TUPLE_OFFSET_SLOT_NIL) > + leaf->offset_slot = --current_slot; > + continue; 15. Why do you check for path !=/== NULL in the code below if you already know it is == NULL? The only point of moving non-flat part processing there was to remove those checks. (fixed by me) 16. I see that you init field->is_nullable before checking for path != NULL, but it is wrong. If a field is not flat, then its first level nullability could be != a leaf nullability. Now it is impossible due to lack of non-flat format, but it will be. Besides, I think you should move all this cycle body into a function like tuple_format_add_key_part or something. 17. For flat fields different nullability is allowed and the strictest is chosen. Why do you forbid it for non-flat fields (in tuple_format_add_json_path)? The same for compatible types. Below they are allowed, but for non-flat fields are not. > + } > /* > * Check that there are no conflicts > * between index part types and space > @@ -100,10 +593,12 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, > * used in tuple_format. > */ > if (field_type1_contains_type2(field->type, > - part->type)) { > + part->type) && > + part->path == NULL) { > field->type = part->type; > } else if (! 
field_type1_contains_type2(part->type, > - field->type)) { > + field->type) && > + part->path == NULL) { > const char *name; > int fieldno = part->fieldno + TUPLE_INDEX_BASE; > if (part->fieldno >= field_count) { > @@ -201,19 +696,47 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, > uint32_t space_field_count, struct tuple_dictionary *dict) > { > uint32_t index_field_count = 0; > + /* JSON path hashtable. */ > + struct mh_strnptr_t *path_hash = json_path_hash_create(0); > + if (path_hash == NULL) > + return NULL; > /* find max max field no */ > for (uint16_t key_no = 0; key_no < key_count; ++key_no) { > const struct key_def *key_def = keys[key_no]; > const struct key_part *part = key_def->parts; > const struct key_part *pend = part + key_def->part_count; > for (; part < pend; part++) { > + if (part->path != NULL && > + json_path_hash_insert(path_hash, part->path, > + part->path_len, NULL) != 0) { > + json_path_hash_delete(path_hash); > + return NULL; > + } > index_field_count = MAX(index_field_count, > part->fieldno + 1); > } > } > + size_t extra_size = 0; > + if (mh_size(path_hash) == 0) { > + /* Hashtable is useless. */ > + json_path_hash_delete(path_hash); > + path_hash = NULL; > + } else { > + /* > + * Calculate unique JSON paths count. > + * Path data would be copied later on > + * tuple_format_create routine. > + */ > + mh_int_t i; > + mh_foreach(path_hash, i) { > + struct mh_strnptr_node_t *node = > + mh_strnptr_node(path_hash, i); > + extra_size += node->len + 1; > + } > + } > uint32_t field_count = MAX(space_field_count, index_field_count); > uint32_t total = sizeof(struct tuple_format) + > - field_count * sizeof(struct tuple_field); > + field_count * sizeof(struct tuple_field) + extra_size; > > struct tuple_format *format = (struct tuple_format *) malloc(total); > if (format == NULL) { 18. Two leaks of path_hash below where 'return NULL' is. 
(fixed by me) > @@ -377,18 +945,14 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, > return -1; > } > > - /* first field is simply accessible, so we do not store offset to it */ > - enum mp_type mp_type = mp_typeof(*pos); > + /* > + * First field is simply accessible, store offset to it > + * only for JSON path. > + */ 19. The comment now is not linked with the code below. (fixed by me) > + uint32_t i = 0; > + enum mp_type mp_type; > const struct tuple_field *field = &format->fields[0]; > - if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE, > - TUPLE_INDEX_BASE, field->is_nullable)) > - return -1; > - mp_next(&pos); > - /* other fields...*/ > - ++field; > - uint32_t i = 1; > - uint32_t defined_field_count = MIN(field_count, format->field_count); > - if (field_count < format->index_field_count) { > + if (field_count < format->index_field_count || field->childs != NULL) { > /* > * Nullify field map to be able to detect by 0, > * which key fields are absent in tuple_field(). > @@ -512,55 +1090,106 @@ tuple_field_go_to_index(const char **field, uint64_t index) > return -1; > } > > -/** > - * Propagate @a field to MessagePack(field)[key]. > - * @param[in][out] field Field to propagate. > - * @param key Key to propagate to. > - * @param len Length of @a key. > - * > - * @retval 0 Success, the index was found. > - * @retval -1 Not found. 
> - */ > -static inline int > -tuple_field_go_to_key(const char **field, const char *key, int len) > +const char * > +tuple_field_by_part_raw(const struct tuple_format *format, const char *data, > + const uint32_t *field_map, struct key_part *part) > { > - enum mp_type type = mp_typeof(**field); > - if (type != MP_MAP) > - return -1; > - uint64_t count = mp_decode_map(field); > - for (; count > 0; --count) { > - type = mp_typeof(**field); > - if (type == MP_STR) { > - uint32_t value_len; > - const char *value = mp_decode_str(field, &value_len); > - if (value_len == (uint)len && > - memcmp(value, key, len) == 0) > - return 0; > - } else { > - /* Skip key. */ > - mp_next(field); > - } > - /* Skip value. */ > - mp_next(field); > + if (likely(part->path == NULL)) > + return tuple_field_raw(format, data, field_map, part->fieldno); > + > + struct mh_strnptr_node_t *ht_record = NULL; > + int32_t offset_slot; > + if (likely(part->offset_slot_epoch == format->epoch)) { > + offset_slot = part->offset_slot; 20. This code is different on the branch and in the email. Here it is: > if (likely(part->offset_slot_epoch == format->epoch && > format->epoch != 0)) { > offset_slot = part->offset_slot; And please explain how format epoch can be 0 ever? A format epoch is initialized by 0 but then it is always either reset to an old epoch or old epoch + 1. When a space has no indexes, it could be 0, but each new index alters the space and updates its epoch. So when a format epoch is 0, it has no tuples. > + } else if (format->path_hash != NULL && > + (ht_record = json_path_hash_get(format->path_hash, part->path, > + part->path_len, > + part->path_hash)) != NULL) { > + struct tuple_field *field = ht_record->val; > + assert(field != NULL); > + offset_slot = field->offset_slot; > + } else { > + /* > + * Legacy tuple having no field map for > + * JSON index. 
> + */ > + uint32_t path_hash = > + field_name_hash(part->path, part->path_len); > + const char *raw = NULL; > + if (tuple_field_raw_by_path(format, data, field_map, > + part->path, part->path_len, > + path_hash, &raw) != 0) > + raw = NULL; > + return raw; > } > - return -1; > + assert(offset_slot < 0); > + assert(-offset_slot * sizeof(uint32_t) <= format->field_map_size); > + if (unlikely(field_map[offset_slot] == 0)) > + return NULL; 21. Why do not you update epoch and slot in such a case? > + /* Cache offset_slot if required. */ > + if (part->offset_slot_epoch < format->epoch) { 22. This check is not needed for the best case when epochs are equal in format and key part. Please, move it to the only place where it is needed above. 23. By the way, why do you store offset_slot_epoch in key_part instead of in a parent key_def? As I understand, it is the same for all parts of a key_def. > + part->offset_slot = offset_slot; > + part->offset_slot_epoch = format->epoch; > + } > + return data + field_map[offset_slot]; > } > > -const char * > -tuple_field_by_part_raw(const struct tuple_format *format, const char *data, > - const uint32_t *field_map, struct key_part *part) > +int > +tuple_field_dig_with_parser(struct json_path_parser *parser, const char **field) > { > - return tuple_field_raw(format, data, field_map, part->fieldno); > + int rc; > + struct json_path_node node; > + while ((rc = json_path_next(parser, &node)) == 0) { > + uint32_t dummy; > + switch(node.type) { > + case JSON_PATH_NUM: > + rc = tuple_field_go_to_index(field, node.num); > + break; > + case JSON_PATH_STR: > + rc = tuple_field_go_to_key(field, node.str, > + node.len, &dummy); > + break; > + default: > + assert(node.type == JSON_PATH_END); > + return 0; > + } > + if (rc != 0) { > + *field = NULL; > + return 0; > + } > + } > + return rc; > } > > int > -tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, > +tuple_field_raw_by_path(const struct tuple_format *format, const char 
*tuple, > const uint32_t *field_map, const char *path, > uint32_t path_len, uint32_t path_hash, > const char **field) > { > assert(path_len > 0); > uint32_t fieldno; > + if (format->path_hash != NULL) { > + /* > + * The path hash for format->path_hash hashtable > + * may may be different from path_hash specified > + * as function argument. > + */ > + struct mh_strnptr_node_t *ht_record = > + json_path_hash_get(format->path_hash, path, path_len, > + mh_strn_hash(path, path_len)); > + if (ht_record != NULL) { > + struct tuple_field *leaf = ht_record->val; > + assert(leaf != NULL); > + int32_t offset_slot = leaf->offset_slot; > + assert(offset_slot != TUPLE_OFFSET_SLOT_NIL); > + if (field_map[offset_slot] != 0) > + *field = tuple + field_map[offset_slot]; > + else > + *field = NULL; > + return 0; > + } 24. Untested. 186: 1171: struct mh_strnptr_node_t *ht_record = 372: 1172: json_path_hash_get(format->path_hash, path, path_len, 186: 1173: mh_strn_hash(path, path_len)); 186: 1174: if (ht_record != NULL) { #####: 1175: struct tuple_field *leaf = ht_record->val; #####: 1176: assert(leaf != NULL); #####: 1177: int32_t offset_slot = leaf->offset_slot; #####: 1178: assert(offset_slot != TUPLE_OFFSET_SLOT_NIL); #####: 1179: if (field_map[offset_slot] != 0) #####: 1180: *field = tuple + field_map[offset_slot]; -: 1181: else #####: 1182: *field = NULL; #####: 1183: return 0; -: 1184: } 186: 1185: } > + } > /* > * It is possible, that a field has a name as > * well-formatted JSON. For example 'a.b.c.d' or '[1]' can > diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h > index ecbf64c..af81a58 100644 > --- a/src/box/tuple_format.h > +++ b/src/box/tuple_format.h > @@ -419,11 +439,49 @@ tuple_field_raw_by_name(struct tuple_format *format, const char *tuple, > * @retval -1 Error in JSON path. 
> */ > int > -tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, > +tuple_field_raw_by_path(const struct tuple_format *format, const char *tuple, > const uint32_t *field_map, const char *path, > uint32_t path_len, uint32_t path_hash, > const char **field); > > +/** > + * Retrieve document data @field with initialized @parser. > + * @param parser JSON parser. > + * @param[in, out] field Tuple field to lookup. > + * @retval 0 On success. > + * @retval > 0 On error in path been used to initialize @parser. > + */ > +int > +tuple_field_dig_with_parser(struct json_path_parser *parser, > + const char **field); > + > +/** > + * Get @hashtable record by key @path, @path_len. > + * @param hashtable Storage to lookup. > + * @param path Path string. > + * @param path_len Length of @path. 25. path_hash parameter missed. > + * @retval NULL On nothing found. > + * @retval not NULL Leaf field pointer for registered path. > + */ > +struct mh_strnptr_node_t * > +json_path_hash_get(struct mh_strnptr_t *hashtable, const char *path, > + uint32_t path_len, uint32_t path_hash); > + > +/** > + * Observe JSON path tree in @field comparing with @tuple > + * structure. Initialize field map if specified. > + * @param field Field to use on initialization. > + * @param idx Root field index to emmit correct error. > + * @param tuple Source raw data. 26. offset parameter missed. > + * @param field_map Field map to initialize (optional). > + * @retval 0 On success. > + * @retval -1 On error. 
> + */ > +int > +tuple_field_bypass_and_init(const struct tuple_field *field, uint32_t idx, > + const char *tuple, const char **offset, > + uint32_t *field_map); > + > #if defined(__cplusplus) > } /* extern "C" */ > diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c > index 37da282..6b452cd 100644 > --- a/src/box/vy_stmt.c > +++ b/src/box/vy_stmt.c > @@ -330,6 +331,67 @@ vy_stmt_replace_from_upsert(const struct tuple *upsert) > return replace; > } > > +static void > +vy_stmt_msgpack_build(struct tuple_field *field, char *tuple, > + uint32_t *field_map, char **offset, bool write_data) > +{ > + if (field->type == FIELD_TYPE_ARRAY) { > + if (write_data) > + *offset = mp_encode_array(*offset, field->array_size); > + else > + *offset += mp_sizeof_array(field->array_size); > + for (uint32_t i = 0; i < field->array_size; i++) { > + if (field->array[i] == NULL) { > + if (write_data) > + *offset = mp_encode_nil(*offset); > + else > + *offset += mp_sizeof_nil(); > + continue; > + } > + vy_stmt_msgpack_build(field->array[i], tuple, field_map, > + offset, write_data); > + } > + return; > + } else if (field->type == FIELD_TYPE_MAP) { > + if (write_data) > + *offset = mp_encode_map(*offset, mh_size(field->map)); > + else > + *offset += mp_sizeof_map(mh_size(field->map)); > + mh_int_t i; > + mh_foreach(field->map, i) { > + struct mh_strnptr_node_t *node = > + mh_strnptr_node(field->map, i); > + assert(node); > + if (write_data) { > + *offset = mp_encode_str(*offset, node->str, > + node->len); > + } else { > + *offset += mp_sizeof_str(node->len); > + } > + vy_stmt_msgpack_build(node->val, tuple, field_map, > + offset, write_data); > + } > + return; > + } > + > + struct iovec *iov = field->arg; > + if (iov == NULL) { > + if (write_data) > + *offset = mp_encode_nil(*offset); > + else > + *offset += mp_sizeof_nil(); > + } else { > + if (write_data) { > + uint32_t data_offset = *offset - tuple; > + memcpy(*offset, iov->iov_base, iov->iov_len); > + field->arg = NULL; > + if 
(field->offset_slot != TUPLE_OFFSET_SLOT_NIL) > + field_map[field->offset_slot] = data_offset; > + } > + *offset += iov->iov_len; > + } > +} 27. I still think that your walker should be generic and it is not ok that you reimplement it when a task is slightly extended, but it is up to Vova. 28. Please, remove tuple_field->arg field. It is extra ugly. Find a better solution. Why not just make it the builder's function argument? > + > static struct tuple * > vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type, > const struct key_def *cmp_def, ^ permalink raw reply [flat|nested] 14+ messages in thread
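On comment 28, the idea can be illustrated generically: keep the two-pass shape of vy_stmt_msgpack_build (size when buf == NULL, write otherwise), but hand the per-leaf payloads to the builder as arguments instead of smuggling them through a node member like tuple_field->arg. Everything below is a simplified stand-in, not the vinyl code: one fake "header" byte per node replaces real MessagePack encoding, and the node struct is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reduced field tree: either an array node or a leaf. */
struct field_node {
	int is_array;
	struct field_node **children;
	uint32_t child_count;
	uint32_t leaf_no; /* Index into the caller-supplied payloads. */
};

/*
 * Two-pass builder with payloads as an argument. When buf == NULL
 * it only computes the encoded size; otherwise it writes. Each node
 * emits one header byte ('A' for array, 'L' for leaf), and a leaf
 * appends its payload bytes after the header.
 */
static size_t
build(const struct field_node *node, const char **payloads,
      const size_t *payload_sizes, char *buf)
{
	size_t total = 1; /* Fake header byte. */
	if (buf != NULL)
		*buf = node->is_array ? 'A' : 'L';
	if (node->is_array) {
		for (uint32_t i = 0; i < node->child_count; i++) {
			total += build(node->children[i], payloads,
				       payload_sizes,
				       buf != NULL ? buf + total : NULL);
		}
	} else {
		if (buf != NULL) {
			memcpy(buf + 1, payloads[node->leaf_no],
			       payload_sizes[node->leaf_no]);
		}
		total += payload_sizes[node->leaf_no];
	}
	return total;
}
```

The sizing pass and the write pass then stay in one function, and no builder state survives in the tree between calls.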
* [tarantool-patches] Re: [PATCH v3 3/4] box: introduce JSON indexes 2018-09-03 10:32 ` [tarantool-patches] " Vladislav Shpilevoy @ 2018-09-03 10:35 ` Vladislav Shpilevoy 2018-09-06 12:46 ` Kirill Shcherbatov 1 sibling, 0 replies; 14+ messages in thread From: Vladislav Shpilevoy @ 2018-09-03 10:35 UTC (permalink / raw) To: tarantool-patches, Kirill Shcherbatov Forgot to send diff: commit deedcb31040cfbe5b93393e54ec55f3cde5631b8 Author: Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Date: Sat Sep 1 13:18:52 2018 -0300 Review fixes diff --git a/src/box/key_def.c b/src/box/key_def.c index cf1169ea0..ea7380b39 100644 --- a/src/box/key_def.c +++ b/src/box/key_def.c @@ -544,8 +544,6 @@ static int key_def_normalize_json_path(struct region *region, struct key_part_def *part, char **path_extra, uint32_t *path_extra_size) { - const char *err_msg = NULL; - uint32_t allocated_size = *path_extra_size; char *path = *path_extra; @@ -589,7 +587,8 @@ key_def_normalize_json_path(struct region *region, struct key_part_def *part, uint32_t lexemes = 0; do { if (node.type == JSON_PATH_NUM) { - path += sprintf(path, "[%u]", (uint32_t) node.num); + path += sprintf(path, "[%llu]", + (unsigned long long) node.num); } else if (node.type == JSON_PATH_STR) { path += sprintf(path, "[\"%.*s\"]", node.len, node.str); } else { @@ -622,10 +621,11 @@ key_def_normalize_json_path(struct region *region, struct key_part_def *part, } return 0; -error_invalid_json: - err_msg = tt_sprintf("invalid JSON path '%.*s': path has invalid " - "structure (error at position %d)", parser.src_len, - parser.src, rc); +error_invalid_json: ; + const char *err_msg = + tt_sprintf("invalid JSON path '%.*s': path has invalid "\ + "structure (error at position %d)", parser.src_len, + parser.src, rc); diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, part->fieldno + TUPLE_INDEX_BASE, err_msg); return -1; @@ -704,19 +704,20 @@ key_def_decode_parts_160(struct key_part_def *parts, uint32_t part_count, return 0; } -const struct 
key_part * -key_def_find(const struct key_def *key_def, uint32_t fieldno, const char *path, - uint32_t path_len) +bool +key_def_contains_part(const struct key_def *key_def, + const struct key_part *to_find) { const struct key_part *part = key_def->parts; const struct key_part *end = part + key_def->part_count; for (; part != end; part++) { - if (part->fieldno == fieldno && part->path_len == path_len && - (part->path == NULL || - memcmp(part->path, path, path_len) == 0)) - return part; + if (part->fieldno == to_find->fieldno && + part->path_len == to_find->path_len && + (part->path == NULL || memcmp(part->path, to_find->path, + to_find->path_len) == 0)) + return true; } - return NULL; + return false; } bool @@ -725,8 +726,7 @@ key_def_contains(const struct key_def *first, const struct key_def *second) const struct key_part *part = second->parts; const struct key_part *end = part + second->part_count; for (; part != end; part++) { - if (key_def_find(first, part->fieldno, part->path, - part->path_len) == NULL) + if (! 
key_def_contains_part(first, part)) return false; } return true; @@ -750,8 +750,7 @@ key_def_merge(const struct key_def *first, const struct key_def *second) part = second->parts; end = part + second->part_count; for (; part != end; part++) { - if (key_def_find(first, part->fieldno, part->path, - part->path_len) != NULL) + if (key_def_contains_part(first, part)) --new_part_count; else if (part->path != NULL) sz += part->path_len + 1; @@ -788,8 +787,7 @@ key_def_merge(const struct key_def *first, const struct key_def *second) part = second->parts; end = part + second->part_count; for (; part != end; part++) { - if (key_def_find(first, part->fieldno, part->path, - part->path_len) != NULL) + if (key_def_contains_part(first, part)) continue; if (part->path != NULL) { new_def->parts[pos].path = data; diff --git a/src/box/key_def.h b/src/box/key_def.h index 693dd5dd0..3387cb6be 100644 --- a/src/box/key_def.h +++ b/src/box/key_def.h @@ -339,13 +339,10 @@ key_def_decode_parts_160(struct key_part_def *parts, uint32_t part_count, const char **data, const struct field_def *fields, uint32_t field_count); -/** - * Returns the part in index_def->parts for the specified fieldno. - * If fieldno is not in index_def->parts returns NULL. - */ -const struct key_part * -key_def_find(const struct key_def *key_def, uint32_t fieldno, const char *path, - uint32_t path_len); +/** Check if @a key_def contains @a to_find part. */ +bool +key_def_contains_part(const struct key_def *key_def, + const struct key_part *to_find); /** * Check if key definition @a first contains all parts of diff --git a/src/box/tuple.c b/src/box/tuple.c index 70eda36b3..83bdad1a3 100644 --- a/src/box/tuple.c +++ b/src/box/tuple.c @@ -168,12 +168,13 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple) field->is_nullable)) return -1; /* Check all JSON paths. 
*/ - if (field->childs != NULL && - tuple_field_bypass_and_init(field, i, tuple, &pos, - NULL) != 0) - return -1; - if (field->childs == NULL) + if (field->childs != NULL) { + if (tuple_field_bypass_and_init(field, i, tuple, &pos, + NULL) != 0) + return -1; + } else { mp_next(&pos); + } } return 0; } diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 243c81f2c..51624d243 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -295,7 +295,6 @@ tuple_field_bypass_and_init(const struct tuple_field *field, uint32_t idx, uint32_t *field_map) { assert(offset != NULL); - int rc = 0; const char *mp_data = *offset; const char *valid_type_str = NULL; const char *err = NULL; @@ -317,9 +316,9 @@ tuple_field_bypass_and_init(const struct tuple_field *field, uint32_t idx, const char *raw = *offset; uint32_t map_idx = 0; - rc = tuple_field_go_to_key(&raw, ht_record->str, - (int)ht_record->len, - &map_idx); + int rc = tuple_field_go_to_key(&raw, ht_record->str, + (int)ht_record->len, + &map_idx); if (rc != 0 && !leaf->is_nullable) { err = tt_sprintf("map doesn't contain key " "'%.*s' defined in index", @@ -392,10 +391,8 @@ tuple_field_bypass_and_init(const struct tuple_field *field, uint32_t idx, } assert(offset != NULL); if (field_map != NULL && - field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { - field_map[field->offset_slot] = - (uint32_t) (*offset - tuple); - } + field->offset_slot != TUPLE_OFFSET_SLOT_NIL) + field_map[field->offset_slot] = (uint32_t) (*offset - tuple); mp_next(offset); return 0; @@ -404,15 +401,13 @@ error_type_mistmatch: mp_type_strs[type], valid_type_str); error_invalid_document: assert(err != NULL); - do { - char *data_buff = tt_static_buf(); - mp_snprint(data_buff, TT_STATIC_BUF_LEN, mp_data); - const char *err_msg = - tt_sprintf("invalid field %d document content '%s': %s", - idx + TUPLE_INDEX_BASE, data_buff, err); - diag_set(ClientError, ER_DATA_STRUCTURE_MISMATCH, err_msg); - return -1; - } while (0); + char *data_buff = 
tt_static_buf(); + mp_snprint(data_buff, TT_STATIC_BUF_LEN, mp_data); + const char *err_msg = + tt_sprintf("invalid field %d document content '%s': %s", + idx + TUPLE_INDEX_BASE, data_buff, err); + diag_set(ClientError, ER_DATA_STRUCTURE_MISMATCH, err_msg); + return -1; } /** @@ -460,7 +455,7 @@ tuple_format_add_json_path(struct tuple_format *format, const char *path, path_len, path, field_type_strs[field->type]); diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, - node.num, err); + node.num, err); return -1; } if (field->is_nullable != is_nullable) { @@ -568,7 +563,7 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, if (part->path != NULL) { field->is_key_part = true; - assert(is_sequential == false); + assert(! is_sequential); struct tuple_field *leaf = NULL; if (tuple_format_add_json_path(format, part->path, @@ -593,12 +588,10 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, * used in tuple_format. */ if (field_type1_contains_type2(field->type, - part->type) && - part->path == NULL) { + part->type)) { field->type = part->type; } else if (! field_type1_contains_type2(part->type, - field->type) && - part->path == NULL) { + field->type)) { const char *name; int fieldno = part->fieldno + TUPLE_INDEX_BASE; if (part->fieldno >= field_count) { @@ -626,10 +619,8 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, * so we don't store an offset for it. 
*/ if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL && - is_sequential == false && part->fieldno > 0) { - + !is_sequential && part->fieldno > 0) field->offset_slot = --current_slot; - } } } @@ -708,10 +699,8 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, for (; part < pend; part++) { if (part->path != NULL && json_path_hash_insert(path_hash, part->path, - part->path_len, NULL) != 0) { - json_path_hash_delete(path_hash); - return NULL; - } + part->path_len, NULL) != 0) + goto error; index_field_count = MAX(index_field_count, part->fieldno + 1); } @@ -742,14 +731,14 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, if (format == NULL) { diag_set(OutOfMemory, sizeof(struct tuple_format), "malloc", "tuple format"); - return NULL; + goto error; } if (dict == NULL) { assert(space_field_count == 0); format->dict = tuple_dictionary_new(NULL, 0); if (format->dict == NULL) { free(format); - return NULL; + goto error; } } else { format->dict = dict; @@ -768,6 +757,9 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, format->min_field_count = 0; format->path_hash = path_hash; return format; +error: + json_path_hash_delete(path_hash); + return NULL; } /** Free tuple format resources, doesn't unregister. */ @@ -945,10 +937,6 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, return -1; } - /* - * First field is simply accessible, store offset to it - * only for JSON path. - */ uint32_t i = 0; enum mp_type mp_type; const struct tuple_field *field = &format->fields[0]; @@ -961,6 +949,10 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, format->field_map_size); } if (field->childs == NULL) { + /* + * First field is simply accessible, do not store + * offset to it. + */ mp_type = mp_typeof(*pos); if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE, TUPLE_INDEX_BASE, field->is_nullable)) ^ permalink raw reply [flat|nested] 14+ messages in thread
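The key_def_contains_part/key_def_merge deduplication in the patch above hinges on when two key parts count as "the same". A minimal sketch of that matching rule, with a hypothetical reduction of struct key_part (field names illustrative, not Tarantool's actual definitions): fieldno, path_len, and the path bytes must all agree, where a NULL path means the part indexes the whole top-level field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical reduction of struct key_part: only the members
 * that participate in the matching rule used by
 * key_def_contains_part() in the patch above.
 */
struct part {
	uint32_t fieldno;	/* top-level tuple field */
	const char *path;	/* JSON path inside it, or NULL */
	uint32_t path_len;	/* 0 when path == NULL */
};

/*
 * Two parts match when they point at the same field and the same
 * JSON path. A NULL path (path_len == 0) can only match another
 * NULL path, because the lengths are compared first; the memcmp
 * is reached only for two non-NULL paths of equal length.
 */
static bool
part_matches(const struct part *a, const struct part *b)
{
	return a->fieldno == b->fieldno &&
	       a->path_len == b->path_len &&
	       (a->path == NULL ||
		memcmp(a->path, b->path, b->path_len) == 0);
}
```

This is also why key_def_merge above adds `path_len + 1` to the allocation size only for second-def parts that are *not* contained in the first def: a matching part's path string is stored once.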
* Re: [tarantool-patches] Re: [PATCH v3 3/4] box: introduce JSON indexes
2018-09-03 10:32 ` [tarantool-patches] " Vladislav Shpilevoy
2018-09-03 10:35 ` Vladislav Shpilevoy
@ 2018-09-06 12:46 ` Kirill Shcherbatov
1 sibling, 0 replies; 14+ messages in thread
From: Kirill Shcherbatov @ 2018-09-06 12:46 UTC (permalink / raw)
To: tarantool-patches, Vladislav Shpilevoy, Vladimir Davydov

Hi! Thank you for the review and fixes.

> 1. On your branch vinyl/errinj and vinyl/info are failing.
> I do not know whether it is because you did not cherry-pick
> Vladimir's patch or because of a bug in your patch, but it
> should not fail.
>
> This is what I get sometimes on engine/null test:

I can't reproduce this problem, and I have never seen this failure while running this specific test many times independently.

> 2. Your question:
>
>> I've checked that this code is reachable with an assert. But I don't know how to handle it manually.
>> Please, help.
>
> Answer: you should write a test that extends a path in such
> a way that the newly indexed field values change the order of index
> sorting. A select from the index should return the new order.
> Before you fixed this thing it returned the old order. Test it,
> please.

s = box.schema.space.create('withdata', {engine = engine})
pk_simplified = s:create_index('primary', { type = 'tree', parts = {{1, 'unsigned', path = '[1]'}}})
assert(pk_simplified.path == box.NULL)
idx = s:create_index('idx', {parts = {{2, 'integer', path = '[2].a'}}})
s:insert{31, {a = 1, aa = -1}}
s:insert{22, {a = 2, aa = -2}}
s:insert{13, {a = 3, aa = -3}}
idx:select()
idx:alter({parts = {{2, 'integer', path = '[2].aa'}}})
idx:select()
s:drop()

> 3. Why does path != NULL guarantee that a def has no optional parts?
> On the contrary, it means that a part is optional if it is not flat
> and is nullable.
>
> What is funny, even now in all has_optional_parts usage places you
> check both for is_flat and has_optional_parts.

Yes, you are right. I can't imagine why I implemented it that way.

> 4.
You do not need announcement of this variable. > (fixed by me) > 5. As I remember, I asked you to use gcov to see > which code is untested and test it. But looks like > you did not, but I did and here a result is: > > 638: 602: if (lexemes == 1) { > -: 603: /* JSON index is useless. */ > #####: 604: path = part->path; > #####: 605: part->path = NULL; > #####: 606: } else { Introduced a test: pk_simplified = s:create_index('primary', { type = 'tree', parts = {{1, 'unsigned', path = '[1]'}}}) assert(pk_simplified.path == box.NULL) > 6. Here and above you should set epoch and slot offset. new_def->parts[pos].offset_slot_epoch = part->offset_slot_epoch; new_def->parts[pos].offset_slot = part->offset_slot; > 7. Comparators matrix is useless unless you use it for all > tuple_compare_slowpath usage places. > > The same for tuple_compare_with_key. Ok. > 8. Typo. Fixed. > 9. Never tested: > > 167: 73: } else { > -: 74: /* Skip key. */ > #####: 75: mp_next(field); > -: 76: } > >> + } >> + /* Skip value. */ >> + mp_next(field); >> + } >> + return -1; >> +} This code isn't mine, but I've modified test to trigger it: s:insert{{1, 2, {3, {3, a = 'str2', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {}} > 10. The same.> > 668: 180: type = FIELD_TYPE_ARRAY; > 1020: 181: if (field->type != FIELD_TYPE_ANY && field->type != type) > #####: 182: goto error_type_mistmatch; > -: 183: /* Create or resize field array if required. */ > Done. > 11. > > 132: 346: } else if (field->type == FIELD_TYPE_ARRAY) { > 458: 347: if (type != MP_ARRAY) { > 193: 348: valid_type_str = mp_type_strs[MP_ARRAY]; > #####: 349: goto error_type_mistmatch; > #####: 350: } > -: 351: uint32_t count = mp_decode_array(offset); Done. > 12. > > 34: 365: } > -: 366: err = tt_sprintf("array size %d is less than item %d " > #####: 367: "defined in index", i, > #####: 368: field->array_size); > #####: 369: goto error_invalid_document; > #####: 370: } Done. > 13. 
Auxiliary 'childs' makes no sense if you still need to
> init array_size. Please, come up with a better solution.
> Inlined memset from offsetof is not good as well. Or it
> could be wrapped into functions which test for childs,
> nullify them.
> 14. Out of 80.

Not actual anymore.

> 15. Why do you check for path !=/== NULL in the code below if
> you already know it is == NULL? The only point of moving non-flat
> part processing there was to remove those checks.
>
> (fixed by me)
> 16. I see that you init field->is_nullable before checking
> for path != NULL, but it is wrong. If a field is not flat,
> then its first-level nullability could be != a leaf
> nullability. Now it is impossible due to lack of non-flat
> format, but it will be. Besides, I think you should
> move all this cycle body into a function like
> tuple_format_add_key_part or something.

Moved to tuple_format_add_key_part.

> 17. For flat fields different nullability is allowed
> and the strictest is chosen. Why do you forbid it for
> non-flat fields (in tuple_format_add_json_path)?
> The same for compatible types. Below they are allowed,
> but for non-flat fields are not.

Done.

> 18. Two leaks of path_hash below where 'return NULL' is.
> (fixed by me)
> 19. The comment now is not linked with the code below.
> (fixed by me)
> 20. This code is different on the branch and in the email. Here it is:
>
>> if (likely(part->offset_slot_epoch == format->epoch &&
>> format->epoch != 0)) {
>> offset_slot = part->offset_slot
> And please explain how a format epoch can ever be 0? A format epoch is
> initialized by 0 but then it is always either reset to an old epoch or
> old epoch + 1. When a space has no indexes, it could be 0, but each
> new index alters the space and updates its epoch. So when a format
> epoch is 0, it has no tuples.

I've changed the epoch bump mechanism (as a part of the first commit) and now this check is really useless.

> 21. Why don't you update the epoch and slot in such a case?
> 22.
This check is not needed for the best case when epochs are
> equal in format and key part. Please, move it to the only place
> where it is needed above.

Done.

> 23. By the way, why do you store offset_slot_epoch in key_part
> instead of in a parent key_def? As I understand, it is the same
> for all parts of a key_def.

Because the caches are per part: tuple_field_by_part_raw is used with parts. I don't want to resolve all key_parts manually and indirectly just to bump the key_part epoch.

> 24. Untested.
>
> 186: 1171: struct mh_strnptr_node_t *ht_record =
> 372: 1172: json_path_hash_get(format->path_hash, path, path_len,
> 186: 1173: mh_strn_hash(path, path_len));
> 186: 1174: if (ht_record != NULL) {
> #####: 1175: struct tuple_field *leaf = ht_record->val;
> #####: 1176: assert(leaf != NULL);
> #####: 1177: int32_t offset_slot = leaf->offset_slot;
> #####: 1178: assert(offset_slot != TUPLE_OFFSET_SLOT_NIL);
> #####: 1179: if (field_map[offset_slot] != 0)
> #####: 1180: *field = tuple + field_map[offset_slot];
> -: 1181: else
> #####: 1182: *field = NULL;
> #####: 1183: return 0;
> -: 1184: }
> 186: 1185: }

t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
-- Test that the field_map in a tuple speeds up access by an indexed path.
t["[3][\"FIO\"][\"fname\"]"]

> 25. The path_hash parameter is missing.
> 26. The offset parameter is missing.

Fixed.

> 27. I still think that your walker should be generic and it is not
> ok that you reimplement it when a task is slightly extended, but it
> is up to Vova.
> 28. Please, remove tuple_field->arg field. It is extra ugly. Find a
> better solution. Why not just make it the builder's function argument?

After a productive discussion with Vova I've understood that this won't work. Vova likes my hack, but it is not thread-safe, and that is critical here. Now I have to construct a temporary hash table mapping field -> iov that is pushed to the function as an argument.
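Items 20–23 of this exchange revolve around the per-part offset_slot cache. A hedged sketch of the epoch-validation idea under discussion (struct names and the slow-path stub are illustrative stand-ins, not the actual Tarantool code): the cached slot is trusted only while the part's remembered epoch equals the format's current epoch; otherwise the slot is re-resolved and the cache refreshed.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for tuple_format and key_part. */
struct fmt {
	uint64_t epoch;		/* bumped on each format alter */
};

struct cached_part {
	uint64_t offset_slot_epoch;	/* epoch the slot was valid for */
	int32_t offset_slot;		/* cached field_map slot */
};

/* Counts slow-path hits; stands in for the path-hash lookup. */
static int slow_lookups = 0;

static int32_t
resolve_slot_slow(const struct fmt *f)
{
	(void)f;
	slow_lookups++;
	return -1;	/* pretend the path hash resolved slot -1 */
}

/*
 * Fast path: epochs match, reuse the cached slot.
 * Slow path: resolve again and refresh both cache members,
 * so the next access with the same format is fast.
 */
static int32_t
part_offset_slot(struct cached_part *part, const struct fmt *f)
{
	if (part->offset_slot_epoch == f->epoch)
		return part->offset_slot;
	part->offset_slot = resolve_slot_slow(f);
	part->offset_slot_epoch = f->epoch;
	return part->offset_slot;
}
```

A space alter bumps the format epoch, so the next access falls back to the slow lookup exactly once and then becomes fast again — the cache refresh that items 21–22 ask to make explicit on the miss path.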
===================================================
From cb48f2f6975725e24513660b371ec0d7cf519ec5 Mon Sep 17 00:00:00 2001
Message-Id: <cb48f2f6975725e24513660b371ec0d7cf519ec5.1536237903.git.kshcherbatov@tarantool.org>
In-Reply-To: <cover.1536237903.git.kshcherbatov@tarantool.org>
References: <cover.1536237903.git.kshcherbatov@tarantool.org>
From: Kirill Shcherbatov <kshcherbatov@tarantool.org>
Date: Tue, 31 Jul 2018 18:20:15 +0300
Subject: [PATCH 3/4] box: introduce JSON indexes

As we need to store a user-defined JSON path in key_part and key_part_def, we have introduced the path and path_len fields. A JSON path is verified and transformed to canonical form when the index msgpack is unpacked. The path string is stored as a part of the key_def allocation:

+-------+---------+-------+---------+-------+-------+-------+
|key_def|key_part1| ... |key_partN| path1 | pathK | pathN |
+-------+---------+-------+---------+-------+-------+-------+
| ^ |-> path _________________|

Because the field names specified in the format may be changed, a key_part path persisted in Tarantool must always start with a first-level field access via an array index (not by name).

To work with JSON-defined indexes we use the format's JSON path hashtable data_path and a tree of intermediate path fields attached to the format's root fields.

<Hashtable>: format->data_path
[2].FIO.fname -> field "fname" {type=str, off_slot=-1}
[2].FIO.sname -> field "sname" {type=str, off_slot=-2}

<Tree>: format->field[2] {type = map} | FIO {type = map} | "fname" | "sname" {type=str,off_slot=-1} ____|____ {type = str,off_slot=-2}

Leaf fields used in an index have an initialized offset_slot. On new tuple creation we traverse the fields tree and use the leaf records to init the tuple field_map. At the same time we use the data_path hashtable on tuple data access by index (when the cached offset_slot is invalid).

All paths are stored at the end of the format allocation; the JSON-tree fields, same as format->path_hash, point to them:
+------------+------------+-------+------------+-------+
|tuple_format|tuple_field1| ... |tuple_fieldN| pathK |
+------------+------------+-------+------------+-------+

The new routine tuple_format_add_json_path is used to construct all internal structures for a JSON path on new format creation and duplication. Paths are deduplicated in the format allocation.

Part of #1012.
---
 src/box/errcode.h | 2 +-
 src/box/index_def.c | 10 +-
 src/box/key_def.c | 300 +++++++++++++--
 src/box/key_def.h | 38 +-
 src/box/lua/space.cc | 5 +
 src/box/memtx_engine.c | 5 +
 src/box/schema.cc | 12 +-
 src/box/tuple.c | 12 +-
 src/box/tuple_compare.cc | 59 ++-
 src/box/tuple_extract_key.cc | 112 ++++--
 src/box/tuple_format.c | 878 ++++++++++++++++++++++++++++++++++++-------
 src/box/tuple_format.h | 63 +++-
 src/box/tuple_hash.cc | 18 +-
 src/box/vinyl.c | 5 +
 src/box/vy_log.c | 3 +-
 src/box/vy_lsm.c | 44 +++
 src/box/vy_point_lookup.c | 2 -
 src/box/vy_stmt.c | 149 ++++++--
 test/box/misc.result | 57 +--
 test/engine/tuple.result | 387 +++++++++++++++++++
 test/engine/tuple.test.lua | 109 ++++++
 test/vinyl/info.result | 2 +-
 22 files changed, 1963 insertions(+), 309 deletions(-)

diff --git a/src/box/errcode.h b/src/box/errcode.h
index 4115e6b..464f413 100644
--- a/src/box/errcode.h
+++ b/src/box/errcode.h
@@ -107,7 +107,7 @@ struct errcode_record {
 /* 52 */_(ER_FUNCTION_EXISTS, "Function '%s' already exists") \
 /* 53 */_(ER_BEFORE_REPLACE_RET, "Invalid return value of space:before_replace trigger: expected tuple or nil, got %s") \
 /* 54 */_(ER_FUNCTION_MAX, "A limit on the total number of functions has been reached: %u") \
- /* 55 */_(ER_UNUSED4, "") \
+ /* 55 */_(ER_DATA_STRUCTURE_MISMATCH, "Tuple doesn't match document structure: %s") \
 /* 56 */_(ER_USER_MAX, "A limit on the total number of users has been reached: %u") \
 /* 57 */_(ER_NO_SUCH_ENGINE, "Space engine '%s' does not exist") \
 /* 58 */_(ER_RELOAD_CFG, "Can't set option '%s' dynamically") \
diff --git a/src/box/index_def.c index
9cda63c..f67b952 100644 --- a/src/box/index_def.c +++ b/src/box/index_def.c @@ -209,8 +209,14 @@ index_def_is_valid(struct index_def *index_def, const char *space_name) * Courtesy to a user who could have made * a typo. */ - if (index_def->key_def->parts[i].fieldno == - index_def->key_def->parts[j].fieldno) { + struct key_part *part_a = &index_def->key_def->parts[i]; + struct key_part *part_b = &index_def->key_def->parts[j]; + if ((part_a->fieldno == part_b->fieldno && + part_a->path == NULL && part_b->path == NULL) || + (part_a->path_len != 0 && + part_a->path_len == part_b->path_len && + memcmp(part_a->path, part_b->path, + part_a->path_len) == 0)) { diag_set(ClientError, ER_MODIFY_INDEX, index_def->name, space_name, "same key part is indexed twice"); diff --git a/src/box/key_def.c b/src/box/key_def.c index 2ef78c1..216d858 100644 --- a/src/box/key_def.c +++ b/src/box/key_def.c @@ -35,12 +35,16 @@ #include "column_mask.h" #include "schema_def.h" #include "coll_id_cache.h" +#include "fiber.h" +#include "assoc.h" +#include "json/path.h" static const struct key_part_def key_part_def_default = { 0, field_type_MAX, COLL_NONE, false, + NULL }; static int64_t @@ -53,6 +57,7 @@ part_type_by_name_wrapper(const char *str, uint32_t len) #define PART_OPT_FIELD "field" #define PART_OPT_COLLATION "collation" #define PART_OPT_NULLABILITY "is_nullable" +#define PART_OPT_PATH "path" const struct opt_def part_def_reg[] = { OPT_DEF_ENUM(PART_OPT_TYPE, field_type, struct key_part_def, type, @@ -61,6 +66,7 @@ const struct opt_def part_def_reg[] = { OPT_DEF(PART_OPT_COLLATION, OPT_UINT32, struct key_part_def, coll_id), OPT_DEF(PART_OPT_NULLABILITY, OPT_BOOL, struct key_part_def, is_nullable), + OPT_DEF(PART_OPT_PATH, OPT_STRPTR, struct key_part_def, path), OPT_END, }; @@ -96,13 +102,24 @@ const uint32_t key_mp_type[] = { struct key_def * key_def_dup(const struct key_def *src) { - size_t sz = key_def_sizeof(src->part_count); + const struct key_part *parts = src->parts; + const struct 
key_part *parts_end = parts + src->part_count; + size_t sz = 0; + for (; parts < parts_end; parts++) + sz += parts->path != NULL ? parts->path_len + 1 : 0; + sz = key_def_sizeof(src->part_count, sz); struct key_def *res = (struct key_def *)malloc(sz); if (res == NULL) { diag_set(OutOfMemory, sz, "malloc", "res"); return NULL; } memcpy(res, src, sz); + for (uint32_t i = 0; i < src->part_count; i++) { + if (src->parts[i].path == NULL) + continue; + size_t path_offset = src->parts[i].path - (char *)src; + res->parts[i].path = (char *)res + path_offset; + } return res; } @@ -110,8 +127,17 @@ void key_def_swap(struct key_def *old_def, struct key_def *new_def) { assert(old_def->part_count == new_def->part_count); - for (uint32_t i = 0; i < new_def->part_count; i++) - SWAP(old_def->parts[i], new_def->parts[i]); + for (uint32_t i = 0; i < new_def->part_count; i++) { + if (old_def->parts[i].path == NULL) { + SWAP(old_def->parts[i], new_def->parts[i]); + } else { + size_t path_offset = + old_def->parts[i].path - (char *)old_def; + SWAP(old_def->parts[i], new_def->parts[i]); + old_def->parts[i].path = (char *)old_def + path_offset; + new_def->parts[i].path = (char *)new_def + path_offset; + } + } SWAP(*old_def, *new_def); } @@ -131,9 +157,9 @@ key_def_set_cmp(struct key_def *def) } struct key_def * -key_def_new(uint32_t part_count) +key_def_new(uint32_t part_count, size_t paths_size) { - size_t sz = key_def_sizeof(part_count); + size_t sz = key_def_sizeof(part_count, paths_size); /** Use calloc() to zero comparator function pointers. */ struct key_def *key_def = (struct key_def *) calloc(1, sz); if (key_def == NULL) { @@ -148,10 +174,13 @@ key_def_new(uint32_t part_count) struct key_def * key_def_new_with_parts(struct key_part_def *parts, uint32_t part_count) { - struct key_def *def = key_def_new(part_count); + size_t sz = 0; + for (uint32_t i = 0; i < part_count; i++) + sz += parts[i].path != NULL ? 
strlen(parts[i].path) + 1 : 0; + struct key_def *def = key_def_new(part_count, sz); if (def == NULL) return NULL; - + char *data = (char *)def + key_def_sizeof(part_count, 0); for (uint32_t i = 0; i < part_count; i++) { struct key_part_def *part = &parts[i]; struct coll *coll = NULL; @@ -165,14 +194,22 @@ key_def_new_with_parts(struct key_part_def *parts, uint32_t part_count) } coll = coll_id->coll; } + uint32_t path_len = 0; + if (part->path != NULL) { + path_len = strlen(part->path); + def->parts[i].path = data; + data += path_len + 1; + } key_def_set_part(def, i, part->fieldno, part->type, - part->is_nullable, coll, part->coll_id); + part->is_nullable, coll, part->coll_id, + part->path, path_len); } return def; } -void -key_def_dump_parts(const struct key_def *def, struct key_part_def *parts) +int +key_def_dump_parts(struct region *pool, const struct key_def *def, + struct key_part_def *parts) { for (uint32_t i = 0; i < def->part_count; i++) { const struct key_part *part = &def->parts[i]; @@ -181,13 +218,26 @@ key_def_dump_parts(const struct key_def *def, struct key_part_def *parts) part_def->type = part->type; part_def->is_nullable = part->is_nullable; part_def->coll_id = part->coll_id; + if (part->path != NULL) { + part_def->path = region_alloc(pool, part->path_len + 1); + if (part_def->path == NULL) { + diag_set(OutOfMemory, part->path_len + 1, + "region_alloc", "part_def->path"); + return -1; + } + memcpy(part_def->path, part->path, part->path_len); + part_def->path[part->path_len] = '\0'; + } else { + part_def->path = NULL; + } } + return 0; } box_key_def_t * box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count) { - struct key_def *key_def = key_def_new(part_count); + struct key_def *key_def = key_def_new(part_count, 0); if (key_def == NULL) return key_def; @@ -195,7 +245,7 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count) key_def_set_part(key_def, item, fields[item], (enum field_type)types[item], 
key_part_def_default.is_nullable, NULL, - COLL_NONE); + COLL_NONE, NULL, 0); } return key_def; } @@ -241,6 +291,13 @@ key_part_cmp(const struct key_part *parts1, uint32_t part_count1, if (part1->is_nullable != part2->is_nullable) return part1->is_nullable < part2->is_nullable ? -1 : 1; + /* Lexicographic strings order. */ + if (part1->path_len != part2->path_len) + return part1->path_len - part2->path_len; + int rc = 0; + if ((rc = memcmp(part1->path, part2->path, + part1->path_len)) != 0) + return rc; } return part_count1 < part_count2 ? -1 : part_count1 > part_count2; } @@ -248,11 +305,12 @@ key_part_cmp(const struct key_part *parts1, uint32_t part_count1, void key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno, enum field_type type, bool is_nullable, struct coll *coll, - uint32_t coll_id) + uint32_t coll_id, const char *path, uint32_t path_len) { assert(part_no < def->part_count); assert(type < field_type_MAX); def->is_nullable |= is_nullable; + def->has_json_paths |= path != NULL; def->parts[part_no].is_nullable = is_nullable; def->parts[part_no].fieldno = fieldno; def->parts[part_no].type = type; @@ -260,6 +318,17 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno, def->parts[part_no].coll_id = coll_id; def->parts[part_no].offset_slot = TUPLE_OFFSET_SLOT_NIL; def->parts[part_no].offset_slot_epoch = 0; + if (path != NULL) { + def->parts[part_no].path_len = path_len; + assert(def->parts[part_no].path != NULL); + memcpy(def->parts[part_no].path, path, path_len); + def->parts[part_no].path[path_len] = '\0'; + def->parts[part_no].path_hash = mh_strn_hash(path, path_len); + } else { + def->parts[part_no].path_len = 0; + def->parts[part_no].path = NULL; + def->parts[part_no].path_hash = 0; + } column_mask_set_fieldno(&def->column_mask, fieldno); /** * When all parts are set, initialize the tuple @@ -284,13 +353,13 @@ key_def_update_optionality(struct key_def *def, uint32_t min_field_count) for (uint32_t i = 0; i < 
def->part_count; ++i) { struct key_part *part = &def->parts[i]; def->has_optional_parts |= part->is_nullable && - min_field_count < part->fieldno + 1; + (min_field_count < + part->fieldno + 1); /* * One optional part is enough to switch to new * comparators. */ - if (def->has_optional_parts) - break; + if (def->has_optional_parts) break; } key_def_set_cmp(def); } @@ -304,8 +373,15 @@ key_def_snprint_parts(char *buf, int size, const struct key_part_def *parts, for (uint32_t i = 0; i < part_count; i++) { const struct key_part_def *part = &parts[i]; assert(part->type < field_type_MAX); - SNPRINT(total, snprintf, buf, size, "%d, '%s'", - (int)part->fieldno, field_type_strs[part->type]); + if (part->path != NULL) { + SNPRINT(total, snprintf, buf, size, "%d, '%s', '%s'", + (int) part->fieldno, part->path, + field_type_strs[part->type]); + } else { + SNPRINT(total, snprintf, buf, size, "%d, '%s'", + (int) part->fieldno, + field_type_strs[part->type]); + } if (i < part_count - 1) SNPRINT(total, snprintf, buf, size, ", "); } @@ -324,6 +400,8 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count) count++; if (part->is_nullable) count++; + if (part->path != NULL) + count++; size += mp_sizeof_map(count); size += mp_sizeof_str(strlen(PART_OPT_FIELD)); size += mp_sizeof_uint(part->fieldno); @@ -338,6 +416,10 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count) size += mp_sizeof_str(strlen(PART_OPT_NULLABILITY)); size += mp_sizeof_bool(part->is_nullable); } + if (part->path != NULL) { + size += mp_sizeof_str(strlen(PART_OPT_PATH)); + size += mp_sizeof_str(strlen(part->path)); + } } return size; } @@ -351,6 +433,8 @@ key_def_encode_parts(char *data, const struct key_part_def *parts, int count = 2; if (part->coll_id != COLL_NONE) count++; + if (part->path != NULL) + count++; if (part->is_nullable) count++; data = mp_encode_map(data, count); @@ -372,6 +456,12 @@ key_def_encode_parts(char *data, const struct key_part_def *parts, 
strlen(PART_OPT_NULLABILITY)); data = mp_encode_bool(data, part->is_nullable); } + if (part->path != NULL) { + data = mp_encode_str(data, PART_OPT_PATH, + strlen(PART_OPT_PATH)); + data = mp_encode_str(data, part->path, + strlen(part->path)); + } } return data; } @@ -432,8 +522,111 @@ key_def_decode_parts_166(struct key_part_def *parts, uint32_t part_count, fields[part->fieldno].is_nullable : key_part_def_default.is_nullable); part->coll_id = COLL_NONE; + part->path = NULL; + } + return 0; +} + +/** + * Verify key_part JSON path and convert to canonical form. + * + * @param region Region to make allocations. + * @param part Part with path to update. + * @param path_extra Extra allocated space to reuse if possible. + * @param path_extra_size The @path_extra size. + * + * @retval -1 on error. + * @retval 0 on success. + */ +static int +key_def_normalize_json_path(struct region *region, struct key_part_def *part, + char **path_extra, uint32_t *path_extra_size) +{ + uint32_t allocated_size = *path_extra_size; + char *path = *path_extra; + + uint32_t path_len = strlen(part->path); + struct json_path_parser parser; + struct json_path_node node; + json_path_parser_create(&parser, part->path, path_len); + /* + * A worst-case scenario is .a -> ["a"] + * i.e. 2.5 * path_len + 1 is enough. 
+ */ + uint32_t new_path_size = 2.5 * path_len + 1; + if (new_path_size >= allocated_size) { + path = region_alloc(region, new_path_size); + if (path == NULL) { + diag_set(OutOfMemory, new_path_size, + "region_alloc", "path"); + return -1; + } + allocated_size = new_path_size; + } + assert(path != NULL); + part->path = path; + int rc = json_path_next(&parser, &node); + if (rc != 0) + goto error_invalid_json; + if (node.type != JSON_PATH_NUM) { + diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, + part->fieldno, + "invalid JSON path: first part should " + "be defined as array index"); + return -1; + } + if (node.num - TUPLE_INDEX_BASE != part->fieldno) { + diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, + part->fieldno, + "invalid JSON path: first part refers " + "to invalid field"); + return -1; + } + uint32_t lexemes = 0; + do { + if (node.type == JSON_PATH_NUM) { + path += sprintf(path, "[%llu]", + (unsigned long long) node.num); + } else if (node.type == JSON_PATH_STR) { + path += sprintf(path, "[\"%.*s\"]", node.len, node.str); + } else { + unreachable(); + } + lexemes++; + } while ((rc = json_path_next(&parser, &node)) == 0 && + node.type != JSON_PATH_END); + if (rc != 0 || node.type != JSON_PATH_END) + goto error_invalid_json; + if (lexemes == 1) { + /* JSON index is useless. */ + path = part->path; + part->path = NULL; + } else { + /* Skip terminating zero. */ + path++; + /* Account constructed string size. */ + allocated_size -= path - part->path; + } + /* Going to try to reuse extra allocation next time. */ + if (allocated_size > (uint32_t)parser.src_len) { + /* Use rest of new buffer next time. */ + *path_extra = path; + *path_extra_size = allocated_size; + } else { + /* Reuse old path buffer. 
*/ + *path_extra = (char *)parser.src; + *path_extra_size = parser.src_len; } return 0; + +error_invalid_json: ; + const char *err_msg = + tt_sprintf("invalid JSON path '%.*s': path has invalid "\ + "structure (error at position %d)", parser.src_len, + parser.src, rc); + diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, + part->fieldno + TUPLE_INDEX_BASE, err_msg); + return -1; } int @@ -445,8 +638,11 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count, return key_def_decode_parts_166(parts, part_count, data, fields, field_count); } - for (uint32_t i = 0; i < part_count; i++) { - struct key_part_def *part = &parts[i]; + char *path = NULL; + uint32_t allocated_size = 0; + struct key_part_def *part = parts; + struct region *region = &fiber()->gc; + for (uint32_t i = 0; i < part_count; i++, part++) { if (mp_typeof(**data) != MP_MAP) { diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, i + TUPLE_INDEX_BASE, @@ -456,7 +652,7 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count, *part = key_part_def_default; if (opts_decode(part, part_def_reg, data, ER_WRONG_INDEX_OPTIONS, i + TUPLE_INDEX_BASE, - NULL) != 0) + region) != 0) return -1; if (part->type == field_type_MAX) { diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, @@ -473,6 +669,10 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count, "string and scalar parts"); return -1; } + if (part->path != NULL && + key_def_normalize_json_path(region, part, &path, + &allocated_size) != 0) + return -1; } return 0; } @@ -497,20 +697,25 @@ key_def_decode_parts_160(struct key_part_def *parts, uint32_t part_count, fields[part->fieldno].is_nullable : key_part_def_default.is_nullable); part->coll_id = COLL_NONE; + part->path = NULL; } return 0; } -const struct key_part * -key_def_find(const struct key_def *key_def, uint32_t fieldno) +bool +key_def_contains_part(const struct key_def *key_def, + const struct key_part *to_find) { const struct key_part *part = key_def->parts; const struct 
key_part *end = part + key_def->part_count; for (; part != end; part++) { - if (part->fieldno == fieldno) - return part; + if (part->fieldno == to_find->fieldno && + part->path_len == to_find->path_len && + (part->path == NULL || memcmp(part->path, to_find->path, + to_find->path_len) == 0)) + return true; } - return NULL; + return false; } bool @@ -519,7 +724,7 @@ key_def_contains(const struct key_def *first, const struct key_def *second) const struct key_part *part = second->parts; const struct key_part *end = part + second->part_count; for (; part != end; part++) { - if (key_def_find(first, part->fieldno) == NULL) + if (! key_def_contains_part(first, part)) return false; } return true; @@ -533,18 +738,25 @@ key_def_merge(const struct key_def *first, const struct key_def *second) * Find and remove part duplicates, i.e. parts counted * twice since they are present in both key defs. */ - const struct key_part *part = second->parts; - const struct key_part *end = part + second->part_count; + size_t sz = 0; + const struct key_part *part = first->parts; + const struct key_part *end = part + first->part_count; for (; part != end; part++) { - if (key_def_find(first, part->fieldno)) + if (part->path != NULL) + sz += part->path_len + 1; + } + part = second->parts; + end = part + second->part_count; + for (; part != end; part++) { + if (key_def_contains_part(first, part)) --new_part_count; + else if (part->path != NULL) + sz += part->path_len + 1; } - - struct key_def *new_def; - new_def = (struct key_def *)calloc(1, key_def_sizeof(new_part_count)); + sz = key_def_sizeof(new_part_count, sz); + struct key_def *new_def = (struct key_def *)calloc(1, sz); if (new_def == NULL) { - diag_set(OutOfMemory, key_def_sizeof(new_part_count), "malloc", - "new_def"); + diag_set(OutOfMemory, sz, "calloc", "new_def"); return NULL; } new_def->part_count = new_part_count; @@ -552,14 +764,21 @@ key_def_merge(const struct key_def *first, const struct key_def *second) new_def->is_nullable = 
first->is_nullable || second->is_nullable; new_def->has_optional_parts = first->has_optional_parts || second->has_optional_parts; + /* Path data write position in the new key_def. */ + char *data = (char *)new_def + key_def_sizeof(new_part_count, 0); /* Write position in the new key def. */ uint32_t pos = 0; /* Append first key def's parts to the new index_def. */ part = first->parts; end = part + first->part_count; for (; part != end; part++) { + if (part->path != NULL) { + new_def->parts[pos].path = data; + data += part->path_len + 1; + } key_def_set_part(new_def, pos, part->fieldno, part->type, - part->is_nullable, part->coll, part->coll_id); + part->is_nullable, part->coll, part->coll_id, + part->path, part->path_len); new_def->parts[pos].offset_slot_epoch = part->offset_slot_epoch; new_def->parts[pos].offset_slot = part->offset_slot; pos++; @@ -569,10 +788,15 @@ key_def_merge(const struct key_def *first, const struct key_def *second) part = second->parts; end = part + second->part_count; for (; part != end; part++) { - if (key_def_find(first, part->fieldno)) + if (key_def_contains_part(first, part)) continue; + if (part->path != NULL) { + new_def->parts[pos].path = data; + data += part->path_len + 1; + } key_def_set_part(new_def, pos, part->fieldno, part->type, - part->is_nullable, part->coll, part->coll_id); + part->is_nullable, part->coll, part->coll_id, + part->path, part->path_len); new_def->parts[pos].offset_slot_epoch = part->offset_slot_epoch; new_def->parts[pos].offset_slot = part->offset_slot; pos++; diff --git a/src/box/key_def.h b/src/box/key_def.h index 07997b8..30c0b84 100644 --- a/src/box/key_def.h +++ b/src/box/key_def.h @@ -54,6 +54,8 @@ struct key_part_def { uint32_t coll_id; /** True if a key part can store NULLs. */ bool is_nullable; + /** JSON path to data. */ + char *path; }; /** @@ -85,6 +87,12 @@ struct key_part { uint64_t offset_slot_epoch; /** Cache with format's field offset slot. 
*/ int32_t offset_slot; + /** JSON path to data in canonical form. */ + char *path; + /** JSON path length. */ + uint32_t path_len; + /** JSON path hash. */ + uint32_t path_hash; }; struct key_def; @@ -144,6 +152,10 @@ struct key_def { * fields assumed to be MP_NIL. */ bool has_optional_parts; + /** + * True, if some key part contain JSON path. + */ + bool has_json_paths; /** Key fields mask. @sa column_mask.h for details. */ uint64_t column_mask; /** The size of the 'parts' array. */ @@ -232,16 +244,17 @@ box_tuple_compare_with_key(const box_tuple_t *tuple_a, const char *key_b, /** \endcond public */ static inline size_t -key_def_sizeof(uint32_t part_count) +key_def_sizeof(uint32_t part_count, size_t paths_size) { - return sizeof(struct key_def) + sizeof(struct key_part) * part_count; + return sizeof(struct key_def) + sizeof(struct key_part) * part_count + + paths_size; } /** * Allocate a new key_def with the given part count. */ struct key_def * -key_def_new(uint32_t part_count); +key_def_new(uint32_t part_count, size_t paths_size); /** * Allocate a new key_def with the given part count @@ -253,8 +266,9 @@ key_def_new_with_parts(struct key_part_def *parts, uint32_t part_count); /** * Dump part definitions of the given key def. */ -void -key_def_dump_parts(const struct key_def *def, struct key_part_def *parts); +int +key_def_dump_parts(struct region *pool, const struct key_def *def, + struct key_part_def *parts); /** * Set a single key part in a key def. 
@@ -263,7 +277,7 @@ key_def_dump_parts(const struct key_def *def, struct key_part_def *parts); void key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno, enum field_type type, bool is_nullable, struct coll *coll, - uint32_t coll_id); + uint32_t coll_id, const char *path, uint32_t path_len); /** * Update 'has_optional_parts' of @a key_def with correspondence @@ -325,12 +339,10 @@ key_def_decode_parts_160(struct key_part_def *parts, uint32_t part_count, const char **data, const struct field_def *fields, uint32_t field_count); -/** - * Returns the part in index_def->parts for the specified fieldno. - * If fieldno is not in index_def->parts returns NULL. - */ -const struct key_part * -key_def_find(const struct key_def *key_def, uint32_t fieldno); +/** Check if @a key_def contains @a to_find part. */ +bool +key_def_contains_part(const struct key_def *key_def, + const struct key_part *to_find); /** * Check if key definition @a first contains all parts of @@ -377,6 +389,8 @@ key_validate_parts(const struct key_def *key_def, const char *key, static inline bool key_def_is_sequential(const struct key_def *key_def) { + if (key_def->has_json_paths) + return false; for (uint32_t part_id = 0; part_id < key_def->part_count; part_id++) { if (key_def->parts[part_id].fieldno != part_id) return false; diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc index 25b7e36..875e51f 100644 --- a/src/box/lua/space.cc +++ b/src/box/lua/space.cc @@ -295,6 +295,11 @@ lbox_fillspace(struct lua_State *L, struct space *space, int i) lua_pushnumber(L, part->fieldno + TUPLE_INDEX_BASE); lua_setfield(L, -2, "fieldno"); + if (part->path != NULL) { + lua_pushstring(L, part->path); + lua_setfield(L, -2, "path"); + } + lua_pushboolean(L, part->is_nullable); lua_setfield(L, -2, "is_nullable"); diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c index 4b7d377..62c3c9a 100644 --- a/src/box/memtx_engine.c +++ b/src/box/memtx_engine.c @@ -1310,6 +1310,11 @@ 
memtx_index_def_change_requires_rebuild(struct index *index, return true; if (old_part->coll != new_part->coll) return true; + if (old_part->path_len != new_part->path_len) + return true; + if (memcmp(old_part->path, new_part->path, + old_part->path_len) != 0) + return true; } return false; } diff --git a/src/box/schema.cc b/src/box/schema.cc index e52e19d..cf7cf35 100644 --- a/src/box/schema.cc +++ b/src/box/schema.cc @@ -286,19 +286,19 @@ schema_init() * (and re-created) first. */ /* _schema - key/value space with schema description */ - struct key_def *key_def = key_def_new(1); /* part count */ + struct key_def *key_def = key_def_new(1, 0); if (key_def == NULL) diag_raise(); auto key_def_guard = make_scoped_guard([&] { key_def_delete(key_def); }); key_def_set_part(key_def, 0 /* part no */, 0 /* field no */, - FIELD_TYPE_STRING, false, NULL, COLL_NONE); + FIELD_TYPE_STRING, false, NULL, COLL_NONE, NULL, 0); sc_space_new(BOX_SCHEMA_ID, "_schema", key_def, &on_replace_schema, NULL); /* _space - home for all spaces. */ key_def_set_part(key_def, 0 /* part no */, 0 /* field no */, - FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE); + FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE, NULL, 0); /* _collation - collation description. 
*/ sc_space_new(BOX_COLLATION_ID, "_collation", key_def, @@ -341,15 +341,15 @@ schema_init() NULL); key_def_delete(key_def); - key_def = key_def_new(2); /* part count */ + key_def = key_def_new(2, 0); if (key_def == NULL) diag_raise(); /* space no */ key_def_set_part(key_def, 0 /* part no */, 0 /* field no */, - FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE); + FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE, NULL, 0); /* index no */ key_def_set_part(key_def, 1 /* part no */, 1 /* field no */, - FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE); + FIELD_TYPE_UNSIGNED, false, NULL, COLL_NONE, NULL, 0); sc_space_new(BOX_INDEX_ID, "_index", key_def, &alter_space_on_replace_index, &on_stmt_begin_index); diff --git a/src/box/tuple.c b/src/box/tuple.c index d7dbad3..83bdad1 100644 --- a/src/box/tuple.c +++ b/src/box/tuple.c @@ -159,14 +159,22 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple) /* Check field types */ struct tuple_field *field = &format->fields[0]; + const char *pos = tuple; uint32_t i = 0; uint32_t defined_field_count = MIN(field_count, format->field_count); for (; i < defined_field_count; ++i, ++field) { - if (key_mp_type_validate(field->type, mp_typeof(*tuple), + if (key_mp_type_validate(field->type, mp_typeof(*pos), ER_FIELD_TYPE, i + TUPLE_INDEX_BASE, field->is_nullable)) return -1; - mp_next(&tuple); + /* Check all JSON paths. 
*/ + if (field->childs != NULL) { + if (tuple_field_bypass_and_init(field, i, tuple, &pos, + NULL) != 0) + return -1; + } else { + mp_next(&pos); + } } return 0; } diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc index b14ac35..5a3a968 100644 --- a/src/box/tuple_compare.cc +++ b/src/box/tuple_compare.cc @@ -463,13 +463,15 @@ static inline int tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, struct key_def *key_def) { + assert(has_json_path == key_def->has_json_paths); assert(!has_optional_parts || is_nullable); assert(is_nullable == key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); struct key_part *part = key_def->parts; const char *tuple_a_raw = tuple_data(tuple_a); const char *tuple_b_raw = tuple_data(tuple_b); - if (key_def->part_count == 1 && part->fieldno == 0) { + if (key_def->part_count == 1 && part->fieldno == 0 && + part->path == NULL) { /* * First field can not be optional - empty tuples * can not exist. @@ -597,6 +599,7 @@ static inline int tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, uint32_t part_count, struct key_def *key_def) { + assert(has_json_paths == key_def->has_json_paths); assert(!has_optional_parts || is_nullable); assert(is_nullable == key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); @@ -1039,23 +1042,35 @@ static const comparator_signature cmp_arr[] = { #undef COMPARATOR +static const tuple_compare_t compare_slowpath_funcs[] = { + tuple_compare_slowpath<false, false, false>, + tuple_compare_slowpath<true, false, false>, + tuple_compare_slowpath<false, true, false>, + tuple_compare_slowpath<true, true, false>, + tuple_compare_slowpath<false, false, true>, + tuple_compare_slowpath<true, false, true>, + tuple_compare_slowpath<false, true, true>, + tuple_compare_slowpath<true, true, true> +}; + tuple_compare_t tuple_compare_create(const struct key_def *def) { + int cmp_func_idx = (def->is_nullable ? 
1 : 0) + + 2 * (def->has_optional_parts ? 1 : 0) + + 4 * (def->has_json_paths ? 1 : 0); if (def->is_nullable) { if (key_def_is_sequential(def)) { if (def->has_optional_parts) return tuple_compare_sequential<true, true>; else return tuple_compare_sequential<true, false>; - } else if (def->has_optional_parts) { - return tuple_compare_slowpath<true, true, false>; } else { - return tuple_compare_slowpath<true, false, false>; + return compare_slowpath_funcs[cmp_func_idx]; } } assert(! def->has_optional_parts); - if (!key_def_has_collation(def)) { + if (!key_def_has_collation(def) && !def->has_json_paths) { /* Precalculated comparators don't use collation */ for (uint32_t k = 0; k < sizeof(cmp_arr) / sizeof(cmp_arr[0]); k++) { @@ -1071,10 +1086,9 @@ tuple_compare_create(const struct key_def *def) return cmp_arr[k].f; } } - if (key_def_is_sequential(def)) - return tuple_compare_sequential<false, false>; - else - return tuple_compare_slowpath<false, false, false>; + return key_def_is_sequential(def) ? + tuple_compare_sequential<false, false> : + compare_slowpath_funcs[cmp_func_idx]; } /* }}} tuple_compare */ @@ -1256,9 +1270,23 @@ static const comparator_with_key_signature cmp_wk_arr[] = { #undef KEY_COMPARATOR +static const tuple_compare_with_key_t compare_with_key_slowpath_funcs[] = { + tuple_compare_with_key_slowpath<false, false, false>, + tuple_compare_with_key_slowpath<true, false, false>, + tuple_compare_with_key_slowpath<false, true, false>, + tuple_compare_with_key_slowpath<true, true, false>, + tuple_compare_with_key_slowpath<false, false, true>, + tuple_compare_with_key_slowpath<true, false, true>, + tuple_compare_with_key_slowpath<false, true, true>, + tuple_compare_with_key_slowpath<true, true, true> +}; + tuple_compare_with_key_t tuple_compare_with_key_create(const struct key_def *def) { + int cmp_func_idx = (def->is_nullable ? 1 : 0) + + 2 * (def->has_optional_parts ? 1 : 0) + + 4 * (def->has_json_paths ? 
1 : 0); if (def->is_nullable) { if (key_def_is_sequential(def)) { if (def->has_optional_parts) { @@ -1268,14 +1296,12 @@ tuple_compare_with_key_create(const struct key_def *def) return tuple_compare_with_key_sequential<true, false>; } - } else if (def->has_optional_parts) { - return tuple_compare_with_key_slowpath<true, true, false>; } else { - return tuple_compare_with_key_slowpath<true, false, false>; + return compare_with_key_slowpath_funcs[cmp_func_idx]; } } assert(! def->has_optional_parts); - if (!key_def_has_collation(def)) { + if (!key_def_has_collation(def) && !def->has_json_paths) { /* Precalculated comparators don't use collation */ for (uint32_t k = 0; k < sizeof(cmp_wk_arr) / sizeof(cmp_wk_arr[0]); @@ -1294,10 +1320,9 @@ tuple_compare_with_key_create(const struct key_def *def) return cmp_wk_arr[k].f; } } - if (key_def_is_sequential(def)) - return tuple_compare_with_key_sequential<false, false>; - else - return tuple_compare_with_key_slowpath<false, false, false>; + return key_def_is_sequential(def) ? + tuple_compare_with_key_sequential<false, false> : + compare_with_key_slowpath_funcs[cmp_func_idx]; } /* }}} tuple_compare_with_key */ diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc index 6b771f3..2474d98 100644 --- a/src/box/tuple_extract_key.cc +++ b/src/box/tuple_extract_key.cc @@ -1,15 +1,31 @@ #include "tuple_extract_key.h" #include "tuple.h" #include "fiber.h" +#include "json/path.h" enum { MSGPACK_NULL = 0xc0 }; +/** True if key part i and i+1 are sequential. */ +template <bool has_json_paths> +static inline bool +key_def_parts_are_sequential(const struct key_def *def, int i) +{ + uint32_t fieldno1 = def->parts[i].fieldno + 1; + uint32_t fieldno2 = def->parts[i + 1].fieldno; + if (!has_json_paths) { + return fieldno1 == fieldno2; + } else { + return fieldno1 == fieldno2 && def->parts[i].path == NULL && + def->parts[i + 1].path == NULL; + } +} + /** True, if a key can contain two or more parts in sequence.
*/ static bool key_def_contains_sequential_parts(const struct key_def *def) { for (uint32_t i = 0; i < def->part_count - 1; ++i) { - if (def->parts[i].fieldno + 1 == def->parts[i + 1].fieldno) + if (key_def_parts_are_sequential<true>(def, i)) return true; } return false; @@ -95,6 +111,7 @@ static char * tuple_extract_key_slowpath(const struct tuple *tuple, struct key_def *key_def, uint32_t *key_size) { + assert(has_json_paths == key_def->has_json_paths); assert(!has_optional_parts || key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); assert(contains_sequential_parts == @@ -129,8 +146,8 @@ tuple_extract_key_slowpath(const struct tuple *tuple, * minimize tuple_field_raw() calls. */ for (; i < part_count - 1; i++) { - if (key_def->parts[i].fieldno + 1 != - key_def->parts[i + 1].fieldno) { + if (!key_def_parts_are_sequential + <has_json_paths>(key_def, i)) { /* * End of sequential part. */ @@ -176,8 +193,8 @@ tuple_extract_key_slowpath(const struct tuple *tuple, * minimize tuple_field_raw() calls. */ for (; i < part_count - 1; i++) { - if (key_def->parts[i].fieldno + 1 != - key_def->parts[i + 1].fieldno) { + if (!key_def_parts_are_sequential + <has_json_paths>(key_def, i)) { /* * End of sequential part. 
*/ @@ -215,6 +232,7 @@ static char * tuple_extract_key_slowpath_raw(const char *data, const char *data_end, struct key_def *key_def, uint32_t *key_size) { + assert(has_json_paths == key_def->has_json_paths); assert(!has_optional_parts || key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); assert(mp_sizeof_nil() == 1); @@ -242,11 +260,12 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, uint32_t fieldno = key_def->parts[i].fieldno; uint32_t null_count = 0; for (; i < key_def->part_count - 1; i++) { - if (key_def->parts[i].fieldno + 1 != - key_def->parts[i + 1].fieldno) + if (!key_def_parts_are_sequential + <has_json_paths>(key_def, i)) break; } - uint32_t end_fieldno = key_def->parts[i].fieldno; + const struct key_part *part = &key_def->parts[i]; + uint32_t end_fieldno = part->fieldno; if (fieldno < current_fieldno) { /* Rewind. */ @@ -288,6 +307,21 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, current_fieldno++; } } + const char *field_last, *field_end_last; + if (has_json_paths && part->path != NULL) { + field_last = field; + field_end_last = field_end; + struct json_path_parser parser; + struct json_path_node node; + json_path_parser_create(&parser, part->path, + part->path_len); + /* Skip fieldno. 
*/ + int rc = json_path_next(&parser, &node); + assert(rc == 0); + rc = tuple_field_dig_with_parser(&parser, &field); + field_end = field; + mp_next(&field_end); + } memcpy(key_buf, field, field_end - field); key_buf += field_end - field; if (has_optional_parts && null_count != 0) { @@ -296,12 +330,27 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, } else { assert(key_buf - key <= data_end - data); } + if (has_json_paths && part->path != NULL) { + field = field_last; + field_end = field_end_last; + } } if (key_size != NULL) *key_size = (uint32_t)(key_buf - key); return key; } +static const tuple_extract_key_t extract_key_slowpath_funcs[] = { + tuple_extract_key_slowpath<false, false, false>, + tuple_extract_key_slowpath<true, false, false>, + tuple_extract_key_slowpath<false, true, false>, + tuple_extract_key_slowpath<true, true, false>, + tuple_extract_key_slowpath<false, false, true>, + tuple_extract_key_slowpath<true, false, true>, + tuple_extract_key_slowpath<false, true, true>, + tuple_extract_key_slowpath<true, true, true> +}; + /** * Initialize tuple_extract_key() and tuple_extract_key_raw() */ @@ -322,35 +371,30 @@ tuple_extract_key_set(struct key_def *key_def) tuple_extract_key_sequential_raw<false>; } } else { - if (key_def->has_optional_parts) { - assert(key_def->is_nullable); - if (key_def_contains_sequential_parts(key_def)) { - key_def->tuple_extract_key = - tuple_extract_key_slowpath<true, true, - false>; - } else { - key_def->tuple_extract_key = - tuple_extract_key_slowpath<false, true, - false>; - } - } else { - if (key_def_contains_sequential_parts(key_def)) { - key_def->tuple_extract_key = - tuple_extract_key_slowpath<true, false, - false>; - } else { - key_def->tuple_extract_key = - tuple_extract_key_slowpath<false, false, - false>; - } - } + int func_idx = + (key_def_contains_sequential_parts(key_def) ? 1 : 0) + + 2 * (key_def->has_optional_parts ? 1 : 0) + + 4 * (key_def->has_json_paths ? 
1 : 0); + key_def->tuple_extract_key = + extract_key_slowpath_funcs[func_idx]; + assert(!key_def->has_optional_parts || key_def->is_nullable); } if (key_def->has_optional_parts) { assert(key_def->is_nullable); - key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<true, false>; + if (key_def->has_json_paths) { + key_def->tuple_extract_key_raw = + tuple_extract_key_slowpath_raw<true, true>; + } else { + key_def->tuple_extract_key_raw = + tuple_extract_key_slowpath_raw<true, false>; + } } else { - key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<false, false>; + if (key_def->has_json_paths) { + key_def->tuple_extract_key_raw = + tuple_extract_key_slowpath_raw<false, true>; + } else { + key_def->tuple_extract_key_raw = + tuple_extract_key_slowpath_raw<false, false>; + } } } diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 6ae96e2..00170c9 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -30,6 +30,7 @@ */ #include "json/path.h" #include "tuple_format.h" +#include "assoc.h" /** Global table of tuple formats */ struct tuple_format **tuple_formats; @@ -38,10 +39,551 @@ static intptr_t recycled_format_ids = FORMAT_ID_NIL; static uint32_t formats_size = 0, formats_capacity = 0; static const struct tuple_field tuple_field_default = { - FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false, false, + FIELD_TYPE_ANY, TUPLE_OFFSET_SLOT_NIL, false, false, {{NULL, 0}} }; /** + * Propagate @a field to MessagePack(field)[key]. + * @param[in][out] field Field to propagate. + * @param key Key to propagate to. + * @param len Length of @a key. + * @param field_idx Field index in map. + * + * @retval 0 Success, the index was found. + * @retval -1 Not found. 
+ */ +static inline int +tuple_field_go_to_key(const char **field, const char *key, int len, + uint32_t *field_idx) +{ + enum mp_type type = mp_typeof(**field); + if (type != MP_MAP) + return -1; + uint32_t count = mp_decode_map(field); + for (uint32_t idx = 0; idx < count; idx++) { + type = mp_typeof(**field); + if (type == MP_STR) { + uint32_t value_len; + const char *value = mp_decode_str(field, &value_len); + if (value_len == (uint)len && + memcmp(value, key, len) == 0) { + *field_idx = idx; + return 0; + } + } else { + /* Skip key. */ + mp_next(field); + } + /* Skip value. */ + mp_next(field); + } + return -1; +} + +struct mh_strnptr_node_t * +json_path_hash_get(struct mh_strnptr_t *hashtable, const char *path, + uint32_t path_len, uint32_t path_hash) +{ + assert(hashtable != NULL); + struct mh_strnptr_key_t key = {path, path_len, path_hash}; + mh_int_t rc = mh_strnptr_find(hashtable, &key, NULL); + if (rc == mh_end(hashtable)) + return NULL; + return mh_strnptr_node(hashtable, rc); +} + +/** + * Create a new hashtable object. + * @param records Count of records to reserve. + * @retval NULL On memory allocation error. + * @retval not NULL Created hashtable. + */ +static struct mh_strnptr_t * +json_path_hash_create(uint32_t records) +{ + struct mh_strnptr_t *ret = mh_strnptr_new(); + if (ret == NULL) { + diag_set(OutOfMemory, sizeof(struct mh_strnptr_t), + "mh_strnptr_new", "hashtable"); + return NULL; + } + if (records > 0 && + mh_strnptr_reserve(ret, records, NULL) != 0) { + mh_strnptr_delete(ret); + diag_set(OutOfMemory, records, "mh_strnptr_reserve", + "hashtable"); + return NULL; + } + return ret; +} + +/** + * Delete @hashtable object. + * @param hashtable Pointer to object to delete.
+ */ +static void +json_path_hash_delete(struct mh_strnptr_t *hashtable) +{ + assert(hashtable != NULL); + while (mh_size(hashtable) != 0) { + mh_int_t n = mh_first(hashtable); + mh_strnptr_del(hashtable, n, NULL); + } + mh_strnptr_delete(hashtable); +} + +/** + * Insert a new record into the hashtable. + * @param hashtable Storage to insert new record. + * @param path String with path. + * @param path_len Length of @path. + * @param field Value to store in @hashtable. + * @retval -1 On error. + * @retval 0 On success. + */ +static int +json_path_hash_insert(struct mh_strnptr_t *hashtable, const char *path, + uint32_t path_len, struct tuple_field *field) +{ + assert(hashtable != NULL); + uint32_t path_hash = mh_strn_hash(path, path_len); + struct mh_strnptr_node_t name_node = {path, path_len, path_hash, field}; + mh_int_t rc = mh_strnptr_put(hashtable, &name_node, NULL, NULL); + if (rc == mh_end(hashtable)) { + diag_set(OutOfMemory, sizeof(*hashtable), "mh_strnptr_put", + "hashtable"); + return -1; + } + return 0; +} + +/** + * Construct field tree level for JSON path part. + * + * @param[in, out] field_subtree Pointer to record to start with; + would be changed to record that matches + @part lexeme. + * @param fieldno Number of root space field. + * @param part JSON path lexeme to represent in field tree. + * @retval -1 On error. + * @retval 0 On success. + */ +static int +json_field_tree_append(struct tuple_field **field_subtree, uint32_t fieldno, + struct json_path_node *part) +{ + enum field_type type; + struct tuple_field *field = *field_subtree; + switch (part->type) { + case JSON_PATH_NUM: { + type = FIELD_TYPE_ARRAY; + if (field->type != FIELD_TYPE_ANY && field->type != type) + goto error_type_mistmatch; + /* Create or resize field array if required.
*/ + if (field->array == NULL || part->num > field->array_size) { + struct tuple_field **array = + realloc(field->array, + part->num * sizeof(array[0])); + if (array == NULL) { + diag_set(OutOfMemory, + part->num * sizeof(array[0]), + "realloc","array"); + return -1; + } + memset(&array[field->array_size], 0, + (part->num - field->array_size) * + sizeof(array[0])); + field->array = array; + field->array_size = part->num; + field->type = type; + } else if (field->array[part->num - TUPLE_INDEX_BASE] != NULL) { + /* Record already exists. No actions required */ + *field_subtree = + field->array[part->num - TUPLE_INDEX_BASE]; + return 0; + } + break; + } + case JSON_PATH_STR: { + type = FIELD_TYPE_MAP; + if (field->type != FIELD_TYPE_ANY && field->type != type) + goto error_type_mistmatch; + if (field->map == NULL) { + field->map = json_path_hash_create(1); + if (field->map == NULL) + return -1; + field->type = type; + } else { + uint32_t str_hash = mh_strn_hash(part->str, part->len); + struct mh_strnptr_node_t *ht_record = + json_path_hash_get(field->map, part->str, + part->len, str_hash); + if (ht_record != NULL) { + assert(ht_record->val != NULL); + *field_subtree = ht_record->val; + return 0; + } + } + break; + } + default: + unreachable(); + } + + /* Construct and insert a new record. 
*/ + struct tuple_field *new_field = malloc(sizeof(struct tuple_field)); + if (new_field == NULL) { + diag_set(OutOfMemory, sizeof(struct tuple_field), "malloc", + "new_field"); + return -1; + } + *new_field = tuple_field_default; + if (field->type == FIELD_TYPE_MAP) { + if (json_path_hash_insert(field->map, part->str, part->len, + new_field) != 0) { + free(new_field); + return -1; + } + } else if (field->type == FIELD_TYPE_ARRAY) { + field->array[part->num - TUPLE_INDEX_BASE] = new_field; + } + *field_subtree = new_field; + return 0; + +error_type_mistmatch: + diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH, + tt_sprintf("%d", fieldno + TUPLE_INDEX_BASE), + field_type_strs[type], field_type_strs[field->type]); + return -1; +} + +/** + * Delete @field_subtree object. + * @param field_subtree To delete. + */ +static void +json_field_tree_delete(struct tuple_field *field_subtree) +{ + if (field_subtree->type == FIELD_TYPE_MAP && + field_subtree->map != NULL) { + mh_int_t i; + mh_foreach(field_subtree->map, i) { + struct tuple_field *field = + mh_strnptr_node(field_subtree->map, i)->val; + assert(field != NULL); + json_field_tree_delete(field); + free(field); + } + json_path_hash_delete(field_subtree->map); + } else if (field_subtree->type == FIELD_TYPE_ARRAY && + field_subtree->array != NULL) { + for (uint32_t i = 0; i < field_subtree->array_size; i++) { + struct tuple_field *field = field_subtree->array[i]; + if (field == NULL) + continue; + json_field_tree_delete(field_subtree->array[i]); + free(field_subtree->array[i]); + } + free(field_subtree->array); + } +} + +int +tuple_field_bypass_and_init(const struct tuple_field *field, uint32_t idx, + const char *tuple, const char **offset, + uint32_t *field_map) +{ + assert(offset != NULL); + const char *mp_data = *offset; + const char *valid_type_str = NULL; + const char *err = NULL; + enum mp_type type = mp_typeof(**offset); + if (field->type == FIELD_TYPE_MAP) { + if (type != MP_MAP) { + valid_type_str = 
mp_type_strs[MP_MAP]; + goto error_type_mistmatch; + } + const char *max_offset = *offset; + uint32_t max_idx = 0; + uint32_t count = mp_decode_map(&max_offset); + mh_int_t i; + mh_foreach(field->map, i) { + struct mh_strnptr_node_t *ht_record = + mh_strnptr_node(field->map, i); + struct tuple_field *leaf = ht_record->val; + assert(leaf != NULL); + + const char *raw = *offset; + uint32_t map_idx = 0; + int rc = tuple_field_go_to_key(&raw, ht_record->str, + (int)ht_record->len, + &map_idx); + if (rc != 0 && !leaf->is_nullable) { + err = tt_sprintf("map doesn't contain key " + "'%.*s' defined in index", + ht_record->len,ht_record->str); + goto error_invalid_document; + } + if (rc != 0) { + if (field_map != NULL && + leaf->offset_slot != TUPLE_OFFSET_SLOT_NIL) + field_map[leaf->offset_slot] = 0; + continue; + } + if (tuple_field_bypass_and_init(leaf, idx, tuple, &raw, + field_map) != 0) + return -1; + max_idx = MAX(max_idx, map_idx + 1); + max_offset = MAX(max_offset, raw); + } + *offset = max_offset; + while (count-- > max_idx) { + mp_next(offset); + mp_next(offset); + } + return 0; + } else if (field->type == FIELD_TYPE_ARRAY) { + if (type != MP_ARRAY) { + valid_type_str = mp_type_strs[MP_ARRAY]; + goto error_type_mistmatch; + } + uint32_t count = mp_decode_array(offset); + for (uint32_t i = count; i < field->array_size; i++) { + /* + * Index fields out of document array + * must be nullable. 
+ */ + struct tuple_field *leaf = field->array[i]; + if (leaf == NULL) + continue; + if (leaf->is_nullable) { + if (field_map != NULL && + leaf->offset_slot != TUPLE_OFFSET_SLOT_NIL) + field_map[leaf->offset_slot] = 0; + continue; + } + err = tt_sprintf("array size %d is less than size of %d " + "defined in index", i, i + 1); + goto error_invalid_document; + } + uint32_t fields = MIN(field->array_size, count); + for (uint32_t i = 0; i < fields; i++) { + if (field->array[i] == NULL) { + mp_next(offset); + continue; + } + if (tuple_field_bypass_and_init(field->array[i], idx, + tuple, + offset, field_map) != 0) + return -1; + } + while (count-- > fields) + mp_next(offset); + return 0; + } + /* Tree leaf field */ + if (key_mp_type_validate(field->type, type, ER_KEY_PART_TYPE, idx, + field->is_nullable) != 0) { + valid_type_str = field_type_strs[field->type]; + goto error_type_mistmatch; + } + assert(offset != NULL); + if (field_map != NULL && + field->offset_slot != TUPLE_OFFSET_SLOT_NIL) + field_map[field->offset_slot] = (uint32_t) (*offset - tuple); + mp_next(offset); + return 0; + +error_type_mistmatch: + err = tt_sprintf("type mismatch: have %s, expected %s", + mp_type_strs[type], valid_type_str); +error_invalid_document: + assert(err != NULL); + char *data_buff = tt_static_buf(); + mp_snprint(data_buff, TT_STATIC_BUF_LEN, mp_data); + const char *err_msg = + tt_sprintf("invalid field %d document content '%s': %s", + idx + TUPLE_INDEX_BASE, data_buff, err); + diag_set(ClientError, ER_DATA_STRUCTURE_MISMATCH, err_msg); + return -1; +} + +/** + * Add new JSON @path to @format. + * @param format Tuple format to modify. + * @param path String to add. + * @param path_len Length of @path. + * @param path_hash Hash of @path. + * @param type Type of field by @path. + * @param is_nullable Nullability of field by @path. + * @param strings Area to store unique JSON paths (optional). + * @param[out] leaf Pointer to leaf field. + * @retval -1 On error. + * @retval 0 On success. 
+ */ +static int +tuple_format_add_json_path(struct tuple_format *format, const char *path, + uint32_t path_len, uint32_t path_hash, + enum field_type type, bool is_nullable, + char **strings, struct tuple_field **leaf) +{ + assert(format->path_hash != NULL); + /* + * Get root field by index. + * Path is specified in canonical form: [i]... + */ + struct json_path_parser parser; + struct json_path_node node; + json_path_parser_create(&parser, path, path_len); + int rc = json_path_next(&parser, &node); + assert(rc == 0 && node.type == JSON_PATH_NUM); + assert(node.num - TUPLE_INDEX_BASE < format->field_count); + + /* Test if path is already registered. */ + struct mh_strnptr_node_t *ht_record = + json_path_hash_get(format->path_hash, path, path_len, path_hash); + assert(ht_record != NULL); + struct tuple_field *field = ht_record->val; + if (unlikely(field != NULL)) { + /* Path has been already registered. */ + if (field->is_nullable != is_nullable) + field->is_nullable = false; + if (field_type1_contains_type2(field->type, type)) { + field->type = type; + } else if (!field_type1_contains_type2(type, field->type)) { + const char *err = + tt_sprintf("JSON path '%.*s' has been already " + "constructed for '%s' leaf record", + path_len, path, + field_type_strs[field->type]); + diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, + node.num, err); + return -1; + } + *leaf = field; + return 0; + } else if (strings != NULL) { + /* + * Hashtable should hold memory related to format + * chunk allocation. + */ + memcpy(*strings, path, path_len); + (*strings)[path_len] = '\0'; + ht_record->str = *strings; + *strings += path_len + 1; + } + + /* + * We have to re-init parser with path string located in + * format chunk. + */ + json_path_parser_create(&parser, ht_record->str + parser.offset, + path_len - parser.offset); + /* Build data path tree. 
*/ + uint32_t root_fieldno = node.num - TUPLE_INDEX_BASE; + field = &format->fields[root_fieldno]; + while ((rc = json_path_next(&parser, &node)) == 0 && + node.type != JSON_PATH_END) { + if (json_field_tree_append(&field, root_fieldno, &node) != 0) + return -1; + } + assert(rc == 0 && node.type == JSON_PATH_END); + + /* The leaf record is a new object since the JSON path is unique. */ + field->type = type; + field->is_nullable = is_nullable; + *leaf = field; + ht_record->val = field; + return 0; +} + +/** + * Add a new key_part to format and initialize format tuple_field + * representation. + * @param format Format to initialize. + * @param fields Fields definition if any. + * @param field_count Count of @fields. + * @param part An index part to append. + * @param is_sequential True if this part is sequential. + * @param data Memory to store path strings. + * @param current_slot Pointer to last offset slot. + * @retval -1 On error. + * @retval 0 On success. + */ +static int +tuple_format_add_key_part(struct tuple_format *format, + const struct field_def *fields, uint32_t field_count, + const struct key_part *part, bool is_sequential, + char **data, int *current_slot) +{ + assert(part->fieldno < format->field_count); + struct tuple_field *field = &format->fields[part->fieldno]; + if (part->path != NULL) { + field->is_key_part = true; + assert(!is_sequential); + struct tuple_field *leaf = NULL; + if (tuple_format_add_json_path(format, part->path, + part->path_len, part->path_hash, + part->type, part->is_nullable, + data, &leaf) != 0) + return -1; + assert(leaf != NULL); + if (leaf->offset_slot == TUPLE_OFFSET_SLOT_NIL) { + *current_slot = *current_slot - 1; + leaf->offset_slot = *current_slot; + } + return 0; + } + if (part->fieldno >= field_count) { + field->is_nullable = part->is_nullable; + } else if (field->is_nullable != part->is_nullable) { + /* + * In case of mismatch set the most + * strict option for is_nullable.
+ */ + field->is_nullable = false; + } + /* + * Check that there are no conflicts + * between index part types and space + * fields. If a part type is compatible + * with field's one, then the part type is + * more strict and the part type must be + * used in tuple_format. + */ + if (field_type1_contains_type2(field->type, part->type)) { + field->type = part->type; + } else if (!field_type1_contains_type2(part->type, field->type)) { + int fieldno = part->fieldno + TUPLE_INDEX_BASE; + const char *name = part->fieldno >= field_count ? + tt_sprintf("%d", fieldno) : + tt_sprintf("'%s'", + fields[part->fieldno].name); + int errcode = !field->is_key_part ? + ER_FORMAT_MISMATCH_INDEX_PART : + ER_INDEX_PART_TYPE_MISMATCH; + diag_set(ClientError, errcode, name, + field_type_strs[field->type], + field_type_strs[part->type]); + return -1; + } + field->is_key_part = true; + /* + * In the tuple, store only offsets necessary + * to access fields of non-sequential keys. + * First field is always simply accessible, + * so we don't store an offset for it. + */ + if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL && !is_sequential && + part->fieldno > 0) { + *current_slot = *current_slot - 1; + field->offset_slot = *current_slot; + } + return 0; +} + +/** * Extract all available type info from keys and field * definitions. */ @@ -63,12 +605,18 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, format->fields[i].type = fields[i].type; format->fields[i].offset_slot = TUPLE_OFFSET_SLOT_NIL; format->fields[i].is_nullable = fields[i].is_nullable; + /* Don't need to init format->fields[i].map. */ + format->fields[i].childs = NULL; + format->fields[i].array_size = 0; } /* Initialize remaining fields */ for (uint32_t i = field_count; i < format->field_count; i++) format->fields[i] = tuple_field_default; int current_slot = 0; + /* Memory allocated for JSON paths if any. 
*/ + char *data = (char *)format + sizeof(struct tuple_format) + + format->field_count * sizeof(struct tuple_field); /* extract field type info */ for (uint16_t key_no = 0; key_no < key_count; ++key_no) { @@ -76,65 +624,12 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, bool is_sequential = key_def_is_sequential(key_def); const struct key_part *part = key_def->parts; const struct key_part *parts_end = part + key_def->part_count; - for (; part < parts_end; part++) { - assert(part->fieldno < format->field_count); - struct tuple_field *field = - &format->fields[part->fieldno]; - if (part->fieldno >= field_count) { - field->is_nullable = part->is_nullable; - } else if (field->is_nullable != part->is_nullable) { - /* - * In case of mismatch set the most - * strict option for is_nullable. - */ - field->is_nullable = false; - } - - /* - * Check that there are no conflicts - * between index part types and space - * fields. If a part type is compatible - * with field's one, then the part type is - * more strict and the part type must be - * used in tuple_format. - */ - if (field_type1_contains_type2(field->type, - part->type)) { - field->type = part->type; - } else if (! field_type1_contains_type2(part->type, - field->type)) { - const char *name; - int fieldno = part->fieldno + TUPLE_INDEX_BASE; - if (part->fieldno >= field_count) { - name = tt_sprintf("%d", fieldno); - } else { - const struct field_def *def = - &fields[part->fieldno]; - name = tt_sprintf("'%s'", def->name); - } - int errcode; - if (! 
field->is_key_part) - errcode = ER_FORMAT_MISMATCH_INDEX_PART; - else - errcode = ER_INDEX_PART_TYPE_MISMATCH; - diag_set(ClientError, errcode, name, - field_type_strs[field->type], - field_type_strs[part->type]); + if (tuple_format_add_key_part(format, fields, + field_count, part, + is_sequential, &data, + &current_slot) != 0) return -1; - } - field->is_key_part = true; - /* - * In the tuple, store only offsets necessary - * to access fields of non-sequential keys. - * First field is always simply accessible, - * so we don't store an offset for it. - */ - if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL && - is_sequential == false && part->fieldno > 0) { - - field->offset_slot = --current_slot; - } } } @@ -201,32 +696,58 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, uint32_t space_field_count, struct tuple_dictionary *dict) { uint32_t index_field_count = 0; + /* JSON path hashtable. */ + struct mh_strnptr_t *path_hash = json_path_hash_create(0); + if (path_hash == NULL) + return NULL; /* find max max field no */ for (uint16_t key_no = 0; key_no < key_count; ++key_no) { const struct key_def *key_def = keys[key_no]; const struct key_part *part = key_def->parts; const struct key_part *pend = part + key_def->part_count; for (; part < pend; part++) { + if (part->path != NULL && + json_path_hash_insert(path_hash, part->path, + part->path_len, NULL) != 0) + goto error; index_field_count = MAX(index_field_count, part->fieldno + 1); } } + size_t extra_size = 0; + if (mh_size(path_hash) == 0) { + /* Hashtable is useless. */ + json_path_hash_delete(path_hash); + path_hash = NULL; + } else { + /* + * Calculate unique JSON paths count. + * Path data will be copied later, in + * the tuple_format_create routine.
+ */ + mh_int_t i; + mh_foreach(path_hash, i) { + struct mh_strnptr_node_t *node = + mh_strnptr_node(path_hash, i); + extra_size += node->len + 1; + } + } uint32_t field_count = MAX(space_field_count, index_field_count); uint32_t total = sizeof(struct tuple_format) + - field_count * sizeof(struct tuple_field); + field_count * sizeof(struct tuple_field) + extra_size; struct tuple_format *format = (struct tuple_format *) malloc(total); if (format == NULL) { diag_set(OutOfMemory, sizeof(struct tuple_format), "malloc", "tuple format"); - return NULL; + goto error; } if (dict == NULL) { assert(space_field_count == 0); format->dict = tuple_dictionary_new(NULL, 0); if (format->dict == NULL) { free(format); - return NULL; + goto error; } } else { format->dict = dict; @@ -243,13 +764,21 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, format->index_field_count = index_field_count; format->exact_field_count = 0; format->min_field_count = 0; + format->path_hash = path_hash; return format; +error: + json_path_hash_delete(path_hash); + return NULL; } /** Free tuple format resources, doesn't unregister. */ static inline void tuple_format_destroy(struct tuple_format *format) { + for (uint32_t i = 0; i < format->field_count; i++) + json_field_tree_delete(&format->fields[i]); + if (format->path_hash != NULL) + json_path_hash_delete(format->path_hash); tuple_dictionary_unref(format->dict); } @@ -334,21 +863,61 @@ tuple_format_dup(struct tuple_format *src) { uint32_t total = sizeof(struct tuple_format) + src->field_count * sizeof(struct tuple_field); + if (src->path_hash != NULL) { + mh_int_t i; + mh_foreach(src->path_hash, i) + total += mh_strnptr_node(src->path_hash, i)->len + 1; + } struct tuple_format *format = (struct tuple_format *) malloc(total); if (format == NULL) { diag_set(OutOfMemory, total, "malloc", "tuple format"); return NULL; } memcpy(format, src, total); + + /* Fill with NULLs for normal destruction on error. 
*/ + format->path_hash = NULL; + for (uint32_t i = 0; i < format->field_count; i++) { + format->fields[i].childs = NULL; + format->fields[i].array_size = 0; + } + if (src->path_hash != NULL) { + mh_int_t i; + format->path_hash = + json_path_hash_create(mh_size(src->path_hash)); + if (format->path_hash == NULL) + goto error; + mh_foreach(src->path_hash, i) { + struct mh_strnptr_node_t *node = + mh_strnptr_node(src->path_hash, i); + /* Path data has been already copied. */ + char *path = (char *)format + (node->str - (char *)src); + if (json_path_hash_insert(format->path_hash, path, + node->len, NULL) != 0) + goto error; + /* Store source leaf field offset_slot. */ + struct tuple_field *leaf = node->val; + int32_t offset_slot = leaf->offset_slot; + uint32_t path_hash = mh_strn_hash(path, node->len); + if (tuple_format_add_json_path(format, path, node->len, + path_hash, leaf->type, + leaf->is_nullable, NULL, + &leaf) != 0) + goto error; + /* Store offset_slot in a new leaf record. */ + assert(leaf != NULL); + leaf->offset_slot = offset_slot; + } + } tuple_dictionary_ref(format->dict); format->id = FORMAT_ID_NIL; format->refs = 0; - if (tuple_format_register(format) != 0) { - tuple_format_destroy(format); - free(format); - return NULL; - } - return format; + if (tuple_format_register(format) == 0) + return format; +error: + tuple_format_destroy(format); + free(format); + return NULL; } /** @sa declaration for details. 
*/ @@ -377,18 +946,10 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, return -1; } - /* first field is simply accessible, so we do not store offset to it */ - enum mp_type mp_type = mp_typeof(*pos); + uint32_t i = 0; + enum mp_type mp_type; const struct tuple_field *field = &format->fields[0]; - if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE, - TUPLE_INDEX_BASE, field->is_nullable)) - return -1; - mp_next(&pos); - /* other fields...*/ - ++field; - uint32_t i = 1; - uint32_t defined_field_count = MIN(field_count, format->field_count); - if (field_count < format->index_field_count) { + if (field_count < format->index_field_count || field->childs != NULL) { /* * Nullify field map to be able to detect by 0, * which key fields are absent in tuple_field(). @@ -396,6 +957,20 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, memset((char *)field_map - format->field_map_size, 0, format->field_map_size); } + if (field->childs == NULL) { + /* + * First field is simply accessible, do not store + * offset to it. 
+ */ + mp_type = mp_typeof(*pos); + if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE, + TUPLE_INDEX_BASE, field->is_nullable)) + return -1; + mp_next(&pos); + ++field; + ++i; + } + uint32_t defined_field_count = MIN(field_count, format->field_count); for (; i < defined_field_count; ++i, ++field) { mp_type = mp_typeof(*pos); if (key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE, @@ -405,8 +980,12 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { field_map[field->offset_slot] = (uint32_t) (pos - tuple); - } - mp_next(&pos); + } else if (field->childs != NULL && + tuple_field_bypass_and_init(field, i, tuple, &pos, + field_map) != 0) + return -1; + if (field->childs == NULL) + mp_next(&pos); } return 0; } @@ -512,55 +1091,106 @@ tuple_field_go_to_index(const char **field, uint64_t index) return -1; } -/** - * Propagate @a field to MessagePack(field)[key]. - * @param[in][out] field Field to propagate. - * @param key Key to propagate to. - * @param len Length of @a key. - * - * @retval 0 Success, the index was found. - * @retval -1 Not found. - */ -static inline int -tuple_field_go_to_key(const char **field, const char *key, int len) +const char * +tuple_field_by_part_raw(const struct tuple_format *format, const char *data, + const uint32_t *field_map, struct key_part *part) { - enum mp_type type = mp_typeof(**field); - if (type != MP_MAP) - return -1; - uint64_t count = mp_decode_map(field); - for (; count > 0; --count) { - type = mp_typeof(**field); - if (type == MP_STR) { - uint32_t value_len; - const char *value = mp_decode_str(field, &value_len); - if (value_len == (uint)len && - memcmp(value, key, len) == 0) - return 0; - } else { - /* Skip key. 
*/ - mp_next(field); + if (likely(part->path == NULL)) + return tuple_field_raw(format, data, field_map, part->fieldno); + + struct mh_strnptr_node_t *ht_record = NULL; + int32_t offset_slot; + if (likely(part->offset_slot_epoch == format->epoch)) { + offset_slot = part->offset_slot; + } else if (format->path_hash != NULL && + (ht_record = json_path_hash_get(format->path_hash, part->path, + part->path_len, + part->path_hash)) != NULL) { + struct tuple_field *field = ht_record->val; + assert(field != NULL); + offset_slot = field->offset_slot; + /* Cache offset_slot if required. */ + if (part->offset_slot_epoch < format->epoch) { + part->offset_slot = offset_slot; + part->offset_slot_epoch = format->epoch; } - /* Skip value. */ - mp_next(field); + } else { + /* + * Legacy tuple having no field map for + * JSON index. + */ + uint32_t path_hash = + field_name_hash(part->path, part->path_len); + const char *raw = NULL; + if (tuple_field_raw_by_path(format, data, field_map, + part->path, part->path_len, + path_hash, &raw) != 0) + raw = NULL; + return raw; } - return -1; + assert(offset_slot < 0); + assert(-offset_slot * sizeof(uint32_t) <= format->field_map_size); + if (unlikely(field_map[offset_slot] == 0)) + return NULL; + return data + field_map[offset_slot]; } -const char * -tuple_field_by_part_raw(const struct tuple_format *format, const char *data, - const uint32_t *field_map, struct key_part *part) +int +tuple_field_dig_with_parser(struct json_path_parser *parser, const char **field) { - return tuple_field_raw(format, data, field_map, part->fieldno); + int rc; + struct json_path_node node; + while ((rc = json_path_next(parser, &node)) == 0) { + uint32_t dummy; + switch(node.type) { + case JSON_PATH_NUM: + rc = tuple_field_go_to_index(field, node.num); + break; + case JSON_PATH_STR: + rc = tuple_field_go_to_key(field, node.str, + node.len, &dummy); + break; + default: + assert(node.type == JSON_PATH_END); + return 0; + } + if (rc != 0) { + *field = NULL; + return 
0; + } + } + return rc; } int -tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, +tuple_field_raw_by_path(const struct tuple_format *format, const char *tuple, const uint32_t *field_map, const char *path, uint32_t path_len, uint32_t path_hash, const char **field) { assert(path_len > 0); uint32_t fieldno; + if (format->path_hash != NULL) { + /* + * The path hash used by the format->path_hash + * hashtable may be different from the path_hash + * specified as a function argument. + */ + struct mh_strnptr_node_t *ht_record = + json_path_hash_get(format->path_hash, path, path_len, + mh_strn_hash(path, path_len)); + if (ht_record != NULL) { + struct tuple_field *leaf = ht_record->val; + assert(leaf != NULL); + int32_t offset_slot = leaf->offset_slot; + assert(offset_slot != TUPLE_OFFSET_SLOT_NIL); + if (likely(field_map[offset_slot] != 0)) + *field = tuple + field_map[offset_slot]; + else + *field = NULL; + return 0; + } + } /* * It is possible, that a field has a name as * well-formatted JSON.
For example 'a.b.c.d' or '[1]' can @@ -616,23 +1246,9 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, *field = NULL; return 0; } - while ((rc = json_path_next(&parser, &node)) == 0) { - switch(node.type) { - case JSON_PATH_NUM: - rc = tuple_field_go_to_index(field, node.num); - break; - case JSON_PATH_STR: - rc = tuple_field_go_to_key(field, node.str, node.len); - break; - default: - assert(node.type == JSON_PATH_END); - return 0; - } - if (rc != 0) { - *field = NULL; - return 0; - } - } + rc = tuple_field_dig_with_parser(&parser, field); + if (rc == 0) + return 0; error: assert(rc > 0); diag_set(ClientError, ER_ILLEGAL_PARAMS, diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index 9406d5b..afdb2aa 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -63,6 +63,8 @@ enum { TUPLE_OFFSET_SLOT_NIL = INT32_MAX }; struct tuple; struct tuple_format; +struct json_path_parser; +struct mh_strnptr_t; /** Engine-specific tuple format methods. */ struct tuple_format_vtab { @@ -108,6 +110,21 @@ struct tuple_field { bool is_key_part; /** True, if a field can store NULL. */ bool is_nullable; + /** Tree child records. Must be at the end of the struct. */ + union { + /** Array of fields. */ + struct { + struct tuple_field **array; + uint32_t array_size; + }; + /** Hashtable: path -> tuple_field. */ + struct mh_strnptr_t *map; + /** + * Auxiliary pointer to test if field has + * JSON path subtree. + */ + void *childs; + }; }; /** @@ -167,6 +184,8 @@ struct tuple_format { * Shared names storage used by all formats of a space. */ struct tuple_dictionary *dict; + /** JSON path hash table. */ + struct mh_strnptr_t *path_hash; /* Formats of the fields */ struct tuple_field fields[0]; }; @@ -394,7 +413,7 @@ tuple_field_raw(const struct tuple_format *format, const char *tuple, * @retval NULL No field with @a name.
*/ static inline const char * -tuple_field_raw_by_name(struct tuple_format *format, const char *tuple, +tuple_field_raw_by_name(const struct tuple_format *format, const char *tuple, const uint32_t *field_map, const char *name, uint32_t name_len, uint32_t name_hash) { @@ -419,11 +438,51 @@ tuple_field_raw_by_name(struct tuple_format *format, const char *tuple, * @retval -1 Error in JSON path. */ int -tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, +tuple_field_raw_by_path(const struct tuple_format *format, const char *tuple, const uint32_t *field_map, const char *path, uint32_t path_len, uint32_t path_hash, const char **field); + +/** + * Retrieve document data @field with an initialized @parser. + * @param parser JSON parser. + * @param[in, out] field Tuple field to lookup. + * @retval 0 On success. + * @retval > 0 On error in the path used to initialize @parser. + */ +int +tuple_field_dig_with_parser(struct json_path_parser *parser, + const char **field); + +/** + * Get @hashtable record by key @path, @path_len. + * @param hashtable Storage to lookup. + * @param path Path string. + * @param path_len Length of @path. + * @param path_hash Hash of @path. + * @retval NULL On nothing found. + * @retval not NULL Leaf field pointer for registered path. + */ +struct mh_strnptr_node_t * +json_path_hash_get(struct mh_strnptr_t *hashtable, const char *path, + uint32_t path_len, uint32_t path_hash); + +/** + * Walk the JSON path tree in @field, comparing it with the @tuple + * structure. Initialize the field map if specified. + * @param field Field to use on initialization. + * @param idx Root field index to emit a correct error. + * @param tuple Source raw data. + * @param offset Document field offset to process. + * @param field_map Field map to initialize (optional). + * @retval 0 On success. + * @retval -1 On error.
+ */ +int +tuple_field_bypass_and_init(const struct tuple_field *field, uint32_t idx, + const char *tuple, const char **offset, + uint32_t *field_map); + #if defined(__cplusplus) } /* extern "C" */ #endif /* defined(__cplusplus) */ diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc index 01a0983..8ede290 100644 --- a/src/box/tuple_hash.cc +++ b/src/box/tuple_hash.cc @@ -222,7 +222,7 @@ key_hash_slowpath(const char *key, struct key_def *key_def); void tuple_hash_func_set(struct key_def *key_def) { - if (key_def->is_nullable) + if (key_def->is_nullable || key_def->has_json_paths) goto slowpath; /* * Check that key_def defines sequential a key without holes @@ -256,10 +256,17 @@ tuple_hash_func_set(struct key_def *key_def) { } slowpath: - if (key_def->has_optional_parts) - key_def->tuple_hash = tuple_hash_slowpath<true, false>; - else - key_def->tuple_hash = tuple_hash_slowpath<false, false>; + if (key_def->has_optional_parts) { + if (key_def->has_json_paths) + key_def->tuple_hash = tuple_hash_slowpath<true, true>; + else + key_def->tuple_hash = tuple_hash_slowpath<true, false>; + } else { + if (key_def->has_json_paths) + key_def->tuple_hash = tuple_hash_slowpath<false, true>; + else + key_def->tuple_hash = tuple_hash_slowpath<false, false>; + } key_def->key_hash = key_hash_slowpath; } @@ -323,6 +330,7 @@ template <bool has_optional_parts, bool has_json_paths> uint32_t tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def) { + assert(has_json_paths == key_def->has_json_paths); assert(has_optional_parts == key_def->has_optional_parts); uint32_t h = HASH_SEED; uint32_t carry = 0; diff --git a/src/box/vinyl.c b/src/box/vinyl.c index 86a33ec..2da9607 100644 --- a/src/box/vinyl.c +++ b/src/box/vinyl.c @@ -956,6 +956,11 @@ vinyl_index_def_change_requires_rebuild(struct index *index, return true; if (!field_type1_contains_type2(new_part->type, old_part->type)) return true; + if (old_part->path_len != new_part->path_len) + return true; + if 
(memcmp(old_part->path, new_part->path, + old_part->path_len) != 0) + return true; } return false; } diff --git a/src/box/vy_log.c b/src/box/vy_log.c index fc8ede5..f396705 100644 --- a/src/box/vy_log.c +++ b/src/box/vy_log.c @@ -711,7 +711,8 @@ vy_log_record_dup(struct region *pool, const struct vy_log_record *src) "struct key_part_def"); goto err; } - key_def_dump_parts(src->key_def, dst->key_parts); + if (key_def_dump_parts(pool, src->key_def, dst->key_parts) != 0) + goto err; dst->key_part_count = src->key_def->part_count; dst->key_def = NULL; } diff --git a/src/box/vy_lsm.c b/src/box/vy_lsm.c index 8fa86d3..0abdd15 100644 --- a/src/box/vy_lsm.c +++ b/src/box/vy_lsm.c @@ -36,6 +36,7 @@ #include <sys/stat.h> #include <sys/types.h> #include <small/mempool.h> +#include <assoc.h> #include "diag.h" #include "errcode.h" @@ -158,6 +159,49 @@ vy_lsm_new(struct vy_lsm_env *lsm_env, struct vy_cache_env *cache_env, NULL); if (lsm->disk_format == NULL) goto fail_format; + /* + * Tuple formats should be compatible to make + * epoch-based caching work. 
+ */ + int32_t min_offset_slot = 0; + struct tuple_field *dst_fields = lsm->disk_format->fields; + struct mh_strnptr_t *dst_ht = lsm->disk_format->path_hash; + struct mh_strnptr_t *src_ht = format->path_hash; + struct key_part *part = cmp_def->parts; + struct key_part *part_end = part + cmp_def->part_count; + for (; part < part_end; part++) { + struct tuple_field *dst_field = + &dst_fields[part->fieldno]; + struct tuple_field *src_field; + if (dst_field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { + src_field = &format->fields[part->fieldno]; + } else if (part->path != NULL) { + struct mh_strnptr_node_t *ht_record; + ht_record = + json_path_hash_get(dst_ht, part->path, + part->path_len, + part->path_hash); + assert(ht_record != NULL); + dst_field = ht_record->val; + assert(dst_field != NULL); + ht_record = + json_path_hash_get(src_ht, part->path, + part->path_len, + part->path_hash); + assert(ht_record != NULL); + src_field = ht_record->val; + assert(src_field != NULL); + } else { + continue; + } + if (src_field->offset_slot == TUPLE_OFFSET_SLOT_NIL) + continue; + dst_field->offset_slot = src_field->offset_slot; + min_offset_slot = + MIN(src_field->offset_slot, min_offset_slot); + } + lsm->disk_format->field_map_size = + -min_offset_slot * sizeof(uint32_t); lsm->disk_format->epoch = format->epoch; } tuple_format_ref(lsm->disk_format); diff --git a/src/box/vy_point_lookup.c b/src/box/vy_point_lookup.c index 7b704b8..9d5e220 100644 --- a/src/box/vy_point_lookup.c +++ b/src/box/vy_point_lookup.c @@ -196,8 +196,6 @@ vy_point_lookup(struct vy_lsm *lsm, struct vy_tx *tx, const struct vy_read_view **rv, struct tuple *key, struct tuple **ret) { - assert(tuple_field_count(key) >= lsm->cmp_def->part_count); - *ret = NULL; double start_time = ev_monotonic_now(loop()); int rc = 0; diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c index 8018dee..8259a91 100644 --- a/src/box/vy_stmt.c +++ b/src/box/vy_stmt.c @@ -44,6 +44,7 @@ #include "tuple_format.h" #include "xrow.h" #include 
"fiber.h" +#include "assoc.h" /** * Statement metadata keys. @@ -330,6 +331,71 @@ vy_stmt_replace_from_upsert(const struct tuple *upsert) return replace; } +static void +vy_stmt_msgpack_build(struct tuple_field *field, char *tuple, + uint32_t *field_map, char **offset, bool write_data, + struct mh_i64ptr_t *fields_iov_ht) +{ + if (field->type == FIELD_TYPE_ARRAY) { + if (write_data) + *offset = mp_encode_array(*offset, field->array_size); + else + *offset += mp_sizeof_array(field->array_size); + for (uint32_t i = 0; i < field->array_size; i++) { + if (field->array[i] == NULL) { + if (write_data) + *offset = mp_encode_nil(*offset); + else + *offset += mp_sizeof_nil(); + continue; + } + vy_stmt_msgpack_build(field->array[i], tuple, field_map, + offset, write_data, + fields_iov_ht); + } + return; + } else if (field->type == FIELD_TYPE_MAP) { + if (write_data) + *offset = mp_encode_map(*offset, mh_size(field->map)); + else + *offset += mp_sizeof_map(mh_size(field->map)); + mh_int_t i; + mh_foreach(field->map, i) { + struct mh_strnptr_node_t *node = + mh_strnptr_node(field->map, i); + assert(node); + if (write_data) { + *offset = mp_encode_str(*offset, node->str, + node->len); + } else { + *offset += mp_sizeof_str(node->len); + } + vy_stmt_msgpack_build(node->val, tuple, field_map, + offset, write_data, + fields_iov_ht); + } + return; + } + + mh_int_t k = mh_i64ptr_find(fields_iov_ht, (uint64_t)field, NULL); + struct iovec *iov = k != mh_end(fields_iov_ht) ? 
+ mh_i64ptr_node(fields_iov_ht, k)->val : NULL; + if (iov == NULL) { + if (write_data) + *offset = mp_encode_nil(*offset); + else + *offset += mp_sizeof_nil(); + } else { + if (write_data) { + uint32_t data_offset = *offset - tuple; + memcpy(*offset, iov->iov_base, iov->iov_len); + if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) + field_map[field->offset_slot] = data_offset; + } + *offset += iov->iov_len; + } +} + static struct tuple * vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type, const struct key_def *cmp_def, @@ -338,51 +404,80 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type, /* UPSERT can't be surrogate. */ assert(type != IPROTO_UPSERT); struct region *region = &fiber()->gc; + struct tuple *stmt = NULL; uint32_t field_count = format->index_field_count; - struct iovec *iov = region_alloc(region, sizeof(*iov) * field_count); + uint32_t part_count = mp_decode_array(&key); + assert(part_count == cmp_def->part_count); + struct iovec *iov = region_alloc(region, sizeof(*iov) * part_count); if (iov == NULL) { - diag_set(OutOfMemory, sizeof(*iov) * field_count, - "region", "iov for surrogate key"); + diag_set(OutOfMemory, sizeof(*iov) * part_count, "region", + "iov for surrogate key"); return NULL; } - memset(iov, 0, sizeof(*iov) * field_count); - uint32_t part_count = mp_decode_array(&key); - assert(part_count == cmp_def->part_count); - assert(part_count <= field_count); - uint32_t nulls_count = field_count - cmp_def->part_count; - uint32_t bsize = mp_sizeof_array(field_count) + - mp_sizeof_nil() * nulls_count; - for (uint32_t i = 0; i < part_count; ++i) { - const struct key_part *part = &cmp_def->parts[i]; + struct mh_i64ptr_t *fields_iov_ht = mh_i64ptr_new(); + if (fields_iov_ht == NULL) { + diag_set(OutOfMemory, sizeof(struct mh_i64ptr_t), + "mh_i64ptr_new", "fields_iov_ht"); + return NULL; + } + if (mh_i64ptr_reserve(fields_iov_ht, part_count, NULL) != 0) { + diag_set(OutOfMemory, part_count, "mh_i64ptr_reserve", + 
"fields_iov_ht"); + goto end; + } + uint32_t bsize = mp_sizeof_array(field_count); + uint32_t nulls_count = field_count; + memset(iov, 0, sizeof(*iov) * part_count); + const struct key_part *part = cmp_def->parts; + for (uint32_t i = 0; i < part_count; ++i, ++part) { assert(part->fieldno < field_count); const char *svp = key; - iov[part->fieldno].iov_base = (char *) key; + iov[i].iov_base = (char *) key; mp_next(&key); - iov[part->fieldno].iov_len = key - svp; - bsize += key - svp; + iov[i].iov_len = key - svp; + struct tuple_field *field; + if (part->path == NULL) { + field = &format->fields[part->fieldno]; + --nulls_count; + } else { + struct mh_strnptr_node_t *ht_record = + json_path_hash_get(format->path_hash, + part->path, part->path_len, + part->path_hash); + assert(ht_record != NULL); + field = ht_record->val; + assert(field != NULL); + } + struct mh_i64ptr_node_t node = {(uint64_t)field, &iov[i]}; + mh_int_t k = mh_i64ptr_put(fields_iov_ht, &node, NULL, NULL); + if (k == mh_end(fields_iov_ht)) + goto end; + } + bsize += nulls_count * mp_sizeof_nil(); + for (uint32_t i = 0; i < field_count; ++i) { + char *data = NULL; + vy_stmt_msgpack_build(&format->fields[i], NULL, NULL, &data, + false, fields_iov_ht); + bsize += data - (char *)NULL; } - struct tuple *stmt = vy_stmt_alloc(format, bsize); + stmt = vy_stmt_alloc(format, bsize); if (stmt == NULL) - return NULL; + goto end; char *raw = (char *) tuple_data(stmt); uint32_t *field_map = (uint32_t *) raw; char *wpos = mp_encode_array(raw, field_count); for (uint32_t i = 0; i < field_count; ++i) { - const struct tuple_field *field = &format->fields[i]; - if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) - field_map[field->offset_slot] = wpos - raw; - if (iov[i].iov_base == NULL) { - wpos = mp_encode_nil(wpos); - } else { - memcpy(wpos, iov[i].iov_base, iov[i].iov_len); - wpos += iov[i].iov_len; - } + vy_stmt_msgpack_build(&format->fields[i], raw, field_map, &wpos, + true, fields_iov_ht); } - assert(wpos == raw + 
bsize); + assert(wpos <= raw + bsize); vy_stmt_set_type(stmt, type); + +end: + mh_i64ptr_delete(fields_iov_ht); return stmt; } diff --git a/test/box/misc.result b/test/box/misc.result index 6237675..6ea97e1 100644 --- a/test/box/misc.result +++ b/test/box/misc.result @@ -350,7 +350,7 @@ t; - 'box.error.CANT_CREATE_COLLATION : 150' - 'box.error.USER_EXISTS : 46' - 'box.error.WAL_IO : 40' - - 'box.error.PROC_RET : 21' + - 'box.error.RTREE_RECT : 101' - 'box.error.PRIV_GRANTED : 89' - 'box.error.CREATE_SPACE : 9' - 'box.error.GRANT : 88' @@ -361,7 +361,7 @@ t; - 'box.error.VINYL_MAX_TUPLE_SIZE : 139' - 'box.error.LOAD_FUNCTION : 99' - 'box.error.INVALID_XLOG : 74' - - 'box.error.READ_VIEW_ABORTED : 130' + - 'box.error.PRIV_NOT_GRANTED : 91' - 'box.error.TRANSACTION_CONFLICT : 97' - 'box.error.GUEST_USER_PASSWORD : 96' - 'box.error.PROC_C : 102' @@ -371,8 +371,8 @@ t; - 'box.error.DROP_FUNCTION : 71' - 'box.error.CFG : 59' - 'box.error.NO_SUCH_FIELD : 37' - - 'box.error.CONNECTION_TO_SELF : 117' - - 'box.error.FUNCTION_MAX : 54' + - 'box.error.MORE_THAN_ONE_TUPLE : 41' + - 'box.error.PROC_LUA : 32' - 'box.error.ILLEGAL_PARAMS : 1' - 'box.error.PARTIAL_KEY : 136' - 'box.error.SAVEPOINT_NO_TRANSACTION : 114' @@ -400,34 +400,35 @@ t; - 'box.error.UPDATE_ARG_TYPE : 26' - 'box.error.CROSS_ENGINE_TRANSACTION : 81' - 'box.error.FORMAT_MISMATCH_INDEX_PART : 27' - - 'box.error.FUNCTION_TX_ACTIVE : 30' - 'box.error.injection : table: <address> - - 'box.error.ITERATOR_TYPE : 72' + - 'box.error.FUNCTION_TX_ACTIVE : 30' + - 'box.error.IDENTIFIER : 70' + - 'box.error.TRANSACTION_YIELD : 154' - 'box.error.NO_SUCH_ENGINE : 57' - 'box.error.COMMIT_IN_SUB_STMT : 122' - - 'box.error.TRANSACTION_YIELD : 154' - - 'box.error.UNSUPPORTED : 5' - - 'box.error.LAST_DROP : 15' + - 'box.error.RELOAD_CFG : 58' - 'box.error.SPACE_FIELD_IS_DUPLICATE : 149' + - 'box.error.LAST_DROP : 15' + - 'box.error.SEQUENCE_OVERFLOW : 147' - 'box.error.DECOMPRESSION : 124' - 'box.error.CREATE_SEQUENCE : 142' - 
'box.error.CREATE_USER : 43' - - 'box.error.SEQUENCE_OVERFLOW : 147' + - 'box.error.FUNCTION_MAX : 54' - 'box.error.INSTANCE_UUID_MISMATCH : 66' - - 'box.error.RELOAD_CFG : 58' + - 'box.error.TUPLE_FORMAT_LIMIT : 16' - 'box.error.SYSTEM : 115' - 'box.error.KEY_PART_IS_TOO_LONG : 118' - - 'box.error.MORE_THAN_ONE_TUPLE : 41' - 'box.error.TRUNCATE_SYSTEM_SPACE : 137' - - 'box.error.NO_SUCH_SAVEPOINT : 61' - 'box.error.VY_QUOTA_TIMEOUT : 135' - - 'box.error.PRIV_NOT_GRANTED : 91' + - 'box.error.NO_SUCH_SAVEPOINT : 61' + - 'box.error.PROTOCOL : 104' + - 'box.error.READ_VIEW_ABORTED : 130' - 'box.error.WRONG_INDEX_OPTIONS : 108' - 'box.error.INVALID_VYLOG_FILE : 133' - 'box.error.INDEX_FIELD_COUNT_LIMIT : 127' - - 'box.error.BEFORE_REPLACE_RET : 53' + - 'box.error.DATA_STRUCTURE_MISMATCH : 55' - 'box.error.USER_MAX : 56' - - 'box.error.INVALID_MSGPACK : 20' + - 'box.error.BEFORE_REPLACE_RET : 53' - 'box.error.TUPLE_NOT_ARRAY : 22' - 'box.error.KEY_PART_COUNT : 31' - 'box.error.ALTER_SPACE : 12' @@ -436,47 +437,47 @@ t; - 'box.error.DROP_SEQUENCE : 144' - 'box.error.INVALID_XLOG_ORDER : 76' - 'box.error.UNKNOWN_REQUEST_TYPE : 48' - - 'box.error.PROC_LUA : 32' + - 'box.error.PROC_RET : 21' - 'box.error.SUB_STMT_MAX : 121' - 'box.error.ROLE_NOT_GRANTED : 92' - 'box.error.SPACE_EXISTS : 10' - - 'box.error.UPDATE_INTEGER_OVERFLOW : 95' + - 'box.error.UNSUPPORTED : 5' - 'box.error.MIN_FIELD_COUNT : 39' - 'box.error.NO_SUCH_SPACE : 36' - 'box.error.WRONG_INDEX_PARTS : 107' - 'box.error.REPLICASET_UUID_MISMATCH : 63' - 'box.error.UPDATE_FIELD : 29' - 'box.error.INDEX_EXISTS : 85' - - 'box.error.SPLICE : 25' + - 'box.error.DROP_SPACE : 11' - 'box.error.COMPRESSION : 119' - 'box.error.INVALID_ORDER : 68' - - 'box.error.UNKNOWN : 0' + - 'box.error.SPLICE : 25' - 'box.error.NO_SUCH_GROUP : 155' - - 'box.error.TUPLE_FORMAT_LIMIT : 16' + - 'box.error.INVALID_MSGPACK : 20' - 'box.error.DROP_PRIMARY_KEY : 17' - 'box.error.NULLABLE_PRIMARY : 152' - 'box.error.NO_SUCH_SEQUENCE : 145' - 
'box.error.INJECTION : 8' - 'box.error.INVALID_UUID : 64' - - 'box.error.IDENTIFIER : 70' + - 'box.error.NO_SUCH_ROLE : 82' - 'box.error.TIMEOUT : 78' + - 'box.error.ITERATOR_TYPE : 72' - 'box.error.REPLICA_MAX : 73' - - 'box.error.NO_SUCH_ROLE : 82' - - 'box.error.DROP_SPACE : 11' + - 'box.error.UNKNOWN : 0' - 'box.error.MISSING_REQUEST_FIELD : 69' - 'box.error.MISSING_SNAPSHOT : 93' - 'box.error.WRONG_SPACE_OPTIONS : 111' - 'box.error.READONLY : 7' - - 'box.error.RTREE_RECT : 101' + - 'box.error.UPDATE_INTEGER_OVERFLOW : 95' - 'box.error.UPSERT_UNIQUE_SECONDARY_KEY : 105' - 'box.error.NO_CONNECTION : 77' - 'box.error.UNSUPPORTED_PRIV : 98' - 'box.error.WRONG_SCHEMA_VERSION : 109' - 'box.error.ROLLBACK_IN_SUB_STMT : 123' - - 'box.error.PROTOCOL : 104' - - 'box.error.INVALID_XLOG_TYPE : 125' - - 'box.error.INDEX_PART_TYPE_MISMATCH : 24' - 'box.error.UNSUPPORTED_INDEX_FEATURE : 112' + - 'box.error.CONNECTION_TO_SELF : 117' + - 'box.error.INDEX_PART_TYPE_MISMATCH : 24' + - 'box.error.INVALID_XLOG_TYPE : 125' ... test_run:cmd("setopt delimiter ''"); --- diff --git a/test/engine/tuple.result b/test/engine/tuple.result index 35c700e..b74bb23 100644 --- a/test/engine/tuple.result +++ b/test/engine/tuple.result @@ -954,6 +954,393 @@ type(tuple:tomap().fourth) s:drop() --- ... +-- +-- gh-1012: Indexes for JSON-defined paths. +-- +s = box.schema.space.create('withdata', {engine = engine}) +--- +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO["fname"]'}, {3, 'str', path = '[3]["FIO"].fname'}}}) +--- +- error: 'Can''t create or modify index ''test1'' in space ''withdata'': same key + part is indexed twice' +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '[3]["FIO"]["fname"]'}}}) +--- +- error: 'Wrong index options (field 2): ''path'' must be string' +... 
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'field.FIO.fname'}}}) +--- +- error: 'Wrong index options (field 2): invalid JSON path: first part should be defined + as array index' +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = '[3].FIO'}}}) +--- +- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type + ''map'' is not supported' +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[3][1]'}}}) +--- +- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type + ''array'' is not supported' +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO'}, {3, 'str', path = '[3]["FIO"].fname'}}}) +--- +- error: Field 3 has type 'map' in one index, but type 'string' in another +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3][1].sname'}, {3, 'str', path = '[3]["FIO"].fname'}}}) +--- +- error: Field 3 has type 'map' in one index, but type 'array' in another +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[2].FIO.fname'}}}) +--- +- error: 'Wrong index options (field 2): invalid JSON path: first part refers to invalid + field' +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO....fname'}}}) +--- +- error: 'Wrong index options (field 3): invalid JSON path ''[3].FIO....fname'': path + has invalid structure (error at position 9)' +... +idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}}) +--- +... +assert(idx ~= nil) +--- +- true +... +s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5} +--- +- error: 'Tuple doesn''t math document structure: invalid field 3 document content + ''666'': type mismatch: have unsigned, expected map' +... 
+s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5} +--- +- error: 'Tuple doesn''t math document structure: invalid field 3 document content + ''666'': type mismatch: have unsigned, expected string' +... +s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5} +--- +- error: 'Tuple doesn''t math document structure: invalid field 3 document content + ''{"fname": "James"}'': map doesn''t contain key ''sname'' defined in index' +... +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +--- +- [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +--- +- error: Duplicate key exists in unique index 'test1' in space 'withdata' +... +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond', data = "extra"}}, 4, 5} +--- +- error: Duplicate key exists in unique index 'test1' in space 'withdata' +... +s:insert{7, 7, {town = 'Moscow', FIO = {fname = 'Max', sname = 'Isaev', data = "extra"}}, 4, 5} +--- +- [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}}, + 4, 5] +... +idx:select() +--- +- - [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] + - [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}}, + 4, 5] +... +idx:min() +--- +- [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +idx:max() +--- +- [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}}, + 4, 5] +... +s:drop() +--- +... +s = box.schema.create_space('withdata', {engine = engine}) +--- +... +parts = {} +--- +... +parts[1] = {1, 'unsigned', path='[1][2]'} +--- +... +pk = s:create_index('pk', {parts = parts}) +--- +... +s:insert{{1, 2}, 3} +--- +- [[1, 2], 3] +... +s:upsert({{box.null, 2}}, {{'+', 2, 5}}) +--- +... +s:get(2) +--- +- [[1, 2], 8] +... +s:drop() +--- +... 
+-- Create index on space with data +s = box.schema.space.create('withdata', {engine = engine}) +--- +... +pk = s:create_index('primary', { type = 'tree' }) +--- +... +s:insert{1, 7, {town = 'London', FIO = 1234}, 4, 5} +--- +- [1, 7, {'town': 'London', 'FIO': 1234}, 4, 5] +... +s:insert{2, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +--- +- [2, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +s:insert{3, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +--- +- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +s:insert{4, 7, {town = 'London', FIO = {1,2,3}}, 4, 5} +--- +- [4, 7, {'town': 'London', 'FIO': [1, 2, 3]}, 4, 5] +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}}) +--- +- error: 'Tuple doesn''t math document structure: invalid field 3 document content + ''1234'': type mismatch: have unsigned, expected map' +... +_ = s:delete(1) +--- +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}}) +--- +- error: Duplicate key exists in unique index 'test1' in space 'withdata' +... +_ = s:delete(2) +--- +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}}) +--- +- error: 'Tuple doesn''t math document structure: invalid field 3 document content + ''[1, 2, 3]'': type mismatch: have array, expected map' +... +_ = s:delete(4) +--- +... +idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]', is_nullable = true}, {3, 'str', path = '[3]["FIO"]["sname"]'}, {3, 'str', path = '[3]["FIO"]["extra"]', is_nullable = true}}}) +--- +... +assert(idx ~= nil) +--- +- true +... 
+s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '[3]["FIO"]["fname"]'}}}) +--- +- error: 'Wrong index options (field 3): JSON path ''[3]["FIO"]["fname"]'' has been + already constructed for ''string'' leaf record' +... +idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}}}) +--- +... +assert(idx2 ~= nil) +--- +- true +... +t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5} +--- +... +-- Test field_map in tuple speed-up access by indexed path. +t["[3][\"FIO\"][\"fname\"]"] +--- +- Agent +... +idx:select() +--- +- - [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5] + - [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +idx:min() +--- +- [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5] +... +idx:max() +--- +- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +idx:drop() +--- +... +s:drop() +--- +... +-- Test complex JSON indexes +s = box.schema.space.create('withdata', {engine = engine}) +--- +... +parts = {} +--- +... +parts[1] = {1, 'str', path='[1][3][2].a'} +--- +... +parts[2] = {1, 'unsigned', path = '[1][3][1]'} +--- +... +parts[3] = {2, 'str', path = '[2][2].d[1]'} +--- +... +pk = s:create_index('primary', { type = 'tree', parts = parts}) +--- +... +s:insert{{1, 2, {3, {3, a = 'str', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {1, 2, 3}} +--- +- [[1, 2, [3, {1: 3, 'a': 'str', 'b': 5}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, + [1, 2, 3]] +... +s:insert{{1, 2, {3, {a = 'str', b = 1}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6} +--- +- error: Duplicate key exists in unique index 'primary' in space 'withdata' +... +parts = {} +--- +... +parts[1] = {4, 'unsigned', path='[4][1]', is_nullable = false} +--- +... +parts[2] = {4, 'unsigned', path='[4][2]', is_nullable = true} +--- +... 
+parts[3] = {4, 'unsigned', path='[4][4]', is_nullable = true} +--- +... +trap_idx = s:create_index('trap', { type = 'tree', parts = parts}) +--- +... +s:insert{{1, 2, {3, {3, a = 'str2', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {}} +--- +- error: 'Tuple doesn''t math document structure: invalid field 4 document content + ''[]'': array size 0 is less than size of 1 defined in index' +... +parts = {} +--- +... +parts[1] = {1, 'unsigned', path='[1][3][2].b' } +--- +... +parts[2] = {3, 'unsigned'} +--- +... +crosspart_idx = s:create_index('crosspart', { parts = parts}) +--- +... +s:insert{{1, 2, {3, {a = 'str2', b = 2}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {9, 2, 3}} +--- +- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9, + 2, 3]] +... +parts = {} +--- +... +parts[1] = {1, 'unsigned', path='[1][3][2].b'} +--- +... +num_idx = s:create_index('numeric', {parts = parts}) +--- +... +s:insert{{1, 2, {3, {a = 'str3', b = 9}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {0}} +--- +- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]] +... +num_idx:get(2) +--- +- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9, + 2, 3]] +... +num_idx:select() +--- +- - [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [ + 9, 2, 3]] + - [[1, 2, [3, {1: 3, 'a': 'str', 'b': 5}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], + 6, [1, 2, 3]] + - [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [ + 0]] +... +num_idx:max() +--- +- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]] +... +num_idx:min() +--- +- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9, + 2, 3]] +... +assert(crosspart_idx:max() == num_idx:max()) +--- +- true +... +assert(crosspart_idx:min() == num_idx:min()) +--- +- true +... 
+trap_idx:max() +--- +- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9, + 2, 3]] +... +trap_idx:min() +--- +- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]] +... +s:drop() +--- +... +s = box.schema.space.create('withdata', {engine = engine}) +--- +... +pk_simplified = s:create_index('primary', { type = 'tree', parts = {{1, 'unsigned', path = '[1]'}}}) +--- +... +assert(pk_simplified.path == box.NULL) +--- +- true +... +idx = s:create_index('idx', {parts = {{2, 'integer', path = '[2].a'}}}) +--- +... +s:insert{31, {a = 1, aa = -1}} +--- +- [31, {'a': 1, 'aa': -1}] +... +s:insert{22, {a = 2, aa = -2}} +--- +- [22, {'a': 2, 'aa': -2}] +... +s:insert{13, {a = 3, aa = -3}} +--- +- [13, {'a': 3, 'aa': -3}] +... +idx:select() +--- +- - [31, {'a': 1, 'aa': -1}] + - [22, {'a': 2, 'aa': -2}] + - [13, {'a': 3, 'aa': -3}] +... +idx:alter({parts = {{2, 'integer', path = '[2].aa'}}}) +--- +... +idx:select() +--- +- - [13, {'a': 3, 'aa': -3}] + - [22, {'a': 2, 'aa': -2}] + - [31, {'a': 1, 'aa': -1}] +... +s:drop() +--- +... engine = nil --- ... diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua index edc3dab..d563c66 100644 --- a/test/engine/tuple.test.lua +++ b/test/engine/tuple.test.lua @@ -312,5 +312,114 @@ tuple:tomap().fourth type(tuple:tomap().fourth) s:drop() +-- +-- gh-1012: Indexes for JSON-defined paths. 
+-- +s = box.schema.space.create('withdata', {engine = engine}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO["fname"]'}, {3, 'str', path = '[3]["FIO"].fname'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '[3]["FIO"]["fname"]'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'field.FIO.fname'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = '[3].FIO'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[3][1]'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO'}, {3, 'str', path = '[3]["FIO"].fname'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3][1].sname'}, {3, 'str', path = '[3]["FIO"].fname'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[2].FIO.fname'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO....fname'}}}) +idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}}) +assert(idx ~= nil) +s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5} +s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5} +s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5} +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond', data = "extra"}}, 4, 5} +s:insert{7, 7, {town = 'Moscow', FIO = {fname = 'Max', sname = 'Isaev', data = "extra"}}, 4, 5} +idx:select() +idx:min() +idx:max() +s:drop() + +s = box.schema.create_space('withdata', {engine = engine}) +parts = {} +parts[1] = {1, 'unsigned', path='[1][2]'} +pk = s:create_index('pk', {parts = parts}) +s:insert{{1, 2}, 3} +s:upsert({{box.null, 2}}, {{'+', 2, 5}}) +s:get(2) +s:drop() 
+ +-- Create index on space with data +s = box.schema.space.create('withdata', {engine = engine}) +pk = s:create_index('primary', { type = 'tree' }) +s:insert{1, 7, {town = 'London', FIO = 1234}, 4, 5} +s:insert{2, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +s:insert{3, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +s:insert{4, 7, {town = 'London', FIO = {1,2,3}}, 4, 5} +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}}) +_ = s:delete(1) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}}) +_ = s:delete(2) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}}) +_ = s:delete(4) +idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]', is_nullable = true}, {3, 'str', path = '[3]["FIO"]["sname"]'}, {3, 'str', path = '[3]["FIO"]["extra"]', is_nullable = true}}}) +assert(idx ~= nil) +s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '[3]["FIO"]["fname"]'}}}) +idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}}}) +assert(idx2 ~= nil) +t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5} +-- Test field_map in tuple speed-up access by indexed path. 
+t["[3][\"FIO\"][\"fname\"]"] +idx:select() +idx:min() +idx:max() +idx:drop() +s:drop() + +-- Test complex JSON indexes +s = box.schema.space.create('withdata', {engine = engine}) +parts = {} +parts[1] = {1, 'str', path='[1][3][2].a'} +parts[2] = {1, 'unsigned', path = '[1][3][1]'} +parts[3] = {2, 'str', path = '[2][2].d[1]'} +pk = s:create_index('primary', { type = 'tree', parts = parts}) +s:insert{{1, 2, {3, {3, a = 'str', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {1, 2, 3}} +s:insert{{1, 2, {3, {a = 'str', b = 1}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6} +parts = {} +parts[1] = {4, 'unsigned', path='[4][1]', is_nullable = false} +parts[2] = {4, 'unsigned', path='[4][2]', is_nullable = true} +parts[3] = {4, 'unsigned', path='[4][4]', is_nullable = true} +trap_idx = s:create_index('trap', { type = 'tree', parts = parts}) +s:insert{{1, 2, {3, {3, a = 'str2', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {}} +parts = {} +parts[1] = {1, 'unsigned', path='[1][3][2].b' } +parts[2] = {3, 'unsigned'} +crosspart_idx = s:create_index('crosspart', { parts = parts}) +s:insert{{1, 2, {3, {a = 'str2', b = 2}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {9, 2, 3}} +parts = {} +parts[1] = {1, 'unsigned', path='[1][3][2].b'} +num_idx = s:create_index('numeric', {parts = parts}) +s:insert{{1, 2, {3, {a = 'str3', b = 9}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {0}} +num_idx:get(2) +num_idx:select() +num_idx:max() +num_idx:min() +assert(crosspart_idx:max() == num_idx:max()) +assert(crosspart_idx:min() == num_idx:min()) +trap_idx:max() +trap_idx:min() +s:drop() + +s = box.schema.space.create('withdata', {engine = engine}) +pk_simplified = s:create_index('primary', { type = 'tree', parts = {{1, 'unsigned', path = '[1]'}}}) +assert(pk_simplified.path == box.NULL) +idx = s:create_index('idx', {parts = {{2, 'integer', path = '[2].a'}}}) +s:insert{31, {a = 1, aa = -1}} +s:insert{22, {a = 2, aa = -2}} +s:insert{13, {a = 3, aa = -3}} +idx:select() +idx:alter({parts = {{2, 'integer', path = 
'[2].aa'}}})
+idx:select()
+s:drop()
+
 engine = nil
 test_run = nil
diff --git a/test/vinyl/info.result b/test/vinyl/info.result
index 95e8cc6..134924f 100644
--- a/test/vinyl/info.result
+++ b/test/vinyl/info.result
@@ -1157,7 +1157,7 @@ st2 = i2:stat()
 ...
 s:bsize()
 ---
-- 107449
+- 107499
 ...
 i1:len(), i2:len()
 ---
-- 
2.7.4
* [tarantool-patches] [PATCH v3 4/4] box: specify indexes in user-friendly form
  2018-08-27  7:37 [tarantool-patches] [PATCH v3 0/4] box: indexes by JSON path Kirill Shcherbatov
  ` (2 preceding siblings ...)
  2018-08-27  7:37 ` [tarantool-patches] [PATCH v3 3/4] box: introduce JSON indexes Kirill Shcherbatov
@ 2018-08-27  7:37 ` Kirill Shcherbatov
  2018-09-03 10:32   ` [tarantool-patches] " Vladislav Shpilevoy
  2018-09-17 15:50 ` [tarantool-patches] [PATCH v3 0/4] box: indexes by JSON path Vladimir Davydov
  4 siblings, 1 reply; 14+ messages in thread
From: Kirill Shcherbatov @ 2018-08-27  7:37 UTC (permalink / raw)
  To: tarantool-patches; +Cc: v.shpilevoy, Kirill Shcherbatov

It is now possible to create indexes by JSON path using field names
specified in the space format.

Closes #1012.

@TarantoolBot document
Title: Indexes by JSON path
Sometimes field data has a complex document structure. When this
structure is consistent across the whole document, you can create an
index by JSON path.
Example:
s:create_index('json_index', {parts = {{'data.FIO["fname"]', 'str'}}})
---
 src/box/lua/index.c         | 74 +++++++++++++++++++++++++++++++++++++++++++++
 src/box/lua/schema.lua      | 20 ++++++------
 test/engine/iterator.result |  2 +-
 test/engine/tuple.result    | 37 +++++++++++++++++++++++
 test/engine/tuple.test.lua  | 10 ++++++
 5 files changed, 131 insertions(+), 12 deletions(-)

diff --git a/src/box/lua/index.c b/src/box/lua/index.c
index ef89c39..b2b8ca2 100644
--- a/src/box/lua/index.c
+++ b/src/box/lua/index.c
@@ -28,6 +28,9 @@
 * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
*/ +#include "box/schema.h" +#include "box/tuple_format.h" +#include "json/path.h" #include "box/lua/index.h" #include "lua/utils.h" #include "box/box.h" @@ -328,6 +331,76 @@ lbox_index_compact(lua_State *L) return 0; } +static int +lbox_index_resolve_path(struct lua_State *L) +{ + if (lua_gettop(L) != 3 || + !lua_isnumber(L, 1) || !lua_isnumber(L, 2) || !lua_isstring(L, 3)) { + return luaL_error(L, "Usage box.internal." + "path_resolve(part_id, space_id, path)"); + } + uint32_t part_id = lua_tonumber(L, 1); + uint32_t space_id = lua_tonumber(L, 2); + const char *path = lua_tostring(L, 3); + size_t path_len = strlen(path); + struct space *space = space_cache_find(space_id); + if (space == NULL) + return luaT_error(L); + struct json_path_parser parser; + struct json_path_node node; + json_path_parser_create(&parser, path, path_len); + int rc = json_path_next(&parser, &node); + if (rc != 0) { + const char *err_msg = + tt_sprintf("Illegal parameters, options.parts[%d]: " + "error in path on position %d", part_id, rc); + diag_set(IllegalParams, err_msg); + return luaT_error(L); + } + assert(space->format != NULL && space->format->dict != NULL); + uint32_t fieldno = node.num - TUPLE_INDEX_BASE;; + if (node.type == JSON_PATH_NUM && + fieldno >= space->format->field_count) { + const char *err_msg = + tt_sprintf("Illegal parameters, options.parts[%d]: " + "field '%d' referenced in path is greater " + "than format field count %d", part_id, + fieldno + TUPLE_INDEX_BASE, + space->format->field_count); + diag_set(IllegalParams, err_msg); + return luaT_error(L); + } else if (node.type == JSON_PATH_STR && + tuple_fieldno_by_name(space->format->dict, node.str, node.len, + field_name_hash(node.str, node.len), + &fieldno) != 0) { + const char *err_msg = + tt_sprintf("Illegal parameters, options.parts[%d]: " + "field was not found by name '%.*s'", + part_id, node.len, node.str); + diag_set(IllegalParams, err_msg); + return luaT_error(L); + } + fieldno += TUPLE_INDEX_BASE; + + char 
*path_resolved = region_alloc(&fiber()->gc, path_len + 1); + if (path_resolved == NULL) { + diag_set(OutOfMemory, path_len + 1, "region_alloc", + "path_resolved"); + return luaT_error(L); + } + + path = path_resolved; + path_resolved += + sprintf(path_resolved, "[%d]", fieldno); + memcpy(path_resolved, parser.src + parser.offset, + parser.src_len - parser.offset); + path_resolved[parser.src_len - parser.offset] = '\0'; + + lua_pushnumber(L, fieldno); + lua_pushstring(L, path); + return 2; +} + /* }}} */ void @@ -365,6 +438,7 @@ box_lua_index_init(struct lua_State *L) {"truncate", lbox_truncate}, {"stat", lbox_index_stat}, {"compact", lbox_index_compact}, + {"path_resolve", lbox_index_resolve_path}, {NULL, NULL} }; diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua index 540a2a5..bc11375 100644 --- a/src/box/lua/schema.lua +++ b/src/box/lua/schema.lua @@ -556,7 +556,7 @@ local function update_index_parts_1_6_0(parts) return result end -local function update_index_parts(format, parts) +local function update_index_parts(format, parts, space_id) if type(parts) ~= "table" then box.error(box.error.ILLEGAL_PARAMS, "options.parts parameter should be a table") @@ -607,16 +607,14 @@ local function update_index_parts(format, parts) box.error(box.error.ILLEGAL_PARAMS, "options.parts[" .. i .. "]: field (name or number) is expected") elseif type(part.field) == 'string' then - for k,v in pairs(format) do - if v.name == part.field then - part.field = k - break - end - end - if type(part.field) == 'string' then + local idx, path = box.internal.path_resolve(i, space_id, part.field) + if part.path ~= nil and part.path ~= path then box.error(box.error.ILLEGAL_PARAMS, - "options.parts[" .. i .. "]: field was not found by name '" .. part.field .. "'") + "options.parts[" .. i .. "]: field path '"..part.path.." doesn't math path resolved by name '" .. part.field .. 
"'") end + parts_can_be_simplified = parts_can_be_simplified and path == nil + part.field = idx + part.path = path or part.path elseif part.field == 0 then box.error(box.error.ILLEGAL_PARAMS, "options.parts[" .. i .. "]: field (number) must be one-based") @@ -767,7 +765,7 @@ box.schema.index.create = function(space_id, name, options) end end local parts, parts_can_be_simplified = - update_index_parts(format, options.parts) + update_index_parts(format, options.parts, space_id) -- create_index() options contains type, parts, etc, -- stored separately. Remove these members from index_opts local index_opts = { @@ -934,7 +932,7 @@ box.schema.index.alter = function(space_id, index_id, options) if options.parts then local parts_can_be_simplified parts, parts_can_be_simplified = - update_index_parts(format, options.parts) + update_index_parts(format, options.parts, space_id) -- save parts in old format if possible if parts_can_be_simplified then parts = simplify_index_parts(parts) diff --git a/test/engine/iterator.result b/test/engine/iterator.result index 10097ed..05d892d 100644 --- a/test/engine/iterator.result +++ b/test/engine/iterator.result @@ -4213,7 +4213,7 @@ s:replace{35} ... state, value = gen(param,state) --- -- error: 'builtin/box/schema.lua:1034: usage: next(param, state)' +- error: 'builtin/box/schema.lua:1032: usage: next(param, state)' ... value --- diff --git a/test/engine/tuple.result b/test/engine/tuple.result index 8a0c0bc..e778050 100644 --- a/test/engine/tuple.result +++ b/test/engine/tuple.result @@ -946,6 +946,43 @@ assert(idx ~= nil) --- - true ... +format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'array'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}} +--- +... +s:format(format) +--- +- error: Field 3 has type 'map' in one index, but type 'array' in another +... +format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'map'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}} +--- +... +s:format(format) +--- +... 
+s:create_index('test3', {parts = {{2, 'number'}, {']sad.FIO["fname"]', 'str'}}}) +--- +- error: 'Illegal parameters, options.parts[2]: error in path on position 1' +... +s:create_index('test3', {parts = {{2, 'number'}, {'[666].FIO["fname"]', 'str'}}}) +--- +- error: 'Illegal parameters, options.parts[2]: field ''666'' referenced in path is + greater than format field count 5' +... +s:create_index('test3', {parts = {{2, 'number'}, {'invalid.FIO["fname"]', 'str'}}}) +--- +- error: 'Illegal parameters, options.parts[2]: field was not found by name ''invalid''' +... +idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}}) +--- +... +assert(idx3 ~= nil) +--- +- true +... +assert(idx3.parts[2].path == "[3][\"FIO\"][\"fname\"]") +--- +- true +... s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5} --- - error: 'Tuple doesn''t math document structure: invalid field 3 document content diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua index 3b4ce26..56d24e5 100644 --- a/test/engine/tuple.test.lua +++ b/test/engine/tuple.test.lua @@ -304,6 +304,16 @@ s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[2].FIO.fnam s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO....fname'}}}) idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}}) assert(idx ~= nil) +format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'array'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}} +s:format(format) +format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'map'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}} +s:format(format) +s:create_index('test3', {parts = {{2, 'number'}, {']sad.FIO["fname"]', 'str'}}}) +s:create_index('test3', {parts = {{2, 'number'}, {'[666].FIO["fname"]', 'str'}}}) +s:create_index('test3', {parts = {{2, 'number'}, {'invalid.FIO["fname"]', 'str'}}}) +idx3 = s:create_index('test3', {parts = 
{{2, 'number'}, {'data.FIO["fname"]', 'str'}}}) +assert(idx3 ~= nil) +assert(idx3.parts[2].path == "[3][\"FIO\"][\"fname\"]") s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5} s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5} s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5} -- 2.7.4 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [tarantool-patches] Re: [PATCH v3 4/4] box: specify indexes in user-friendly form 2018-08-27 7:37 ` [tarantool-patches] [PATCH v3 4/4] box: specify indexes in user-friendly form Kirill Shcherbatov @ 2018-09-03 10:32 ` Vladislav Shpilevoy 2018-09-06 12:46 ` Kirill Shcherbatov 0 siblings, 1 reply; 14+ messages in thread From: Vladislav Shpilevoy @ 2018-09-03 10:32 UTC (permalink / raw) To: Kirill Shcherbatov, tarantool-patches Thanks for the fixes! See 3 comments below. > diff --git a/src/box/lua/index.c b/src/box/lua/index.c > index ef89c39..b2b8ca2 100644 > --- a/src/box/lua/index.c > +++ b/src/box/lua/index.c > @@ -328,6 +331,76 @@ lbox_index_compact(lua_State *L) > return 0; > } > > +static int > +lbox_index_resolve_path(struct lua_State *L) > +{ > + if (lua_gettop(L) != 3 || > + !lua_isnumber(L, 1) || !lua_isnumber(L, 2) || !lua_isstring(L, 3)) { > + return luaL_error(L, "Usage box.internal." > + "path_resolve(part_id, space_id, path)"); > + } > + uint32_t part_id = lua_tonumber(L, 1); > + uint32_t space_id = lua_tonumber(L, 2); > + const char *path = lua_tostring(L, 3); > + size_t path_len = strlen(path); > + struct space *space = space_cache_find(space_id); > + if (space == NULL) > + return luaT_error(L); > + struct json_path_parser parser; > + struct json_path_node node; > + json_path_parser_create(&parser, path, path_len); > + int rc = json_path_next(&parser, &node); > + if (rc != 0) { > + const char *err_msg = > + tt_sprintf("Illegal parameters, options.parts[%d]: " > + "error in path on position %d", part_id, rc); > + diag_set(IllegalParams, err_msg); 1. It is box sub-module so use ClientError please. > + return luaT_error(L); > + } > + assert(space->format != NULL && space->format->dict != NULL); > + uint32_t fieldno = node.num - TUPLE_INDEX_BASE;; 2. How do you use node.num before checking that it is num? And why referencing a field out of format is forbidden? I am free to define indexes on non-formatted fields. 
> + if (node.type == JSON_PATH_NUM && > + fieldno >= space->format->field_count) { > + const char *err_msg = > + tt_sprintf("Illegal parameters, options.parts[%d]: " > + "field '%d' referenced in path is greater " > + "than format field count %d", part_id, > + fieldno + TUPLE_INDEX_BASE, > + space->format->field_count); > + diag_set(IllegalParams, err_msg); > + return luaT_error(L); > + } else if (node.type == JSON_PATH_STR && > + tuple_fieldno_by_name(space->format->dict, node.str, node.len, > + field_name_hash(node.str, node.len), > + &fieldno) != 0) { > + const char *err_msg = > + tt_sprintf("Illegal parameters, options.parts[%d]: " > + "field was not found by name '%.*s'", > + part_id, node.len, node.str); > + diag_set(IllegalParams, err_msg); > + return luaT_error(L); > + } > + fieldno += TUPLE_INDEX_BASE; > + > + char *path_resolved = region_alloc(&fiber()->gc, path_len + 1); > + if (path_resolved == NULL) { > + diag_set(OutOfMemory, path_len + 1, "region_alloc", > + "path_resolved"); > + return luaT_error(L); > + } > + > + path = path_resolved; > + path_resolved += > + sprintf(path_resolved, "[%d]", fieldno); 3. Fits in one line. > + memcpy(path_resolved, parser.src + parser.offset, > + parser.src_len - parser.offset); > + path_resolved[parser.src_len - parser.offset] = '\0'; > + > + lua_pushnumber(L, fieldno); > + lua_pushstring(L, path); > + return 2; > +} > + > /* }}} */ > > void ^ permalink raw reply [flat|nested] 14+ messages in thread
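[Editor's illustration] The `lbox_index_resolve_path` logic under review can be summarized language-independently: parse the first token of the JSON path; if it is a number, bounds-check it against the format, and if it is a name, look it up in the format dictionary; then rewrite the path so it starts with the numeric `[fieldno]` form. The following Python sketch is an illustration only (the names `path_resolve` and `TUPLE_INDEX_BASE` mirror the C code, but the regex-based tokenizer is a stand-in for the real `json_path_next` parser):

```python
import re

TUPLE_INDEX_BASE = 1  # Tarantool tuple fields are one-based


def path_resolve(format_names, path):
    """Resolve the leading token of `path` against the space format and
    return (fieldno, rewritten_path) with a numeric [N] prefix."""
    # First token: either "[N]" or a bare identifier (simplified tokenizer).
    m = re.match(r'\[(\d+)\]|([A-Za-z_][A-Za-z0-9_]*)', path)
    if m is None:
        raise ValueError("error in path on position 1")
    if m.group(1) is not None:
        # Numeric token, e.g. "[666]": bounds-check against the format.
        fieldno = int(m.group(1))
        if fieldno - TUPLE_INDEX_BASE >= len(format_names):
            raise ValueError(
                "field '%d' referenced in path is greater than format "
                "field count %d" % (fieldno, len(format_names)))
    else:
        # Name token, e.g. "data": resolve via the format dictionary.
        name = m.group(2)
        if name not in format_names:
            raise ValueError("field was not found by name '%s'" % name)
        fieldno = format_names.index(name) + TUPLE_INDEX_BASE
    # Replace the first token with its numeric form; keep the tail as-is.
    return fieldno, "[%d]%s" % (fieldno, path[m.end():])
```

This mirrors how the three error cases in the tests (`]sad...`, `[666]...`, `invalid...`) are distinguished from a resolvable path like `data.FIO["fname"]`.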
* Re: [tarantool-patches] Re: [PATCH v3 4/4] box: specify indexes in user-friendly form 2018-09-03 10:32 ` [tarantool-patches] " Vladislav Shpilevoy @ 2018-09-06 12:46 ` Kirill Shcherbatov 0 siblings, 0 replies; 14+ messages in thread From: Kirill Shcherbatov @ 2018-09-06 12:46 UTC (permalink / raw) To: tarantool-patches, Vladislav Shpilevoy, Vladimir Davydov > 1. It is box sub-module so use ClientError please. Fixed. > 2. How do you use node.num before checking that it is num? And why referencing a field out of format is forbidden? I am free to define indexes on non-formatted fields. Ok, fixed. > 3. Fits in one line. Fixed. ================================================ From bf784403070114ea7ec065ca93903c8353bd7e73 Mon Sep 17 00:00:00 2001 Message-Id: <bf784403070114ea7ec065ca93903c8353bd7e73.1536237903.git.kshcherbatov@tarantool.org> In-Reply-To: <cover.1536237903.git.kshcherbatov@tarantool.org> References: <cover.1536237903.git.kshcherbatov@tarantool.org> From: Kirill Shcherbatov <kshcherbatov@tarantool.org> Date: Thu, 23 Aug 2018 18:11:04 +0300 Subject: [PATCH 4/4] box: specify indexes in user-friendly form It is now possible to create indexes by JSON path using field names specified in the format. Closes #1012. @TarantoolBot document Title: Indexes by JSON path Sometimes field data has a complex document structure. When this structure is consistent across the whole document, you can create an index by JSON path. 
Example: s:create_index('json_index', {parts = {{'data.FIO["fname"]', 'str'}}}) --- src/box/lua/index.c | 73 +++++++++++++++++++++++++++++++++++++++++++++ src/box/lua/schema.lua | 20 ++++++------- src/box/tuple_format.c | 3 +- test/engine/iterator.result | 2 +- test/engine/tuple.result | 41 +++++++++++++++++++++++++ test/engine/tuple.test.lua | 12 ++++++++ 6 files changed, 137 insertions(+), 14 deletions(-) diff --git a/src/box/lua/index.c b/src/box/lua/index.c index ef89c39..8f1b038 100644 --- a/src/box/lua/index.c +++ b/src/box/lua/index.c @@ -28,6 +28,9 @@ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ +#include "box/schema.h" +#include "box/tuple_format.h" +#include "json/path.h" #include "box/lua/index.h" #include "lua/utils.h" #include "box/box.h" @@ -328,6 +331,75 @@ lbox_index_compact(lua_State *L) return 0; } +static int +lbox_index_resolve_path(struct lua_State *L) +{ + if (lua_gettop(L) != 3 || + !lua_isnumber(L, 1) || !lua_isnumber(L, 2) || !lua_isstring(L, 3)) { + return luaL_error(L, "Usage box.internal." 
+ "path_resolve(part_id, space_id, path)"); + } + uint32_t part_id = lua_tonumber(L, 1); + uint32_t space_id = lua_tonumber(L, 2); + size_t path_len; + const char *path = lua_tolstring(L, 3, &path_len); + struct space *space = space_cache_find(space_id); + if (space == NULL) + return luaT_error(L); + struct json_path_parser parser; + struct json_path_node node; + json_path_parser_create(&parser, path, path_len); + int rc = json_path_next(&parser, &node); + if (rc != 0) { + const char *err_msg = + tt_sprintf("options.parts[%d]: error in path on " + "position %d", part_id, rc); + diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg); + return luaT_error(L); + } + assert(space->format != NULL && space->format->dict != NULL); + uint32_t fieldno; + if (node.type == JSON_PATH_NUM && + (fieldno = node.num - TUPLE_INDEX_BASE) >= + space->format->field_count) { + const char *err_msg = + tt_sprintf("options.parts[%d]: field '%d' referenced " + "in path is greater than format field " + "count %d", part_id, + fieldno + TUPLE_INDEX_BASE, + space->format->field_count); + diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg); + return luaT_error(L); + } else if (node.type == JSON_PATH_STR && + tuple_fieldno_by_name(space->format->dict, node.str, node.len, + field_name_hash(node.str, node.len), + &fieldno) != 0) { + const char *err_msg = + tt_sprintf("options.parts[%d]: field was not found by " + "name '%.*s'", part_id, node.len, node.str); + diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg); + return luaT_error(L); + } + fieldno += TUPLE_INDEX_BASE; + + char *path_resolved = region_alloc(&fiber()->gc, path_len + 1); + if (path_resolved == NULL) { + diag_set(OutOfMemory, path_len + 1, "region_alloc", + "path_resolved"); + return luaT_error(L); + } + + path = path_resolved; + path_resolved += sprintf(path_resolved, "[%d]", fieldno); + memcpy(path_resolved, parser.src + parser.offset, + parser.src_len - parser.offset); + path_resolved[parser.src_len - parser.offset] = '\0'; + + 
lua_pushnumber(L, fieldno); + lua_pushstring(L, path); + return 2; +} + /* }}} */ void @@ -365,6 +437,7 @@ box_lua_index_init(struct lua_State *L) {"truncate", lbox_truncate}, {"stat", lbox_index_stat}, {"compact", lbox_index_compact}, + {"path_resolve", lbox_index_resolve_path}, {NULL, NULL} }; diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua index 540a2a5..bc11375 100644 --- a/src/box/lua/schema.lua +++ b/src/box/lua/schema.lua @@ -556,7 +556,7 @@ local function update_index_parts_1_6_0(parts) return result end -local function update_index_parts(format, parts) +local function update_index_parts(format, parts, space_id) if type(parts) ~= "table" then box.error(box.error.ILLEGAL_PARAMS, "options.parts parameter should be a table") @@ -607,16 +607,14 @@ local function update_index_parts(format, parts) box.error(box.error.ILLEGAL_PARAMS, "options.parts[" .. i .. "]: field (name or number) is expected") elseif type(part.field) == 'string' then - for k,v in pairs(format) do - if v.name == part.field then - part.field = k - break - end - end - if type(part.field) == 'string' then + local idx, path = box.internal.path_resolve(i, space_id, part.field) + if part.path ~= nil and part.path ~= path then box.error(box.error.ILLEGAL_PARAMS, - "options.parts[" .. i .. "]: field was not found by name '" .. part.field .. "'") + "options.parts[" .. i .. "]: field path '" .. part.path .. "' doesn't match path resolved by name '" .. part.field .. "'") end + parts_can_be_simplified = parts_can_be_simplified and path == nil + part.field = idx + part.path = path or part.path elseif part.field == 0 then box.error(box.error.ILLEGAL_PARAMS, "options.parts[" .. i .. 
"]: field (number) must be one-based") @@ -767,7 +765,7 @@ box.schema.index.create = function(space_id, name, options) end end local parts, parts_can_be_simplified = - update_index_parts(format, options.parts) + update_index_parts(format, options.parts, space_id) -- create_index() options contains type, parts, etc, -- stored separately. Remove these members from index_opts local index_opts = { @@ -934,7 +932,7 @@ box.schema.index.alter = function(space_id, index_id, options) if options.parts then local parts_can_be_simplified parts, parts_can_be_simplified = - update_index_parts(format, options.parts) + update_index_parts(format, options.parts, space_id) -- save parts in old format if possible if parts_can_be_simplified then parts = simplify_index_parts(parts) diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 00170c9..9587e6b 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -1119,8 +1119,7 @@ tuple_field_by_part_raw(const struct tuple_format *format, const char *data, * Legacy tuple having no field map for * JSON index. */ - uint32_t path_hash = - field_name_hash(part->path, part->path_len); + uint32_t path_hash = field_name_hash(part->path, part->path_len); const char *raw = NULL; if (tuple_field_raw_by_path(format, data, field_map, part->path, part->path_len, diff --git a/test/engine/iterator.result b/test/engine/iterator.result index 10097ed..05d892d 100644 --- a/test/engine/iterator.result +++ b/test/engine/iterator.result @@ -4213,7 +4213,7 @@ s:replace{35} ... state, value = gen(param,state) --- -- error: 'builtin/box/schema.lua:1034: usage: next(param, state)' +- error: 'builtin/box/schema.lua:1032: usage: next(param, state)' ... value --- diff --git a/test/engine/tuple.result b/test/engine/tuple.result index b74bb23..d28d6b4 100644 --- a/test/engine/tuple.result +++ b/test/engine/tuple.result @@ -1009,6 +1009,47 @@ assert(idx ~= nil) --- - true ... 
+format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'array'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}} +--- +... +s:format(format) +--- +- error: Field 3 has type 'map' in one index, but type 'array' in another +... +format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'map'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}} +--- +... +s:format(format) +--- +... +s:create_index('test3', {parts = {{2, 'number'}, {']sad.FIO["fname"]', 'str'}}}) +--- +- error: 'Illegal parameters, options.parts[2]: error in path on position 1' +... +s:create_index('test3', {parts = {{2, 'number'}, {'[666].FIO["fname"]', 'str'}}}) +--- +- error: 'Illegal parameters, options.parts[2]: field ''666'' referenced in path is + greater than format field count 5' +... +s:create_index('test3', {parts = {{2, 'number'}, {'invalid.FIO["fname"]', 'str'}}}) +--- +- error: 'Illegal parameters, options.parts[2]: field was not found by name ''invalid''' +... +idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}}) +--- +... +assert(idx3 ~= nil) +--- +- true +... +assert(idx3.parts[2].path == "[3][\"FIO\"][\"fname\"]") +--- +- true +... +-- Vinyl has optimizations that omit index checks, so errors could differ. +idx3:drop() +--- +... 
s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5} --- - error: 'Tuple doesn''t math document structure: invalid field 3 document content diff --git a/test/engine/tuple.test.lua b/test/engine/tuple.test.lua index d563c66..1508f99 100644 --- a/test/engine/tuple.test.lua +++ b/test/engine/tuple.test.lua @@ -327,6 +327,18 @@ s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[2].FIO.fnam s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3].FIO....fname'}}}) idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[3]["FIO"]["fname"]'}, {3, 'str', path = '[3]["FIO"]["sname"]'}}}) assert(idx ~= nil) +format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'array'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}} +s:format(format) +format = {{'int1', 'unsigned'}, {'int2', 'unsigned'}, {'data', 'map'}, {'int3', 'unsigned'}, {'int4', 'unsigned'}} +s:format(format) +s:create_index('test3', {parts = {{2, 'number'}, {']sad.FIO["fname"]', 'str'}}}) +s:create_index('test3', {parts = {{2, 'number'}, {'[666].FIO["fname"]', 'str'}}}) +s:create_index('test3', {parts = {{2, 'number'}, {'invalid.FIO["fname"]', 'str'}}}) +idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}}) +assert(idx3 ~= nil) +assert(idx3.parts[2].path == "[3][\"FIO\"][\"fname\"]") +-- Vinyl has optimizations that omit index checks, so errors could differ. +idx3:drop() s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5} s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5} s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5} -- 2.7.4 ^ permalink raw reply [flat|nested] 14+ messages in thread
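[Editor's illustration] The `update_index_parts` change in schema.lua adds one rule worth spelling out: when a part is given by field name *and* also carries an explicit `path`, the explicit path must agree with the one resolved from the name; otherwise the resolved path (if any) replaces the name. A hedged Python sketch of that merge rule (the function name `merge_part` is hypothetical; parts are modeled as dicts):

```python
def merge_part(part, resolved_fieldno, resolved_path):
    """Merge a resolved (fieldno, path) pair into an index part definition,
    rejecting an explicit path that contradicts the resolved one."""
    if (part.get('path') is not None and resolved_path is not None
            and part['path'] != resolved_path):
        raise ValueError(
            "field path '%s' doesn't match path resolved by name '%s'"
            % (part['path'], part['field']))
    merged = dict(part)
    merged['field'] = resolved_fieldno      # name replaced by field number
    # Prefer the resolved path; fall back to any explicit one.
    merged['path'] = resolved_path if resolved_path is not None \
        else part.get('path')
    return merged
```

Note also that in the real patch a non-nil resolved path disables the "simplified" 1.6-style parts encoding (`parts_can_be_simplified`), since that legacy form cannot carry a JSON path.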
* Re: [tarantool-patches] [PATCH v3 0/4] box: indexes by JSON path 2018-08-27 7:37 [tarantool-patches] [PATCH v3 0/4] box: indexes by JSON path Kirill Shcherbatov ` (3 preceding siblings ...) 2018-08-27 7:37 ` [tarantool-patches] [PATCH v3 4/4] box: specify indexes in user-friendly form Kirill Shcherbatov @ 2018-09-17 15:50 ` Vladimir Davydov 4 siblings, 0 replies; 14+ messages in thread From: Vladimir Davydov @ 2018-09-17 15:50 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches, v.shpilevoy On Mon, Aug 27, 2018 at 10:37:26AM +0300, Kirill Shcherbatov wrote: > Branch: http://github.com/tarantool/tarantool/tree/kshch/gh-1012-json-indexes > Issue: https://github.com/tarantool/tarantool/issues/1012 vinyl/select_consistency.test.lua crashes on this branch: } vinyl/select_consistency.test.lua } } [Instance "vinyl" killed by signal: 6 (SIGABRT)] } Found assertion fail in the results file [/home/vlad/src/tarantool/test/var/001_vinyl/vinyl.log]: } 2018-09-17 18:47:21.911 [26787] vinyl.writer.0/111/task I> writing `/home/vlad/src/tarantool/test/var/001_vinyl/vinyl/512/1/00000000000000000090.run' } 2018-09-17 18:47:21.911 [26787] vinyl.writer.0/111/task I> writing `/home/vlad/src/tarantool/test/var/001_vinyl/vinyl/512/1/00000000000000000090.index' } 2018-09-17 18:47:21.912 [26787] main/118/lua I> removed /home/vlad/src/tarantool/test/var/001_vinyl/vinyl/00000000000000000799.snap } 2018-09-17 18:47:21.912 [26787] main/104/vinyl.scheduler I> 512/2: completed compacting range (-inf..[1, 7, 72]) } 2018-09-17 18:47:21.912 [26787] main/118/lua I> removed /home/vlad/src/tarantool/test/var/001_vinyl/vinyl/00000000000000000007.vylog } 2018-09-17 18:47:21.913 [26787] main/104/vinyl.scheduler I> 512/1: completed compacting range ([5, 3, 100]..[7, 8, 99]) } 2018-09-17 18:47:21.913 [26787] main/104/vinyl.scheduler I> 512/0: split range (-inf..[43]) by key [14] } 2018-09-17 18:47:21.913 [26787] main/104/vinyl.scheduler I> 512/1: started compacting range ([7, 8, 
99]..inf), runs 2/2 } 2018-09-17 18:47:21.913 [26787] vinyl.writer.0/112/task I> writing `/home/vlad/src/tarantool/test/var/001_vinyl/vinyl/512/1/00000000000000000099.run' } 2018-09-17 18:47:21.913 [26787] main/118/lua I> removed /home/vlad/src/tarantool/test/var/001_vinyl/vinyl/512/0/00000000000000000010.index } 2018-09-17 18:47:21.913 [26787] main/104/vinyl.scheduler I> 512/0: split range ([43]..inf) by key [74] } 2018-09-17 18:47:21.913 [26787] main/118/lua I> removed /home/vlad/src/tarantool/test/var/001_vinyl/vinyl/512/0/00000000000000000010.run } 2018-09-17 18:47:21.913 [26787] vinyl.writer.0/112/task I> writing `/home/vlad/src/tarantool/test/var/001_vinyl/vinyl/512/1/00000000000000000099.index' } 2018-09-17 18:47:21.913 [26787] main/104/vinyl.scheduler I> 512/2: started compacting range ([1, 7, 72]..[5, 2, 100]), runs 2/2 } tarantool: /home/vlad/src/tarantool/src/box/tuple.h:701: mp_tuple_assert: Assertion `tuple == tuple_end' failed. } [ fail ] We can't push this patch set until this issue is fixed. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2018-09-17 17:08 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-08-27 7:37 [tarantool-patches] [PATCH v3 0/4] box: indexes by JSON path Kirill Shcherbatov 2018-08-27 7:37 ` [tarantool-patches] [PATCH v3 1/4] rfc: describe a Tarantool JSON indexes Kirill Shcherbatov 2018-08-27 7:37 ` [tarantool-patches] [PATCH v3 2/4] box: introduce slot_cache in key_part Kirill Shcherbatov 2018-09-03 10:35 ` [tarantool-patches] " Vladislav Shpilevoy 2018-09-06 12:47 ` Kirill Shcherbatov 2018-09-17 17:08 ` Vladimir Davydov 2018-08-27 7:37 ` [tarantool-patches] [PATCH v3 3/4] box: introduce JSON indexes Kirill Shcherbatov 2018-09-03 10:32 ` [tarantool-patches] " Vladislav Shpilevoy 2018-09-03 10:35 ` Vladislav Shpilevoy 2018-09-06 12:46 ` Kirill Shcherbatov 2018-08-27 7:37 ` [tarantool-patches] [PATCH v3 4/4] box: specify indexes in user-friendly form Kirill Shcherbatov 2018-09-03 10:32 ` [tarantool-patches] " Vladislav Shpilevoy 2018-09-06 12:46 ` Kirill Shcherbatov 2018-09-17 15:50 ` [tarantool-patches] [PATCH v3 0/4] box: indexes by JSON path Vladimir Davydov