* [PATCH v6 0/8] box: Indexes by JSON path @ 2018-12-17 6:52 Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 1/8] box: refactor tuple_validate_raw Kirill Shcherbatov ` (8 more replies) 0 siblings, 9 replies; 14+ messages in thread From: Kirill Shcherbatov @ 2018-12-17 6:52 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov Sometimes field data could have complex document structure. When this structure is consistent across whole document, you are able to create an index by JSON path. Changes in version 6: - key_part(s) path string is not 0-terminated anymore - global formats_epoch pool used to initialize tuple_format:epoch numeric field - more informative error messages - splitted changes to few new additional preparatory commits - got rid off hashtable in vy_stmt_build - prevent traversing field tree twice in vy_stmt_build - inline basic field_by_part with slowpath alternative - simplified tuple_format1_can_store_format2_tuples - splitted tuple_init_field_map to helper routines - moved tests to separate file - large and descriptive comments for many things http://github.com/tarantool/tarantool/tree/kshch/gh-1012-json-indexes https://github.com/tarantool/tarantool/issues/1012 Kirill Shcherbatov (8): box: refactor tuple_validate_raw box: refactor ER_{FIELD_TYPE, ACTION_MISMATCH} box: build path to field string uniformly on error box: introduce JSON Indexes box: introduce has_json_paths flag in templates box: tune tuple_field_raw_by_path for indexed data box: introduce offset_slot cache in key_part box: specify indexes in user-friendly form src/box/alter.cc | 2 +- src/box/errcode.h | 6 +- src/box/index_def.c | 10 +- src/box/key_def.c | 172 +++++++-- src/box/key_def.h | 44 ++- src/box/lua/index.c | 64 ++++ src/box/lua/schema.lua | 22 +- src/box/lua/space.cc | 5 + src/box/memtx_engine.c | 4 + src/box/memtx_rtree.c | 2 +- src/box/sql.c | 1 + src/box/sql/build.c | 1 + src/box/sql/select.c | 3 +- src/box/sql/where.c | 1 + src/box/tuple.c | 35 +- src/box/tuple.h | 34 +- src/box/tuple_compare.cc | 125 +++++-- src/box/tuple_extract_key.cc | 128 +++++-- src/box/tuple_format.c | 666 ++++++++++++++++++++++++++++++----- src/box/tuple_format.h | 111 ++++-- src/box/tuple_hash.cc | 47 ++- src/box/vinyl.c | 4 + src/box/vy_log.c | 11 +- src/box/vy_point_lookup.c | 2 - src/box/vy_stmt.c | 105 +++++- test/box/alter.result | 3 +- test/box/blackhole.result | 6 +- test/box/ddl.result | 6 +- test/box/misc.result | 1 + test/box/rtree_misc.result | 3 +- test/engine/ddl.result | 57 +-- test/engine/json.result | 483 +++++++++++++++++++++++++ test/engine/json.test.lua | 139 ++++++++ test/engine/null.result | 24 +- test/engine/update.result | 6 +- test/vinyl/errinj.result | 3 +- 36 files changed, 1975 insertions(+), 361 deletions(-) create mode 100644 test/engine/json.result create mode 100644 test/engine/json.test.lua -- 2.19.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v6 1/8] box: refactor tuple_validate_raw 2018-12-17 6:52 [PATCH v6 0/8] box: Indexes by JSON path Kirill Shcherbatov @ 2018-12-17 6:52 ` Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 2/8] box: refactor ER_{FIELD_TYPE, ACTION_MISMATCH} Kirill Shcherbatov ` (7 subsequent siblings) 8 siblings, 0 replies; 14+ messages in thread From: Kirill Shcherbatov @ 2018-12-17 6:52 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov Since the tuple_validate_raw and tuple_init_field_map functions make the same data checks, we implemented tuple_validate_raw via tuple_init_field_map called with region memory chunk is passed as field map. This is required because in subsequent patches the tuple_init_field_map routine logic will be complicated, and we want to avoid writing the same checks twice. Introduced the format->field_map_template allocation - a memory chunk having field_map_size and initialized with 0 for nullable fields and UINT32_MAX marker for other. The tuple_init_field_map copies template data to real field_map and do map initialization for indexed fields. Then the new routine tuple_field_map_validate scans initialized field_map and looks for UINT32_MAX marker. This is only integrity assert check now. But later it will be used to raise errors when JSON document has invalid structure. Needed for #1012 --- src/box/tuple.c | 35 +++-------- src/box/tuple_format.c | 129 +++++++++++++++++++++++++++++++---------- src/box/tuple_format.h | 8 +++ 3 files changed, 116 insertions(+), 56 deletions(-) diff --git a/src/box/tuple.c b/src/box/tuple.c index aae1c3cdd..4ad932f07 100644 --- a/src/box/tuple.c +++ b/src/box/tuple.c @@ -141,35 +141,18 @@ tuple_validate_raw(struct tuple_format *format, const char *tuple) if (tuple_format_field_count(format) == 0) return 0; /* Nothing to check */ - /* Check to see if the tuple has a sufficient number of fields. */ - uint32_t field_count = mp_decode_array(&tuple); - if (format->exact_field_count > 0 && - format->exact_field_count != field_count) { - diag_set(ClientError, ER_EXACT_FIELD_COUNT, - (unsigned) field_count, - (unsigned) format->exact_field_count); + struct region *region = &fiber()->gc; + uint32_t used = region_used(region); + uint32_t *field_map = region_alloc(region, format->field_map_size); + if (field_map == NULL) { + diag_set(OutOfMemory, format->field_map_size, "region_alloc", + "field_map"); return -1; } - if (unlikely(field_count < format->min_field_count)) { - diag_set(ClientError, ER_MIN_FIELD_COUNT, - (unsigned) field_count, - (unsigned) format->min_field_count); + field_map = (uint32_t *)((char *)field_map + format->field_map_size); + if (tuple_init_field_map(format, field_map, tuple, true) != 0) return -1; - } - - /* Check field types */ - struct tuple_field *field = tuple_format_field(format, 0); - uint32_t i = 0; - uint32_t defined_field_count = - MIN(field_count, tuple_format_field_count(format)); - for (; i < defined_field_count; ++i) { - field = tuple_format_field(format, i); - if (key_mp_type_validate(field->type, mp_typeof(*tuple), - ER_FIELD_TYPE, i + TUPLE_INDEX_BASE, - tuple_field_is_nullable(field))) - return -1; - mp_next(&tuple); - } + region_truncate(region, used); return 0; } diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 3af39a37e..1ed3656e4 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -208,6 +208,42 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, return -1; } format->field_map_size = field_map_size; + /** + * Allocate field_map_template used for field map + * initialization and validation. + * Read tuple_format:field_map_template description for + * more details. + */ + uint32_t *field_map_template = malloc(field_map_size); + if (field_map_template == NULL) { + diag_set(OutOfMemory, field_map_size, "malloc", + "field_map_template"); + return -1; + } + format->field_map_template = field_map_template; + /* + * Mark all template_field_map items as uninitialized + * with UINT32_MAX magic value. + */ + field_map_template = (uint32_t *)((char *)field_map_template + + format->field_map_size); + for (int i = -1; i >= current_slot; i--) + field_map_template[i] = UINT32_MAX; + struct tuple_field *field; + struct json_token *root = (struct json_token *)&format->fields.root; + json_tree_foreach_entry_preorder(field, root, struct tuple_field, + token) { + /* + * Initialize nullable fields in + * field_map_template with 0 as we shouldn't + * raise error when field_map item for nullable + * field was not calculated during tuple parse + * (when tuple lacks such field). + */ + if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL && + tuple_field_is_nullable(field)) + field_map_template[field->offset_slot] = 0; + } return 0; } @@ -330,6 +366,7 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, format->index_field_count = index_field_count; format->exact_field_count = 0; format->min_field_count = 0; + format->field_map_template = NULL; return format; error: tuple_format_destroy_fields(format); @@ -341,6 +378,7 @@ error: static inline void tuple_format_destroy(struct tuple_format *format) { + free(format->field_map_template); tuple_format_destroy_fields(format); tuple_dictionary_unref(format->dict); } @@ -424,6 +462,25 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1, return true; } +/** + * Verify that all offset_slots has been initialized in field_map. + * Routine relies on the field_map memory has been filled from the + * field_map_template containing UINT32_MAX marker for required + * fields. + */ +static int +tuple_field_map_validate(const struct tuple_format *format, uint32_t *field_map) +{ + int32_t field_map_items = + (int32_t)(format->field_map_size/sizeof(field_map[0])); + for (int32_t i = -1; i >= -field_map_items; i--) { + if (field_map[i] != UINT32_MAX) + continue; + return -1; + } + return 0; +} + /** @sa declaration for details. */ int tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, @@ -449,44 +506,56 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, (unsigned) format->min_field_count); return -1; } - - /* first field is simply accessible, so we do not store offset to it */ - enum mp_type mp_type = mp_typeof(*pos); - const struct tuple_field *field = - tuple_format_field((struct tuple_format *)format, 0); - if (validate && - key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE, - TUPLE_INDEX_BASE, tuple_field_is_nullable(field))) - return -1; - mp_next(&pos); - /* other fields...*/ - uint32_t i = 1; uint32_t defined_field_count = MIN(field_count, validate ? tuple_format_field_count(format) : format->index_field_count); - if (field_count < format->index_field_count) { - /* - * Nullify field map to be able to detect by 0, - * which key fields are absent in tuple_field(). - */ + /* + * Initialize memory with zeros when no validation is + * required as it is reserved field_map value for nullable + * fields. + */ + if (!validate) { memset((char *)field_map - format->field_map_size, 0, format->field_map_size); + } else { + memcpy((char *)field_map - format->field_map_size, + format->field_map_template, format->field_map_size); } - for (; i < defined_field_count; ++i) { - field = tuple_format_field((struct tuple_format *)format, i); - mp_type = mp_typeof(*pos); - if (validate && - key_mp_type_validate(field->type, mp_type, ER_FIELD_TYPE, - i + TUPLE_INDEX_BASE, - tuple_field_is_nullable(field))) - return -1; - if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { - field_map[field->offset_slot] = - (uint32_t) (pos - tuple); + struct json_tree *tree = (struct json_tree *)&format->fields; + struct json_token *parent = &tree->root; + struct json_token token; + token.type = JSON_TOKEN_NUM; + token.num = 0; + while ((uint32_t)token.num < defined_field_count) { + struct tuple_field *field = + json_tree_lookup_entry(tree, parent, &token, + struct tuple_field, token); + enum mp_type type = mp_typeof(*pos); + if (field != NULL) { + bool is_nullable = tuple_field_is_nullable(field); + if (validate && + key_mp_type_validate(field->type, type, + ER_FIELD_TYPE, token.num + 1, + is_nullable) != 0) + return -1; + if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { + field_map[field->offset_slot] = + (uint32_t)(pos - tuple); + } } + token.num++; mp_next(&pos); - } - return 0; + }; + if (!validate) + return 0; + int rc = tuple_field_map_validate(format, field_map); + /* + * As assert field_count >= min_field_count has already + * tested and all first-level fields has parsed, all + * offset_slots must be initialized. + */ + assert(rc == 0); + return rc; } uint32_t diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index 949337807..e61e271d5 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -154,6 +154,14 @@ struct tuple_format { * \sa struct tuple */ uint16_t field_map_size; + /** + * Template field map initialized with zeros for nullable + * and UINT32_MAX for other fields. It is used for the + * tuple_init_field_map initial fill of a new field map + * in order to check that all fields were filled out after + * parsing a data tuple. + */ + uint32_t *field_map_template; /** * If not set (== 0), any tuple in the space can have any number of * fields. If set, each tuple must have exactly this number of fields. -- 2.19.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v6 2/8] box: refactor ER_{FIELD_TYPE, ACTION_MISMATCH} 2018-12-17 6:52 [PATCH v6 0/8] box: Indexes by JSON path Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 1/8] box: refactor tuple_validate_raw Kirill Shcherbatov @ 2018-12-17 6:52 ` Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 3/8] box: build path to field string uniformly on error Kirill Shcherbatov ` (6 subsequent siblings) 8 siblings, 0 replies; 14+ messages in thread From: Kirill Shcherbatov @ 2018-12-17 6:52 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov Reworked ER_FIELD_TYPE and ER_ACTION_MISMATCH error to pass string instead of field number. Refactored tuple_init_field_map to set this ER_FIELD_TYPE directly when new routine mp_type_is_compatible test is failed. This code is hot, so preparing fieldno string for modified key_mp_type_validate each time were not acceptable solution. This patch is required for further JSON patches, to give detailed information about field for ER_FIELD_TYPE error type. Needed for #1012 --- src/box/errcode.h | 4 ++-- src/box/memtx_rtree.c | 2 +- src/box/tuple.h | 15 +++++++++------ src/box/tuple_format.c | 26 ++++++++++++++++++++------ 4 files changed, 32 insertions(+), 15 deletions(-) diff --git a/src/box/errcode.h b/src/box/errcode.h index 73359ebdf..7d1f8ddc7 100644 --- a/src/box/errcode.h +++ b/src/box/errcode.h @@ -75,7 +75,7 @@ struct errcode_record { /* 20 */_(ER_INVALID_MSGPACK, "Invalid MsgPack - %s") \ /* 21 */_(ER_PROC_RET, "msgpack.encode: can not encode Lua type '%s'") \ /* 22 */_(ER_TUPLE_NOT_ARRAY, "Tuple/Key must be MsgPack array") \ - /* 23 */_(ER_FIELD_TYPE, "Tuple field %u type does not match one required by operation: expected %s") \ + /* 23 */_(ER_FIELD_TYPE, "Tuple field %s type does not match one required by operation: expected %s") \ /* 24 */_(ER_INDEX_PART_TYPE_MISMATCH, "Field %s has type '%s' in one index, but type '%s' in another") \ /* 25 */_(ER_SPLICE, "SPLICE error on field %u: %s") \ /* 26 */_(ER_UPDATE_ARG_TYPE, "Argument type in operation '%c' on field %u does not match field type: expected %s") \ @@ -214,7 +214,7 @@ struct errcode_record { /*159 */_(ER_SQL_EXECUTE, "Failed to execute SQL statement: %s") \ /*160 */_(ER_SQL, "SQL error: %s") \ /*161 */_(ER_SQL_BIND_NOT_FOUND, "Parameter %s was not found in the statement") \ - /*162 */_(ER_ACTION_MISMATCH, "Field %d contains %s on conflict action, but %s in index parts") \ + /*162 */_(ER_ACTION_MISMATCH, "Field %s contains %s on conflict action, but %s in index parts") \ /*163 */_(ER_VIEW_MISSING_SQL, "Space declared as a view must have SQL statement") \ /*164 */_(ER_FOREIGN_KEY_CONSTRAINT, "Can not commit transaction: deferred foreign keys violations are not resolved") \ /*165 */_(ER_NO_SUCH_MODULE, "Module '%s' does not exist") \ diff --git a/src/box/memtx_rtree.c b/src/box/memtx_rtree.c index f2aa6c3e5..269660ac3 100644 --- a/src/box/memtx_rtree.c +++ b/src/box/memtx_rtree.c @@ -48,7 +48,7 @@ mp_decode_num(const char **data, uint32_t fieldno, double *ret) { if (mp_read_double(data, ret) != 0) { diag_set(ClientError, ER_FIELD_TYPE, - fieldno + TUPLE_INDEX_BASE, + tt_sprintf("%u", fieldno + TUPLE_INDEX_BASE), field_type_strs[FIELD_TYPE_NUMBER]); return -1; } diff --git a/src/box/tuple.h b/src/box/tuple.h index 3361b6757..3c8b8825e 100644 --- a/src/box/tuple.h +++ b/src/box/tuple.h @@ -640,8 +640,8 @@ tuple_next_with_type(struct tuple_iterator *it, enum mp_type type) } if (mp_typeof(*field) != type) { diag_set(ClientError, ER_FIELD_TYPE, - fieldno + TUPLE_INDEX_BASE, - mp_type_strs[type]); + tt_sprintf("%u", fieldno + TUPLE_INDEX_BASE), + mp_type_strs[type]); return NULL; } return field; @@ -658,7 +658,7 @@ tuple_next_u32(struct tuple_iterator *it, uint32_t *out) uint32_t val = mp_decode_uint(&field); if (val > UINT32_MAX) { diag_set(ClientError, ER_FIELD_TYPE, - fieldno + TUPLE_INDEX_BASE, + tt_sprintf("%u", fieldno + TUPLE_INDEX_BASE), field_type_strs[FIELD_TYPE_UNSIGNED]); return -1; } @@ -706,7 +706,8 @@ tuple_field_with_type(const struct tuple *tuple, uint32_t fieldno, } if (mp_typeof(*field) != type) { diag_set(ClientError, ER_FIELD_TYPE, - fieldno + TUPLE_INDEX_BASE, mp_type_strs[type]); + tt_sprintf("%u", fieldno + TUPLE_INDEX_BASE), + mp_type_strs[type]); return NULL; } return field; @@ -751,7 +752,8 @@ tuple_field_i64(const struct tuple *tuple, uint32_t fieldno, int64_t *out) } FALLTHROUGH; default: - diag_set(ClientError, ER_FIELD_TYPE, fieldno + TUPLE_INDEX_BASE, + diag_set(ClientError, ER_FIELD_TYPE, + tt_sprintf("%u", fieldno + TUPLE_INDEX_BASE), field_type_strs[FIELD_TYPE_INTEGER]); return -1; } @@ -784,7 +786,8 @@ tuple_field_u32(const struct tuple *tuple, uint32_t fieldno, uint32_t *out) return -1; *out = mp_decode_uint(&field); if (*out > UINT32_MAX) { - diag_set(ClientError, ER_FIELD_TYPE, fieldno + TUPLE_INDEX_BASE, + diag_set(ClientError, ER_FIELD_TYPE, + tt_sprintf("%u", fieldno + TUPLE_INDEX_BASE), field_type_strs[FIELD_TYPE_UNSIGNED]); return -1; } diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 1ed3656e4..903232c66 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -94,9 +94,9 @@ tuple_format_use_key_part(struct tuple_format *format, field->nullable_action = part->nullable_action; } else if (field->nullable_action != part->nullable_action) { diag_set(ClientError, ER_ACTION_MISMATCH, - part->fieldno + TUPLE_INDEX_BASE, - on_conflict_action_strs[field->nullable_action], - on_conflict_action_strs[part->nullable_action]); + tt_sprintf("%u", part->fieldno + TUPLE_INDEX_BASE), + on_conflict_action_strs[field->nullable_action], + on_conflict_action_strs[part->nullable_action]); return -1; } @@ -481,6 +481,17 @@ tuple_field_map_validate(const struct tuple_format *format, uint32_t *field_map) return 0; } +/** Checks if mp_type (MsgPack) is compatible with field type. */ +static inline bool +mp_type_is_compatible(enum mp_type mp_type, enum field_type type, + bool is_nullable) +{ + assert(type < field_type_MAX); + assert((size_t) mp_type < CHAR_BIT * sizeof(*key_mp_type)); + uint32_t mask = key_mp_type[type] | (is_nullable * (1U << MP_NIL)); + return (mask & (1U << mp_type)) != 0; +} + /** @sa declaration for details. */ int tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, @@ -534,10 +545,13 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, if (field != NULL) { bool is_nullable = tuple_field_is_nullable(field); if (validate && - key_mp_type_validate(field->type, type, - ER_FIELD_TYPE, token.num + 1, - is_nullable) != 0) + !mp_type_is_compatible(type, field->type, + is_nullable) != 0) { + diag_set(ClientError, ER_FIELD_TYPE, + tt_sprintf("%u", token.num + 1), + field_type_strs[field->type]); return -1; + } if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { field_map[field->offset_slot] = (uint32_t)(pos - tuple); -- 2.19.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v6 3/8] box: build path to field string uniformly on error 2018-12-17 6:52 [PATCH v6 0/8] box: Indexes by JSON path Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 1/8] box: refactor tuple_validate_raw Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 2/8] box: refactor ER_{FIELD_TYPE, ACTION_MISMATCH} Kirill Shcherbatov @ 2018-12-17 6:52 ` Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 4/8] box: introduce JSON Indexes Kirill Shcherbatov ` (5 subsequent siblings) 8 siblings, 0 replies; 14+ messages in thread From: Kirill Shcherbatov @ 2018-12-17 6:52 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov Introduced new tuple_field_json_path routine that constructs path to field string containing fieldno or field name when space format is set. In subsequent patches it will be improved to support complex JSON paths. Needed for #1012 --- src/box/tuple_format.c | 50 +++++++++++++++++++++------------ test/box/alter.result | 3 +- test/box/blackhole.result | 6 ++-- test/box/ddl.result | 6 ++-- test/box/rtree_misc.result | 3 +- test/engine/ddl.result | 57 +++++++++++++++++++++++--------------- test/engine/null.result | 24 ++++++++++------ test/engine/update.result | 6 ++-- test/vinyl/errinj.result | 3 +- 9 files changed, 100 insertions(+), 58 deletions(-) diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 903232c66..8338bba44 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -61,9 +61,28 @@ tuple_field_delete(struct tuple_field *field) free(field); } +/** Build the JSON path by field specified. */ +static const char * +tuple_field_json_path(const struct tuple_format *format, + struct tuple_field *field) +{ + struct json_token *token = &field->token; + const char *path; + if (token->parent == &format->fields.root && + token->num < (int)format->dict->name_count) { + const char *field_name = + format->dict->names[token->num]; + path = tt_sprintf("\"%s\"", field_name); + } else if (token->type == JSON_TOKEN_NUM) { + path = tt_sprintf("%u", token->num + TUPLE_INDEX_BASE); + } else { + unreachable(); + } + return path; +} + static int -tuple_format_use_key_part(struct tuple_format *format, - const struct field_def *fields, uint32_t field_count, +tuple_format_use_key_part(struct tuple_format *format, uint32_t field_count, const struct key_part *part, bool is_sequential, int *current_slot) { @@ -93,8 +112,9 @@ tuple_format_use_key_part(struct tuple_format *format, if (field->nullable_action == ON_CONFLICT_ACTION_NONE) field->nullable_action = part->nullable_action; } else if (field->nullable_action != part->nullable_action) { - diag_set(ClientError, ER_ACTION_MISMATCH, - tt_sprintf("%u", part->fieldno + TUPLE_INDEX_BASE), + const char *path = tuple_field_json_path(format, field); + assert(path != NULL); + diag_set(ClientError, ER_ACTION_MISMATCH, path, on_conflict_action_strs[field->nullable_action], on_conflict_action_strs[part->nullable_action]); return -1; @@ -111,21 +131,14 @@ tuple_format_use_key_part(struct tuple_format *format, field->type = part->type; } else if (!field_type1_contains_type2(part->type, field->type)) { - const char *name; - int fieldno = part->fieldno + TUPLE_INDEX_BASE; - if (part->fieldno >= field_count) { - name = tt_sprintf("%d", fieldno); - } else { - const struct field_def *def = - &fields[part->fieldno]; - name = tt_sprintf("'%s'", def->name); - } int errcode; if (!field->is_key_part) errcode = ER_FORMAT_MISMATCH_INDEX_PART; else errcode = ER_INDEX_PART_TYPE_MISMATCH; - diag_set(ClientError, errcode, name, + const char *path = tuple_field_json_path(format, field); + assert(path != NULL); + diag_set(ClientError, errcode, path, field_type_strs[field->type], field_type_strs[part->type]); return -1; @@ -190,8 +203,7 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, const struct key_part *parts_end = part + key_def->part_count; for (; part < parts_end; part++) { - if (tuple_format_use_key_part(format, fields, - field_count, part, + if (tuple_format_use_key_part(format, field_count, part, is_sequential, ¤t_slot) != 0) return -1; @@ -547,8 +559,10 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, if (validate && !mp_type_is_compatible(type, field->type, is_nullable) != 0) { - diag_set(ClientError, ER_FIELD_TYPE, - tt_sprintf("%u", token.num + 1), + const char *path = + tuple_field_json_path(format, field); + assert(path != NULL); + diag_set(ClientError, ER_FIELD_TYPE, path, field_type_strs[field->type]); return -1; } diff --git a/test/box/alter.result b/test/box/alter.result index 9a1086e0c..40c8f921a 100644 --- a/test/box/alter.result +++ b/test/box/alter.result @@ -36,7 +36,8 @@ _space:insert{_space.id, ADMIN, 'test', 'memtx', 0, EMPTY_MAP, {}} -- _space:insert{'hello', 'world', 'test', 'memtx', 0, EMPTY_MAP, {}} --- -- error: 'Tuple field 1 type does not match one required by operation: expected unsigned' +- error: 'Tuple field "id" type does not match one required by operation: expected + unsigned' ... -- -- Can't create a space which has wrong field count - field_count must be NUM diff --git a/test/box/blackhole.result b/test/box/blackhole.result index 945b2755c..25c2eedfb 100644 --- a/test/box/blackhole.result +++ b/test/box/blackhole.result @@ -28,11 +28,13 @@ t, t.key, t.value ... s:insert{1, 2, 3} -- error --- -- error: 'Tuple field 2 type does not match one required by operation: expected string' +- error: 'Tuple field "value" type does not match one required by operation: expected + string' ... s:replace{'a', 'b', 'c'} -- error --- -- error: 'Tuple field 1 type does not match one required by operation: expected unsigned' +- error: 'Tuple field "key" type does not match one required by operation: expected + unsigned' ... s:format{} --- diff --git a/test/box/ddl.result b/test/box/ddl.result index d3b0d1e0e..75155ed3f 100644 --- a/test/box/ddl.result +++ b/test/box/ddl.result @@ -371,11 +371,13 @@ box.space._collation.index.name:delete{'nothing'} -- allowed ... box.space._collation:auto_increment{'test', 0, 'ICU', 'ru_RU', 42} --- -- error: 'Tuple field 6 type does not match one required by operation: expected map' +- error: 'Tuple field "opts" type does not match one required by operation: expected + map' ... box.space._collation:auto_increment{'test', 0, 'ICU', 'ru_RU', 'options'} --- -- error: 'Tuple field 6 type does not match one required by operation: expected map' +- error: 'Tuple field "opts" type does not match one required by operation: expected + map' ... box.space._collation:auto_increment{'test', 0, 'ICU', 'ru_RU', {ping='pong'}} --- diff --git a/test/box/rtree_misc.result b/test/box/rtree_misc.result index 36d5b8f55..1b88203b9 100644 --- a/test/box/rtree_misc.result +++ b/test/box/rtree_misc.result @@ -554,7 +554,8 @@ s.index.s:drop() -- with wrong args box.space._index:insert{s.id, 2, 's', 'rtree', nil, {{2, 'array'}}} --- -- error: 'Tuple field 5 type does not match one required by operation: expected map' +- error: 'Tuple field "opts" type does not match one required by operation: expected + map' ... box.space._index:insert{s.id, 2, 's', 'rtree', utils.setmap({}), {{2, 'array'}}} --- diff --git a/test/engine/ddl.result b/test/engine/ddl.result index 3c84e942d..219faa9dc 100644 --- a/test/engine/ddl.result +++ b/test/engine/ddl.result @@ -707,7 +707,7 @@ pk = s:create_index('pk') ... sk1 = s:create_index('sk1', { parts = { 2, 'unsigned' } }) --- -- error: Field 'field2' has type 'string' in space format, but type 'unsigned' in +- error: Field "field2" has type 'string' in space format, but type 'unsigned' in index definition ... -- Check space format conflicting with index parts. @@ -719,7 +719,7 @@ format[2].type = 'unsigned' ... s:format(format) --- -- error: Field 'field2' has type 'unsigned' in space format, but type 'string' in +- error: Field "field2" has type 'unsigned' in space format, but type 'string' in index definition ... s:format() @@ -822,27 +822,33 @@ s:replace{1, '2', {3, 3}, 4.4, -5, true, {value=7}, 8, 9} ... s:replace{1, 2, {3, 3}, 4.4, -5, true, {value=7}, 8, 9} --- -- error: 'Tuple field 2 type does not match one required by operation: expected string' +- error: 'Tuple field "field2" type does not match one required by operation: expected + string' ... s:replace{1, '2', 3, 4.4, -5, true, {value=7}, 8, 9} --- -- error: 'Tuple field 3 type does not match one required by operation: expected array' +- error: 'Tuple field "field3" type does not match one required by operation: expected + array' ... s:replace{1, '2', {3, 3}, '4', -5, true, {value=7}, 8, 9} --- -- error: 'Tuple field 4 type does not match one required by operation: expected number' +- error: 'Tuple field "field4" type does not match one required by operation: expected + number' ... s:replace{1, '2', {3, 3}, 4.4, -5.5, true, {value=7}, 8, 9} --- -- error: 'Tuple field 5 type does not match one required by operation: expected integer' +- error: 'Tuple field "field5" type does not match one required by operation: expected + integer' ... s:replace{1, '2', {3, 3}, 4.4, -5, {6, 6}, {value=7}, 8, 9} --- -- error: 'Tuple field 6 type does not match one required by operation: expected scalar' +- error: 'Tuple field "field6" type does not match one required by operation: expected + scalar' ... s:replace{1, '2', {3, 3}, 4.4, -5, true, {7}, 8, 9} --- -- error: 'Tuple field 7 type does not match one required by operation: expected map' +- error: 'Tuple field "field7" type does not match one required by operation: expected + map' ... s:replace{1, '2', {3, 3}, 4.4, -5, true, {value=7}} --- @@ -1137,7 +1143,7 @@ inspector:cmd("setopt delimiter ''"); -- any --X--> unsigned fail_format_change(2, 'unsigned') --- -- 'Tuple field 2 type does not match one required by operation: expected unsigned' +- 'Tuple field "field2" type does not match one required by operation: expected unsigned' ... -- unsigned -----> any ok_format_change(3, 'any') @@ -1146,7 +1152,7 @@ ok_format_change(3, 'any') -- unsigned --X--> string fail_format_change(3, 'string') --- -- 'Tuple field 3 type does not match one required by operation: expected string' +- 'Tuple field "field3" type does not match one required by operation: expected string' ... -- unsigned -----> number ok_format_change(3, 'number') @@ -1163,7 +1169,7 @@ ok_format_change(3, 'scalar') -- unsigned --X--> map fail_format_change(3, 'map') --- -- 'Tuple field 3 type does not match one required by operation: expected map' +- 'Tuple field "field3" type does not match one required by operation: expected map' ... -- string -----> any ok_format_change(4, 'any') @@ -1176,7 +1182,7 @@ ok_format_change(4, 'scalar') -- string --X--> boolean fail_format_change(4, 'boolean') --- -- 'Tuple field 4 type does not match one required by operation: expected boolean' +- 'Tuple field "field4" type does not match one required by operation: expected boolean' ... -- number -----> any ok_format_change(5, 'any') @@ -1189,7 +1195,7 @@ ok_format_change(5, 'scalar') -- number --X--> integer fail_format_change(5, 'integer') --- -- 'Tuple field 5 type does not match one required by operation: expected integer' +- 'Tuple field "field5" type does not match one required by operation: expected integer' ... -- integer -----> any ok_format_change(6, 'any') @@ -1206,7 +1212,7 @@ ok_format_change(6, 'scalar') -- integer --X--> unsigned fail_format_change(6, 'unsigned') --- -- 'Tuple field 6 type does not match one required by operation: expected unsigned' +- 'Tuple field "field6" type does not match one required by operation: expected unsigned' ... -- boolean -----> any ok_format_change(7, 'any') @@ -1219,7 +1225,7 @@ ok_format_change(7, 'scalar') -- boolean --X--> string fail_format_change(7, 'string') --- -- 'Tuple field 7 type does not match one required by operation: expected string' +- 'Tuple field "field7" type does not match one required by operation: expected string' ... -- scalar -----> any ok_format_change(8, 'any') @@ -1228,7 +1234,7 @@ ok_format_change(8, 'any') -- scalar --X--> unsigned fail_format_change(8, 'unsigned') --- -- 'Tuple field 8 type does not match one required by operation: expected unsigned' +- 'Tuple field "field8" type does not match one required by operation: expected unsigned' ... -- array -----> any ok_format_change(9, 'any') @@ -1237,7 +1243,7 @@ ok_format_change(9, 'any') -- array --X--> scalar fail_format_change(9, 'scalar') --- -- 'Tuple field 9 type does not match one required by operation: expected scalar' +- 'Tuple field "field9" type does not match one required by operation: expected scalar' ... -- map -----> any ok_format_change(10, 'any') @@ -1246,7 +1252,7 @@ ok_format_change(10, 'any') -- map --X--> scalar fail_format_change(10, 'scalar') --- -- 'Tuple field 10 type does not match one required by operation: expected scalar' +- 'Tuple field "field10" type does not match one required by operation: expected scalar' ... s:drop() --- @@ -1478,7 +1484,8 @@ format[2].is_nullable = false ... s:format(format) --- -- error: 'Tuple field 2 type does not match one required by operation: expected unsigned' +- error: 'Tuple field "field2" type does not match one required by operation: expected + unsigned' ... _ = s:delete(1) --- @@ -1761,11 +1768,13 @@ s:format() ... s:replace{1, '100', -20.2} --- -- error: 'Tuple field 2 type does not match one required by operation: expected unsigned' +- error: 'Tuple field "field2" type does not match one required by operation: expected + unsigned' ... s:replace{1, 100, -20.2} --- -- error: 'Tuple field 3 type does not match one required by operation: expected integer' +- error: 'Tuple field "field3" type does not match one required by operation: expected + integer' ... s:replace{1, 100, -20} --- @@ -1829,11 +1838,13 @@ sk4:alter{parts = {{3, 'integer'}}} ... s:replace{1, 50.5, 1.5} --- -- error: 'Tuple field 2 type does not match one required by operation: expected unsigned' +- error: 'Tuple field "field2" type does not match one required by operation: expected + unsigned' ... s:replace{1, 50, 1.5} --- -- error: 'Tuple field 3 type does not match one required by operation: expected integer' +- error: 'Tuple field "field3" type does not match one required by operation: expected + integer' ... s:replace{5, 5, 5} --- diff --git a/test/engine/null.result b/test/engine/null.result index 757e63185..ca05fccf9 100644 --- a/test/engine/null.result +++ b/test/engine/null.result @@ -970,7 +970,8 @@ s:replace{50, 50} ... s:replace{25, box.NULL} --- -- error: 'Tuple field 2 type does not match one required by operation: expected unsigned' +- error: 'Tuple field "field2" type does not match one required by operation: expected + unsigned' ... format[2].is_nullable = true --- @@ -1774,7 +1775,8 @@ s:insert{5} ... s:insert{5, box.NULL} --- -- error: 'Tuple field 2 type does not match one required by operation: expected unsigned' +- error: 'Tuple field "second" type does not match one required by operation: expected + unsigned' ... s.index.secondary:alter{parts={{2, 'unsigned', is_nullable=true}}} -- This is allowed. --- @@ -1782,7 +1784,8 @@ s.index.secondary:alter{parts={{2, 'unsigned', is_nullable=true}}} -- This is al -- Without space format setting this fails. s:insert{5, box.NULL} --- -- error: 'Tuple field 2 type does not match one required by operation: expected unsigned' +- error: 'Tuple field "second" type does not match one required by operation: expected + unsigned' ... s:insert{5} --- @@ -1801,7 +1804,8 @@ s:format(format) -- This is also allowed. -- inserts still fail due to not nullable index parts. s:insert{5, box.NULL} --- -- error: 'Tuple field 2 type does not match one required by operation: expected unsigned' +- error: 'Tuple field "second" type does not match one required by operation: expected + unsigned' ... s:insert{5} --- @@ -1838,11 +1842,13 @@ format[2].is_nullable = false ... s:format(format) -- Fail. --- -- error: 'Tuple field 2 type does not match one required by operation: expected unsigned' +- error: 'Tuple field "second" type does not match one required by operation: expected + unsigned' ... s.index.secondary:alter{parts={{2, 'unsigned', is_nullable=false}}} -- Fail. --- -- error: 'Tuple field 2 type does not match one required by operation: expected unsigned' +- error: 'Tuple field "second" type does not match one required by operation: expected + unsigned' ... _ = s:delete{5} --- @@ -1869,7 +1875,8 @@ s:format(format) ... s:insert{5, box.NULL} -- Fail. --- -- error: 'Tuple field 2 type does not match one required by operation: expected unsigned' +- error: 'Tuple field "second" type does not match one required by operation: expected + unsigned' ... s:insert{5} -- Fail. --- @@ -1887,7 +1894,8 @@ s.index.secondary:alter{parts={{2, 'unsigned', is_nullable=false}}} ... s:insert{5, box.NULL} -- Fail. --- -- error: 'Tuple field 2 type does not match one required by operation: expected unsigned' +- error: 'Tuple field "second" type does not match one required by operation: expected + unsigned' ... s:insert{5} -- Fail. --- diff --git a/test/engine/update.result b/test/engine/update.result index b73037ebc..138859ac5 100644 --- a/test/engine/update.result +++ b/test/engine/update.result @@ -722,7 +722,8 @@ aa.VAL -- invalid update aa:update({{'=',2, 666}}) --- -- error: 'Tuple field 2 type does not match one required by operation: expected string' +- error: 'Tuple field "VAL" type does not match one required by operation: expected + string' ... -- test transform integrity aa:transform(-1, 1) @@ -753,7 +754,8 @@ box.space.tst_sample:get(2).VAL -- invalid upsert s:upsert({2, 666}, {{'=', 2, 666}}) --- -- error: 'Tuple field 2 type does not match one required by operation: expected string' +- error: 'Tuple field "VAL" type does not match one required by operation: expected + string' ... s:drop() --- diff --git a/test/vinyl/errinj.result b/test/vinyl/errinj.result index a081575be..1a68db015 100644 --- a/test/vinyl/errinj.result +++ b/test/vinyl/errinj.result @@ -1449,7 +1449,8 @@ format[2].is_nullable = false ... s:format(format) -- must fail --- -- error: 'Tuple field 2 type does not match one required by operation: expected string' +- error: 'Tuple field "field2" type does not match one required by operation: expected + string' ... errinj.set("ERRINJ_VY_READ_PAGE_TIMEOUT", 0) --- -- 2.19.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v6 4/8] box: introduce JSON Indexes 2018-12-17 6:52 [PATCH v6 0/8] box: Indexes by JSON path Kirill Shcherbatov ` (2 preceding siblings ...) 2018-12-17 6:52 ` [PATCH v6 3/8] box: build path to field string uniformly on error Kirill Shcherbatov @ 2018-12-17 6:52 ` Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 5/8] box: introduce has_json_paths flag in templates Kirill Shcherbatov ` (4 subsequent siblings) 8 siblings, 0 replies; 14+ messages in thread From: Kirill Shcherbatov @ 2018-12-17 6:52 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov New JSON indexes allows to index documents content. At first, introduced new key_part fields path and path_len representing JSON path string specified by user. Modified tuple_format_use_key_part routine constructs corresponding tuple_fields chain in tuple_format:fields tree to indexed data. The resulting tree is used for type checking and for alloctating indexed fields offset slots. Refined tuple_init_field_map routine logic parses tuple msgpack in depth using stack allocated on region and initialize field map with corresponding tuple_format:field if any. This stack is necessary as mp-container(map or array) length is specified at the frame beginning, but this information is also required to determine mp-container end. Then tuple_field_map_validate checks initialized field_map and raises an ER_DATA_STRUCTURE_MISMATCH error when magic UINT32_MAX value founded. This typically means that indexed fields presenting in tuple_format:fields tree were not looked up during antecedent tuple parsing step and is not nullable(otherwise there would be 0, not UINT32_MAX magic). Offset slot allows to uniquely identify the indexed field in the tree to prepare descriptive message. The other essential feature is vinyl's secondary key restored by key_part (stmt) extracted keys loaded frim disc. Introduced tuple_format:total_field_count and tuple_field:id allows to allocate iov's array of size sizeof(iov[0])*tuple_format:total_field_count on region and link extracted keys blobs with corresponding fields by tuple_field:id. New vy_stmt_tuple_restore_raw would traverse tuple_format:fields tree and contruct vy_stmt data using iov's array to place data blobs for indexed leafs. Introduced vy_stmt_meta_size - precalculated stmt size as if all leaf fields are zero. It allows allocate stmt chunk without extra traversing a tree. Example: To create a new JSON index specify path to document data as a part of key_part: parts = {{3, 'str', path = '.FIO.fname', is_nullable = false}} idx = s:create_index('json_idx', {parts = parse}) idx:select("Ivanov") Part of #1012 --- src/box/alter.cc | 2 +- src/box/errcode.h | 2 +- src/box/index_def.c | 10 +- src/box/key_def.c | 166 +++++++++-- src/box/key_def.h | 33 ++- src/box/lua/space.cc | 5 + src/box/memtx_engine.c | 4 + src/box/sql.c | 1 + src/box/sql/build.c | 1 + src/box/sql/select.c | 3 +- src/box/sql/where.c | 1 + src/box/tuple_compare.cc | 13 +- src/box/tuple_extract_key.cc | 28 +- src/box/tuple_format.c | 518 ++++++++++++++++++++++++++++++----- src/box/tuple_format.h | 73 ++++- src/box/tuple_hash.cc | 2 +- src/box/vinyl.c | 4 + src/box/vy_log.c | 11 +- src/box/vy_point_lookup.c | 2 - src/box/vy_stmt.c | 105 +++++-- test/box/misc.result | 1 + test/engine/json.result | 450 ++++++++++++++++++++++++++++++ test/engine/json.test.lua | 129 +++++++++ 23 files changed, 1427 insertions(+), 137 deletions(-) create mode 100644 test/engine/json.result create mode 100644 test/engine/json.test.lua diff --git a/src/box/alter.cc b/src/box/alter.cc index 03ba68adc..d6d37d60e 100644 --- a/src/box/alter.cc +++ b/src/box/alter.cc @@ -277,7 +277,7 @@ index_def_new_from_tuple(struct tuple *tuple, struct space *space) }); if (key_def_decode_parts(part_def, part_count, &parts, space->def->fields, - space->def->field_count) != 0) + space->def->field_count, &fiber()->gc) != 0) diag_raise(); key_def = key_def_new(part_def, part_count); if (key_def == NULL) diff --git a/src/box/errcode.h b/src/box/errcode.h index 7d1f8ddc7..e097607e3 100644 --- a/src/box/errcode.h +++ b/src/box/errcode.h @@ -138,7 +138,7 @@ struct errcode_record { /* 83 */_(ER_ROLE_EXISTS, "Role '%s' already exists") \ /* 84 */_(ER_CREATE_ROLE, "Failed to create role '%s': %s") \ /* 85 */_(ER_INDEX_EXISTS, "Index '%s' already exists") \ - /* 86 */_(ER_UNUSED6, "") \ + /* 86 */_(ER_DATA_STRUCTURE_MISMATCH, "Tuple doesn't math document structure: %s") \ /* 87 */_(ER_ROLE_LOOP, "Granting role '%s' to role '%s' would create a loop") \ /* 88 */_(ER_GRANT, "Incorrect grant arguments: %s") \ /* 89 */_(ER_PRIV_GRANTED, "User '%s' already has %s access on %s '%s'") \ diff --git a/src/box/index_def.c b/src/box/index_def.c index 45c74d9ec..e430a70ba 100644 --- a/src/box/index_def.c +++ b/src/box/index_def.c @@ -31,6 +31,8 @@ #include "index_def.h" #include "schema_def.h" #include "identifier.h" +#include "tuple_format.h" +#include "json/json.h" const char *index_type_strs[] = { "HASH", "TREE", "BITSET", "RTREE" }; @@ -298,8 +300,12 @@ index_def_is_valid(struct index_def *index_def, const char *space_name) * Courtesy to a user who could have made * a typo. */ - if (index_def->key_def->parts[i].fieldno == - index_def->key_def->parts[j].fieldno) { + struct key_part *part_a = &index_def->key_def->parts[i]; + struct key_part *part_b = &index_def->key_def->parts[j]; + if (part_a->fieldno == part_b->fieldno && + json_path_cmp(part_a->path, part_a->path_len, + part_b->path, part_b->path_len, + TUPLE_INDEX_BASE) == 0){ diag_set(ClientError, ER_MODIFY_INDEX, index_def->name, space_name, "same key part is indexed twice"); diff --git a/src/box/key_def.c b/src/box/key_def.c index 2119ca389..a6bb89f5b 100644 --- a/src/box/key_def.c +++ b/src/box/key_def.c @@ -28,6 +28,7 @@ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ +#include "json/json.h" #include "key_def.h" #include "tuple_compare.h" #include "tuple_extract_key.h" @@ -35,6 +36,7 @@ #include "column_mask.h" #include "schema_def.h" #include "coll_id_cache.h" +#include "small/region.h" const char *sort_order_strs[] = { "asc", "desc", "undef" }; @@ -44,7 +46,8 @@ const struct key_part_def key_part_def_default = { COLL_NONE, false, ON_CONFLICT_ACTION_DEFAULT, - SORT_ORDER_ASC + SORT_ORDER_ASC, + NULL }; static int64_t @@ -59,6 +62,7 @@ part_type_by_name_wrapper(const char *str, uint32_t len) #define PART_OPT_NULLABILITY "is_nullable" #define PART_OPT_NULLABLE_ACTION "nullable_action" #define PART_OPT_SORT_ORDER "sort_order" +#define PART_OPT_PATH "path" const struct opt_def part_def_reg[] = { OPT_DEF_ENUM(PART_OPT_TYPE, field_type, struct key_part_def, type, @@ -71,6 +75,7 @@ const struct opt_def part_def_reg[] = { struct key_part_def, nullable_action, NULL), OPT_DEF_ENUM(PART_OPT_SORT_ORDER, sort_order, struct key_part_def, sort_order, NULL), + OPT_DEF(PART_OPT_PATH, OPT_STRPTR, struct key_part_def, path), OPT_END, }; @@ -106,13 +111,23 @@ const uint32_t key_mp_type[] = { struct key_def * key_def_dup(const struct key_def *src) { - size_t sz = key_def_sizeof(src->part_count); - struct key_def *res = (struct key_def *)malloc(sz); + size_t sz = 0; + for (uint32_t i = 0; i < src->part_count; i++) + sz += src->parts[i].path_len; + sz = key_def_sizeof(src->part_count, sz); + struct key_def *res = (struct key_def *)calloc(1, sz); if (res == NULL) { diag_set(OutOfMemory, sz, "malloc", "res"); return NULL; } memcpy(res, src, sz); + /* Update paths to point to the new memory chunk.*/ + for (uint32_t i = 0; i < src->part_count; i++) { + if (src->parts[i].path == NULL) + continue; + size_t path_offset = src->parts[i].path - (char *)src; + res->parts[i].path = (char *)res + path_offset; + } return res; } @@ -120,8 +135,16 @@ void key_def_swap(struct key_def *old_def, struct key_def *new_def) { assert(old_def->part_count == new_def->part_count); - for (uint32_t i = 0; i < new_def->part_count; i++) + for (uint32_t i = 0; i < new_def->part_count; i++) { SWAP(old_def->parts[i], new_def->parts[i]); + /* + * Paths are allocated as a part of key_def so + * we need to swap path pointers back - it's OK + * as paths aren't supposed to change. + */ + assert(old_def->parts[i].path_len == new_def->parts[i].path_len); + SWAP(old_def->parts[i].path, new_def->parts[i].path); + } SWAP(*old_def, *new_def); } @@ -144,24 +167,39 @@ static void key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno, enum field_type type, enum on_conflict_action nullable_action, struct coll *coll, uint32_t coll_id, - enum sort_order sort_order) + enum sort_order sort_order, const char *path, + uint32_t path_len, char **static_pool) { assert(part_no < def->part_count); assert(type < field_type_MAX); def->is_nullable |= (nullable_action == ON_CONFLICT_ACTION_NONE); + def->has_json_paths |= path != NULL; def->parts[part_no].nullable_action = nullable_action; def->parts[part_no].fieldno = fieldno; def->parts[part_no].type = type; def->parts[part_no].coll = coll; def->parts[part_no].coll_id = coll_id; def->parts[part_no].sort_order = sort_order; + if (path != NULL) { + assert(static_pool != NULL); + def->parts[part_no].path = *static_pool; + *static_pool += path_len; + memcpy(def->parts[part_no].path, path, path_len); + def->parts[part_no].path_len = path_len; + } else { + def->parts[part_no].path = NULL; + def->parts[part_no].path_len = 0; + } column_mask_set_fieldno(&def->column_mask, fieldno); } struct key_def * key_def_new(const struct key_part_def *parts, uint32_t part_count) { - size_t sz = key_def_sizeof(part_count); + ssize_t sz = 0; + for (uint32_t i = 0; i < part_count; i++) + sz += parts[i].path != NULL ? strlen(parts[i].path) : 0; + sz = key_def_sizeof(part_count, sz); struct key_def *def = calloc(1, sz); if (def == NULL) { diag_set(OutOfMemory, sz, "malloc", "struct key_def"); @@ -171,6 +209,7 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count) def->part_count = part_count; def->unique_part_count = part_count; + char *data = (char *)def + key_def_sizeof(part_count, 0); for (uint32_t i = 0; i < part_count; i++) { const struct key_part_def *part = &parts[i]; struct coll *coll = NULL; @@ -184,16 +223,18 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count) } coll = coll_id->coll; } + uint32_t path_len = part->path != NULL ? strlen(part->path) : 0; key_def_set_part(def, i, part->fieldno, part->type, part->nullable_action, coll, part->coll_id, - part->sort_order); + part->sort_order, part->path, path_len, &data); } key_def_set_cmp(def); return def; } -void -key_def_dump_parts(const struct key_def *def, struct key_part_def *parts) +int +key_def_dump_parts(const struct key_def *def, struct key_part_def *parts, + struct region *pool) { for (uint32_t i = 0; i < def->part_count; i++) { const struct key_part *part = &def->parts[i]; @@ -203,13 +244,28 @@ key_def_dump_parts(const struct key_def *def, struct key_part_def *parts) part_def->is_nullable = key_part_is_nullable(part); part_def->nullable_action = part->nullable_action; part_def->coll_id = part->coll_id; + if (part->path != NULL) { + assert(pool != NULL); + char *path = region_alloc(pool, part->path_len + 1); + if (path == NULL) { + diag_set(OutOfMemory, part->path_len + 1, + "region_alloc", "part_def->path"); + return -1; + } + memcpy(path, part->path, part->path_len); + path[part->path_len] = '\0'; + part_def->path = path; + } else { + part_def->path = NULL; + } } + return 0; } box_key_def_t * box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count) { - size_t sz = key_def_sizeof(part_count); + size_t sz = key_def_sizeof(part_count, 0); struct key_def *key_def = calloc(1, sz); if (key_def == NULL) { diag_set(OutOfMemory, sz, "malloc", "struct key_def"); @@ -223,7 +279,8 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count) key_def_set_part(key_def, item, fields[item], (enum field_type)types[item], ON_CONFLICT_ACTION_DEFAULT, - NULL, COLL_NONE, SORT_ORDER_ASC); + NULL, COLL_NONE, SORT_ORDER_ASC, NULL, 0, + NULL); } key_def_set_cmp(key_def); return key_def; @@ -272,6 +329,11 @@ key_part_cmp(const struct key_part *parts1, uint32_t part_count1, if (key_part_is_nullable(part1) != key_part_is_nullable(part2)) return key_part_is_nullable(part1) < key_part_is_nullable(part2) ? -1 : 1; + int rc; + if ((rc = json_path_cmp(part1->path, part1->path_len, + part2->path, part2->path_len, + TUPLE_INDEX_BASE)) != 0) + return rc; } return part_count1 < part_count2 ? -1 : part_count1 > part_count2; } @@ -303,8 +365,15 @@ key_def_snprint_parts(char *buf, int size, const struct key_part_def *parts, for (uint32_t i = 0; i < part_count; i++) { const struct key_part_def *part = &parts[i]; assert(part->type < field_type_MAX); - SNPRINT(total, snprintf, buf, size, "%d, '%s'", - (int)part->fieldno, field_type_strs[part->type]); + if (part->path != NULL) { + SNPRINT(total, snprintf, buf, size, "%d, '%s', '%s'", + (int)part->fieldno, field_type_strs[part->type], + part->path); + } else { + SNPRINT(total, snprintf, buf, size, "%d, '%s'", + (int)part->fieldno, + field_type_strs[part->type]); + } if (i < part_count - 1) SNPRINT(total, snprintf, buf, size, ", "); } @@ -323,6 +392,8 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count) count++; if (part->is_nullable) count++; + if (part->path != NULL) + count++; size += mp_sizeof_map(count); size += mp_sizeof_str(strlen(PART_OPT_FIELD)); size += mp_sizeof_uint(part->fieldno); @@ -337,6 +408,10 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count) size += mp_sizeof_str(strlen(PART_OPT_NULLABILITY)); size += mp_sizeof_bool(part->is_nullable); } + if (part->path != NULL) { + size += mp_sizeof_str(strlen(PART_OPT_PATH)); + size += mp_sizeof_str(strlen(part->path)); + } } return size; } @@ -352,6 +427,8 @@ key_def_encode_parts(char *data, const struct key_part_def *parts, count++; if (part->is_nullable) count++; + if (part->path != NULL) + count++; data = mp_encode_map(data, count); data = mp_encode_str(data, PART_OPT_FIELD, strlen(PART_OPT_FIELD)); @@ -371,6 +448,12 @@ key_def_encode_parts(char *data, const struct key_part_def *parts, strlen(PART_OPT_NULLABILITY)); data = mp_encode_bool(data, part->is_nullable); } + if (part->path != NULL) { + data = mp_encode_str(data, PART_OPT_PATH, + strlen(PART_OPT_PATH)); + data = mp_encode_str(data, part->path, + strlen(part->path)); + } } return data; } @@ -432,6 +515,7 @@ key_def_decode_parts_166(struct key_part_def *parts, uint32_t part_count, fields[part->fieldno].is_nullable : key_part_def_default.is_nullable); part->coll_id = COLL_NONE; + part->path = NULL; } return 0; } @@ -439,7 +523,7 @@ key_def_decode_parts_166(struct key_part_def *parts, uint32_t part_count, int key_def_decode_parts(struct key_part_def *parts, uint32_t part_count, const char **data, const struct field_def *fields, - uint32_t field_count) + uint32_t field_count, struct region *pool) { if (mp_typeof(**data) == MP_ARRAY) { return key_def_decode_parts_166(parts, part_count, data, @@ -468,7 +552,7 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count, const char *key = mp_decode_str(data, &key_len); if (opts_parse_key(part, part_def_reg, key, key_len, data, ER_WRONG_INDEX_OPTIONS, - i + TUPLE_INDEX_BASE, NULL, + i + TUPLE_INDEX_BASE, pool, false) != 0) return -1; if (is_action_missing && @@ -514,6 +598,28 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count, "index part: unknown sort order"); return -1; } + if (part->path != NULL) { + uint32_t path_len = strlen(part->path); + int rc = 0; + if (*part->path != '.' && *part->path != '[') { + /* JSON path should be relative. */ + rc = 1; + } else { + rc = json_path_validate(part->path, path_len, + TUPLE_INDEX_BASE); + } + if (rc != 0) { + const char *err_msg = + tt_sprintf("invalid JSON path '%s': " + "path has invalid structure " + "(error at position %d)", + part->path, rc); + diag_set(ClientError, ER_WRONG_INDEX_OPTIONS, + part->fieldno + TUPLE_INDEX_BASE, + err_msg); + return -1; + } + } } return 0; } @@ -533,7 +639,10 @@ key_def_find(const struct key_def *key_def, const struct key_part *to_find) const struct key_part *part = key_def->parts; const struct key_part *end = part + key_def->part_count; for (; part != end; part++) { - if (part->fieldno == to_find->fieldno) + if (part->fieldno == to_find->fieldno && + json_path_cmp(part->path, part->path_len, + to_find->path, to_find->path_len, + TUPLE_INDEX_BASE) == 0) return part; } return NULL; @@ -559,18 +668,25 @@ key_def_merge(const struct key_def *first, const struct key_def *second) * Find and remove part duplicates, i.e. parts counted * twice since they are present in both key defs. */ - const struct key_part *part = second->parts; - const struct key_part *end = part + second->part_count; + size_t sz = 0; + const struct key_part *part = first->parts; + const struct key_part *end = part + first->part_count; + for (; part != end; part++) + sz += part->path_len; + part = second->parts; + end = part + second->part_count; for (; part != end; part++) { if (key_def_find(first, part) != NULL) --new_part_count; + else + sz += part->path_len; } + sz = key_def_sizeof(new_part_count, sz); struct key_def *new_def; - new_def = (struct key_def *)calloc(1, key_def_sizeof(new_part_count)); + new_def = (struct key_def *)calloc(1, sz); if (new_def == NULL) { - diag_set(OutOfMemory, key_def_sizeof(new_part_count), "malloc", - "new_def"); + diag_set(OutOfMemory, sz, "malloc", "new_def"); return NULL; } new_def->part_count = new_part_count; @@ -578,6 +694,8 @@ key_def_merge(const struct key_def *first, const struct key_def *second) new_def->is_nullable = first->is_nullable || second->is_nullable; new_def->has_optional_parts = first->has_optional_parts || second->has_optional_parts; + /* Path data write position in the new key_def. */ + char *data = (char *)new_def + key_def_sizeof(new_part_count, 0); /* Write position in the new key def. */ uint32_t pos = 0; /* Append first key def's parts to the new index_def. */ @@ -586,7 +704,8 @@ key_def_merge(const struct key_def *first, const struct key_def *second) for (; part != end; part++) { key_def_set_part(new_def, pos++, part->fieldno, part->type, part->nullable_action, part->coll, - part->coll_id, part->sort_order); + part->coll_id, part->sort_order, part->path, + part->path_len, &data); } /* Set-append second key def's part to the new key def. */ @@ -597,7 +716,8 @@ key_def_merge(const struct key_def *first, const struct key_def *second) continue; key_def_set_part(new_def, pos++, part->fieldno, part->type, part->nullable_action, part->coll, - part->coll_id, part->sort_order); + part->coll_id, part->sort_order, part->path, + part->path_len, &data); } key_def_set_cmp(new_def); return new_def; diff --git a/src/box/key_def.h b/src/box/key_def.h index d4da6c5a1..df498964c 100644 --- a/src/box/key_def.h +++ b/src/box/key_def.h @@ -68,6 +68,11 @@ struct key_part_def { enum on_conflict_action nullable_action; /** Part sort order. */ enum sort_order sort_order; + /** + * JSON path to indexed data, relative to the field number, + * or NULL if this key part indexes a top-level field. + */ + const char *path; }; extern const struct key_part_def key_part_def_default; @@ -86,6 +91,15 @@ struct key_part { enum on_conflict_action nullable_action; /** Part sort order. */ enum sort_order sort_order; + /** + * JSON path to indexed data, relative to the field number, + * or NULL if this key part indexes a top-level field. + * This sting is not 0-terminated. Memory is allocated + * at the end of key_def region. + */ + char *path; + /** The length of JSON path. */ + uint32_t path_len; }; struct key_def; @@ -152,6 +166,8 @@ struct key_def { uint32_t unique_part_count; /** True, if at least one part can store NULL. */ bool is_nullable; + /** True, if some key part has JSON path. */ + bool has_json_paths; /** * True, if some key parts can be absent in a tuple. These * fields assumed to be MP_NIL. @@ -245,9 +261,10 @@ box_tuple_compare_with_key(const box_tuple_t *tuple_a, const char *key_b, /** \endcond public */ static inline size_t -key_def_sizeof(uint32_t part_count) +key_def_sizeof(uint32_t part_count, uint32_t paths_size) { - return sizeof(struct key_def) + sizeof(struct key_part) * part_count; + return sizeof(struct key_def) + sizeof(struct key_part) * part_count + + paths_size; } /** @@ -259,9 +276,13 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count); /** * Dump part definitions of the given key def. + * Region is required to make allocation for JSON path if some + * such path present. JSON path strings are 0-terminated. + * Return -1 on memory allocation error, 0 on success. */ -void -key_def_dump_parts(const struct key_def *def, struct key_part_def *parts); +int +key_def_dump_parts(const struct key_def *def, struct key_part_def *parts, + struct region *pool); /** * Update 'has_optional_parts' of @a key_def with correspondence @@ -307,7 +328,7 @@ key_def_encode_parts(char *data, const struct key_part_def *parts, int key_def_decode_parts(struct key_part_def *parts, uint32_t part_count, const char **data, const struct field_def *fields, - uint32_t field_count); + uint32_t field_count, struct region *pool); /** * Returns the part in index_def->parts for the specified fieldno. @@ -368,6 +389,8 @@ key_validate_parts(const struct key_def *key_def, const char *key, static inline bool key_def_is_sequential(const struct key_def *key_def) { + if (key_def->has_json_paths) + return false; for (uint32_t part_id = 0; part_id < key_def->part_count; part_id++) { if (key_def->parts[part_id].fieldno != part_id) return false; diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc index 7cae436f1..1f152917e 100644 --- a/src/box/lua/space.cc +++ b/src/box/lua/space.cc @@ -296,6 +296,11 @@ lbox_fillspace(struct lua_State *L, struct space *space, int i) lua_pushnumber(L, part->fieldno + TUPLE_INDEX_BASE); lua_setfield(L, -2, "fieldno"); + if (part->path != NULL) { + lua_pushlstring(L, part->path, part->path_len); + lua_setfield(L, -2, "path"); + } + lua_pushboolean(L, key_part_is_nullable(part)); lua_setfield(L, -2, "is_nullable"); diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c index 5cf70ab94..2cae791e1 100644 --- a/src/box/memtx_engine.c +++ b/src/box/memtx_engine.c @@ -1317,6 +1317,10 @@ memtx_index_def_change_requires_rebuild(struct index *index, return true; if (old_part->coll != new_part->coll) return true; + if (json_path_cmp(old_part->path, old_part->path_len, + new_part->path, new_part->path_len, + TUPLE_INDEX_BASE) != 0) + return true; } return false; } diff --git a/src/box/sql.c b/src/box/sql.c index 2cb0edbff..b5d0eefd4 100644 --- a/src/box/sql.c +++ b/src/box/sql.c @@ -380,6 +380,7 @@ sql_ephemeral_space_create(uint32_t field_count, struct sql_key_info *key_info) part->nullable_action = ON_CONFLICT_ACTION_NONE; part->is_nullable = true; part->sort_order = SORT_ORDER_ASC; + part->path = NULL; if (def != NULL && i < def->part_count) part->coll_id = def->parts[i].coll_id; else diff --git a/src/box/sql/build.c b/src/box/sql/build.c index ab42694f4..68bc77f10 100644 --- a/src/box/sql/build.c +++ b/src/box/sql/build.c @@ -2423,6 +2423,7 @@ index_fill_def(struct Parse *parse, struct index *index, part->is_nullable = part->nullable_action == ON_CONFLICT_ACTION_NONE; part->sort_order = SORT_ORDER_ASC; part->coll_id = coll_id; + part->path = NULL; } key_def = key_def_new(key_parts, expr_list->nExpr); if (key_def == NULL) diff --git a/src/box/sql/select.c b/src/box/sql/select.c index 02ee225f1..3f136a342 100644 --- a/src/box/sql/select.c +++ b/src/box/sql/select.c @@ -1360,6 +1360,7 @@ sql_key_info_new(sqlite3 *db, uint32_t part_count) part->is_nullable = false; part->nullable_action = ON_CONFLICT_ACTION_ABORT; part->sort_order = SORT_ORDER_ASC; + part->path = NULL; } return key_info; } @@ -1377,7 +1378,7 @@ sql_key_info_new_from_key_def(sqlite3 *db, const struct key_def *key_def) key_info->key_def = NULL; key_info->refs = 1; key_info->part_count = key_def->part_count; - key_def_dump_parts(key_def, key_info->parts); + key_def_dump_parts(key_def, key_info->parts, NULL); return key_info; } diff --git a/src/box/sql/where.c b/src/box/sql/where.c index 9c3462bc0..78f70f4b5 100644 --- a/src/box/sql/where.c +++ b/src/box/sql/where.c @@ -2807,6 +2807,7 @@ whereLoopAddBtree(WhereLoopBuilder * pBuilder, /* WHERE clause information */ part.is_nullable = false; part.sort_order = SORT_ORDER_ASC; part.coll_id = COLL_NONE; + part.path = NULL; struct key_def *key_def = key_def_new(&part, 1); if (key_def == NULL) { diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc index e21b0096c..554c29f83 100644 --- a/src/box/tuple_compare.cc +++ b/src/box/tuple_compare.cc @@ -469,7 +469,8 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, struct key_part *part = key_def->parts; const char *tuple_a_raw = tuple_data(tuple_a); const char *tuple_b_raw = tuple_data(tuple_b); - if (key_def->part_count == 1 && part->fieldno == 0) { + if (key_def->part_count == 1 && part->fieldno == 0 && + part->path == NULL) { /* * First field can not be optional - empty tuples * can not exist. @@ -493,8 +494,8 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, } bool was_null_met = false; - const struct tuple_format *format_a = tuple_format(tuple_a); - const struct tuple_format *format_b = tuple_format(tuple_b); + struct tuple_format *format_a = tuple_format(tuple_a); + struct tuple_format *format_b = tuple_format(tuple_b); const uint32_t *field_map_a = tuple_field_map(tuple_a); const uint32_t *field_map_b = tuple_field_map(tuple_b); struct key_part *end; @@ -585,7 +586,7 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, assert(key != NULL || part_count == 0); assert(part_count <= key_def->part_count); struct key_part *part = key_def->parts; - const struct tuple_format *format = tuple_format(tuple); + struct tuple_format *format = tuple_format(tuple); const char *tuple_raw = tuple_data(tuple); const uint32_t *field_map = tuple_field_map(tuple); enum mp_type a_type, b_type; @@ -1027,7 +1028,7 @@ tuple_compare_create(const struct key_def *def) } } assert(! def->has_optional_parts); - if (!key_def_has_collation(def)) { + if (!key_def_has_collation(def) && !def->has_json_paths) { /* Precalculated comparators don't use collation */ for (uint32_t k = 0; k < sizeof(cmp_arr) / sizeof(cmp_arr[0]); k++) { @@ -1247,7 +1248,7 @@ tuple_compare_with_key_create(const struct key_def *def) } } assert(! def->has_optional_parts); - if (!key_def_has_collation(def)) { + if (!key_def_has_collation(def) && !def->has_json_paths) { /* Precalculated comparators don't use collation */ for (uint32_t k = 0; k < sizeof(cmp_wk_arr) / sizeof(cmp_wk_arr[0]); diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc index e9d7cac3e..c40d7887d 100644 --- a/src/box/tuple_extract_key.cc +++ b/src/box/tuple_extract_key.cc @@ -10,7 +10,8 @@ key_def_parts_are_sequential(const struct key_def *def, int i) { uint32_t fieldno1 = def->parts[i].fieldno + 1; uint32_t fieldno2 = def->parts[i + 1].fieldno; - return fieldno1 == fieldno2; + return fieldno1 == fieldno2 && def->parts[i].path == NULL && + def->parts[i + 1].path == NULL; } /** True, if a key con contain two or more parts in sequence. */ @@ -111,7 +112,7 @@ tuple_extract_key_slowpath(const struct tuple *tuple, const char *data = tuple_data(tuple); uint32_t part_count = key_def->part_count; uint32_t bsize = mp_sizeof_array(part_count); - const struct tuple_format *format = tuple_format(tuple); + struct tuple_format *format = tuple_format(tuple); const uint32_t *field_map = tuple_field_map(tuple); const char *tuple_end = data + tuple->bsize; @@ -241,7 +242,8 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, if (!key_def_parts_are_sequential(key_def, i)) break; } - uint32_t end_fieldno = key_def->parts[i].fieldno; + const struct key_part *part = &key_def->parts[i]; + uint32_t end_fieldno = part->fieldno; if (fieldno < current_fieldno) { /* Rewind. */ @@ -283,6 +285,22 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, current_fieldno++; } } + const char *field_last, *field_end_last; + if (part->path != NULL) { + field_last = field; + field_end_last = field_end; + MAYBE_UNUSED int rc = + tuple_field_go_to_path(&field, part->path, + part->path_len); + /* + * All tuples must be valid as all + * integrity checks has already been + * passed. + */ + assert(rc == 0); + field_end = field; + mp_next(&field_end); + } memcpy(key_buf, field, field_end - field); key_buf += field_end - field; if (has_optional_parts && null_count != 0) { @@ -291,6 +309,10 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, } else { assert(key_buf - key <= data_end - data); } + if (part->path != NULL) { + field = field_last; + field_end = field_end_last; + } } if (key_size != NULL) *key_size = (uint32_t)(key_buf - key); diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 8338bba44..4314d3b1d 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -28,6 +28,7 @@ * THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ +#include "fiber.h" #include "json/json.h" #include "tuple_format.h" #include "coll_id_cache.h" @@ -64,30 +65,160 @@ tuple_field_delete(struct tuple_field *field) /** Build the JSON path by field specified. */ static const char * tuple_field_json_path(const struct tuple_format *format, - struct tuple_field *field) + struct tuple_field *field, + struct region *region) { + /* Don't put brackets for first-level fields. */ + bool brackets = true; + if (field->token.parent == &format->fields.root) + brackets = false; + + uint32_t token_path_sz = sizeof(void *)*format->max_path_tokens; + uint32_t token_path_len = 0; + struct json_token **token_path = region_alloc(region, token_path_sz); + if (token_path == NULL) { + diag_set(OutOfMemory, token_path_sz, "region_alloc", + "token_path"); + return NULL; + } + uint32_t path_size = 1; struct json_token *token = &field->token; - const char *path; - if (token->parent == &format->fields.root && - token->num < (int)format->dict->name_count) { - const char *field_name = - format->dict->names[token->num]; - path = tt_sprintf("\"%s\"", field_name); - } else if (token->type == JSON_TOKEN_NUM) { - path = tt_sprintf("%u", token->num + TUPLE_INDEX_BASE); - } else { - unreachable(); + while (token != &format->fields.root) { + token_path[token_path_len++] = token; + if (token->parent == &format->fields.root && + token->num < (int)format->dict->name_count) { + const char *field_name = + format->dict->names[token->num]; + path_size += 4 + strlen(field_name); + } else if (token->type == JSON_TOKEN_NUM) { + uint32_t digits = 0; + for (int num = token->num + TUPLE_INDEX_BASE; num > 0; + num /= 10) + digits++; + path_size += 2 + digits; + } else if (token->type == JSON_TOKEN_STR) { + path_size += 4 + token->len; + } else { + unreachable(); + } + token = token->parent; } + char *path = region_alloc(region, path_size); + if (path == NULL) { + diag_set(OutOfMemory, path_size, "region_alloc", "path"); + return NULL; + } + char *wptr = path; + for (int i = token_path_len - 1; i >= 0; i--) { + token = token_path[i]; + if (token->parent == &format->fields.root && + token->num < (int)format->dict->name_count) { + const char *field_name = + format->dict->names[token->num]; + wptr += sprintf(wptr, brackets ? "[\"%s\"]" : "\"%s\"", + field_name); + } else if (token->type == JSON_TOKEN_NUM) { + wptr += sprintf(wptr, brackets ? "[%u]" : "%u", + token->num + TUPLE_INDEX_BASE); + } else if (token->type == JSON_TOKEN_STR) { + wptr += sprintf(wptr, "[\"%.*s\"]", token->len, + token->str); + } else { + unreachable(); + } + } + *wptr = '\0'; return path; } +/** Build a JSON tree path for specified path. */ +static struct tuple_field * +tuple_field_tree_add_path(struct tuple_format *format, const char *path, + uint32_t path_len, uint32_t fieldno) +{ + int rc = 0; + struct json_tree *tree = &format->fields; + struct tuple_field *parent = tuple_format_field(format, fieldno); + struct tuple_field *field = tuple_field_new(); + if (field == NULL) + goto fail; + + struct json_lexer lexer; + uint32_t token_count = 0; + json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE); + while ((rc = json_lexer_next_token(&lexer, &field->token)) == 0 && + field->token.type != JSON_TOKEN_END) { + enum field_type expected_type = + field->token.type == JSON_TOKEN_STR ? + FIELD_TYPE_MAP : FIELD_TYPE_ARRAY; + if (parent->type != FIELD_TYPE_ANY && + parent->type != expected_type) { + /* Parent field has incompatable type. */ + const char *path = tuple_field_json_path(format, parent, + &fiber()->gc); + if (path != NULL) { + diag_set(ClientError, + ER_INDEX_PART_TYPE_MISMATCH, path, + field_type_strs[parent->type], + field_type_strs[expected_type]); + } + goto fail; + } + struct tuple_field *next = + json_tree_lookup_entry(tree, &parent->token, + &field->token, + struct tuple_field, token); + if (next == NULL) { + rc = json_tree_add(tree, &parent->token, &field->token); + if (rc != 0) { + diag_set(OutOfMemory, sizeof(struct json_token), + "json_tree_add", "tree"); + goto fail; + } + next = field; + field = tuple_field_new(); + if (field == NULL) + goto fail; + } + parent->type = expected_type; + parent = next; + token_count++; + } + assert(rc == 0 && field->token.type == JSON_TOKEN_END); + assert(parent != NULL); + /* Update tree depth information. */ + format->max_path_tokens = MAX(format->max_path_tokens, token_count + 1); +end: + tuple_field_delete(field); + return parent; +fail: + parent = NULL; + goto end; +} + static int tuple_format_use_key_part(struct tuple_format *format, uint32_t field_count, const struct key_part *part, bool is_sequential, - int *current_slot) + int *current_slot, char **path_data) { assert(part->fieldno < tuple_format_field_count(format)); - struct tuple_field *field = tuple_format_field(format, part->fieldno); + struct tuple_field *field; + if (part->path == NULL) { + field = tuple_format_field(format, part->fieldno); + } else { + assert(!is_sequential); + /** + * Copy JSON path data to reserved area at the + * end of format allocation. + */ + memcpy(*path_data, part->path, part->path_len); + field = tuple_field_tree_add_path(format, *path_data, + part->path_len, + part->fieldno); + if (field == NULL) + return -1; + *path_data += part->path_len; + } /* * If a field is not present in the space format, * inherit nullable action of the first key part @@ -112,11 +243,13 @@ tuple_format_use_key_part(struct tuple_format *format, uint32_t field_count, if (field->nullable_action == ON_CONFLICT_ACTION_NONE) field->nullable_action = part->nullable_action; } else if (field->nullable_action != part->nullable_action) { - const char *path = tuple_field_json_path(format, field); - assert(path != NULL); - diag_set(ClientError, ER_ACTION_MISMATCH, path, - on_conflict_action_strs[field->nullable_action], - on_conflict_action_strs[part->nullable_action]); + const char *path = tuple_field_json_path(format, field, + &fiber()->gc); + if (path != NULL) { + diag_set(ClientError, ER_ACTION_MISMATCH, path, + on_conflict_action_strs[field->nullable_action], + on_conflict_action_strs[part->nullable_action]); + } return -1; } @@ -136,11 +269,13 @@ tuple_format_use_key_part(struct tuple_format *format, uint32_t field_count, errcode = ER_FORMAT_MISMATCH_INDEX_PART; else errcode = ER_INDEX_PART_TYPE_MISMATCH; - const char *path = tuple_field_json_path(format, field); - assert(path != NULL); - diag_set(ClientError, errcode, path, - field_type_strs[field->type], - field_type_strs[part->type]); + const char *path = tuple_field_json_path(format, field, + &fiber()->gc); + if (path != NULL) { + diag_set(ClientError, errcode, path, + field_type_strs[field->type], + field_type_strs[part->type]); + } return -1; } field->is_key_part = true; @@ -150,7 +285,8 @@ tuple_format_use_key_part(struct tuple_format *format, uint32_t field_count, * simply accessible, so we don't store an offset for it. */ if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL && - is_sequential == false && part->fieldno > 0) { + is_sequential == false && + (part->fieldno > 0 || part->path != NULL)) { *current_slot = *current_slot - 1; field->offset_slot = *current_slot; } @@ -195,6 +331,7 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, int current_slot = 0; + char *paths_data = (char *)format + sizeof(struct tuple_format); /* extract field type info */ for (uint16_t key_no = 0; key_no < key_count; ++key_no) { const struct key_def *key_def = keys[key_no]; @@ -205,7 +342,8 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, for (; part < parts_end; part++) { if (tuple_format_use_key_part(format, field_count, part, is_sequential, - ¤t_slot) != 0) + ¤t_slot, + &paths_data) != 0) return -1; } } @@ -241,6 +379,7 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, format->field_map_size); for (int i = -1; i >= current_slot; i--) field_map_template[i] = UINT32_MAX; + int id = 0; struct tuple_field *field; struct json_token *root = (struct json_token *)&format->fields.root; json_tree_foreach_entry_preorder(field, root, struct tuple_field, @@ -255,7 +394,53 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys, if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL && tuple_field_is_nullable(field)) field_map_template[field->offset_slot] = 0; + + /* + * Estimate the size of vy_stmt secondary key + * tuple. All leaf records are assumed to be + * nil(s). + */ + int size = 0; + struct json_token *curr_node = &field->token; + enum field_type parent_type = + curr_node->parent == &format->fields.root ? + FIELD_TYPE_ARRAY : + json_tree_entry(curr_node->parent, struct tuple_field, + token)->type; + if (parent_type == FIELD_TYPE_ARRAY) { + /* + * Account a gap between neighboring + * fields filled with nil(s) when parent + * field type is FIELD_TYPE_ARRAY. + */ + int nulls = 0; + for (int i = field->token.sibling_idx - 1; + i > 0 && curr_node->parent->children[i] == NULL; + i--) + nulls++; + size += nulls * mp_sizeof_nil(); + } else if (parent_type == FIELD_TYPE_MAP) { + /* + * Account memory required for map key + * string when parent field type is + * FIELD_TYPE_MAP. + */ + size += mp_sizeof_str(field->token.len); + } + if (field->token.max_child_idx == -1) { + size += mp_sizeof_nil(); + } else if (field->type == FIELD_TYPE_ARRAY) { + size += mp_sizeof_array(field->token.max_child_idx); + } else if (field->type == FIELD_TYPE_MAP) { + size += mp_sizeof_map(field->token.max_child_idx); + } + format->vy_stmt_meta_size += size; + + /* Assign unique identifier for each field. */ + field->id = id++; } + /* Total amount of fields in format. */ + format->total_field_count = id; return 0; } @@ -325,6 +510,8 @@ static struct tuple_format * tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, uint32_t space_field_count, struct tuple_dictionary *dict) { + /* Size of area to store paths. */ + uint32_t paths_size = 0; uint32_t index_field_count = 0; /* find max max field no */ for (uint16_t key_no = 0; key_no < key_count; ++key_no) { @@ -334,13 +521,15 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, for (; part < pend; part++) { index_field_count = MAX(index_field_count, part->fieldno + 1); + paths_size += part->path_len; } } uint32_t field_count = MAX(space_field_count, index_field_count); - struct tuple_format *format = malloc(sizeof(struct tuple_format)); + uint32_t allocation_size = sizeof(struct tuple_format) + paths_size; + struct tuple_format *format = malloc(allocation_size); if (format == NULL) { - diag_set(OutOfMemory, sizeof(struct tuple_format), "malloc", + diag_set(OutOfMemory, allocation_size, "malloc", "tuple format"); return NULL; } @@ -354,6 +543,7 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, struct tuple_field *field = tuple_field_new(); if (field == NULL) goto error; + field->id = fieldno; field->token.num = fieldno; field->token.type = JSON_TOKEN_NUM; if (json_tree_add(&format->fields, &format->fields.root, @@ -373,6 +563,9 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count, format->dict = dict; tuple_dictionary_ref(dict); } + format->max_path_tokens = 1; + format->total_field_count = field_count; + format->vy_stmt_meta_size = 0; format->refs = 0; format->id = FORMAT_ID_NIL; format->index_field_count = index_field_count; @@ -434,16 +627,32 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1, { if (format1->exact_field_count != format2->exact_field_count) return false; - uint32_t format1_field_count = tuple_format_field_count(format1); - uint32_t format2_field_count = tuple_format_field_count(format2); - for (uint32_t i = 0; i < format1_field_count; ++i) { - const struct tuple_field *field1 = - tuple_format_field(format1, i); + struct tuple_field *field1; + struct json_token *field2_prev_token = &format2->fields.root; + struct json_token *field1_prev_token = &format1->fields.root; + json_tree_foreach_entry_preorder(field1, &format1->fields.root, + struct tuple_field, token) { +next: + /* + * While switching to the next item, it may be + * necessary to update the parents of both tree + * iterators. + */ + while (field1_prev_token != field1->token.parent) { + field1_prev_token = field1_prev_token->parent; + field2_prev_token = field2_prev_token->parent; + assert(field1_prev_token != NULL); + } + struct tuple_field *field2 = + json_tree_lookup_entry(&format2->fields, + field2_prev_token, + &field1->token, + struct tuple_field, token); /* * The field has a data type in format1, but has * no data type in format2. */ - if (i >= format2_field_count) { + if (field2 == NULL) { /* * The field can get a name added * for it, and this doesn't require a data @@ -454,13 +663,22 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1, * NULLs or miss the subject field. */ if (field1->type == FIELD_TYPE_ANY && - tuple_field_is_nullable(field1)) - continue; - else + tuple_field_is_nullable(field1)) { + /* Skip subtree. */ + struct json_token *root = &format1->fields.root; + struct json_token *next = + json_tree_preorder_next(root, + &field1->token); + field1 = json_tree_entry_safe(next, + struct tuple_field, + token); + if (field1 == NULL) + break; + goto next; + } else { return false; + } } - const struct tuple_field *field2 = - tuple_format_field(format2, i); if (! field_type1_contains_type2(field1->type, field2->type)) return false; /* @@ -470,10 +688,28 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1, if (tuple_field_is_nullable(field2) && !tuple_field_is_nullable(field1)) return false; + + field2_prev_token = &field2->token; + field1_prev_token = &field1->token; } return true; } +/** Find a field in format by offset slot. */ +static struct tuple_field * +tuple_field_by_offset_slot(const struct tuple_format *format, + int32_t offset_slot) +{ + struct tuple_field *field; + struct json_token *root = (struct json_token *)&format->fields.root; + json_tree_foreach_entry_preorder(field, root, struct tuple_field, + token) { + if (field->offset_slot == offset_slot) + return field; + } + return NULL; +} + /** * Verify that all offset_slots has been initialized in field_map. * Routine relies on the field_map memory has been filled from the @@ -488,6 +724,32 @@ tuple_field_map_validate(const struct tuple_format *format, uint32_t *field_map) for (int32_t i = -1; i >= -field_map_items; i--) { if (field_map[i] != UINT32_MAX) continue; + + struct tuple_field *field = + tuple_field_by_offset_slot(format, i); + assert(field != NULL); + /* Lookup for field number in tree. */ + const char *path = + tuple_field_json_path(format, field, &fiber()->gc); + if (path == NULL) + return -1; + + struct json_token *token = &field->token; + const char *err_msg; + if (field->token.type == JSON_TOKEN_STR) { + err_msg = tt_sprintf("invalid field \"%s\" document " + "content: map doesn't contain a " + "key '%.*s' defined in index", + path, token->len, token->str); + } else if (field->token.type == JSON_TOKEN_NUM) { + uint32_t expected_size = + token->parent->max_child_idx + 1; + err_msg = tt_sprintf("invalid field \"%s\" document " + "content: array size %d is less " + "than size %d defined in index", + path, token->num, expected_size); + } + diag_set(ClientError, ER_DATA_STRUCTURE_MISMATCH, err_msg); return -1; } return 0; @@ -504,6 +766,90 @@ mp_type_is_compatible(enum mp_type mp_type, enum field_type type, return (mask & (1U << mp_type)) != 0; } +/** + * Descriptor of the parsed msgpack frame. + * Due to the fact that the msgpack has nested structures whose + * length is stored in the frame header at the blob beginning, we + * need to be able to determine that we have finished parsing the + * current component and should move on to the next one. + * For this purpose a stack of disassembled levels is organized, + * where the type of the level, the total number of elements, + * and the number of elements that have already been parsed are + * stored. + */ +struct mp_frame { + /** JSON token type representing frame data structure. */ + enum json_token_type child_type; + /** Total count of MP members to process. */ + uint32_t total; + /** Count of MP elements that already have parseed. */ + uint32_t curr; +}; + +/** + * Emit token to analyze and do msgpack pointer shift using top + * mp_stack frame. Return 0 on success, -1 when analyse step must + * be skipped (on usuported term detection). + */ +static int +mp_frame_parse(struct mp_frame *mp_stack, uint32_t mp_stack_idx, + const char **pos, struct json_token *token) +{ + token->type = mp_stack[mp_stack_idx].child_type; + ++mp_stack[mp_stack_idx].curr; + if (token->type == JSON_TOKEN_NUM) { + token->num = mp_stack[mp_stack_idx].curr - TUPLE_INDEX_BASE; + } else if (token->type == JSON_TOKEN_STR) { + if (mp_typeof(**pos) != MP_STR) { + /* Skip key. */ + mp_next(pos); + return -1; + } + token->str = mp_decode_str(pos, (uint32_t *)&token->len); + } else { + unreachable(); + } + return 0; +} + +/** + * Prepare mp_frame for futher iterations. Store container length + * and child_type. Update parent token pointer and shift msgpack + * pointer. + */ +static int +mp_frame_prepare(struct mp_frame *mp_stack, uint32_t *mp_stack_idx, + uint32_t mp_stack_total, struct json_token *token, + const char **pos, struct json_token **parent) +{ + enum mp_type type = mp_typeof(**pos); + if (token != NULL && *mp_stack_idx + 1 < mp_stack_total && + (type == MP_MAP || type == MP_ARRAY)) { + uint32_t size = type == MP_ARRAY ? mp_decode_array(pos) : + mp_decode_map(pos); + if (size == 0) + return 0; + *parent = token; + enum json_token_type child_type = + type == MP_ARRAY ? JSON_TOKEN_NUM : JSON_TOKEN_STR; + *mp_stack_idx = *mp_stack_idx + 1; + mp_stack[*mp_stack_idx].child_type = child_type; + mp_stack[*mp_stack_idx].total = size; + mp_stack[*mp_stack_idx].curr = 0; + } else { + mp_next(pos); + while (mp_stack[*mp_stack_idx].curr >= + mp_stack[*mp_stack_idx].total) { + assert(*parent != NULL); + *parent = (*parent)->parent; + if (*mp_stack_idx == 0) + return -1; + *mp_stack_idx = *mp_stack_idx - 1; + } + } + return 0; +} + /** @sa declaration for details. */ int tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, @@ -544,12 +890,28 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, memcpy((char *)field_map - format->field_map_size, format->field_map_template, format->field_map_size); } + + struct region *region = &fiber()->gc; + uint32_t mp_stack_size = + format->max_path_tokens * sizeof(struct mp_frame); + struct mp_frame *mp_stack = region_alloc(region, mp_stack_size); + if (mp_stack == NULL) { + diag_set(OutOfMemory, mp_stack_size, "region_alloc", + "mp_stack"); + return -1; + } + mp_stack[0].child_type = JSON_TOKEN_NUM; + mp_stack[0].total = defined_field_count; + mp_stack[0].curr = 0; + uint32_t mp_stack_idx = 0; struct json_tree *tree = (struct json_tree *)&format->fields; struct json_token *parent = &tree->root; - struct json_token token; - token.type = JSON_TOKEN_NUM; - token.num = 0; - while ((uint32_t)token.num < defined_field_count) { + while (mp_stack[0].curr <= mp_stack[0].total) { + /* Prepare key for tree lookup. */ + struct json_token token; + if (mp_frame_parse(mp_stack, mp_stack_idx, &pos, &token) != 0) + goto finish_frame; + struct tuple_field *field = json_tree_lookup_entry(tree, parent, &token, struct tuple_field, token); @@ -560,10 +922,13 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, !mp_type_is_compatible(type, field->type, is_nullable) != 0) { const char *path = - tuple_field_json_path(format, field); - assert(path != NULL); - diag_set(ClientError, ER_FIELD_TYPE, path, - field_type_strs[field->type]); + tuple_field_json_path(format, field, + region); + if (path != NULL) { + diag_set(ClientError, ER_FIELD_TYPE, + path, + field_type_strs[field->type]); + } return -1; } if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { @@ -571,19 +936,16 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, (uint32_t)(pos - tuple); } } - token.num++; - mp_next(&pos); +finish_frame: + /* Prepare stack info for next iteration. */ + if (mp_frame_prepare(mp_stack, &mp_stack_idx, + format->max_path_tokens, + field != NULL ? &field->token : NULL, + &pos, &parent) != 0) + goto end; }; - if (!validate) - return 0; - int rc = tuple_field_map_validate(format, field_map); - /* - * As assert field_count >= min_field_count has already - * tested and all first-level fields has parsed, all - * offset_slots must be initialized. - */ - assert(rc == 0); - return rc; +end: + return validate ? tuple_field_map_validate(format, field_map) : 0; } uint32_t @@ -718,15 +1080,7 @@ tuple_field_go_to_key(const char **field, const char *key, int len) return -1; } -/** - * Retrieve msgpack data by JSON path. - * @param data Pointer to msgpack with data. - * @param path The path to process. - * @param path_len The length of the @path. - * @retval 0 On success. - * @retval >0 On path parsing error, invalid character position. - */ -static int +int tuple_field_go_to_path(const char **data, const char *path, uint32_t path_len) { int rc; @@ -825,3 +1179,33 @@ error: tt_sprintf("error in path on position %d", rc)); return -1; } + +int +tuple_field_by_part_raw_slowpath(struct tuple_format *format, const char *data, + const uint32_t *field_map, + struct key_part *part, const char **raw) +{ + assert(part->path != NULL); + struct tuple_field *field = + tuple_format_field_by_path(format, part->fieldno, part->path, + part->path_len); + if (field != NULL) { + int32_t offset_slot = field->offset_slot; + assert(-offset_slot * sizeof(uint32_t) <= + format->field_map_size); + *raw = field_map[offset_slot] == 0 ? + NULL : data + field_map[offset_slot]; + return 0; + } + /* + * Format doesn't have field representing specified part. + * Make slow tuple parsing. + */ + *raw = tuple_field_raw(format, data, field_map, part->fieldno); + if (*raw == NULL) + return 0; + int rc = 0; + if ((rc = tuple_field_go_to_path(raw, part->path, part->path_len)) != 0) + return rc; + return 0; +} diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index e61e271d5..8c05c3322 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -114,6 +114,8 @@ struct tuple_field { struct coll *coll; /** Collation identifier. */ uint32_t coll_id; + /** Field unique identifier in tuple_format. */ + uint32_t id; /** Link in tuple_format::fields. */ struct json_token token; }; @@ -181,6 +183,27 @@ struct tuple_format { * Shared names storage used by all formats of a space. */ struct tuple_dictionary *dict; + /** + * A maximum depth of fields subtree. This information + * is required for allocating stack for tuple parse + * context during tuple_init_field_map call. + */ + uint32_t max_path_tokens; + /** + * Total count of format fields in fields subtree. + * This information is required to allocate iov array + * argument for vy_stmt_tuple_restore_raw routine that + * is able to store extracted key iov descriptor for each + * field identified by id. + */ + uint32_t total_field_count; + /** + * Calculated size of vinyl's secondary key tuple lacking + * all leaf fields. This information is required in + * vy_stmt_new_surrogate_from_key routine to estimate + * stmt tuple size withut tree traversal. + */ + uint32_t vy_stmt_meta_size; /** * Fields comprising the format, organized in a tree. * First level nodes correspond to tuple fields. @@ -217,6 +240,25 @@ tuple_format_field(struct tuple_format *format, uint32_t fieldno) &token, struct tuple_field, token); } +/** + * Lookup field by relative JSON path and root fieldno in + * format:fields tree. +*/ +static inline struct tuple_field * +tuple_format_field_by_path(struct tuple_format *format, uint32_t fieldno, + const char *path, uint32_t path_len) +{ + uint32_t field_count = tuple_format_field_count(format); + if (fieldno >= field_count) + return NULL; + struct tuple_field *root = tuple_format_field(format, fieldno); + assert(root != NULL); + return json_tree_lookup_path_entry(&format->fields, &root->token, + path, path_len, TUPLE_INDEX_BASE, + struct tuple_field, token); +} + + extern struct tuple_format **tuple_formats; static inline uint32_t @@ -417,6 +459,18 @@ tuple_field_raw_by_name(struct tuple_format *format, const char *tuple, return tuple_field_raw(format, tuple, field_map, fieldno); } +/** + * Retrieve msgpack data by JSON path. + * @param data Pointer to msgpack with data. + * @param path The path to process. + * @param path_len The length of the @path. + * @retval 0 On success. + * @retval >0 On path parsing error, invalid character position. + */ +int +tuple_field_go_to_path(const char **data, const char *path, + uint32_t path_len); + /** * Get tuple field by its path. * @param format Tuple format. @@ -436,6 +490,12 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, uint32_t path_len, uint32_t path_hash, const char **field); +/** Internal function, use tuple_field_by_part_raw instead. */ +int +tuple_field_by_part_raw_slowpath(struct tuple_format *format, const char *data, + const uint32_t *field_map, + struct key_part *part, const char **raw); + /** * Get a tuple field pointed to by an index part. * @param format Tuple format. @@ -445,10 +505,19 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, * @retval Field data if the field exists or NULL. */ static inline const char * -tuple_field_by_part_raw(const struct tuple_format *format, const char *data, +tuple_field_by_part_raw(struct tuple_format *format, const char *data, const uint32_t *field_map, struct key_part *part) { - return tuple_field_raw(format, data, field_map, part->fieldno); + if (likely(part->path == NULL)) { + return tuple_field_raw(format, data, field_map, part->fieldno); + } else { + const char *raw; + MAYBE_UNUSED int rc = + tuple_field_by_part_raw_slowpath(format, data, + field_map, part, &raw); + assert(rc == 0); + return raw; + } } #if defined(__cplusplus) diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc index b394804fe..3486ce11c 100644 --- a/src/box/tuple_hash.cc +++ b/src/box/tuple_hash.cc @@ -222,7 +222,7 @@ key_hash_slowpath(const char *key, struct key_def *key_def); void tuple_hash_func_set(struct key_def *key_def) { - if (key_def->is_nullable) + if (key_def->is_nullable || key_def->has_json_paths) goto slowpath; /* * Check that key_def defines sequential a key without holes diff --git a/src/box/vinyl.c b/src/box/vinyl.c index f5b36ce14..2199ebe09 100644 --- a/src/box/vinyl.c +++ b/src/box/vinyl.c @@ -982,6 +982,10 @@ vinyl_index_def_change_requires_rebuild(struct index *index, return true; if (!field_type1_contains_type2(new_part->type, old_part->type)) return true; + if (json_path_cmp(old_part->path, old_part->path_len, + new_part->path, new_part->path_len, + TUPLE_INDEX_BASE) != 0) + return true; } return false; } diff --git a/src/box/vy_log.c b/src/box/vy_log.c index c9e0713c8..6fc051648 100644 --- a/src/box/vy_log.c +++ b/src/box/vy_log.c @@ -581,9 +581,11 @@ vy_log_record_decode(struct vy_log_record *record, record->group_id = mp_decode_uint(&pos); break; case VY_LOG_KEY_DEF: { + struct region *region = &fiber()->gc; uint32_t part_count = mp_decode_array(&pos); - struct key_part_def *parts = region_alloc(&fiber()->gc, - sizeof(*parts) * part_count); + struct key_part_def *parts = + region_alloc(region, + sizeof(*parts) * part_count); if (parts == NULL) { diag_set(OutOfMemory, sizeof(*parts) * part_count, @@ -591,7 +593,7 @@ vy_log_record_decode(struct vy_log_record *record, return -1; } if (key_def_decode_parts(parts, part_count, &pos, - NULL, 0) != 0) { + NULL, 0, region) != 0) { diag_log(); diag_set(ClientError, ER_INVALID_VYLOG_FILE, "Bad record: failed to decode " @@ -705,7 +707,8 @@ vy_log_record_dup(struct region *pool, const struct vy_log_record *src) "struct key_part_def"); goto err; } - key_def_dump_parts(src->key_def, dst->key_parts); + if (key_def_dump_parts(src->key_def, dst->key_parts, pool) != 0) + goto err; dst->key_part_count = src->key_def->part_count; dst->key_def = NULL; } diff --git a/src/box/vy_point_lookup.c b/src/box/vy_point_lookup.c index ddbc2d46f..14e0c0c93 100644 --- a/src/box/vy_point_lookup.c +++ b/src/box/vy_point_lookup.c @@ -196,8 +196,6 @@ vy_point_lookup(struct vy_lsm *lsm, struct vy_tx *tx, const struct vy_read_view **rv, struct tuple *key, struct tuple **ret) { - assert(tuple_field_count(key) >= lsm->cmp_def->part_count); - *ret = NULL; double start_time = ev_monotonic_now(loop()); int rc = 0; diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c index 3e60fece9..f3f678f99 100644 --- a/src/box/vy_stmt.c +++ b/src/box/vy_stmt.c @@ -370,6 +370,63 @@ vy_stmt_replace_from_upsert(const struct tuple *upsert) return replace; } +/** + * Construct secondary-index tuple and initialize field_map. + * The iov[field->id] array item contains an extracted key + * for indexed field identified with unique field->id. + */ +static void +vy_stmt_tuple_restore_raw(struct tuple_format *format, char *tuple_raw, + uint32_t *field_map, char **offset, struct iovec *iov) +{ + struct tuple_field *curr; + json_tree_foreach_entry_preorder(curr, &format->fields.root, + struct tuple_field, token) { + struct json_token *curr_node = &curr->token; + enum field_type parent_type = + curr_node->parent == &format->fields.root ? FIELD_TYPE_ARRAY : + json_tree_entry(curr_node->parent, struct tuple_field, + token)->type; + if (parent_type == FIELD_TYPE_ARRAY && + curr_node->sibling_idx > 0) { + /* + * Fill unindexed array items with nulls. + * Gaps size calculated as a difference + * between sibling nodes. + */ + for (uint32_t i = curr_node->sibling_idx - 1; + curr_node->parent->children[i] == NULL && + i > 0; i--) + *offset = mp_encode_nil(*offset); + } else if (parent_type == FIELD_TYPE_MAP) { + /* Set map key. */ + const char *str = curr_node->str; + uint32_t len = curr_node->len; + *offset = mp_encode_str(*offset, str, len); + } + /* Fill data. */ + uint32_t children_count = curr_node->max_child_idx + 1; + if (curr_node->max_child_idx == -1) { + /* Leaf record. */ + if (iov[curr->id].iov_len == 0) { + *offset = mp_encode_nil(*offset); + } else { + uint32_t data_offset = *offset - tuple_raw; + int32_t slot = curr->offset_slot; + memcpy(*offset, iov[curr->id].iov_base, + iov[curr->id].iov_len); + if (slot != TUPLE_OFFSET_SLOT_NIL) + field_map[slot] = data_offset; + *offset += iov[curr->id].iov_len; + } + } else if (curr->type == FIELD_TYPE_ARRAY) { + *offset = mp_encode_array(*offset, children_count); + } else if (curr->type == FIELD_TYPE_MAP) { + *offset = mp_encode_map(*offset, children_count); + } + } +} + static struct tuple * vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type, const struct key_def *cmp_def, @@ -380,26 +437,43 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type, struct region *region = &fiber()->gc; uint32_t field_count = format->index_field_count; - struct iovec *iov = region_alloc(region, sizeof(*iov) * field_count); + struct iovec *iov = + region_alloc(region, sizeof(*iov) * format->total_field_count); if (iov == NULL) { - diag_set(OutOfMemory, sizeof(*iov) * field_count, + diag_set(OutOfMemory, sizeof(*iov) * format->total_field_count, "region", "iov for surrogate key"); return NULL; } - memset(iov, 0, sizeof(*iov) * field_count); + memset(iov, 0, sizeof(*iov) * format->total_field_count); uint32_t part_count = mp_decode_array(&key); assert(part_count == cmp_def->part_count); - assert(part_count <= field_count); - uint32_t nulls_count = field_count - cmp_def->part_count; + assert(part_count <= format->total_field_count); + /** + * The format:vy_stmt_meta_size contains a size of + * stmt tuple having all leaf fields set to null. + * Calculate bsize as vy_stmt_meta_size where parts_count + * nulls replaced with extracted keys. + */ uint32_t bsize = mp_sizeof_array(field_count) + - mp_sizeof_nil() * nulls_count; + format->vy_stmt_meta_size - + mp_sizeof_nil() * part_count; for (uint32_t i = 0; i < part_count; ++i) { const struct key_part *part = &cmp_def->parts[i]; assert(part->fieldno < field_count); + struct tuple_field *field; + if (part->path != NULL) { + field = tuple_format_field_by_path(format, + part->fieldno, + part->path, + part->path_len); + } else { + field = tuple_format_field(format, part->fieldno); + } + assert(field != NULL); const char *svp = key; - iov[part->fieldno].iov_base = (char *) key; + iov[field->id].iov_base = (char *) key; mp_next(&key); - iov[part->fieldno].iov_len = key - svp; + iov[field->id].iov_len = key - svp; bsize += key - svp; } @@ -409,18 +483,11 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type, char *raw = (char *) tuple_data(stmt); uint32_t *field_map = (uint32_t *) raw; + memset((char *)field_map - format->field_map_size, 0, + format->field_map_size); char *wpos = mp_encode_array(raw, field_count); - for (uint32_t i = 0; i < field_count; ++i) { - const struct tuple_field *field = tuple_format_field(format, i); - if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) - field_map[field->offset_slot] = wpos - raw; - if (iov[i].iov_base == NULL) { - wpos = mp_encode_nil(wpos); - } else { - memcpy(wpos, iov[i].iov_base, iov[i].iov_len); - wpos += iov[i].iov_len; - } - } + vy_stmt_tuple_restore_raw(format, raw, field_map, &wpos, iov); + assert(wpos == raw + bsize); vy_stmt_set_type(stmt, type); return stmt; diff --git a/test/box/misc.result b/test/box/misc.result index d266bb334..e4a8c0efb 100644 --- a/test/box/misc.result +++ b/test/box/misc.result @@ -415,6 +415,7 @@ t; 83: box.error.ROLE_EXISTS 84: box.error.CREATE_ROLE 85: box.error.INDEX_EXISTS + 86: box.error.DATA_STRUCTURE_MISMATCH 87: box.error.ROLE_LOOP 88: box.error.GRANT 89: box.error.PRIV_GRANTED diff --git a/test/engine/json.result b/test/engine/json.result new file mode 100644 index 000000000..7e7374e6f --- /dev/null +++ b/test/engine/json.result @@ -0,0 +1,450 @@ +test_run = require('test_run').new() +--- +... +engine = test_run:get_cfg('engine') +--- +... +-- +-- gh-1012: Indexes for JSON-defined paths. +-- +s = box.schema.space.create('withdata', {engine = engine}) +--- +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '.FIO["fname"]'}, {3, 'str', path = '["FIO"].fname'}}}) +--- +- error: 'Can''t create or modify index ''test1'' in space ''withdata'': same key + part is indexed twice' +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '["FIO"]["fname"]'}}}) +--- +- error: 'Wrong index options (field 2): ''path'' must be string' +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = '.FIO'}}}) +--- +- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type + ''map'' is not supported' +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[1]'}}}) +--- +- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type + ''array'' is not supported' +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '.FIO'}, {3, 'str', path = '.FIO.fname'}}}) +--- +- error: Field [3]["FIO"] has type 'string' in one index, but type 'map' in another +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[1].sname'}, {3, 'str', path = '["FIO"].fname'}}}) +--- +- error: Field 3 has type 'array' in one index, but type 'map' in another +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '.FIO....fname'}}}) +--- +- error: 'Wrong index options (field 3): invalid JSON path ''.FIO....fname'': path + has invalid structure (error at position 6)' +... +idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '.FIO.fname', is_nullable = false}, {3, 'str', path = '["FIO"]["sname"]'}}}) +--- +... +assert(idx ~= nil) +--- +- true +... +assert(idx.parts[2].path == ".FIO.fname") +--- +- true +... +format = {{'id', 'unsigned'}, {'meta', 'unsigned'}, {'data', 'array'}, {'age', 'unsigned'}, {'level', 'unsigned'}} +--- +... +s:format(format) +--- +- error: Field 3 has type 'array' in one index, but type 'map' in another +... +format = {{'id', 'unsigned'}, {'meta', 'unsigned'}, {'data', 'map'}, {'age', 'unsigned'}, {'level', 'unsigned'}} +--- +... +s:format(format) +--- +... +s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '.FIO.fname'}, {3, 'str', path = '["FIO"]["sname"]'}}}) +--- +- error: Field ["data"]["FIO"]["fname"] has type 'string' in one index, but type 'number' + in another +... +s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5} +--- +- error: 'Tuple field ["data"]["FIO"] type does not match one required by operation: + expected map' +... +s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5} +--- +- error: 'Tuple field ["data"]["FIO"]["fname"] type does not match one required by + operation: expected string' +... +s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5} +--- +- error: 'Tuple doesn''t math document structure: invalid field "["data"]["FIO"]["sname"]" + document content: map doesn''t contain a key ''sname'' defined in index' +... +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +--- +- [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +--- +- error: Duplicate key exists in unique index 'test1' in space 'withdata' +... +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond', data = "extra"}}, 4, 5} +--- +- error: Duplicate key exists in unique index 'test1' in space 'withdata' +... +s:insert{7, 7, {town = 'Moscow', FIO = {fname = 'Max', sname = 'Isaev', data = "extra"}}, 4, 5} +--- +- [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}}, + 4, 5] +... +idx:select() +--- +- - [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] + - [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}}, + 4, 5] +... +idx:min() +--- +- [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +idx:max() +--- +- [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}}, + 4, 5] +... +s:drop() +--- +... +s = box.schema.create_space('withdata', {engine = engine}) +--- +... +parts = {} +--- +... +parts[1] = {1, 'unsigned', path='[2]'} +--- +... +pk = s:create_index('pk', {parts = parts}) +--- +... +s:insert{{1, 2}, 3} +--- +- [[1, 2], 3] +... +s:upsert({{box.null, 2}}, {{'+', 2, 5}}) +--- +... +s:get(2) +--- +- [[1, 2], 8] +... +s:drop() +--- +... +-- Create index on space with data +s = box.schema.space.create('withdata', {engine = engine}) +--- +... +pk = s:create_index('primary', { type = 'tree' }) +--- +... +s:insert{1, 7, {town = 'London', FIO = 1234}, 4, 5} +--- +- [1, 7, {'town': 'London', 'FIO': 1234}, 4, 5] +... +s:insert{2, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +--- +- [2, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +s:insert{3, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +--- +- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +s:insert{4, 7, {town = 'London', FIO = {1,2,3}}, 4, 5} +--- +- [4, 7, {'town': 'London', 'FIO': [1, 2, 3]}, 4, 5] +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}}) +--- +- error: 'Tuple field [3]["FIO"] type does not match one required by operation: expected + map' +... +_ = s:delete(1) +--- +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}}) +--- +- error: Duplicate key exists in unique index 'test1' in space 'withdata' +... +_ = s:delete(2) +--- +... +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}}) +--- +- error: 'Tuple field [3]["FIO"] type does not match one required by operation: expected + map' +... +_ = s:delete(4) +--- +... +idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]', is_nullable = true}, {3, 'str', path = '["FIO"]["sname"]'}, {3, 'str', path = '["FIO"]["extra"]', is_nullable = true}}}) +--- +... +assert(idx ~= nil) +--- +- true +... +s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '["FIO"]["fname"]'}}}) +--- +- error: Field [3]["FIO"]["fname"] has type 'string' in one index, but type 'number' + in another +... +idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}}}) +--- +... +assert(idx2 ~= nil) +--- +- true +... +t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5} +--- +... +idx:select() +--- +- - [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5] + - [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +idx:min() +--- +- [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5] +... +idx:max() +--- +- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5] +... +idx:drop() +--- +... +s:drop() +--- +... +-- Test complex JSON indexes +s = box.schema.space.create('withdata', {engine = engine}) +--- +... +parts = {} +--- +... +parts[1] = {1, 'str', path='[3][2].a'} +--- +... +parts[2] = {1, 'unsigned', path = '[3][1]'} +--- +... +parts[3] = {2, 'str', path = '[2].d[1]'} +--- +... +pk = s:create_index('primary', { type = 'tree', parts = parts}) +--- +... +s:insert{{1, 2, {3, {3, a = 'str', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {1, 2, 3}} +--- +- [[1, 2, [3, {1: 3, 'a': 'str', 'b': 5}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, + [1, 2, 3]] +... +s:insert{{1, 2, {3, {a = 'str', b = 1}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6} +--- +- error: Duplicate key exists in unique index 'primary' in space 'withdata' +... +parts = {} +--- +... +parts[1] = {4, 'unsigned', path='[1]', is_nullable = false} +--- +... +parts[2] = {4, 'unsigned', path='[2]', is_nullable = true} +--- +... +parts[3] = {4, 'unsigned', path='[4]', is_nullable = true} +--- +... +trap_idx = s:create_index('trap', { type = 'tree', parts = parts}) +--- +... +s:insert{{1, 2, {3, {3, a = 'str2', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {}} +--- +- error: 'Tuple doesn''t math document structure: invalid field "[4][1]" document + content: array size 0 is less than size 4 defined in index' +... +parts = {} +--- +... +parts[1] = {1, 'unsigned', path='[3][2].b' } +--- +... +parts[2] = {3, 'unsigned'} +--- +... +crosspart_idx = s:create_index('crosspart', { parts = parts}) +--- +... +s:insert{{1, 2, {3, {a = 'str2', b = 2}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {9, 2, 3}} +--- +- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9, + 2, 3]] +... +parts = {} +--- +... +parts[1] = {1, 'unsigned', path='[3][2].b'} +--- +... +num_idx = s:create_index('numeric', {parts = parts}) +--- +... +s:insert{{1, 2, {3, {a = 'str3', b = 9}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {0}} +--- +- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]] +... +num_idx:get(2) +--- +- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9, + 2, 3]] +... +num_idx:select() +--- +- - [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [ + 9, 2, 3]] + - [[1, 2, [3, {1: 3, 'a': 'str', 'b': 5}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], + 6, [1, 2, 3]] + - [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [ + 0]] +... +num_idx:max() +--- +- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]] +... +num_idx:min() +--- +- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9, + 2, 3]] +... +assert(crosspart_idx:max() == num_idx:max()) +--- +- true +... +assert(crosspart_idx:min() == num_idx:min()) +--- +- true +... +trap_idx:max() +--- +- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9, + 2, 3]] +... +trap_idx:min() +--- +- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]] +... +s:drop() +--- +... +s = box.schema.space.create('withdata', {engine = engine}) +--- +... +pk_simplified = s:create_index('primary', { type = 'tree', parts = {{1, 'unsigned'}}}) +--- +... +assert(pk_simplified.path == box.NULL) +--- +- true +... +idx = s:create_index('idx', {parts = {{2, 'integer', path = '.a'}}}) +--- +... +s:insert{31, {a = 1, aa = -1}} +--- +- [31, {'a': 1, 'aa': -1}] +... +s:insert{22, {a = 2, aa = -2}} +--- +- [22, {'a': 2, 'aa': -2}] +... +s:insert{13, {a = 3, aa = -3}} +--- +- [13, {'a': 3, 'aa': -3}] +... +idx:select() +--- +- - [31, {'a': 1, 'aa': -1}] + - [22, {'a': 2, 'aa': -2}] + - [13, {'a': 3, 'aa': -3}] +... +idx:alter({parts = {{2, 'integer', path = '.aa'}}}) +--- +... +idx:select() +--- +- - [13, {'a': 3, 'aa': -3}] + - [22, {'a': 2, 'aa': -2}] + - [31, {'a': 1, 'aa': -1}] +... +s:drop() +--- +... +-- incompatible format change +s = box.schema.space.create('test') +--- +... +i = s:create_index('pk', {parts = {{1, 'integer', path = '[1]'}}}) +--- +... +s:insert{{-1}} +--- +- [[-1]] +... +i:alter{parts = {{1, 'string', path = '[1]'}}} +--- +- error: 'Tuple field [1][1] type does not match one required by operation: expected + string' +... +s:insert{{'a'}} +--- +- error: 'Tuple field [1][1] type does not match one required by operation: expected + integer' +... +i:drop() +--- +... +i = s:create_index('pk', {parts = {{1, 'integer', path = '[1].FIO'}}}) +--- +... +s:insert{{{FIO=-1}}} +--- +- [[{'FIO': -1}]] +... +i:alter{parts = {{1, 'integer', path = '[1][1]'}}} +--- +- error: 'Tuple field [1][1] type does not match one required by operation: expected + array' +... +i:alter{parts = {{1, 'integer', path = '[1].FIO[1]'}}} +--- +- error: 'Tuple field [1][1]["FIO"] type does not match one required by operation: + expected array' +... +s:drop() +--- +... +engine = nil +--- +... +test_run = nil +--- +... diff --git a/test/engine/json.test.lua b/test/engine/json.test.lua new file mode 100644 index 000000000..50759a5b1 --- /dev/null +++ b/test/engine/json.test.lua @@ -0,0 +1,129 @@ +test_run = require('test_run').new() +engine = test_run:get_cfg('engine') +-- +-- gh-1012: Indexes for JSON-defined paths. +-- +s = box.schema.space.create('withdata', {engine = engine}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '.FIO["fname"]'}, {3, 'str', path = '["FIO"].fname'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '["FIO"]["fname"]'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = '.FIO'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[1]'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '.FIO'}, {3, 'str', path = '.FIO.fname'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[1].sname'}, {3, 'str', path = '["FIO"].fname'}}}) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '.FIO....fname'}}}) +idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '.FIO.fname', is_nullable = false}, {3, 'str', path = '["FIO"]["sname"]'}}}) +assert(idx ~= nil) +assert(idx.parts[2].path == ".FIO.fname") +format = {{'id', 'unsigned'}, {'meta', 'unsigned'}, {'data', 'array'}, {'age', 'unsigned'}, {'level', 'unsigned'}} +s:format(format) +format = {{'id', 'unsigned'}, {'meta', 'unsigned'}, {'data', 'map'}, {'age', 'unsigned'}, {'level', 'unsigned'}} +s:format(format) +s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '.FIO.fname'}, {3, 'str', path = '["FIO"]["sname"]'}}}) +s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5} +s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5} +s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5} +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond', data = "extra"}}, 4, 5} +s:insert{7, 7, {town = 'Moscow', FIO = {fname = 'Max', sname = 'Isaev', data = "extra"}}, 4, 5} +idx:select() +idx:min() +idx:max() +s:drop() + +s = box.schema.create_space('withdata', {engine = engine}) +parts = {} +parts[1] = {1, 'unsigned', path='[2]'} +pk = s:create_index('pk', {parts = parts}) +s:insert{{1, 2}, 3} +s:upsert({{box.null, 2}}, {{'+', 2, 5}}) +s:get(2) +s:drop() + +-- Create index on space with data +s = box.schema.space.create('withdata', {engine = engine}) +pk = s:create_index('primary', { type = 'tree' }) +s:insert{1, 7, {town = 'London', FIO = 1234}, 4, 5} +s:insert{2, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +s:insert{3, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5} +s:insert{4, 7, {town = 'London', FIO = {1,2,3}}, 4, 5} +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}}) +_ = s:delete(1) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}}) +_ = s:delete(2) +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}}) +_ = s:delete(4) +idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]', is_nullable = true}, {3, 'str', path = '["FIO"]["sname"]'}, {3, 'str', path = '["FIO"]["extra"]', is_nullable = true}}}) +assert(idx ~= nil) +s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '["FIO"]["fname"]'}}}) +idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}}}) +assert(idx2 ~= nil) +t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5} +idx:select() +idx:min() +idx:max() +idx:drop() +s:drop() + +-- Test complex JSON indexes +s = box.schema.space.create('withdata', {engine = engine}) +parts = {} +parts[1] = {1, 'str', path='[3][2].a'} +parts[2] = {1, 'unsigned', path = '[3][1]'} +parts[3] = {2, 'str', path = '[2].d[1]'} +pk = s:create_index('primary', { type = 'tree', parts = parts}) +s:insert{{1, 2, {3, {3, a = 'str', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {1, 2, 3}} +s:insert{{1, 2, {3, {a = 'str', b = 1}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6} +parts = {} +parts[1] = {4, 'unsigned', path='[1]', is_nullable = false} +parts[2] = {4, 'unsigned', path='[2]', is_nullable = true} +parts[3] = {4, 'unsigned', path='[4]', is_nullable = true} +trap_idx = s:create_index('trap', { type = 'tree', parts = parts}) +s:insert{{1, 2, {3, {3, a = 'str2', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {}} +parts = {} +parts[1] = {1, 'unsigned', path='[3][2].b' } +parts[2] = {3, 'unsigned'} +crosspart_idx = s:create_index('crosspart', { parts = parts}) +s:insert{{1, 2, {3, {a = 'str2', b = 2}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {9, 2, 3}} +parts = {} +parts[1] = {1, 'unsigned', path='[3][2].b'} +num_idx = s:create_index('numeric', {parts = parts}) +s:insert{{1, 2, {3, {a = 'str3', b = 9}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {0}} +num_idx:get(2) +num_idx:select() +num_idx:max() +num_idx:min() +assert(crosspart_idx:max() == num_idx:max()) +assert(crosspart_idx:min() == num_idx:min()) +trap_idx:max() +trap_idx:min() +s:drop() + +s = box.schema.space.create('withdata', {engine = engine}) +pk_simplified = s:create_index('primary', { type = 'tree', parts = {{1, 'unsigned'}}}) +assert(pk_simplified.path == box.NULL) +idx = s:create_index('idx', {parts = {{2, 'integer', path = '.a'}}}) +s:insert{31, {a = 1, aa = -1}} +s:insert{22, {a = 2, aa = -2}} +s:insert{13, {a = 3, aa = -3}} +idx:select() +idx:alter({parts = {{2, 'integer', path = '.aa'}}}) +idx:select() +s:drop() + +-- incompatible format change +s = box.schema.space.create('test') +i = s:create_index('pk', {parts = {{1, 'integer', path = '[1]'}}}) +s:insert{{-1}} +i:alter{parts = {{1, 'string', path = '[1]'}}} +s:insert{{'a'}} +i:drop() +i = s:create_index('pk', {parts = {{1, 'integer', path = '[1].FIO'}}}) +s:insert{{{FIO=-1}}} +i:alter{parts = {{1, 'integer', path = '[1][1]'}}} +i:alter{parts = {{1, 'integer', path = '[1].FIO[1]'}}} +s:drop() + +engine = nil +test_run = nil + -- 2.19.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v6 5/8] box: introduce has_json_paths flag in templates 2018-12-17 6:52 [PATCH v6 0/8] box: Indexes by JSON path Kirill Shcherbatov ` (3 preceding siblings ...) 2018-12-17 6:52 ` [PATCH v6 4/8] box: introduce JSON Indexes Kirill Shcherbatov @ 2018-12-17 6:52 ` Kirill Shcherbatov 2018-12-27 11:51 ` [tarantool-patches] " Konstantin Osipov 2018-12-17 6:52 ` [PATCH v6 6/8] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov ` (3 subsequent siblings) 8 siblings, 1 reply; 14+ messages in thread From: Kirill Shcherbatov @ 2018-12-17 6:52 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov Introduced has_json_path flag for compare, hash and extract functions(that are really hot) to make possible do not look to path field for flat indexes without any JSON paths. Part of #1012 --- src/box/tuple_compare.cc | 112 +++++++++++++++++++++++++---------- src/box/tuple_extract_key.cc | 104 ++++++++++++++++++++------------ src/box/tuple_hash.cc | 45 ++++++++++---- 3 files changed, 182 insertions(+), 79 deletions(-) diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc index 554c29f83..97963d0d9 100644 --- a/src/box/tuple_compare.cc +++ b/src/box/tuple_compare.cc @@ -458,11 +458,12 @@ tuple_common_key_parts(const struct tuple *tuple_a, const struct tuple *tuple_b, return i; } -template<bool is_nullable, bool has_optional_parts> +template<bool is_nullable, bool has_optional_parts, bool has_json_path> static inline int tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, struct key_def *key_def) { + assert(has_json_path == key_def->has_json_paths); assert(!has_optional_parts || is_nullable); assert(is_nullable == key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); @@ -508,10 +509,19 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, end = part + key_def->part_count; for (; part < end; part++) { - field_a = tuple_field_by_part_raw(format_a, tuple_a_raw, - field_map_a, part); - field_b = tuple_field_by_part_raw(format_b, tuple_b_raw, - field_map_b, part); + if (!has_json_path) { + field_a = tuple_field_raw(format_a, tuple_a_raw, + field_map_a, + part->fieldno); + field_b = tuple_field_raw(format_b, tuple_b_raw, + field_map_b, + part->fieldno); + } else { + field_a = tuple_field_by_part_raw(format_a, tuple_a_raw, + field_map_a, part); + field_b = tuple_field_by_part_raw(format_b, tuple_b_raw, + field_map_b, part); + } assert(has_optional_parts || (field_a != NULL && field_b != NULL)); if (! is_nullable) { @@ -558,10 +568,19 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, */ end = key_def->parts + key_def->part_count; for (; part < end; ++part) { - field_a = tuple_field_by_part_raw(format_a, tuple_a_raw, - field_map_a, part); - field_b = tuple_field_by_part_raw(format_b, tuple_b_raw, - field_map_b, part); + if (!has_json_path) { + field_a = tuple_field_raw(format_a, tuple_a_raw, + field_map_a, + part->fieldno); + field_b = tuple_field_raw(format_b, tuple_b_raw, + field_map_b, + part->fieldno); + } else { + field_a = tuple_field_by_part_raw(format_a, tuple_a_raw, + field_map_a, part); + field_b = tuple_field_by_part_raw(format_b, tuple_b_raw, + field_map_b, part); + } /* * Extended parts are primary, and they can not * be absent or be NULLs. @@ -575,11 +594,12 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b, return 0; } -template<bool is_nullable, bool has_optional_parts> +template<bool is_nullable, bool has_optional_parts, bool has_json_paths> static inline int tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, uint32_t part_count, struct key_def *key_def) { + assert(has_json_paths == key_def->has_json_paths); assert(!has_optional_parts || is_nullable); assert(is_nullable == key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); @@ -591,9 +611,14 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, const uint32_t *field_map = tuple_field_map(tuple); enum mp_type a_type, b_type; if (likely(part_count == 1)) { - const char *field = - tuple_field_by_part_raw(format, tuple_raw, field_map, - part); + const char *field; + if (!has_json_paths) { + field = tuple_field_raw(format, tuple_raw, field_map, + part->fieldno); + } else { + field = tuple_field_by_part_raw(format, tuple_raw, + field_map, part); + } if (! is_nullable) { return tuple_compare_field(field, key, part->type, part->coll); @@ -617,9 +642,14 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key, struct key_part *end = part + part_count; int rc; for (; part < end; ++part, mp_next(&key)) { - const char *field = - tuple_field_by_part_raw(format, tuple_raw, - field_map, part); + const char *field; + if (!has_json_paths) { + field = tuple_field_raw(format, tuple_raw, field_map, + part->fieldno); + } else { + field = tuple_field_by_part_raw(format, tuple_raw, + field_map, part); + } if (! is_nullable) { rc = tuple_compare_field(field, key, part->type, part->coll); @@ -1012,19 +1042,31 @@ static const comparator_signature cmp_arr[] = { #undef COMPARATOR +static const tuple_compare_t compare_slowpath_funcs[] = { + tuple_compare_slowpath<false, false, false>, + tuple_compare_slowpath<true, false, false>, + tuple_compare_slowpath<false, true, false>, + tuple_compare_slowpath<true, true, false>, + tuple_compare_slowpath<false, false, true>, + tuple_compare_slowpath<true, false, true>, + tuple_compare_slowpath<false, true, true>, + tuple_compare_slowpath<true, true, true> +}; + tuple_compare_t tuple_compare_create(const struct key_def *def) { + int cmp_func_idx = (def->is_nullable ? 1 : 0) + + 2 * (def->has_optional_parts ? 1 : 0) + + 4 * (def->has_json_paths ? 1 : 0); if (def->is_nullable) { if (key_def_is_sequential(def)) { if (def->has_optional_parts) return tuple_compare_sequential<true, true>; else return tuple_compare_sequential<true, false>; - } else if (def->has_optional_parts) { - return tuple_compare_slowpath<true, true>; } else { - return tuple_compare_slowpath<true, false>; + return compare_slowpath_funcs[cmp_func_idx]; } } assert(! def->has_optional_parts); @@ -1044,10 +1086,9 @@ tuple_compare_create(const struct key_def *def) return cmp_arr[k].f; } } - if (key_def_is_sequential(def)) - return tuple_compare_sequential<false, false>; - else - return tuple_compare_slowpath<false, false>; + return key_def_is_sequential(def) ? + tuple_compare_sequential<false, false> : + compare_slowpath_funcs[cmp_func_idx]; } /* }}} tuple_compare */ @@ -1229,9 +1270,23 @@ static const comparator_with_key_signature cmp_wk_arr[] = { #undef KEY_COMPARATOR +static const tuple_compare_with_key_t compare_with_key_slowpath_funcs[] = { + tuple_compare_with_key_slowpath<false, false, false>, + tuple_compare_with_key_slowpath<true, false, false>, + tuple_compare_with_key_slowpath<false, true, false>, + tuple_compare_with_key_slowpath<true, true, false>, + tuple_compare_with_key_slowpath<false, false, true>, + tuple_compare_with_key_slowpath<true, false, true>, + tuple_compare_with_key_slowpath<false, true, true>, + tuple_compare_with_key_slowpath<true, true, true> +}; + tuple_compare_with_key_t tuple_compare_with_key_create(const struct key_def *def) { + int cmp_func_idx = (def->is_nullable ? 1 : 0) + + 2 * (def->has_optional_parts ? 1 : 0) + + 4 * (def->has_json_paths ? 1 : 0); if (def->is_nullable) { if (key_def_is_sequential(def)) { if (def->has_optional_parts) { @@ -1241,10 +1296,8 @@ tuple_compare_with_key_create(const struct key_def *def) return tuple_compare_with_key_sequential<true, false>; } - } else if (def->has_optional_parts) { - return tuple_compare_with_key_slowpath<true, true>; } else { - return tuple_compare_with_key_slowpath<true, false>; + return compare_with_key_slowpath_funcs[cmp_func_idx]; } } assert(! def->has_optional_parts); @@ -1267,10 +1320,9 @@ tuple_compare_with_key_create(const struct key_def *def) return cmp_wk_arr[k].f; } } - if (key_def_is_sequential(def)) - return tuple_compare_with_key_sequential<false, false>; - else - return tuple_compare_with_key_slowpath<false, false>; + return key_def_is_sequential(def) ? + tuple_compare_with_key_sequential<false, false> : + compare_with_key_slowpath_funcs[cmp_func_idx]; } /* }}} tuple_compare_with_key */ diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc index c40d7887d..30f4f2f35 100644 --- a/src/box/tuple_extract_key.cc +++ b/src/box/tuple_extract_key.cc @@ -5,13 +5,18 @@ enum { MSGPACK_NULL = 0xc0 }; /** True if key part i and i+1 are sequential. */ +template <bool has_json_paths> static inline bool key_def_parts_are_sequential(const struct key_def *def, int i) { uint32_t fieldno1 = def->parts[i].fieldno + 1; uint32_t fieldno2 = def->parts[i + 1].fieldno; - return fieldno1 == fieldno2 && def->parts[i].path == NULL && - def->parts[i + 1].path == NULL; + if (!has_json_paths) { + return fieldno1 == fieldno2; + } else { + return fieldno1 == fieldno2 && def->parts[i].path == NULL && + def->parts[i + 1].path == NULL; + } } /** True, if a key con contain two or more parts in sequence. */ @@ -19,7 +24,7 @@ static bool key_def_contains_sequential_parts(const struct key_def *def) { for (uint32_t i = 0; i < def->part_count - 1; ++i) { - if (key_def_parts_are_sequential(def, i)) + if (key_def_parts_are_sequential<true>(def, i)) return true; } return false; @@ -99,11 +104,13 @@ tuple_extract_key_sequential(const struct tuple *tuple, struct key_def *key_def, * General-purpose implementation of tuple_extract_key() * @copydoc tuple_extract_key() */ -template <bool contains_sequential_parts, bool has_optional_parts> +template <bool contains_sequential_parts, bool has_optional_parts, + bool has_json_paths> static char * tuple_extract_key_slowpath(const struct tuple *tuple, struct key_def *key_def, uint32_t *key_size) { + assert(has_json_paths == key_def->has_json_paths); assert(!has_optional_parts || key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); assert(contains_sequential_parts == @@ -118,9 +125,14 @@ tuple_extract_key_slowpath(const struct tuple *tuple, /* Calculate the key size. */ for (uint32_t i = 0; i < part_count; ++i) { - const char *field = - tuple_field_by_part_raw(format, data, field_map, - &key_def->parts[i]); + const char *field; + if (!has_json_paths) { + field = tuple_field_raw(format, data, field_map, + key_def->parts[i].fieldno); + } else { + field = tuple_field_by_part_raw(format, data, field_map, + &key_def->parts[i]); + } if (has_optional_parts && field == NULL) { bsize += mp_sizeof_nil(); continue; @@ -133,7 +145,8 @@ tuple_extract_key_slowpath(const struct tuple *tuple, * minimize tuple_field_raw() calls. */ for (; i < part_count - 1; i++) { - if (!key_def_parts_are_sequential(key_def, i)) { + if (!key_def_parts_are_sequential + <has_json_paths>(key_def, i)) { /* * End of sequential part. */ @@ -159,9 +172,14 @@ tuple_extract_key_slowpath(const struct tuple *tuple, } char *key_buf = mp_encode_array(key, part_count); for (uint32_t i = 0; i < part_count; ++i) { - const char *field = - tuple_field_by_part_raw(format, data, field_map, - &key_def->parts[i]); + const char *field; + if (!has_json_paths) { + field = tuple_field_raw(format, data, field_map, + key_def->parts[i].fieldno); + } else { + field = tuple_field_by_part_raw(format, data, field_map, + &key_def->parts[i]); + } if (has_optional_parts && field == NULL) { key_buf = mp_encode_nil(key_buf); continue; @@ -174,7 +192,8 @@ tuple_extract_key_slowpath(const struct tuple *tuple, * minimize tuple_field_raw() calls. */ for (; i < part_count - 1; i++) { - if (!key_def_parts_are_sequential(key_def, i)) { + if (!key_def_parts_are_sequential + <has_json_paths>(key_def, i)) { /* * End of sequential part. */ @@ -207,11 +226,12 @@ tuple_extract_key_slowpath(const struct tuple *tuple, * General-purpose version of tuple_extract_key_raw() * @copydoc tuple_extract_key_raw() */ -template <bool has_optional_parts> +template <bool has_optional_parts, bool has_json_paths> static char * tuple_extract_key_slowpath_raw(const char *data, const char *data_end, struct key_def *key_def, uint32_t *key_size) { + assert(has_json_paths == key_def->has_json_paths); assert(!has_optional_parts || key_def->is_nullable); assert(has_optional_parts == key_def->has_optional_parts); assert(mp_sizeof_nil() == 1); @@ -239,7 +259,8 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, uint32_t fieldno = key_def->parts[i].fieldno; uint32_t null_count = 0; for (; i < key_def->part_count - 1; i++) { - if (!key_def_parts_are_sequential(key_def, i)) + if (!key_def_parts_are_sequential + <has_json_paths>(key_def, i)) break; } const struct key_part *part = &key_def->parts[i]; @@ -319,6 +340,17 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end, return key; } +static const tuple_extract_key_t extract_key_slowpath_funcs[] = { + tuple_extract_key_slowpath<false, false, false>, + tuple_extract_key_slowpath<true, false, false>, + tuple_extract_key_slowpath<false, true, false>, + tuple_extract_key_slowpath<true, true, false>, + tuple_extract_key_slowpath<false, false, true>, + tuple_extract_key_slowpath<true, false, true>, + tuple_extract_key_slowpath<false, true, true>, + tuple_extract_key_slowpath<true, true, true> +}; + /** * Initialize tuple_extract_key() and tuple_extract_key_raw() */ @@ -339,32 +371,30 @@ tuple_extract_key_set(struct key_def *key_def) tuple_extract_key_sequential_raw<false>; } } else { - if (key_def->has_optional_parts) { - assert(key_def->is_nullable); - if (key_def_contains_sequential_parts(key_def)) { - key_def->tuple_extract_key = - tuple_extract_key_slowpath<true, true>; - } else { - key_def->tuple_extract_key = - tuple_extract_key_slowpath<false, true>; - } - } else { - if (key_def_contains_sequential_parts(key_def)) { - key_def->tuple_extract_key = - tuple_extract_key_slowpath<true, false>; - } else { - key_def->tuple_extract_key = - tuple_extract_key_slowpath<false, - false>; - } - } + int func_idx = + (key_def_contains_sequential_parts(key_def) ? 1 : 0) + + 2 * (key_def->has_optional_parts ? 1 : 0) + + 4 * (key_def->has_json_paths ? 1 : 0); + key_def->tuple_extract_key = + extract_key_slowpath_funcs[func_idx]; + assert(!key_def->has_optional_parts || key_def->is_nullable); } if (key_def->has_optional_parts) { assert(key_def->is_nullable); - key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<true>; + if (key_def->has_json_paths) { + key_def->tuple_extract_key_raw = + tuple_extract_key_slowpath_raw<true, true>; + } else { + key_def->tuple_extract_key_raw = + tuple_extract_key_slowpath_raw<true, false>; + } } else { - key_def->tuple_extract_key_raw = - tuple_extract_key_slowpath_raw<false>; + if (key_def->has_json_paths) { + key_def->tuple_extract_key_raw = + tuple_extract_key_slowpath_raw<false, true>; + } else { + key_def->tuple_extract_key_raw = + tuple_extract_key_slowpath_raw<false, false>; + } } } diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc index 3486ce11c..8ede29045 100644 --- a/src/box/tuple_hash.cc +++ b/src/box/tuple_hash.cc @@ -213,7 +213,7 @@ static const hasher_signature hash_arr[] = { #undef HASHER -template <bool has_optional_parts> +template <bool has_optional_parts, bool has_json_paths> uint32_t tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def); @@ -256,10 +256,17 @@ tuple_hash_func_set(struct key_def *key_def) { } slowpath: - if (key_def->has_optional_parts) - key_def->tuple_hash = tuple_hash_slowpath<true>; - else - key_def->tuple_hash = tuple_hash_slowpath<false>; + if (key_def->has_optional_parts) { + if (key_def->has_json_paths) + key_def->tuple_hash = tuple_hash_slowpath<true, true>; + else + key_def->tuple_hash = tuple_hash_slowpath<true, false>; + } else { + if (key_def->has_json_paths) + key_def->tuple_hash = tuple_hash_slowpath<false, true>; + else + key_def->tuple_hash = tuple_hash_slowpath<false, false>; + } key_def->key_hash = key_hash_slowpath; } @@ -319,10 +326,11 @@ tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry, const struct tuple *tuple, return tuple_hash_field(ph1, pcarry, &field, part->coll); } -template <bool has_optional_parts> +template <bool has_optional_parts, bool has_json_paths> uint32_t tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def) { + assert(has_json_paths == key_def->has_json_paths); assert(has_optional_parts == key_def->has_optional_parts); uint32_t h = HASH_SEED; uint32_t carry = 0; @@ -331,9 +339,13 @@ tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def) struct tuple_format *format = tuple_format(tuple); const char *tuple_raw = tuple_data(tuple); const uint32_t *field_map = tuple_field_map(tuple); - const char *field = - tuple_field_by_part_raw(format, tuple_raw, field_map, - key_def->parts); + const char *field; + if (!has_json_paths) { + field = tuple_field(tuple, prev_fieldno); + } else { + field = tuple_field_by_part_raw(format, tuple_raw, field_map, + key_def->parts); + } const char *end = (char *)tuple + tuple_size(tuple); if (has_optional_parts && field == NULL) { total_size += tuple_hash_null(&h, &carry); @@ -347,9 +359,18 @@ tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def) * need of tuple_field */ if (prev_fieldno + 1 != key_def->parts[part_id].fieldno) { - struct key_part *part = &key_def->parts[part_id]; - field = tuple_field_by_part_raw(format, tuple_raw, - field_map, part); + if (!has_json_paths) { + field = tuple_field(tuple, + key_def->parts[part_id]. + fieldno); + } else { + struct key_part *part = + &key_def->parts[part_id]; + field = tuple_field_by_part_raw(format, + tuple_raw, + field_map, + part); + } } if (has_optional_parts && (field == NULL || field >= end)) { total_size += tuple_hash_null(&h, &carry); -- 2.19.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [tarantool-patches] [PATCH v6 5/8] box: introduce has_json_paths flag in templates 2018-12-17 6:52 ` [PATCH v6 5/8] box: introduce has_json_paths flag in templates Kirill Shcherbatov @ 2018-12-27 11:51 ` Konstantin Osipov 2018-12-27 11:52 ` Konstantin Osipov 0 siblings, 1 reply; 14+ messages in thread From: Konstantin Osipov @ 2018-12-27 11:51 UTC (permalink / raw) To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov * Kirill Shcherbatov <kshcherbatov@tarantool.org> [18/12/17 14:27]: > Introduced has_json_path flag for compare, hash and extract > functions(that are really hot) to make possible do not look to > path field for flat indexes without any JSON paths. Shouldn't it be a template parameter, rather than a runtime one? -- Konstantin Osipov, Moscow, Russia, +7 903 626 22 32 http://tarantool.io - www.twitter.com/kostja_osipov ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [tarantool-patches] [PATCH v6 5/8] box: introduce has_json_paths flag in templates 2018-12-27 11:51 ` [tarantool-patches] " Konstantin Osipov @ 2018-12-27 11:52 ` Konstantin Osipov 2018-12-27 11:57 ` [tarantool-patches] " Konstantin Osipov 0 siblings, 1 reply; 14+ messages in thread From: Konstantin Osipov @ 2018-12-27 11:52 UTC (permalink / raw) To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov * Konstantin Osipov <kostja@tarantool.org> [18/12/27 14:51]: > * Kirill Shcherbatov <kshcherbatov@tarantool.org> [18/12/17 14:27]: > > Introduced has_json_path flag for compare, hash and extract > > functions(that are really hot) to make possible do not look to > > path field for flat indexes without any JSON paths. > > Shouldn't it be a template parameter, rather than a runtime one? Sorry, obvious question, noticed it is already. -- Konstantin Osipov, Moscow, Russia, +7 903 626 22 32 http://tarantool.io - www.twitter.com/kostja_osipov ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [tarantool-patches] Re: [PATCH v6 5/8] box: introduce has_json_paths flag in templates 2018-12-27 11:52 ` Konstantin Osipov @ 2018-12-27 11:57 ` Konstantin Osipov 0 siblings, 0 replies; 14+ messages in thread From: Konstantin Osipov @ 2018-12-27 11:57 UTC (permalink / raw) To: tarantool-patches; +Cc: vdavydov.dev, Kirill Shcherbatov * Konstantin Osipov <kostja@tarantool.org> [18/12/27 14:54]: > * Konstantin Osipov <kostja@tarantool.org> [18/12/27 14:51]: > > * Kirill Shcherbatov <kshcherbatov@tarantool.org> [18/12/17 14:27]: > > > Introduced has_json_path flag for compare, hash and extract > > > functions(that are really hot) to make possible do not look to > > > path field for flat indexes without any JSON paths. > > > > Shouldn't it be a template parameter, rather than a runtime one? > > Sorry, obvious question, noticed it is already. Which begs another question, shouldn't tuple_init_field_map be a template function as well? I think time has come with json paths, it's a very hot path. -- Konstantin Osipov, Moscow, Russia, +7 903 626 22 32 http://tarantool.io - www.twitter.com/kostja_osipov ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v6 6/8] box: tune tuple_field_raw_by_path for indexed data 2018-12-17 6:52 [PATCH v6 0/8] box: Indexes by JSON path Kirill Shcherbatov ` (4 preceding siblings ...) 2018-12-17 6:52 ` [PATCH v6 5/8] box: introduce has_json_paths flag in templates Kirill Shcherbatov @ 2018-12-17 6:52 ` Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 7/8] box: introduce offset_slot cache in key_part Kirill Shcherbatov ` (2 subsequent siblings) 8 siblings, 0 replies; 14+ messages in thread From: Kirill Shcherbatov @ 2018-12-17 6:52 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov We don't need to parse tuple in tuple_field_raw_by_path if required field has been indexed. We do path lookup in field tree of JSON paths and return data by it's offset from field_map instead of whole tuple parsing. Part of #1012 --- src/box/tuple.h | 19 ------------------- src/box/tuple_format.c | 23 +++++++++++------------ src/box/tuple_format.h | 24 ------------------------ test/engine/json.result | 5 +++++ test/engine/json.test.lua | 2 ++ 5 files changed, 18 insertions(+), 55 deletions(-) diff --git a/src/box/tuple.h b/src/box/tuple.h index 3c8b8825e..9d1313e93 100644 --- a/src/box/tuple.h +++ b/src/box/tuple.h @@ -547,25 +547,6 @@ tuple_field_by_path(const struct tuple *tuple, const char *path, path_hash, field); } -/** - * Get tuple field by its name. - * @param tuple Tuple to get field from. - * @param name Field name. - * @param name_len Length of @a name. - * @param name_hash Hash of @a name. - * - * @retval not NULL MessagePack field. - * @retval NULL No field with @a name. - */ -static inline const char * -tuple_field_by_name(const struct tuple *tuple, const char *name, - uint32_t name_len, uint32_t name_hash) -{ - return tuple_field_raw_by_name(tuple_format(tuple), tuple_data(tuple), - tuple_field_map(tuple), name, name_len, - name_hash); -} - /** * @brief Tuple Interator */ diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 4314d3b1d..7eac3cf50 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -1134,10 +1134,7 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, goto error; switch(token.type) { case JSON_TOKEN_NUM: { - int index = token.num; - *field = tuple_field_raw(format, tuple, field_map, index); - if (*field == NULL) - return 0; + fieldno = token.num; break; } case JSON_TOKEN_STR: { @@ -1154,10 +1151,8 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, */ name_hash = field_name_hash(token.str, token.len); } - *field = tuple_field_raw_by_name(format, tuple, field_map, - token.str, token.len, - name_hash); - if (*field == NULL) + if (tuple_fieldno_by_name(format->dict, token.str, token.len, + name_hash, &fieldno) != 0) return 0; break; } @@ -1166,13 +1161,17 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, *field = NULL; return 0; } - rc = tuple_field_go_to_path(field, path + lexer.offset, - path_len - lexer.offset); + /* Optimize indexed JSON field data access. */ + struct key_part part; + part.fieldno = fieldno; + part.path = (char *)path + lexer.offset; + part.path_len = path_len - lexer.offset; + rc = tuple_field_by_part_raw_slowpath(format, tuple, field_map, &part, + field); if (rc == 0) return 0; /* Setup absolute error position. */ rc += lexer.offset; - error: assert(rc > 0); diag_set(ClientError, ER_ILLEGAL_PARAMS, @@ -1189,7 +1188,7 @@ tuple_field_by_part_raw_slowpath(struct tuple_format *format, const char *data, struct tuple_field *field = tuple_format_field_by_path(format, part->fieldno, part->path, part->path_len); - if (field != NULL) { + if (field != NULL && field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { int32_t offset_slot = field->offset_slot; assert(-offset_slot * sizeof(uint32_t) <= format->field_map_size); diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index 8c05c3322..a9b4bb675 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -435,30 +435,6 @@ tuple_field_raw(const struct tuple_format *format, const char *tuple, return tuple; } -/** - * Get tuple field by its name. - * @param format Tuple format. - * @param tuple MessagePack tuple's body. - * @param field_map Tuple field map. - * @param name Field name. - * @param name_len Length of @a name. - * @param name_hash Hash of @a name. - * - * @retval not NULL MessagePack field. - * @retval NULL No field with @a name. - */ -static inline const char * -tuple_field_raw_by_name(struct tuple_format *format, const char *tuple, - const uint32_t *field_map, const char *name, - uint32_t name_len, uint32_t name_hash) -{ - uint32_t fieldno; - if (tuple_fieldno_by_name(format->dict, name, name_len, name_hash, - &fieldno) != 0) - return NULL; - return tuple_field_raw(format, tuple, field_map, fieldno); -} - /** * Retrieve msgpack data by JSON path. * @param data Pointer to msgpack with data. diff --git a/test/engine/json.result b/test/engine/json.result index 7e7374e6f..c33e568b3 100644 --- a/test/engine/json.result +++ b/test/engine/json.result @@ -215,6 +215,11 @@ assert(idx2 ~= nil) t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5} --- ... +-- Test field_map in tuple speed-up access by indexed path. +t["[3][\"FIO\"][\"fname\"]"] +--- +- Agent +... idx:select() --- - - [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5] diff --git a/test/engine/json.test.lua b/test/engine/json.test.lua index 50759a5b1..45153743d 100644 --- a/test/engine/json.test.lua +++ b/test/engine/json.test.lua @@ -59,6 +59,8 @@ s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '["FIO"][" idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}}}) assert(idx2 ~= nil) t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5} +-- Test field_map in tuple speed-up access by indexed path. +t["[3][\"FIO\"][\"fname\"]"] idx:select() idx:min() idx:max() -- 2.19.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v6 7/8] box: introduce offset_slot cache in key_part 2018-12-17 6:52 [PATCH v6 0/8] box: Indexes by JSON path Kirill Shcherbatov ` (5 preceding siblings ...) 2018-12-17 6:52 ` [PATCH v6 6/8] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov @ 2018-12-17 6:52 ` Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 8/8] box: specify indexes in user-friendly form Kirill Shcherbatov 2018-12-18 20:46 ` [PATCH v6 0/8] box: Indexes by JSON path Vladimir Davydov 8 siblings, 0 replies; 14+ messages in thread From: Kirill Shcherbatov @ 2018-12-17 6:52 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov tuple_field_by_part looks up the tuple_field corresponding to the given key part in tuple_format in order to quickly retrieve the offset of indexed data from the tuple field map. For regular indexes this operation is blazing fast, however of JSON indexes it is not as we have to parse the path to data and then do multiple lookups in a JSON tree. Since tuple_field_by_part is used by comparators, we should strive to make this routine as fast as possible for all kinds of indexes. This patch introduces an optimization that is supposed to make tuple_field_by_part for JSON indexes as fast as it is for regular indexes in most cases. We do that by caching the offset slot right in key_part. There's a catch here however - we create a new format whenever an index is dropped or created and we don't reindex old tuples. As a result, there may be several generations of tuples in the same space, all using different formats while there's the only key_def used for comparison. To overcome this problem, we introduce the notion of tuple_format epoch. This is a counter incremented each time a new format is created. We store it in tuple_format and key_def, and we only use the offset slot cached in a key_def if it's epoch coincides with the epoch of the tuple format. If they don't, we look up a tuple_field as before, and then update the cached value provided the epoch of the tuple format. Part of #1012 --- src/box/key_def.c | 16 +++++++++++----- src/box/key_def.h | 11 +++++++++++ src/box/tuple_format.c | 10 ++++++++++ src/box/tuple_format.h | 12 ++++++++++++ 4 files changed, 44 insertions(+), 5 deletions(-) diff --git a/src/box/key_def.c b/src/box/key_def.c index a6bb89f5b..ea2824992 100644 --- a/src/box/key_def.c +++ b/src/box/key_def.c @@ -168,7 +168,8 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno, enum field_type type, enum on_conflict_action nullable_action, struct coll *coll, uint32_t coll_id, enum sort_order sort_order, const char *path, - uint32_t path_len, char **static_pool) + uint32_t path_len, char **static_pool, + int32_t offset_slot_cache, uint64_t format_epoch) { assert(part_no < def->part_count); assert(type < field_type_MAX); @@ -180,6 +181,8 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno, def->parts[part_no].coll = coll; def->parts[part_no].coll_id = coll_id; def->parts[part_no].sort_order = sort_order; + def->parts[part_no].offset_slot_cache = offset_slot_cache; + def->parts[part_no].format_epoch = format_epoch; if (path != NULL) { assert(static_pool != NULL); def->parts[part_no].path = *static_pool; @@ -226,7 +229,8 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count) uint32_t path_len = part->path != NULL ? strlen(part->path) : 0; key_def_set_part(def, i, part->fieldno, part->type, part->nullable_action, coll, part->coll_id, - part->sort_order, part->path, path_len, &data); + part->sort_order, part->path, path_len, &data, + TUPLE_OFFSET_SLOT_NIL, 0); } key_def_set_cmp(def); return def; @@ -280,7 +284,7 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count) (enum field_type)types[item], ON_CONFLICT_ACTION_DEFAULT, NULL, COLL_NONE, SORT_ORDER_ASC, NULL, 0, - NULL); + NULL, TUPLE_OFFSET_SLOT_NIL, 0); } key_def_set_cmp(key_def); return key_def; @@ -705,7 +709,8 @@ key_def_merge(const struct key_def *first, const struct key_def *second) key_def_set_part(new_def, pos++, part->fieldno, part->type, part->nullable_action, part->coll, part->coll_id, part->sort_order, part->path, - part->path_len, &data); + part->path_len, &data, part->offset_slot_cache, + part->format_epoch); } /* Set-append second key def's part to the new key def. */ @@ -717,7 +722,8 @@ key_def_merge(const struct key_def *first, const struct key_def *second) key_def_set_part(new_def, pos++, part->fieldno, part->type, part->nullable_action, part->coll, part->coll_id, part->sort_order, part->path, - part->path_len, &data); + part->path_len, &data, part->offset_slot_cache, + part->format_epoch); } key_def_set_cmp(new_def); return new_def; diff --git a/src/box/key_def.h b/src/box/key_def.h index df498964c..8c1f7ce9e 100644 --- a/src/box/key_def.h +++ b/src/box/key_def.h @@ -100,6 +100,17 @@ struct key_part { char *path; /** The length of JSON path. */ uint32_t path_len; + /** Cached format epoch. */ + uint64_t format_epoch; + /** + * Cached value of the offset slot corresponding to + * the indexed field (tuple_field::offset_slot). + * Valid only if key_def::epoch equals the epoch of + * the tuple format. Updated in + * tuple_field_by_part_slowpath to always store the + * offset corresponding to the last used tuple format. + */ + int32_t offset_slot_cache; }; struct key_def; diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c index 7eac3cf50..d0b5dfa16 100644 --- a/src/box/tuple_format.c +++ b/src/box/tuple_format.c @@ -38,6 +38,7 @@ struct tuple_format **tuple_formats; static intptr_t recycled_format_ids = FORMAT_ID_NIL; static uint32_t formats_size = 0, formats_capacity = 0; +static uint64_t formats_epoch = 0; static struct tuple_field * tuple_field_new(void) @@ -608,6 +609,7 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys, format->vtab = *vtab; format->engine = NULL; format->is_temporary = false; + format->epoch = ++formats_epoch; if (tuple_format_register(format) < 0) { tuple_format_destroy(format); free(format); @@ -1166,6 +1168,7 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple, part.fieldno = fieldno; part.path = (char *)path + lexer.offset; part.path_len = path_len - lexer.offset; + part.format_epoch = 0; rc = tuple_field_by_part_raw_slowpath(format, tuple, field_map, &part, field); if (rc == 0) @@ -1192,6 +1195,13 @@ tuple_field_by_part_raw_slowpath(struct tuple_format *format, const char *data, int32_t offset_slot = field->offset_slot; assert(-offset_slot * sizeof(uint32_t) <= format->field_map_size); + + /* Update format epoch cache. */ + assert(part->format_epoch != format->epoch); + assert(format->epoch != 0); + part->offset_slot_cache = offset_slot; + part->format_epoch = format->epoch; + *raw = field_map[offset_slot] == 0 ? NULL : data + field_map[offset_slot]; return 0; diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h index a9b4bb675..d6bc08bc0 100644 --- a/src/box/tuple_format.h +++ b/src/box/tuple_format.h @@ -137,6 +137,12 @@ tuple_field_is_nullable(const struct tuple_field *tuple_field) * Tuple format describes how tuple is stored and information about its fields */ struct tuple_format { + /** + * Counter that grows incrementally on space rebuild + * used for caching offset slot in key_part, for more + * details see key_part::offset_slot_cache. + */ + uint64_t epoch; /** Virtual function table */ struct tuple_format_vtab vtab; /** Pointer to engine-specific data. */ @@ -486,6 +492,12 @@ tuple_field_by_part_raw(struct tuple_format *format, const char *data, { if (likely(part->path == NULL)) { return tuple_field_raw(format, data, field_map, part->fieldno); + } else if (part->format_epoch == format->epoch) { + int32_t offset_slot = part->offset_slot_cache; + assert(-offset_slot * sizeof(uint32_t) <= + format->field_map_size); + return field_map[offset_slot] != 0 ? + data + field_map[offset_slot] : NULL; } else { const char *raw; MAYBE_UNUSED int rc = -- 2.19.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v6 8/8] box: specify indexes in user-friendly form 2018-12-17 6:52 [PATCH v6 0/8] box: Indexes by JSON path Kirill Shcherbatov ` (6 preceding siblings ...) 2018-12-17 6:52 ` [PATCH v6 7/8] box: introduce offset_slot cache in key_part Kirill Shcherbatov @ 2018-12-17 6:52 ` Kirill Shcherbatov 2018-12-18 20:58 ` Vladimir Davydov 2018-12-18 20:46 ` [PATCH v6 0/8] box: Indexes by JSON path Vladimir Davydov 8 siblings, 1 reply; 14+ messages in thread From: Kirill Shcherbatov @ 2018-12-17 6:52 UTC (permalink / raw) To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov Implemented a more convenient interface for creating an index by JSON path. Instead of specifying fieldno and relative path it comes possible to pass full JSON path to data. Closes #1012 @TarantoolBot document Title: Indexes by JSON path Sometimes field data could have complex document structure. When this structure is consistent across whole document, you are able to create an index by JSON path. Example: s = box.schema.space.create('sample') format = {{'id', 'unsigned'}, {'data', 'map'}} s:format(format) -- explicit JSON index creation age_idx = s:create_index('age', {{2, 'number', path = ".age"}}) -- user-friendly syntax for JSON index creation parts = {{'data.FIO["fname"]', 'str'}, {'data.FIO["sname"]', 'str'}, {'data.age', 'number'}} info_idx = s:create_index('info', {parts = parts}}) s:insert({1, {FIO={fname="James", sname="Bond"}, age=35}}) --- src/box/lua/index.c | 64 +++++++++++++++++++++++++++++++++++++++ src/box/lua/schema.lua | 22 +++++++------- test/engine/json.result | 28 +++++++++++++++++ test/engine/json.test.lua | 8 +++++ 4 files changed, 111 insertions(+), 11 deletions(-) diff --git a/src/box/lua/index.c b/src/box/lua/index.c index 6265c044a..9b04c5d9a 100644 --- a/src/box/lua/index.c +++ b/src/box/lua/index.c @@ -35,6 +35,9 @@ #include "box/box.h" #include "box/index.h" #include "box/lua/tuple.h" +#include "box/schema.h" +#include "box/tuple_format.h" +#include "json/json.h" #include "box/lua/misc.h" /* lbox_encode_tuple_on_gc() */ /** {{{ box.index Lua library: access to spaces and indexes @@ -328,6 +331,66 @@ lbox_index_compact(lua_State *L) return 0; } +/** + * Resolve field index by absolute JSON path first component and + * return relative JSON path. + */ +static int +lbox_index_path_resolve(struct lua_State *L) +{ + if (lua_gettop(L) != 3 || + !lua_isnumber(L, 1) || !lua_isnumber(L, 2) || !lua_isstring(L, 3)) { + return luaL_error(L, "Usage box.internal." + "path_resolve(part_id, space_id, path)"); + } + uint32_t part_id = lua_tonumber(L, 1); + uint32_t space_id = lua_tonumber(L, 2); + size_t path_len; + const char *path = lua_tolstring(L, 3, &path_len); + struct space *space = space_cache_find(space_id); + if (space == NULL) + return luaT_error(L); + struct json_lexer lexer; + struct json_token token; + json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE); + int rc = json_lexer_next_token(&lexer, &token); + if (rc != 0) { + const char *err_msg = + tt_sprintf("options.parts[%d]: error in path on " + "position %d", part_id, rc); + diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg); + return luaT_error(L); + } + assert(space->format != NULL && space->format->dict != NULL); + uint32_t fieldno = token.num; + uint32_t field_count = tuple_format_field_count(space->format); + if (token.type == JSON_TOKEN_NUM && fieldno >= field_count) { + const char *err_msg = + tt_sprintf("options.parts[%d]: field '%d' referenced " + "in path is greater than format field " + "count %d", part_id, + fieldno + TUPLE_INDEX_BASE, field_count); + diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg); + return luaT_error(L); + } else if (token.type == JSON_TOKEN_STR && + tuple_fieldno_by_name(space->format->dict, token.str, + token.len, + field_name_hash(token.str, token.len), + &fieldno) != 0) { + const char *err_msg = + tt_sprintf("options.parts[%d]: field was not found by " + "name '%.*s'", part_id, token.len, + token.str); + diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg); + return luaT_error(L); + } + fieldno += TUPLE_INDEX_BASE; + path += lexer.offset; + lua_pushnumber(L, fieldno); + lua_pushstring(L, path); + return 2; +} + /* }}} */ void @@ -365,6 +428,7 @@ box_lua_index_init(struct lua_State *L) {"truncate", lbox_truncate}, {"stat", lbox_index_stat}, {"compact", lbox_index_compact}, + {"path_resolve", lbox_index_path_resolve}, {NULL, NULL} }; diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua index 8a804f0ba..497cf197c 100644 --- a/src/box/lua/schema.lua +++ b/src/box/lua/schema.lua @@ -575,7 +575,7 @@ local function update_index_parts_1_6_0(parts) return result end -local function update_index_parts(format, parts) +local function update_index_parts(format, parts, space_id) if type(parts) ~= "table" then box.error(box.error.ILLEGAL_PARAMS, "options.parts parameter should be a table") @@ -626,16 +626,16 @@ local function update_index_parts(format, parts) box.error(box.error.ILLEGAL_PARAMS, "options.parts[" .. i .. "]: field (name or number) is expected") elseif type(part.field) == 'string' then - for k,v in pairs(format) do - if v.name == part.field then - part.field = k - break - end - end - if type(part.field) == 'string' then + local idx, path = box.internal.path_resolve(i, space_id, part.field) + if part.path ~= nil and part.path ~= path then box.error(box.error.ILLEGAL_PARAMS, - "options.parts[" .. i .. "]: field was not found by name '" .. part.field .. "'") + "options.parts[" .. i .. "]: field path '".. + part.path.." doesn't math the path '" .. + part.field .. "'") end + parts_can_be_simplified = parts_can_be_simplified and path == nil + part.field = idx + part.path = path or part.path elseif part.field == 0 then box.error(box.error.ILLEGAL_PARAMS, "options.parts[" .. i .. "]: field (number) must be one-based") @@ -792,7 +792,7 @@ box.schema.index.create = function(space_id, name, options) end end local parts, parts_can_be_simplified = - update_index_parts(format, options.parts) + update_index_parts(format, options.parts, space_id) -- create_index() options contains type, parts, etc, -- stored separately. Remove these members from index_opts local index_opts = { @@ -959,7 +959,7 @@ box.schema.index.alter = function(space_id, index_id, options) if options.parts then local parts_can_be_simplified parts, parts_can_be_simplified = - update_index_parts(format, options.parts) + update_index_parts(format, options.parts, space_id) -- save parts in old format if possible if parts_can_be_simplified then parts = simplify_index_parts(parts) diff --git a/test/engine/json.result b/test/engine/json.result index c33e568b3..60d71bdfa 100644 --- a/test/engine/json.result +++ b/test/engine/json.result @@ -71,6 +71,34 @@ s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '.FIO.fnam - error: Field ["data"]["FIO"]["fname"] has type 'string' in one index, but type 'number' in another ... +s:create_index('test3', {parts = {{2, 'number'}, {']sad.FIO["fname"]', 'str'}}}) +--- +- error: 'Illegal parameters, options.parts[2]: error in path on position 1' +... +s:create_index('test3', {parts = {{2, 'number'}, {'[666].FIO["fname"]', 'str'}}}) +--- +- error: 'Illegal parameters, options.parts[2]: field ''666'' referenced in path is + greater than format field count 5' +... +s:create_index('test3', {parts = {{2, 'number'}, {'invalid.FIO["fname"]', 'str'}}}) +--- +- error: 'Illegal parameters, options.parts[2]: field was not found by name ''invalid''' +... +idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}}) +--- +... +assert(idx3 ~= nil) +--- +- true +... +assert(idx3.parts[2].path == ".FIO[\"fname\"]") +--- +- true +... +-- Vinyl has optimizations that omit index checks, so errors could differ. +idx3:drop() +--- +... s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5} --- - error: 'Tuple field ["data"]["FIO"] type does not match one required by operation: diff --git a/test/engine/json.test.lua b/test/engine/json.test.lua index 45153743d..f91846d79 100644 --- a/test/engine/json.test.lua +++ b/test/engine/json.test.lua @@ -19,6 +19,14 @@ s:format(format) format = {{'id', 'unsigned'}, {'meta', 'unsigned'}, {'data', 'map'}, {'age', 'unsigned'}, {'level', 'unsigned'}} s:format(format) s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '.FIO.fname'}, {3, 'str', path = '["FIO"]["sname"]'}}}) +s:create_index('test3', {parts = {{2, 'number'}, {']sad.FIO["fname"]', 'str'}}}) +s:create_index('test3', {parts = {{2, 'number'}, {'[666].FIO["fname"]', 'str'}}}) +s:create_index('test3', {parts = {{2, 'number'}, {'invalid.FIO["fname"]', 'str'}}}) +idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}}) +assert(idx3 ~= nil) +assert(idx3.parts[2].path == ".FIO[\"fname\"]") +-- Vinyl has optimizations that omit index checks, so errors could differ. +idx3:drop() s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5} s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5} s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5} -- 2.19.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v6 8/8] box: specify indexes in user-friendly form 2018-12-17 6:52 ` [PATCH v6 8/8] box: specify indexes in user-friendly form Kirill Shcherbatov @ 2018-12-18 20:58 ` Vladimir Davydov 0 siblings, 0 replies; 14+ messages in thread From: Vladimir Davydov @ 2018-12-18 20:58 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja https://www.freelists.org/post/tarantool-patches/PATCH-v5-99-box-specify-indexes-in-userfriendly-form,1 On Mon, Dec 17, 2018 at 09:52:52AM +0300, Kirill Shcherbatov wrote: > Implemented a more convenient interface for creating an index > by JSON path. Instead of specifying fieldno and relative path > it comes possible to pass full JSON path to data. > > Closes #1012 > > @TarantoolBot document > Title: Indexes by JSON path > Sometimes field data could have complex document structure. > When this structure is consistent across whole document, > you are able to create an index by JSON path. > > Example: > s = box.schema.space.create('sample') > format = {{'id', 'unsigned'}, {'data', 'map'}} > s:format(format) > -- explicit JSON index creation > age_idx = s:create_index('age', {{2, 'number', path = ".age"}}) > -- user-friendly syntax for JSON index creation > parts = {{'data.FIO["fname"]', 'str'}, {'data.FIO["sname"]', 'str'}, > {'data.age', 'number'}} > info_idx = s:create_index('info', {parts = parts}}) > s:insert({1, {FIO={fname="James", sname="Bond"}, age=35}}) > --- > src/box/lua/index.c | 64 +++++++++++++++++++++++++++++++++++++++ > src/box/lua/schema.lua | 22 +++++++------- > test/engine/json.result | 28 +++++++++++++++++ > test/engine/json.test.lua | 8 +++++ > 4 files changed, 111 insertions(+), 11 deletions(-) > > diff --git a/src/box/lua/index.c b/src/box/lua/index.c > index 6265c044a..9b04c5d9a 100644 > --- a/src/box/lua/index.c > +++ b/src/box/lua/index.c > @@ -35,6 +35,9 @@ > #include "box/box.h" > #include "box/index.h" > #include "box/lua/tuple.h" > +#include "box/schema.h" > +#include "box/tuple_format.h" > +#include "json/json.h" > #include "box/lua/misc.h" /* lbox_encode_tuple_on_gc() */ > > /** {{{ box.index Lua library: access to spaces and indexes > @@ -328,6 +331,66 @@ lbox_index_compact(lua_State *L) > return 0; > } > > +/** > + * Resolve field index by absolute JSON path first component and > + * return relative JSON path. > + */ > +static int > +lbox_index_path_resolve(struct lua_State *L) > +{ > + if (lua_gettop(L) != 3 || > + !lua_isnumber(L, 1) || !lua_isnumber(L, 2) || !lua_isstring(L, 3)) { > + return luaL_error(L, "Usage box.internal." > + "path_resolve(part_id, space_id, path)"); > + } > + uint32_t part_id = lua_tonumber(L, 1); > + uint32_t space_id = lua_tonumber(L, 2); > + size_t path_len; > + const char *path = lua_tolstring(L, 3, &path_len); > + struct space *space = space_cache_find(space_id); > + if (space == NULL) > + return luaT_error(L); > + struct json_lexer lexer; > + struct json_token token; > + json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE); > + int rc = json_lexer_next_token(&lexer, &token); > + if (rc != 0) { > + const char *err_msg = > + tt_sprintf("options.parts[%d]: error in path on " > + "position %d", part_id, rc); > + diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg); > + return luaT_error(L); > + } > + assert(space->format != NULL && space->format->dict != NULL); > + uint32_t fieldno = token.num; > + uint32_t field_count = tuple_format_field_count(space->format); > + if (token.type == JSON_TOKEN_NUM && fieldno >= field_count) { > + const char *err_msg = > + tt_sprintf("options.parts[%d]: field '%d' referenced " > + "in path is greater than format field " > + "count %d", part_id, > + fieldno + TUPLE_INDEX_BASE, field_count); > + diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg); > + return luaT_error(L); > + } else if (token.type == JSON_TOKEN_STR && > + tuple_fieldno_by_name(space->format->dict, token.str, > + token.len, > + field_name_hash(token.str, token.len), > + &fieldno) != 0) { > + const char *err_msg = > + tt_sprintf("options.parts[%d]: field was not found by " > + "name '%.*s'", part_id, token.len, > + token.str); > + diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg); > + return luaT_error(L); > + } > + fieldno += TUPLE_INDEX_BASE; > + path += lexer.offset; > + lua_pushnumber(L, fieldno); > + lua_pushstring(L, path); > + return 2; > +} > + > /* }}} */ > > void > @@ -365,6 +428,7 @@ box_lua_index_init(struct lua_State *L) > {"truncate", lbox_truncate}, > {"stat", lbox_index_stat}, > {"compact", lbox_index_compact}, > + {"path_resolve", lbox_index_path_resolve}, > {NULL, NULL} > }; > > diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua > index 8a804f0ba..497cf197c 100644 > --- a/src/box/lua/schema.lua > +++ b/src/box/lua/schema.lua > @@ -575,7 +575,7 @@ local function update_index_parts_1_6_0(parts) > return result > end > > -local function update_index_parts(format, parts) > +local function update_index_parts(format, parts, space_id) > if type(parts) ~= "table" then > box.error(box.error.ILLEGAL_PARAMS, > "options.parts parameter should be a table") > @@ -626,16 +626,16 @@ local function update_index_parts(format, parts) > box.error(box.error.ILLEGAL_PARAMS, > "options.parts[" .. i .. "]: field (name or number) is expected") > elseif type(part.field) == 'string' then > - for k,v in pairs(format) do > - if v.name == part.field then > - part.field = k > - break > - end > - end > - if type(part.field) == 'string' then > + local idx, path = box.internal.path_resolve(i, space_id, part.field) > + if part.path ~= nil and part.path ~= path then > box.error(box.error.ILLEGAL_PARAMS, > - "options.parts[" .. i .. "]: field was not found by name '" .. part.field .. "'") > + "options.parts[" .. i .. "]: field path '".. > + part.path.." doesn't math the path '" .. > + part.field .. "'") > end > + parts_can_be_simplified = parts_can_be_simplified and path == nil > + part.field = idx > + part.path = path or part.path > elseif part.field == 0 then > box.error(box.error.ILLEGAL_PARAMS, > "options.parts[" .. i .. "]: field (number) must be one-based") > @@ -792,7 +792,7 @@ box.schema.index.create = function(space_id, name, options) > end > end > local parts, parts_can_be_simplified = > - update_index_parts(format, options.parts) > + update_index_parts(format, options.parts, space_id) > -- create_index() options contains type, parts, etc, > -- stored separately. Remove these members from index_opts > local index_opts = { > @@ -959,7 +959,7 @@ box.schema.index.alter = function(space_id, index_id, options) > if options.parts then > local parts_can_be_simplified > parts, parts_can_be_simplified = > - update_index_parts(format, options.parts) > + update_index_parts(format, options.parts, space_id) > -- save parts in old format if possible > if parts_can_be_simplified then > parts = simplify_index_parts(parts) > diff --git a/test/engine/json.result b/test/engine/json.result > index c33e568b3..60d71bdfa 100644 > --- a/test/engine/json.result > +++ b/test/engine/json.result > @@ -71,6 +71,34 @@ s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '.FIO.fnam > - error: Field ["data"]["FIO"]["fname"] has type 'string' in one index, but type 'number' > in another > ... > +s:create_index('test3', {parts = {{2, 'number'}, {']sad.FIO["fname"]', 'str'}}}) > +--- > +- error: 'Illegal parameters, options.parts[2]: error in path on position 1' > +... > +s:create_index('test3', {parts = {{2, 'number'}, {'[666].FIO["fname"]', 'str'}}}) > +--- > +- error: 'Illegal parameters, options.parts[2]: field ''666'' referenced in path is > + greater than format field count 5' > +... > +s:create_index('test3', {parts = {{2, 'number'}, {'invalid.FIO["fname"]', 'str'}}}) > +--- > +- error: 'Illegal parameters, options.parts[2]: field was not found by name ''invalid''' > +... > +idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}}) > +--- > +... > +assert(idx3 ~= nil) > +--- > +- true > +... > +assert(idx3.parts[2].path == ".FIO[\"fname\"]") > +--- > +- true > +... > +-- Vinyl has optimizations that omit index checks, so errors could differ. > +idx3:drop() > +--- > +... > s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5} > --- > - error: 'Tuple field ["data"]["FIO"] type does not match one required by operation: > diff --git a/test/engine/json.test.lua b/test/engine/json.test.lua > index 45153743d..f91846d79 100644 > --- a/test/engine/json.test.lua > +++ b/test/engine/json.test.lua > @@ -19,6 +19,14 @@ s:format(format) > format = {{'id', 'unsigned'}, {'meta', 'unsigned'}, {'data', 'map'}, {'age', 'unsigned'}, {'level', 'unsigned'}} > s:format(format) > s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '.FIO.fname'}, {3, 'str', path = '["FIO"]["sname"]'}}}) > +s:create_index('test3', {parts = {{2, 'number'}, {']sad.FIO["fname"]', 'str'}}}) > +s:create_index('test3', {parts = {{2, 'number'}, {'[666].FIO["fname"]', 'str'}}}) > +s:create_index('test3', {parts = {{2, 'number'}, {'invalid.FIO["fname"]', 'str'}}}) > +idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}}) > +assert(idx3 ~= nil) > +assert(idx3.parts[2].path == ".FIO[\"fname\"]") > +-- Vinyl has optimizations that omit index checks, so errors could differ. > +idx3:drop() > s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5} > s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5} > s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5} > -- > 2.19.2 > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v6 0/8] box: Indexes by JSON path 2018-12-17 6:52 [PATCH v6 0/8] box: Indexes by JSON path Kirill Shcherbatov ` (7 preceding siblings ...) 2018-12-17 6:52 ` [PATCH v6 8/8] box: specify indexes in user-friendly form Kirill Shcherbatov @ 2018-12-18 20:46 ` Vladimir Davydov 8 siblings, 0 replies; 14+ messages in thread From: Vladimir Davydov @ 2018-12-18 20:46 UTC (permalink / raw) To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja On Mon, Dec 17, 2018 at 09:52:44AM +0300, Kirill Shcherbatov wrote: > Sometimes field data could have complex document structure. > When this structure is consistent across whole document, > you are able to create an index by JSON path. > > Changes in version 6: > - key_part(s) path string is not 0-terminated anymore > - global formats_epoch pool used to initialize tuple_format:epoch > numeric field > - more informative error messages > - splitted changes to few new additional preparatory commits > - got rid off hashtable in vy_stmt_build > - prevent traversing field tree twice in vy_stmt_build > - inline basic field_by_part with slowpath alternative > - simplified tuple_format1_can_store_format2_tuples > - splitted tuple_init_field_map to helper routines > - moved tests to separate file > - large and descriptive comments for many things > > http://github.com/tarantool/tarantool/tree/kshch/gh-1012-json-indexes > https://github.com/tarantool/tarantool/issues/1012 > > Kirill Shcherbatov (8): > box: refactor tuple_validate_raw > box: refactor ER_{FIELD_TYPE, ACTION_MISMATCH} > box: build path to field string uniformly on error I'd love to apply these preparatory patches before looking at the rest, but they are badly split and leave the code in a rather incomplete state. Let's sort them out first. What I want to see: 1. Implement tuple_validate with tuple_init_field_map. Do not mix field_map_template in this patch, as it has nothing to do with that. And it's not just refactoring. Actually, it can degrade the function performance. Please fix the commit message accordingly. 2. Refactor field type/nullability checks. Basically, this is what patch #2 of your series does, but it's not good enough. First, mp_type_is_compatible and key_mp_type_validate share the same piece of code. You should factor it out instead of copying and pasting, e.g. - field_mp_type_is_compatible returns true if given field and msgpack types are compatible. It should probably be defined in field_def.c. key_mp_type should probably be renamed to field_mp_type and moved to field_def.c, as it deals with field_type. I think it's also worth moving mp_type_strs to field_def.c, too. - key_part_validate takes a pointer to msgpack and a field type. Calls field_mp_type_is_compatible(mp_typeof(*key), field_type) to check if the given key part is valid. Sets diag on error. - key_validate_parts calls key_part_validate for each msgpack part. I'm not sure about function names - please think/try by yourself. Second, key_mp_type_validate doesn't need to take an error code anymore - now we always pass ER_KEY_PART_COUNT to it. Third, changing %u to %s in error messages should be done later, see step 4. And you should use int2str instead of tt_snprintf for it. This patch is going to do quite a bit of work - that's OK I think, just enumerate all the changes done by it in a bullet list in the commit message. 3. Add a helper function for getting a path to a json_token in a json_tree. Should be defined in json.c. We will use this function for reporting tuple format violations. I understand that if we go down this path, we won't be able to report top-level field names as they are defined in a tuple_dictionary, not in a json_tree. I think it's OK as the messages will be descriptive enough even if we use field numbers instead. At the same time, this will allow us to move a relatively big chunk of code that doesn't really have anything to do with tuples out of tuple_format.c. This function should be implemented without an alloc() callback. It should be passed a buffer of a fixed length. It should return the number of bytes that would be printed if there were enough space in the buffer, in snprintf-like manner, so that one could first call it with a 0-length buffer to figure out the buffer length, then allocate a buffer and call it again, this time for real. Don't forget to add a unit test. 4. Use the helper introduced at step #3 to report field names in error messages. You should change %u to %s in error messages in this patch, not in patch 2, because after patch 2 we still use field numbers, not paths. Let's leave field_map validation introduction to the main patch of the series for now. It would be great to add it in a separate patch, but it wouldn't work for plain indexes as missing fields are already checked with ER_MIN_FIELD_COUNT, i.e. we would just add some dead code, and unfortunately we can't get rid of ER_MIN_FIELD_COUNT, because space.format fields aren't assigned slots in the field map so we can't use the field map to check if they are present. This makes me wonder what we are going to do when we get to support setting JSON schema in space.format. How will we check that all formatted fields are present then? We need to figure it out before proceeding with the main patch. Let's discuss it. If you agree with my points above, please send the four patches in a separate series. Do not resubmit the rest of the series before I've committed the preparatory patches. Side notes: - Never add dead code in a patch. All code added by a patch should be tested. Introducing tuple_field_map_validate before implementing JSON index support violates this rule. - Try not to rewrite the same piece of code multiple times in the same series, because this practically doubles my work as a reviewer. Patches of a patch set should fit in like pieces of a jigsaw puzzle. Here's an example of you doing this: @@ -547,8 +559,10 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, if (validate && !mp_type_is_compatible(type, field->type, is_nullable) != 0) { - diag_set(ClientError, ER_FIELD_TYPE, - tt_sprintf("%u", token.num + 1), + const char *path = + tuple_field_json_path(format, field); + assert(path != NULL); + diag_set(ClientError, ER_FIELD_TYPE, path, field_type_strs[field->type]); return -1; } and in the next patch @@ -560,10 +922,13 @@ tuple_init_field_map(const struct tuple_format *format, uint32_t *field_map, !mp_type_is_compatible(type, field->type, is_nullable) != 0) { const char *path = - tuple_field_json_path(format, field); - assert(path != NULL); - diag_set(ClientError, ER_FIELD_TYPE, path, - field_type_strs[field->type]); + tuple_field_json_path(format, field, + region); + if (path != NULL) { + diag_set(ClientError, ER_FIELD_TYPE, + path, + field_type_strs[field->type]); + } return -1; } if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) { ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2018-12-27 11:57 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-12-17 6:52 [PATCH v6 0/8] box: Indexes by JSON path Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 1/8] box: refactor tuple_validate_raw Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 2/8] box: refactor ER_{FIELD_TYPE, ACTION_MISMATCH} Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 3/8] box: build path to field string uniformly on error Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 4/8] box: introduce JSON Indexes Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 5/8] box: introduce has_json_paths flag in templates Kirill Shcherbatov 2018-12-27 11:51 ` [tarantool-patches] " Konstantin Osipov 2018-12-27 11:52 ` Konstantin Osipov 2018-12-27 11:57 ` [tarantool-patches] " Konstantin Osipov 2018-12-17 6:52 ` [PATCH v6 6/8] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 7/8] box: introduce offset_slot cache in key_part Kirill Shcherbatov 2018-12-17 6:52 ` [PATCH v6 8/8] box: specify indexes in user-friendly form Kirill Shcherbatov 2018-12-18 20:58 ` Vladimir Davydov 2018-12-18 20:46 ` [PATCH v6 0/8] box: Indexes by JSON path Vladimir Davydov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox