* [PATCH v7 0/5] box: Indexes by JSON path
From: Kirill Shcherbatov @ 2019-01-09 8:29 UTC
To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov
Sometimes field data has a complex document structure. When this
structure is consistent across the whole document, you can create
an index by a JSON path.
Changes in version 7:
- rebased on the current 2.1 branch, which contains a partially
  reworked part of v6
- introduced a JSON path length limit to guarantee that a path fits
  in a static buffer
- reworked tuple_format1_can_store_format2_tuples to use the trivial
  routines tuple_field_path and json_tree_lookup_path_entry to look
  up records (it is not a performance-critical place)
- reworked the similar code in tuple_format_create that does stmt
  size estimation, and the tuple construction in
  vy_stmt_new_surrogate_from_key, into a new tuple_format_stmt_encode
  routine
v6:
https://www.freelists.org/post/tarantool-patches/PATCH-v6-88-box-specify-indexes-in-userfriendly-form,1
v5:
https://www.freelists.org/post/tarantool-patches/PATCH-v5-99-box-specify-indexes-in-userfriendly-form,1
http://github.com/tarantool/tarantool/tree/kshch/gh-1012-json-indexes
https://github.com/tarantool/tarantool/issues/1012
Kirill Shcherbatov (5):
box: introduce JSON Indexes
box: introduce has_json_paths flag in templates
box: tune tuple_field_raw_by_path for indexed data
box: introduce offset_slot cache in key_part
box: specify indexes in user-friendly form
src/box/alter.cc | 2 +-
src/box/index_def.c | 10 +-
src/box/key_def.c | 172 ++++++++++--
src/box/key_def.h | 44 +++-
src/box/lua/index.c | 64 +++++
src/box/lua/schema.lua | 22 +-
src/box/lua/space.cc | 5 +
src/box/memtx_engine.c | 4 +
src/box/schema_def.h | 1 +
src/box/sql.c | 1 +
src/box/sql/build.c | 1 +
src/box/sql/select.c | 3 +-
src/box/sql/where.c | 1 +
src/box/tuple.h | 19 --
src/box/tuple_compare.cc | 119 ++++++---
src/box/tuple_extract_key.cc | 126 ++++++---
src/box/tuple_format.c | 494 +++++++++++++++++++++++++++++------
src/box/tuple_format.h | 100 +++++--
src/box/tuple_hash.cc | 47 +++-
src/box/vinyl.c | 4 +
src/box/vy_log.c | 11 +-
src/box/vy_point_lookup.c | 2 -
src/box/vy_stmt.c | 49 ++--
src/lib/json/json.c | 7 +-
src/lib/json/json.h | 16 ++
test/engine/json.result | 481 ++++++++++++++++++++++++++++++++++
test/engine/json.test.lua | 139 ++++++++++
27 files changed, 1665 insertions(+), 279 deletions(-)
create mode 100644 test/engine/json.result
create mode 100644 test/engine/json.test.lua
--
2.19.2
* [PATCH v7 1/5] box: introduce JSON Indexes
From: Kirill Shcherbatov @ 2019-01-09 8:29 UTC
To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov
New JSON indexes allow indexing document content.
First, the new key_part fields path and path_len are introduced
to represent the JSON path string specified by the user. The
modified tuple_format_use_key_part routine constructs the
corresponding tuple_field chain in the tuple_format:fields tree
down to the indexed data. The resulting tree is used for type
checking and for allocating offset slots for indexed fields.
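As an illustration, here is a minimal sketch of attaching a JSON
path to a key part with the API from this patch (the fieldno and
type values are made-up examples, not part of the patch):

#include "box/key_def.h"

/*
 * Sketch only, not part of the patch. key_def_new() copies the
 * path string into the key_def memory chunk and returns NULL
 * with a diag set on allocation failure.
 */
static struct key_def *
example_json_key_def(void)
{
	struct key_part_def part = key_part_def_default;
	part.fieldno = 2;		/* 3rd tuple field, 0-based */
	part.type = FIELD_TYPE_STRING;
	part.path = ".FIO.fname";	/* path relative to the field */
	return key_def_new(&part, 1);
}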
The refined tuple_init_field_map routine parses the tuple msgpack
in depth, using a stack allocated on the region, and initializes
the field map with the corresponding tuple_format:field, if any.
The stack is necessary because an mp container (map or array)
length is specified at the frame beginning, but this information
is also required to determine where the container ends.
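The same stack idea as a standalone sketch (this is not the patch
code; it only assumes msgpuck's mp_typeof()/mp_decode_array()/
mp_decode_map()/mp_next(), and the depth limit of 16 is arbitrary):

#include "msgpuck.h"

/*
 * Count the scalar leaves of one msgpack value using an explicit
 * frame stack: a container's member count is known only at its
 * beginning, so each frame tracks how many members are parsed.
 * Containers nested deeper than the limit are skipped whole.
 */
static int
mp_count_leaves(const char *pos)
{
	struct { uint32_t total, curr; } stack[16];
	int top = 0, leaves = 0;
	stack[0].total = 1;	/* the root value itself */
	stack[0].curr = 0;
	while (top >= 0) {
		if (stack[top].curr == stack[top].total) {
			top--;	/* frame is exhausted, pop it */
			continue;
		}
		stack[top].curr++;
		enum mp_type type = mp_typeof(*pos);
		if (type == MP_ARRAY && top + 1 < 16) {
			top++;
			stack[top].total = mp_decode_array(&pos);
			stack[top].curr = 0;
		} else if (type == MP_MAP && top + 1 < 16) {
			top++;
			/* a map of n pairs has 2 * n members */
			stack[top].total = 2 * mp_decode_map(&pos);
			stack[top].curr = 0;
		} else {
			leaves++;
			mp_next(&pos);	/* skip one value */
		}
	}
	return leaves;
}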
The other essential feature is the restoration of vinyl's
secondary-index statements from the extracted keys (stmts) loaded
from disk. The new tuple_format_stmt_encode routine traverses the
tuple_format:fields tree and constructs the vy_stmt data, using an
iov array to place the data blobs for the indexed leaves.
Introduced vy_stmt_size - the precalculated stmt size as if all
leaf fields were nil. It allows allocating the stmt chunk without
an extra tree traversal.
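A short sketch of the resulting size math (it mirrors the hunk in
vy_stmt_new_surrogate_from_key() below; key_sizes[] is a
hypothetical array with the sizes of the extracted key blobs):

	/*
	 * Start from the precalculated all-nil statement size
	 * and swap one nil for each extracted key.
	 */
	uint32_t bsize = mp_sizeof_array(field_count) +
			 format->vy_stmt_size -
			 mp_sizeof_nil() * part_count;
	for (uint32_t i = 0; i < part_count; i++)
		bsize += key_sizes[i];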
Example:
To create a new JSON index, specify a path to the document data as
a part of key_part:
parts = {{3, 'str', path = '.FIO.fname', is_nullable = false}}
idx = s:create_index('json_idx', {parts = parts})
idx:select("Ivanov")
Part of #1012
---
src/box/alter.cc | 2 +-
src/box/index_def.c | 10 +-
src/box/key_def.c | 166 +++++++++++--
src/box/key_def.h | 33 ++-
src/box/lua/space.cc | 5 +
src/box/memtx_engine.c | 4 +
src/box/schema_def.h | 1 +
src/box/sql.c | 1 +
src/box/sql/build.c | 1 +
src/box/sql/select.c | 3 +-
src/box/sql/where.c | 1 +
src/box/tuple_compare.cc | 7 +-
src/box/tuple_extract_key.cc | 26 +-
src/box/tuple_format.c | 463 +++++++++++++++++++++++++++++------
src/box/tuple_format.h | 70 +++++-
src/box/tuple_hash.cc | 2 +-
src/box/vinyl.c | 4 +
src/box/vy_log.c | 11 +-
src/box/vy_point_lookup.c | 2 -
src/box/vy_stmt.c | 49 ++--
src/lib/json/json.c | 7 +-
src/lib/json/json.h | 16 ++
test/engine/json.result | 448 +++++++++++++++++++++++++++++++++
test/engine/json.test.lua | 129 ++++++++++
24 files changed, 1321 insertions(+), 140 deletions(-)
create mode 100644 test/engine/json.result
create mode 100644 test/engine/json.test.lua
diff --git a/src/box/alter.cc b/src/box/alter.cc
index 0589c9678..9656a4189 100644
--- a/src/box/alter.cc
+++ b/src/box/alter.cc
@@ -268,7 +268,7 @@ index_def_new_from_tuple(struct tuple *tuple, struct space *space)
});
if (key_def_decode_parts(part_def, part_count, &parts,
space->def->fields,
- space->def->field_count) != 0)
+ space->def->field_count, &fiber()->gc) != 0)
diag_raise();
key_def = key_def_new(part_def, part_count);
if (key_def == NULL)
diff --git a/src/box/index_def.c b/src/box/index_def.c
index 2ba57ee9d..58137ed07 100644
--- a/src/box/index_def.c
+++ b/src/box/index_def.c
@@ -31,6 +31,8 @@
#include "index_def.h"
#include "schema_def.h"
#include "identifier.h"
+#include "tuple_format.h"
+#include "json/json.h"
const char *index_type_strs[] = { "HASH", "TREE", "BITSET", "RTREE" };
@@ -278,8 +280,12 @@ index_def_is_valid(struct index_def *index_def, const char *space_name)
* Courtesy to a user who could have made
* a typo.
*/
- if (index_def->key_def->parts[i].fieldno ==
- index_def->key_def->parts[j].fieldno) {
+ struct key_part *part_a = &index_def->key_def->parts[i];
+ struct key_part *part_b = &index_def->key_def->parts[j];
+ if (part_a->fieldno == part_b->fieldno &&
+ json_path_cmp(part_a->path, part_a->path_len,
+ part_b->path, part_b->path_len,
+ TUPLE_INDEX_BASE) == 0){
diag_set(ClientError, ER_MODIFY_INDEX,
index_def->name, space_name,
"same key part is indexed twice");
diff --git a/src/box/key_def.c b/src/box/key_def.c
index dae3580e2..3012b05df 100644
--- a/src/box/key_def.c
+++ b/src/box/key_def.c
@@ -28,6 +28,7 @@
* THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
+#include "json/json.h"
#include "key_def.h"
#include "tuple_compare.h"
#include "tuple_extract_key.h"
@@ -35,6 +36,7 @@
#include "column_mask.h"
#include "schema_def.h"
#include "coll_id_cache.h"
+#include "small/region.h"
const char *sort_order_strs[] = { "asc", "desc", "undef" };
@@ -44,7 +46,8 @@ const struct key_part_def key_part_def_default = {
COLL_NONE,
false,
ON_CONFLICT_ACTION_DEFAULT,
- SORT_ORDER_ASC
+ SORT_ORDER_ASC,
+ NULL
};
static int64_t
@@ -59,6 +62,7 @@ part_type_by_name_wrapper(const char *str, uint32_t len)
#define PART_OPT_NULLABILITY "is_nullable"
#define PART_OPT_NULLABLE_ACTION "nullable_action"
#define PART_OPT_SORT_ORDER "sort_order"
+#define PART_OPT_PATH "path"
const struct opt_def part_def_reg[] = {
OPT_DEF_ENUM(PART_OPT_TYPE, field_type, struct key_part_def, type,
@@ -71,19 +75,30 @@ const struct opt_def part_def_reg[] = {
struct key_part_def, nullable_action, NULL),
OPT_DEF_ENUM(PART_OPT_SORT_ORDER, sort_order, struct key_part_def,
sort_order, NULL),
+ OPT_DEF(PART_OPT_PATH, OPT_STRPTR, struct key_part_def, path),
OPT_END,
};
struct key_def *
key_def_dup(const struct key_def *src)
{
- size_t sz = key_def_sizeof(src->part_count);
- struct key_def *res = (struct key_def *)malloc(sz);
+ size_t sz = 0;
+ for (uint32_t i = 0; i < src->part_count; i++)
+ sz += src->parts[i].path_len;
+ sz = key_def_sizeof(src->part_count, sz);
+ struct key_def *res = (struct key_def *)calloc(1, sz);
if (res == NULL) {
diag_set(OutOfMemory, sz, "malloc", "res");
return NULL;
}
memcpy(res, src, sz);
+ /* Update paths to point to the new memory chunk. */
+ for (uint32_t i = 0; i < src->part_count; i++) {
+ if (src->parts[i].path == NULL)
+ continue;
+ size_t path_offset = src->parts[i].path - (char *)src;
+ res->parts[i].path = (char *)res + path_offset;
+ }
return res;
}
@@ -91,8 +106,16 @@ void
key_def_swap(struct key_def *old_def, struct key_def *new_def)
{
assert(old_def->part_count == new_def->part_count);
- for (uint32_t i = 0; i < new_def->part_count; i++)
+ for (uint32_t i = 0; i < new_def->part_count; i++) {
SWAP(old_def->parts[i], new_def->parts[i]);
+ /*
+ * Paths are allocated as a part of key_def so
+ * we need to swap path pointers back - it's OK
+ * as paths aren't supposed to change.
+ */
+ assert(old_def->parts[i].path_len == new_def->parts[i].path_len);
+ SWAP(old_def->parts[i].path, new_def->parts[i].path);
+ }
SWAP(*old_def, *new_def);
}
@@ -115,24 +138,39 @@ static void
key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
enum field_type type, enum on_conflict_action nullable_action,
struct coll *coll, uint32_t coll_id,
- enum sort_order sort_order)
+ enum sort_order sort_order, const char *path,
+ uint32_t path_len, char **paths)
{
assert(part_no < def->part_count);
assert(type < field_type_MAX);
def->is_nullable |= (nullable_action == ON_CONFLICT_ACTION_NONE);
+ def->has_json_paths |= path != NULL;
def->parts[part_no].nullable_action = nullable_action;
def->parts[part_no].fieldno = fieldno;
def->parts[part_no].type = type;
def->parts[part_no].coll = coll;
def->parts[part_no].coll_id = coll_id;
def->parts[part_no].sort_order = sort_order;
+ if (path != NULL) {
+ assert(paths != NULL);
+ def->parts[part_no].path = *paths;
+ *paths += path_len;
+ memcpy(def->parts[part_no].path, path, path_len);
+ def->parts[part_no].path_len = path_len;
+ } else {
+ def->parts[part_no].path = NULL;
+ def->parts[part_no].path_len = 0;
+ }
column_mask_set_fieldno(&def->column_mask, fieldno);
}
struct key_def *
key_def_new(const struct key_part_def *parts, uint32_t part_count)
{
- size_t sz = key_def_sizeof(part_count);
+ ssize_t sz = 0;
+ for (uint32_t i = 0; i < part_count; i++)
+ sz += parts[i].path != NULL ? strlen(parts[i].path) : 0;
+ sz = key_def_sizeof(part_count, sz);
struct key_def *def = calloc(1, sz);
if (def == NULL) {
diag_set(OutOfMemory, sz, "malloc", "struct key_def");
@@ -142,6 +180,8 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count)
def->part_count = part_count;
def->unique_part_count = part_count;
+ /* Paths data in key_def chunk. */
+ char *paths = (char *)def + key_def_sizeof(part_count, 0);
for (uint32_t i = 0; i < part_count; i++) {
const struct key_part_def *part = &parts[i];
struct coll *coll = NULL;
@@ -155,16 +195,18 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count)
}
coll = coll_id->coll;
}
+ uint32_t path_len = part->path != NULL ? strlen(part->path) : 0;
key_def_set_part(def, i, part->fieldno, part->type,
part->nullable_action, coll, part->coll_id,
- part->sort_order);
+ part->sort_order, part->path, path_len, &paths);
}
key_def_set_cmp(def);
return def;
}
-void
-key_def_dump_parts(const struct key_def *def, struct key_part_def *parts)
+int
+key_def_dump_parts(const struct key_def *def, struct key_part_def *parts,
+ struct region *region)
{
for (uint32_t i = 0; i < def->part_count; i++) {
const struct key_part *part = &def->parts[i];
@@ -174,13 +216,27 @@ key_def_dump_parts(const struct key_def *def, struct key_part_def *parts)
part_def->is_nullable = key_part_is_nullable(part);
part_def->nullable_action = part->nullable_action;
part_def->coll_id = part->coll_id;
+ if (part->path != NULL) {
+ char *path = region_alloc(region, part->path_len + 1);
+ if (path == NULL) {
+ diag_set(OutOfMemory, part->path_len + 1,
+ "region_alloc", "part_def->path");
+ return -1;
+ }
+ memcpy(path, part->path, part->path_len);
+ path[part->path_len] = '\0';
+ part_def->path = path;
+ } else {
+ part_def->path = NULL;
+ }
}
+ return 0;
}
box_key_def_t *
box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count)
{
- size_t sz = key_def_sizeof(part_count);
+ size_t sz = key_def_sizeof(part_count, 0);
struct key_def *key_def = calloc(1, sz);
if (key_def == NULL) {
diag_set(OutOfMemory, sz, "malloc", "struct key_def");
@@ -194,7 +250,8 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count)
key_def_set_part(key_def, item, fields[item],
(enum field_type)types[item],
ON_CONFLICT_ACTION_DEFAULT,
- NULL, COLL_NONE, SORT_ORDER_ASC);
+ NULL, COLL_NONE, SORT_ORDER_ASC, NULL, 0,
+ NULL);
}
key_def_set_cmp(key_def);
return key_def;
@@ -243,6 +300,11 @@ key_part_cmp(const struct key_part *parts1, uint32_t part_count1,
if (key_part_is_nullable(part1) != key_part_is_nullable(part2))
return key_part_is_nullable(part1) <
key_part_is_nullable(part2) ? -1 : 1;
+ int rc = json_path_cmp(part1->path, part1->path_len,
+ part2->path, part2->path_len,
+ TUPLE_INDEX_BASE);
+ if (rc != 0)
+ return rc;
}
return part_count1 < part_count2 ? -1 : part_count1 > part_count2;
}
@@ -274,8 +336,15 @@ key_def_snprint_parts(char *buf, int size, const struct key_part_def *parts,
for (uint32_t i = 0; i < part_count; i++) {
const struct key_part_def *part = &parts[i];
assert(part->type < field_type_MAX);
- SNPRINT(total, snprintf, buf, size, "%d, '%s'",
- (int)part->fieldno, field_type_strs[part->type]);
+ if (part->path != NULL) {
+ SNPRINT(total, snprintf, buf, size, "%d, '%s', '%s'",
+ (int)part->fieldno, field_type_strs[part->type],
+ part->path);
+ } else {
+ SNPRINT(total, snprintf, buf, size, "%d, '%s'",
+ (int)part->fieldno,
+ field_type_strs[part->type]);
+ }
if (i < part_count - 1)
SNPRINT(total, snprintf, buf, size, ", ");
}
@@ -294,6 +363,8 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
count++;
if (part->is_nullable)
count++;
+ if (part->path != NULL)
+ count++;
size += mp_sizeof_map(count);
size += mp_sizeof_str(strlen(PART_OPT_FIELD));
size += mp_sizeof_uint(part->fieldno);
@@ -308,6 +379,10 @@ key_def_sizeof_parts(const struct key_part_def *parts, uint32_t part_count)
size += mp_sizeof_str(strlen(PART_OPT_NULLABILITY));
size += mp_sizeof_bool(part->is_nullable);
}
+ if (part->path != NULL) {
+ size += mp_sizeof_str(strlen(PART_OPT_PATH));
+ size += mp_sizeof_str(strlen(part->path));
+ }
}
return size;
}
@@ -323,6 +398,8 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
count++;
if (part->is_nullable)
count++;
+ if (part->path != NULL)
+ count++;
data = mp_encode_map(data, count);
data = mp_encode_str(data, PART_OPT_FIELD,
strlen(PART_OPT_FIELD));
@@ -342,6 +419,12 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
strlen(PART_OPT_NULLABILITY));
data = mp_encode_bool(data, part->is_nullable);
}
+ if (part->path != NULL) {
+ data = mp_encode_str(data, PART_OPT_PATH,
+ strlen(PART_OPT_PATH));
+ data = mp_encode_str(data, part->path,
+ strlen(part->path));
+ }
}
return data;
}
@@ -403,6 +486,7 @@ key_def_decode_parts_166(struct key_part_def *parts, uint32_t part_count,
fields[part->fieldno].is_nullable :
key_part_def_default.is_nullable);
part->coll_id = COLL_NONE;
+ part->path = NULL;
}
return 0;
}
@@ -410,7 +494,7 @@ key_def_decode_parts_166(struct key_part_def *parts, uint32_t part_count,
int
key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
const char **data, const struct field_def *fields,
- uint32_t field_count)
+ uint32_t field_count, struct region *region)
{
if (mp_typeof(**data) == MP_ARRAY) {
return key_def_decode_parts_166(parts, part_count, data,
@@ -439,7 +523,7 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
const char *key = mp_decode_str(data, &key_len);
if (opts_parse_key(part, part_def_reg, key, key_len, data,
ER_WRONG_INDEX_OPTIONS,
- i + TUPLE_INDEX_BASE, NULL,
+ i + TUPLE_INDEX_BASE, region,
false) != 0)
return -1;
if (is_action_missing &&
@@ -485,6 +569,27 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
"index part: unknown sort order");
return -1;
}
+ if (part->path != NULL) {
+ uint32_t path_len = strlen(part->path);
+ if (path_len > BOX_JSON_PATH_MAX) {
+ diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
+ "JSON path is too long");
+ return -1;
+ }
+ int rc = json_path_validate(part->path, path_len,
+ TUPLE_INDEX_BASE);
+ if (rc != 0) {
+ const char *err_msg =
+ tt_sprintf("invalid JSON path '%s': "
+ "error in path on "
+ "position %d", part->path,
+ rc);
+ diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
+ part->fieldno + TUPLE_INDEX_BASE,
+ err_msg);
+ return -1;
+ }
+ }
}
return 0;
}
@@ -504,7 +609,10 @@ key_def_find(const struct key_def *key_def, const struct key_part *to_find)
const struct key_part *part = key_def->parts;
const struct key_part *end = part + key_def->part_count;
for (; part != end; part++) {
- if (part->fieldno == to_find->fieldno)
+ if (part->fieldno == to_find->fieldno &&
+ json_path_cmp(part->path, part->path_len,
+ to_find->path, to_find->path_len,
+ TUPLE_INDEX_BASE) == 0)
return part;
}
return NULL;
@@ -530,18 +638,25 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
* Find and remove part duplicates, i.e. parts counted
* twice since they are present in both key defs.
*/
- const struct key_part *part = second->parts;
- const struct key_part *end = part + second->part_count;
+ size_t sz = 0;
+ const struct key_part *part = first->parts;
+ const struct key_part *end = part + first->part_count;
+ for (; part != end; part++)
+ sz += part->path_len;
+ part = second->parts;
+ end = part + second->part_count;
for (; part != end; part++) {
if (key_def_find(first, part) != NULL)
--new_part_count;
+ else
+ sz += part->path_len;
}
+ sz = key_def_sizeof(new_part_count, sz);
struct key_def *new_def;
- new_def = (struct key_def *)calloc(1, key_def_sizeof(new_part_count));
+ new_def = (struct key_def *)calloc(1, sz);
if (new_def == NULL) {
- diag_set(OutOfMemory, key_def_sizeof(new_part_count), "malloc",
- "new_def");
+ diag_set(OutOfMemory, sz, "malloc", "new_def");
return NULL;
}
new_def->part_count = new_part_count;
@@ -549,6 +664,9 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
new_def->is_nullable = first->is_nullable || second->is_nullable;
new_def->has_optional_parts = first->has_optional_parts ||
second->has_optional_parts;
+
+ /* Paths data in the new key_def chunk. */
+ char *paths = (char *)new_def + key_def_sizeof(new_part_count, 0);
/* Write position in the new key def. */
uint32_t pos = 0;
/* Append first key def's parts to the new index_def. */
@@ -557,7 +675,8 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
for (; part != end; part++) {
key_def_set_part(new_def, pos++, part->fieldno, part->type,
part->nullable_action, part->coll,
- part->coll_id, part->sort_order);
+ part->coll_id, part->sort_order, part->path,
+ part->path_len, &paths);
}
/* Set-append second key def's part to the new key def. */
@@ -568,7 +687,8 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
continue;
key_def_set_part(new_def, pos++, part->fieldno, part->type,
part->nullable_action, part->coll,
- part->coll_id, part->sort_order);
+ part->coll_id, part->sort_order, part->path,
+ part->path_len, &paths);
}
key_def_set_cmp(new_def);
return new_def;
diff --git a/src/box/key_def.h b/src/box/key_def.h
index d1866303b..c6b7a8c74 100644
--- a/src/box/key_def.h
+++ b/src/box/key_def.h
@@ -64,6 +64,11 @@ struct key_part_def {
enum on_conflict_action nullable_action;
/** Part sort order. */
enum sort_order sort_order;
+ /**
+ * JSON path to indexed data, relative to the field number,
+ * or NULL if this key part indexes a top-level field.
+ */
+ const char *path;
};
extern const struct key_part_def key_part_def_default;
@@ -82,6 +87,15 @@ struct key_part {
enum on_conflict_action nullable_action;
/** Part sort order. */
enum sort_order sort_order;
+ /**
+ * JSON path to indexed data, relative to the field number,
+ * or NULL if this key part indexes a top-level field.
+ * This string is not 0-terminated. Memory is allocated
+ * at the end of key_def chunk.
+ */
+ char *path;
+ /** The length of JSON path. */
+ uint32_t path_len;
};
struct key_def;
@@ -148,6 +162,8 @@ struct key_def {
uint32_t unique_part_count;
/** True, if at least one part can store NULL. */
bool is_nullable;
+ /** True, if some key part has JSON path. */
+ bool has_json_paths;
/**
* True, if some key parts can be absent in a tuple. These
* fields assumed to be MP_NIL.
@@ -241,9 +257,10 @@ box_tuple_compare_with_key(const box_tuple_t *tuple_a, const char *key_b,
/** \endcond public */
static inline size_t
-key_def_sizeof(uint32_t part_count)
+key_def_sizeof(uint32_t part_count, uint32_t paths_size)
{
- return sizeof(struct key_def) + sizeof(struct key_part) * part_count;
+ return sizeof(struct key_def) + sizeof(struct key_part) * part_count +
+ paths_size;
}
/**
@@ -255,9 +272,13 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count);
/**
* Dump part definitions of the given key def.
+ * Region is required to make allocations for JSON paths when some
+ * path is present. JSON path strings are 0-terminated.
+ * Return -1 on memory allocation error, 0 on success.
*/
-void
-key_def_dump_parts(const struct key_def *def, struct key_part_def *parts);
+int
+key_def_dump_parts(const struct key_def *def, struct key_part_def *parts,
+ struct region *region);
/**
* Update 'has_optional_parts' of @a key_def with correspondence
@@ -303,7 +324,7 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
int
key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
const char **data, const struct field_def *fields,
- uint32_t field_count);
+ uint32_t field_count, struct region *region);
/**
* Returns the part in index_def->parts for the specified fieldno.
@@ -364,6 +385,8 @@ key_validate_parts(const struct key_def *key_def, const char *key,
static inline bool
key_def_is_sequential(const struct key_def *key_def)
{
+ if (key_def->has_json_paths)
+ return false;
for (uint32_t part_id = 0; part_id < key_def->part_count; part_id++) {
if (key_def->parts[part_id].fieldno != part_id)
return false;
diff --git a/src/box/lua/space.cc b/src/box/lua/space.cc
index 7cae436f1..1f152917e 100644
--- a/src/box/lua/space.cc
+++ b/src/box/lua/space.cc
@@ -296,6 +296,11 @@ lbox_fillspace(struct lua_State *L, struct space *space, int i)
lua_pushnumber(L, part->fieldno + TUPLE_INDEX_BASE);
lua_setfield(L, -2, "fieldno");
+ if (part->path != NULL) {
+ lua_pushlstring(L, part->path, part->path_len);
+ lua_setfield(L, -2, "path");
+ }
+
lua_pushboolean(L, key_part_is_nullable(part));
lua_setfield(L, -2, "is_nullable");
diff --git a/src/box/memtx_engine.c b/src/box/memtx_engine.c
index 5cf70ab94..2cae791e1 100644
--- a/src/box/memtx_engine.c
+++ b/src/box/memtx_engine.c
@@ -1317,6 +1317,10 @@ memtx_index_def_change_requires_rebuild(struct index *index,
return true;
if (old_part->coll != new_part->coll)
return true;
+ if (json_path_cmp(old_part->path, old_part->path_len,
+ new_part->path, new_part->path_len,
+ TUPLE_INDEX_BASE) != 0)
+ return true;
}
return false;
}
diff --git a/src/box/schema_def.h b/src/box/schema_def.h
index a760ecc3f..b7a9d3284 100644
--- a/src/box/schema_def.h
+++ b/src/box/schema_def.h
@@ -44,6 +44,7 @@ enum {
BOX_INDEX_MAX = 128,
BOX_NAME_MAX = 65000,
BOX_INVALID_NAME_MAX = 64,
+ BOX_JSON_PATH_MAX = 512,
ENGINE_NAME_MAX = 16,
FIELD_TYPE_NAME_MAX = 16,
GRANT_NAME_MAX = 16,
diff --git a/src/box/sql.c b/src/box/sql.c
index 8c7607d84..c54a0c0ce 100644
--- a/src/box/sql.c
+++ b/src/box/sql.c
@@ -380,6 +380,7 @@ sql_ephemeral_space_create(uint32_t field_count, struct sql_key_info *key_info)
part->nullable_action = ON_CONFLICT_ACTION_NONE;
part->is_nullable = true;
part->sort_order = SORT_ORDER_ASC;
+ part->path = NULL;
if (def != NULL && i < def->part_count)
part->coll_id = def->parts[i].coll_id;
else
diff --git a/src/box/sql/build.c b/src/box/sql/build.c
index 49b90b5d0..947daf8f6 100644
--- a/src/box/sql/build.c
+++ b/src/box/sql/build.c
@@ -2185,6 +2185,7 @@ index_fill_def(struct Parse *parse, struct index *index,
part->is_nullable = part->nullable_action == ON_CONFLICT_ACTION_NONE;
part->sort_order = SORT_ORDER_ASC;
part->coll_id = coll_id;
+ part->path = NULL;
}
key_def = key_def_new(key_parts, expr_list->nExpr);
if (key_def == NULL)
diff --git a/src/box/sql/select.c b/src/box/sql/select.c
index 02ee225f1..3f136a342 100644
--- a/src/box/sql/select.c
+++ b/src/box/sql/select.c
@@ -1360,6 +1360,7 @@ sql_key_info_new(sqlite3 *db, uint32_t part_count)
part->is_nullable = false;
part->nullable_action = ON_CONFLICT_ACTION_ABORT;
part->sort_order = SORT_ORDER_ASC;
+ part->path = NULL;
}
return key_info;
}
@@ -1377,7 +1378,7 @@ sql_key_info_new_from_key_def(sqlite3 *db, const struct key_def *key_def)
key_info->key_def = NULL;
key_info->refs = 1;
key_info->part_count = key_def->part_count;
- key_def_dump_parts(key_def, key_info->parts);
+ key_def_dump_parts(key_def, key_info->parts, NULL);
return key_info;
}
diff --git a/src/box/sql/where.c b/src/box/sql/where.c
index 571b5af78..814bd3926 100644
--- a/src/box/sql/where.c
+++ b/src/box/sql/where.c
@@ -2807,6 +2807,7 @@ whereLoopAddBtree(WhereLoopBuilder * pBuilder, /* WHERE clause information */
part.is_nullable = false;
part.sort_order = SORT_ORDER_ASC;
part.coll_id = COLL_NONE;
+ part.path = NULL;
struct key_def *key_def = key_def_new(&part, 1);
if (key_def == NULL) {
diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
index 3fe4cae32..7ab6e3bf6 100644
--- a/src/box/tuple_compare.cc
+++ b/src/box/tuple_compare.cc
@@ -469,7 +469,8 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
struct key_part *part = key_def->parts;
const char *tuple_a_raw = tuple_data(tuple_a);
const char *tuple_b_raw = tuple_data(tuple_b);
- if (key_def->part_count == 1 && part->fieldno == 0) {
+ if (key_def->part_count == 1 && part->fieldno == 0 &&
+ part->path == NULL) {
/*
* First field can not be optional - empty tuples
* can not exist.
@@ -1027,7 +1028,7 @@ tuple_compare_create(const struct key_def *def)
}
}
assert(! def->has_optional_parts);
- if (!key_def_has_collation(def)) {
+ if (!key_def_has_collation(def) && !def->has_json_paths) {
/* Precalculated comparators don't use collation */
for (uint32_t k = 0;
k < sizeof(cmp_arr) / sizeof(cmp_arr[0]); k++) {
@@ -1247,7 +1248,7 @@ tuple_compare_with_key_create(const struct key_def *def)
}
}
assert(! def->has_optional_parts);
- if (!key_def_has_collation(def)) {
+ if (!key_def_has_collation(def) && !def->has_json_paths) {
/* Precalculated comparators don't use collation */
for (uint32_t k = 0;
k < sizeof(cmp_wk_arr) / sizeof(cmp_wk_arr[0]);
diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
index ac8b5a44e..c40d7887d 100644
--- a/src/box/tuple_extract_key.cc
+++ b/src/box/tuple_extract_key.cc
@@ -10,7 +10,8 @@ key_def_parts_are_sequential(const struct key_def *def, int i)
{
uint32_t fieldno1 = def->parts[i].fieldno + 1;
uint32_t fieldno2 = def->parts[i + 1].fieldno;
- return fieldno1 == fieldno2;
+ return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
+ def->parts[i + 1].path == NULL;
}
/** True, if a key can contain two or more parts in sequence. */
@@ -241,7 +242,8 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
if (!key_def_parts_are_sequential(key_def, i))
break;
}
- uint32_t end_fieldno = key_def->parts[i].fieldno;
+ const struct key_part *part = &key_def->parts[i];
+ uint32_t end_fieldno = part->fieldno;
if (fieldno < current_fieldno) {
/* Rewind. */
@@ -283,6 +285,22 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
current_fieldno++;
}
}
+ const char *field_last, *field_end_last;
+ if (part->path != NULL) {
+ field_last = field;
+ field_end_last = field_end;
+ MAYBE_UNUSED int rc =
+ tuple_field_go_to_path(&field, part->path,
+ part->path_len);
+ /*
+ * All tuples must be valid as all
+ * integrity checks have already
+ * passed.
+ */
+ assert(rc == 0);
+ field_end = field;
+ mp_next(&field_end);
+ }
memcpy(key_buf, field, field_end - field);
key_buf += field_end - field;
if (has_optional_parts && null_count != 0) {
@@ -291,6 +309,10 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
} else {
assert(key_buf - key <= data_end - data);
}
+ if (part->path != NULL) {
+ field = field_last;
+ field_end = field_end_last;
+ }
}
if (key_size != NULL)
*key_size = (uint32_t)(key_buf - key);
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index e11b4e6f3..c81c23fd1 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -28,6 +28,7 @@
* THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
+#include <sys/uio.h>
#include "bit/bit.h"
#include "fiber.h"
#include "json/json.h"
@@ -66,12 +67,88 @@ tuple_field_delete(struct tuple_field *field)
/** Return path to a tuple field. Used for error reporting. */
static const char *
-tuple_field_path(const struct tuple_field *field)
+tuple_field_path(const struct tuple_field *field, bool json_only)
{
assert(field->token.parent != NULL);
- assert(field->token.parent->parent == NULL);
- assert(field->token.type == JSON_TOKEN_NUM);
- return int2str(field->token.num + TUPLE_INDEX_BASE);
+ char *path;
+ if (!json_only && field->token.parent->type == JSON_TOKEN_END) {
+ assert(field->token.type == JSON_TOKEN_NUM);
+ path = int2str(field->token.num + TUPLE_INDEX_BASE);
+ } else {
+ path = tt_static_buf();
+ MAYBE_UNUSED int rc =
+ json_tree_snprint_path(path, TT_STATIC_BUF_LEN,
+ &field->token, TUPLE_INDEX_BASE);
+ assert(rc > 0 && rc < TT_STATIC_BUF_LEN);
+ }
+ return path;
+}
+
+/**
+ * Add corresponding format:fields for specified JSON path.
+ * Return a pointer to the leaf field on success, NULL on memory
+ * allocation error or type/nullability mismatch error, diag
+ * message is set.
+ */
+static struct tuple_field *
+tuple_field_tree_add_path(struct tuple_format *format, const char *path,
+ uint32_t path_len, uint32_t fieldno)
+{
+ int rc = 0;
+ struct json_tree *tree = &format->fields;
+ struct tuple_field *parent = tuple_format_field(format, fieldno);
+ struct tuple_field *field = tuple_field_new();
+ if (field == NULL)
+ goto fail;
+
+ struct json_lexer lexer;
+ uint32_t token_count = 0;
+ json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE);
+ while ((rc = json_lexer_next_token(&lexer, &field->token)) == 0 &&
+ field->token.type != JSON_TOKEN_END) {
+ enum field_type expected_type =
+ field->token.type == JSON_TOKEN_STR ?
+ FIELD_TYPE_MAP : FIELD_TYPE_ARRAY;
+ if (field_type1_contains_type2(parent->type, expected_type)) {
+ parent->type = expected_type;
+ } else if (!field_type1_contains_type2(expected_type,
+ parent->type)) {
+ diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH,
+ tuple_field_path(parent, false),
+ field_type_strs[parent->type],
+ field_type_strs[expected_type]);
+ goto fail;
+ }
+ struct tuple_field *next =
+ json_tree_lookup_entry(tree, &parent->token,
+ &field->token,
+ struct tuple_field, token);
+ if (next == NULL) {
+ rc = json_tree_add(tree, &parent->token, &field->token);
+ if (rc != 0) {
+ diag_set(OutOfMemory, sizeof(struct json_token),
+ "json_tree_add", "tree");
+ goto fail;
+ }
+ next = field;
+ field = tuple_field_new();
+ if (field == NULL)
+ goto fail;
+ }
+ parent = next;
+ token_count++;
+ }
+ /* Path has been verified by key_def_decode_parts. */
+ assert(rc == 0 && field->token.type == JSON_TOKEN_END);
+ assert(parent != NULL);
+ /* Update tree depth information. */
+ format->max_path_tokens = MAX(format->max_path_tokens, token_count + 1);
+end:
+ tuple_field_delete(field);
+ return parent;
+fail:
+ parent = NULL;
+ goto end;
}
/**
@@ -95,10 +172,25 @@ tuple_format_field_by_id(struct tuple_format *format, uint32_t id)
static int
tuple_format_use_key_part(struct tuple_format *format, uint32_t field_count,
const struct key_part *part, bool is_sequential,
- int *current_slot)
+ int *current_slot, char **paths)
{
assert(part->fieldno < tuple_format_field_count(format));
- struct tuple_field *field = tuple_format_field(format, part->fieldno);
+ struct tuple_field *field;
+ if (part->path == NULL) {
+ field = tuple_format_field(format, part->fieldno);
+ } else {
+ assert(!is_sequential);
+ /**
+ * Copy JSON path data to reserved area at the
+ * end of format allocation.
+ */
+ memcpy(*paths, part->path, part->path_len);
+ field = tuple_field_tree_add_path(format, *paths, part->path_len,
+ part->fieldno);
+ if (field == NULL)
+ return -1;
+ *paths += part->path_len;
+ }
/*
* If a field is not present in the space format,
* inherit nullable action of the first key part
@@ -124,7 +216,7 @@ tuple_format_use_key_part(struct tuple_format *format, uint32_t field_count,
field->nullable_action = part->nullable_action;
} else if (field->nullable_action != part->nullable_action) {
diag_set(ClientError, ER_ACTION_MISMATCH,
- tuple_field_path(field),
+ tuple_field_path(field, false),
on_conflict_action_strs[field->nullable_action],
on_conflict_action_strs[part->nullable_action]);
return -1;
@@ -146,7 +238,7 @@ tuple_format_use_key_part(struct tuple_format *format, uint32_t field_count,
errcode = ER_FORMAT_MISMATCH_INDEX_PART;
else
errcode = ER_INDEX_PART_TYPE_MISMATCH;
- diag_set(ClientError, errcode, tuple_field_path(field),
+ diag_set(ClientError, errcode, tuple_field_path(field, false),
field_type_strs[field->type],
field_type_strs[part->type]);
return -1;
@@ -158,13 +250,93 @@ tuple_format_use_key_part(struct tuple_format *format, uint32_t field_count,
* simply accessible, so we don't store an offset for it.
*/
if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL &&
- is_sequential == false && part->fieldno > 0) {
+ is_sequential == false &&
+ (part->fieldno > 0 || part->path != NULL)) {
*current_slot = *current_slot - 1;
field->offset_slot = *current_slot;
}
return 0;
}
+/**
+ * Get format:field parent field_type.
+ * This routine is required as first-level fields have no parent
+ * field, so it cannot be retrieved with json_tree_entry.
+ */
+static enum field_type
+tuple_format_field_parent_type(struct tuple_format *format,
+ struct tuple_field *field)
+{
+ struct json_token *parent = field->token.parent;
+ if (parent == &format->fields.root)
+ return FIELD_TYPE_ARRAY;
+ return json_tree_entry(parent, struct tuple_field, token)->type;
+}
+
+uint32_t
+tuple_format_stmt_encode(struct tuple_format *format, char **offset,
+ char *tuple_raw, uint32_t *field_map,
+ struct iovec *iov)
+{
+ bool write = offset != NULL;
+ uint32_t size = 0;
+ struct tuple_field *field;
+ json_tree_foreach_entry_preorder(field, &format->fields.root,
+ struct tuple_field, token) {
+ enum field_type parent_type =
+ tuple_format_field_parent_type(format, field);
+ if (parent_type == FIELD_TYPE_ARRAY &&
+ field->token.sibling_idx > 0) {
+ /*
+ * Write nil instead of omitted array
+ * members.
+ */
+ struct json_token **neighbors =
+ field->token.parent->children;
+ for (uint32_t i = field->token.sibling_idx - 1;
+ neighbors[i] == NULL && i > 0; i--) {
+ if (write)
+ *offset = mp_encode_nil(*offset);
+ size += mp_sizeof_nil();
+ }
+ } else if (parent_type == FIELD_TYPE_MAP) {
+ /* Write map key string. */
+ const char *str = field->token.str;
+ uint32_t len = field->token.len;
+ if (write)
+ *offset = mp_encode_str(*offset, str, len);
+ size += mp_sizeof_str(len);
+ }
+ /* Fill data. */
+ uint32_t children_cnt = field->token.max_child_idx + 1;
+ if (json_token_is_leaf(&field->token)) {
+ if (!write || iov[field->id].iov_len == 0) {
+ if (write)
+ *offset = mp_encode_nil(*offset);
+ size += mp_sizeof_nil();
+ } else {
+ memcpy(*offset, iov[field->id].iov_base,
+ iov[field->id].iov_len);
+ uint32_t data_offset = *offset - tuple_raw;
+ int32_t slot = field->offset_slot;
+ if (slot != TUPLE_OFFSET_SLOT_NIL)
+ field_map[slot] = data_offset;
+ *offset += iov[field->id].iov_len;
+ size += iov[field->id].iov_len;
+ }
+ } else if (field->type == FIELD_TYPE_ARRAY) {
+ if (write)
+ *offset = mp_encode_array(*offset, children_cnt);
+ size += mp_sizeof_array(children_cnt);
+ } else if (field->type == FIELD_TYPE_MAP) {
+ if (write)
+ *offset = mp_encode_map(*offset, children_cnt);
+ size += mp_sizeof_map(children_cnt);
+ }
+ }
+ return size;
+}
+
/**
* Extract all available type info from keys and field
* definitions.
@@ -203,6 +375,11 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
int current_slot = 0;
+ /*
+ * Set pointer to reserved area in the format chunk
+ * allocated with tuple_format_alloc call.
+ */
+ char *paths = (char *)format + sizeof(struct tuple_format);
/* extract field type info */
for (uint16_t key_no = 0; key_no < key_count; ++key_no) {
const struct key_def *key_def = keys[key_no];
@@ -213,7 +390,8 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
for (; part < parts_end; part++) {
if (tuple_format_use_key_part(format, field_count, part,
is_sequential,
- &current_slot) != 0)
+ &current_slot,
+ &paths) != 0)
return -1;
}
}
@@ -236,9 +414,12 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
"malloc", "required field bitmap");
return -1;
}
+ uint32_t id = 0;
struct tuple_field *field;
json_tree_foreach_entry_preorder(field, &format->fields.root,
struct tuple_field, token) {
+ /* Set the unique field identifier. */
+ field->id = id++;
/*
* Mark all leaf non-nullable fields as required
* by setting the corresponding bit in the bitmap
@@ -248,6 +429,10 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
!tuple_field_is_nullable(field))
bit_set(format->required_fields, field->id);
}
+ /* Update format metadata for the new format:fields tree. */
+ format->total_field_count = id;
+ format->vy_stmt_size = tuple_format_stmt_encode(format, NULL, NULL,
+ NULL, NULL);
return 0;
}
@@ -317,6 +502,8 @@ static struct tuple_format *
tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
uint32_t space_field_count, struct tuple_dictionary *dict)
{
+ /* Size of area to store paths. */
+ uint32_t paths_size = 0;
uint32_t index_field_count = 0;
/* find max max field no */
for (uint16_t key_no = 0; key_no < key_count; ++key_no) {
@@ -326,13 +513,15 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
for (; part < pend; part++) {
index_field_count = MAX(index_field_count,
part->fieldno + 1);
+ paths_size += part->path_len;
}
}
uint32_t field_count = MAX(space_field_count, index_field_count);
- struct tuple_format *format = malloc(sizeof(struct tuple_format));
+ uint32_t allocation_size = sizeof(struct tuple_format) + paths_size;
+ struct tuple_format *format = malloc(allocation_size);
if (format == NULL) {
- diag_set(OutOfMemory, sizeof(struct tuple_format), "malloc",
+ diag_set(OutOfMemory, allocation_size, "malloc",
"tuple format");
return NULL;
}
@@ -346,7 +535,6 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
struct tuple_field *field = tuple_field_new();
if (field == NULL)
goto error;
- field->id = fieldno;
field->token.num = fieldno;
field->token.type = JSON_TOKEN_NUM;
if (json_tree_add(&format->fields, &format->fields.root,
@@ -368,6 +556,8 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
}
format->total_field_count = field_count;
format->required_fields = NULL;
+ format->max_path_tokens = 1;
+ format->vy_stmt_size = UINT32_MAX;
format->refs = 0;
format->id = FORMAT_ID_NIL;
format->index_field_count = index_field_count;
@@ -428,15 +618,22 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
{
if (format1->exact_field_count != format2->exact_field_count)
return false;
- uint32_t format1_field_count = tuple_format_field_count(format1);
- uint32_t format2_field_count = tuple_format_field_count(format2);
- for (uint32_t i = 0; i < format1_field_count; ++i) {
- struct tuple_field *field1 = tuple_format_field(format1, i);
+ struct tuple_field *field1;
+ json_tree_foreach_entry_preorder(field1, &format1->fields.root,
+ struct tuple_field, token) {
+next:;
+ const char *path = tuple_field_path(field1, true);
+ struct tuple_field *field2 =
+ json_tree_lookup_path_entry(&format2->fields,
+ &format2->fields.root,
+ path, strlen(path),
+ TUPLE_INDEX_BASE,
+ struct tuple_field, token);
/*
* The field has a data type in format1, but has
* no data type in format2.
*/
- if (i >= format2_field_count) {
+ if (field2 == NULL) {
/*
* The field can get a name added
* for it, and this doesn't require a data
@@ -447,12 +644,22 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
* NULLs or miss the subject field.
*/
if (field1->type == FIELD_TYPE_ANY &&
- tuple_field_is_nullable(field1))
- continue;
- else
+ tuple_field_is_nullable(field1)) {
+ /* Skip subtree. */
+ struct json_token *token = &field1->token;
+ struct json_token *parent = token->parent;
+ field1 = json_tree_child_next_entry(parent,
+ token,
+ struct
+ tuple_field,
+ token);
+ if (field1 == NULL)
+ break;
+ goto next;
+ } else {
return false;
+ }
}
- struct tuple_field *field2 = tuple_format_field(format2, i);
if (! field_type1_contains_type2(field1->type, field2->type))
return false;
/*
@@ -466,6 +673,90 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
return true;
}
+/**
+ * Descriptor of the parsed msgpack frame.
+ * Due to the fact that the msgpack has nested structures whose
+ * length is stored in the frame header at the blob beginning, we
+ * need to be able to determine that we have finished parsing the
+ * current component and should move on to the next one.
+ * For this purpose a stack of disassembled levels is organized,
+ * where the type of the level, the total number of elements,
+ * and the number of elements that have already been parsed are
+ * stored.
+ */
+struct mp_frame {
+ /** JSON token type representing frame data structure. */
+ enum json_token_type child_type;
+ /** Total count of MP members to process. */
+ uint32_t total;
+ /** Count of MP elements that have already been parsed. */
+ uint32_t curr;
+};
+
+/**
+ * Emit token to analyze and do msgpack pointer shift using top
+ * mp_stack frame. Return 0 on success, -1 when analyse step must
+ * be skipped (on usuported term detection).
+ */
+static int
+mp_frame_parse(struct mp_frame *mp_stack, uint32_t mp_stack_idx,
+ const char **pos, struct json_token *token)
+{
+ token->type = mp_stack[mp_stack_idx].child_type;
+ ++mp_stack[mp_stack_idx].curr;
+ if (token->type == JSON_TOKEN_NUM) {
+ token->num = mp_stack[mp_stack_idx].curr - TUPLE_INDEX_BASE;
+ } else if (token->type == JSON_TOKEN_STR) {
+ if (mp_typeof(**pos) != MP_STR) {
+ /* Skip key. */
+ mp_next(pos);
+ return -1;
+ }
+ token->str = mp_decode_str(pos, (uint32_t *)&token->len);
+ } else {
+ unreachable();
+ }
+ return 0;
+}
+
+/**
+ * Prepare mp_frame for further iterations. Store container length
+ * and child_type. Update parent token pointer and shift msgpack
+ * pointer.
+ */
+static int
+mp_frame_prepare(struct mp_frame *mp_stack, uint32_t *mp_stack_idx,
+ uint32_t mp_stack_total, struct json_token *token,
+ const char **pos, struct json_token **parent)
+{
+ enum mp_type type = mp_typeof(**pos);
+ if (token != NULL && *mp_stack_idx + 1 < mp_stack_total &&
+ (type == MP_MAP || type == MP_ARRAY)) {
+ uint32_t size = type == MP_ARRAY ? mp_decode_array(pos) :
+ mp_decode_map(pos);
+ if (size == 0)
+ return 0;
+ *parent = token;
+ enum json_token_type child_type =
+ type == MP_ARRAY ? JSON_TOKEN_NUM : JSON_TOKEN_STR;
+ *mp_stack_idx = *mp_stack_idx + 1;
+ mp_stack[*mp_stack_idx].child_type = child_type;
+ mp_stack[*mp_stack_idx].total = size;
+ mp_stack[*mp_stack_idx].curr = 0;
+ } else {
+ mp_next(pos);
+ while (mp_stack[*mp_stack_idx].curr >=
+ mp_stack[*mp_stack_idx].total) {
+ assert(*parent != NULL);
+ *parent = (*parent)->parent;
+ if (*mp_stack_idx == 0)
+ return -1;
+ *mp_stack_idx = *mp_stack_idx - 1;
+ }
+ }
+ return 0;
+}
+
/** @sa declaration for details. */
int
tuple_init_field_map(struct tuple_format *format, uint32_t *field_map,
@@ -512,49 +803,64 @@ tuple_init_field_map(struct tuple_format *format, uint32_t *field_map,
/* Empty tuple, nothing to do. */
goto skip;
}
- /* first field is simply accessible, so we do not store offset to it */
- struct tuple_field *field = tuple_format_field(format, 0);
- if (validate &&
- !field_mp_type_is_compatible(field->type, mp_typeof(*pos),
- tuple_field_is_nullable(field))) {
- diag_set(ClientError, ER_FIELD_TYPE, tuple_field_path(field),
- field_type_strs[field->type]);
- goto error;
- }
- if (required_fields != NULL)
- bit_clear(required_fields, field->id);
- mp_next(&pos);
- /* other fields...*/
- uint32_t i = 1;
uint32_t defined_field_count = MIN(field_count, validate ?
tuple_format_field_count(format) :
format->index_field_count);
- if (field_count < format->index_field_count) {
- /*
- * Nullify field map to be able to detect by 0,
- * which key fields are absent in tuple_field().
- */
- memset((char *)field_map - format->field_map_size, 0,
- format->field_map_size);
+ /*
+ * Nullify field map to be able to detect by 0,
+ * which key fields are absent in tuple_field().
+ */
+ memset((char *)field_map - format->field_map_size, 0,
+ format->field_map_size);
+ uint32_t mp_stack_size =
+ format->max_path_tokens * sizeof(struct mp_frame);
+ struct mp_frame *mp_stack = region_alloc(region, mp_stack_size);
+ if (mp_stack == NULL) {
+ diag_set(OutOfMemory, mp_stack_size, "region_alloc",
+ "mp_stack");
+ goto error;
}
- for (; i < defined_field_count; ++i) {
- field = tuple_format_field(format, i);
- if (validate &&
- !field_mp_type_is_compatible(field->type, mp_typeof(*pos),
- tuple_field_is_nullable(field))) {
- diag_set(ClientError, ER_FIELD_TYPE,
- tuple_field_path(field),
- field_type_strs[field->type]);
- goto error;
+ struct tuple_field *field;
+ mp_stack[0].child_type = JSON_TOKEN_NUM;
+ mp_stack[0].total = defined_field_count;
+ mp_stack[0].curr = 0;
+ uint32_t mp_stack_idx = 0;
+ struct json_tree *tree = (struct json_tree *)&format->fields;
+ struct json_token *parent = &tree->root;
+ while (mp_stack[0].curr <= mp_stack[0].total) {
+ struct json_token token;
+ if (mp_frame_parse(mp_stack, mp_stack_idx, &pos, &token) != 0) {
+ /* Unsupported token. */
+ goto finish_frame;
}
- if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
- field_map[field->offset_slot] =
- (uint32_t) (pos - tuple);
+ field = json_tree_lookup_entry(tree, parent, &token,
+ struct tuple_field, token);
+ if (field != NULL) {
+ bool is_nullable = tuple_field_is_nullable(field);
+ if (validate &&
+ !field_mp_type_is_compatible(field->type,
+ mp_typeof(*pos),
+ is_nullable) != 0) {
+ diag_set(ClientError, ER_FIELD_TYPE,
+ tuple_field_path(field, false),
+ field_type_strs[field->type]);
+ goto error;
+ }
+ if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
+ field_map[field->offset_slot] =
+ (uint32_t)(pos - tuple);
+ }
+ if (required_fields != NULL)
+ bit_clear(required_fields, field->id);
}
- if (required_fields != NULL)
- bit_clear(required_fields, field->id);
- mp_next(&pos);
- }
+finish_frame:
+ /* Prepare stack info for next iteration. */
+ if (mp_frame_prepare(mp_stack, &mp_stack_idx,
+ format->max_path_tokens,
+ field != NULL ? &field->token : NULL,
+ &pos, &parent) != 0)
+ break;
+ };
skip:
/*
* Check the required field bitmap for missing fields.
@@ -569,7 +875,7 @@ skip:
field = tuple_format_field_by_id(format, id);
assert(field != NULL);
diag_set(ClientError, ER_FIELD_MISSING,
- tuple_field_path(field));
+ tuple_field_path(field, false));
goto error;
}
}
@@ -713,15 +1019,7 @@ tuple_field_go_to_key(const char **field, const char *key, int len)
return -1;
}
-/**
- * Retrieve msgpack data by JSON path.
- * @param data Pointer to msgpack with data.
- * @param path The path to process.
- * @param path_len The length of the @path.
- * @retval 0 On success.
- * @retval >0 On path parsing error, invalid character position.
- */
-static int
+int
tuple_field_go_to_path(const char **data, const char *path, uint32_t path_len)
{
int rc;
@@ -820,3 +1118,30 @@ error:
tt_sprintf("error in path on position %d", rc));
return -1;
}
+
+int
+tuple_field_by_part_raw_slowpath(struct tuple_format *format, const char *data,
+ const uint32_t *field_map,
+ struct key_part *part, const char **raw)
+{
+ assert(part->path != NULL);
+ struct tuple_field *field =
+ tuple_format_field_by_path(format, part->fieldno, part->path,
+ part->path_len);
+ if (field != NULL) {
+ int32_t offset_slot = field->offset_slot;
+ assert(-offset_slot * sizeof(uint32_t) <=
+ format->field_map_size);
+ *raw = field_map[offset_slot] == 0 ?
+ NULL : data + field_map[offset_slot];
+ return 0;
+ }
+ /*
+ * The format doesn't have a field representing the
+ * specified part. Fall back to slow tuple parsing.
+ */
+ *raw = tuple_field_raw(format, data, field_map, part->fieldno);
+ if (*raw == NULL)
+ return 0;
+ return tuple_field_go_to_path(raw, part->path, part->path_len);
+}
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index 30b93b610..3b630c3bb 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -65,6 +65,7 @@ enum { TUPLE_OFFSET_SLOT_NIL = INT32_MAX };
struct tuple;
struct tuple_format;
struct coll;
+struct iovec;
/** Engine-specific tuple format methods. */
struct tuple_format_vtab {
@@ -185,6 +186,15 @@ struct tuple_format {
* Shared names storage used by all formats of a space.
*/
struct tuple_dictionary *dict;
+ /**
+ * The maximum depth of the format:fields subtree.
+ */
+ uint32_t max_path_tokens;
+ /**
+ * The size of the secondary key built for format:fields
+ * with all leaf records set to nil.
+ */
+ uint32_t vy_stmt_size;
/**
* Fields comprising the format, organized in a tree.
* First level nodes correspond to tuple fields.
@@ -221,6 +231,37 @@ tuple_format_field(struct tuple_format *format, uint32_t fieldno)
&token, struct tuple_field, token);
}
+/**
+ * Lookup field by relative JSON path and root field fieldno in
+ * format:fields tree.
+*/
+static inline struct tuple_field *
+tuple_format_field_by_path(struct tuple_format *format, uint32_t fieldno,
+ const char *path, uint32_t path_len)
+{
+ uint32_t field_count = tuple_format_field_count(format);
+ if (fieldno >= field_count)
+ return NULL;
+ struct tuple_field *root = tuple_format_field(format, fieldno);
+ assert(root != NULL);
+ return json_tree_lookup_path_entry(&format->fields, &root->token,
+ path, path_len, TUPLE_INDEX_BASE,
+ struct tuple_field, token);
+}
+
+/**
+ * Construct secondary-index tuple and initialize field_map.
+ * The iov[field->id] array item contains an extracted key
+ * for the indexed field identified by its unique field->id.
+ * Return the size of the constructed tuple.
+ * When offset == NULL, the routine may be used to estimate the
+ * tuple size upper limit: all leaf records are assumed to be nil(s).
+ */
+uint32_t
+tuple_format_stmt_encode(struct tuple_format *format, char **offset,
+ char *tuple_raw, uint32_t *field_map,
+ struct iovec *iov);
+
extern struct tuple_format **tuple_formats;
static inline uint32_t
@@ -420,6 +461,18 @@ tuple_field_raw_by_name(struct tuple_format *format, const char *tuple,
return tuple_field_raw(format, tuple, field_map, fieldno);
}
+/**
+ * Retrieve msgpack data by JSON path.
+ * @param data Pointer to msgpack with data.
+ * @param path The path to process.
+ * @param path_len The length of the @path.
+ * @retval 0 On success.
+ * @retval >0 On path parsing error, invalid character position.
+ */
+int
+tuple_field_go_to_path(const char **data, const char *path,
+ uint32_t path_len);
+
/**
* Get tuple field by its path.
* @param format Tuple format.
@@ -439,6 +492,12 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
uint32_t path_len, uint32_t path_hash,
const char **field);
+/** Internal function, use tuple_field_by_part_raw instead. */
+int
+tuple_field_by_part_raw_slowpath(struct tuple_format *format, const char *data,
+ const uint32_t *field_map,
+ struct key_part *part, const char **raw);
+
/**
* Get a tuple field pointed to by an index part.
* @param format Tuple format.
@@ -451,7 +510,16 @@ static inline const char *
tuple_field_by_part_raw(struct tuple_format *format, const char *data,
const uint32_t *field_map, struct key_part *part)
{
- return tuple_field_raw(format, data, field_map, part->fieldno);
+ if (likely(part->path == NULL)) {
+ return tuple_field_raw(format, data, field_map, part->fieldno);
+ } else {
+ const char *raw;
+ MAYBE_UNUSED int rc =
+ tuple_field_by_part_raw_slowpath(format, data,
+ field_map, part, &raw);
+ assert(rc == 0);
+ return raw;
+ }
}
#if defined(__cplusplus)
diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc
index b394804fe..3486ce11c 100644
--- a/src/box/tuple_hash.cc
+++ b/src/box/tuple_hash.cc
@@ -222,7 +222,7 @@ key_hash_slowpath(const char *key, struct key_def *key_def);
void
tuple_hash_func_set(struct key_def *key_def) {
- if (key_def->is_nullable)
+ if (key_def->is_nullable || key_def->has_json_paths)
goto slowpath;
/*
* Check that key_def defines sequential a key without holes
diff --git a/src/box/vinyl.c b/src/box/vinyl.c
index ca987134c..acd2d7fd6 100644
--- a/src/box/vinyl.c
+++ b/src/box/vinyl.c
@@ -982,6 +982,10 @@ vinyl_index_def_change_requires_rebuild(struct index *index,
return true;
if (!field_type1_contains_type2(new_part->type, old_part->type))
return true;
+ if (json_path_cmp(old_part->path, old_part->path_len,
+ new_part->path, new_part->path_len,
+ TUPLE_INDEX_BASE) != 0)
+ return true;
}
return false;
}
diff --git a/src/box/vy_log.c b/src/box/vy_log.c
index c9e0713c8..6fc051648 100644
--- a/src/box/vy_log.c
+++ b/src/box/vy_log.c
@@ -581,9 +581,11 @@ vy_log_record_decode(struct vy_log_record *record,
record->group_id = mp_decode_uint(&pos);
break;
case VY_LOG_KEY_DEF: {
+ struct region *region = &fiber()->gc;
uint32_t part_count = mp_decode_array(&pos);
- struct key_part_def *parts = region_alloc(&fiber()->gc,
- sizeof(*parts) * part_count);
+ struct key_part_def *parts =
+ region_alloc(region,
+ sizeof(*parts) * part_count);
if (parts == NULL) {
diag_set(OutOfMemory,
sizeof(*parts) * part_count,
@@ -591,7 +593,7 @@ vy_log_record_decode(struct vy_log_record *record,
return -1;
}
if (key_def_decode_parts(parts, part_count, &pos,
- NULL, 0) != 0) {
+ NULL, 0, region) != 0) {
diag_log();
diag_set(ClientError, ER_INVALID_VYLOG_FILE,
"Bad record: failed to decode "
@@ -705,7 +707,8 @@ vy_log_record_dup(struct region *pool, const struct vy_log_record *src)
"struct key_part_def");
goto err;
}
- key_def_dump_parts(src->key_def, dst->key_parts);
+ if (key_def_dump_parts(src->key_def, dst->key_parts, pool) != 0)
+ goto err;
dst->key_part_count = src->key_def->part_count;
dst->key_def = NULL;
}
diff --git a/src/box/vy_point_lookup.c b/src/box/vy_point_lookup.c
index ddbc2d46f..14e0c0c93 100644
--- a/src/box/vy_point_lookup.c
+++ b/src/box/vy_point_lookup.c
@@ -196,8 +196,6 @@ vy_point_lookup(struct vy_lsm *lsm, struct vy_tx *tx,
const struct vy_read_view **rv,
struct tuple *key, struct tuple **ret)
{
- assert(tuple_field_count(key) >= lsm->cmp_def->part_count);
-
*ret = NULL;
double start_time = ev_monotonic_now(loop());
int rc = 0;
diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c
index 47f135c65..7a302e6f3 100644
--- a/src/box/vy_stmt.c
+++ b/src/box/vy_stmt.c
@@ -385,26 +385,43 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
struct region *region = &fiber()->gc;
uint32_t field_count = format->index_field_count;
- struct iovec *iov = region_alloc(region, sizeof(*iov) * field_count);
+ uint32_t iov_sz =
+ sizeof(struct iovec) * format->total_field_count;
+ struct iovec *iov = region_alloc(region, iov_sz);
if (iov == NULL) {
- diag_set(OutOfMemory, sizeof(*iov) * field_count,
- "region", "iov for surrogate key");
+ diag_set(OutOfMemory, iov_sz, "region_alloc",
+ "iov for surrogate key");
return NULL;
}
- memset(iov, 0, sizeof(*iov) * field_count);
+ memset(iov, 0, iov_sz);
uint32_t part_count = mp_decode_array(&key);
assert(part_count == cmp_def->part_count);
- assert(part_count <= field_count);
- uint32_t nulls_count = field_count - cmp_def->part_count;
+ assert(part_count <= format->total_field_count);
+ /**
+ * The format:vy_stmt_size contains the size of a
+ * stmt tuple having all leaf fields set to nil.
+ * Calculate bsize as vy_stmt_size with part_count
+ * nils replaced with the extracted keys.
+ */
uint32_t bsize = mp_sizeof_array(field_count) +
- mp_sizeof_nil() * nulls_count;
+ format->vy_stmt_size - mp_sizeof_nil() * part_count;
for (uint32_t i = 0; i < part_count; ++i) {
const struct key_part *part = &cmp_def->parts[i];
assert(part->fieldno < field_count);
+ struct tuple_field *field;
+ if (part->path != NULL) {
+ field = tuple_format_field_by_path(format,
+ part->fieldno,
+ part->path,
+ part->path_len);
+ } else {
+ field = tuple_format_field(format, part->fieldno);
+ }
+ assert(field != NULL);
const char *svp = key;
- iov[part->fieldno].iov_base = (char *) key;
+ iov[field->id].iov_base = (char *) key;
mp_next(&key);
- iov[part->fieldno].iov_len = key - svp;
+ iov[field->id].iov_len = key - svp;
bsize += key - svp;
}
@@ -414,18 +431,10 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
char *raw = (char *) tuple_data(stmt);
uint32_t *field_map = (uint32_t *) raw;
+ memset((char *)field_map - format->field_map_size, 0,
+ format->field_map_size);
char *wpos = mp_encode_array(raw, field_count);
- for (uint32_t i = 0; i < field_count; ++i) {
- struct tuple_field *field = tuple_format_field(format, i);
- if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL)
- field_map[field->offset_slot] = wpos - raw;
- if (iov[i].iov_base == NULL) {
- wpos = mp_encode_nil(wpos);
- } else {
- memcpy(wpos, iov[i].iov_base, iov[i].iov_len);
- wpos += iov[i].iov_len;
- }
- }
+ (void)tuple_format_stmt_encode(format, &wpos, raw, field_map, iov);
assert(wpos == raw + bsize);
vy_stmt_set_type(stmt, type);
return stmt;
diff --git a/src/lib/json/json.c b/src/lib/json/json.c
index 010a61d62..1d79bceb0 100644
--- a/src/lib/json/json.c
+++ b/src/lib/json/json.c
@@ -572,12 +572,7 @@ json_tree_lookup_path(struct json_tree *tree, struct json_token *root,
return ret;
}
-/**
- * Return the child of @parent following @pos or NULL if @pos
- * points to the last child in the children array. If @pos is
- * NULL, this function returns the first child.
- */
-static struct json_token *
+struct json_token *
json_tree_child_next(struct json_token *parent, struct json_token *pos)
{
assert(pos == NULL || pos->parent == parent);
diff --git a/src/lib/json/json.h b/src/lib/json/json.h
index 66cddd026..fc441a887 100644
--- a/src/lib/json/json.h
+++ b/src/lib/json/json.h
@@ -353,6 +353,14 @@ struct json_token *
json_tree_lookup_path(struct json_tree *tree, struct json_token *root,
const char *path, int path_len, int index_base);
+/**
+ * Return the child of @parent following @pos or NULL if @pos
+ * points to the last child in the children array. If @pos is
+ * NULL, this function returns the first child.
+ */
+struct json_token *
+json_tree_child_next(struct json_token *parent, struct json_token *pos);
+
/**
* Perform pre-order traversal in a JSON subtree rooted
* at a given node.
@@ -436,6 +444,14 @@ json_tree_postorder_next(struct json_token *root, struct json_token *pos);
json_tree_entry_safe(ret, type, member); \
})
+/**
+ * Container-aware wrapper around json_tree_child_next().
+ */
+#define json_tree_child_next_entry(parent, pos, type, member) ({ \
+ struct json_token *next = json_tree_child_next((parent), (pos)); \
+ json_tree_entry_safe(next, type, member); \
+})
+
/**
* Container-aware wrapper around json_tree_preorder_next().
*/
diff --git a/test/engine/json.result b/test/engine/json.result
new file mode 100644
index 000000000..711f7f256
--- /dev/null
+++ b/test/engine/json.result
@@ -0,0 +1,448 @@
+test_run = require('test_run').new()
+---
+...
+engine = test_run:get_cfg('engine')
+---
+...
+--
+-- gh-1012: Indexes for JSON-defined paths.
+--
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO["fname"]'}, {3, 'str', path = '["FIO"].fname'}}})
+---
+- error: 'Can''t create or modify index ''test1'' in space ''withdata'': same key
+ part is indexed twice'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '["FIO"]["fname"]'}}})
+---
+- error: 'Wrong index options (field 2): ''path'' must be string'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = 'FIO'}}})
+---
+- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type
+ ''map'' is not supported'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[1]'}}})
+---
+- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type
+ ''array'' is not supported'
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO'}, {3, 'str', path = 'FIO.fname'}}})
+---
+- error: Field [3]["FIO"] has type 'string' in one index, but type 'map' in another
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[1].sname'}, {3, 'str', path = '["FIO"].fname'}}})
+---
+- error: Field 3 has type 'array' in one index, but type 'map' in another
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO....fname'}}})
+---
+- error: 'Wrong index options (field 3): invalid JSON path ''FIO....fname'': error
+ in path on position 5'
+...
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO.fname', is_nullable = false}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+...
+assert(idx ~= nil)
+---
+- true
+...
+assert(idx.parts[2].path == 'FIO.fname')
+---
+- true
+...
+format = {{'id', 'unsigned'}, {'meta', 'unsigned'}, {'data', 'array'}, {'age', 'unsigned'}, {'level', 'unsigned'}}
+---
+...
+s:format(format)
+---
+- error: Field 3 has type 'array' in one index, but type 'map' in another
+...
+format = {{'id', 'unsigned'}, {'meta', 'unsigned'}, {'data', 'map'}, {'age', 'unsigned'}, {'level', 'unsigned'}}
+---
+...
+s:format(format)
+---
+...
+s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = 'FIO.fname'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+- error: Field [3]["FIO"]["fname"] has type 'string' in one index, but type 'number'
+ in another
+...
+s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
+---
+- error: 'Tuple field [3]["FIO"] type does not match one required by operation: expected
+ map'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5}
+---
+- error: 'Tuple field [3]["FIO"]["fname"] type does not match one required by operation:
+ expected string'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5}
+---
+- error: Tuple field [3]["FIO"]["sname"] required by space format is missing
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- error: Duplicate key exists in unique index 'test1' in space 'withdata'
+...
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond', data = "extra"}}, 4, 5}
+---
+- error: Duplicate key exists in unique index 'test1' in space 'withdata'
+...
+s:insert{7, 7, {town = 'Moscow', FIO = {fname = 'Max', sname = 'Isaev', data = "extra"}}, 4, 5}
+---
+- [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}},
+ 4, 5]
+...
+idx:select()
+---
+- - [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+ - [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}},
+ 4, 5]
+...
+idx:min()
+---
+- [7, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+idx:max()
+---
+- [7, 7, {'town': 'Moscow', 'FIO': {'fname': 'Max', 'data': 'extra', 'sname': 'Isaev'}},
+ 4, 5]
+...
+s:drop()
+---
+...
+s = box.schema.create_space('withdata', {engine = engine})
+---
+...
+parts = {}
+---
+...
+parts[1] = {1, 'unsigned', path='[2]'}
+---
+...
+pk = s:create_index('pk', {parts = parts})
+---
+...
+s:insert{{1, 2}, 3}
+---
+- [[1, 2], 3]
+...
+s:upsert({{box.null, 2}}, {{'+', 2, 5}})
+---
+...
+s:get(2)
+---
+- [[1, 2], 8]
+...
+s:drop()
+---
+...
+-- Create index on space with data
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+pk = s:create_index('primary', { type = 'tree' })
+---
+...
+s:insert{1, 7, {town = 'London', FIO = 1234}, 4, 5}
+---
+- [1, 7, {'town': 'London', 'FIO': 1234}, 4, 5]
+...
+s:insert{2, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- [2, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+s:insert{3, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+---
+- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+s:insert{4, 7, {town = 'London', FIO = {1,2,3}}, 4, 5}
+---
+- [4, 7, {'town': 'London', 'FIO': [1, 2, 3]}, 4, 5]
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+- error: 'Tuple field [3]["FIO"] type does not match one required by operation: expected
+ map'
+...
+_ = s:delete(1)
+---
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+- error: Duplicate key exists in unique index 'test1' in space 'withdata'
+...
+_ = s:delete(2)
+---
+...
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+---
+- error: 'Tuple field [3]["FIO"] type does not match one required by operation: expected
+ map'
+...
+_ = s:delete(4)
+---
+...
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]', is_nullable = true}, {3, 'str', path = '["FIO"]["sname"]'}, {3, 'str', path = '["FIO"]["extra"]', is_nullable = true}}})
+---
+...
+assert(idx ~= nil)
+---
+- true
+...
+s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '["FIO"]["fname"]'}}})
+---
+- error: Field [3]["FIO"]["fname"] has type 'string' in one index, but type 'number'
+ in another
+...
+idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}}})
+---
+...
+assert(idx2 ~= nil)
+---
+- true
+...
+t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
+---
+...
+idx:select()
+---
+- - [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5]
+ - [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+idx:min()
+---
+- [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5]
+...
+idx:max()
+---
+- [3, 7, {'town': 'London', 'FIO': {'fname': 'James', 'sname': 'Bond'}}, 4, 5]
+...
+idx:drop()
+---
+...
+s:drop()
+---
+...
+-- Test complex JSON indexes
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+parts = {}
+---
+...
+parts[1] = {1, 'str', path='[3][2].a'}
+---
+...
+parts[2] = {1, 'unsigned', path = '[3][1]'}
+---
+...
+parts[3] = {2, 'str', path = '[2].d[1]'}
+---
+...
+pk = s:create_index('primary', { type = 'tree', parts = parts})
+---
+...
+s:insert{{1, 2, {3, {3, a = 'str', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {1, 2, 3}}
+---
+- [[1, 2, [3, {1: 3, 'a': 'str', 'b': 5}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6,
+ [1, 2, 3]]
+...
+s:insert{{1, 2, {3, {a = 'str', b = 1}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6}
+---
+- error: Duplicate key exists in unique index 'primary' in space 'withdata'
+...
+parts = {}
+---
+...
+parts[1] = {4, 'unsigned', path='[1]', is_nullable = false}
+---
+...
+parts[2] = {4, 'unsigned', path='[2]', is_nullable = true}
+---
+...
+parts[3] = {4, 'unsigned', path='[4]', is_nullable = true}
+---
+...
+trap_idx = s:create_index('trap', { type = 'tree', parts = parts})
+---
+...
+s:insert{{1, 2, {3, {3, a = 'str2', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {}}
+---
+- error: Tuple field [4][1] required by space format is missing
+...
+parts = {}
+---
+...
+parts[1] = {1, 'unsigned', path='[3][2].b' }
+---
+...
+parts[2] = {3, 'unsigned'}
+---
+...
+crosspart_idx = s:create_index('crosspart', { parts = parts})
+---
+...
+s:insert{{1, 2, {3, {a = 'str2', b = 2}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {9, 2, 3}}
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9,
+ 2, 3]]
+...
+parts = {}
+---
+...
+parts[1] = {1, 'unsigned', path='[3][2].b'}
+---
+...
+num_idx = s:create_index('numeric', {parts = parts})
+---
+...
+s:insert{{1, 2, {3, {a = 'str3', b = 9}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {0}}
+---
+- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]]
+...
+num_idx:get(2)
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9,
+ 2, 3]]
+...
+num_idx:select()
+---
+- - [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [
+ 9, 2, 3]]
+ - [[1, 2, [3, {1: 3, 'a': 'str', 'b': 5}]], ['c', {'d': ['e', 'f'], 'e': 'g'}],
+ 6, [1, 2, 3]]
+ - [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [
+ 0]]
+...
+num_idx:max()
+---
+- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]]
+...
+num_idx:min()
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9,
+ 2, 3]]
+...
+assert(crosspart_idx:max() == num_idx:max())
+---
+- true
+...
+assert(crosspart_idx:min() == num_idx:min())
+---
+- true
+...
+trap_idx:max()
+---
+- [[1, 2, [3, {'a': 'str2', 'b': 2}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [9,
+ 2, 3]]
+...
+trap_idx:min()
+---
+- [[1, 2, [3, {'a': 'str3', 'b': 9}]], ['c', {'d': ['e', 'f'], 'e': 'g'}], 6, [0]]
+...
+s:drop()
+---
+...
+s = box.schema.space.create('withdata', {engine = engine})
+---
+...
+pk_simplified = s:create_index('primary', { type = 'tree', parts = {{1, 'unsigned'}}})
+---
+...
+assert(pk_simplified.path == box.NULL)
+---
+- true
+...
+idx = s:create_index('idx', {parts = {{2, 'integer', path = 'a'}}})
+---
+...
+s:insert{31, {a = 1, aa = -1}}
+---
+- [31, {'a': 1, 'aa': -1}]
+...
+s:insert{22, {a = 2, aa = -2}}
+---
+- [22, {'a': 2, 'aa': -2}]
+...
+s:insert{13, {a = 3, aa = -3}}
+---
+- [13, {'a': 3, 'aa': -3}]
+...
+idx:select()
+---
+- - [31, {'a': 1, 'aa': -1}]
+ - [22, {'a': 2, 'aa': -2}]
+ - [13, {'a': 3, 'aa': -3}]
+...
+idx:alter({parts = {{2, 'integer', path = 'aa'}}})
+---
+...
+idx:select()
+---
+- - [13, {'a': 3, 'aa': -3}]
+ - [22, {'a': 2, 'aa': -2}]
+ - [31, {'a': 1, 'aa': -1}]
+...
+s:drop()
+---
+...
+-- incompatible format change
+s = box.schema.space.create('test')
+---
+...
+i = s:create_index('pk', {parts = {{1, 'integer', path = '[1]'}}})
+---
+...
+s:insert{{-1}}
+---
+- [[-1]]
+...
+i:alter{parts = {{1, 'string', path = '[1]'}}}
+---
+- error: 'Tuple field [1][1] type does not match one required by operation: expected
+ string'
+...
+s:insert{{'a'}}
+---
+- error: 'Tuple field [1][1] type does not match one required by operation: expected
+ integer'
+...
+i:drop()
+---
+...
+i = s:create_index('pk', {parts = {{1, 'integer', path = '[1].FIO'}}})
+---
+...
+s:insert{{{FIO=-1}}}
+---
+- [[{'FIO': -1}]]
+...
+i:alter{parts = {{1, 'integer', path = '[1][1]'}}}
+---
+- error: 'Tuple field [1][1] type does not match one required by operation: expected
+ array'
+...
+i:alter{parts = {{1, 'integer', path = '[1].FIO[1]'}}}
+---
+- error: 'Tuple field [1][1]["FIO"] type does not match one required by operation:
+ expected array'
+...
+s:drop()
+---
+...
+engine = nil
+---
+...
+test_run = nil
+---
+...
diff --git a/test/engine/json.test.lua b/test/engine/json.test.lua
new file mode 100644
index 000000000..2a20fc3e5
--- /dev/null
+++ b/test/engine/json.test.lua
@@ -0,0 +1,129 @@
+test_run = require('test_run').new()
+engine = test_run:get_cfg('engine')
+--
+-- gh-1012: Indexes for JSON-defined paths.
+--
+s = box.schema.space.create('withdata', {engine = engine})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO["fname"]'}, {3, 'str', path = '["FIO"].fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '["FIO"]["fname"]'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = 'FIO'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[1]'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO'}, {3, 'str', path = 'FIO.fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[1].sname'}, {3, 'str', path = '["FIO"].fname'}}})
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO....fname'}}})
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO.fname', is_nullable = false}, {3, 'str', path = '["FIO"]["sname"]'}}})
+assert(idx ~= nil)
+assert(idx.parts[2].path == 'FIO.fname')
+format = {{'id', 'unsigned'}, {'meta', 'unsigned'}, {'data', 'array'}, {'age', 'unsigned'}, {'level', 'unsigned'}}
+s:format(format)
+format = {{'id', 'unsigned'}, {'meta', 'unsigned'}, {'data', 'map'}, {'age', 'unsigned'}, {'level', 'unsigned'}}
+s:format(format)
+s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = 'FIO.fname'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{7, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond', data = "extra"}}, 4, 5}
+s:insert{7, 7, {town = 'Moscow', FIO = {fname = 'Max', sname = 'Isaev', data = "extra"}}, 4, 5}
+idx:select()
+idx:min()
+idx:max()
+s:drop()
+
+s = box.schema.create_space('withdata', {engine = engine})
+parts = {}
+parts[1] = {1, 'unsigned', path='[2]'}
+pk = s:create_index('pk', {parts = parts})
+s:insert{{1, 2}, 3}
+s:upsert({{box.null, 2}}, {{'+', 2, 5}})
+s:get(2)
+s:drop()
+
+-- Create index on space with data
+s = box.schema.space.create('withdata', {engine = engine})
+pk = s:create_index('primary', { type = 'tree' })
+s:insert{1, 7, {town = 'London', FIO = 1234}, 4, 5}
+s:insert{2, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{3, 7, {town = 'London', FIO = {fname = 'James', sname = 'Bond'}}, 4, 5}
+s:insert{4, 7, {town = 'London', FIO = {1,2,3}}, 4, 5}
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+_ = s:delete(1)
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+_ = s:delete(2)
+s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+_ = s:delete(4)
+idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]', is_nullable = true}, {3, 'str', path = '["FIO"]["sname"]'}, {3, 'str', path = '["FIO"]["extra"]', is_nullable = true}}})
+assert(idx ~= nil)
+s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '["FIO"]["fname"]'}}})
+idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}}})
+assert(idx2 ~= nil)
+t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
+idx:select()
+idx:min()
+idx:max()
+idx:drop()
+s:drop()
+
+-- Test complex JSON indexes
+s = box.schema.space.create('withdata', {engine = engine})
+parts = {}
+parts[1] = {1, 'str', path='[3][2].a'}
+parts[2] = {1, 'unsigned', path = '[3][1]'}
+parts[3] = {2, 'str', path = '[2].d[1]'}
+pk = s:create_index('primary', { type = 'tree', parts = parts})
+s:insert{{1, 2, {3, {3, a = 'str', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {1, 2, 3}}
+s:insert{{1, 2, {3, {a = 'str', b = 1}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6}
+parts = {}
+parts[1] = {4, 'unsigned', path='[1]', is_nullable = false}
+parts[2] = {4, 'unsigned', path='[2]', is_nullable = true}
+parts[3] = {4, 'unsigned', path='[4]', is_nullable = true}
+trap_idx = s:create_index('trap', { type = 'tree', parts = parts})
+s:insert{{1, 2, {3, {3, a = 'str2', b = 5}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {}}
+parts = {}
+parts[1] = {1, 'unsigned', path='[3][2].b' }
+parts[2] = {3, 'unsigned'}
+crosspart_idx = s:create_index('crosspart', { parts = parts})
+s:insert{{1, 2, {3, {a = 'str2', b = 2}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {9, 2, 3}}
+parts = {}
+parts[1] = {1, 'unsigned', path='[3][2].b'}
+num_idx = s:create_index('numeric', {parts = parts})
+s:insert{{1, 2, {3, {a = 'str3', b = 9}}}, {'c', {d = {'e', 'f'}, e = 'g'}}, 6, {0}}
+num_idx:get(2)
+num_idx:select()
+num_idx:max()
+num_idx:min()
+assert(crosspart_idx:max() == num_idx:max())
+assert(crosspart_idx:min() == num_idx:min())
+trap_idx:max()
+trap_idx:min()
+s:drop()
+
+s = box.schema.space.create('withdata', {engine = engine})
+pk_simplified = s:create_index('primary', { type = 'tree', parts = {{1, 'unsigned'}}})
+assert(pk_simplified.path == box.NULL)
+idx = s:create_index('idx', {parts = {{2, 'integer', path = 'a'}}})
+s:insert{31, {a = 1, aa = -1}}
+s:insert{22, {a = 2, aa = -2}}
+s:insert{13, {a = 3, aa = -3}}
+idx:select()
+idx:alter({parts = {{2, 'integer', path = 'aa'}}})
+idx:select()
+s:drop()
+
+-- incompatible format change
+s = box.schema.space.create('test')
+i = s:create_index('pk', {parts = {{1, 'integer', path = '[1]'}}})
+s:insert{{-1}}
+i:alter{parts = {{1, 'string', path = '[1]'}}}
+s:insert{{'a'}}
+i:drop()
+i = s:create_index('pk', {parts = {{1, 'integer', path = '[1].FIO'}}})
+s:insert{{{FIO=-1}}}
+i:alter{parts = {{1, 'integer', path = '[1][1]'}}}
+i:alter{parts = {{1, 'integer', path = '[1].FIO[1]'}}}
+s:drop()
+
+engine = nil
+test_run = nil
+
--
2.19.2
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v7 2/5] box: introduce has_json_paths flag in templates
2019-01-09 8:29 [PATCH v7 0/5] box: Indexes by JSON path Kirill Shcherbatov
2019-01-09 8:29 ` [PATCH v7 1/5] box: introduce JSON Indexes Kirill Shcherbatov
@ 2019-01-09 8:29 ` Kirill Shcherbatov
2019-01-09 8:29 ` [PATCH v7 3/5] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Kirill Shcherbatov @ 2019-01-09 8:29 UTC (permalink / raw)
To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov
Introduced a has_json_paths flag for the comparison, hash and key
extraction functions (which are really hot) so that flat indexes
without any JSON paths never have to look at the path field at all.
Part of #1012
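The dispatch pattern can be summarized with a minimal, self-contained
sketch; all names below are made up for illustration, the real tables
are compare_slowpath_funcs and compare_with_key_slowpath_funcs in the
diff:

#include <cstdio>

struct key_def_flags {
	bool is_nullable;
	bool has_optional_parts;
	bool has_json_paths;
};

template <bool is_nullable, bool has_optional_parts, bool has_json_paths>
static int
compare_slowpath_stub(void)
{
	/*
	 * In the real comparators each template flag prunes
	 * branches at compile time; here we only report which
	 * instance was selected.
	 */
	return (is_nullable ? 1 : 0) + (has_optional_parts ? 2 : 0) +
	       (has_json_paths ? 4 : 0);
}

typedef int (*cmp_f)(void);

static const cmp_f compare_funcs[] = {
	compare_slowpath_stub<false, false, false>,
	compare_slowpath_stub<true, false, false>,
	compare_slowpath_stub<false, true, false>,
	compare_slowpath_stub<true, true, false>,
	compare_slowpath_stub<false, false, true>,
	compare_slowpath_stub<true, false, true>,
	compare_slowpath_stub<false, true, true>,
	compare_slowpath_stub<true, true, true>,
};

int main(void)
{
	struct key_def_flags def = {true, false, true};
	int idx = (def.is_nullable ? 1 : 0) +
		  2 * (def.has_optional_parts ? 1 : 0) +
		  4 * (def.has_json_paths ? 1 : 0);
	printf("instance %d selected\n", compare_funcs[idx]());
	return 0;
}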
---
src/box/tuple_compare.cc | 112 +++++++++++++++++++++++++----------
src/box/tuple_extract_key.cc | 104 ++++++++++++++++++++------------
src/box/tuple_hash.cc | 45 ++++++++++----
3 files changed, 182 insertions(+), 79 deletions(-)
diff --git a/src/box/tuple_compare.cc b/src/box/tuple_compare.cc
index 7ab6e3bf6..dff3aebbd 100644
--- a/src/box/tuple_compare.cc
+++ b/src/box/tuple_compare.cc
@@ -458,11 +458,12 @@ tuple_common_key_parts(const struct tuple *tuple_a, const struct tuple *tuple_b,
return i;
}
-template<bool is_nullable, bool has_optional_parts>
+template<bool is_nullable, bool has_optional_parts, bool has_json_path>
static inline int
tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
struct key_def *key_def)
{
+ assert(has_json_path == key_def->has_json_paths);
assert(!has_optional_parts || is_nullable);
assert(is_nullable == key_def->is_nullable);
assert(has_optional_parts == key_def->has_optional_parts);
@@ -508,10 +509,19 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
end = part + key_def->part_count;
for (; part < end; part++) {
- field_a = tuple_field_by_part_raw(format_a, tuple_a_raw,
- field_map_a, part);
- field_b = tuple_field_by_part_raw(format_b, tuple_b_raw,
- field_map_b, part);
+ if (!has_json_path) {
+ field_a = tuple_field_raw(format_a, tuple_a_raw,
+ field_map_a,
+ part->fieldno);
+ field_b = tuple_field_raw(format_b, tuple_b_raw,
+ field_map_b,
+ part->fieldno);
+ } else {
+ field_a = tuple_field_by_part_raw(format_a, tuple_a_raw,
+ field_map_a, part);
+ field_b = tuple_field_by_part_raw(format_b, tuple_b_raw,
+ field_map_b, part);
+ }
assert(has_optional_parts ||
(field_a != NULL && field_b != NULL));
if (! is_nullable) {
@@ -558,10 +568,19 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
*/
end = key_def->parts + key_def->part_count;
for (; part < end; ++part) {
- field_a = tuple_field_by_part_raw(format_a, tuple_a_raw,
- field_map_a, part);
- field_b = tuple_field_by_part_raw(format_b, tuple_b_raw,
- field_map_b, part);
+ if (!has_json_path) {
+ field_a = tuple_field_raw(format_a, tuple_a_raw,
+ field_map_a,
+ part->fieldno);
+ field_b = tuple_field_raw(format_b, tuple_b_raw,
+ field_map_b,
+ part->fieldno);
+ } else {
+ field_a = tuple_field_by_part_raw(format_a, tuple_a_raw,
+ field_map_a, part);
+ field_b = tuple_field_by_part_raw(format_b, tuple_b_raw,
+ field_map_b, part);
+ }
/*
* Extended parts are primary, and they can not
* be absent or be NULLs.
@@ -575,11 +594,12 @@ tuple_compare_slowpath(const struct tuple *tuple_a, const struct tuple *tuple_b,
return 0;
}
-template<bool is_nullable, bool has_optional_parts>
+template<bool is_nullable, bool has_optional_parts, bool has_json_paths>
static inline int
tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
uint32_t part_count, struct key_def *key_def)
{
+ assert(has_json_paths == key_def->has_json_paths);
assert(!has_optional_parts || is_nullable);
assert(is_nullable == key_def->is_nullable);
assert(has_optional_parts == key_def->has_optional_parts);
@@ -591,9 +611,14 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
const uint32_t *field_map = tuple_field_map(tuple);
enum mp_type a_type, b_type;
if (likely(part_count == 1)) {
- const char *field =
- tuple_field_by_part_raw(format, tuple_raw, field_map,
- part);
+ const char *field;
+ if (!has_json_paths) {
+ field = tuple_field_raw(format, tuple_raw, field_map,
+ part->fieldno);
+ } else {
+ field = tuple_field_by_part_raw(format, tuple_raw,
+ field_map, part);
+ }
if (! is_nullable) {
return tuple_compare_field(field, key, part->type,
part->coll);
@@ -617,9 +642,14 @@ tuple_compare_with_key_slowpath(const struct tuple *tuple, const char *key,
struct key_part *end = part + part_count;
int rc;
for (; part < end; ++part, mp_next(&key)) {
- const char *field =
- tuple_field_by_part_raw(format, tuple_raw,
- field_map, part);
+ const char *field;
+ if (!has_json_paths) {
+ field = tuple_field_raw(format, tuple_raw, field_map,
+ part->fieldno);
+ } else {
+ field = tuple_field_by_part_raw(format, tuple_raw,
+ field_map, part);
+ }
if (! is_nullable) {
rc = tuple_compare_field(field, key, part->type,
part->coll);
@@ -1012,19 +1042,31 @@ static const comparator_signature cmp_arr[] = {
#undef COMPARATOR
+static const tuple_compare_t compare_slowpath_funcs[] = {
+ tuple_compare_slowpath<false, false, false>,
+ tuple_compare_slowpath<true, false, false>,
+ tuple_compare_slowpath<false, true, false>,
+ tuple_compare_slowpath<true, true, false>,
+ tuple_compare_slowpath<false, false, true>,
+ tuple_compare_slowpath<true, false, true>,
+ tuple_compare_slowpath<false, true, true>,
+ tuple_compare_slowpath<true, true, true>
+};
+
tuple_compare_t
tuple_compare_create(const struct key_def *def)
{
+ int cmp_func_idx = (def->is_nullable ? 1 : 0) +
+ 2 * (def->has_optional_parts ? 1 : 0) +
+ 4 * (def->has_json_paths ? 1 : 0);
if (def->is_nullable) {
if (key_def_is_sequential(def)) {
if (def->has_optional_parts)
return tuple_compare_sequential<true, true>;
else
return tuple_compare_sequential<true, false>;
- } else if (def->has_optional_parts) {
- return tuple_compare_slowpath<true, true>;
} else {
- return tuple_compare_slowpath<true, false>;
+ return compare_slowpath_funcs[cmp_func_idx];
}
}
assert(! def->has_optional_parts);
@@ -1044,10 +1086,9 @@ tuple_compare_create(const struct key_def *def)
return cmp_arr[k].f;
}
}
- if (key_def_is_sequential(def))
- return tuple_compare_sequential<false, false>;
- else
- return tuple_compare_slowpath<false, false>;
+ return key_def_is_sequential(def) ?
+ tuple_compare_sequential<false, false> :
+ compare_slowpath_funcs[cmp_func_idx];
}
/* }}} tuple_compare */
@@ -1229,9 +1270,23 @@ static const comparator_with_key_signature cmp_wk_arr[] = {
#undef KEY_COMPARATOR
+static const tuple_compare_with_key_t compare_with_key_slowpath_funcs[] = {
+ tuple_compare_with_key_slowpath<false, false, false>,
+ tuple_compare_with_key_slowpath<true, false, false>,
+ tuple_compare_with_key_slowpath<false, true, false>,
+ tuple_compare_with_key_slowpath<true, true, false>,
+ tuple_compare_with_key_slowpath<false, false, true>,
+ tuple_compare_with_key_slowpath<true, false, true>,
+ tuple_compare_with_key_slowpath<false, true, true>,
+ tuple_compare_with_key_slowpath<true, true, true>
+};
+
tuple_compare_with_key_t
tuple_compare_with_key_create(const struct key_def *def)
{
+ int cmp_func_idx = (def->is_nullable ? 1 : 0) +
+ 2 * (def->has_optional_parts ? 1 : 0) +
+ 4 * (def->has_json_paths ? 1 : 0);
if (def->is_nullable) {
if (key_def_is_sequential(def)) {
if (def->has_optional_parts) {
@@ -1241,10 +1296,8 @@ tuple_compare_with_key_create(const struct key_def *def)
return tuple_compare_with_key_sequential<true,
false>;
}
- } else if (def->has_optional_parts) {
- return tuple_compare_with_key_slowpath<true, true>;
} else {
- return tuple_compare_with_key_slowpath<true, false>;
+ return compare_with_key_slowpath_funcs[cmp_func_idx];
}
}
assert(! def->has_optional_parts);
@@ -1267,10 +1320,9 @@ tuple_compare_with_key_create(const struct key_def *def)
return cmp_wk_arr[k].f;
}
}
- if (key_def_is_sequential(def))
- return tuple_compare_with_key_sequential<false, false>;
- else
- return tuple_compare_with_key_slowpath<false, false>;
+ return key_def_is_sequential(def) ?
+ tuple_compare_with_key_sequential<false, false> :
+ compare_with_key_slowpath_funcs[cmp_func_idx];
}
/* }}} tuple_compare_with_key */
diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
index c40d7887d..30f4f2f35 100644
--- a/src/box/tuple_extract_key.cc
+++ b/src/box/tuple_extract_key.cc
@@ -5,13 +5,18 @@
enum { MSGPACK_NULL = 0xc0 };
/** True if key part i and i+1 are sequential. */
+template <bool has_json_paths>
static inline bool
key_def_parts_are_sequential(const struct key_def *def, int i)
{
uint32_t fieldno1 = def->parts[i].fieldno + 1;
uint32_t fieldno2 = def->parts[i + 1].fieldno;
- return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
- def->parts[i + 1].path == NULL;
+ if (!has_json_paths) {
+ return fieldno1 == fieldno2;
+ } else {
+ return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
+ def->parts[i + 1].path == NULL;
+ }
}
/** True, if a key can contain two or more parts in sequence. */
@@ -19,7 +24,7 @@ static bool
key_def_contains_sequential_parts(const struct key_def *def)
{
for (uint32_t i = 0; i < def->part_count - 1; ++i) {
- if (key_def_parts_are_sequential(def, i))
+ if (key_def_parts_are_sequential<true>(def, i))
return true;
}
return false;
@@ -99,11 +104,13 @@ tuple_extract_key_sequential(const struct tuple *tuple, struct key_def *key_def,
* General-purpose implementation of tuple_extract_key()
* @copydoc tuple_extract_key()
*/
-template <bool contains_sequential_parts, bool has_optional_parts>
+template <bool contains_sequential_parts, bool has_optional_parts,
+ bool has_json_paths>
static char *
tuple_extract_key_slowpath(const struct tuple *tuple,
struct key_def *key_def, uint32_t *key_size)
{
+ assert(has_json_paths == key_def->has_json_paths);
assert(!has_optional_parts || key_def->is_nullable);
assert(has_optional_parts == key_def->has_optional_parts);
assert(contains_sequential_parts ==
@@ -118,9 +125,14 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
/* Calculate the key size. */
for (uint32_t i = 0; i < part_count; ++i) {
- const char *field =
- tuple_field_by_part_raw(format, data, field_map,
- &key_def->parts[i]);
+ const char *field;
+ if (!has_json_paths) {
+ field = tuple_field_raw(format, data, field_map,
+ key_def->parts[i].fieldno);
+ } else {
+ field = tuple_field_by_part_raw(format, data, field_map,
+ &key_def->parts[i]);
+ }
if (has_optional_parts && field == NULL) {
bsize += mp_sizeof_nil();
continue;
@@ -133,7 +145,8 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
* minimize tuple_field_raw() calls.
*/
for (; i < part_count - 1; i++) {
- if (!key_def_parts_are_sequential(key_def, i)) {
+ if (!key_def_parts_are_sequential
+ <has_json_paths>(key_def, i)) {
/*
* End of sequential part.
*/
@@ -159,9 +172,14 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
}
char *key_buf = mp_encode_array(key, part_count);
for (uint32_t i = 0; i < part_count; ++i) {
- const char *field =
- tuple_field_by_part_raw(format, data, field_map,
- &key_def->parts[i]);
+ const char *field;
+ if (!has_json_paths) {
+ field = tuple_field_raw(format, data, field_map,
+ key_def->parts[i].fieldno);
+ } else {
+ field = tuple_field_by_part_raw(format, data, field_map,
+ &key_def->parts[i]);
+ }
if (has_optional_parts && field == NULL) {
key_buf = mp_encode_nil(key_buf);
continue;
@@ -174,7 +192,8 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
* minimize tuple_field_raw() calls.
*/
for (; i < part_count - 1; i++) {
- if (!key_def_parts_are_sequential(key_def, i)) {
+ if (!key_def_parts_are_sequential
+ <has_json_paths>(key_def, i)) {
/*
* End of sequential part.
*/
@@ -207,11 +226,12 @@ tuple_extract_key_slowpath(const struct tuple *tuple,
* General-purpose version of tuple_extract_key_raw()
* @copydoc tuple_extract_key_raw()
*/
-template <bool has_optional_parts>
+template <bool has_optional_parts, bool has_json_paths>
static char *
tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
struct key_def *key_def, uint32_t *key_size)
{
+ assert(has_json_paths == key_def->has_json_paths);
assert(!has_optional_parts || key_def->is_nullable);
assert(has_optional_parts == key_def->has_optional_parts);
assert(mp_sizeof_nil() == 1);
@@ -239,7 +259,8 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
uint32_t fieldno = key_def->parts[i].fieldno;
uint32_t null_count = 0;
for (; i < key_def->part_count - 1; i++) {
- if (!key_def_parts_are_sequential(key_def, i))
+ if (!key_def_parts_are_sequential
+ <has_json_paths>(key_def, i))
break;
}
const struct key_part *part = &key_def->parts[i];
@@ -319,6 +340,17 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
return key;
}
+static const tuple_extract_key_t extract_key_slowpath_funcs[] = {
+ tuple_extract_key_slowpath<false, false, false>,
+ tuple_extract_key_slowpath<true, false, false>,
+ tuple_extract_key_slowpath<false, true, false>,
+ tuple_extract_key_slowpath<true, true, false>,
+ tuple_extract_key_slowpath<false, false, true>,
+ tuple_extract_key_slowpath<true, false, true>,
+ tuple_extract_key_slowpath<false, true, true>,
+ tuple_extract_key_slowpath<true, true, true>
+};
+
/**
* Initialize tuple_extract_key() and tuple_extract_key_raw()
*/
@@ -339,32 +371,30 @@ tuple_extract_key_set(struct key_def *key_def)
tuple_extract_key_sequential_raw<false>;
}
} else {
- if (key_def->has_optional_parts) {
- assert(key_def->is_nullable);
- if (key_def_contains_sequential_parts(key_def)) {
- key_def->tuple_extract_key =
- tuple_extract_key_slowpath<true, true>;
- } else {
- key_def->tuple_extract_key =
- tuple_extract_key_slowpath<false, true>;
- }
- } else {
- if (key_def_contains_sequential_parts(key_def)) {
- key_def->tuple_extract_key =
- tuple_extract_key_slowpath<true, false>;
- } else {
- key_def->tuple_extract_key =
- tuple_extract_key_slowpath<false,
- false>;
- }
- }
+ int func_idx =
+ (key_def_contains_sequential_parts(key_def) ? 1 : 0) +
+ 2 * (key_def->has_optional_parts ? 1 : 0) +
+ 4 * (key_def->has_json_paths ? 1 : 0);
+ key_def->tuple_extract_key =
+ extract_key_slowpath_funcs[func_idx];
+ assert(!key_def->has_optional_parts || key_def->is_nullable);
}
if (key_def->has_optional_parts) {
assert(key_def->is_nullable);
- key_def->tuple_extract_key_raw =
- tuple_extract_key_slowpath_raw<true>;
+ if (key_def->has_json_paths) {
+ key_def->tuple_extract_key_raw =
+ tuple_extract_key_slowpath_raw<true, true>;
+ } else {
+ key_def->tuple_extract_key_raw =
+ tuple_extract_key_slowpath_raw<true, false>;
+ }
} else {
- key_def->tuple_extract_key_raw =
- tuple_extract_key_slowpath_raw<false>;
+ if (key_def->has_json_paths) {
+ key_def->tuple_extract_key_raw =
+ tuple_extract_key_slowpath_raw<false, true>;
+ } else {
+ key_def->tuple_extract_key_raw =
+ tuple_extract_key_slowpath_raw<false, false>;
+ }
}
}
diff --git a/src/box/tuple_hash.cc b/src/box/tuple_hash.cc
index 3486ce11c..8ede29045 100644
--- a/src/box/tuple_hash.cc
+++ b/src/box/tuple_hash.cc
@@ -213,7 +213,7 @@ static const hasher_signature hash_arr[] = {
#undef HASHER
-template <bool has_optional_parts>
+template <bool has_optional_parts, bool has_json_paths>
uint32_t
tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def);
@@ -256,10 +256,17 @@ tuple_hash_func_set(struct key_def *key_def) {
}
slowpath:
- if (key_def->has_optional_parts)
- key_def->tuple_hash = tuple_hash_slowpath<true>;
- else
- key_def->tuple_hash = tuple_hash_slowpath<false>;
+ if (key_def->has_optional_parts) {
+ if (key_def->has_json_paths)
+ key_def->tuple_hash = tuple_hash_slowpath<true, true>;
+ else
+ key_def->tuple_hash = tuple_hash_slowpath<true, false>;
+ } else {
+ if (key_def->has_json_paths)
+ key_def->tuple_hash = tuple_hash_slowpath<false, true>;
+ else
+ key_def->tuple_hash = tuple_hash_slowpath<false, false>;
+ }
key_def->key_hash = key_hash_slowpath;
}
@@ -319,10 +326,11 @@ tuple_hash_key_part(uint32_t *ph1, uint32_t *pcarry, const struct tuple *tuple,
return tuple_hash_field(ph1, pcarry, &field, part->coll);
}
-template <bool has_optional_parts>
+template <bool has_optional_parts, bool has_json_paths>
uint32_t
tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def)
{
+ assert(has_json_paths == key_def->has_json_paths);
assert(has_optional_parts == key_def->has_optional_parts);
uint32_t h = HASH_SEED;
uint32_t carry = 0;
@@ -331,9 +339,13 @@ tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def)
struct tuple_format *format = tuple_format(tuple);
const char *tuple_raw = tuple_data(tuple);
const uint32_t *field_map = tuple_field_map(tuple);
- const char *field =
- tuple_field_by_part_raw(format, tuple_raw, field_map,
- key_def->parts);
+ const char *field;
+ if (!has_json_paths) {
+ field = tuple_field(tuple, prev_fieldno);
+ } else {
+ field = tuple_field_by_part_raw(format, tuple_raw, field_map,
+ key_def->parts);
+ }
const char *end = (char *)tuple + tuple_size(tuple);
if (has_optional_parts && field == NULL) {
total_size += tuple_hash_null(&h, &carry);
@@ -347,9 +359,18 @@ tuple_hash_slowpath(const struct tuple *tuple, struct key_def *key_def)
* need of tuple_field
*/
if (prev_fieldno + 1 != key_def->parts[part_id].fieldno) {
- struct key_part *part = &key_def->parts[part_id];
- field = tuple_field_by_part_raw(format, tuple_raw,
- field_map, part);
+ if (!has_json_paths) {
+ field = tuple_field(tuple,
+ key_def->parts[part_id].
+ fieldno);
+ } else {
+ struct key_part *part =
+ &key_def->parts[part_id];
+ field = tuple_field_by_part_raw(format,
+ tuple_raw,
+ field_map,
+ part);
+ }
}
if (has_optional_parts && (field == NULL || field >= end)) {
total_size += tuple_hash_null(&h, &carry);
--
2.19.2
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v7 3/5] box: tune tuple_field_raw_by_path for indexed data
2019-01-09 8:29 [PATCH v7 0/5] box: Indexes by JSON path Kirill Shcherbatov
2019-01-09 8:29 ` [PATCH v7 1/5] box: introduce JSON Indexes Kirill Shcherbatov
2019-01-09 8:29 ` [PATCH v7 2/5] box: introduce has_json_paths flag in templates Kirill Shcherbatov
@ 2019-01-09 8:29 ` Kirill Shcherbatov
2019-01-09 8:29 ` [PATCH v7 4/5] box: introduce offset_slot cache in key_part Kirill Shcherbatov
2019-01-09 8:29 ` [PATCH v7 5/5] box: specify indexes in user-friendly form Kirill Shcherbatov
4 siblings, 0 replies; 9+ messages in thread
From: Kirill Shcherbatov @ 2019-01-09 8:29 UTC (permalink / raw)
To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov
We don't need to parse the tuple in tuple_field_raw_by_path() if the
required field has been indexed. Instead of parsing the whole tuple,
we look the path up in the tree of indexed JSON paths and return the
data by its offset from the field_map.
Part of #1012
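The idea in a nutshell, as a sketch with assumed names and a fake
tuple layout (the real resolution happens in
tuple_field_by_part_raw_slowpath): once a path is known to be
indexed, reading its data is a single field-map lookup instead of a
msgpack parse.

#include <cstdio>
#include <cstdint>

/* A zero offset marks an optional field absent in this tuple. */
static const char *
field_by_offset_slot(const char *data, const uint32_t *field_map,
		     int32_t offset_slot)
{
	return field_map[offset_slot] == 0 ?
	       NULL : data + field_map[offset_slot];
}

int main(void)
{
	/* Fake tuple body: the indexed datum starts 5 bytes in. */
	const char data[] = "head:Agent";
	/* Slots are negative: the map sits in front of the data. */
	uint32_t map_storage[2] = {0, 5};
	const uint32_t *field_map = map_storage + 2;
	const char *field = field_by_offset_slot(data, field_map, -1);
	printf("slot -1: %s\n", field != NULL ? field : "(absent)");
	field = field_by_offset_slot(data, field_map, -2);
	printf("slot -2: %s\n", field != NULL ? field : "(absent)");
	return 0;
}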
---
src/box/tuple.h | 19 -------------------
src/box/tuple_format.c | 23 +++++++++++------------
src/box/tuple_format.h | 24 ------------------------
test/engine/json.result | 5 +++++
test/engine/json.test.lua | 2 ++
5 files changed, 18 insertions(+), 55 deletions(-)
diff --git a/src/box/tuple.h b/src/box/tuple.h
index 83e5b7013..dac467d00 100644
--- a/src/box/tuple.h
+++ b/src/box/tuple.h
@@ -547,25 +547,6 @@ tuple_field_by_path(const struct tuple *tuple, const char *path,
path_hash, field);
}
-/**
- * Get tuple field by its name.
- * @param tuple Tuple to get field from.
- * @param name Field name.
- * @param name_len Length of @a name.
- * @param name_hash Hash of @a name.
- *
- * @retval not NULL MessagePack field.
- * @retval NULL No field with @a name.
- */
-static inline const char *
-tuple_field_by_name(const struct tuple *tuple, const char *name,
- uint32_t name_len, uint32_t name_hash)
-{
- return tuple_field_raw_by_name(tuple_format(tuple), tuple_data(tuple),
- tuple_field_map(tuple), name, name_len,
- name_hash);
-}
-
/**
* @brief Tuple Interator
*/
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index c81c23fd1..401752654 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -1073,10 +1073,7 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
goto error;
switch(token.type) {
case JSON_TOKEN_NUM: {
- int index = token.num;
- *field = tuple_field_raw(format, tuple, field_map, index);
- if (*field == NULL)
- return 0;
+ fieldno = token.num;
break;
}
case JSON_TOKEN_STR: {
@@ -1093,10 +1090,8 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
*/
name_hash = field_name_hash(token.str, token.len);
}
- *field = tuple_field_raw_by_name(format, tuple, field_map,
- token.str, token.len,
- name_hash);
- if (*field == NULL)
+ if (tuple_fieldno_by_name(format->dict, token.str, token.len,
+ name_hash, &fieldno) != 0)
return 0;
break;
}
@@ -1105,13 +1100,17 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
*field = NULL;
return 0;
}
- rc = tuple_field_go_to_path(field, path + lexer.offset,
- path_len - lexer.offset);
+ /* Optimize indexed JSON field data access. */
+ struct key_part part;
+ part.fieldno = fieldno;
+ part.path = (char *)path + lexer.offset;
+ part.path_len = path_len - lexer.offset;
+ rc = tuple_field_by_part_raw_slowpath(format, tuple, field_map, &part,
+ field);
if (rc == 0)
return 0;
/* Setup absolute error position. */
rc += lexer.offset;
-
error:
assert(rc > 0);
diag_set(ClientError, ER_ILLEGAL_PARAMS,
@@ -1128,7 +1127,7 @@ tuple_field_by_part_raw_slowpath(struct tuple_format *format, const char *data,
struct tuple_field *field =
tuple_format_field_by_path(format, part->fieldno, part->path,
part->path_len);
- if (field != NULL) {
+ if (field != NULL && field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
int32_t offset_slot = field->offset_slot;
assert(-offset_slot * sizeof(uint32_t) <=
format->field_map_size);
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index 3b630c3bb..dd7cd147a 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -437,30 +437,6 @@ tuple_field_raw(struct tuple_format *format, const char *tuple,
return tuple;
}
-/**
- * Get tuple field by its name.
- * @param format Tuple format.
- * @param tuple MessagePack tuple's body.
- * @param field_map Tuple field map.
- * @param name Field name.
- * @param name_len Length of @a name.
- * @param name_hash Hash of @a name.
- *
- * @retval not NULL MessagePack field.
- * @retval NULL No field with @a name.
- */
-static inline const char *
-tuple_field_raw_by_name(struct tuple_format *format, const char *tuple,
- const uint32_t *field_map, const char *name,
- uint32_t name_len, uint32_t name_hash)
-{
- uint32_t fieldno;
- if (tuple_fieldno_by_name(format->dict, name, name_len, name_hash,
- &fieldno) != 0)
- return NULL;
- return tuple_field_raw(format, tuple, field_map, fieldno);
-}
-
/**
* Retrieve msgpack data by JSON path.
* @param data Pointer to msgpack with data.
diff --git a/test/engine/json.result b/test/engine/json.result
index 711f7f256..ef3440a2d 100644
--- a/test/engine/json.result
+++ b/test/engine/json.result
@@ -214,6 +214,11 @@ assert(idx2 ~= nil)
t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
---
...
+-- Test that the tuple field_map speeds up access by an indexed path.
+t["[3][\"FIO\"][\"fname\"]"]
+---
+- Agent
+...
idx:select()
---
- - [5, 7, {'town': 'Matrix', 'FIO': {'fname': 'Agent', 'sname': 'Smith'}}, 4, 5]
diff --git a/test/engine/json.test.lua b/test/engine/json.test.lua
index 2a20fc3e5..7f2aed790 100644
--- a/test/engine/json.test.lua
+++ b/test/engine/json.test.lua
@@ -59,6 +59,8 @@ s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = '["FIO"]["
idx2 = s:create_index('test2', {parts = {{2, 'number'}, {3, 'str', path = '["FIO"]["fname"]'}}})
assert(idx2 ~= nil)
t = s:insert{5, 7, {town = 'Matrix', FIO = {fname = 'Agent', sname = 'Smith'}}, 4, 5}
+-- Test that the tuple field_map speeds up access by an indexed path.
+t["[3][\"FIO\"][\"fname\"]"]
idx:select()
idx:min()
idx:max()
--
2.19.2
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v7 4/5] box: introduce offset_slot cache in key_part
2019-01-09 8:29 [PATCH v7 0/5] box: Indexes by JSON path Kirill Shcherbatov
` (2 preceding siblings ...)
2019-01-09 8:29 ` [PATCH v7 3/5] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov
@ 2019-01-09 8:29 ` Kirill Shcherbatov
2019-01-10 11:28 ` Vladimir Davydov
2019-01-09 8:29 ` [PATCH v7 5/5] box: specify indexes in user-friendly form Kirill Shcherbatov
4 siblings, 1 reply; 9+ messages in thread
From: Kirill Shcherbatov @ 2019-01-09 8:29 UTC (permalink / raw)
To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov
tuple_field_by_part looks up the tuple_field corresponding to the
given key part in tuple_format in order to quickly retrieve the offset
of indexed data from the tuple field map. For regular indexes this
operation is blazing fast; for JSON indexes, however, it is not, as we
have to parse the path to the data and then do multiple lookups in a
JSON tree. Since tuple_field_by_part is used by comparators, we should
strive to make this routine as fast as possible for all kinds of
indexes.
This patch introduces an optimization that is supposed to make
tuple_field_by_part for JSON indexes as fast as it is for regular
indexes in most cases. We do that by caching the offset slot right in
key_part. There's a catch here, however: we create a new format
whenever an index is dropped or created, and we don't reindex old
tuples. As a result, there may be several generations of tuples in
the same space, all using different formats, while there is only one
key_def used for comparison.
To overcome this problem, we introduce the notion of tuple_format
epoch. This is a counter incremented each time a new format is
created. We store it in tuple_format and key_part, and we only use
the offset slot cached in a key_part if its epoch coincides with the
epoch of the tuple format. If they don't match, we look up the
tuple_field as before, and then update the cached value along with
the epoch of the tuple format.
Part of #1012
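In pseudo-real terms, the fast path reduces to the sketch below
(stand-in structures and illustrative names; the actual check lives
in tuple_field_by_part_raw() in this patch):

#include <cstdio>
#include <cstdint>

struct format_stub { uint64_t epoch; };

struct part_stub {
	uint64_t format_epoch;	/* epoch the cache was filled under */
	int32_t offset_slot_cache;
};

/* Stands in for the JSON path parse and tree walk. */
static int32_t
resolve_offset_slot_slow(void)
{
	return -3;
}

static int32_t
part_offset_slot(struct part_stub *part, const struct format_stub *format)
{
	if (part->format_epoch == format->epoch)
		return part->offset_slot_cache;	/* fast path: cache hit */
	int32_t slot = resolve_offset_slot_slow();
	part->offset_slot_cache = slot;		/* refresh the cache... */
	part->format_epoch = format->epoch;	/* ...and its epoch */
	return slot;
}

int main(void)
{
	struct format_stub format = {42};
	struct part_stub part = {0, 0};
	printf("%d\n", part_offset_slot(&part, &format)); /* slow lookup */
	printf("%d\n", part_offset_slot(&part, &format)); /* cache hit */
	return 0;
}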
---
src/box/key_def.c | 16 +++++++++++-----
src/box/key_def.h | 11 +++++++++++
src/box/tuple_format.c | 10 ++++++++++
src/box/tuple_format.h | 12 ++++++++++++
4 files changed, 44 insertions(+), 5 deletions(-)
diff --git a/src/box/key_def.c b/src/box/key_def.c
index 3012b05df..328954296 100644
--- a/src/box/key_def.c
+++ b/src/box/key_def.c
@@ -139,7 +139,8 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
enum field_type type, enum on_conflict_action nullable_action,
struct coll *coll, uint32_t coll_id,
enum sort_order sort_order, const char *path,
- uint32_t path_len, char **paths)
+ uint32_t path_len, char **paths, int32_t offset_slot_cache,
+ uint64_t format_epoch)
{
assert(part_no < def->part_count);
assert(type < field_type_MAX);
@@ -151,6 +152,8 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
def->parts[part_no].coll = coll;
def->parts[part_no].coll_id = coll_id;
def->parts[part_no].sort_order = sort_order;
+ def->parts[part_no].offset_slot_cache = offset_slot_cache;
+ def->parts[part_no].format_epoch = format_epoch;
if (path != NULL) {
assert(paths != NULL);
def->parts[part_no].path = *paths;
@@ -198,7 +201,8 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count)
uint32_t path_len = part->path != NULL ? strlen(part->path) : 0;
key_def_set_part(def, i, part->fieldno, part->type,
part->nullable_action, coll, part->coll_id,
- part->sort_order, part->path, path_len, &paths);
+ part->sort_order, part->path, path_len, &paths,
+ TUPLE_OFFSET_SLOT_NIL, 0);
}
key_def_set_cmp(def);
return def;
@@ -251,7 +255,7 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count)
(enum field_type)types[item],
ON_CONFLICT_ACTION_DEFAULT,
NULL, COLL_NONE, SORT_ORDER_ASC, NULL, 0,
- NULL);
+ NULL, TUPLE_OFFSET_SLOT_NIL, 0);
}
key_def_set_cmp(key_def);
return key_def;
@@ -676,7 +680,8 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
key_def_set_part(new_def, pos++, part->fieldno, part->type,
part->nullable_action, part->coll,
part->coll_id, part->sort_order, part->path,
- part->path_len, &paths);
+ part->path_len, &paths, part->offset_slot_cache,
+ part->format_epoch);
}
/* Set-append second key def's part to the new key def. */
@@ -688,7 +693,8 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
key_def_set_part(new_def, pos++, part->fieldno, part->type,
part->nullable_action, part->coll,
part->coll_id, part->sort_order, part->path,
- part->path_len, &paths);
+ part->path_len, &paths, part->offset_slot_cache,
+ part->format_epoch);
}
key_def_set_cmp(new_def);
return new_def;
diff --git a/src/box/key_def.h b/src/box/key_def.h
index c6b7a8c74..54e3dca89 100644
--- a/src/box/key_def.h
+++ b/src/box/key_def.h
@@ -96,6 +96,17 @@ struct key_part {
char *path;
/** The length of JSON path. */
uint32_t path_len;
+ /** Cached format epoch. */
+ uint64_t format_epoch;
+ /**
+ * Cached value of the offset slot corresponding to
+ * the indexed field (tuple_field::offset_slot).
+ * Valid only if key_part::format_epoch equals the
+ * epoch of the tuple format. Updated in
+ * tuple_field_by_part_raw_slowpath to always store
+ * the offset slot corresponding to the last used
+ * tuple format.
+ */
+ int32_t offset_slot_cache;
};
struct key_def;
diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
index 401752654..9178c2722 100644
--- a/src/box/tuple_format.c
+++ b/src/box/tuple_format.c
@@ -40,6 +40,7 @@ struct tuple_format **tuple_formats;
static intptr_t recycled_format_ids = FORMAT_ID_NIL;
static uint32_t formats_size = 0, formats_capacity = 0;
+static uint64_t formats_epoch = 0;
static struct tuple_field *
tuple_field_new(void)
@@ -599,6 +600,7 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
format->vtab = *vtab;
format->engine = NULL;
format->is_temporary = false;
+ format->epoch = ++formats_epoch;
if (tuple_format_register(format) < 0) {
tuple_format_destroy(format);
free(format);
@@ -1105,6 +1107,7 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
part.fieldno = fieldno;
part.path = (char *)path + lexer.offset;
part.path_len = path_len - lexer.offset;
+ part.format_epoch = 0;
rc = tuple_field_by_part_raw_slowpath(format, tuple, field_map, &part,
field);
if (rc == 0)
@@ -1131,6 +1134,13 @@ tuple_field_by_part_raw_slowpath(struct tuple_format *format, const char *data,
int32_t offset_slot = field->offset_slot;
assert(-offset_slot * sizeof(uint32_t) <=
format->field_map_size);
+
+ /* Update format epoch cache. */
+ assert(part->format_epoch != format->epoch);
+ assert(format->epoch != 0);
+ part->offset_slot_cache = offset_slot;
+ part->format_epoch = format->epoch;
+
*raw = field_map[offset_slot] == 0 ?
NULL : data + field_map[offset_slot];
return 0;
diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
index dd7cd147a..00540207c 100644
--- a/src/box/tuple_format.h
+++ b/src/box/tuple_format.h
@@ -138,6 +138,12 @@ tuple_field_is_nullable(struct tuple_field *tuple_field)
* Tuple format describes how tuple is stored and information about its fields
*/
struct tuple_format {
+ /**
+ * Counter incremented every time a new format is
+ * created. Used to validate the offset slot cached
+ * in key_part; see key_part::offset_slot_cache.
+ */
+ uint64_t epoch;
/** Virtual function table */
struct tuple_format_vtab vtab;
/** Pointer to engine-specific data. */
@@ -488,6 +494,12 @@ tuple_field_by_part_raw(struct tuple_format *format, const char *data,
{
if (likely(part->path == NULL)) {
return tuple_field_raw(format, data, field_map, part->fieldno);
+ } else if (part->format_epoch == format->epoch) {
+ int32_t offset_slot = part->offset_slot_cache;
+ assert(-offset_slot * sizeof(uint32_t) <=
+ format->field_map_size);
+ return field_map[offset_slot] != 0 ?
+ data + field_map[offset_slot] : NULL;
} else {
const char *raw;
MAYBE_UNUSED int rc =
--
2.19.2
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v7 5/5] box: specify indexes in user-friendly form
2019-01-09 8:29 [PATCH v7 0/5] box: Indexes by JSON path Kirill Shcherbatov
` (3 preceding siblings ...)
2019-01-09 8:29 ` [PATCH v7 4/5] box: introduce offset_slot cache in key_part Kirill Shcherbatov
@ 2019-01-09 8:29 ` Kirill Shcherbatov
2019-01-10 10:21 ` Vladimir Davydov
4 siblings, 1 reply; 9+ messages in thread
From: Kirill Shcherbatov @ 2019-01-09 8:29 UTC (permalink / raw)
To: tarantool-patches, vdavydov.dev; +Cc: kostja, Kirill Shcherbatov
Implemented a more convenient interface for creating an index
by JSON path. Instead of specifying a fieldno and a relative path,
it is now possible to pass the full JSON path to the data.
Closes #1012
@TarantoolBot document
Title: Indexes by JSON path
Sometimes field data has a complex document structure.
When this structure is consistent across the whole document,
you can create an index by JSON path.
Example:
s = box.schema.space.create('sample')
format = {{'id', 'unsigned'}, {'data', 'map'}}
s:format(format)
-- explicit JSON index creation
age_idx = s:create_index('age', {parts = {{2, 'number', path = ".age"}}})
-- user-friendly syntax for JSON index creation
parts = {{'data.FIO["fname"]', 'str'}, {'data.FIO["sname"]', 'str'},
{'data.age', 'number'}}
info_idx = s:create_index('info', {parts = parts})
s:insert({1, {FIO={fname="James", sname="Bond"}, age=35}})
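For reference, a simplified model of what the new
box.internal.path_resolve() helper does, written as illustrative C++
rather than the server code: the first component of the path is
resolved against the space format dictionary, and the remainder
becomes the relative path stored in the key part (the real helper
also accepts numeric first components and returns a 1-based field
number).

#include <cstdio>
#include <map>
#include <string>

static bool
path_resolve(const std::string &path,
	     const std::map<std::string, int> &format_dict,
	     int *fieldno, std::string *rest)
{
	/* Split off the first component: up to the first '.' or '['. */
	size_t pos = path.find_first_of(".[");
	std::string first = path.substr(0, pos);
	auto it = format_dict.find(first);
	if (it == format_dict.end())
		return false;	/* no such field in the format */
	*fieldno = it->second;
	if (pos == std::string::npos)
		*rest = "";	/* a plain field name, no JSON path */
	else
		*rest = path[pos] == '.' ? path.substr(pos + 1)
					 : path.substr(pos);
	return true;
}

int main(void)
{
	std::map<std::string, int> format_dict = {{"id", 1}, {"data", 2}};
	int fieldno;
	std::string rest;
	if (path_resolve("data.FIO[\"fname\"]", format_dict, &fieldno, &rest))
		printf("fieldno=%d, relative path=%s\n",
		       fieldno, rest.c_str());
	return 0;
}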
---
src/box/lua/index.c | 64 +++++++++++++++++++++++++++++++++++++++
src/box/lua/schema.lua | 22 +++++++-------
test/engine/json.result | 28 +++++++++++++++++
test/engine/json.test.lua | 8 +++++
4 files changed, 111 insertions(+), 11 deletions(-)
diff --git a/src/box/lua/index.c b/src/box/lua/index.c
index 6265c044a..9b04c5d9a 100644
--- a/src/box/lua/index.c
+++ b/src/box/lua/index.c
@@ -35,6 +35,9 @@
#include "box/box.h"
#include "box/index.h"
#include "box/lua/tuple.h"
+#include "box/schema.h"
+#include "box/tuple_format.h"
+#include "json/json.h"
#include "box/lua/misc.h" /* lbox_encode_tuple_on_gc() */
/** {{{ box.index Lua library: access to spaces and indexes
@@ -328,6 +331,66 @@ lbox_index_compact(lua_State *L)
return 0;
}
+/**
+ * Resolve the field index by the first component of an absolute
+ * JSON path and return the relative JSON path.
+ */
+static int
+lbox_index_path_resolve(struct lua_State *L)
+{
+ if (lua_gettop(L) != 3 ||
+ !lua_isnumber(L, 1) || !lua_isnumber(L, 2) || !lua_isstring(L, 3)) {
+ return luaL_error(L, "Usage box.internal."
+ "path_resolve(part_id, space_id, path)");
+ }
+ uint32_t part_id = lua_tonumber(L, 1);
+ uint32_t space_id = lua_tonumber(L, 2);
+ size_t path_len;
+ const char *path = lua_tolstring(L, 3, &path_len);
+ struct space *space = space_cache_find(space_id);
+ if (space == NULL)
+ return luaT_error(L);
+ struct json_lexer lexer;
+ struct json_token token;
+ json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE);
+ int rc = json_lexer_next_token(&lexer, &token);
+ if (rc != 0) {
+ const char *err_msg =
+ tt_sprintf("options.parts[%d]: error in path on "
+ "position %d", part_id, rc);
+ diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg);
+ return luaT_error(L);
+ }
+ assert(space->format != NULL && space->format->dict != NULL);
+ uint32_t fieldno = token.num;
+ uint32_t field_count = tuple_format_field_count(space->format);
+ if (token.type == JSON_TOKEN_NUM && fieldno >= field_count) {
+ const char *err_msg =
+ tt_sprintf("options.parts[%d]: field '%d' referenced "
+ "in path is greater than format field "
+ "count %d", part_id,
+ fieldno + TUPLE_INDEX_BASE, field_count);
+ diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg);
+ return luaT_error(L);
+ } else if (token.type == JSON_TOKEN_STR &&
+ tuple_fieldno_by_name(space->format->dict, token.str,
+ token.len,
+ field_name_hash(token.str, token.len),
+ &fieldno) != 0) {
+ const char *err_msg =
+ tt_sprintf("options.parts[%d]: field was not found by "
+ "name '%.*s'", part_id, token.len,
+ token.str);
+ diag_set(ClientError, ER_ILLEGAL_PARAMS, err_msg);
+ return luaT_error(L);
+ }
+ fieldno += TUPLE_INDEX_BASE;
+ path += lexer.offset;
+ lua_pushnumber(L, fieldno);
+ lua_pushstring(L, path);
+ return 2;
+}
+
/* }}} */
void
@@ -365,6 +428,7 @@ box_lua_index_init(struct lua_State *L)
{"truncate", lbox_truncate},
{"stat", lbox_index_stat},
{"compact", lbox_index_compact},
+ {"path_resolve", lbox_index_path_resolve},
{NULL, NULL}
};
diff --git a/src/box/lua/schema.lua b/src/box/lua/schema.lua
index 8a804f0ba..497cf197c 100644
--- a/src/box/lua/schema.lua
+++ b/src/box/lua/schema.lua
@@ -575,7 +575,7 @@ local function update_index_parts_1_6_0(parts)
return result
end
-local function update_index_parts(format, parts)
+local function update_index_parts(format, parts, space_id)
if type(parts) ~= "table" then
box.error(box.error.ILLEGAL_PARAMS,
"options.parts parameter should be a table")
@@ -626,16 +626,16 @@ local function update_index_parts(format, parts)
box.error(box.error.ILLEGAL_PARAMS,
"options.parts[" .. i .. "]: field (name or number) is expected")
elseif type(part.field) == 'string' then
- for k,v in pairs(format) do
- if v.name == part.field then
- part.field = k
- break
- end
- end
- if type(part.field) == 'string' then
+ local idx, path = box.internal.path_resolve(i, space_id, part.field)
+ if part.path ~= nil and part.path ~= path then
box.error(box.error.ILLEGAL_PARAMS,
- "options.parts[" .. i .. "]: field was not found by name '" .. part.field .. "'")
+ "options.parts[" .. i .. "]: field path '"..
+ part.path.." doesn't math the path '" ..
+ part.field .. "'")
end
+ parts_can_be_simplified = parts_can_be_simplified and path == nil
+ part.field = idx
+ part.path = path or part.path
elseif part.field == 0 then
box.error(box.error.ILLEGAL_PARAMS,
"options.parts[" .. i .. "]: field (number) must be one-based")
@@ -792,7 +792,7 @@ box.schema.index.create = function(space_id, name, options)
end
end
local parts, parts_can_be_simplified =
- update_index_parts(format, options.parts)
+ update_index_parts(format, options.parts, space_id)
-- create_index() options contains type, parts, etc,
-- stored separately. Remove these members from index_opts
local index_opts = {
@@ -959,7 +959,7 @@ box.schema.index.alter = function(space_id, index_id, options)
if options.parts then
local parts_can_be_simplified
parts, parts_can_be_simplified =
- update_index_parts(format, options.parts)
+ update_index_parts(format, options.parts, space_id)
-- save parts in old format if possible
if parts_can_be_simplified then
parts = simplify_index_parts(parts)
diff --git a/test/engine/json.result b/test/engine/json.result
index ef3440a2d..29a65cc0c 100644
--- a/test/engine/json.result
+++ b/test/engine/json.result
@@ -71,6 +71,34 @@ s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = 'FIO.fname
- error: Field [3]["FIO"]["fname"] has type 'string' in one index, but type 'number'
in another
...
+s:create_index('test3', {parts = {{2, 'number'}, {']sad.FIO["fname"]', 'str'}}})
+---
+- error: 'Illegal parameters, options.parts[2]: error in path on position 1'
+...
+s:create_index('test3', {parts = {{2, 'number'}, {'[666].FIO["fname"]', 'str'}}})
+---
+- error: 'Illegal parameters, options.parts[2]: field ''666'' referenced in path is
+ greater than format field count 5'
+...
+s:create_index('test3', {parts = {{2, 'number'}, {'invalid.FIO["fname"]', 'str'}}})
+---
+- error: 'Illegal parameters, options.parts[2]: field was not found by name ''invalid'''
+...
+idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}})
+---
+...
+assert(idx3 ~= nil)
+---
+- true
+...
+assert(idx3.parts[2].path == ".FIO[\"fname\"]")
+---
+- true
+...
+-- Vinyl has optimizations that omit index checks, so errors could differ.
+idx3:drop()
+---
+...
s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
---
- error: 'Tuple field [3]["FIO"] type does not match one required by operation: expected
diff --git a/test/engine/json.test.lua b/test/engine/json.test.lua
index 7f2aed790..a2ba85a78 100644
--- a/test/engine/json.test.lua
+++ b/test/engine/json.test.lua
@@ -19,6 +19,14 @@ s:format(format)
format = {{'id', 'unsigned'}, {'meta', 'unsigned'}, {'data', 'map'}, {'age', 'unsigned'}, {'level', 'unsigned'}}
s:format(format)
s:create_index('test2', {parts = {{2, 'number'}, {3, 'number', path = 'FIO.fname'}, {3, 'str', path = '["FIO"]["sname"]'}}})
+s:create_index('test3', {parts = {{2, 'number'}, {']sad.FIO["fname"]', 'str'}}})
+s:create_index('test3', {parts = {{2, 'number'}, {'[666].FIO["fname"]', 'str'}}})
+s:create_index('test3', {parts = {{2, 'number'}, {'invalid.FIO["fname"]', 'str'}}})
+idx3 = s:create_index('test3', {parts = {{2, 'number'}, {'data.FIO["fname"]', 'str'}}})
+assert(idx3 ~= nil)
+assert(idx3.parts[2].path == ".FIO[\"fname\"]")
+-- Vinyl has optimizations that omit index checks, so errors could differ.
+idx3:drop()
s:insert{7, 7, {town = 'London', FIO = 666}, 4, 5}
s:insert{7, 7, {town = 'London', FIO = {fname = 666, sname = 'Bond'}}, 4, 5}
s:insert{7, 7, {town = 'London', FIO = {fname = "James"}}, 4, 5}
--
2.19.2
* Re: [PATCH v7 1/5] box: introduce JSON Indexes
2019-01-09 8:29 ` [PATCH v7 1/5] box: introduce JSON Indexes Kirill Shcherbatov
@ 2019-01-10 10:16 ` Vladimir Davydov
0 siblings, 0 replies; 9+ messages in thread
From: Vladimir Davydov @ 2019-01-10 10:16 UTC (permalink / raw)
To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja
On Wed, Jan 09, 2019 at 11:29:36AM +0300, Kirill Shcherbatov wrote:
> diff --git a/src/box/alter.cc b/src/box/alter.cc
> index 0589c9678..9656a4189 100644
> --- a/src/box/alter.cc
> +++ b/src/box/alter.cc
> @@ -268,7 +268,7 @@ index_def_new_from_tuple(struct tuple *tuple, struct space *space)
> });
> if (key_def_decode_parts(part_def, part_count, &parts,
> space->def->fields,
> - space->def->field_count) != 0)
> + space->def->field_count, &fiber()->gc) != 0)
> diag_raise();
> key_def = key_def_new(part_def, part_count);
> if (key_def == NULL)
> diff --git a/src/box/index_def.c b/src/box/index_def.c
> index 2ba57ee9d..58137ed07 100644
> --- a/src/box/index_def.c
> +++ b/src/box/index_def.c
> @@ -31,6 +31,8 @@
> #include "index_def.h"
> #include "schema_def.h"
> #include "identifier.h"
> +#include "tuple_format.h"
Pointless inclusion.
> +#include "json/json.h"
>
> const char *index_type_strs[] = { "HASH", "TREE", "BITSET", "RTREE" };
>
> diff --git a/src/box/key_def.c b/src/box/key_def.c
> index dae3580e2..3012b05df 100644
> --- a/src/box/key_def.c
> +++ b/src/box/key_def.c
> @@ -71,19 +75,30 @@ const struct opt_def part_def_reg[] = {
> struct key_part_def, nullable_action, NULL),
> OPT_DEF_ENUM(PART_OPT_SORT_ORDER, sort_order, struct key_part_def,
> sort_order, NULL),
> + OPT_DEF(PART_OPT_PATH, OPT_STRPTR, struct key_part_def, path),
> OPT_END,
> };
>
> struct key_def *
> key_def_dup(const struct key_def *src)
> {
> - size_t sz = key_def_sizeof(src->part_count);
> - struct key_def *res = (struct key_def *)malloc(sz);
> + size_t sz = 0;
> + for (uint32_t i = 0; i < src->part_count; i++)
> + sz += src->parts[i].path_len;
> + sz = key_def_sizeof(src->part_count, sz);
> + struct key_def *res = (struct key_def *)calloc(1, sz);
No need to zero 'res' here. Use malloc() pls.
> if (res == NULL) {
> diag_set(OutOfMemory, sz, "malloc", "res");
> return NULL;
> }
> memcpy(res, src, sz);
> + /* Update paths to point to the new memory chunk.*/
> + for (uint32_t i = 0; i < src->part_count; i++) {
> + if (src->parts[i].path == NULL)
> + continue;
> + size_t path_offset = src->parts[i].path - (char *)src;
> + res->parts[i].path = (char *)res + path_offset;
> + }
> return res;
> }
>
> @@ -115,24 +138,39 @@ static void
> key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
> enum field_type type, enum on_conflict_action nullable_action,
> struct coll *coll, uint32_t coll_id,
> - enum sort_order sort_order)
> + enum sort_order sort_order, const char *path,
> + uint32_t path_len, char **paths)
path, paths - easy to mix up. I think we should rename paths to
path_pool or path_data, or something like that, here and everywhere
this function is called.
> {
> assert(part_no < def->part_count);
> assert(type < field_type_MAX);
> def->is_nullable |= (nullable_action == ON_CONFLICT_ACTION_NONE);
> + def->has_json_paths |= path != NULL;
> def->parts[part_no].nullable_action = nullable_action;
> def->parts[part_no].fieldno = fieldno;
> def->parts[part_no].type = type;
> def->parts[part_no].coll = coll;
> def->parts[part_no].coll_id = coll_id;
> def->parts[part_no].sort_order = sort_order;
> + if (path != NULL) {
> + assert(paths != NULL);
> + def->parts[part_no].path = *paths;
> + *paths += path_len;
> + memcpy(def->parts[part_no].path, path, path_len);
> + def->parts[part_no].path_len = path_len;
> + } else {
> + def->parts[part_no].path = NULL;
> + def->parts[part_no].path_len = 0;
> + }
> column_mask_set_fieldno(&def->column_mask, fieldno);
> }
>
> struct key_def *
> key_def_new(const struct key_part_def *parts, uint32_t part_count)
> {
> - size_t sz = key_def_sizeof(part_count);
> + ssize_t sz = 0;
size_t
> + for (uint32_t i = 0; i < part_count; i++)
> + sz += parts[i].path != NULL ? strlen(parts[i].path) : 0;
> + sz = key_def_sizeof(part_count, sz);
> struct key_def *def = calloc(1, sz);
> if (def == NULL) {
> diag_set(OutOfMemory, sz, "malloc", "struct key_def");
> @@ -274,8 +336,15 @@ key_def_snprint_parts(char *buf, int size, const struct key_part_def *parts,
> for (uint32_t i = 0; i < part_count; i++) {
> const struct key_part_def *part = &parts[i];
> assert(part->type < field_type_MAX);
> - SNPRINT(total, snprintf, buf, size, "%d, '%s'",
> - (int)part->fieldno, field_type_strs[part->type]);
> + if (part->path != NULL) {
> + SNPRINT(total, snprintf, buf, size, "%d, '%s', '%s'",
> + (int)part->fieldno, field_type_strs[part->type],
> + part->path);
> + } else {
> + SNPRINT(total, snprintf, buf, size, "%d, '%s'",
> + (int)part->fieldno,
> + field_type_strs[part->type]);
> + }
SNPRINT(total, snprintf, buf, size, "[%d, '%s'",
(int)part->fieldno, field_type_strs[part->type]);
if (part->path != NULL)
SNPRINT(total, snprintf, buf, size, ", path='%s'",
part->path);
SNPRINT(total, snprintf, buf, size, "]");
> if (i < part_count - 1)
> SNPRINT(total, snprintf, buf, size, ", ");
> }
> @@ -485,6 +569,27 @@ key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
> "index part: unknown sort order");
> return -1;
> }
> + if (part->path != NULL) {
> + uint32_t path_len = strlen(part->path);
> + if (path_len > BOX_JSON_PATH_MAX) {
> + diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
> + "JSON path is too long");
> + return -1;
> + }
Looking at the definition of BOX_JSON_PATH_MAX, I see that it's 512.
This limitation seems to be way too rigorous. Why do you need it,
anyway? To avoid truncating error messages? I don't think that anybody
will give a damn about that, and in case somebody does, we can simply
increase the size of the static buffer or even make it grow dynamically.
No point in limiting functionality because of that.
> + int rc = json_path_validate(part->path, path_len,
> + TUPLE_INDEX_BASE);
> + if (rc != 0) {
> + const char *err_msg =
> + tt_sprintf("invalid JSON path '%s': "
> + "error in path on "
> + "position %d", part->path,
> + rc);
I don't think we need to print the path as we already print the key part
number so the user can find it by himself. I wouldn't even print the
symbol number here - 'invalid path' would be enough.
> + diag_set(ClientError, ER_WRONG_INDEX_OPTIONS,
> + part->fieldno + TUPLE_INDEX_BASE,
> + err_msg);
> + return -1;
> + }
> + }
> }
> return 0;
> }
> diff --git a/src/box/key_def.h b/src/box/key_def.h
> index d1866303b..c6b7a8c74 100644
> --- a/src/box/key_def.h
> +++ b/src/box/key_def.h
> @@ -82,6 +87,15 @@ struct key_part {
> enum on_conflict_action nullable_action;
> /** Part sort order. */
> enum sort_order sort_order;
> + /**
> + * JSON path to indexed data, relative to the field number,
> + * or NULL if this key part indexes a top-level field.
> + * This sting is not 0-terminated. Memory is allocated
s/sting/string
> + * at the end of key_def chunk.
> + */
> + char *path;
> + /** The length of JSON path. */
> + uint32_t path_len;
> };
>
> struct key_def;
> @@ -148,6 +162,8 @@ struct key_def {
> uint32_t unique_part_count;
> /** True, if at least one part can store NULL. */
> bool is_nullable;
> + /** True, if some key part has JSON path. */
Redundant comma before 'if'.
> + bool has_json_paths;
> /**
> * True, if some key parts can be absent in a tuple. These
> * fields assumed to be MP_NIL.
> @@ -255,9 +272,13 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count);
>
> /**
> * Dump part definitions of the given key def.
> + * Region is required to make allocations for JSON paths when some
> + * path present. JSON path strings are 0-terminated.
The region is used for allocating JSON paths, if any.
No need to say anything about 0-terminated strings - it implicitly
follows from the key_part_def definition.
> + * Return -1 on memory allocation error, 0 on success.
> */
> -void
> -key_def_dump_parts(const struct key_def *def, struct key_part_def *parts);
> +int
> +key_def_dump_parts(const struct key_def *def, struct key_part_def *parts,
> + struct region *region);
>
> /**
> * Update 'has_optional_parts' of @a key_def with correspondence
> @@ -303,7 +324,7 @@ key_def_encode_parts(char *data, const struct key_part_def *parts,
> int
> key_def_decode_parts(struct key_part_def *parts, uint32_t part_count,
> const char **data, const struct field_def *fields,
> - uint32_t field_count);
> + uint32_t field_count, struct region *region);
What's the region used for? Update the comment please.
>
> /**
> * Returns the part in index_def->parts for the specified fieldno.
> diff --git a/src/box/tuple_extract_key.cc b/src/box/tuple_extract_key.cc
> index ac8b5a44e..c40d7887d 100644
> --- a/src/box/tuple_extract_key.cc
> +++ b/src/box/tuple_extract_key.cc
> @@ -10,7 +10,8 @@ key_def_parts_are_sequential(const struct key_def *def, int i)
> {
> uint32_t fieldno1 = def->parts[i].fieldno + 1;
> uint32_t fieldno2 = def->parts[i + 1].fieldno;
> - return fieldno1 == fieldno2;
> + return fieldno1 == fieldno2 && def->parts[i].path == NULL &&
> + def->parts[i + 1].path == NULL;
This piece of code looks kinda weird now: we use fieldno1 and fieldno2
auxiliary variables, but we still dereference def->parts to check paths.
Let's rewrite it as
struct key_part *part1 = &def->parts[i];
struct key_part *part2 = &def->parts[i + 1];
return part1->fieldno + 1 == part2->fieldno &&
part1->path == NULL && part2->path == NULL;
> }
>
> /** True, if a key con contain two or more parts in sequence. */
> @@ -283,6 +285,22 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
> current_fieldno++;
> }
> }
> + const char *field_last, *field_end_last;
> + if (part->path != NULL) {
> + field_last = field;
> + field_end_last = field_end;
> + MAYBE_UNUSED int rc =
> + tuple_field_go_to_path(&field, part->path,
> + part->path_len);
I don't like using MAYBE_UNUSED when it could be easily avoided:
if (tuple_field_go_to_path(&field, part->path,
part->path_len) != 0)
unreachable();
> + /*
> + * All tuples must be valid as all
> + * integrity checks has already been
> + * passed.
have already passed.
> + */
> + assert(rc == 0);
> + field_end = field;
> + mp_next(&field_end);
> + }
> memcpy(key_buf, field, field_end - field);
> key_buf += field_end - field;
> if (has_optional_parts && null_count != 0) {
> @@ -291,6 +309,10 @@ tuple_extract_key_slowpath_raw(const char *data, const char *data_end,
> } else {
> assert(key_buf - key <= data_end - data);
> }
> + if (part->path != NULL) {
> + field = field_last;
> + field_end = field_end_last;
> + }
I don't like how you overwrite and then restore field/field_end here.
Let's rewrite the code without it:
const char *src = field;
const char *src_end = field_end;
if (part->path != NULL) {
tuple_field_go_to_path(&src, ...);
src_end = src;
mp_next(&src_end);
}
memcpy(key_buf, src, src_end - src);
key_buf += src_end - src;
> }
> if (key_size != NULL)
> *key_size = (uint32_t)(key_buf - key);
> diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
> index e11b4e6f3..c81c23fd1 100644
> --- a/src/box/tuple_format.c
> +++ b/src/box/tuple_format.c
> @@ -66,12 +67,88 @@ tuple_field_delete(struct tuple_field *field)
>
> /** Return path to a tuple field. Used for error reporting. */
> static const char *
> -tuple_field_path(const struct tuple_field *field)
> +tuple_field_path(const struct tuple_field *field, bool json_only)
This extra argument looks horrible. Please remove it. This function is
used only for error reporting according to the comment so you don't need
to format the output as a valid json path in case of a top-level field.
If somewhere else you need to do that, you should use
json_tree_snprint_path directly there.
> {
> assert(field->token.parent != NULL);
> - assert(field->token.parent->parent == NULL);
> - assert(field->token.type == JSON_TOKEN_NUM);
> - return int2str(field->token.num + TUPLE_INDEX_BASE);
> + char *path;
> + if (!json_only && field->token.parent->type == JSON_TOKEN_END) {
> + assert(field->token.type == JSON_TOKEN_NUM);
> + path = int2str(field->token.num + TUPLE_INDEX_BASE);
> + } else {
> + path = tt_static_buf();
> + MAYBE_UNUSED int rc =
> + json_tree_snprint_path(path, TT_STATIC_BUF_LEN,
> + &field->token, TUPLE_INDEX_BASE);
> + assert(rc > 0 && rc < TT_STATIC_BUF_LEN);
We don't really care if an error message gets truncated so these checks
are not needed:
if (field->token.parent->parent == NULL) {
/* Top-level field, no need to format the path. */
return int2str(field->token.num + TUPLE_INDEX_BASE);
}
char *path = tt_static_buf();
json_tree_snprint_path(path, TT_STATIC_BUF_LEN,
&field->token, TUPLE_INDEX_BASE);
return path;
> + }
> + return path;
> +}
> +
> +/**
> + * Add corresponding format:fields for specified JSON path.
Please use either double colon :: or -> or . when you denote accessing
a struct member in comments, because single colon : is meaningless in
terms of C/C++.
Anyway, I'd rather rewrite the comment in plain English:
Given a field number and a path, add the corresponding tuple field
to the tuple format, allocating intermediate fields if necessary.
> + * Return a pointer to the leaf field on success, NULL on memory
> + * allocation error or type/nullability mismatch error, diag
> + * message is set.
> + */
> +static struct tuple_field *
> +tuple_field_tree_add_path(struct tuple_format *format, const char *path,
The function name is confusing as there's no such thing as
tuple_field_tree.
Should be called tuple_format_add_field, I guess.
> + uint32_t path_len, uint32_t fieldno)
Semantically, fieldno should go before path in the argument list.
> +{
> + int rc = 0;
> + struct json_tree *tree = &format->fields;
> + struct tuple_field *parent = tuple_format_field(format, fieldno);
> + struct tuple_field *field = tuple_field_new();
> + if (field == NULL)
> + goto fail;
> +
> + struct json_lexer lexer;
> + uint32_t token_count = 0;
> + json_lexer_create(&lexer, path, path_len, TUPLE_INDEX_BASE);
> + while ((rc = json_lexer_next_token(&lexer, &field->token)) == 0 &&
> + field->token.type != JSON_TOKEN_END) {
> + enum field_type expected_type =
> + field->token.type == JSON_TOKEN_STR ?
> + FIELD_TYPE_MAP : FIELD_TYPE_ARRAY;
> + if (field_type1_contains_type2(parent->type, expected_type)) {
> + parent->type = expected_type;
> + } else if (!field_type1_contains_type2(expected_type,
> + parent->type)) {
Why the second if()?
> + diag_set(ClientError, ER_INDEX_PART_TYPE_MISMATCH,
> + tuple_field_path(parent, false),
> + field_type_strs[parent->type],
> + field_type_strs[expected_type]);
> + goto fail;
> + }
> + struct tuple_field *next =
> + json_tree_lookup_entry(tree, &parent->token,
> + &field->token,
> + struct tuple_field, token);
> + if (next == NULL) {
> + rc = json_tree_add(tree, &parent->token, &field->token);
> + if (rc != 0) {
> + diag_set(OutOfMemory, sizeof(struct json_token),
> + "json_tree_add", "tree");
> + goto fail;
> + }
> + next = field;
> + field = tuple_field_new();
> + if (field == NULL)
> + goto fail;
> + }
> + parent = next;
> + token_count++;
> + }
> + /* Path has been verified key_def_decode_parts. */
by key_def_decode_parts
> + assert(rc == 0 && field->token.type == JSON_TOKEN_END);
> + assert(parent != NULL);
> + /* Update tree depth information. */
> + format->max_path_tokens = MAX(format->max_path_tokens, token_count + 1);
This is confusing - the variable name implies that it denotes the
maximal number of tokens among JSON paths, but actually it also
includes the top level field. We need to either remove +1 or rename
the variable (tuple_format::fields_depth or something like that).
> +end:
> + tuple_field_delete(field);
> + return parent;
> +fail:
> + parent = NULL;
> + goto end;
> }
>
> /**
> @@ -95,10 +172,25 @@ tuple_format_field_by_id(struct tuple_format *format, uint32_t id)
> static int
> tuple_format_use_key_part(struct tuple_format *format, uint32_t field_count,
> const struct key_part *part, bool is_sequential,
> - int *current_slot)
> + int *current_slot, char **paths)
> {
> assert(part->fieldno < tuple_format_field_count(format));
> - struct tuple_field *field = tuple_format_field(format, part->fieldno);
> + struct tuple_field *field;
> + if (part->path == NULL) {
> + field = tuple_format_field(format, part->fieldno);
> + } else {
> + assert(!is_sequential);
> + /**
> + * Copy JSON path data to reserved area at the
> + * end of format allocation.
> + */
> + memcpy(*paths, part->path, part->path_len);
> + field = tuple_field_tree_add_path(format, *paths, part->path_len,
> + part->fieldno);
> + if (field == NULL)
> + return -1;
> + *paths += part->path_len;
> + }
I don't like this branching. AFAIU the code for path != NULL would work
even if path == NULL.
> /*
> * If a field is not present in the space format,
> * inherit nullable action of the first key part
While you are at it, please fix indentation here. No need to submit such
a small collateral change separately, do it in this patch please.
> @@ -158,13 +250,93 @@ tuple_format_use_key_part(struct tuple_format *format, uint32_t field_count,
> * simply accessible, so we don't store an offset for it.
> */
> if (field->offset_slot == TUPLE_OFFSET_SLOT_NIL &&
> - is_sequential == false && part->fieldno > 0) {
> + is_sequential == false &&
> + (part->fieldno > 0 || part->path != NULL)) {
> *current_slot = *current_slot - 1;
> field->offset_slot = *current_slot;
> }
> return 0;
> }
>
> +/**
> + * Get format:field parent field_type.
> + * This routine is required as first-level fields have no parent
> + * field and so cannot be retrieved with json_tree_entry.
> + */
> +static enum field_type
> +tuple_format_field_parent_type(struct tuple_format *format,
> + struct tuple_field *field)
I don't like this auxiliary function, because its applicability is
quite limited IMO. Since it's used in the only place, why don't you
simply inline it there?
> +{
> + struct json_token *parent = field->token.parent;
> + if (parent == &format->fields.root)
> + return FIELD_TYPE_ARRAY;
> + return json_tree_entry(parent, struct tuple_field, token)->type;
> +}
> +
> +uint32_t
> +tuple_format_stmt_encode(struct tuple_format *format, char **offset,
> + char *tuple_raw, uint32_t *field_map,
> + struct iovec *iov)
You either pass all NULLs to this function so as to estimate the
resulting msgpack size or don't use the return value at all. This
doesn't look good. Semantically, there are two functions. Please
split.
The part that encodes a tuple should stay in vy_stmt.c IMO. In fact,
I don't think there's any point in factoring this code out of
vy_stmt_new_surrogate_from_key(), because without the rest of it it's
quite difficult to understand why it exists.
The part that calculates the size shouldn't have stmt in its name,
because what it actually does is calculate the size of a tuple matching
the format and filled with nulls.
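For instance, the size part could be a standalone helper - a sketch
only, the name is just a suggestion; it mirrors your encoding loop with
all the writing logic dropped, and it also counts the top-level array
header (see my comment on vy_stmt_new_surrogate_from_key below):

static uint32_t
tuple_format_min_tuple_size(struct tuple_format *format)
{
	/* Header of the top-level array. */
	uint32_t size = mp_sizeof_array(format->index_field_count);
	struct tuple_field *field;
	json_tree_foreach_entry_preorder(field, &format->fields.root,
					 struct tuple_field, token) {
		if (field->token.type == JSON_TOKEN_NUM) {
			/* Nils that pad missing array slots. */
			struct json_token **neighbors =
				field->token.parent->children;
			for (int i = field->token.num - 1;
			     i > 0 && neighbors[i] == NULL; i--)
				size += mp_sizeof_nil();
		} else if (field->token.type == JSON_TOKEN_STR) {
			/* The map key string. */
			size += mp_sizeof_str(field->token.len);
		}
		if (json_token_is_leaf(&field->token))
			size += mp_sizeof_nil();
		else if (field->type == FIELD_TYPE_ARRAY)
			size += mp_sizeof_array(field->token.max_child_idx + 1);
		else if (field->type == FIELD_TYPE_MAP)
			size += mp_sizeof_map(field->token.max_child_idx + 1);
	}
	return size;
}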
> +{
> + bool write = offset != NULL;
> + uint32_t size = 0;
> + struct tuple_field *field;
> + json_tree_foreach_entry_preorder(field, &format->fields.root,
> + struct tuple_field, token) {
> + enum field_type parent_type =
> + tuple_format_field_parent_type(format, field);
> + if (parent_type == FIELD_TYPE_ARRAY &&
Do you really need to know parent_type here? I think that instead you
could simply check field->token.type: NUM means array, STR means map.
> + field->token.sibling_idx > 0) {
> + /*
> + * Write nil instead of omitted array
> + * members.
> + */
> + struct json_token **neighbors =
> + field->token.parent->children;
> + for (uint32_t i = field->token.sibling_idx - 1;
I'd use token.num here instead if possible - would be consistent with
the fact that you use token.str below.
> + neighbors[i] == NULL && i > 0; i--) {
> + if (write)
> + *offset = mp_encode_nil(*offset);
> + size += mp_sizeof_nil();
> + }
Could be rewritten without the extra check for (sibling_idx > 0) above,
making the code a little bit more straightforward:
if (field->token.type == JSON_TOKEN_NUM) {
/* Fill missing array slots with nils. */
for (int i = field->token.num - 1;
i > 0 && neighbors[i] == NULL; i--) {
...
}
}
> + } else if (parent_type == FIELD_TYPE_MAP) {
> + /* Write map key string. */
> + const char *str = field->token.str;
> + uint32_t len = field->token.len;
> + if (write)
> + *offset = mp_encode_str(*offset, str, len);
> + size += mp_sizeof_str(len);
> + }
> + /* Fill data. */
> + uint32_t children_cnt = field->token.max_child_idx + 1;
It's not child count - the number of children can be less. Let's please
avoid introducing this extra variable and use max_child_idx+1 directly -
if you split this function the code will look even better without it.
> + if (json_token_is_leaf(&field->token)) {
> + if (!write || iov[field->id].iov_len == 0) {
> + if (write)
> + *offset = mp_encode_nil(*offset);
> + size += mp_sizeof_nil();
> + } else {
> + memcpy(*offset, iov[field->id].iov_base,
> + iov[field->id].iov_len);
> + uint32_t data_offset = *offset - tuple_raw;
> + int32_t slot = field->offset_slot;
> + if (slot != TUPLE_OFFSET_SLOT_NIL)
> + field_map[slot] = data_offset;
> + *offset += iov[field->id].iov_len;
> + size += iov[field->id].iov_len;
> + }
> + } else if (field->type == FIELD_TYPE_ARRAY) {
> + if (write)
> + *offset = mp_encode_array(*offset, children_cnt);
> + size += mp_sizeof_array(children_cnt);
> + } else if (field->type == FIELD_TYPE_MAP) {
> + if (write)
> + *offset = mp_encode_map(*offset, children_cnt);
> + size += mp_sizeof_map(children_cnt);
> + }
> + }
> + return size;
> +}
> +
> /**
> * Extract all available type info from keys and field
> * definitions.
> @@ -236,9 +414,12 @@ tuple_format_create(struct tuple_format *format, struct key_def * const *keys,
> "malloc", "required field bitmap");
> return -1;
> }
> + uint32_t id = 0;
> struct tuple_field *field;
> json_tree_foreach_entry_preorder(field, &format->fields.root,
> struct tuple_field, token) {
> + /* Set the unique field identifier. */
> + field->id = id++;
This should be done in tuple_format_add_field. You can post-inc
total_field_count there to allocate a new id.
And as I've already told you, it's no good moving pieces of code around
in patches of the same series. If you intended to allocate an id here in
the first place you should have added this code in the patch that
introduced field->id. Now, you should follow the pattern you laid in
that patch and allocate field->id when a field is added.
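I.e. in tuple_format_add_field() (or whatever it ends up being called),
roughly:

	/* Allocate a unique id as soon as the field is created. */
	field->id = format->total_field_count++;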
> /*
> * Mark all leaf non-nullable fields as required
> * by setting the corresponding bit in the bitmap
> @@ -317,6 +502,8 @@ static struct tuple_format *
> tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
> uint32_t space_field_count, struct tuple_dictionary *dict)
> {
> + /* Size of area to store paths. */
> + uint32_t paths_size = 0;
Just like I don't like 'paths', because it can be confused with 'path',
I don't like paths_size. Can't come up with a better name though now.
Think of the name please and give me some options.
> uint32_t index_field_count = 0;
> /* find max max field no */
> for (uint16_t key_no = 0; key_no < key_count; ++key_no) {
> @@ -368,6 +556,8 @@ tuple_format_alloc(struct key_def * const *keys, uint16_t key_count,
> }
> format->total_field_count = field_count;
> format->required_fields = NULL;
> + format->max_path_tokens = 1;
> + format->vy_stmt_size = UINT32_MAX;
Why? Should be 0, I think.
> format->refs = 0;
> format->id = FORMAT_ID_NIL;
> format->index_field_count = index_field_count;
> @@ -428,15 +618,22 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
> {
> if (format1->exact_field_count != format2->exact_field_count)
> return false;
> - uint32_t format1_field_count = tuple_format_field_count(format1);
> - uint32_t format2_field_count = tuple_format_field_count(format2);
> - for (uint32_t i = 0; i < format1_field_count; ++i) {
> - struct tuple_field *field1 = tuple_format_field(format1, i);
> + struct tuple_field *field1;
> + json_tree_foreach_entry_preorder(field1, &format1->fields.root,
> + struct tuple_field, token) {
> +next:;
Please, don't use :; - it looks weird.
> + const char *path = tuple_field_path(field1, true);
> + struct tuple_field *field2 =
> + json_tree_lookup_path_entry(&format2->fields,
> + &format2->fields.root,
> + path, strlen(path),
> + TUPLE_INDEX_BASE,
> + struct tuple_field, token);
Use json_tree_snprint_path directly here. Allocating a buffer on the
region for it is OK. Panicking on memory allocation error is OK as well.
> /*
> * The field has a data type in format1, but has
> * no data type in format2.
> */
> - if (i >= format2_field_count) {
> + if (field2 == NULL) {
> /*
> * The field can get a name added
> * for it, and this doesn't require a data
> @@ -447,12 +644,22 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
> * NULLs or miss the subject field.
> */
> if (field1->type == FIELD_TYPE_ANY &&
> - tuple_field_is_nullable(field1))
> - continue;
> - else
> + tuple_field_is_nullable(field1)) {
> + /* Skip subtree. */
> + struct json_token *token = &field1->token;
> + struct json_token *parent = token->parent;
> + field1 = json_tree_child_next_entry(parent,
Skipping some entries while iterating over a tree looks precarious.
It feels like it may break the iterator assumption, even if it actually
doesn't.
I don't think it's worth it - AFAIU you could simply continue the loop
instead. That would be suboptimal, but then again this is a cold path.
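I.e., roughly:

	if (field2 == NULL) {
		if (field1->type == FIELD_TYPE_ANY &&
		    tuple_field_is_nullable(field1))
			continue;
		return false;
	}

AFAICS an ANY field can't have children in the tree anyway
(intermediate path fields are typed as map or array), so in practice
continuing shouldn't even be that suboptimal.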
> + token,
> + struct
> + tuple_field,
> + token);
BTW there's no strong requirement to align function arguments by
parenthesis. You could write
field1 = json_tree_child_next_entry(parent,
token, struct tuple_field, token);
and that would be OK.
> + if (field1 == NULL)
> + break;
> + goto next;
> + } else {
> return false;
> + }
> }
> - struct tuple_field *field2 = tuple_format_field(format2, i);
> if (! field_type1_contains_type2(field1->type, field2->type))
> return false;
> /*
> @@ -466,6 +673,90 @@ tuple_format1_can_store_format2_tuples(struct tuple_format *format1,
> return true;
> }
>
> +/**
> + * Descriptor of the parsed msgpack frame.
> + * Due to the fact that the msgpack has nested structures whose
> + * length is stored in the frame header at the blob beginning, we
What's blob?
> + * need to be able to determine that we have finished parsing the
s/that/if
> + * current component and should move on to the next one.
> + * For this purpose a stack of disassembled levels is organized,
Disassembled levels?
> + * where the type of the level, the total number of elements,
> + * and the number of elements that have already been parsed are
> + * stored.
> + */
> +struct mp_frame {
> + /** JSON token type representing frame data structure. */
> + enum json_token_type child_type;
> + /** Total count of MP members to process. */
> + uint32_t total;
> + /** Count of MP elements that have already been parsed. */
> + uint32_t curr;
> +};
It feels like this thing should be independent of json/tuple stuff and
be a part of the msgpuck library, because what it actually does is
decode mp data providing the caller with array index / map key of each
decoded entry. Please think whether we can make it so.
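Just an idea of what the msgpuck-level API could look like - all names
are made up:

	/** One level of a msgpack document being decoded. */
	struct mp_frame {
		/** MP_ARRAY or MP_MAP. */
		enum mp_type type;
		/** Number of elements in the container. */
		uint32_t count;
		/** Index of the next element to decode. */
		uint32_t idx;
	};

	static inline void
	mp_frame_create(struct mp_frame *frame, enum mp_type type,
			uint32_t count)
	{
		frame->type = type;
		frame->count = count;
		frame->idx = 0;
	}

The caller would push/pop such frames on its own stack and map
(type, idx) or the decoded key string to a json token by itself, so
the decoder stays completely unaware of json trees.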
> +
> +/**
> + * Emit token to analyze and do msgpack pointer shift using top
> + * mp_stack frame. Return 0 on success, -1 when analyse step must
> + * be skipped (on usuported term detection).
> + */
> +static int
> +mp_frame_parse(struct mp_frame *mp_stack, uint32_t mp_stack_idx,
> + const char **pos, struct json_token *token)
> +{
> + token->type = mp_stack[mp_stack_idx].child_type;
> + ++mp_stack[mp_stack_idx].curr;
> + if (token->type == JSON_TOKEN_NUM) {
> + token->num = mp_stack[mp_stack_idx].curr - TUPLE_INDEX_BASE;
> + } else if (token->type == JSON_TOKEN_STR) {
> + if (mp_typeof(**pos) != MP_STR) {
> + /* Skip key. */
> + mp_next(pos);
> + return -1;
> + }
> + token->str = mp_decode_str(pos, (uint32_t *)&token->len);
> + } else {
> + unreachable();
> + }
> + return 0;
> +}
> +
> +/**
> + * Prepare mp_frame for further iterations. Store container length
> + * and child_type. Update parent token pointer and shift msgpack
> + * pointer.
> + */
> +static int
> +mp_frame_prepare(struct mp_frame *mp_stack, uint32_t *mp_stack_idx,
> + uint32_t mp_stack_total, struct json_token *token,
> + const char **pos, struct json_token **parent)
I don't quite like the names or the API. Please put more thought into
them in the scope of moving this thing to the msgpuck library. Gimme
some options, please.
> +{
> + enum mp_type type = mp_typeof(**pos);
> + if (token != NULL && *mp_stack_idx + 1 < mp_stack_total &&
> + (type == MP_MAP || type == MP_ARRAY)) {
> + uint32_t size = type == MP_ARRAY ? mp_decode_array(pos) :
> + mp_decode_map(pos);
> + if (size == 0)
> + return 0;
> + *parent = token;
> + enum json_token_type child_type =
> + type == MP_ARRAY ? JSON_TOKEN_NUM : JSON_TOKEN_STR;
> + *mp_stack_idx = *mp_stack_idx + 1;
> + mp_stack[*mp_stack_idx].child_type = child_type;
> + mp_stack[*mp_stack_idx].total = size;
> + mp_stack[*mp_stack_idx].curr = 0;
> + } else {
> + mp_next(pos);
> + while (mp_stack[*mp_stack_idx].curr >=
> + mp_stack[*mp_stack_idx].total) {
Bad indentation.
> + assert(*parent != NULL);
> + *parent = (*parent)->parent;
> + if (*mp_stack_idx == 0)
> + return -1;
> + *mp_stack_idx = *mp_stack_idx - 1;
> + }
> + }
> + return 0;
> +}
> +
> /** @sa declaration for details. */
> int
> tuple_init_field_map(struct tuple_format *format, uint32_t *field_map,
> @@ -512,49 +803,64 @@ tuple_init_field_map(struct tuple_format *format, uint32_t *field_map,
> /* Empty tuple, nothing to do. */
> goto skip;
> }
> - /* first field is simply accessible, so we do not store offset to it */
> - struct tuple_field *field = tuple_format_field(format, 0);
> - if (validate &&
> - !field_mp_type_is_compatible(field->type, mp_typeof(*pos),
> - tuple_field_is_nullable(field))) {
> - diag_set(ClientError, ER_FIELD_TYPE, tuple_field_path(field),
> - field_type_strs[field->type]);
> - goto error;
> - }
> - if (required_fields != NULL)
> - bit_clear(required_fields, field->id);
> - mp_next(&pos);
> - /* other fields...*/
> - uint32_t i = 1;
> uint32_t defined_field_count = MIN(field_count, validate ?
> tuple_format_field_count(format) :
> format->index_field_count);
I think that decoding the top level should be done with the same mp
helpers that are used for decoding nested levels. Please think whether
we can achieve that.
> - if (field_count < format->index_field_count) {
> - /*
> - * Nullify field map to be able to detect by 0,
> - * which key fields are absent in tuple_field().
> - */
> - memset((char *)field_map - format->field_map_size, 0,
> - format->field_map_size);
> + /*
> + * Nullify field map to be able to detect by 0,
> + * which key fields are absent in tuple_field().
> + */
> + memset((char *)field_map - format->field_map_size, 0,
> + format->field_map_size);
> + uint32_t mp_stack_size =
> + format->max_path_tokens * sizeof(struct mp_frame);
> + struct mp_frame *mp_stack = region_alloc(region, mp_stack_size);
> + if (mp_stack == NULL) {
> + diag_set(OutOfMemory, mp_stack_size, "region_alloc",
> + "mp_stack");
> + goto error;
> }
> - for (; i < defined_field_count; ++i) {
> - field = tuple_format_field(format, i);
> - if (validate &&
> - !field_mp_type_is_compatible(field->type, mp_typeof(*pos),
> - tuple_field_is_nullable(field))) {
> - diag_set(ClientError, ER_FIELD_TYPE,
> - tuple_field_path(field),
> - field_type_strs[field->type]);
> - goto error;
> + struct tuple_field *field;
> + mp_stack[0].child_type = JSON_TOKEN_NUM;
> + mp_stack[0].total = defined_field_count;
> + mp_stack[0].curr = 0;
> + uint32_t mp_stack_idx = 0;
> + struct json_tree *tree = (struct json_tree *)&format->fields;
As I've already asked you, please remove all no-op type conversions left
from the time when tuple_format was const.
> + struct json_token *parent = &tree->root;
> + while (mp_stack[0].curr <= mp_stack[0].total) {
> + struct json_token token;
> + if (mp_frame_parse(mp_stack, mp_stack_idx, &pos, &token) != 0) {
> + /* Unsupported token. */
> + goto finish_frame;
> }
> - if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
> - field_map[field->offset_slot] =
> - (uint32_t) (pos - tuple);
Pointless type conversion?
> + field = json_tree_lookup_entry(tree, parent, &token,
> + struct tuple_field, token);
> + if (field != NULL) {
> + bool is_nullable = tuple_field_is_nullable(field);
> + if (validate &&
> + !field_mp_type_is_compatible(field->type,
> + mp_typeof(*pos),
> + is_nullable) != 0) {
> + diag_set(ClientError, ER_FIELD_TYPE,
> + tuple_field_path(field, false),
> + field_type_strs[field->type]);
> + goto error;
> + }
> + if (field->offset_slot != TUPLE_OFFSET_SLOT_NIL) {
> + field_map[field->offset_slot] =
> + (uint32_t)(pos - tuple);
Ditto.
> + }
> + if (required_fields != NULL)
> + bit_clear(required_fields, field->id);
> }
> - if (required_fields != NULL)
> - bit_clear(required_fields, field->id);
> - mp_next(&pos);
> - }
> +finish_frame:
> + /* Prepare stack info for next iteration. */
> + if (mp_frame_prepare(mp_stack, &mp_stack_idx,
> + format->max_path_tokens,
> + field != NULL ? &field->token : NULL,
> + &pos, &parent) != 0)
> + break;
> + };
> skip:
> /*
> * Check the required field bitmap for missing fields.
> @@ -820,3 +1118,30 @@ error:
> tt_sprintf("error in path on position %d", rc));
> return -1;
> }
> +
> +int
> +tuple_field_by_part_raw_slowpath(struct tuple_format *format, const char *data,
> + const uint32_t *field_map,
> + struct key_part *part, const char **raw)
You never use the return value, so why not return the field pointer
instead, like tuple_field_raw() does?
> +{
> + assert(part->path != NULL);
> + struct tuple_field *field =
> + tuple_format_field_by_path(format, part->fieldno, part->path,
> + part->path_len);
> + if (field != NULL) {
> + int32_t offset_slot = field->offset_slot;
> + assert(-offset_slot * sizeof(uint32_t) <=
> + format->field_map_size);
> + *raw = field_map[offset_slot] == 0 ?
> + NULL : data + field_map[offset_slot];
> + return 0;
> + }
> + /*
> + * Format doesn't have field representing specified part.
> + * Make slow tuple parsing.
> + */
> + *raw = tuple_field_raw(format, data, field_map, part->fieldno);
> + if (*raw == NULL)
> + return 0;
> + return tuple_field_go_to_path(raw, part->path, part->path_len);
> +}
> diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
> index 30b93b610..3b630c3bb 100644
> --- a/src/box/tuple_format.h
> +++ b/src/box/tuple_format.h
> @@ -185,6 +186,15 @@ struct tuple_format {
> * Shared names storage used by all formats of a space.
> */
> struct tuple_dictionary *dict;
> + /**
> + * A maximum depth of format:fields subtree.
> + */
> + uint32_t max_path_tokens;
> + /**
> + * The size of the secondary key built for format:fields
> + * with all leaf records set to nil.
> + */
> + uint32_t vy_stmt_size;
Bad name - it shouldn't refer to vinyl or secondary indexes. It's just
the size of a minimal tuple conforming to the format and filled with
nils. Please think of a better name.
> /**
> * Fields comprising the format, organized in a tree.
> * First level nodes correspond to tuple fields.
> @@ -451,7 +510,16 @@ static inline const char *
> tuple_field_by_part_raw(struct tuple_format *format, const char *data,
> const uint32_t *field_map, struct key_part *part)
> {
> - return tuple_field_raw(format, data, field_map, part->fieldno);
> + if (likely(part->path == NULL)) {
> + return tuple_field_raw(format, data, field_map, part->fieldno);
> + } else {
> + const char *raw;
> + MAYBE_UNUSED int rc =
> + tuple_field_by_part_raw_slowpath(format, data,
> + field_map, part, &raw);
So now we have
- tuple_field_raw(fieldno) - which looks up tuple_field in
tuple_format by fieldno and uses field map for a quick lookup;
if not found decodes msgpack.
- tuple_field_raw_by_path(path) - which looks up a field by full path
(including fieldno or field name). Doesn't use field map.
- tuple_field_by_part_raw(part) - which is equivalent to
tuple_field_raw(part->fieldno) if part->path is NULL, otherwise it
looks up tuple_field in tuple_format by fieldno and path. If found,
it uses field map for a quick lookup, otherwise it decodes msgpack
with the aid of tuple_field_go_to_path().
In the following patches you make tuple_field_raw_by_path() use
tuple_field_by_part_raw() passing a fake key_part, which looks kinda
ugly.
I expect tuple_field_by_part() to use tuple_field_by_path(), not vice
versa. Please refactor the code. May be, you'll need a separate patch.
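I.e. in the end I'd expect something like this (a sketch - it assumes
tuple_field_raw_by_path() is reworked to take a field number plus a
relative path):

	static inline const char *
	tuple_field_by_part_raw(struct tuple_format *format, const char *data,
				const uint32_t *field_map,
				struct key_part *part)
	{
		if (likely(part->path == NULL))
			return tuple_field_raw(format, data, field_map,
					       part->fieldno);
		return tuple_field_raw_by_path(format, data, field_map,
					       part->fieldno, part->path,
					       part->path_len);
	}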
> + assert(rc == 0);
> + return raw;
> + }
> }
>
> #if defined(__cplusplus)
> diff --git a/src/box/vy_log.c b/src/box/vy_log.c
> index c9e0713c8..6fc051648 100644
> --- a/src/box/vy_log.c
> +++ b/src/box/vy_log.c
> @@ -581,9 +581,11 @@ vy_log_record_decode(struct vy_log_record *record,
> record->group_id = mp_decode_uint(&pos);
> break;
> case VY_LOG_KEY_DEF: {
> + struct region *region = &fiber()->gc;
> uint32_t part_count = mp_decode_array(&pos);
> - struct key_part_def *parts = region_alloc(&fiber()->gc,
> - sizeof(*parts) * part_count);
> + struct key_part_def *parts =
> + region_alloc(region,
> + sizeof(*parts) * part_count);
TBH I hate when one carries the rvalue to another line in an assignment,
like this
int very_long_name =
very_long_initializer;
Looks ugly and complicates grepping.
In this particular case there's no point to do that, but you seem to be
quite obsessed with aligning function arguments by the parenthesis for
some reason :-)
> if (parts == NULL) {
> diag_set(OutOfMemory,
> sizeof(*parts) * part_count,
> diff --git a/src/box/vy_point_lookup.c b/src/box/vy_point_lookup.c
> index ddbc2d46f..14e0c0c93 100644
> --- a/src/box/vy_point_lookup.c
> +++ b/src/box/vy_point_lookup.c
> @@ -196,8 +196,6 @@ vy_point_lookup(struct vy_lsm *lsm, struct vy_tx *tx,
> const struct vy_read_view **rv,
> struct tuple *key, struct tuple **ret)
> {
> - assert(tuple_field_count(key) >= lsm->cmp_def->part_count);
> -
Quoting myself from the previous review round:
} I don't like that you simply throw away this assertion. Please think how
} we can keep it.
( see https://www.freelists.org/post/tarantool-patches/PATCH-v5-59-box-introduce-JSON-indexes,1 )
You haven't answered me back then, but in the new version you're
still trying to remove this assertion...
> *ret = NULL;
> double start_time = ev_monotonic_now(loop());
> int rc = 0;
> diff --git a/src/box/vy_stmt.c b/src/box/vy_stmt.c
> index 47f135c65..7a302e6f3 100644
> --- a/src/box/vy_stmt.c
> +++ b/src/box/vy_stmt.c
> @@ -385,26 +385,43 @@ vy_stmt_new_surrogate_from_key(const char *key, enum iproto_type type,
> struct region *region = &fiber()->gc;
>
> uint32_t field_count = format->index_field_count;
> - struct iovec *iov = region_alloc(region, sizeof(*iov) * field_count);
> + uint32_t iov_sz =
> + sizeof(struct iovec) * format->total_field_count;
Why carry it to the next line? It fits.
> + struct iovec *iov = region_alloc(region, iov_sz);
> if (iov == NULL) {
> - diag_set(OutOfMemory, sizeof(*iov) * field_count,
> - "region", "iov for surrogate key");
> + diag_set(OutOfMemory, iov_sz, "region_alloc",
> + "iov for surrogate key");
We usually pass "region", not "region_alloc" to OutOfMemory.
Why change it here?
> return NULL;
> }
> - memset(iov, 0, sizeof(*iov) * field_count);
> + memset(iov, 0, iov_sz);
> uint32_t part_count = mp_decode_array(&key);
> assert(part_count == cmp_def->part_count);
> - assert(part_count <= field_count);
> - uint32_t nulls_count = field_count - cmp_def->part_count;
> + assert(part_count <= format->total_field_count);
> + /**
> + * The format:vy_stmt_size contains a size of
the size
> + * stmt tuple having all leaf fields set to null.
> + * Calculate bsize as vy_stmt_size where parts_count
> + * nulls replaced with extracted keys.
> + */
> uint32_t bsize = mp_sizeof_array(field_count) +
Isn't mp_sizeof_array(field_count) included into vy_stmt_size?
It should be.
> - mp_sizeof_nil() * nulls_count;
> + format->vy_stmt_size - mp_sizeof_nil() * part_count;
> for (uint32_t i = 0; i < part_count; ++i) {
> const struct key_part *part = &cmp_def->parts[i];
> assert(part->fieldno < field_count);
> + struct tuple_field *field;
> + if (part->path != NULL) {
> + field = tuple_format_field_by_path(format,
> + part->fieldno,
> + part->path,
> + part->path_len);
> + } else {
> + field = tuple_format_field(format, part->fieldno);
> + }
tuple_format_field_by_part() helper would be appreciated.
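E.g. (a sketch):

	static struct tuple_field *
	tuple_format_field_by_part(struct tuple_format *format,
				   const struct key_part *part)
	{
		if (part->path == NULL)
			return tuple_format_field(format, part->fieldno);
		return tuple_format_field_by_path(format, part->fieldno,
						  part->path, part->path_len);
	}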
> + assert(field != NULL);
> const char *svp = key;
> - iov[part->fieldno].iov_base = (char *) key;
> + iov[field->id].iov_base = (char *) key;
> mp_next(&key);
> - iov[part->fieldno].iov_len = key - svp;
> + iov[field->id].iov_len = key - svp;
> bsize += key - svp;
> }
>
> diff --git a/test/engine/json.result b/test/engine/json.result
> new file mode 100644
> index 000000000..711f7f256
> --- /dev/null
> +++ b/test/engine/json.result
> @@ -0,0 +1,448 @@
> +test_run = require('test_run').new()
> +---
> +...
> +engine = test_run:get_cfg('engine')
> +---
> +...
> +--
> +-- gh-1012: Indexes for JSON-defined paths.
> +--
> +s = box.schema.space.create('withdata', {engine = engine})
> +---
> +...
> +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO["fname"]'}, {3, 'str', path = '["FIO"].fname'}}})
> +---
> +- error: 'Can''t create or modify index ''test1'' in space ''withdata'': same key
> + part is indexed twice'
> +...
> +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 666}, {3, 'str', path = '["FIO"]["fname"]'}}})
> +---
> +- error: 'Wrong index options (field 2): ''path'' must be string'
> +...
> +s:create_index('test1', {parts = {{2, 'number'}, {3, 'map', path = 'FIO'}}})
> +---
> +- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type
> + ''map'' is not supported'
> +...
> +s:create_index('test1', {parts = {{2, 'number'}, {3, 'array', path = '[1]'}}})
> +---
> +- error: 'Can''t create or modify index ''test1'' in space ''withdata'': field type
> + ''array'' is not supported'
> +...
> +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO'}, {3, 'str', path = 'FIO.fname'}}})
> +---
> +- error: Field [3]["FIO"] has type 'string' in one index, but type 'map' in another
> +...
> +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = '[1].sname'}, {3, 'str', path = '["FIO"].fname'}}})
> +---
> +- error: Field 3 has type 'array' in one index, but type 'map' in another
> +...
> +s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO....fname'}}})
> +---
> +- error: 'Wrong index options (field 3): invalid JSON path ''FIO....fname'': error
> + in path on position 5'
> +...
> +idx = s:create_index('test1', {parts = {{2, 'number'}, {3, 'str', path = 'FIO.fname', is_nullable = false}, {3, 'str', path = '["FIO"]["sname"]'}}})
> +---
> +...
> +assert(idx ~= nil)
No point in using assert() in a test.
I'll look at the test later, when you fix the code according to my
comments.
* Re: [PATCH v7 5/5] box: specify indexes in user-friendly form
2019-01-09 8:29 ` [PATCH v7 5/5] box: specify indexes in user-friendly form Kirill Shcherbatov
@ 2019-01-10 10:21 ` Vladimir Davydov
0 siblings, 0 replies; 9+ messages in thread
From: Vladimir Davydov @ 2019-01-10 10:21 UTC (permalink / raw)
To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja
On Wed, Jan 09, 2019 at 11:29:40AM +0300, Kirill Shcherbatov wrote:
> Implemented a more convenient interface for creating an index
> by JSON path. Instead of specifying a fieldno and a relative
> path, it becomes possible to pass the full JSON path to the data.
>
> Closes #1012
>
> @TarantoolBot document
> Title: Indexes by JSON path
> Sometimes field data has a complex document structure.
> When this structure is consistent across the whole document,
> you can create an index by JSON path.
>
> Example:
> s = box.schema.space.create('sample')
> format = {{'id', 'unsigned'}, {'data', 'map'}}
> s:format(format)
> -- explicit JSON index creation
> age_idx = s:create_index('age', {parts = {{2, 'number', path = ".age"}}})
> -- user-friendly syntax for JSON index creation
> parts = {{'data.FIO["fname"]', 'str'}, {'data.FIO["sname"]', 'str'},
> {'data.age', 'number'}}
> info_idx = s:create_index('info', {parts = parts})
> s:insert({1, {FIO={fname="James", sname="Bond"}, age=35}})
> ---
> src/box/lua/index.c | 64 +++++++++++++++++++++++++++++++++++++++
> src/box/lua/schema.lua | 22 +++++++-------
> test/engine/json.result | 28 +++++++++++++++++
> test/engine/json.test.lua | 8 +++++
> 4 files changed, 111 insertions(+), 11 deletions(-)
>
> diff --git a/src/box/lua/index.c b/src/box/lua/index.c
> index 6265c044a..9b04c5d9a 100644
> --- a/src/box/lua/index.c
> +++ b/src/box/lua/index.c
> @@ -35,6 +35,9 @@
> #include "box/box.h"
> #include "box/index.h"
> #include "box/lua/tuple.h"
> +#include "box/schema.h"
> +#include "box/tuple_format.h"
> +#include "json/json.h"
> #include "box/lua/misc.h" /* lbox_encode_tuple_on_gc() */
>
> /** {{{ box.index Lua library: access to spaces and indexes
> @@ -328,6 +331,66 @@ lbox_index_compact(lua_State *L)
> return 0;
> }
>
> +/**
> + * Resolve field index by absolute JSON path first component and
> + * return relative JSON path.
> + */
> +static int
> +lbox_index_path_resolve(struct lua_State *L)
> +{
> + if (lua_gettop(L) != 3 ||
> + !lua_isnumber(L, 1) || !lua_isnumber(L, 2) || !lua_isstring(L, 3)) {
> + return luaL_error(L, "Usage box.internal."
> + "path_resolve(part_id, space_id, path)");
> + }
> + uint32_t part_id = lua_tonumber(L, 1);
> + uint32_t space_id = lua_tonumber(L, 2);
Are you kidding me? You silently ignored my comments to this patch
TWICE!
https://www.freelists.org/post/tarantool-patches/PATCH-v6-88-box-specify-indexes-in-userfriendly-form,1
* Re: [PATCH v7 4/5] box: introduce offset_slot cache in key_part
2019-01-09 8:29 ` [PATCH v7 4/5] box: introduce offset_slot cache in key_part Kirill Shcherbatov
@ 2019-01-10 11:28 ` Vladimir Davydov
0 siblings, 0 replies; 9+ messages in thread
From: Vladimir Davydov @ 2019-01-10 11:28 UTC (permalink / raw)
To: Kirill Shcherbatov; +Cc: tarantool-patches, kostja
On Wed, Jan 09, 2019 at 11:29:39AM +0300, Kirill Shcherbatov wrote:
> diff --git a/src/box/key_def.c b/src/box/key_def.c
> index 3012b05df..328954296 100644
> --- a/src/box/key_def.c
> +++ b/src/box/key_def.c
> @@ -139,7 +139,8 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
> enum field_type type, enum on_conflict_action nullable_action,
> struct coll *coll, uint32_t coll_id,
> enum sort_order sort_order, const char *path,
> - uint32_t path_len, char **paths)
> + uint32_t path_len, char **paths, int32_t offset_slot_cache,
> + uint64_t format_epoch)
> {
> assert(part_no < def->part_count);
> assert(type < field_type_MAX);
> @@ -151,6 +152,8 @@ key_def_set_part(struct key_def *def, uint32_t part_no, uint32_t fieldno,
> def->parts[part_no].coll = coll;
> def->parts[part_no].coll_id = coll_id;
> def->parts[part_no].sort_order = sort_order;
> + def->parts[part_no].offset_slot_cache = offset_slot_cache;
> + def->parts[part_no].format_epoch = format_epoch;
> if (path != NULL) {
> assert(paths != NULL);
> def->parts[part_no].path = *paths;
> @@ -198,7 +201,8 @@ key_def_new(const struct key_part_def *parts, uint32_t part_count)
> uint32_t path_len = part->path != NULL ? strlen(part->path) : 0;
> key_def_set_part(def, i, part->fieldno, part->type,
> part->nullable_action, coll, part->coll_id,
> - part->sort_order, part->path, path_len, &paths);
> + part->sort_order, part->path, path_len, &paths,
> + TUPLE_OFFSET_SLOT_NIL, 0);
> }
> key_def_set_cmp(def);
> return def;
> @@ -251,7 +255,7 @@ box_key_def_new(uint32_t *fields, uint32_t *types, uint32_t part_count)
> (enum field_type)types[item],
> ON_CONFLICT_ACTION_DEFAULT,
> NULL, COLL_NONE, SORT_ORDER_ASC, NULL, 0,
> - NULL);
> + NULL, TUPLE_OFFSET_SLOT_NIL, 0);
> }
> key_def_set_cmp(key_def);
> return key_def;
> @@ -676,7 +680,8 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
> key_def_set_part(new_def, pos++, part->fieldno, part->type,
> part->nullable_action, part->coll,
> part->coll_id, part->sort_order, part->path,
> - part->path_len, &paths);
> + part->path_len, &paths, part->offset_slot_cache,
> + part->format_epoch);
> }
>
> /* Set-append second key def's part to the new key def. */
> @@ -688,7 +693,8 @@ key_def_merge(const struct key_def *first, const struct key_def *second)
> key_def_set_part(new_def, pos++, part->fieldno, part->type,
> part->nullable_action, part->coll,
> part->coll_id, part->sort_order, part->path,
> - part->path_len, &paths);
> + part->path_len, &paths, part->offset_slot_cache,
> + part->format_epoch);
> }
> key_def_set_cmp(new_def);
> return new_def;
> diff --git a/src/box/key_def.h b/src/box/key_def.h
> index c6b7a8c74..54e3dca89 100644
> --- a/src/box/key_def.h
> +++ b/src/box/key_def.h
> @@ -96,6 +96,17 @@ struct key_part {
> char *path;
> /** The length of JSON path. */
> uint32_t path_len;
> + /** Cached format epoch. */
Not quite clear comment, please elaborate. May be
Epoch of the tuple format the offset slot cached in this
part is valid for, see tuple_format::epoch.
?
> + uint64_t format_epoch;
> + /**
> + * Cached value of the offset slot corresponding to
> + * the indexed field (tuple_field::offset_slot).
> + * Valid only if key_def::epoch equals the epoch of
> + * the tuple format. Updated in
> + * tuple_field_by_part_slowpath to always store the
This is nitpicking, but still please try to make the comments look
rectangular with no abrupt line breaks. Rephrase it if necessary or
make other lines a few characters longer or shorter.
In this case you could use tuple_field_by_part() instead of
tuple_field_by_part_slowpath() - it would still be clear, but the
comment would look better :-)
> + * offset corresponding to the last used tuple format.
> + */
> + int32_t offset_slot_cache;
> };
>
> struct key_def;
> diff --git a/src/box/tuple_format.c b/src/box/tuple_format.c
> index 401752654..9178c2722 100644
> --- a/src/box/tuple_format.c
> +++ b/src/box/tuple_format.c
> @@ -40,6 +40,7 @@ struct tuple_format **tuple_formats;
> static intptr_t recycled_format_ids = FORMAT_ID_NIL;
>
> static uint32_t formats_size = 0, formats_capacity = 0;
> +static uint64_t formats_epoch = 0;
>
> static struct tuple_field *
> tuple_field_new(void)
> @@ -599,6 +600,7 @@ tuple_format_new(struct tuple_format_vtab *vtab, struct key_def * const *keys,
> format->vtab = *vtab;
> format->engine = NULL;
> format->is_temporary = false;
> + format->epoch = ++formats_epoch;
> if (tuple_format_register(format) < 0) {
> tuple_format_destroy(format);
> free(format);
> @@ -1105,6 +1107,7 @@ tuple_field_raw_by_path(struct tuple_format *format, const char *tuple,
> part.fieldno = fieldno;
> part.path = (char *)path + lexer.offset;
> part.path_len = path_len - lexer.offset;
> + part.format_epoch = 0;
> rc = tuple_field_by_part_raw_slowpath(format, tuple, field_map, &part,
> field);
> if (rc == 0)
> @@ -1131,6 +1134,13 @@ tuple_field_by_part_raw_slowpath(struct tuple_format *format, const char *data,
> int32_t offset_slot = field->offset_slot;
> assert(-offset_slot * sizeof(uint32_t) <=
> format->field_map_size);
> +
> + /* Update format epoch cache. */
> + assert(part->format_epoch != format->epoch);
> + assert(format->epoch != 0);
> + part->offset_slot_cache = offset_slot;
> + part->format_epoch = format->epoch;
> +
> *raw = field_map[offset_slot] == 0 ?
> NULL : data + field_map[offset_slot];
> return 0;
> diff --git a/src/box/tuple_format.h b/src/box/tuple_format.h
> index dd7cd147a..00540207c 100644
> --- a/src/box/tuple_format.h
> +++ b/src/box/tuple_format.h
> @@ -138,6 +138,12 @@ tuple_field_is_nullable(struct tuple_field *tuple_field)
> * Tuple format describes how tuple is stored and information about its fields
> */
> struct tuple_format {
> + /**
> + * Counter that grows incrementally on space rebuild
> + * used for caching offset slot in key_part, for more
> + * details see key_part::offset_slot_cache.
> + */
> + uint64_t epoch;
> /** Virtual function table */
> struct tuple_format_vtab vtab;
> /** Pointer to engine-specific data. */
> @@ -488,6 +494,12 @@ tuple_field_by_part_raw(struct tuple_format *format, const char *data,
> {
> if (likely(part->path == NULL)) {
> return tuple_field_raw(format, data, field_map, part->fieldno);
> + } else if (part->format_epoch == format->epoch) {
> + int32_t offset_slot = part->offset_slot_cache;
> + assert(-offset_slot * sizeof(uint32_t) <=
> + format->field_map_size);
> + return field_map[offset_slot] != 0 ?
> + data + field_map[offset_slot] : NULL;
I think that offset_slot_cache should be checked before part->path - as
it won't make tuple_field_by_part() less efficient for path-less fields.
Actually, quite the contrary - we will avoid an extra memory lookup in
tuple_format->fields if offset_slot_cache is up-to-date.
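I.e., roughly (a sketch; the extra slot check keeps it correct for
parts that haven't populated the cache yet):

	if (part->format_epoch == format->epoch &&
	    part->offset_slot_cache != TUPLE_OFFSET_SLOT_NIL) {
		int32_t offset_slot = part->offset_slot_cache;
		return field_map[offset_slot] != 0 ?
		       data + field_map[offset_slot] : NULL;
	}
	if (likely(part->path == NULL))
		return tuple_field_raw(format, data, field_map,
				       part->fieldno);
	/* Fall through to the slow path, which refreshes the cache. */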
Thread overview: 9+ messages
2019-01-09 8:29 [PATCH v7 0/5] box: Indexes by JSON path Kirill Shcherbatov
2019-01-09 8:29 ` [PATCH v7 1/5] box: introduce JSON Indexes Kirill Shcherbatov
2019-01-10 10:16 ` Vladimir Davydov
2019-01-09 8:29 ` [PATCH v7 2/5] box: introduce has_json_paths flag in templates Kirill Shcherbatov
2019-01-09 8:29 ` [PATCH v7 3/5] box: tune tuple_field_raw_by_path for indexed data Kirill Shcherbatov
2019-01-09 8:29 ` [PATCH v7 4/5] box: introduce offset_slot cache in key_part Kirill Shcherbatov
2019-01-10 11:28 ` Vladimir Davydov
2019-01-09 8:29 ` [PATCH v7 5/5] box: specify indexes in user-friendly form Kirill Shcherbatov
2019-01-10 10:21 ` Vladimir Davydov